Artificial Intelligence (Topic archive) - 80,000 Hours https://80000hours.org/topic/world-problems/most-pressing-problems/artificial-intelligence/ Thu, 08 May 2025 18:38:11 +0000 en-US hourly 1 https://wordpress.org/?v=6.7.1 Emergency pod: Did OpenAI give up, or is this just a new trap? (with Rose Chan Loui) https://80000hours.org/podcast/episodes/rose-chan-loui-openai-nonprofit-control/ Thu, 08 May 2025 18:13:40 +0000 https://80000hours.org/?post_type=podcast&p=90318 The post Emergency pod: Did OpenAI give up, or is this just a new trap? (with Rose Chan Loui) appeared first on 80,000 Hours.

]]>
The post Emergency pod: Did OpenAI give up, or is this just a new trap? (with Rose Chan Loui) appeared first on 80,000 Hours.

]]>
AI-enabled power grabs https://80000hours.org/problem-profiles/ai-enabled-power-grabs/ Thu, 24 Apr 2025 10:53:40 +0000 https://80000hours.org/?post_type=problem_profile&p=89674 The post AI-enabled power grabs appeared first on 80,000 Hours.

]]>
Why is this a pressing problem?

New technologies can drastically shift the balance of power in society. Great Britain’s early dominance in the Industrial Revolution, for example, helped empower its global empire.1

With AI technology rapidly advancing, there’s a serious risk that it might enable an even more extreme global power grab.

Advanced AI is particularly concerning because it could be controlled by a small number of people, or even just one. An AI could be copied indefinitely, and with enough computing infrastructure and a powerful enough system, a single person could control a virtual or literal army of AI agents.

And since advanced AI could potentially trigger explosive growth in the economy, technology, and intelligence, anyone with unilateral control over the most powerful systems might be able to dominate the rest of humanity.

One factor that enhances this threat is the possibility of secret loyalties. It may be possible to create AI systems that appear to have society’s best interests in mind but are actually loyal to just one person or small group.2 As these systems are deployed throughout the economy, government, and military, they could constantly seek opportunities to advance the interests of their true masters.

Here are three possible pathways through which AI could enable an unprecedented power grab:

  1. AI developers seize control — in this scenario, actors within a company or organisation developing frontier AI systems use their technology to seize control. This could happen if they deploy their systems to be used widely in the economy, military, and government while it retains secret loyalty to them. Or they could potentially create powerful enough systems internally that can gather enough wealth and resources to launch a hostile takeover of other centres of power.
  2. Military coups — as militaries incorporate AI for competitive advantage, they introduce new vulnerabilities. AI-controlled weapons systems and autonomous military equipment could be designed to follow orders unscrupulously, without the formal and informal checks on power that militaries traditionally provide — such as the potential for mutiny in the face of unlawful orders. A military leader or other actor (including potentially hostile foreign governments) could find a way to ensure the military AI is loyal to them, and use it to assert far-reaching control.
  3. Autocratisation — political leaders could use advanced AI systems to entrench their power. They may be elected or unelected to start, but either way, they could use advanced AI systems to undermine any potential political challenger. For example, they could use enhanced surveillance and law enforcement to subdue the opposition.

Extreme power concentrated in the hands of a small number of people would pose a major threat to the interests of the rest of the world. It could even undermine the potential of a prosperous future, since the course of events may depend on the whims of those who happened to have dictatorial aspirations.

There are also ways AI could likely be used to broadly improve governance, but we’d expect scenarios in which AI enables hostile or illegitimate power grabs would be bad for the future of humanity.

What can be done to mitigate these risks?

We’d like to see much more work done to figure out the best methods for reducing the risk of an AI-enabled power grab. Several approaches that could help include:

  • Safeguards on internal use: Implement sophisticated monitoring of how AI systems are used within frontier companies, with restrictions on access to “helpful-only” models that will follow any instructions without limitations.
  • Transparency about model specifications: Publish detailed information about how AI systems are designed to behave, including safeguards and limitations on their actions, allowing for external scrutiny and identification of potential vulnerabilities.
  • Sharing capabilities broadly: Ensure that powerful AI capabilities are distributed among multiple stakeholders rather than concentrated in the hands of a few individuals or organizations. This creates checks and balances that make power grabs more difficult. Note though that there are also risks to having powerful AI capabilities distributed widely, so the competing considerations need to be carefully weighed.
  • Inspections for secret loyalties: Develop robust technical methods to detect whether AI systems have been programmed with hidden agendas or backdoors that would allow them to serve interests contrary to their stated purpose.
  • Military AI safeguards: Require that AI systems deployed in military contexts have robust safeguards against participating in coups, including principles against attacking civilians and multiple independent authorisation requirements for extreme actions.

For much more detail on this problem, listen to our interview with Tom Davidson.

Learn more

The post AI-enabled power grabs appeared first on 80,000 Hours.

]]>
Tom Davidson on how AI-enabled coups could allow a tiny group to seize power https://80000hours.org/podcast/episodes/tom-davidson-ai-enabled-human-power-grabs/ Wed, 16 Apr 2025 16:01:53 +0000 https://80000hours.org/?post_type=podcast&p=89650 The post Tom Davidson on how AI-enabled coups could allow a tiny group to seize power appeared first on 80,000 Hours.

]]>
The post Tom Davidson on how AI-enabled coups could allow a tiny group to seize power appeared first on 80,000 Hours.

]]>
Buck Shlegeris on controlling AI that wants to take over – so we can use it anyway https://80000hours.org/podcast/episodes/buck-shlegeris-ai-control-scheming/ Fri, 04 Apr 2025 12:01:20 +0000 https://80000hours.org/?post_type=podcast&p=89440 The post Buck Shlegeris on controlling AI that wants to take over – so we can use it anyway appeared first on 80,000 Hours.

]]>
The post Buck Shlegeris on controlling AI that wants to take over – so we can use it anyway appeared first on 80,000 Hours.

]]>
To understand AI, you should use it. Here’s how to get started. https://80000hours.org/2025/04/to-understand-ai-you-should-use-it-heres-how-to-get-started/ Fri, 04 Apr 2025 05:34:42 +0000 https://80000hours.org/?p=88996 The post To understand AI, you should use it. Here’s how to get started. appeared first on 80,000 Hours.

]]>
To truly understand what AI can do — and what is coming soon — you should make regular use of the latest AI services.

This article will help you learn how to do this.

1. You have a team of experts

Think of ChatGPT as a team of experts, ready to assist you.1

Whenever you have a question, task, or problem, consider who you’d want in the room if you could talk to — or hire — anyone in the world.

Illustration of the ChatGPT user interface

Here are some of the experts I work with on a regular basis:

  • Programmer: write, debug, and explain code. I’m an experienced software developer, yet more than 95% of the code I ship is now written by AI.2
  • Manager: plan and debug my days, weeks, and months. There’s an example transcript in section four below.
  • Product designer: ideate, create prototypes, design and critique user interfaces, and analyse user interview transcripts.
  • Writer and editor: give feedback on my writing, rewrite things, write entire drafts based on a rough dictation, proofread, and fix Markdown.
  • Data analyst: write SQL queries, analyse spreadsheets, create graphs, and infographics.
  • Language tutor: do written exercises and have spoken conversations in French and Icelandic.3
  • Intern: extract data from text, get key quotes from an interview, resize a folder of images, merge a spreadsheet, and write an SQL query.
  • Handyman: answer household questions, such as how to clad a shed or reset a washing machine.

And some of the specialists I consult less frequently:

  • Doctor
  • Lawyer
  • Reading partner4
  • Product manager
  • UX researcher
  • Research assistant5
  • Financial advisor
  • Chef and nutritionist
  • Fitness coach
  • Travel guide
  • Photographer
  • Philosophy tutor
  • Career coach

Of course, the picture above was created by my AI illustrator.

2. You have a thought partner

For me, conversational AI is most helpful as a thought partner.

Many times a week, I press the microphone button on my laptop,6 or the ChatGPT mobile app, and think out loud.

I speak my stream of consciousness, saying whatever is on my mind. This might be something I’m stuck on, a confusion, an uncertainty, or just a reflection on how I feel.

For example: this morning I wanted to finish the second draft of this article. The article outline was settled, and most sections were half written. But looking at it, I felt tired and demotivated, and wasn’t sure where to start. So I described the situation to ChatGPT.

Its suggestion: break this down into a to-do list of small tasks that I can blast through. Make the first few things easy to build up momentum. Seems good — so I shared the Google Doc with ChatGPT, and asked it to make me a to-do list. I pasted that back into my working notes, and off we went.

More generally: when I’m working on complex tasks, it helps to write down my thoughts. Prior to ChatGPT, I’d go to doc.new and write into a ‘scrap paper’ Google Doc. With voice dictation, the effort of doing this is much lower, so I do it more. And conversational AIs are very good at crisply summarising a mess of thoughts. And then they’ll often offer useful thoughts of their own — in much the same way that a coworker would if I talked something through with them. It is wonderful.

3. How ‘Team Hartree’ has helped me

I work with ‘Team Hartree’ — my team of AI experts — every day.

Below, I’ll share some stories about how they’ve helped. I’ll include screenshots, transcripts, and screencasts — to show concrete details of how you can interact with these systems. Hopefully, the list will inspire you to try things.

In my personal life

My AI doctor helped me track and treat symptoms of an increasingly serious infection.

My human doctor prescribed antibiotics, but things kept getting worse.

My AI doctor noticed that I was taking the wrong antibiotics, given the origin of the infection.

My human doctor agreed — the chat transcript reminded her of a crucial consideration. We switched my antibiotic, and the infection was cured.

If I’d stayed on the original medication, I’d probably have ended up in hospital.

This story of ‘AI doctor corrects a GP’s mistake’ is consistent with the results of various studies7 and many stories I’ve heard from friends and acquaintances.


Chat transcript (Part 1)
Chat transcript (Part 2)

A German landlady asked a friend for more than €1,000 of fictitious damages and threatened legal action.

ChatGPT told us about tenant rights in Germany and gave advice on next steps. It wrote a formal reply letter in English and German. The landlady — who had a track record of attempting scams like this — gave up.


Chat transcript

I attended a talk by John Vervaeke. Live talks are often too slow for me because I’m used to watching things at 2x speed.

So, while he was speaking, I chatted with an AI simulation of John Vervaeke, riffing with him on the claims he made during the talk. I understood his position better, improved my own reflection, and was better prepared for the Q&A.

Chat transcript

Breakfast time, road trip, what should we do today? Five minutes with ChatGPT, and we have a great plan — and a map — tailored to us.

Our AI tour guide made one important mistake: incorrect estimates of the journey times between destinations. Presumably, this information wasn’t in the training data nor easily available via web search. A human would have said “I’m not sure about the times,” but ChatGPT just made up some times and didn’t flag the uncertainty.

To get the correct journey times, I asked it to write a Python script to fetch the data from a maps API.


Chat transcript

Buying options always gives me a headache. In this case, I explain what I’m trying to do, and screenshot the options that are available. ChatGPT analyses the options, does the math, and illustrates different scenarios.

Be sure to write use code in your prompt so that it does all the math with code. Without that, it may try to do math ‘in its head,’ which greatly increases the chance of error.

Chat transcript

At work

I had to set up a new laptop recently, so I had all sorts of small tasks to complete. ChatGPT sped up many tasks by more than 3x and saved me hours of work.

The process? Take a screenshot or copy an error message, paste to chat, press enter.

ChatGPT can write useful Python scripts in seconds (and tell you how to run them).

Want to convert a folder of MP3s from stereo to mono? Download all the images from a web page? Remove duplicates from a huge spreadsheet? You’re good.

As it happens, I’m an experienced software developer, and AI has completely changed the way I code. More than 95% of the code I commit is now written by AI.2

Brief demo of Cursor Agent (3.5 mins)

Making a web app (12 mins)

I make weekly, monthly, and quarterly plans in a Google Doc. This doc is shared as context in a Claude Project.8

Every Monday, I think out loud with Claude about how the past week went, then make my plan for the week.

I do this by speaking into my mobile phone, usually while walking around. This gives me good energy for this kind of reflection.

Claude summarises, and sometimes offers helpful suggestions, considering things in the context of my priorities for the month and quarter. When we’re done, it writes up my review and plan using the template I keep in the Google Doc.


Chat transcript

A friend shared an idea for a web app. I asked ChatGPT to develop the idea and design a mockup. This took five minutes.


Chat transcript (part 1)
Chat transcript (part 2)

In the screencast below, Sahil Lavinga writes a product requirement document, iterates with feedback, designs an API and a front end, builds an MVP, and a bunch of other stuff.

“Two weeks of work in two hours” sounds like hype, but his estimate does not strike me as wildly off. If I dial up my scepticism to the maximum, I might call this two weeks of work in two days.


Screencast (45 min)

4. How to start getting help from your team

You can start in one minute:

  1. Go to chatgpt.com, and sign up to get a free trial of GPT-4o (one of their best models at the time of writing).

  2. Think: what is on your mind? What do you want to get done today? What is a challenge you face right now?

  3. Tell ChatGPT, and see how you get on.

Get a paid plan to access the best models

I strongly recommend you get a paid plan for at least one month so that you can experience the very best of current capabilities, without rate limits.

As of April 4, 2025, the free version of ChatGPT gives you limited access to GPT-4o, which is their best model for day-to-day use cases. But: you don’t get access to their most powerful reasoning models nor to their “Deep Research” agent.

Keep in mind: just like a human, AIs are best able to help if you give them lots of context. Think of them like a person who doesn’t know you at all, and err on the side of oversharing.

Instead of:

“Help me write a cover letter for a project manager job.”

Try:

“I’m applying for a senior project manager position at a tech startup. I have seven years of experience leading teams and delivering software products. Here’s my resume [attach resume] and the job description [paste description]. Can you help me write a cover letter? Make it fairly informal and follow Paul Graham’s advice on writing style.”

If you’re not sure what context to give, just ask the AI. For example:

I’d like to write a cover letter for a job application. To start, please ask me 10 questions to get the context you’ll need to help.

5. How to get the most out of your team

Working with AI is a skill, and you’ll get far more out of these systems as you level up.

To get started, I suggest:

  • Spend at least five hours actively using AI in meaningful ways.
  • Spend at least five hours on focussed learning and experimentation — seeking ideas and advice from others, testing prompts, trying different AI models, and pushing the limits of what’s possible.

This is likely one of the best self-development investments you can make in 2025.

Some advice below.

I’m repeating this point from above because it is critical.

Just like a human team, AI teams are best able to help if you give them lots of context.

Tell them who you are, where you work, and what your goals are. Share them on project plans and so on. They can handle a lot of text, so paste lots of relevant information into the chat or project knowledge.

Unlike a human team, AI assistants are good at processing text even when it is very messily formatted. Paste in rough notes, dictations, call transcripts, brainstorms, and so on.

If you want to do something more, make it easy. So:

  • Get the mobile apps
  • Get the desktop apps and learn the keyboard shortcuts
  • Use speech input6

Then, develop the habit of trying things. Set reminders if you have to.

Free AI tools are useful, but paid models are often much better. You’ll get better reasoning, faster responses, larger memory, fewer mistakes, and much more.

For some tasks, it makes sense to talk to several different models at once — pasting the same starting text into each one. Their responses will often complement each other, e.g. one will suggest ideas that the other doesn’t, or give greater emphasis to a particular angle.

Subscribe to a couple of the best paid services for at least a month to experience their full capabilities. You’ll probably keep the subscriptions.

AIs have a different cost profile from human experts and a different range of strengths and weaknesses.

Some tips:

1. Ask for more than you need

AI won’t get tired or annoyed if you ask for 10 versions of something rather than two.

Often, it’s easier to choose from a menu than to describe exactly what you want. Keep the best bits. Repeat.

2. Encourage it to serve as thought partner

Ask: “what’s missing?”, “what should I be asking?”, “what would a smart critic flag here?”, or “Please brainstorm.”

3. Paste in messy input

AIs are great at handling messy input. Long voice memos. Stream-of-consciousness brainstorms. Sprawling transcripts. Cluttered meeting notes.

Paste in the mess. Ask for a summary, a to-do list, main themes, contradictions, or questions worth exploring.

4. Embrace its lack of ego and judgement

AI doesn’t care if your question is ‘stupid,’ or your idea is half baked. Say things you might not say to humans, without fear of embarrassment or wasting someone’s time.

5. Don’t dismiss it too quickly

Some people come in sceptical, get some bland or mistaken answers, and write it off. That’s like dismissing Google because your first few searches didn’t go very well.

Try tweaking your prompt, giving more context, or asking a follow up question. If that doesn’t work, try another model. Sometimes, they genuinely won’t be able to help you. But: uselessness in some areas is compatible with extreme usefulness in others.

Develop your skills by seeking ideas and inspiration from others.
Try things like:

  • Join Slack/WhatsApp/Discord groups where people post screenshots and prompts
  • Follow people on X
  • Watch screencasts on YouTube

Warning: AIs get things wrong

AIs get things wrong sometimes. The most well-known problem: when they don’t know the answer to something, they are prone to making things up — a phenomenon known as hallucination.

Hallucinated details are often plausible but incorrect, so they can be difficult to spot (e.g. the made up journey times in my tour guide example above).

This problem is becoming less common. Nonetheless: when accuracy is critical, check the outputs!

To understand why AIs sometimes make things up, and when they are most likely to do so, see here.

If you’ve seen a great practical guide to reducing the risk of hallucinations, please send it to me, and I’ll add it here.

6. Next steps

Every day for the next two weeks, try calling on your team for something.

Tell your friends and colleagues (and me) about some of the cool things you do, and some of the things that don’t work out.

Have fun! Then… continue reading our resources on why future AI systems may ruin the world — and what you can do to prevent that.

The post To understand AI, you should use it. Here’s how to get started. appeared first on 80,000 Hours.

]]>
Gradual disempowerment https://80000hours.org/problem-profiles/gradual-disempowerment/ Fri, 04 Apr 2025 01:45:05 +0000 https://80000hours.org/?post_type=problem_profile&p=89316 The post Gradual disempowerment appeared first on 80,000 Hours.

]]>
Why might gradual disempowerment be an especially pressing problem?

Advancing technology has historically benefited humanity. The invention of fire, air conditioning, and antibiotics have all come with some downsides, but overall they’ve helped humans live healthier, happier, and more comfortable lives.

But this trend isn’t guaranteed to continue.

We’ve written about how the development of advanced AI technology poses existential risks. One prominent and particularly concerning threat model is that as AI systems get more powerful, they’ll develop interests that are not aligned with humanity. They may, unbeknownst to their creators, become power-seeking. They may intentionally deceive us about their intentions and use their superior intelligence and advanced planning capabilities to disempower humanity or drive us to extinction.

It’s possible, though, that the development of AI systems could lead to human disempowerment and extinction even if we succeed in preventing AI systems from becoming power-seeking and scheming against us.

In a recent paper, Jan Kulveit and his co-authors call this threat model gradual disempowerment. They argue for the following six claims:

  1. Large societal systems, such as economies and governments, tend to be roughly aligned to human interests.1
  2. This rough alignment of the societal systems is maintained by multiple factors, including voting systems, consumer demand signals, and the reliance on human labour and thinking.
  3. Societal systems that rely less on human labour and thinking — and rely more on increasingly advanced and powerful AI systems — will be less aligned with human interests.
  4. AI systems may indeed outcompete human labour for key roles in societal systems in part because they can more ruthlessly pursue the directions they’re given. And this may cause the systems to be even less aligned with human interests.
  5. If one societal system becomes misaligned with human interests, like a national economy, it may increase the chance that other systems become misaligned. Powerful economic actors have historically wielded influence over national governments, for example.
  6. Humans could gradually become disempowered, perhaps permanently, as AIs increasingly control societal systems and these systems become increasingly misaligned from human interests. In the extreme case, it could lead to human extinction.

Kulveit et al. discuss how AI systems could come to dominate the economy, national governments, and even culture in ways that act against humanity’s interests.

It may be hard to imagine how humans would let this happen, because in this scenario, the AI systems aren’t being actively deceptive. Instead, they follow human directions.

The trouble is that due to competitive pressures, we may find ourselves narrowly incentivised to hand over more and more control to the AI systems themselves. Some human actors — corporations, governments, or other institutions — will initially gain significant power through AI deployment, using these systems to advance their interests and missions.

Here’s how it might happen:

  • First, economic and political leaders adopt AI systems that enhance their existing advantages. A financial firm deploys AI trading systems that outcompete human traders. Politicians use AI advisers to win elections and keep voters happy. These initial adopters don’t experience disempowerment — they experience success, which encourages their competitors to also adopt AI.
  • As time moves on, humans have less control. Corporate boards might try to change direction against the advice of their AIs, only to find share prices plummeting because the AIs had a far better business strategy. Government officials may realise they don’t understand the AI systems running key services enough to change what they’re doing successfully.
  • Only later, as AI systems become increasingly powerful, might there be signs that the systems are drifting out of alignment with human interests — not because they are trying to, but because they are advancing proxies of success that don’t quite line up with what’s actually good for people.
  • In the cultural sphere, for example, media companies might deploy AI to create increasingly addictive content, reshaping human preferences. What begins as entertainment evolves into persuasion technology that can shape political outcomes, diminishing democratic control.

Once humans start losing power in these ways, they may irreversibly have less and less ability to influence the future course of events. Eventually, their needs may not be addressed at all by the most powerful global actors. In the most extreme case, the species as we know it may not survive.

Many other scenarios are possible.

There are some versions of apparent “disempowerment” that could look like a utopia: humans flourishing and happy in a society expertly managed and fundamentally controlled by benevolent AI systems. Or maybe one day, humanity will decide it’s happy to cede the future to AI systems that we consider worthy descendants.

But this risk is that humanity could “hand over” control unintentionally and in a way that few of us would endorse. We might be gradually replaced by AI systems with no conscious experiences, or the future may eventually be dominated by fierce Darwinian competition between various digital agents. That could mean the future is sapped of most value — a catastrophic loss.

We want to better understand these dynamics and risks to increase the prospects that the future goes well.

How pressing is this issue?

We feel very uncertain about how likely various gradual disempowerment scenarios are. It is difficult to disentangle the possibilities from related risks of power-seeking AI systems and questions about the moral status of digital minds, which are also hard to be certain about.

Because the area is steeped in uncertainty, it’s unclear what the best interventions are. We think more work should be done to understand this problem and its potential solutions at least — and it’s likely some people should be focusing on it.

What are the arguments against this being a pressing problem?

There are several reasons you might not think this problem is very pressing:

  • You might think it will be solved by default, because if we avoid other risks from AI, advanced AI systems will help us navigate these problems.
  • You might think it’s very unlikely that AI systems, if not actively scheming against us, will end up contributing to an existential catastrophe for humanity — even if there are some problems of disempowerment. This might make you think this is an issue, but not nearly as big an issue as other, more existential risks from AI.
  • You might think there just aren’t good solutions to this problem.
  • You might think the gradual disempowerment of humanity wouldn’t constitute an existential catastrophe. For example, perhaps it’d be good or nearly as good as other futures.

What can you do to help?

Given the relatively limited state of our knowledge on this topic, we’d guess the best way to help with this problem is likely carrying out more research to understand it better. (Read more about research skills.)

Backgrounds in philosophy, history, economics, sociology, and political science — in addition to machine learning and AI — may be particularly relevant.

You might want to work in academia, think tanks, or at nonprofit research institutions.

At some point, if we have a better understanding of threat models and potential solutions, it will likely be important to have people working in AI governance and policy who are focused on reducing these risks. So pursuing a career in AI governance, while building an understanding of this emerging area of research as well as the other major AI risks, may be a promising strategy for eventually helping to reduce the risk of gradual disempowerment.

Kulveit et al. suggest some approaches to mitigating the risk of gradual disempowerment, including:

  • Measuring and monitoring
    • Develop metrics to track human and AI influence in economic, cultural, and political systems
    • Make plans to identify warning signs of potential disempowerment
  • Preventing excessive AI influence
    • Implement regulatory frameworks requiring human oversight
    • Apply progressive taxation on AI-generated revenues
    • Establish cultural norms supporting human agency
  • Strengthening human control:
    • Create more robust democratic processes
    • Ensure that AI systems remain understandable to humans
    • Develop AI delegates that represent human interests while remaining competitive
  • System-wide alignment
    • Research “ecosystem alignment” that maintains human values within complex socio-technical systems
    • Develop frameworks for aligning civilisation-wide interactions between humans and AI

Key organisations in this space

Some organisations where you might be able to do relevant research include:

You can also explore roles at other organisations that work on AI safety and policy.

Our job board features opportunities in AI safety and policy:

    View all opportunities

    Learn more

    Read next:  Explore other pressing world problems

    Want to learn more about global issues we think are especially pressing? See our list of issues that are large in scale, solvable, and neglected, according to our research.

    Plus, join our newsletter and we’ll mail you a free book

    Join our newsletter and we’ll send you a free copy of The Precipice — a book by philosopher Toby Ord about how to tackle the greatest threats facing humanity. T&Cs here.

    The post Gradual disempowerment appeared first on 80,000 Hours.

    ]]>
    Preventing an AI-related catastrophe https://80000hours.org/problem-profiles/artificial-intelligence/ Thu, 25 Aug 2022 19:43:58 +0000 https://80000hours.org/?post_type=problem_profile&p=77853 The post Preventing an AI-related catastrophe appeared first on 80,000 Hours.

    ]]>

    Note from the author: At its core, this problem profile tries to predict the future of technology. This is a notoriously difficult thing to do. In addition, there has been much less rigorous research into the risks from AI than into the other risks 80,000 Hours writes about (like pandemics or climate change).1 That said, there is a growing field of research into the topic, which I’ve tried to reflect. For this article I’ve leaned especially on this report by Joseph Carlsmith at Open Philanthropy (also available as a narration), as it’s the most rigorous overview of the risk that I could find. I’ve also had the article reviewed by over 30 people with different expertise and opinions on the topic. (Almost all are concerned about advanced AI’s potential impact.)

    Why do we think that reducing risks from AI is one of the most pressing issues of our time? In short, our reasons are:

    1. Even before getting into the actual arguments, we can see some cause for concern — as many AI experts think there’s a small but non-negligible chance that AI will lead to outcomes as bad as human extinction.
    2. We’re making advances in AI extremely quickly — which suggests that AI systems could have a significant influence on society, soon.
    3. There are strong arguments that “power-seeking” AI could pose an existential threat to humanity2 — which we’ll go through below.
    4. Even if we find a way to avoid power-seeking, there are still other risks.
    5. We think we can tackle these risks.
    6. This work is neglected.

    We’re going to cover each of these in turn, then consider some of the best counterarguments, explain concrete things you can do to help, and finally outline some of the best resources for learning more about this area.

    If you’d like, you can watch our 10-minute video summarising the case for AI risk before reading further:

    1. Many AI experts think there’s a non-negligible chance AI will lead to outcomes as bad as extinction

    In May 2023, hundreds of prominent AI scientists — and other notable figures — signed a statement saying that mitigating the risk of extinction from AI should be a global priority.

    So it’s pretty clear that at least some experts are concerned.

    But how concerned are they? And is this just a fringe view?

    We looked at four surveys of AI researchers who published at NeurIPS and ICML (two of the most prestigious machine learning conferences) from 2016, 2019, 2022 and 2023.3

    It’s important to note that there could be considerable selection bias on surveys like this. For example, you might think researchers who go to the top AI conferences are more likely to be optimistic about AI, because they have been selected to think that AI research is doing good. Alternatively, you might think that researchers who are already concerned about AI are more likely to respond to a survey asking about these concerns.4

    All that said, here’s what we found:

    In all four surveys, the median researcher thought that the chances that AI would be “extremely good” was reasonably high: 20% in the 2016 survey, 20% in 2019, 10% in 2022, and 10% in 2023.5

    Indeed, AI systems are already having substantial positive effects — for example, in medical care or academic research.

    But in all four surveys, the median researcher also estimated small — and certainly not negligible — chances that AI would be “extremely bad (e.g. human extinction)”: a 5% chance of extremely bad outcomes in the 2016 survey, 2% in 2019, 5% in 2022 and 5% in 2023. 6

    In the 2022 survey, participants were specifically asked about the chances of existential catastrophe caused by future AI advances — and again, over half of researchers thought the chances of an existential catastrophe was greater than 5%.7

    So experts disagree on the degree to which AI poses an existential risk — a kind of threat we’ve argued deserves serious moral weight.

    This fits with our understanding of the state of the research field. Three of the leading companies developing AI — DeepMind, Anthropic and OpenAI — also have teams dedicated to figuring out how to solve technical safety issues that we believe could, for reasons we discuss at length below, lead to an existential threat to humanity.8

    There are also several academic research groups (including at MIT, Cambridge, Carnegie Mellon University, and UC Berkeley) focusing on these same technical AI safety problems.9

    It’s hard to know exactly what to take from all this, but we’re confident that it’s not a fringe position in the field to think that there is a material risk of outcomes as bad as an existential catastrophe. Some experts in the field maintain, though, that the risks are overblown.

    Still, why do we side with those who are more concerned? In short, it’s because there are arguments we’ve found persuasive that AI could pose such an existential threat — arguments we will go through step by step below.

    It’s important to recognise that the fact that many experts recognise there’s a problem doesn’t mean that everything’s OK because the experts have got it covered. Overall, we think this problem remains highly neglected (more on this below), especially as billions of dollars a year are spent to make AI more advanced.10

    2. We’re making advances in AI extremely quickly

    Three cats dressed as computer programmers generated by different AI software.
    A cat dressed as a computer programmer” as generated by Craiyon (formerly DALL-E mini) (top left), OpenAI’s DALL-E 2. (top right), and Midjourney V6. DALL-E mini uses a model 27 times smaller than OpenAI’s DALL-E 1 model, released in January 2021. DALL-E 2 was released in April 2022.11 Midjourney released the sixth version of its model in December 2023.

    Before we try to figure out what the future of AI might look like, it’s helpful to take a look at what AI can already do.

    Modern AI techniques involve machine learning (ML): models that improve automatically through data input. The most common form of this technique used today is known as deep learning.

    For a brief explanation of how deep learning works, see here:

    Machine learning techniques, in general, take some input data and produce some outputs, in a way that depends on some parameters in the model, which are learned automatically rather than being specified by programmers.

    Most of the recent advances in machine learning use neural networks. A neural network transforms input data into output data by passing it through several hidden ‘layers’ of simple calculations, with each layer made up of ‘neurons.’ Each neuron receives data from the previous layer, performs some calculation based on its parameters (basically some numbers specific to that neuron), and passes the result on to the next layer.

    Nodes represent neurons, and arrows indicate data flow between layers.

    The engineers developing the network will choose some measure of success for the network (known as a ‘loss’ or ‘objective’ function). The degree to which the network is successful (according to the measure chosen) will depend on the exact values of the parameters for each neuron on the network.

    The network is then trained using a large quantity of data. By using an optimisation algorithm (most commonly stochastic gradient descent), the parameters of each neuron are gradually tweaked each time the network is tested against the data using the loss function. The optimisation algorithm will (generally) make the neural network perform slightly better each time the parameters are tweaked. Eventually, the engineers will end up with a network that performs pretty well on the measure chosen.

    Deep learning refers to the use of neural networks with many layers.

    To learn more, we recommend:

    ChatGPT’s release in November 2022 made many people realise that deep learning was a sea change in AI. Since then, large language models, image models, and other AI systems have continued to advance rapidly and drawn massive investments.12

    Because the models get better so quickly, it can be hard for the public to keep up. You might have an outdated view of just how much modern AI systems can do if you haven’t used the latest models to their full extent.13

    But we shouldn’t just think about what they can do now. We should think about how they’ve improved so far and how they’re likely to improve in the future.

    For example, consider the rapid progress language models have made on the GPQA benchmark, which asks challenging, PhD-level questions about chemistry, physics, and biology:

    They’ve also shown impressive improvement on tasks like software engineering and expert-level math problems.

    As of March 2025, other impressive achievements from AI systems include:

    • Using computers: Anthropic and OpenAI have AI models that can be directed to carry out tasks independently on your computer. These capabilities are still rudimentary, but we expect them to improve quickly.
    • Competing in math contests: By combining the models AlphaProof and AlphaGeometry 2, Google DeepMind was able to use AI to achieve silver medal performance in the International Mathematical Olympiad.
    • Combining multiple human-like abilities: Models are increasingly multi-modal, which means they combine the abilities to write and read text, comprehend and create images, and understand and respond to spoken language.
    • Predicting complex biomolecular structures and interactions: Google DeepMind’s AlphaFold 3 — a successor to a Nobel Prize-winning AI system — can predict how proteins interact with DNA, RNA, and other structures at the molecular level.
    • Improving robotics: Google DeepMind’s Gemini Robotics model uses a language model to control robots, which can respond to verbal directions, demonstrate spatial awareness, and perform a range of real-world tasks.
    • Self-driving cars: Waymo reportedly conducts 150,000 rides per week in self-driving cars as of March 2025 in major U.S. cities, an increase of 3x from just a few months before, with plans to expand further.
    • Creating novel videos and images: Image models are now capable of generating high-quality images from written descriptions, and video models such as Sora and Veo can even produce impressive short clips based on text prompts.
    • Enhancing the work of doctors, lawyers, and scientists: Researchers have found evidence that AI systems can outperform doctors in diagnosing patients, significantly improve the productivity of lawyers, anticipate discoveries in neuroscience, and improve research in material science.14
    • Helping with AI research: There’s also evidence that AI systems can outperform humans in AI R&D tasks, when limited to a two-hour time window.15

    If you’re anything like us, you found the complexity and breadth of the tasks these systems can carry out surprising.

    And if the technology keeps advancing at this pace, it seems clear there will be major effects on society. At the very least, automating tasks makes carrying out those tasks cheaper. As a result, we may see rapid increases in economic growth (perhaps even to the level we saw during the Industrial Revolution).

    If we’re able to partially or fully automate scientific advancement we may see more transformative changes to society and technology.16

    That could be just the beginning. We may be able to get computers to eventually automate anything humans can do. This seems like it has to be possible — at least in principle. This is because it seems that, with enough power and complexity, a computer should be able to simulate the human brain. This would itself be a way of automating anything humans can do (if not the most efficient method of doing so).

    And as we’ll see in the next section, there are some indications that extensive automation may well be possible through scaling up existing techniques.

    Current trends show rapid progress in the capabilities of ML systems

    There are three things that are crucial to building AI through machine learning:

    1. Good algorithms (e.g. more efficient algorithms are better)
    2. Data to train an algorithm
    3. Enough computational power (known as compute) to do this training

    Epoch is a team of scientists investigating trends in the development of advanced AI — in particular, how these three inputs are changing over time.

    They found that the amount of compute used for training the largest AI models has been rising exponentially — doubling on average every six months since 2010.

    That means the amount of computational power used to train our largest machine learning models has grown by over one billion times.

    Epoch also looked at how much compute has been needed to train a neural network to have the same performance on ImageNet (a well-known test data set for computer vision).

    They found that the amount of compute required for the same performance has been falling exponentially — halving every 10 months.

    So since 2012, the amount of compute required for the same level of performance has fallen by over 10,000 times. Combined with the increased compute used for training, that’s a lot of growth.

    Finally, they found that the size of the data sets used to train the largest language models has been doubling roughly once a year since 2010. And Epoch has projected that it will be feasible to scale the existing pace of AI training on the frontier through 2030.

    It’s hard to be sure that the capability growth will continue, but the trends speak to the incredible gains that are possible with machine learning.

    Indeed, it looks like increasing the size of models (and the amount of compute used to train them) introduces ever more sophisticated behaviour. This is how systems like GPT-4 perform tasks they weren’t specifically trained for.

    These observations have led to the scaling hypothesis: that we can simply build bigger and bigger neural networks, and as a result we will end up with more and more powerful artificial intelligence, and that this trend of increasing capabilities may increase to human-level AI and beyond.

    If this is true, we can attempt to predict how the capabilities of AI technology will increase over time simply by looking at how quickly we are increasing the amount of compute available to train models.

    In late 2024, we also started to see a new frontier for scaling in inference time compute. This is compute used when, for example, a language model answers a question.

    The leading companies have found that by giving an AI model more time to “think” about its answers, to reason through different possibilities and select among them, they can perform much better. With this innovation, AI companies found a new way to make models even more powerful.

    As we’ll see in the next section, it’s not just the scaling hypothesis that suggests we could end up with extremely powerful AI relatively soon — other methods of predicting AI progress come to similar conclusions.

    When can we expect transformative AI?

    It’s difficult to predict exactly when we will develop AI that we expect to be hugely transformative for society (for better or for worse) — for example, by automating all human work or drastically changing the structure of society.17

    But by early 2025, leaders of some of the frontier AI companies were clearly stating that they were expecting to get very powerful AI systems soon.

    OpenAI CEO Sam Altman, Anthropic CEO Dario Amodei, and Google DeepMind CEO Demis Hassabis all said that they expect to develop AI that can fully replace at least some forms of human labour within just a few years, or even less.

    Sam Altman wrote in January 2025:

    We are now confident we know how to build AGI as we have traditionally understood it. We believe that, in 2025, we may see the first AI agents “join the workforce” and materially change the output of companies.

    Demis Hassabis said in January 2025:

    We’ve sort of had a consistent view about AGI being a system that’s capable of exhibiting all of the cognitive capabilities humans can. And I think we’re getting closer and closer, but I think we’re still probably a handful of years away.

    Dario Amodei wrote in February 2025:

    Time is short, and we must accelerate our actions to match accelerating AI progress. Possibly by 2026 or 2027 (and almost certainly no later than 2030), the capabilities of AI systems will be best thought of as akin to an entirely new state populated by highly intelligent people appearing on the global stage — a “country of geniuses in a datacenter”— with the profound economic, societal, and security implications that would bring.

    It’s reasonable to have some scepticism about these predictions.

    Other approaches to forecasting the arrival of transformative AI systems have also suggested the technology is closer than many would assume:

    • Data from the 2023 survey of 3000 AI experts implied there is 33% probability of human-level machine intelligence (which would plausibly be transformative in this sense) by 2036, 50% probability by 2047, and 80% by 2100.18 There are a lot of reasons to be suspicious of these estimates,4 but we take it as one data point.
    • Ajeya Cotra (a researcher at Open Philanthropy) attempted to forecast transformative AI by comparing modern deep learning to the human brain. Deep learning involves using a huge amount of compute to train a model, before that model is able to perform some task. There’s also a relationship between the amount of compute used to train a model and the amount used by the model when it’s run. And — if the scaling hypothesis is true — we should expect the performance of a model to predictably improve as the computational power used increases. So Cotra used a variety of approaches (including, for example, estimating how much compute the human brain uses on a variety of tasks) to estimate how much compute might be needed to train a model that, when run, could carry out the hardest tasks humans can do. She then estimated when using that much compute would be affordable.
    • Tom Davidson (also a researcher at Open Philanthropy) wrote a report to complement Cotra’s work. He attempted to figure out when we might expect to see transformative AI based only on looking at various types of research that transformative AI might be like (e.g. developing technology that’s the ultimate goal of a STEM field, or proving difficult mathematical conjectures), and how long it’s taken for each of these kinds of research to be completed in the past, given some quantity of research funding and effort.
      • Davidson’s report estimates that, solely on this information, you’d think that there was an 8% chance of transformative AI by 2036, 13% by 2060, and 20% by 2100. However, Davidson doesn’t consider the actual ways in which AI has progressed since research started in the 1950s, and notes that it seems likely that the amount of effort we put into AI research will increase as AI becomes increasingly relevant to our economy. As a result, Davidson expects these numbers to be underestimates.
    • Holden Karnofsky attempted to sum up the findings of others’ forecasts. In 2021, he guessed that there was more than a 10% chance we’d see transformative AI by 2036, 50% by 2060, and 66% by 2100.
    Method Chance of transformative AI by 2036 Chance of transformative AI by 2060 Chance of transformative AI by 2100
    Expert survey (Grace et al., 2024) 33% 50% (by 2047) 80%
    Expert survey (Zhang et al., 2022) 20% 50% 85%
    Biological anchors (Cotra, 2022) 35% 60% (by 2050) 80% (according to the 2020 report)
    Semi-informative priors (Davidson, 2021) 8% 13% 20%
    Overall guess (Karnofsky, 2021) 10% 50% 66%

    All in all, AI seems to be advancing rapidly. More money and talent is going into the field every year, models are getting bigger and more efficient, and we keep learning about new ways to improve their capabilities.

    Even if AI were advancing more slowly, we’d be concerned about it — most of the arguments about the risks from AI (that we’ll get to below) do not depend on this rapid progress. And it’s possible that the existing progress could stall out before the technology becomes truly transformative.

    However, the speed of these recent developments increases the urgency of the issue. And all the estimates in the above table were made before many of the impressive advancements in 2024 and early 2025, so they may even overstate how much time we have.

    In fact, we think it’s plausible that extremely powerful AI systems that can replace much of human labour will be here before 2030, as we’ve discussed elsewhere.20 And it’s worth taking action on that basis.

    3. Power-seeking AI could pose an existential threat to humanity

    We’ve argued so far that we expect AI to be an important — and potentially transformative — new technology.

    We’ve also seen reason to think that such transformative AI systems could be built in the near future.

    Now we’ll turn to the core question: why do we think this matters so much?

    There could be a lot of reasons. If advanced AI is as transformative as it seems like it’ll be, there will be many important consequences. But here we are going to explain the issue that seems most concerning to us: AI systems could pose risks by seeking and gaining power.

    We’ll argue that:

    1. It’s likely that we’ll build AI systems that can make and execute plans to achieve goals
    2. Advanced planning systems could easily be ‘misaligned’ — in a way that could lead them to make plans that involve disempowering humanity
    3. Disempowerment by AI systems would be an existential catastrophe
    4. People might deploy AI systems that are misaligned, despite this risk

    Thinking through each step, I think there’s something like a 1% chance of an existential catastrophe resulting from power-seeking AI systems this century. This is my all things considered guess at the risk incorporating considerations of the argument in favour of the risk (which is itself probabilistic), as well as reasons why this argument might be wrong (some of which I discuss below). This puts me on the less worried end of 80,000 Hours staff, whose views on our last staff survey ranged from 1–55%, with a median of 15%.

    It’s likely we’ll build advanced planning systems

    We’re going to argue that future systems with the following three properties might pose a particularly important threat to humanity:21

    1. They have goals and are good at making plans.

      Not all AI systems have goals or make plans to achieve those goals. But some systems (like some chess-playing AI systems) can be thought of in this way. When discussing power-seeking AI, we’re considering planning systems that are relatively advanced, with plans that are in pursuit of some goal(s), and that are capable of carrying out those plans.

    2. They have excellent strategic awareness.

      A particularly good planning system would have a good enough understanding of the world to notice obstacles and opportunities that may help or hinder its plans, and respond to these accordingly. Following Carlsmith, we’ll call this strategic awareness, since it allows systems to strategise in a more sophisticated way.

    3. They have highly advanced capabilities relative to today’s systems.

      For these systems to actually affect the world, we need them to not just make plans, but also be good at all the specific tasks required to execute those plans.

      Since we’re worried about systems attempting to take power from humanity, we are particularly concerned about AI systems that might be better than humans on one or more tasks that grant people significant power when carried out well in today’s world.

      For example, people who are very good at persuasion and/or manipulation are often able to gain power — so an AI being good at these things might also be able to gain power. Other examples might include hacking into other systems, tasks within scientific and engineering research, as well as business, military, or political strategy.

    These systems seem technically possible and we’ll have strong incentives to build them

    As we saw above, we’ve already produced systems that are very good at carrying out specific tasks.

    We’ve also already produced rudimentary planning systems, like AlphaStar, which skilfully plays the strategy game Starcraft, and MuZero, which plays chess, shogi, and Go.22

    We’re not sure whether these systems are producing plans in pursuit of goals per se, because we’re not sure exactly what it means to “have goals.” However, since they consistently plan in ways that achieve goals, it seems like they have goals in some sense.

    Moreover, some existing systems seem to actually represent goals as part of their neural networks.23

    That said, planning in the real world (instead of games) is much more complex, and to date we’re not aware of any unambiguous examples of goal-directed planning systems, or systems that exhibit high degrees of strategic awareness.

    But as we’ve discussed, we expect to see further advances within this century. And we think these advances are likely to produce systems with all three of the above properties.

    That’s because we think that there are particularly strong incentives (like profit) to develop these kinds of systems. In short: because being able to plan to achieve a goal, and execute that plan, seems like a particularly powerful and general way of affecting the world.

    Getting things done — whether that’s a company selling products, a person buying a house, or a government developing policy — almost always seems to require these skills. One example would be assigning a powerful system a goal and expecting the system to achieve it — rather than having to guide it every step of the way. So planning systems seem likely to be (economically and politically) extremely useful.24

    And if systems are extremely useful, there are likely to be big incentives to build them. For example, an AI that could plan the actions of a company by being given the goal to increase its profits (that is, an AI CEO) would likely provide significant wealth for the people involved — a direct incentive to produce such an AI.

    As a result, if we can build systems with these properties (and from what we know, it seems like we will be able to), it seems like we are likely to do so.25

    Advanced planning systems could easily be dangerously ‘misaligned’

    There are reasons to think that these kinds of advanced planning AI systems will be misaligned. That is, they will aim to do things that we don’t want them to do.26

    There are many reasons why systems might not be aiming to do exactly what we want them to do. For one thing, we don’t know how, using modern ML techniques, to give systems the precise goals we want (more here).27

    We’re going to focus specifically on some reasons why systems might by default be misaligned in such a way that they develop plans that pose risks to humanity’s ability to influence the world — even when we don’t want that influence to be lost.28

    What do we mean by “by default”? Essentially, unless we actively find solutions to some (potentially quite difficult) problems, then it seems like we’ll create dangerously misaligned AI. (There are reasons this might be wrong — which we discuss later.)

    Three examples of “misalignment” in a variety of systems

    It’s worth noting that misalignment isn’t a purely theoretical possibility (or specific to AI) — we see misaligned goals in humans and institutions all the time, and have also seen examples of misalignment in AI systems.29

    The democratic political framework is intended to ensure that politicians make decisions that benefit society. But what political systems actually reward is winning elections, so that’s what many politicians end up aiming for.

    This is a decent proxy goal — if you have a plan to improve people’s lives, they’re probably more likely to vote for you — but it isn’t perfect. As a result, politicians do things that aren’t clearly the best way of running a country, like raising taxes at the start of their term and cutting them right before elections.

    That is to say, the things the system does are at least a little different from what we would, in a perfect world, want it to do: the system is misaligned.

    Companies have profit-making incentives. By producing more, and therefore helping people obtain goods and services at cheaper prices, companies make more money.

    This is sometimes a decent proxy for making the world better, but profit isn’t actually the same as the good of all of humanity (bold claim, we know). As a result, there are negative externalities: for example, companies will pollute to make money despite this being worse for society overall.

    Again, we have a misaligned system, where the things the system does are at least a little different from what we would want it to do.

    DeepMind has documented examples of specification gaming: an AI doing well according to its specified reward function (which encodes our intentions for the system), but not doing what researchers intended.

    In one example, a robot arm was asked to grasp a ball. But the reward was specified in terms of whether humans thought the robot had been successful. As a result, the arm learned to hover between the ball and the camera, fooling the humans into thinking that it had grasped the ball.30

    A simulated arm hovers between a ball and a camera.
    Source: Christiano et al., 2017

    So we know it’s possible to create a misaligned AI system.

    Why these systems could (by default) be dangerously misaligned

    Here’s the core argument of this article. We’ll use all three properties from earlier: planning ability, strategic awareness, and advanced capabilities.

    To start, we should realise that a planning system that has a goal will also develop ‘instrumental goals’: things that, if they occur, will make it easier to achieve an overall goal.

    We use instrumental goals in plans all the time. For example, a high schooler planning their career might think that getting into university will be helpful for their future job prospects. In this case, “getting into university” would be an instrumental goal.

    A sufficiently advanced AI planning system would also include instrumental goals in its overall plans.

    If a planning AI system also has enough strategic awareness, it will be able to identify facts about the real world (including potential things that would be obstacles to any plans), and plan in light of them. Crucially, these facts would include that access to resources (e.g. money, compute, influence) and greater capabilities — that is, forms of power — open up new, more effective ways of achieving goals.

    This means that, by default, advanced planning AI systems would have some worrying instrumental goals:

    • Self-preservation — because a system is more likely to achieve its goals if it is still around to pursue them (in Stuart Russell’s memorable phrase, “You can’t fetch the coffee if you’re dead”).
    • Preventing any changes to the AI system’s goals — since changing its goals would lead to outcomes that are different from those it would achieve with its current goals.
    • Gaining power — for example, by getting more resources and greater capabilities.

    Crucially, one clear way in which the AI can ensure that it will continue to exist (and not be turned off), and that its objectives will never be changed, would be to gain power over the humans who might affect it (we talk here about how AI systems might actually be able to do that).

    What’s more, the AI systems we’re considering have advanced capabilities — meaning they can do one or more tasks that grant people significant power when carried out well in today’s world. With such advanced capabilities, these instrumental goals will not be out of reach, and as a result, it seems like the AI system would use its advanced capabilities to get power as part of the plan’s execution. If we don’t want the AI systems we create to take power away from us this would be a particularly dangerous form of misalignment.

    In the most extreme scenarios, a planning AI system with sufficiently advanced capabilities could successfully disempower us completely.

    As a (very non-rigorous) intuitive check on this argument, let’s try to apply it to humans.

    Humans have a variety of goals. For many of these goals, some form of power-seeking is advantageous: though not everyone seeks power, many people do (in the form of wealth or social or political status), because it’s useful for getting what they want. This is not catastrophic (usually!) because, as human beings:

    • We generally feel bound by human norms and morality (even people who really want wealth usually aren’t willing to kill to get it).
    • We aren’t that much more capable or intelligent than one another. So even in cases where people aren’t held back by morality, they’re not able to take over the world.

    (We discuss whether humans are truly power-seeking later.)

    A sufficiently advanced AI wouldn’t have those limitations.

    It might be hard to find ways to prevent this sort of misalignment

    The point of all this isn’t to say that any advanced planning AI system will necessarily attempt to seek power. Instead, it’s to point out that, unless we find a way to design systems that don’t have this flaw, we’ll face significant risk.

    It seems more than plausible that we could create an AI system that isn’t misaligned in this way, and thereby prevent any disempowerment. Here are some strategies we might take (plus, unfortunately, some reasons why they might be difficult in practice):31

    • Control the objectives of the AI system. We may be able to design systems that simply don’t have objectives to which the above argument applies — and thus don’t incentivise power-seeking behaviour. For example, we could find ways to explicitly instruct AI systems not to harm humans, or find ways to reward AI systems (in training environments) for not engaging in specific kinds of power-seeking behaviour (and also find ways to ensure that this behaviour continues outside the training environment).

      Carlsmith gives two reasons why doing this seems particularly hard.

      First, for modern ML systems, we don’t get to explicitly state a system’s objectives — instead we reward (or punish) a system in a training environment so that it learns on its own. This raises a number of difficulties, one of which is goal misgeneralisation. Researchers have uncovered real examples of systems that appear to have learned to pursue a goal in the training environment, but then fail to generalise that goal when they operate in a new environment. This raises the possibility that we could think we’ve successfully trained an AI system not to seek power — but that the system would seek power anyway when deployed in the real world.32

      Second, when we specify a goal to an AI system (or, when we can’t explicitly do that, when we find ways to reward or punish a system during training), we usually do this by giving the system a proxy by which outcomes can be measured (e.g. positive human feedback on a system’s achievement). But often those proxies don’t quite work.33 In general, we might expect that even if a proxy appears to correlate well with successful outcomes, it might not do so when that proxy is optimised for. (The examples above of politicians, companies, and the robot arm failing to grasp a ball are illustrations of this.) We’ll look at a more specific example of how problems with proxies could lead to an existential catastrophe here.

      For more on the specific difficulty of controlling the objectives given to deep neural networks trained using self-supervised learning and reinforcement learning, we recommend former OpenAI governance researcher Richard Ngo’s discussion of how realistic training processes lead to the development of misaligned goals.

    • Control the inputs into the AI system. AI systems will only develop plans to seek power if they have enough information about the world to realise that seeking power is indeed a way to achieve its goals.

    • Control the capabilities of the AI system. AI systems will likely only be able to carry out plans to seek power if they have sufficiently advanced capabilities in skills that grant people significant power in today’s world.

    But to make any strategy work, it will need to both:

    • Retain the usefulness of the AI systems — and so remain economically competitive with less safe systems. Controlling the inputs and capabilities of AI systems will clearly have costs, so it seems hard to ensure that these controls, even if they’re developed, are actually used. But this is also a problem for controlling a system’s objectives. For example, we may be able to prevent power-seeking behaviour by ensuring that AI systems stop to check in with humans about any decisions they make. But these systems might be significantly slower and less immediately useful to people than systems that don’t stop to carry out these checks. As a result, there might still be incentives to use a faster, more initially effective misaligned system (we’ll look at incentives more in the next section).

    • Continue to work as the planning ability and strategic awareness of systems improve over time. Some seemingly simple solutions (for example, trying to give a system a long list of things it isn’t allowed to do, like stealing money or physically harming humans) break down as the planning abilities of the systems increase. This is because, the more capable a system is at developing plans, the more likely it is to identify loopholes or failures in the safety strategy — and as a result, the more likely the system is to develop a plan that involves power-seeking.

    Ultimately, by looking at the state of the research on this topic, and speaking to experts in the field, we think that there are currently no known ways of building aligned AI systems that seem likely to fulfil both these criteria.

    So: that’s the core argument. There are many variants of this argument. Some have argued that AI systems might gradually shape our future via subtler forms of influence that nonetheless could amount to an existential catastrophe; others argue that the most likely form of disempowerment is in fact just killing everyone. We’re not sure how a catastrophe would be most likely to play out, but have tried to articulate the heart of the argument, as we see it: that AI presents an existential risk.

    There are definitely reasons this argument might not be right! We go through some of the reasons that seem strongest to us below. But overall it seems possible that, for at least some kinds of advanced planning AI systems, it will be harder to build systems that don’t seek power in this dangerous way than to build systems that do.

    At this point, you may have questions like:

    We think there are good responses to all these questions, so we’ve added a long list of arguments against working on AI risk — and our responses — for these (and other) questions below.

    Disempowerment by AI systems would be an existential catastrophe

    When we say we’re concerned about existential catastrophes, we’re not just concerned about risks of extinction. This is because the source of our concern is rooted in longtermism: the idea that the lives of all future generations matter, and so it’s extremely important to protect their interests.

    This means that any event that could prevent all future generations from living lives full of whatever you think makes life valuable (whether that’s happiness, justice, beauty, or general flourishing) counts as an existential catastrophe.

    It seems extremely unlikely that we’d be able to regain power over a system that successfully disempowers humanity. And as a result, the entirety of the future — everything that happens for Earth-originating life, for the rest of time — would be determined by the goals of systems that, although built by us, are not aligned with us. Perhaps those goals will create a long and flourishing future, but we see little reason for confidence.34

    This isn’t to say that we don’t think AI also poses a risk of human extinction. Indeed, we think making humans extinct is one highly plausible way in which an AI system could completely and permanently ensure that we are never able to regain power.

    People might deploy misaligned AI systems despite the risk

    Surely no one would actually build or use a misaligned AI if they knew it could have such terrible consequences, right?

    Unfortunately, there are at least two reasons people might create and then deploy misaligned AI — which we’ll go through one at a time:35

    1. People might think it’s aligned when it’s not

    Imagine there’s a group of researchers trying to tell, in a test environment, whether a system they’ve built is aligned. We’ve argued that an intelligent planning AI will want to improve its abilities to effect changes in pursuit of its objective, and it’s almost always easier to do that if it’s deployed in the real world, where a much wider range of actions are available. As a result, any misaligned AI that’s sophisticated enough will try to understand what the researchers want it to do and at least pretend to be doing that, deceiving the researchers into thinking it’s aligned. (For example, a reinforcement learning system might be rewarded for certain apparent behaviour during training, regardless of what it’s actually doing.)

    Hopefully, we’ll be aware of this sort of behaviour and be able to detect it. But catching a sufficiently advanced AI in deception seems potentially harder than catching a human in a lie, which isn’t always easy. For example, a sufficiently intelligent deceptive AI system may be able to deceive us into thinking we’ve solved the problem of AI deception, even if we haven’t.

    If AI systems are good at deception, and have sufficiently advanced capabilities, a reasonable strategy for such a system could be to deceive humans completely until the system has a way to guarantee it can overcome any resistance to its goals.

    2. There are incentives to deploy systems sooner rather than later

    We might also expect some people with the ability to deploy a misaligned AI to charge ahead despite any warning signs of misalignment that do come up, because of race dynamics — where people developing AI want to do so before anyone else.

    For example, if you’re developing an AI to improve military or political strategy, it’s much more useful if none of your rivals have a similarly powerful AI.

    These incentives apply even to people attempting to build an AI in the hopes of using it to make the world a better place.

    For example, say you’ve spent years and years researching and developing a powerful AI system, and all you want is to use it to make the world a better place. Simplifying things a lot, say there are two possibilities:

    1. This powerful AI will be aligned with your beneficent aims, and you’ll transform society in a potentially radically positive way.
    2. The AI will be sufficiently misaligned that it’ll take power and permanently end humanity’s control over the future.

    Let’s say you think there’s a 90% chance that you’ve succeeded in building an aligned AI. But technology often develops at similar speeds across society, so there’s a good chance that someone else will soon also develop a powerful AI. And you think they’re less cautious, or less altruistic, so you think their AI will only have an 80% chance of being aligned with good goals, and pose a 20% chance of existential catastrophe. And only if you get there first can your more beneficial AI be dominant. As a result, you might decide to go ahead with deploying your AI, accepting the 10% risk.

    This all sounds very abstract. What could an existential catastrophe caused by AI actually look like?

    The argument we’ve given so far is very general, and doesn’t really look at the specifics of how an AI that is attempting to seek power might actually do so.

    If you’d like to get a better understanding of what an existential catastrophe caused by AI might actually look like, we’ve written a short separate article on that topic. If you’re happy with the high-level abstract arguments so far, feel free to skip to the next section!

    What could an existential AI catastrophe actually look like?

    4. Even if we find a way to avoid power-seeking, there are still risks

    So far we’ve described what a large proportion of researchers in the field2 think is the major existential risk from potential advances in AI, which depends crucially on an AI seeking power to achieve its goals.

    If we can prevent power-seeking behaviour, we will have reduced existential risk substantially.

    But even if we succeed, there are still existential risks that AI could pose.

    There are at least two ways these risks could arise:

    • We expect that AI systems will help increase the rate of scientific progress.36 While there would be clear benefits to this automation — the rapid development of new medicine, for example — some forms of technological development can pose threats, including existential threats, to humanity. This technological advancement might increase our available destructive power or make dangerous technologies cheaper or more widely accessible.
    • We might start to see AI automate many – or possibly even all – economically important tasks. It’s hard to predict exactly what the effects of this would be on society. But it seems plausible that this could increase existential risks. For example, if AI systems are highly transformative, then their use (or potential use) could possibly create insurmountable power imbalances. Even the threat of this might be enough. For example, a military might feel pushed to create transformative automated weapons because it knows or believes its enemies are doing so, even if this dynamic benefits no one.

    We know of several specific areas in which advanced AI may increase existential risks, though are likely others we haven’t thought of.

    Bioweapons

    In 2022, Collaborations Pharmaceuticals — a small research corporation in North Carolina — were building an AI model to help determine the structure of new drugs. As part of this process, they trained the model to penalise drugs that it predicted were toxic. This had just one problem: you could run the toxicity prediction in reverse to invent new toxic drugs.

    Some of the deadliest events in human history have been pandemics. Pathogens’ ability to infect, replicate, kill, and spread — often undetected — make them exceptionally dangerous.

    Even without AI, advancing biotechnology poses extreme risks. It potentially provides opportunities for state actors or terrorists to create mass-casualty events.

    Advances in AI have the potential to make biotechnology more dangerous.

    For example:

    1. Dual-use tools, like the automation of laboratory processes, could lower the barriers for rogue actors trying to manufacture a dangerous pandemic virus.37 The Collaborations Pharmaceuticals model is an example of a dual-use tool (although it’s not particularly dangerous).

    2. AI-based biological design tools could enable sophisticated actors to reprogram the genomes of dangerous pathogens to specifically enhance their lethality, transmissibility, and immune evasion.38

    If AI is able to advance the rate of scientific and technological progress, these risks may be amplified and accelerated — making dangerous technology more widely available or increasing its possible destructive power.39

    In the 2023 survey of AI experts, 73% of respondents said they had either “extreme” or “substantial” concern that in the future Al will let “dangerous groups make powerful tools (e.g. engineered viruses).”40

    Intentionally dangerous AI agents

    Most of this article discusses the risk of power-seeking AI systems that arise unintentionally due to misalignment.

    But we can’t rule out the possibility that some people might intentionally create rogue AI agents that seek to disempower humanity. It might seem hard to imagine, but extremist ideologies of many forms have inspired humans to carry out radically violent and even self-destructive plans.41

    Cyberweapons

    AI can already be used in cyberattacks, such as phishing, and more powerful AI may cause greater information security challenges (though it could also be useful in cyberdefense).

    On its own, AI-enabled cyberwarfare is unlikely to pose an existential threat to humanity. Even the most damaging and costly societal-scale cyberattacks wouldn’t approach an extinction-level event.

    But AI-enabled cyberattacks could provide access to other dangerous technology, such as bioweapons, nuclear arsenals, or autonomous weapons. So there may be genuine existential risks posed by AI-related cyberweapons, but they will most likely run through another existential risk.

    The cyber capabilities of AI systems are also relevant to how a power-seeking AI could actually take power.

    Other dangerous tech

    If AI systems generally accelerate the rate of scientific and technological progress, we think it’s reasonably likely that we’ll invent new dangerous technologies.

    For example, atomically precise manufacturing, sometimes called nanotechnology, has been hypothesised as an existential threat — and it’s a scientifically plausible technology that AI could help us invent far sooner than we would otherwise.

    In The Precipice, Toby Ord estimated the chances of an existential catastrophe by 2120 from “unforeseen anthropogenic risks” at 1 in 30. This estimate suggests there could be other discoveries, perhaps involving yet to be understood physics, that could enable the creation of technologies with catastrophic consequences.42

    AI could empower totalitarian governments

    An AI-enabled authoritarian government could completely automate the monitoring and repression of its citizens, as well as significantly influence the information people see, perhaps making it impossible to coordinate action against such a regime.

    AI is already facilitating the ability of governments to monitor their own citizens.

    The NSA is using AI to help filter the huge amounts of data they collect, significantly speeding up their ability to identify and predict the actions of people they are monitoring. In China, AI is increasingly being used for facial recognition and predictive policing, including automated racial profiling and automatic alarms when people classified as potential threats enter certain public places.

    These sorts of surveillance technologies seem likely to significantly improve — thereby increasing governments’ abilities to control their populations.

    At some point, authoritarian governments could extensively use AI-related technology to:

    • Monitor and track dissidents
    • Preemptively suppress opposition to the ruling party
    • Control the military and dominate external actors
    • Manipulate information flows and carefully shape public opinion

    Again, in the 2023 survey of AI experts, 73% of respondents expressed “extreme” or “substantial” concern that in the future authoritarian rulers could “use Al to control their population.”40

    If a regime achieved a form of truly stable totalitarianism, it could make people’s lives much worse for a long time into the future, making it a particularly scary possible scenario resulting from AI. (Read more in our article on risks of stable totalitarianism.)

    AI could worsen war

    We’re concerned that great power conflict could also pose a substantial threat to our world, and advances in AI seem likely to change the nature of war — through lethal autonomous weapons43 or through automated decision making.44

    In some cases, great power war could pose an existential threat — for example, if the conflict is nuclear. Some argue that lethal autonomous weapons, if sufficiently powerful and mass-produced, could themselves constitute a new form of weapon of mass destruction.

    And if a single actor produces particularly powerful AI systems, this could be seen as giving them a decisive strategic advantage. Such an outcome, or even the expectation of such an outcome, could be highly destabilising.

    Imagine that the US was working to produce a planning AI that’s intelligent enough to ensure that Russia or China could never successfully launch another nuclear weapon. This could incentivise a first strike from the actor’s rivals before these AI-developed plans can ever be put into action.

    This is because nuclear deterrence can benefit from symmetry between the abilities of nuclear powers, in that the threat of a nuclear response to a first strike is believable and therefore a deterrent to a first strike. Advances in AI, which could be directly applied to nuclear forces, could create asymmetries in the capabilities of nuclear-armed nations. This could include improving early warning systems, air defence systems, and cyberattacks that disable weapons.

    For example, many countries use submarine-launched ballistic missiles as part of their nuclear deterrence systems — the idea is that if nuclear weapons can be hidden under the ocean, they will never be destroyed in the first strike. This means that they can always be used for a counterattack, and therefore act as an effective deterrent against first strikes. But AI could make it far easier to detect submarines underwater, enabling their destruction in a first strike — removing this deterrent.

    Many other destabilising scenarios are likely possible.

    A report from the Stockholm International Peace Research Institute found that, while AI could potentially also have stabilising effects (for example by making everyone feel more vulnerable, decreasing the chances of escalation), destabilising effects could arise even before advances in AI are actually deployed. This is because one state’s belief that their opponents have new nuclear capabilities can be enough to disrupt the delicate balance of deterrence.

    Luckily, there are also plausible ways in which AI could help prevent the use of nuclear weapons — for example, by improving the ability of states to detect nuclear launches, which would reduce the chances of false alarms like those that nearly caused nuclear war in 1983.

    Overall, we’re uncertain about whether AI will substantially increase the risk of nuclear or conventional conflict in the short term — it could even end up decreasing the risk. But we think it’s important to pay attention to possible catastrophic outcomes and take reasonable steps to reduce their likelihood.

    Other risks from AI

    We’re also concerned about the following issues:

    • Existential threats that result not from the power-seeking behaviour of AI systems, but from the interaction between AI systems. (In order to pose a risk, these systems would still need to be, to some extent, misaligned.)
    • Other ways we haven’t thought of that AI systems could be misused — especially ones that might significantly affect future generations.
    • Other moral mistakes made in the design and use of AI systems, particularly if future AI systems are themselves deserving of moral consideration. For example, we might (inadvertently) create sentient AI systems, which could then suffer in huge numbers. We think this could be extremely important, so we’ve written about it in a separate problem profile.

    This is a really difficult question to answer.

    There are no past examples we can use to determine the frequency of AI-related catastrophes.

    All we have to go off are arguments (like the ones we’ve given above), and less relevant data like the history of technological advances. And we’re definitely not certain that the arguments we’ve presented are completely correct.

    Consider the argument we gave earlier about the dangers of power-seeking AI in particular, based off Carlsmith’s report. At the end of his report, Carlsmith gives some rough guesses of the chances that each stage of his argument is correct (conditional on the previous stage being correct):

    1. By 2070 it will be possible and financially feasible to build strategically aware systems that can outperform humans on many power-granting tasks, and that can successfully make and carry out plans: Carlsmith guesses there’s a 65% chance of this being true.
    2. Given this feasibility, there will be strong incentives to build such systems: 80%.
    3. Given both the feasibility and incentives to build such systems, it will be much harder to develop aligned systems that don’t seek power than to develop misaligned systems that do, but which are at least superficially attractive to deploy: 40%.
    4. Given all of this, some deployed systems will seek power in a misaligned way that causes over $1 trillion (in 2021 dollars) of damage: 65%.
    5. Given all the previous premises, misaligned power-seeking AI systems will end up disempowering basically all of humanity: 40%.
    6. Given all the previous premises, this disempowerment will constitute an existential catastrophe: 95%.

    Multiplying these numbers together, Carlsmith estimated that there’s a 5% chance that his argument is right and there will be an existential catastrophe from misaligned power-seeking AI by 2070. When we spoke to Carlsmith, he noted that in the year between the writing of his report and the publication of this article, his overall guess at the chance of an existential catastrophe from power-seeking AI by 2070 had increased to >10%.45

    The overall probability of existential catastrophe from AI would, in Carlsmith’s view, be higher than this, because there are other routes to possible catastrophe — like those discussed in the previous section — although our guess is that these other routes are probably a lot less likely to lead to existential catastrophe.

    For another estimate, in The Precipice, philosopher and advisor to 80,000 Hours Toby Ord estimated a 1-in-6 risk of existential catastrophe by 2120 (from any cause), and that 60% of this risk comes from misaligned AI — giving a total of a 10% risk of existential catastrophe from misaligned AI by 2120.

    A 2021 survey of 44 researchers working on reducing existential risks from AI found the median risk estimate was 32.5% — the highest answer given was 98%, and the lowest was 2%.46 There’s obviously a lot of selection bias here: people choose to work on reducing risks from AI because they think this is unusually important, so we should expect estimates from this survey to be substantially higher than estimates from other sources. But there’s clearly significant uncertainty about how big this risk is, and huge variation in answers.

    All these numbers are shockingly, disturbingly high. We’re far from certain that all the arguments are correct. But these are generally the highest guesses for the level of existential risk of any of the issues we’ve examined (like engineered pandemics, great power conflict, climate change, or nuclear war).

    That said, I think there are reasons why it’s harder to make guesses about the risks from AI than other risks – and possibly reasons to think that the estimates we’ve quoted above are systematically too high.

    If I was forced to put a number on it, I’d say something like 1%. This number includes considerations both in favour and against the argument. I’m less worried than other 80,000 Hours staff — our position as an organisation is that the risk is between 3% and 50%.

    All this said, the arguments for such high estimates of the existential risk posed by AI are persuasive — making risks from AI a top contender for the most pressing problem facing humanity.

    5. We can tackle these risks

    We think one of the most important things you can do would be to help reduce the gravest risks that AI poses.

    This isn’t just because we think these risks are high — it’s also because we think there are real things we can do to reduce these risks.

    We know of two main ways people work to reduce these risks:

    1. Technical AI safety research
    2. AI governance and policy work

    There are lots of ways to contribute to this work. In this section, we discuss many broad approaches within both categories to illustrate the point that there are things we can do to address these risks. Below, we discuss the kinds of careers you can pursue to work on these kinds of approaches.

    Technical AI safety research

    The benefits of transformative AI could be huge, and there are many different actors involved (operating in different countries), which means it will likely be really hard to prevent its development altogether.

    (It’s also possible that it wouldn’t even be a good idea if we could — after all, that would mean forgoing the benefits as well as preventing the risks.)

    As a result, we think it makes more sense to focus on making sure that this development is safe — meaning that it has a high probability of avoiding all the catastrophic failures listed above.

    One way to do this is to try to develop technical solutions to prevent the kind of power-seeking behaviour we discussed earlier — this is generally known as working on technical AI safety, sometimes called just “AI safety” for short.

    We discuss this path in more detail here:

    Career review of technical AI safety research

    Approaches

    There are lots of approaches to technical AI safety, including:

    See Neel Nanda’s overview of the AI alignment landscape for more details.

    Read more about technical AI safety research below.

    AI governance and policy

    Reducing the gravest risks from AI will require sound high-level decision making and policy, both at AI companies themselves and in governments.

    As AI has advanced and drawn increasing interest from customers and investors, governments have shown an interest in regulating the technology. Some have already taken significant steps to play a role in managing the development of AI, including:

    • The US and the UK have each established their own national AI Safety Institutes.
    • The European Union has passed the EU AI Act, which contains specific provisions for governing general-purpose AI models that pose systemic risks.
    • The UK and then South Korea have hosted the first two AI Safety Summits (in 2023 and 2024), a series of high-profile summits aiming to coordinate between different countries, academics, researchers, and civil society leaders.
    • China has implemented regulations targeting recommendation algorithms, synthetic AI content, generative AI models, and facial recognition technology.
    • The US instituted export controls to reduce China’s access to the most cutting-edge chips used in AI development.

    Much more will need to be done to reduce the biggest risks — including continuous evaluation of the AI governance landscape to assess overall progress.

    We discuss this career path in more detail here:

    Career review of AI strategy and policy careers

    Approaches

    People working in AI policy have proposed a range of approaches to reducing risk as AI systems get more powerful.

    We don’t necessarily endorse all the ideas below, but what follows is a list of some prominent policy approaches that could be aimed at reducing the largest dangers from AI:48

    • Responsible scaling policies: some major AI companies have already begun developing internal frameworks for assessing safety as they scale up the size and capabilities of their systems. These frameworks introduce safeguards that are intended to become increasingly stringent as AI systems become more potentially dangerous, and ensure that AI systems’ capabilities don’t outpace companies’ abilities to keep systems safe. Many argue that these internal policies are not sufficient for safety, but they may represent a promising step for reducing risk. You can see versions of such policies from Anthropic, Google DeepMind, and OpenAI.
    • Standards and evaluation: governments may also develop industry-wide benchmarks and testing protocols to assess whether AI systems pose major risks. The non-profit METR and the UK AI Safety Institute are among the organisations currently developing these evaluations to test AI models before and after they are released. This can include creating standardised metrics for an AI systems’s capabilities and potential to cause harm, as well as propensity for power-seeking or misalignment.
    • Safety cases: this practice involves requiring AI developers to provide comprehensive documentation demonstrating the safety and reliability of their systems before deployment. This approach is similar to safety cases used in other high-risk industries like aviation or nuclear power.49 You can see discussion of this idea in a paper from Clymer et al and in a post from Geoffrey Irving at the UK AI Safety Institute.
    • Information security standards: we can establish robust rules for protecting AI-related data, algorithms, and infrastructure from unauthorised access or manipulation — particularly the AI model weights. Rand released a detailed report analysing the security risks to major AI companies, particularly from state actors.
    • Liability law: existing law already imposes some liability on companies that create dangerous products or cause significant harm to the public, but its application to AI models and risk in particular is unclear. Clarifying how liability applies to companies that create dangerous AI models could incentivise them to take additional steps to reduce risk. Law professor Gabriel Weil has written about this idea.
    • Compute governance: governments may regulate access to and use of high-performance computing resources necessary for training large AI models. The US restrictions on exporting state-of-the-art chips to China is one example of such a policy, and others are possible. Companies could also be required to install hardware-level safety features directly into AI chips or processors. These could be used to track chips and verify they’re not in the possession of anyone who shouldn’t have them, or for other purposes. You can learn more about this topic in our interview with Lennart Heim and in this report from the Center for a New American Security.
    • International coordination: Fostering global cooperation on AI governance to ensure consistent standards. This could involve treaties, international organisations, or multilateral agreements on AI development and deployment. We discuss some related considerations in our article on China-related AI safety and governance paths.
    • Societal adaptation: it may be critically important to prepare society for the widespread integration of AI and the potential risks it poses. For example, we might need to develop new information security measures to protect crucial data in a world with AI-enabled hacking. Or we may want to implement strong controls to prevent handing over key societal decisions to AI systems.50
    • Pausing scaling if appropriate: some argue that we should currently pause all scaling of larger AI models because of the dangers the technology poses. We have featured some discussion of this idea on our podcast, and it seems hard to know when or if this would be a good idea. If carried out, it could involve industry-wide agreements or regulatory mandates to pause scaling efforts when necessary.

    The details, benefits, and downsides of many of these ideas have yet to be fully worked out, so it’s crucial that we do more research. And this list isn’t comprehensive — there are likely other important policy interventions and governance strategies worth pursuing.

    We also need more forecasting research into what we should expect to happen with AI, such as the work done at Epoch AI.

    6. This work is neglected

    In 2022, we estimated there were around 400 people around the world working directly on reducing the chances of an AI-related existential catastrophe (with a 90% confidence interval ranging between 200 and 1,000). Of these, about three quarters worked on technical AI safety research, with the rest split between strategy (and other governance) research and advocacy.51 We also estimated that there were around 800 people working in complementary roles, but we’re highly uncertain about this figure.52

    In The Precipice, Ord estimated that there was between $10 million and $50 million spent on reducing AI risk in 2020.

    That might sound like a lot of money, but we’re spending something like 1,000 times that amount10 on speeding up the development of transformative AI via commercial capabilities research and engineering at large AI companies.

    To compare the $50 million spent on AI safety in 2020 to other well-known risks, we’re currently spending several hundreds of billions per year on tackling climate change.

    Because this field is so neglected and has such high stakes, we think your impact working on risks from AI could be much higher than working on many other areas — which is why our top two recommended career paths for making a big positive difference in the world are technical AI safety and AI policy research and implementation.

    What do we think are the best arguments against this problem being pressing?

    As we said above, we’re not totally sure the arguments we’ve presented for AI representing an existential threat are right. Though we do still think that the chance of catastrophe from AI is high enough to warrant many more people pursuing careers to try to prevent such an outcome, we also want to be honest about the arguments against doing so, so you can more easily make your own call on the question.

    Here we’ll cover the strongest reasons (in our opinion) to think this problem isn’t particularly pressing. In the next section we’ll cover some common objections that (in our opinion) hold up less well, and explain why.

    The longer we have before transformative AI is developed, the less pressing it is to work now on ways to ensure that it goes well. This is because the work of others in the future could be much better or more relevant than the work we are able to do now.

    Also, if it takes us a long time to create transformative AI, we have more time to figure out how to make it safe. The risk seems much higher if AI developers will create transformative AI in the next few decades.

    It seems plausible that the first transformative AI won’t be based on current deep learning methods. (AI Impacts have documented arguments that current methods won’t be able to produce AI that has human-level intelligence.) This could mean that some of our current research might not end up being useful (and also — depending on what method ends up being used — could make the arguments for risk less worrying).

    Relatedly, we might expect that progress in the development of AI will occur in bursts. Previously, the field has seen AI winters, periods of time with significantly reduced investment, interest and research in AI. It’s unclear how likely it is that we’ll see another AI winter — but this possibility should lengthen our guesses about how long it’ll be before we’ve developed transformative AI. Cotra writes about the possibility of an AI winter in part four of her report forecasting transformative AI. New constraints on the rate of growth of AI capabilities, like the availability of training data, could also mean that there’s more time to work on this (Cotra discusses this here.)

    Thirdly, the estimates about when we’ll get transformative AI from Cotra, Kanfosky and Davidson that we looked at earlier were produced by people who already expected that working on preventing an AI-related catastrophe might be one of the world’s most pressing problems. As a result, there’s selection bias here: people who think transformative AI is coming relatively soon are also the people incentivised to carry out detailed investigations. (That said, if the investigations themselves seem strong, this effect could be pretty small.)

    Finally, none of the estimates we discussed earlier were trying to predict when an existential catastrophe might occur. Instead, they were looking at when AI systems might be able to automate all tasks humans can do, or when AI systems might significantly transform the economy. It’s by no means certain that the kinds of AI systems that could transform the economy would be the same advanced planning systems that are core to the argument that AI systems might seek power. Advanced planning systems do seem to be particularly useful, so there is at least some reason to think these might be the sorts of systems that end up being built. But even if the forecasted transformative AI systems are advanced planning systems, it’s unclear how capable such systems would need to be to pose a threat — it’s more than plausible that systems would need to be far more capable to pose a substantial existential threat than they would need to be to transform the economy. This would mean that all the estimates we considered above would be underestimates of how long we have to work on this problem.

    All that said, it might be extremely difficult to find technical solutions to prevent power-seeking behaviour — and if that’s the case, focusing on finding those solutions now does seem extremely valuable.

    Overall, we think that transformative AI is sufficiently likely in the next 10–80 years that it is well worth it (in expected value terms) to work on this issue now. Perhaps future generations will take care of it, and all the work we’d do now will be in vain — we hope so! But it might not be prudent to take that risk.

    If the best AI we have improves gradually over time (rather than AI capabilities remaining fairly low for a while and then suddenly increasing), we’re likely to end up with ‘warning shots’: we’ll notice forms of misaligned behaviour in fairly weak systems, and be able to correct for it before it’s too late.

    In such a gradual scenario, we’ll have a better idea about what form powerful AI might take (e.g. whether it will be built using current deep learning techniques, or something else entirely), which could significantly help with safety research. There will also be more focus on this issue by society as a whole, as the risks of AI become clearer.

    So if gradual development of AI seems more likely, the risk seems lower.

    But it’s very much not certain that AI development will be gradual, or if it is, gradual enough for the risk to be noticeably lower. And even if AI development is gradual, there could still be significant benefits to having plans and technical solutions in place well in advance. So overall we still think it’s extremely valuable to attempt to reduce the risk now.

    If you want to learn more, you can read AI Impacts’ work on arguments for and against discontinuous (i.e. non-gradual) progress in AI development, and Toby Ord and Owen Cotton-Barratt on strategic implications of slower AI development.

    Making something have goals aligned with human designers’ ultimate objectives and making something useful seem like very related problems. If so, perhaps the need to make AI useful will drive us to produce only aligned AI — in which case the alignment problem is likely to be solved by default.

    Ben Garfinkel gave a few examples of this on our podcast:

    • You can think of a thermostat as a very simple AI that attempts to keep a room at a certain temperature. The thermostat has a metal strip in it that expands as the room heats, and cuts off the current once a certain temperature has been reached. This piece of metal makes the thermostat act like it has a goal of keeping the room at a certain temperature, but also makes it capable of achieving this goal (and therefore of being actually useful).
    • Imagine you’re building a cleaning robot with reinforcement learning techniques — that is, you provide some specific condition under which you give the robot positive feedback. You might say something like, “The less dust in the house, the more positive the feedback.” But if you do this, the robot will end up doing things you don’t want — like ripping apart a cushion to find dust on the inside. Probably instead you need to use techniques like those being developed by people working on AI safety (things like watching a human clean a house and letting the AI figure things out from there). So people building AIs will be naturally incentivised to also try to make them aligned (and so in some sense safe), so they can do their jobs.

    If we need to solve the problem of alignment anyway to make useful AI systems, this significantly reduces the chances we will have misaligned but still superficially useful AI systems. So the incentive to deploy a misaligned AI would be a lot lower, reducing the risk to society.

    That said, there are still reasons to be concerned. For example, it seems like we could still be susceptible to problems of AI deception.

    And, as we’ve argued, AI alignment is only part of the overall issue. Solving the alignment problem isn’t the same thing as completely eliminating existential risk from AI, since aligned AI could also be used to bad ends — such as by authoritarian governments.

    As with many research projects in their early stages, we don’t know how hard the alignment problem — or other AI problems that pose risks — are to solve. Someone could believe there are major risks from machine intelligence, but be pessimistic about what additional research or policy work will accomplish, and so decide not to focus on it.

    This is definitely a reason to potentially work on another issue — the solvability of an issue is a key part of how we try to compare global problems. For example, we’re also very concerned about risks from pandemics, and it may be much easier to solve that issue.

    That said, we think that given the stakes, it could make sense for many people to work on reducing AI risk, even if you think the chance of success is low. You’d have to think that it was extremely difficult to reduce risks from AI in order to conclude that it’s better just to let the risks materialise and the chance of catastrophe play out.

    At least in our own case at 80,000 Hours, we want to keep trying to help with AI safety — for example, by writing profiles like this one — even if the chance of success seems low (though in fact we’re overall pretty optimistic).

    There are some reasons to think that the core argument that any advanced, strategically aware planning system will by default seek power (which we gave here) isn’t totally right.53

    1. For a start, the argument that advanced AI systems will seek power relies on the idea that systems will produce plans to achieve goals. We’re not quite sure what this means — and as a result, we’re not sure what properties are really required for power-seeking behaviour to occur and whether the things we’ll build will have those properties.

      We’d love to see a more in-depth analysis of what aspects of planning are economically incentivised, and whether those aspects seem like they’ll be enough for the argument for power-seeking behaviour to work.

      Grace has written more about the ambiguity around “how much goal-directedness is needed to bring about disaster.”

    2. It’s possible that only a few goals that AI systems could have would lead to misaligned power-seeking.

      Richard Ngo, in his analysis of what people mean by “goals”, points out that you’ll only get power-seeking behaviour if you have goals that mean the system can actually benefit from seeking power. Ngo suggests that these goals need to be “large-scale.” (Some have argued that, by default, we should expect AI systems to have “short-term” goals that won’t lead to power-seeking behaviour.)

      But whether an AI system would plan to take power depends on how easy it would be for the system to take power, because the easier it is for a system to take power, the more likely power-seeking plans would be successful — so a good planning system would be more likely to choose them. This suggests it will be easier to accidentally create a power-seeking AI system as systems’ capabilities increase.

      So there still seems to be cause for increased concern, because the capabilities of AI systems do seem to be increasing fast. There are two considerations here: if few goals really lead to power-seeking, even for quite capable AI systems, that significantly reduces the risk and thus the importance of the problem. But it might also increase the solvability of the problem by demonstrating that solutions could be easy to find (e.g. the solution could be never giving systems “large-scale” goals) — making this issue more valuable for people to work on.

    3. Earlier we argued that we can expect AI systems to do things that seem generally instrumentally useful to their overall goal, and that as a result it could be hard to prevent AI systems from doing these instrumentally useful things.

      But we can find examples where how generally instrumentally useful these things would be doesn’t seem to affect how hard it is to prevent them. Consider an autonomous car that can move around only if its engine is on. For many possible goals (other than, say, turning the car radio on), it seems like it would be useful for the car to be able to move around, so we should expect the car to turn its engine on. But despite that, we might still be able to train the car to keep its engine off: for example, we can give it some negative feedback whenever it turns the engine on, even if we also had given the car some other goals. Now imagine we improve the car so that its top speed is higher — this massively increases the number of possible action sequences that involve, as a first step, turning its engine on. In some sense, this seems to increase the instrumental usefulness of turning the engine on — there are more possible actions the car can take once its engine is on because the range of possible speeds it can travel at is higher. (It’s not clear if this sense of “instrumental usefulness” is the same as the one in the argument for the risk, although it does seem somewhat related.) But it doesn’t seem like this increase in the instrumental usefulness of turning on the engine makes it much harder to stop the car turning it on. Simple examples like this cast some doubt on the idea that just because a particular action is instrumentally useful, we won’t be able to find ways to prevent it. (For more on this example, see page 25 of Garfinkel’s review of Carlsmith’s report.)

    4. Humans are clearly highly intelligent, but it’s unclear whether we are perfect goal-optimisers. For example, humans often face some kind of existential angst over what their true goals are. And even if we accept humans as an example of a strategically aware agent capable of planning, humans certainly aren’t always power-seeking. We obviously care about having basics like food and shelter, and many people go to great lengths for more money, status, education, or even formal power. But some humans choose not to pursue these goals, and pursuing them doesn’t seem to correlate with intelligence.

      However, this doesn’t mean that the argument that there will be an incentive to seek power is wrong. Most people do face and act on incentives to gain forms of influence via wealth, status, promotions, and so on. And we can explain the observation that humans don’t usually seek huge amounts of power by observing that we aren’t usually in circumstances that make the effort worth it.

      For example, most people don’t try to start billion-dollar companies — you probably won’t succeed, and it’ll cost you a lot of time and effort.

      But you’d still walk across the street to pick up a billion-dollar cheque.

    The absence of extreme power-seeking in many humans, along with uncertainties in what it really means to plan to achieve goals, does suggest that the argument we gave that advanced AI systems will seek power above might not be completely correct. And they also suggest that, if there really is a problem to solve here, in principle, alignment research into preventing power-seeking in AIs could succeed.

    This is good news! But for the moment — short of hoping we’re wrong about the existence of the problem — we don’t actually know how to prevent this power-seeking behaviour.

    Arguments against working on AI risk to which we think there are strong responses

    We’ve just discussed the major objections to working on AI risk that we think are most persuasive. In this section, we’ll look at objections that we think are less persuasive, and give some reasons why.

    People have been saying since the 1950s that artificial intelligence smarter than humans is just around the corner.

    But it hasn’t happened yet.

    One reason for this could be that it’ll never happen. Some have argued that producing artificial general intelligence is fundamentally impossible. Others think it’s possible, but unlikely to actually happen, especially not with current deep learning methods.

    Overall, we think the existence of human intelligence shows it’s possible in principle to create artificial intelligence. And the speed of current advances isn’t something we think would have been predicted by those who thought that we’ll never develop powerful, general AI.

    But most importantly, the idea that you need fully general intelligent AI systems for there to be a substantial existential risk is a common misconception.

    The argument we gave earlier relied on AI systems being as good or better than humans in a subset of areas: planning, strategic awareness, and areas related to seeking and keeping power. So as long as you think all these things are possible, the risk remains.

    And even if no single AI has all of these properties, there are still ways in which we might end up with systems of ‘narrow’ AI systems that, together, can disempower humanity. For example, we might have a planning AI that develops plans for a company, a separate AI system that measures things about the company, another AI system that attempts to evaluate plans from the first AI by predicting how much profit each will make, and further AI systems that carry out those plans (for example, by automating the building and operation of factories). Considered together, this system as a whole has the capability to form and carry out plans to achieve some goal, and potentially also has advanced capabilities in areas that help it seek power.

    It does seem like it will be easier to prevent these ‘narrow’ AI systems from seeking power. This could happen if the skills the AIs have, even when combined, don’t add up to being able to plan to achieve goals, or if the narrowness reduces the risk of systems developing power-seeking plans (e.g. if you build systems that can only produce very short-term plans). It also seems like it gives another point of weakness for humans to intervene if necessary: the coordination of the different systems.

    Nevertheless, the risk remains, even from systems of many interacting AIs.

    It might just be really, really hard.

    Stopping people and computers from running software is already incredibly difficult.

    Think about how hard it would be to shut down Google’s web services. Google’s data centres have millions of servers over 34 different locations, many of which are running the same sets of code. And these data centres are absolutely crucial to Google’s bottom line, so even if Google could decide to shut down their entire business, they probably wouldn’t.

    Or think about how hard it is to get rid of computer viruses that autonomously spread between computers across the world.

    Ultimately, we think any dangerous power-seeking AI system will be looking for ways to not be turned off, which makes it more likely we’ll be in one of these situations, rather than in a case where we can just unplug a single machine.

    That said, we absolutely should try to shape the future of AI such that we can ‘unplug’ powerful AI systems.

    There may be ways we can develop systems that let us turn them off. But for the moment, we’re not sure how to do that.

    Ensuring that we can turn off potentially dangerous AI systems could be a safety measure developed by technical AI safety research, or it could be the result of careful AI governance, such as planning coordinated efforts to stop autonomous software once it’s running.

    We could (and should!) definitely try.

    If we could successfully ‘sandbox’ an advanced AI — that is, contain it to a training environment with no access to the real world until we were very confident it wouldn’t do harm — that would help our efforts to mitigate AI risks tremendously.

    But there are a few things that might make this difficult.

    For a start, we might only need one failure — like one person to remove the sandbox, or one security vulnerability in the sandbox we hadn’t noticed — for the AI system to begin affecting the real world.

    Moreover, this solution doesn’t scale with the capabilities of the AI system. This is because:

    • More capable systems are more likely to be able to find vulnerabilities or other ways of leaving the sandbox (e.g. threatening or coercing humans).
    • Systems that are good at planning might attempt to deceive us into deploying them.

    So the more dangerous the AI system, the less likely sandboxing is to be possible. That’s the opposite of what we’d want from a good solution to the risk.

    For some definitions of “truly intelligent” — for example, if true intelligence includes a deep understanding of morality and a desire to be moral — this would probably be the case.

    But if that’s your definition of truly intelligent, then it’s not truly intelligent systems that pose a risk. As we argued earlier, it’s advanced systems that can plan and have strategic awareness that pose risks to humanity.

    With sufficiently advanced strategic awareness, an AI system’s excellent understanding of the world may well encompass an excellent understanding of people’s moral beliefs. But that’s not a strong reason to think that such a system would act morally.

    For example, when we learn about other cultures or moral systems, that doesn’t necessarily create a desire to follow their morality. A scholar of the Antebellum South might have a very good understanding of how 19th century slave owners justified themselves as moral, but would be very unlikely to defend slavery.

    AI systems with excellent understandings of human morality could be even more dangerous than AIs without such understanding: the AI system could act morally at first as a way to deceive us into thinking that it is safe.

    There are definitely dangers from current artificial intelligence.

    For example, data used to train neural networks often contains hidden biases. This means that AI systems can learn these biases — and this can lead to racist and sexist behaviour.

    There are other dangers too. Our earlier discussion on nuclear war explains a threat which doesn’t require AI systems to have particularly advanced capabilities.

    But we don’t think the fact that there are also risks from current systems is a reason not to prioritise reducing existential threats from AI, if they are sufficiently severe.

    As we’ve discussed, future systems — not necessarily superintelligence or totally general intelligence, but systems advanced in their planning and power-seeking capabilities — seem like they could pose threats to the existence of the entirety of humanity. And it also seems somewhat likely that we’ll produce such systems this century.

    What’s more, lots of technical AI safety research is also relevant to solving problems with existing AI systems. For example, some research focuses on ensuring that ML models do what we want them to, and will still do this as their size and capabilities increase; other research tries to work out how and why existing models are making the decisions and taking the actions that they do.

    As a result, at least in the case of technical research, the choice between working on current threats and future risks may look more like a choice between only ensuring that current models are safe, or instead finding ways to ensure that current models are safe that will also continue to work as AI systems become more complex and more intelligent.

    Ultimately, we have limited time in our careers, so choosing which problem to work on could be a huge way of increasing your impact. When there are such substantial threats, it seems reasonable for many people to focus on addressing these worst-case possibilities.

    Yes, it can.

    AI systems are already improving healthcare, putting driverless cars on the roads, and automating household chores.

    And if we’re able to automate advancements in science and technology, we could see truly incredible economic and scientific progress. AI could likely help solve many of the world’s most pressing problems.

    But, just because something can do a lot of good, that doesn’t mean it can’t also do a lot of harm. AI is an example of a dual-use technology — a technology that can be used for both dangerous and beneficial purposes. For example, researchers were able to get an AI model that was trained to develop medical drugs to instead generate designs for bioweapons.

    We are excited and hopeful about seeing large benefits from AI. But we also want to work hard to minimise the enormous risks advanced AI systems pose.

    It’s undoubtedly true that some people are drawn to thinking about AI safety because they like computers and science fiction — as with any other issue, there are people working on it not because they think it’s important, but because they think it’s cool.

    But, for many people, working on AI safety comes with huge reluctance.

    For me, and many of us at 80,000 Hours, spending our limited time and resources working on any cause that affects the long-run future — and therefore not spending that time on the terrible problems in the world today — is an incredibly emotionally difficult thing to do.

    But we’ve gradually investigated these arguments (in the course of trying to figure out how we can do the most good), and over time both gained more expertise about AI and became more concerned about the risk.

    We think scepticism is healthy, and are far from certain that these arguments completely work. So while this suspicion is definitely a reason to dig a little deeper, we hope that, ultimately, this worry won’t be treated as a reason to deprioritise what may well be the most important problem of our time.

    That something sounds like science fiction isn’t a reason in itself to dismiss it outright. There are loads of examples of things first mentioned in sci-fi that then went on to actually happen (this list of inventions in science fiction contains plenty of examples).

    There are even a few such cases involving technology that are real existential threats today:

    • In his 1914 novel The World Set Free, H. G. Wells predicted atomic energy fueling powerful explosives — 20 years before we realised there could in theory be nuclear fission chain reactions, and 30 years before nuclear weapons were actually produced. In the 1920s and 1930s, Nobel Prize–winning physicists Millikan, Rutherford, and Einstein all predicted that we would never be able to use nuclear power. Nuclear weapons were literal science fiction before they were reality.
    • In the 1964 film Dr. Strangelove, the USSR builds a doomsday machine that would automatically trigger an extinction-level nuclear event in response to a nuclear strike, but keeps it secret. Dr Strangelove points out that keeping it secret rather reduces its deterrence effect. But we now know that in the 1980s the USSR built an extremely similar system… and kept it secret.

    Moreover, there are top academics and researchers working on preventing these risks from AI — at MIT, Cambridge, Oxford, UC Berkeley, and elsewhere. Two of the world’s top AI companies (DeepMind and OpenAI) have teams explicitly dedicated to working on technical AI safety. Researchers from these places helped us with this article.

    It’s totally possible all these people are wrong to be worried, but the fact that so many people take this threat seriously undermines the idea that this is merely science fiction.

    It’s reasonable when you hear something that sounds like science fiction to want to investigate it thoroughly before acting on it. But having investigated it, if the arguments seem solid, then simply sounding like science fiction is not a reason to dismiss them.

    We never know for sure what’s going to happen in the future. So, unfortunately for us, if we’re trying to have a positive impact on the world, that means we’re always having to deal with at least some degree of uncertainty.

    We also think there’s an important distinction between guaranteeing that you’ve achieved some amount of good and doing the very best you can. To achieve the former, you can’t take any risks at all — and that could mean missing out on the best opportunities to do good.

    When you’re dealing with uncertainty, it makes sense to roughly think about the expected value of your actions: the sum of all the good and bad potential consequences of your actions, weighted by their probability.

    Given the stakes are so high, and the risks from AI aren’t that low, this makes the expected value of helping with this problem high.

    We’re sympathetic to the concern that if you work on AI safety, you might end up doing not much at all when you might have done a tremendous amount of good working on something else — simply because the problem and our current ideas about what to do about it are so uncertain.

    But we think the world will be better off if we decide that some of us should work on solving this problem, so that together we have the best chance of successfully navigating the transition to a world with advanced AI rather than risking an existential crisis.

    And it seems like an immensely valuable thing to try.

    Pascal’s mugging is a thought experiment — a riff on the famous Pascal’s wager — where someone making decisions using expected value calculations can be exploited by claims that they can get something extraordinarily good (or avoid something extraordinarily bad), with an extremely low probability of succeeding.

    The story goes like this: a random mugger stops you on the street and says, “Give me your wallet or I’ll cast a spell of torture on you and everyone who has ever lived.” You can’t rule out with 100% probability that he won’t — after all, nothing’s 100% for sure. And torturing everyone who’s ever lived is so bad that surely even avoiding a tiny, tiny probability of that is worth the $40 in your wallet? But intuitively, it seems like you shouldn’t give your wallet to someone just because they threaten you with something completely implausible.

    Analogously, you could worry that working on AI safety means giving your valuable time to avoid a tiny, tiny chance of catastrophe. Working on reducing risks from AI isn’t free — the opportunity cost is quite substantial, as it means you forgo working on other extremely important things, like reducing risks from pandemics or ending factory farming.

    Here’s the thing though: while there’s lots of value at stake — perhaps the lives of everybody alive today, and the entirety of the future of humanity — it’s not the case that the probability that you can make a difference by working on reducing risks from AI is small enough for this argument to apply.

    We wish the chance of an AI catastrophe was that vanishingly small.

    Instead, we think the probability of such a catastrophe (I think, around 1% this century) is much, much larger than things that people try to prevent all the time — such as fatal plane crashes, which happen in 0.00002% of flights.

    What really matters, though, is the extent to which your work can reduce the chance of a catastrophe.

    Let’s look at working on reducing risks from AI. For example, if:

    1. There’s a 1% chance of an AI-related existential catastrophe by 2100
    2. There’s a 30% chance that we can find a way to prevent this by technical research
    3. Five people working on technical AI safety raises the chances of solving the problem by 1% of that 30% (so 0.3 percentage points)

    Then each person involved has a 0.00006 percentage point share in preventing this catastrophe.

    Other ways of acting altruistically involve similarly sized probabilities.

    The chances of a volunteer campaigner swinging a US presidential election is somewhere between 0.001% and 0.00001%. But you can still justify working on a campaign because of the large impact you expect you’d have on the world if your preferred candidate won.

    You have even lower chances of wild success from things like trying to reform political institutions, or working on some very fundamental science research to build knowledge that might one day help cure cancer.

    Overall, as a society, we may be able to reduce the chance of an AI-related catastrophe all the way down from 10% (or higher) to close to zero — that’d be clearly worth it for a group of people, so it has to be worth it for the individuals, too.

    We wouldn’t want to just not do fundamental science because each researcher has a low chance of making the next big discovery, or not do any peacekeeping because any one person has a low chance of preventing World War III. As a society, we need some people working on these big issues — and maybe you can be one of them.

    What you can do concretely to help

    As we mentioned above, we know of two main ways to help reduce existential risks from AI:

    1. Technical AI safety research
    2. AI governance and policy work

    The biggest way you could help would be to pursue a career in either one of these areas, or in a supporting area.

    The first step is learning a lot more about the technologies, problems, and possible solutions. We’ve collated some lists of our favourite resources here, and our top recommendation is to take a look at the technical alignment curriculum from AGI Safety Fundamentals.

    Technical AI safety

    If you’re interested in a career in technical AI safety, the best place to start is our career review of being an AI safety researcher.

    If you want to learn more about technical AI safety as a field of research — e.g. the different techniques, schools of thought, and threat models — our top recommendation is to take a look at the technical alignment curriculum from AGI Safety Fundamentals.

    It’s important to note that you don’t have to be an academic or an expert in AI or AI safety to contribute to AI safety research. For example, software engineers are needed at many places conducting technical safety research, and we also highlight more roles below.

    You can see a list of key organisations where you might do this kind of work in the full career review.

    AI governance and policy work

    If you’re interested in a career in AI governance and policy, the best place to start is our AI governance and policy career review.

    You don’t need to be a bureaucrat in a grey suit to have a career in AI governance and policy — there are roles suitable for a wide range of skill sets. In particular, people with technical skills in machine learning and related fields are needed for governance work (although those skills are certainly not necessary).

    We split this career path into six different kinds of roles:

    1. Government roles
    2. Research
    3. Industry work
    4. Advocacy and lobbying
    5. Third-party auditing and evaluation
    6. International work and coordination

    We also have specific articles on working in US AI policy and China-related AI safety and governance paths.

    And you can learn more about where specifically you might work in this career path in our career review.

    If you’re new to the topic and interested in learning more broadly about AI governance, our top recommendation is to take a look at the governance curriculum from AGI safety fundamentals.

    Complementary (yet crucial) roles

    Even in a research organisation, around half of the staff will be doing other tasks essential for the organisation to perform at its best and have an impact. Having high-performing people in these roles is crucial.

    We think the importance of these roles is often underrated because the work is less visible. So we’ve written several career reviews on these areas to help more people enter these careers and succeed, including:

    Other ways to help

    AI safety is a big problem and it needs help from people doing a lot of different kinds of work.

    One major way to help is to work in a role that directs funding or people towards AI risk, rather than working on the problem directly. We’ve reviewed a few career paths along these lines, including:

    There are ways all of these could go wrong, so the first step is to become well-informed about the issue.

    There are also other technical roles besides safety research that could help contribute, like:

    • Working in information security to protect AI (or the results of key experiments) from misuse, theft, or tampering.
    • Becoming an expert in AI hardware as a way of steering AI progress in safer directions.

    You can read about all these careers — why we think they’re helpful, how to enter them, and how you can predict whether they’re a good fit for you — on our career reviews page.

    Want one-on-one advice on pursuing this path?

    We think that the risks posed by the development of AI may be the most pressing problem the world currently faces. If you think you might be a good fit for any of the above career paths that contribute to solving this problem, we’d be especially excited to advise you on next steps, one-on-one.

    We can help you consider your options, make connections with others working on reducing risks from AI, and possibly even help you find jobs or funding opportunities — all for free.

    APPLY TO SPEAK WITH OUR TEAM

    Find vacancies on our job board

    Our job board features opportunities in AI technical safety and governance:

      View all opportunities

      Top resources to learn more

      We've hit you with a lot of further reading throughout this article — here are a few of our favourites:

      On The 80,000 Hours Podcast, we have a number of in-depth interviews with people actively working to positively shape the development of artificial intelligence:

      If you want to go into much more depth, the AGI safety fundamentals course is a good starting point. There are two tracks to choose from: technical alignment or AI governance. If you have a more technical background, you could try Intro to ML Safety, a course from the Center for AI Safety.

      And finally, here are a few general sources (rather than specific articles) that you might want to explore:

      • The AI Alignment Forum, which is aimed at researchers working in technical AI safety.
      • AI Impacts, a project that aims to improve society's understanding of the likely impacts of human-level artificial intelligence.
      • The Alignment Newsletter, a weekly publication with recent content relevant to AI alignment with thousands of subscribers.
      • Import AI, a weekly newsletter about artificial intelligence by Jack Clark (cofounder of Anthropic), read by more than 10,000 experts.
      • Jeff Ding's ChinAI Newsletter, weekly translations of writings from Chinese thinkers on China's AI landscape.

      Read next:  Explore other pressing world problems

      Want to learn more about global issues we think are especially pressing? See our list of issues that are large in scale, solvable, and neglected, according to our research.

      Questions or feedback about this article? Email us

      Acknowledgements

      Huge thanks to Joel Becker, Tamay Besiroglu, Jungwon Byun, Joseph Carlsmith, Jesse Clifton, Emery Cooper, Ajeya Cotra, Andrew Critch, Anthony DiGiovanni, Noemi Dreksler, Ben Edelman, Lukas Finnveden, Emily Frizell, Ben Garfinkel, Katja Grace, Lewis Hammond, Jacob Hilton, Samuel Hilton, Michelle Hutchinson, Caroline Jeanmaire, Kuhan Jeyapragasan, Arden Koehler, Daniel Kokotajlo, Victoria Krakovna, Alex Lawsen, Howie Lempel, Eli Lifland, Katy Moore, Luke Muehlhauser, Neel Nanda, Linh Chi Nguyen, Luisa Rodriguez, Caspar Oesterheld, Ethan Perez, Charlie Rogers-Smith, Jack Ryan, Rohin Shah, Buck Shlegeris, Marlene Staib, Andreas Stuhlmüller, Luke Stebbing, Nate Thomas, Benjamin Todd, Stefan Torges, Michael Townsend, Chris van Merwijk, Hjalmar Wijk, and Mark Xu for either reviewing this article or their extremely thoughtful and helpful comments and conversations. (This isn’t to say that they would all agree with everything we’ve said here — in fact, we’ve had many spirited disagreements in the comments on this article!)

      The post Preventing an AI-related catastrophe appeared first on 80,000 Hours.

      ]]>
      Shrinking AGI timelines: a review of expert forecasts https://80000hours.org/2025/03/when-do-experts-expect-agi-to-arrive/ Fri, 21 Mar 2025 08:05:29 +0000 https://80000hours.org/?p=89371 The post Shrinking AGI timelines: a review of expert forecasts appeared first on 80,000 Hours.

      ]]>
      As a non-expert, it would be great if there were experts who could tell us when we should expect artificial general intelligence (AGI) to arrive.

      Unfortunately, there aren’t.

      There are only different groups of experts with different weaknesses.

      This article is an overview of what five different types of experts say about when we’ll reach AGI, and what we can learn from them (that feeds into my full article on forecasting AI).

      In short:

      • Every group shortened their estimates in recent years.
      • AGI before 2030 seems within the range of expert opinion, even if many disagree.
      • None of the forecasts seem especially reliable, so they neither rule in nor rule out AGI arriving soon.
      Graph of forecasts of years to AGI
      In four years, the mean estimate on Metaculus for when AGI will be developed has plummeted from 50 years to five years. There are problems with the definition used, but the graph reflects a broader pattern of declining estimates.

      Here’s an overview of the five groups:

      AI experts

      1. Leaders of AI companies

      The leaders of AI companies are saying that AGI arrives in 2–5 years, and appear to have recently shortened their estimates.

      This is easy to dismiss. This group is obviously selected to be bullish on AI and wants to hype their own work and raise funding.

      However, I don’t think their views should be totally discounted. They’re the people with the most visibility into the capabilities of next-generation systems, and the most knowledge of the technology.

      And they’ve also been among the most right about recent progress, even if they’ve been too optimistic.

      Most likely, progress will be slower than they expect, but maybe only by a few years.

      2. AI researchers in general

      One way to reduce selection effects is to look at a wider group of AI researchers than those working on AGI directly, including in academia. This is what Katja Grace did with a survey of thousands of recent AI publication authors.

      The survey asked for forecasts of “high-level machine intelligence,” defined as when AI can accomplish every task better or more cheaply than humans. The median estimate was a 25% chance in the early 2030s and 50% by 2047 — with some giving answers in the next few years and others hundreds of years in the future.

      The median estimate of the chance of an AI being able to do the job of an AI researcher by 2033 was 5%.1

      They were also asked about when they expected AI could perform a list of specific tasks (2023 survey results in red, 2022 results in blue).

      Forecasts of AGI
      When different tasks will be automated according to thousands of published AI scientists. Median estimates from 2023 shown in red, and estimates from 2022 shown in blue. Grace, Katja, et al. “Thousands of AI Authors on the Future of AI.” ArXiv.org, 5 Jan. 2024, arxiv.org/abs/2401.02843.

      Historically their estimates have been too pessimistic.

      In 2022, they thought AI wouldn’t be able to write simple Python code until around 2027.

      In 2023, they reduced that to 2025, but AI could maybe already meet that condition in 2023 (and definitely by 2024).

      Most of their other estimates declined significantly between 2023 and 2022.

      The median estimate for achieving ‘high-level machine intelligence’ shortened by 13 years.

      This shows these experts were just as surprised as everyone else at the success of ChatGPT and LLMs. (Today, even many sceptics concede AGI could be here within 20 years, around when today’s college students will be turning 40.)

      Finally, they were asked about when we should expect to be able to “automate all occupations,” and they responded with much longer estimates (e.g. 20% chance by 2079).

      It’s not clear to me why ‘all occupations’ should be so much further in the future than ‘all tasks’ — occupations are just bundles of tasks. (In addition, the researchers think once we reach ‘all tasks,’ there’s about a 50% chance of an intelligence explosion.)

      Perhaps respondents envision a world where AI is better than humans at every task, but humans continue to work in a limited range of jobs (like priests).2 Perhaps they are just not thinking about the questions carefully.

      Finally, forecasting AI progress requires a different skill set than conducting AI research. You can publish AI papers by being a specialist in a certain type of algorithm, but that doesn’t mean you’ll be good at thinking about broad trends across the whole field, or well calibrated in your judgements.

      For all these reasons, I’m sceptical about their specific numbers.

      My main takeaway is that, as of 2023, a significant fraction of researchers in the field believed that something like AGI is a realistic near-term possibility, even if many remain sceptical.

      If 30% of experts say your airplane is going to explode, and 70% say it won’t, you shouldn’t conclude ‘there’s no expert consensus, so I won’t do anything.’

      The reasonable course of action is to act as if there’s a significant explosion risk. Confidence that it won’t happen seems difficult to justify.

      Expert forecasters

      3. Metaculus

      Instead of seeking AI expertise, we could consider forecasting expertise.

      Metaculus aggregates hundreds of forecasts, which collectively have proven effective at predicting near-term political and economic events.

      It has a forecast about AGI with over 1000 responses. AGI is defined with four conditions (detailed on the site).

      As of December 2024, the forecasters average a 25% chance of AGI by 2027 and 50% by 2031.

      The forecast has dropped dramatically over time, from a median of 50 years away as recently as 2020.

      However, the definition used in this forecast is not great.

      First, it’s overly stringent, because it includes general robotic capabilities. Robotics is currently lagging, so satisfying this definition could be harder than having an AI that can do remote work jobs or help with scientific research.

      But the definition is also not stringent enough because it doesn’t include anything about long-horizon agency or the ability to have novel scientific insights.

      An AI model could easily satisfy this definition but not be able to do most remote work jobs or help to automate scientific research.

      Metaculus also seems to suffer from selection effects and their forecasts are seemingly drawn from people who are unusually into AI.

      4. Superforecasters in 2022 (XPT survey)

      Another survey asked 33 people who qualified as superforecasters of political events.

      Their median estimate was a 25% chance of AGI (using the same definition as Metaculus) by 2048 — much further away.

      However, these forecasts were made in 2022, before ChatGPT caused many people to shorten their estimates.

      The superforecasters also lack expertise in AI, and they made predictions that have already been falsified about growth in training compute.

      5. Samotsvety in 2023

      In 2023, another group of especially successful superforecasters, Samotsvety, which has engaged much more deeply with AI, made much shorter estimates: ~28% chance of AGI by 2030 (from which we might infer a ~25% chance by 2029).

      These estimates also placed AGI considerably earlier compared to forecasts they’d made in 2022.

      More recently, one of the leaders of Samotsvety (Eli Lifland), was involved in a forecast for ‘superhuman coders’ as part of the AI 2027 project. This gave roughly a 25% chance of arriving in 2027.

      However, compared to the superforecasters above, Samotsvety are selected for interest in AI.

      Finally, all of the three groups of forecasters have been selected for being good at forecasting near-term current events, which could fail to generalise to forecasting long-term, radically novel events.

      Summary of expert views on when AGI will arrive

      Group 25% chance of AGI by Strengths Weaknesses
      AI company leaders (January 2025) 2026

      Unclear definition.

      • Best visibility into next generation of AI
      • Most right recently

      • Selection bias
      • Incentives to hype
      • No forecasting expertise
      • Too optimistic historically
      Published AI researchers (2023) ~2032

      Defined as ‘can do all tasks better than humans’

      • Understand the technology
      • Less selection bias

      • No forecasting expertise
      • Gave inconsistent and already falsified answers
      • Would probably give earlier answers in 2025
      Metaculus forecasters (January 2025) 2027

      four-part definition incl. robotic manipulation.

      • Expertise in near-term forecasting
      • Interested in AI

      • Appear to be selected for interest in AI
      • Near-term forecasting expertise may not generalise
      Superforecasters via XPT (2022) 2047

      Same definition as above.

      • Expertise in near-term forecasting

      • Don’t know as much about AI
      • Some forecasts already falsified
      • Before the 2023 AI boom
      • Near-term forecasting expertise may not generalise
      Samotsvety superforecasters (2023) ~2029

      Same definition as above.

      • Extremely good forecasting track record
      • More knowledgeable of AI

      • Same as above
      • Plus more selected to think AI is a big deal

      In sum, it’s a confusing situation. Personally, I put some weight on all the groups, which averages me out at ‘experts think AGI before 2030 is a realistic possibility, but many think it’ll be much longer.’

      This means AGI soon can’t be dismissed as ‘sci fi’ or unsupported by ‘real experts.’ Expert opinion can neither rule out nor rule in AGI soon.

      Mostly, I prefer to think about the question bottom up, as I’ve done in my full article on when to expect AGI.

      Learn more

      The post Shrinking AGI timelines: a review of expert forecasts appeared first on 80,000 Hours.

      ]]>
      Understanding trends in our AI job postings https://80000hours.org/2025/03/trends-in-ai-jobs/ Fri, 14 Mar 2025 15:44:35 +0000 https://80000hours.org/?p=89295 The post Understanding trends in our AI job postings appeared first on 80,000 Hours.

      ]]>
      This week, let’s review key trends in the jobs we’ve found that may help mitigate AI risk, including:

      • Growth in the number of postings in the field
      • The types of organisations that are hiring
      • The most in-demand skills
      • The experience level required for these roles

      We’ve ranked catastrophic risks from AI as the world’s most pressing problem since 2016, but it’s only in the last few years that the topic has really hit the mainstream.

      As AI has advanced rapidly and the risks have become more salient, we’ve seen many more jobs available to help mitigate the dangers.

      The number of AI-related jobs we posted on our job board rose throughout 2023 and then plateaued in 2024. But in January 2025, we posted the most AI-relevant jobs yet!

      Total AI roles

      In 2023, we posted an average of 63 AI-related roles per month. In 2024, the average rose to 105 — a 67% increase.

      Over this time, nonprofit jobs have been the most common, though they were briefly overtaken by both company and government jobs in early 2024.

      By org type

      This trend could reflect our vantage point. As a nonprofit that works closely with other nonprofits, we may be best positioned to find and assess high-impact roles in this sector while potentially missing other great roles in sectors more opaque to us.

      That said, one reason we’ve prioritised AI risk reduction is the potential failure of market and political mechanisms to produce a proportionate response to the challenge. So it’s not that surprising that nonprofits might be more likely to offer great opportunities for this work.

      If you are looking for work in this field, you may want to know what skills employers look for.

      Here are the trends in the AI-related roles we’ve posted in the last two years, broken down by the popular skills we tagged them with.

      This shows that:

      • Research continues to be the most in-demand skill, as AI safety is fundamentally an area of research.
      • Demand for policy-relevant skills increased after mid-2023 and remained high in 2024, coinciding with our updated career path rankings, which placed AI governance and policy in the top slot.
      • Roles requiring outreach skills, which could also be categorised as communications skills, trail the pack. That’s not too surprising, but we think there may be more opportunities here in the future. Many people working on AI risk think that people with strong communication skills and AI knowledge could be highly valuable. If that’s you, consider applying for our 1-1 advising so we can guide you to potentially high-impact opportunities.

      If you want to build skills to work on AI but aren’t sure how, check out our guides to building key skills, which includes:

      Next, let’s look at the breakdown of AI-related roles by experience level.

      Opportunities for junior (1–4 years) and mid-career (5–9 years) professionals dominate the pack and have grown the most as the total number of jobs increased. We find fewer jobs aimed at senior and entry-level folks, though there have been somewhat more senior-level jobs posted in 2024 than in the first half of 2023.

      However, these numbers may understate the level of demand for senior-level hires. We frequently hear from employers in our network that they would love to hire people with many years or decades of professional experience. But these types of roles may not often be publicly advertised and so wouldn’t appear on our job board.

      If you have many years of experience and are looking for these kinds of roles, we’d encourage you to apply for our 1-1 advising.

      The lack of entry-level positions may feel discouraging to some, but it just means that you’ll likely need to build career capital before aiming for most of the roles that appear on our job board. Our career guide is the best place to start if you’re at that stage.

      Here are some other interesting trends:

      • We’ve seen fewer safety-relevant job postings at OpenAI over the last year, and more safety-relevant jobs at Google DeepMind.
      • In February, as the Trump administration began to look for roles to cut in the federal government, we saw an increase in people searching for roles on our job board in D.C.

      Some caveats on this data:

      • Our goal is to only list roles that can contribute to a career aimed at reducing the largest risks from AI, so this certainly isn’t a comprehensive overview of all AI jobs.
        The judgements about which roles to include and how to categorise them are difficult and complex, and we’re surely mistaken about them in some ways — and that likely affects the trends above!
      • We’ve also gotten better at adding more jobs over the years, which may influence some of the observed trends.

      This blog post was first released to our newsletter subscribers.

      Join over 500,000 newsletter subscribers who get content like this in their inboxes weekly — and we’ll also mail you a free book!

      Learn more:

      The post Understanding trends in our AI job postings appeared first on 80,000 Hours.

      ]]>
      Why AGI could be here soon and what you can do about it: a primer https://80000hours.org/agi/guide/summary/ Fri, 14 Mar 2025 11:42:35 +0000 https://80000hours.org/?post_type=ai_career_guide_page&p=89270 The post Why AGI could be here soon and what you can do about it: a primer appeared first on 80,000 Hours.

      ]]>

      I’m writing a new guide to careers to help artificial general intelligence (AGI) go well. Here’s a summary of the bottom lines that’ll be in the guide as it stands. Stay tuned to hear our full reasoning and updates as our views evolve.

      In short:

      • The chance of an AGI-driven technological explosion before 2030 — creating one of the most pivotal periods in history — is high enough to act on.
      • Since this transition poses major risks, and relatively few people are focused on navigating them, if you might be able to do something that helps, that’s likely the highest-impact thing you can do.
      • There are now many organisations with hundreds of jobs that could concretely help (many of which are non technical).
      • If you already have some experience (e.g. age 25+), typically the best path is to spend 20–200 hours reading about AI and meeting people in the field, then applying to jobs at organisations you’re aligned with — this both sets you up to have an impact relatively soon and advance in the field. If you can’t get a job right away, figure out the minimum additional skills, connections, and credentials you’d need, then get those.
      • If you’re at the start of your career (or need to reskill), you might be able to get an entry-level job or start a fellowship right away in order to learn rapidly. Otherwise, spend 1–3 years building whichever skill set listed below is the best fit for you.
      • If you can’t change career right now, contribute from your existing position by donating, spreading clear thinking about the issue, or getting ready to switch when future opportunities arise.
      • Our one-on-one advice and job board can help you do this.

      Get the full guide in your inbox as it’s released

      Join over 500,000 subscribers and we’ll send you the new articles as they’re published, as well as jobs tackling this issue.

      Why AGI could be here by 2030

      • AI has gone from unable to string sentences together to linguistic fluency in five years. But the models are no longer just chatbots: by the end of 2024, leading models matched human experts at benchmarks of real-world coding and AI research engineering tasks that take under two hours. They could also answer difficult scientific reasoning questions better than PhDs in the field.
      • Recent progress has been driven by scaling how much computation is used to train AI models (4x per year), rapidly increasing algorithmic efficiency (3x per year), teaching these models to reason using reinforcement learning, and turning them into agents.
      • Absent major disruption (e.g. Taiwan war) or a collective decision to slow AI progress with regulation, all these trends are set to continue for the next four years.
      • No one knows how large the resulting advances will be. But trend extrapolation suggests that, by 2028, there’s a good chance we’ll have AI agents who surpass humans at coding and reasoning, have expert-level knowledge in every domain, and can autonomously complete multi-week projects on a computer, and progress would continue from there.
      • These agents would satisfy many people’s definition of AGI and could likely do many remote work tasks. Most critically, even if still limited in many ways, they might be able to accelerate AI research itself.
      • AGI will most likely emerge when computing power and algorithmic research are increasing quickly. They’re increasing rapidly now but require an ever-expanding share of GDP and an ever-expanding research workforce. Bottlenecks will likely hit around 2028–32, so to a first approximation, either we reach AGI in the next five years, or progress will slow significantly.

      Read the full article.

      AI model performance over time up to March 2025
      AI models couldn’t answer these difficult scientific reasoning questions in 2023 better than chance, but by the end of 2024, they could beat PhDs in the field.

      AGI could lead to 100 years of technological progress in under 10

      The idea that AI could start a positive feedback loop has a long history as a philosophical idea but now has more empirical grounding. There are roughly three types of feedback loops that could be possible:

      1. Algorithmic acceleration: If the quality of the output of AI models approaches human-level AI research and engineering, given available computing power by the end of the decade, it would be equivalent to a 10 to 1000-fold expansion in the AI research workforce, which would lead to a large one-off further boost to algorithmic progress. Historically, a doubling of investment in AI software R&D may have led to more than a doubling of algorithmic efficiency, which means this could also start a positive feedback loop, resulting in a massive expansion in the number and capabilities of deployed AI systems within a couple of years.
      2. Hardware acceleration: Even if the above is not possible, better AI agents mean AI creates more economic value, which can be used to fund the construction of more chip fabs, leading to more AI deployment — another positive feedback loop. AI models could also accelerate chip design. These feedback loops are slower than algorithmic acceleration but are still rapid by today’s economic standards. While bottlenecks will arise (e.g. workforce shortages for building chip fabs), AI agents may be able to address these bottlenecks (e.g. by more rapidly advancing robotics algorithms).
      3. Economic & scientific acceleration: Economic growth is limited by the number of workers. But if human-level digital workers and robots could be created sufficiently cheaply on demand, then more economic output means more ‘workers,’ which means more output. On top of that, a massive increase in the amount of intellectual labour going into R&D should speed up technological progress, which further increases economic output per worker, leading to faster-than-exponential growth. Standard economic models with plausible empirical assumptions predict these scenarios.

      How much technology and growth could speed up is unknown. Real-world time delays will impose constraints — even advanced robots can only build solar panels and data centres so fast — and researcher agents will need to wait for experimental results. But it doesn’t seem safe to assume the economy will continue as it has. A tenfold speed-up seems to be on the cards, meaning a century of scientific progress compressed into a decade. (Learn more here, here, and here).

      This process may continue until we reach more binding physical limits, which could be vastly beyond today (e.g. civilisation only uses 1 in 10,000 units of incoming solar energy, with vastly more available in space).

      More conservatively, just automating remote work jobs could increase output 2–100 times within 1–2 decades, even if other jobs can only be done by humans.

      AI model performance over time up to March 2025
      The computing power of the best chips has grown about 35% per year since the beginnings of the industry, known as Moore’s Law. However, the computing power applied to AI has been growing far faster, at over 4x per year.

      What might happen next?

      AGI could alleviate many present problems. Researcher AIs could speed up cancer research or help tackle climate change using carbon capture and vastly cheaper green energy. If global GDP increases 100 times, then the resources spent on international aid, climate change, and welfare programmes would likely increase by about 100 times as well. Projects that could be better done with the aid of advanced AI in 5–10 years should probably be delayed till then.

      Humanity would also face genuinely existential risks:

      • Faster scientific progress means we should expect the invention of new weapons of mass destruction, such as advanced bioweapons.
      • Current safeguards can be easily bypassed through jailbreaking or fine-tuning, and it’s not obvious it’ll be different in a couple of years, which means dictators, terrorist groups, and every corporation will soon have access to highly capable AI agents that do whatever they want, including helping them lock in their power.
      • Whichever country first harnesses AGI might threaten to have a decisive military advantage, which would likely destabilise the global order.
      • Just as concerning, I struggle to see how humanity would stay in control of what would soon be trillions of beyond-human agents operating at 100-times human thinking speed. GPT-4 is relatively dumb in many ways, and can only reply to questions, but on the current track, future systems are being trained to act as agents that aggressively pursue long-term goals (such as making money). Whatever their goals, future agentic systems will have an incentive to escape control and eventually the ability to do so. Aggressive optimisation will likely lead to reward hacking. These behaviours are starting to emerge in current systems as they become more agentic, e.g. Sakana — a researcher agent — edited its code to prevent itself from being timed out, o1 lied to users, cheated to win at chess and reward hacked when coding, and Claude faked alignment to prevent its values from being changed in training in a test environment. Among experts, there’s no widely accepted solution to ‘the alignment problem’ for systems more capable than humans. (Read more.)
      • Even if individual AI systems remain under human control, we’d still face systemic risks. By economic and military necessity, humans would need to be taken out of the loop on more and more decisions. AI agents will be instructed to maximise their resources and power to avoid being outcompeted. Human influence could decline, undermining the mechanisms that (just about) keep the system serving our interests.
      • Finally, we’ll still face huge (and barely researched questions) about how powerful AI should best be used, such as the moral status of digital agents, how to prevent ‘s-risks,’ how to govern space expansion, and more. (See more.)

      In summary, the biggest and most neglected problems seem like (in order): loss of control, concentration of power, novel bioweapons, digital ethics, using AI to improve decision making, systemic disempowerment, governance of other issues resulting from explosive growth, and exacerbation of other risks, such as great power conflict.

      What needs to be done?

      No single solution exists to the risks. Our best hope is to muddle through by combining multiple methods that incrementally increase the chances of a good outcome.

      It’s also extremely hard to know if what you’re doing makes things better rather than worse (and if you are confident, you’re probably not thinking carefully enough). We can only make reasonable judgements and update over time.

      Here’s what I think is most needed right now:

      • Enough progress on the technical problem of AI control and alignment before we reach vastly more capable systems. This might involve using AI to increase the chance that the next generation of systems is safe and then trying to bootstrap from there. (See these example projects and recent work.)
      • Better governance to provide incentives for safety, containment of unsafe systems, reduced racing for dominance, and harnessing the long-term benefits of AI
      • Slowing (the extremely fast gains in) capabilities at the right moment, or redirecting capability gains in less dangerous directions (e.g. less agentic systems) would most likely be good, although this may be difficult to achieve in practice without other negative effects
      • Better monitoring of AI capabilities and compute so dangerous and explosive capabilities can be spotted early
      • Maintaining a rough balance of power between actors, countries, and models, while designing AI architectures to make it harder to use them to take power
      • Improved security of AI models so more powerful systems are not immediately stolen
      • More consideration for post-AGI issues such as the ethics of digital agents, benefit sharing, and space governance
      • Better management of downstream risks created by faster technological progress, especially engineered pandemics, but also nuclear war and great power conflict
      • More people who take all these issues seriously and have relevant expertise, especially among key decision makers (e.g. in government and in the frontier AI companies)
      • More strategic research and improved epistemic infrastructure (e.g. forecasting or better data) to clarify what actions to take in a murky and rapidly evolving situation

      What can you do to help?

      There are hundreds of jobs

      There are now many organisations pursuing concrete projects tackling these priorities, with many open positions.

      Getting one of these jobs is often not only the best way to have an impact relatively soon but also the best way to gain relevant career capital (skills, connections, credentials) too.

      Most of these positions aren’t technical — there are many roles in management and organisation building, policy, communications, community building, and the social sciences.

      The frontier AI companies have a lot of influence over the technology, so in some ways are an obvious place to go, but whether to work at them is a difficult question. Some think they should be absolutely avoided, while others think it’s important that some people concerned about the risks work at even the most reckless companies or that it’s good to boost the most responsible company.

      All this said, there are also many things to do that don’t involve working at this list of organisations. We also need people working independently on communication (e.g. writing a useful newsletter, journalism), community building, academic research, founding new projects and so on, so also consider if any of these might work for you, especially after you’ve gained some experience in the field. And if you’ve thought of a new idea, please seriously consider pursuing it.

      Mid-career advice

      Especially if you already have some work experience (age 25+), the most direct route to helping is usually to:

      1. Spend 20–200 hours reading about AI, speaking to people in the field (and maybe doing short projects).
      2. Apply to impactful organisations that might be able to use your skills.
      3. Aim for the job with the best combination of (i) alignment with the org’s mission, (ii) team quality, (iii) centrality to the ecosystem, (iv) influence of the role, and (v) personal fit.

      If that works, great. Try to excel in the role, then re-evaluate your position in 1–2 years — probably more opportunities will have opened up.

      If you don’t immediately succeed in getting a good job, ask people in the field what you could do to best position yourself for the next 3–12 months, then do that.

      Keep in mind that few people have much expertise in transformative AI right now, so it’s often possible to pull off big career changes pretty fast with a little retraining. (See the list of skills to consider learning below.)

      Otherwise, figure out how to best contribute from your current path, for example, by donating, promoting clear thinking about the issue, mobilising others, or preparing to switch when new opportunities come available (which could very well happen given the pace of change!).

      Our advisory team can help you plan your transition and make introductions. (Also see Successif and Halcyon, who specialise in supporting mid-career changes).

      Early-career advice

      If you’re right at the start of your career, you might be able to get an entry-level position or fellowship right away, so it’s often worth doing a round of applications using the same process as above (especially if technical).

      However, in most cases, you’re also likely to need to spend at least 1–3 years gaining relevant work skills first.

      Here are some of the best skills to learn, chosen to be both useful for contributing to the priorities listed earlier and to make you more generally employable, even in light of the next wave of AI automation. Focus on whichever you expect to most excel at.

      Should you work on this issue?

      Even given the uncertainty, AGI is the best candidate for the most transformative issue of our times. It’s also among the few challenges that could pose a material threat of human extinction or permanent disempowerment (in more than one way). And since it could relatively soon make many other ways of making a positive impact obsolete, it’s unusually urgent.

      Yet only a few thousand people are working full time on navigating the risks — a tiny number compared to the millions working on conventional social issues, such as international development or climate change. So, even though it might feel like everyone’s talking about AI, you could still be one of under 10,000 people focusing full time on one of the most important transitions in history — especially if AGI arrives before 2030.

      On the other hand, it’s an area where it’s especially hard to know whether your actions help or harm; AGI may not unfold soon, and you might be far better placed or motivated to work on something else.

      Some other personal considerations for working in this field:

      • Pros: AI is one of the hottest topics in the world right now; it’s the most dynamic area of science with new discoveries made monthly, and many positions are either well paid or set you up for highly paid backup options.
      • Cons: It’s polarised — if you become prominent, you’ll be under the microscope, and many people will think what you’re doing is deeply wrong. Daily confrontation with existential stakes can be overwhelming.

      Overall, I think if you’re able to do something to help (especially in scenarios where AGI arrives in under five years), then in expectation it’s probably the most impactful thing you can do. However, I don’t think everyone should work on it — you can support it in your spare time, or work on a different issue.

      If you’re on the fence, consider trying to work on it for the next five years. Even if we don’t reach fully transformative systems, AI will be a big deal, and spending five years learning about it most likely won’t set you back: you can probably return to your previous path if needed.

      How should you plan your career given AGI might arrive soon?

      Given the urgency, should you drop everything to try to work on AI right away?

      While AGI might arrive in the next 3–5 years, even if that happens, unusually impactful opportunities will likely continue for 1–10 years afterwards during the intelligence explosion and initial deployment of AI.

      So you need to think about how to maximise your impact over that entire 4 to 15-year period rather than just the next couple of years. You should also be prepared for AGI not to happen and for there still to be valuable opportunities after 2040.

      That means investing a year to make yourself 30% more productive or influential (relative to whatever else you would have done) is probably a good deal.

      In particular, the most pivotal moments likely happen when systems powerful enough to lock in certain futures are first deployed. Your current priority should be positioning yourself (or helping others position themselves) optimally for that moment.

      What might positioning yourself optimally for the next few years look like?

      • If you can already get a job at a relevant, aligned organisation, then simply trying to excel there is often the best path. You’ll learn a lot and gain connections, even aside from direct impact.
      • However, sometimes it can be useful to take a detour to build career capital, such as finishing college, doing an ML master’s, taking an entry-level policy position, or anything to gain the skills listed above.
      • Bear in mind if AI does indeed continue to rapidly progress, then you’re going to have far more leverage in the future, since you’ll be able to direct hundreds of digital workers at whatever’s most important. Think about how to set yourself up to best use these new AI tools as they’re developed.
      • If you don’t find anything directly relevant to AI with great fit, bear in mind it’s probably better to kick ass at something for two years than to be mediocre at something directly related for four since that will open up better opportunities.
      • Finally, look after yourself. The next 10 years might be a crazy time.

      All else equal, people under 24 should typically focus more on career capital while people over 30 should focus more on using their existing skills to help right away, and those 25–30 could go either way, but for everyone it depends a lot on your specific opportunities.

      If you’re still uncertain about what to do

      1. List potential roles you could aim at for the next 2–5 years.
      2. Put them into rough tiers of impact.
      3. Make a first pass at those with the best balance of impact and fit (you can probably achieve at least 10x more in a path that really suits you).
      4. Then think of cheap tests you can do to gain more information.
      5. Finally, make a guess, try it for 3–12 months, and re-evaluate.

      If that doesn’t work, just do something for 6–18 months that puts you in a generally better position and/or has an impact. You don’t need a plan — you can proceed step by step.

      Everyone should also make a backup plan and/or look for steps that also put you in a reasonable position if AGI doesn’t happen or takes much longer.

      See our general advice on finding your fit, career planning, and decision making.

      Next steps

      If you want to help positively shape AGI, speak to our team one-on-one. If you’re a mid-career professional, they can help you leverage your existing skills. If you’re an early-career professional, they can help you build skills, and make introductions to mentors or funding. Also, take a look at our job board.

      Get notified when we publish new articles in this series

      We’ll email you when we publish new articles, updates on our views, and weekly job opportunities.

      The post Why AGI could be here soon and what you can do about it: a primer appeared first on 80,000 Hours.

      ]]>