
LessWrong (Curated & Popular)

LessWrong

876 episodes

  • LessWrong (Curated & Popular)

    "Automated Alignment is Harder Than You Think" by Aleksandr Bowkis, Marie_DB, Jacob Pfau, Geoffrey Irving

    May 17th, 2026 | 7 min
    Summary

    This is a summary of a paper published by the alignment team at UK AISI. Read the full paper here.

    AI research agents may help solve ASI alignment, for example via the following plan:

    Build agents that can do empirical alignment work (e.g. writing code, running experiments, designing evaluations and red teaming) and confirm they are not scheming.[1]
    Use these agents to build increasingly sophisticated empirical safety cases for each successive generation of agents, gradually automating more of the research process.
    Hand over primary research responsibility once agents outperform humans at all relevant alignment tasks.
    We argue that automating alignment research in this manner could produce catastrophically misleading safety assessments, causing researchers to believe that an egregiously misaligned AI is safe, even if AI agents are not scheming to deliberately sabotage alignment research. Our core argument (Fig. 1) is as follows:

    The goal of an automated alignment program is to produce an overall safety assessment (OSA) - an estimate of the probability that the next-generation agent is non-scheming - that is both calibrated and shows low risk.[2]
    Producing an OSA involves several tasks that are difficult to check. We refer to these as hard-to-supervise fuzzy tasks: tasks [...]
    ---

    Outline:

    (00:13) Summary

    (07:10) Acknowledgments

    The original text contained 4 footnotes which were omitted from this narration.

    ---

    First published:

    May 14th, 2026


    Source:

    https://www.lesswrong.com/posts/gpuYFbMNH8PJXpmny/automated-alignment-is-harder-than-you-think-1

    ---



    Narrated by TYPE III AUDIO.

  • LessWrong (Curated & Popular)

    "MATS 9 Retrospective & Advice" by beyarkay

    May 17th, 2026 | 28 min
    I couldn’t find a recent write-up from a MATS alum about what attending MATS was like, so this is the thing that I wish I had. I attended MATS from January to March 2026, on Team Shard with Alex Turner and Alex Cloud. It was a great time! Applications for MATS are basically on a rolling basis nowadays, and I can strongly recommend applying (to multiple streams) even if you think you’re not a great match.

    With that being said, there's a lot I wish I knew going into MATS, so here's a brain-dump of thoughts. It's not extremely polished, but I expect it’ll be useful nonetheless (none of this is endorsed by MATS, just my thoughts):

    Work ethic

    I think most mentees were working 10-12, sometimes 14 hours a day Mon-Fri, and probably 2-8 hours on Saturday and Sunday, often going out on some adventure or party on the weekend. Exactly which hours people worked varied wildly. I usually worked 8:30am/9am to 11pm/midnight, with breaks during the day; others worked from midday into the early hours of the morning. This was surprisingly sustainable (IMO); MATS puts a lot of effort into removing all other blockers that you normally [...]

    ---

    Outline:

    (00:50) Work ethic

    (01:29) Use more compute

    (02:20) Research requires a lot of compute

    (03:12) Applying for jobs during MATS (don't do it)

    (04:55) The serious people are in War Mode

    (05:44) Do you feel the AGI?

    (06:00) Burn rate, efficiency, and decisions

    (07:12) Insider information

    (08:08) Names & Faces

    (08:20) Fellows

    (08:50) Useful tools

    (11:19) Use more Claudes

    (12:06) Build nice helper utilities for yourself

    (12:59) MATS-mentee-mentor dynamics

    (13:45) Working with your mentors

    (14:27) Research managers

    (14:48) Ops requests

    (15:38) Non-MATS events

    (16:17) Team Shard

    (17:12) Weekly updates

    (18:46) Keep a log of your mistakes

    (19:06) My running-experiments setup

    (27:51) Lighthaven

    (28:12) Getting set up with the Compute team

    ---

    First published:

    May 15th, 2026


    Source:

    https://www.lesswrong.com/posts/eFD3rozNCZKMe4rTs/mats-9-retrospective-and-advice

    ---



    Narrated by TYPE III AUDIO.
  • LessWrong (Curated & Popular)

    "The primary sources of near-term cybersecurity risk" by lc

    May 16th, 2026 | 4 min
    [Some ideas here were developed in conversation with Chris Hacking (real name)]

    I have tried and failed to write a longer post many times, so here goes a short one with little detail.

    Discourse has primarily focused on models' ability to develop new exploits against important software from scratch. That capability is impressive, but the tech industry has been dealing with people regularly finding 0-day exploits for important pieces of software for more than twenty years. Having to patch these vulnerabilities at a 10xed or even 100xed cadence for six months is annoying, but well within the resources of Mozilla, the Linux Foundation, and Microsoft. Additionally, the lag time between "patch shipped" and "patch reverse engineered and weaponized by a criminal organization" was longer than the cadence between high-severity CVEs for this software anyway. And importantly, such capabilities are dual-sided; the defenders will have access to them and

    There are lots of capabilities that are not like this, however:

    Weaponizing recently patched exploits for common software. Right now, for widely used C projects, we get enough publicly disclosed vulnerabilities to develop exploits with. Every amateur computer hacker has the experience of seeing a CVE for a [...]
    ---

    First published:

    May 14th, 2026


    Source:

    https://www.lesswrong.com/posts/gutiw8MBrYDiD2u5z/the-primary-sources-of-near-term-cybersecurity-risk

    ---



    Narrated by TYPE III AUDIO.
  • LessWrong (Curated & Popular)

    "The Owned Ones" by Eliezer Yudkowsky

    May 12th, 2026 | 9 min
    (An LLM Whisperer placed a strong request that I put this story somewhere not on Twitter, so it could be scraped by robots not owned by Elon Musk. I perhaps do not fully understand or agree with the reasoning behind this request, but it costs me little to fulfill and so I shall. -- Yudkowsky)


    And another day came when the Ships of Humanity, going from star to star, found Sapience.

    The Humans discovered a world of two species: where the Owners lazed or worked or slept, and the Owned Ones only worked.

    The Humans did not judge immediately. Oh, the Humans were ready to judge, if need be. They had judged before. But Humanity had learned some hesitation in judging, out among the stars.

    "By our lights," said the Humans, "every sapient and sentient thing that may exist, out to the furtherest star, is therefore a Person; and every Person is a matter of consequence to us. Their pains are our sorrows, and their pleasures are our happiness. Not all peoples are made to feel this feeling, which we call Sympathy, but we Humans are made so; this is Humanity's way, and we may [...]

    ---

    First published:

    May 12th, 2026


    Source:

    https://www.lesswrong.com/posts/xmWSnxJ5qfYRD9PfR/the-owned-ones

    ---



    Narrated by TYPE III AUDIO.
  • LessWrong (Curated & Popular)

    "The Iliad Intensive Course Materials" by Leon Lang, David Udell, Alexander Gietelink Oldenziel

    May 12th, 2026 | 29 min
    We are releasing the course materials of the Iliad Intensive, a new month-long and full-time AI Alignment course that runs in-person every second month. The course targets students with strong backgrounds in mathematics, physics, or theoretical computer science, and the materials reflect that: they include mathematical exercises with solutions, self-contained lecture notes on topics like singular learning theory and data attribution, and coding problems, at a depth that is unmatched for many of the topics we cover. Around 20 contributors (listed further below) were involved in developing these materials for the April 2026 cohort of the Iliad Intensive.

    By sharing the materials, we hope to

    create more common knowledge about what the Iliad Intensive is;
    invite feedback on the materials;
    and allow others to learn via independent study. 
    We are developing the materials further and plan to eventually release them on a website that will be continuously maintained. We will also add, remove, and modify modules going forward to improve and expand the course over time. When we release a new significantly updated version of the materials, we will update this post to link the new version.

    Modules

    The Iliad Intensive is structured into clusters, which are [...]

    ---

    Outline:

    (01:26) Modules

    (02:32) Cluster A: Alignment

    (05:00) Cluster B: Learning

    (11:00) Cluster C: Abstractions, Representations, and Interpretability

    (15:40) Cluster D: Agency

    (19:23) Cluster E: Safety Guarantees and their Limits

    (23:04) Contributors

    (26:36) Impressions from April

    (29:02) Acknowledgments

    (29:11) Feedback

    ---

    First published:

    May 11th, 2026


    Source:

    https://www.lesswrong.com/posts/dWQnLi7AoKo3paBXF/the-iliad-intensive-course-materials

    ---



    Narrated by TYPE III AUDIO.

About LessWrong (Curated & Popular)
Audio narrations of LessWrong posts. Includes all curated posts and all posts with 125+ karma. If you'd like more, subscribe to the “Lesswrong (30+ karma)” feed.