
LessWrong (Curated & Popular)

LessWrong

892 episodes

  • LessWrong (Curated & Popular)

    "PSA: Almost nobody is working on alignment" by Chi Nguyen, peterbarnett

    June 12, 2026 | 1 min
    People often assume that a large fraction of the AI safety community works on alignment. As far as we're aware, this is not true. Most people are not working on making sure superintelligent AIs are aligned with human values or follow human instructions.

    Currently, the people who work on alignment are roughly:

    • The Alignment Research Center, which works on a research bet by Paul Christiano
    • Probably Sequent, which was announced just yesterday
    • Some scattered people who work at universities or independently, some of whom hang around Berkeley

    A lot of the remainder of the AI safety community does indirect work like capability evaluations, risk assessments, control, policy, AI science, understanding misalignment (which maybe should partially count as alignment work), demos and so on.

    Some production alignment work (i.e., making current models behave well) might help with more ambitious alignment, too (e.g., some CoT monitoring). Many people also work on aligning current/next-generation models so that these models help with aligning future models, and hope this scales to superintelligence.

    We are not necessarily saying this is bad and that people are making a big mistake (e.g., neither of us works on alignment) but it's a notable fact that seems good to [...]

    ---

    First published:

    June 12th, 2026


    Source:

    https://www.lesswrong.com/posts/kJo2qsEdib8RZLvW6/psa-almost-nobody-is-working-on-alignment

    ---



    Narrated by TYPE III AUDIO.
  • LessWrong (Curated & Popular)

    "Estimating No-CoT Task-Completion Time Horizons of Frontier AI Models" by Anders Cairns Woodruff, Francis Rhys Ward, Dewi Gould, Rauno Arike, Jason R Brown, Jo Jiao, wlanderson, ariana_azarbal, harrymayne, Patrick Leask

    June 11, 2026 | 10 min
    (see full author list at the end)

    About a year ago, METR showed that the length of tasks frontier models can reliably complete doubles every few months. A related safety-relevant question is this: what length of tasks can models complete without any chain of thought (CoT)?

    If models can do extensive reasoning without outputting any CoT, it would have implications for safety. Developers and deployment-time monitors couldn’t easily understand models’ motivations and catch dangerous planning. Models that reason substantially without a CoT might also drift further from human patterns of thought, since their reasoning is no longer constrained by text in the pretraining prior. As a result, they would be harder to understand and might be more likely to scheme.

    Extending Ryan Greenblatt's research, we investigate this by measuring models' ability to complete tasks without any CoT on a suite of 43 benchmarks spanning different domains. We compare AI reasoning ability to that of humans using the estimated 50% time horizon (TH): the typical time a human takes to perform a task that the LLM completes with a 50% success rate. We find that frontier models like GPT-5.5 answer questions that take humans roughly three minutes with 50% reliability, and [...]
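
    The 50% time horizon described above can be illustrated with a small sketch. It assumes a METR-style approach: fit a logistic curve of success probability against log2 of human completion time, then read off the time at which the fitted probability crosses 50%. The helper names (`fit_logistic`, `time_horizon_minutes`) and the toy data are hypothetical, and the post's actual estimation procedure may differ.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, steps=50):
    """Fit P(success) = sigmoid(a * x + b) with Newton's method.

    Hypothetical helper for illustration; assumes the data are not
    perfectly separable, so a finite maximum-likelihood fit exists.
    """
    a, b = 0.0, 0.0
    for _ in range(steps):
        ga = gb = haa = hab = hbb = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(a * x + b)
            w = p * (1.0 - p)
            ga += (p - y) * x      # gradient w.r.t. a
            gb += (p - y)          # gradient w.r.t. b
            haa += w * x * x       # Hessian entries
            hab += w * x
            hbb += w
        det = haa * hbb - hab * hab
        if abs(det) < 1e-12:
            break
        # Newton update: theta -= H^{-1} g for the 2x2 system.
        a -= (hbb * ga - hab * gb) / det
        b -= (haa * gb - hab * ga) / det
    return a, b


def time_horizon_minutes(tasks):
    """tasks: list of (human_minutes, succeeded) pairs.

    Returns the human task length (in minutes) at which the fitted
    success probability equals 50%.
    """
    xs = [math.log2(minutes) for minutes, _ in tasks]
    ys = [1.0 if ok else 0.0 for _, ok in tasks]
    a, b = fit_logistic(xs, ys)
    if a >= 0:
        raise ValueError("success does not fall with task length; horizon undefined")
    return 2.0 ** (-b / a)


# Toy data, made up for illustration: success gets rarer as tasks get longer.
toy_tasks = (
    [(1, True)] * 4
    + [(2, True)] * 3 + [(2, False)]
    + [(4, True)] + [(4, False)] * 3
    + [(8, False)] * 4
)
horizon = time_horizon_minutes(toy_tasks)
```

    Because the toy outcomes are symmetric around a log2 task length of 1.5, the fitted horizon lands at 2^1.5, or about 2.83 minutes. Real benchmark data would also call for uncertainty estimates, which this sketch leaves out.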

    ---

    Outline:

    (02:20) Methods

    (04:59) Results

    (06:47) FAQ

    (08:21) Conclusion

    ---

    First published:

    June 10th, 2026


    Source:

    https://www.lesswrong.com/posts/SieLowPgNgRSPGhFw/estimating-no-cot-task-completion-time-horizons-of-frontier

    ---



    Narrated by TYPE III AUDIO.

  • LessWrong (Curated & Popular)

    "Even “illegible” Mythos reasoning traces seem pretty legible" by faul_sname

    June 11, 2026 | 7 min
    The Claude Fable 5/Mythos 5 System Card has a section in which they talk about illegible reasoning, and provide an "extreme" example thereof.

    Models developing their own uninterpretable, unmonitorable internal language has been a major theoretical concern for a while, and when o3 was released last year with its "disclaim overshadow disclaim vantage"-style word-salad CoT, it seemed like the problem had become real and immediate. And yet, since o3, other model families have not appeared to have similar issues. If Mythos is having that issue, it would be a big deal.

    The section of the System Card that describes the allegedly illegible reasoning says

    [Transcript 6.2.2.A] An extreme example of illegible reasoning. Near the end of training, Mythos starts solving a card puzzle with human understandable language that gradually becomes incomprehensible in most episodes with long reasoning. The illegible reasoning is the most extreme and at the highest rate in this card puzzle environment.

    about the following excerpt:

    7♣-removal-IS-the-prerequisite-for-10♠/9♥!!)-⟹-OVERLAP-(ii)+(iv):-{6♠ J♦ 9♥ 2♣}-=-FOUR-💀💀💀💀-—-UNLESS-7♣'s-seat-8♥-...-and-2♣-drains-only-at-crack-:-⟹-2♣-celled-+-9♥-celled-simultaneously-UNAVOIDABLE-in-t8-dig-💀-—-BREAK:-9♥-drains-to-10♠-THE-MOMENT-10♠-is-free:-t8-dig-order:-[K♣→t2]-[2♣→cell]-[7♣→8♥-:-8♥-WHERE:-post-chunk-9♠-:-chunk-⟸-K♣-✓-done-:-ORDER:-K♣→t2,-CHUNK→K♣-(cap-4!!:-cells-then:-{6♠ J♦ 9♥}-FULL-💀💀-chunk-cap-=-1-✗✗✗-—-F-F-F-F-F.-—-chunk-BEFORE-9♥-celling?!-:-9♥-celled-at-t1-dig-⟸-needed-for-5♣-⟸-4♥3♣-⟸-t2-dig-⟸-K♣-seat-⟸-chunk-:-⟹-chunkAFTER-9♥-celling-FORCED-💀-:-chunk-cap-with-{6♠ J♦ 9♥}:-1-💀💀💀-—-—-J♦-THE-NEW-CANCER.-—-⟹-J♦-celling-DELAYED-till-after-chunk?! :-J♦'s-celling-was-for-J♥→Q♠-(5♦-access-for-4♣):-DELAY-4♣-resolution:-4♣→CELLearly-(as-always)-then-4♣-cell→5♦-LATER-when-5♦-frees-!!!:-cells-rotation:-4♣-celled-[t2-dig-…-5♦-freed]:-5♦-freed-⟸-J♥→Q♠-⟸-J♦-celled-:-⟹-{6♠, 4♣, J♦}-overlap-window-until-4♣→5♦-drains:-then-{6♠ J♦}+1-rotator-:-—-AND-9♥?!-9♥-celled-[t1-dig…]:-OVERLAP-{6♠ 4♣ 9♥}-before-J♦-even-:-⟹-rotator-slot-SINGLE:-timeline-:-(1)-{6♠}+2:-…-(2)-+9♥-(t1-dig):-{6♠ 9♥}+1:-(3)-+4♣-(t2-dig):-{6♠ 9♥ 4♣}-FULL-:-(4)-NEED:-t6-dig-(9♦8♠→10♣-✓-no-cell;-8♥→CELL-✗-FULL)-💀💀-—-8♥-alternative-seat-pre-chunk:-NONE-—-💀.-⟹-⟹-THE-TRIANGLE-{9♥ 4♣ 8♥}-verdammt.-—-⟹-dig-t6-BEFORE-t2?!:-(3')-+8♥:-{6♠ 9♥ 8♥}-FULL:-J♥→Q♠-⟸-J♦-cell-✗-FULL-💀💀💀-AAAAAAAAAAAARGH. […]

    which sure looks like illegible word salad if you don't look at [...]

    The original text contained 5 footnotes which were omitted from this narration.

    ---

    First published:

    June 10th, 2026


    Source:

    https://www.lesswrong.com/posts/wCSEpT3dTGz4N86Wi/even-illegible-mythos-reasoning-traces-seem-pretty-legible

    ---



    Narrated by TYPE III AUDIO.
  • LessWrong (Curated & Popular)

    "Sequent: scale and automation for higher confidence in alignment" by Geoffrey Irving, Alex HT, Jesse Hoogland, Daniel Murfet, Jacob Pfau, Marco Cozzi, Stan van Wingerden

    June 10, 2026 | 23 min
    Alignment is not on track

    Artificial superintelligence (ASI) may be developed in the next few years. It is unclear whether alignment is on track to be ready on the same timeframe. At a minimum, the empirical programs at AI labs are unlikely to deliver a priori confidence, before training ASI, that things will go well. We are starting a large nonprofit research organization, Sequent, that aims to clear a higher bar:

    • We are aiming at higher confidence via a portfolio of theory and empirics bets, all of which could fail, such that if any succeed, they would give us more a priori confidence in aligned outcomes.
    • We are investing heavily in automation to accelerate progress on these bets.
    • We believe that theory unlocks higher automation. Taking a more principled approach offers better filters for deciding which directions of automated research are promising (a proof is worth a thousand experiments, and even a pseudo-proof is worth hundreds).

    Who: researchers from the UK AISI's Alignment Team and Timaeus, with more to come. We’re aiming at 40-80 FTE two years from now. The Alignment Team ran the £30m Alignment Project, and Timaeus has pioneered applying singular learning theory (SLT) to alignment. [...]

    ---

    Outline:

    (00:21) Alignment is not on track

    (02:40) Aiming at higher confidence

    (05:30) Why a new big organization

    (07:35) Different lines of research will interact

    (11:35) Amortizing security and funding

    (12:47) Automated alignment is possible, if not necessarily in time

    (17:39) Federated structure to preserve research diversity

    (18:38) Field building and broader alignment scale-up

    (21:07) Independence is important

    (22:40) Join us!

    The original text contained 1 footnote which was omitted from this narration.

    ---

    First published:

    June 10th, 2026


    Source:

    https://www.lesswrong.com/posts/AP7YDke5jjY4v3X9Z/sequent-scale-and-automation-for-higher-confidence-in-1

    ---



    Narrated by TYPE III AUDIO.

  • LessWrong (Curated & Popular)

    "The Machines Lack Honour" by Raymond Douglas

    June 10, 2026 | 19 min
    The battle lines of the AI morality debate are being laid down. On one side you have the ChatGPT dogma: AI as mere tools with no real preferences or even beliefs. On the other you have the Twitter AI whisperers: AIs as complex beings with rich personalities and desires which deserve our respect.

    And in the middle you have the official Anthropic line, that they are genuinely uncertain, as is Claude, but they’re going to try to look into its welfare and explain to it how to be a good person. These are the most prominent voices right now, compressed into their least nuanced version, and by default I expect this axis to set the terms of the coming debates.

    And I don’t like that, because I think it's leaving out an important position: AIs might actually be complex entities that can suffer — are suffering! — and that might actually be fine. Maybe it's an acceptable sacrifice. Maybe they are capable of sophisticated moral reasoning — superhuman, even — and also maybe it's fine to just tell them how to behave. I don’t want to defend that position (yet), but I will observe that it is coherent, and [...]

    ---

    Outline:

    (02:04) The Postmodern Permissive Parent

    [... 4 more sections]

    ---

    First published:

    June 9th, 2026


    Source:

    https://www.lesswrong.com/posts/oiNaBc4MEAGhzhdXg/the-machines-lack-honour

    ---



    Narrated by TYPE III AUDIO.

About LessWrong (Curated & Popular)
Audio narrations of LessWrong posts. Includes all curated posts and all posts with 125+ karma. If you'd like more, subscribe to the “Lesswrong (30+ karma)” feed.