LessWrong (Curated & Popular)

LessWrong
Latest episode

870 episodes

  • "What I did in the hedonium shockwave, by Emma, age six and a half" by ozymandias

    May 11, 2026 | 7 min.
    My name is Emma and I’m six and a half years old and I like pink and Pokemon and my cat River and I’m going to be swallowed by a hedonium shockwave soon, except you already know that about me because everyone else is too.

    “Hedonium shockwave” means that everyone is going to be happy forever. Not just all the humans but all the animals and the flowers and the ground and River too. It has already made a bunch of the stars happy, like Betelgeuse and Alpha Centauri.

    Scientists saw that the stars were blinking out, and they did a lot of very hard science and figured out that the stars were turning into happiness. I wanted to be a scientist when I grew up but I won’t be a scientist because instead I’m going to be happy forever.

    I used to have a hard time saying “hedonium shockwave” but grownups keep saying it so I’ve gotten a lot of practice. Sometimes it seems like all grownups do, in real life and on the TV, is say “hedonium shockwave” at each other until they all start crying.

    I looked at the sky to see if I could see [...]

    ---

    First published:

    April 13th, 2026


    Source:

    https://www.lesswrong.com/posts/rgXQuG8KXtxugSG6H/what-i-did-in-the-hedonium-shockwave-by-emma-age-six-and-a

    ---



    Narrated by TYPE III AUDIO.
  • "Bad Problems Don’t Stop Being Bad Because Somebody’s Wrong About Fault Analysis" by Linch

    May 10, 2026 | 5 min.
    Here's a dynamic I’ve seen at least a dozen times:

    Alice: Man that article has a very inaccurate/misleading/horrifying headline.

    Bob: Did you know, *actually* article writers don't write their own headlines?



    But what I care about is the misleading headline, not your org chart!

    __

    Another example I’ve encountered recently is (anonymizing) when a friend complained about a prosaic safety problem at a major AI company that went unfixed for multiple months. Someone else with background information “usefully” chimed in with a long explanation of organizational limitations and why the team responsible for fixing the problem had limitations on resources like senior employees and compute, and actually not fixing the problem was the correct priority for them etc etc etc.

    But what I (and my friend) cared about was the prosaic safety problem not being fixed! And what this says about the company's ability to proactively respond to and fix future problems. We’re complaining about your company overall. Your internal team management was never a serious concern for us to begin with!

    __

    A third example comes from Kelsey Piper.

    Kelsey wrote about the (horrifying) recent case where Hantavirus carriers in the recent [...]

    The original text contained 1 footnote which was omitted from this narration.

    ---

    First published:

    May 8th, 2026


    Source:

    https://www.lesswrong.com/posts/PCsmhN9z65HtC4t5v/bad-problems-don-t-stop-being-bad-because-somebody-s-wrong

    ---



    Narrated by TYPE III AUDIO.
  • "x-risk-themed" by kave

    May 9, 2026 | 6 min.
    Sometimes, a friend who works around here, at an x-risk-themed organisation, will think about leaving their job. They’ll ask a group of people “what should I do instead?”. And everyone will chime in with ideas for other x-risk-themed orgs that they could join. A lot of the conversation will be about who's hiring, what the pay is, what the work-life balance is like, or how qualified the person is for the role.

    Sometimes the conversation focuses on what will help with x-risk, and where people are dropping the ball. But often, that's not the focus. In those conversations, people seem mostly worried about where they'll thrive. And I think that's often the correct concern.

    Most people aren’t in crunch mode, in super short timelines mode; even if their models would license that, I think they don’t know how to do it without throwing their minds away or Pascal's mugging themselves. And if they're playing a longer time horizon game, the plan can't be to run unsustainably forever. People probably make better plans if they’re honest about their limits.

    But, given that they're willing to trade off so much impact for fit, I’m surprised that basically no one mentions [...]

    ---

    First published:

    May 6th, 2026


    Source:

    https://www.lesswrong.com/posts/eW7knx6zPSKzFc8iK/x-risk-themed

    ---



    Narrated by TYPE III AUDIO.
  • "Natural Language Autoencoders Produce Unsupervised Explanations of LLM Activations" by Subhash Kantamneni, kitft, Euan Ong, Sam Marks

    May 8, 2026 | 18 min.
    Abstract

    We introduce Natural Language Autoencoders (NLAs), an unsupervised method for generating natural language explanations of LLM activations. An NLA consists of two LLM modules: an activation verbalizer (AV) that maps an activation to a text description and an activation reconstructor (AR) that maps the description back to an activation. We jointly train the AV and AR with reinforcement learning to reconstruct residual stream activations. Although we optimize for activation reconstruction, the resulting NLA explanations read as plausible interpretations of model internals that, according to our quantitative evaluations, grow more informative over training.

    We apply NLAs to model auditing. During our pre-deployment audit of Claude Opus 4.6, NLAs helped diagnose safety-relevant behaviors and surfaced unverbalized evaluation awareness—cases where Claude believed, but did not say, that it was being evaluated. We present these audit findings as case studies and corroborate them using independent methods. On an automated auditing benchmark requiring end-to-end investigation of an intentionally-misaligned model, NLA-equipped agents outperform baselines and can succeed even without access to the misaligned model's training data.

    NLAs offer a convenient interface for interpretability, with expressive natural language explanations that we can directly read. To support further work, we release training code and trained NLAs [...]
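    The abstract above describes a two-module autoencoder whose bottleneck is a short piece of text: a verbalizer turns an activation into a description, a reconstructor turns the description back into an activation, and training rewards faithful reconstruction. As a rough illustration only, here is a toy numpy sketch of that loop. Every name in it is invented for illustration; a random codebook and a deterministic top-k "verbalizer" stand in for the paper's LLM modules, and the RL training loop that would actually improve them is omitted.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    D = 8          # toy "residual stream" dimension
    VOCAB = 16     # toy description vocabulary size
    DESC_LEN = 4   # tokens per description (the discrete bottleneck)

    # Toy "activation verbalizer": deterministically quantizes an activation
    # into a short discrete description (stand-in for an LLM writing text).
    def verbalize(activation, codebook):
        scores = codebook @ activation            # alignment of each code with the activation
        return np.argsort(scores)[-DESC_LEN:]     # indices of the top-scoring codes

    # Toy "activation reconstructor": maps a description back to a vector
    # by averaging the selected codebook rows (stand-in for an LLM reading text).
    def reconstruct(description, codebook):
        return codebook[description].mean(axis=0)

    def cosine(u, v):
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

    codebook = rng.normal(size=(VOCAB, D))
    activation = rng.normal(size=D)

    desc = verbalize(activation, codebook)
    recon = reconstruct(desc, codebook)
    reward = cosine(activation, recon)  # the quantity an RL loop would maximize
    ```

    The point of the sketch is only the shape of the objective: information about the activation must survive a round trip through a discrete, human-readable channel, which is why the resulting descriptions end up informative about model internals.
    
    
    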

    ---

    Outline:

    (00:15) Abstract

    [... 6 more sections]

    ---

    First published:

    May 7th, 2026


    Source:

    https://www.lesswrong.com/posts/oeYesesaxjzMAktCM/natural-language-autoencoders-produce-unsupervised

    ---



    Narrated by TYPE III AUDIO.

  • [Linkpost] "Interpreting Language Model Parameters" by Lucius Bushnaq, Dan Braun, Oliver Clive-Griffin, Bart Bussmann, Nathan Hu, mivanitskiy, Linda Linsefors, Lee Sharkey

    May 7, 2026 | 4 min.
    This is a link post. This is the latest work in our Parameter Decomposition agenda. We introduce a new parameter decomposition method, adVersarial Parameter Decomposition (VPD)[1] and decompose the parameters of a small[2] language model with it.

    VPD greatly improves on our previous techniques, Stochastic Parameter Decomposition (SPD) and Attribution-based Parameter Decomposition (APD). We think the parameter decomposition approach is now more-or-less ready to be applied at scale to models people care about.

    Importantly, we show that we can decompose attention layers, which interp methods like transcoders and SAEs have historically struggled with.

    We also build attribution graphs of the model for some prompts using causally important parameter subcomponents as the nodes, and interpret parts of them.

    While we made these graphs, we discovered that our adversarial ablation method seemed pretty important for faithfully identifying which nodes in them were causally important for computing the final output. We think this casts some doubt on the faithfulness of subnetworks found by the majority of other subnetwork identification methods in the literature.[3][4] More details and some examples can be found in the paper.

    Additionally, as with our previous technique SPD, VPD does not [...]

    The original text contained 5 footnotes which were omitted from this narration.

    ---

    First published:

    May 5th, 2026


    Source:

    https://www.lesswrong.com/posts/eAQZaiC3PcBhS4HjM/linkpost-interpreting-language-model-parameters


    Linkpost URL:
    https://www.goodfire.ai/research/interpreting-lm-parameters

    ---



    Narrated by TYPE III AUDIO.



About LessWrong (Curated & Popular)

Audio narrations of LessWrong posts. Includes all curated posts and all posts with 125+ karma. If you'd like more, subscribe to the "LessWrong (30+ karma)" feed.