LessWrong (Curated & Popular)

867 episodes

  • LessWrong (Curated & Popular)

    "Natural Language Autoencoders Produce Unsupervised Explanations of LLM Activations" by Subhash Kantamneni, kitft, Euan Ong, Sam Marks

    May 8, 2026 | 18 min
    Abstract

    We introduce Natural Language Autoencoders (NLAs), an unsupervised method for generating natural language explanations of LLM activations. An NLA consists of two LLM modules: an activation verbalizer (AV) that maps an activation to a text description and an activation reconstructor (AR) that maps the description back to an activation. We jointly train the AV and AR with reinforcement learning to reconstruct residual stream activations. Although we optimize for activation reconstruction, the resulting NLA explanations read as plausible interpretations of model internals that, according to our quantitative evaluations, grow more informative over training.

    We apply NLAs to model auditing. During our pre-deployment audit of Claude Opus 4.6, NLAs helped diagnose safety-relevant behaviors and surfaced unverbalized evaluation awareness—cases where Claude believed, but did not say, that it was being evaluated. We present these audit findings as case studies and corroborate them using independent methods. On an automated auditing benchmark requiring end-to-end investigation of an intentionally-misaligned model, NLA-equipped agents outperform baselines and can succeed even without access to the misaligned model's training data.

    NLAs offer a convenient interface for interpretability, with expressive natural language explanations that we can directly read. To support further work, we release training code and trained NLAs [...]
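
    The verbalizer/reconstructor round trip described in the abstract can be illustrated with a minimal toy sketch. This is not the authors' method: in the paper both modules are LLMs trained jointly with reinforcement learning, whereas here they are hand-written functions, and the feature names, the `k` parameter, and the squared-error score are all assumptions made for illustration. The sketch only shows the idea of a text bottleneck whose quality is judged by how well the activation can be rebuilt from the description.

```python
# Hypothetical toy vocabulary: each word names one coordinate of the activation.
FEATURES = ["code", "math", "safety", "evaluation", "poetry"]

def verbalize(activation, k=2):
    """Toy 'activation verbalizer': describe the k largest-magnitude coordinates as text."""
    top = sorted(range(len(activation)), key=lambda i: -abs(activation[i]))[:k]
    return ", ".join(f"{FEATURES[i]}={activation[i]:.1f}" for i in top)

def reconstruct(description):
    """Toy 'activation reconstructor': parse the text description back into a vector."""
    vec = [0.0] * len(FEATURES)
    if not description:
        return vec
    for part in description.split(", "):
        name, value = part.split("=")
        vec[FEATURES.index(name)] = float(value)
    return vec

def reconstruction_error(activation, k=2):
    """Squared error between an activation and its round trip through text."""
    rebuilt = reconstruct(verbalize(activation, k))
    return sum((a - b) ** 2 for a, b in zip(activation, rebuilt))
```

    In this toy, a longer description (larger `k`) carries more of the activation through the bottleneck, so the reconstruction error can only stay the same or fall as `k` grows.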

    ---

    Outline:

    (00:15) Abstract

    [... 6 more sections]

    ---

    First published:

    May 7th, 2026


    Source:

    https://www.lesswrong.com/posts/oeYesesaxjzMAktCM/natural-language-autoencoders-produce-unsupervised

    ---



    Narrated by TYPE III AUDIO.

  • LessWrong (Curated & Popular)

    [Linkpost] "Interpreting Language Model Parameters" by Lucius Bushnaq, Dan Braun, Oliver Clive-Griffin, Bart Bussmann, Nathan Hu, mivanitskiy, Linda Linsefors, Lee Sharkey

    May 7, 2026 | 4 min
    This is a link post. This is the latest work in our Parameter Decomposition agenda. We introduce a new parameter decomposition method, adVersarial Parameter Decomposition (VPD), and use it to decompose the parameters of a small language model.

    VPD greatly improves on our previous techniques, Stochastic Parameter Decomposition (SPD) and Attribution-based Parameter Decomposition (APD). We think the parameter decomposition approach is now more or less ready to be applied at scale to models people care about.

    Importantly, we show that we can decompose attention layers, which interpretability methods like transcoders and SAEs have historically struggled with.

    We also build attribution graphs of the model for some prompts using causally important parameter subcomponents as the nodes, and interpret parts of them.

    While making these graphs, we discovered that our adversarial ablation method seemed important for faithfully identifying which nodes were causally important for computing the final output. We think this casts some doubt on the faithfulness of subnetworks found by most other subnetwork identification methods in the literature. More details and some examples can be found in the paper.

    Additionally, as with our previous technique SPD, VPD does not [...]

    The original text contained 5 footnotes which were omitted from this narration.

    ---

    First published:

    May 5th, 2026


    Source:

    https://www.lesswrong.com/posts/eAQZaiC3PcBhS4HjM/linkpost-interpreting-language-model-parameters


    Linkpost URL:
    https://www.goodfire.ai/research/interpreting-lm-parameters

    ---



    Narrated by TYPE III AUDIO.

  • LessWrong (Curated & Popular)

    "It’s nice of you to worry about me, but I really do have a life" by Viliam

    May 5, 2026 | 6 min
    I have two shameful secrets that I probably shouldn't talk about online:

    I love my family.
    I enjoy my hobbies.
    "What an idiot!" you probably think. "Doesn't he realize that at his next job interview, HR will probably use an AI that can match his online writing based on a short sample of written text, and when they ask 'hey AI, is this guy really 100% devoted to his job, and does he spend his entire days and nights thinking about how to make his boss more rich?', the AI will laugh and print: 'beep-boop, negative, mwa-ha-ha-ha'."

    And, hey, I get it. If I had a company, and I could choose between two people who are about equally qualified, but for one of them, working hardest for me is the true meaning of his life, while the other one only hopes to collect his salary and then go home and spend the rest of his day with his wife and children, I would also prefer to hire the former.

    Which is why so many of us pretend to be the former. Even when we are not. Because we prefer that our families not starve. Thus the job interviews [...]

    ---

    First published:

    May 4th, 2026


    Source:

    https://www.lesswrong.com/posts/qRZLEBmNtT6LBuFsE/it-s-nice-of-you-to-worry-about-me-but-i-really-do-have-a

    ---



    Narrated by TYPE III AUDIO.
  • LessWrong (Curated & Popular)

    "Irretrievability; or, Murphy’s Curse of Oneshotness upon ASI" by Eliezer Yudkowsky

    May 5, 2026 | 37 min
    Example 1: The Viking 1 lander

    In the 1970s, NASA sent a pair of probes to Mars, the Viking 1 and Viking 2 missions, at a total cost of 1 billion dollars (in 1970 dollars), equivalent to about 7 billion dollars (in 2025 dollars). The Viking 1 probe operated on Mars's surface for six years, before its battery began to seriously degrade.

    One might have thought a battery problem like that would spell the irrevocable end of the mission. The probe had already launched and was now on Mars, very far away and out of reach of any human technician's fixing fingers. Was it not inevitable, then, that if any kind of technical problem were to be discovered long after the space launch in August 1975, nothing could possibly be done?

    But the foresightful engineers of the Viking 1 probe had devised a plan for just this class of eventuality, which they had foreseen in general, if not in exact specifics. They had built the Viking 1 probe to accept software updates by radio receiver, transmitted from Earth.

    On November 11, 1982, Earth sent an update to the Viking 1 lander's software, intended to make sure the battery only discharged down to a minimum voltage level [...]

    ---

    Outline:

    (00:13) Example 1: The Viking 1 lander

    (04:25) Example 2: The Mars Observer

    (11:37) Example 3: The Maginot Line

    (15:37) Other supposed refutations of oneshotness

    (24:16) On the extraordinary efforts put forth to misinterpret the idea of oneshotness

    (33:52) The secret sauce of competent engineers in Murphy-cursed fields: only trying projects so incredibly straightforward as to be actually possible.

    The original text contained 7 footnotes which were omitted from this narration.

    ---

    First published:

    May 4th, 2026


    Source:

    https://www.lesswrong.com/posts/fbrz9xhKpEeTKw5zL/irretrievability-or-murphy-s-curse-of-oneshotness-upon-asi

    ---



    Narrated by TYPE III AUDIO.
  • LessWrong (Curated & Popular)

    "Dairy cows make their misery expensive (but their calves can’t)" by Elizabeth

    May 5, 2026 | 12 min
    How much do cows suffer in the production of milk? I can’t answer that; understanding animal experience is hard. But I can at least provide some facts about the conditions dairy cows live in, which might be useful to you in making your own assessment. My biggest conclusion is that cows made better choices than chickens by making their misery financially costly to farmers.

    Life Cycle

    The life of a dairy cow starts as a calf. She is typically separated from her mother a few hours to a few days after birth and, to reduce disease risk, held in isolation. Cutting-edge farms will sometimes house calves in pairs. This isolation is clearly stressful for a baby herd mammal and her mother, but I didn’t find any quantification of that stress that I trusted.

    Calves will be bottle-fed until weaning at 6-8 weeks (4-6 months earlier than beef calves). After weaning and vaccinations they can be introduced into a herd. At large farms (where most cows live), they will move in and out of different herds through their lifecycle. This is more stressful than being embedded with your friends for life, but again, I found no [...]

    ---

    Outline:

    (00:44) Life Cycle

    (02:43) How much time do dairy cows spend outside?

    (04:21) By humaneness standard

    (06:00) When indoors, how confined are dairy cows?

    (06:33) What is the disease load of dairy cows?

    (08:15) Euthanasia

    [... 5 more sections]

    ---

    First published:

    May 3rd, 2026


    Source:

    https://www.lesswrong.com/posts/r3PKfvKCjy6jok4qm/dairy-cows-make-their-misery-expensive-but-their-calves-can

    ---



    Narrated by TYPE III AUDIO.


About LessWrong (Curated & Popular)

Audio narrations of LessWrong posts. Includes all curated posts and all posts with 125+ karma. If you'd like more, subscribe to the “Lesswrong (30+ karma)” feed.
Generated: 5/8/2026 - 11:48:03 AM