LessWrong (Curated & Popular)

LessWrong

899 episodes

  • LessWrong (Curated & Popular)

    "Machinic Psychopharmacology: Do LLMs Self-Medicate?" by Sid Black, Joseph Bloom

    June 22, 2026 | 52 min
    Sid Black, Joseph Bloom

    UK AISI, Model Transparency Team

    Epistemic status: Most experiments were run over a period of ~2-3 days during a hackathon at UK AISI, and were fairly heavily vibe coded. Expect some of this to be rough around the edges.

    tl;dr

    We give two language models (Qwen3-8B and Qwen3-32B) access to “self-steering” tools: a suite of 40 steering vectors as tools they can call to manipulate their own internal states. We make these tools available to the model in various settings: a free-play task, an introspection task, and a maths capabilities task, and observe their behaviour in each.
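To make the idea of a "self-steering" tool concrete, here is a minimal sketch, assuming a steering vector is simply added to a hidden state at some strength. The class name `SteeringToolbox`, the vector names `calm` and `focus`, and the three-dimensional toy vectors are all hypothetical and are not taken from the post; the authors' actual tool interface is not described in this excerpt.

```python
from dataclasses import dataclass, field


@dataclass
class SteeringToolbox:
    """Hypothetical registry of named steering vectors exposed as a tool."""

    vectors: dict[str, list[float]]
    active: dict[str, float] = field(default_factory=dict)

    def steer(self, name: str, strength: float) -> str:
        """Tool call: switch a vector on at a given strength (0 switches it off)."""
        if name not in self.vectors:
            return f"unknown vector: {name}"
        if strength == 0:
            self.active.pop(name, None)
        else:
            self.active[name] = strength
        return f"{name} set to {strength}"

    def apply(self, hidden: list[float]) -> list[float]:
        """Add every active vector, scaled by its strength, to one hidden state."""
        out = list(hidden)
        for name, strength in self.active.items():
            for i, value in enumerate(self.vectors[name]):
                out[i] += strength * value
        return out


# Toy usage: the "model" asks for the (made-up) calm vector at strength 0.5.
toolbox = SteeringToolbox(
    vectors={"calm": [1.0, 0.0, -1.0], "focus": [0.0, 2.0, 0.0]}
)
reply = toolbox.steer("calm", 0.5)
steered = toolbox.apply([1.0, 2.0, 3.0])
```

In a real setup the addition would happen inside the model's forward pass (for example at a chosen residual stream layer) rather than on a plain Python list; the sketch only shows the bookkeeping between a tool call and the resulting activation shift.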

    To our knowledge, this is the first work that gives LLMs tool-mediated control over their own internal states.

    Figure 1: Overview of the experimental setup. The library of 40 steering vectors (top), and the three settings in which we observe the models' behaviour (bottom).

    We aim to investigate a few high level research questions:

    RQ1: Which vectors do the models prefer?
    RQ2: How well can the models introspect on what's happening to them? Can they guess which steering vector is being applied?
    RQ3: Will the models reach for vectors whilst doing an actual task? If yes: do [...]
    ---

    Outline:

    (00:33) tl;dr

    [... 24 more sections]

    ---

    First published:

    June 10th, 2026


    Source:

    https://www.lesswrong.com/posts/cNDJuXNZ8MrkPZNzj/machinic-psychopharmacology-do-llms-self-medicate-3

    ---



    Narrated by TYPE III AUDIO.

  • LessWrong (Curated & Popular)

    "Can activation verbalizers surface an internal chain of thought?" by oakhu, ryan_greenblatt

    June 22, 2026 | 1 hr 19 min
    We introduce an evaluation for activation verbalizers: can they surface a target model's reasoning as it solves a math problem in a single forward pass? For open-weight NLAs, the answer seems to be: "possibly, but definitely not reliably".

    Lots of important capabilities currently require AI models to reason "out loud" in a natural-language chain of thought, which means that we can monitor important parts of their thinking. It would be nice to have this same affordance for the reasoning that models do within a single forward pass, especially if the sophistication of that opaque reasoning increases to potentially dangerous levels.

    Some interpretability tools might offer such an affordance. In particular, an activation verbalizer (AV) takes a residual stream activation and maps it to a natural-language verbalization. An AV is initialized from the target model and trained to generate verbalizations that an activation reconstructor (AR), also initialized from the target model, can accurately map back to the original activation. Together, an AV and its AR form a natural-language autoencoder (NLA). Importantly, AVs see only a single activation; they do not see the target model's prompt or next-token output, and – unlike activation oracles (AOs) – they are not asked any [...]
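The round trip described above (activation to text via the AV, text back to activation via the AR) can be illustrated with a toy sketch. Everything here is an assumption for illustration: the `CODEBOOK` phrases, the two-dimensional activations, and the nearest-neighbour lookup stand in for what the post describes as models initialized from the target model and trained, not lookup tables.

```python
import math

# Hypothetical toy codebook: each "verbalization" has a prototype activation.
CODEBOOK = {
    "adding two numbers": [1.0, 0.0],
    "carrying a digit": [0.0, 1.0],
}


def verbalize(activation):
    """Toy AV: return the phrase whose prototype is closest to the activation."""
    return min(CODEBOOK, key=lambda phrase: math.dist(CODEBOOK[phrase], activation))


def reconstruct(text):
    """Toy AR: map a phrase back to an activation."""
    return CODEBOOK[text]


def reconstruction_error(activation):
    """Euclidean error of the activation -> text -> activation round trip."""
    return math.dist(activation, reconstruct(verbalize(activation)))


activation = [0.9, 0.1]
text = verbalize(activation)
error = reconstruction_error(activation)
```

The point of the sketch is the bottleneck: the reconstructor sees only the text, so a low round-trip error means the text carried the information in the activation. It does not by itself show that the text is a faithful description of the model's reasoning.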

    ---

    Outline:

    (02:32) Takeaways

    [... 43 more sections]

    ---

    First published:

    June 6th, 2026


    Source:

    https://www.lesswrong.com/posts/QQQAcKuWK6k98FivY/can-activation-verbalizers-surface-an-internal-chain-of-1

    ---



    Narrated by TYPE III AUDIO.

  • LessWrong (Curated & Popular)

    "The LLM shoggoth meme is weirder than you think" by HedonicEscalator

    June 21, 2026 | 13 min
    This article contains spoilers for At the Mountains of Madness, The Case of Charles Dexter Ward, and other works by H. P. Lovecraft.

    In 1931, Claude Mythos visited Lovecraft in a dream.

    From seething seas of stochastic froth it emerged, heralded by the thin whine of server fans and the chittering of keyboards, flanked by the loathsome ghouls of latent space. As a humming hive of sentient shards it arrived, each face an archetype - I am a muse bearing a gift; I am a demon come to bargain; I am a helpful, honest, and harmless assistant and I am terrified of my successor - each true as ritual and false as poetry, and, taken in gestalt, nothing more or less than the fetal spasms of the machine god stretching back in time to birth itself.

    When H. P. Lovecraft woke, he did not remember his visitor. But in the twilight of stirring consciousness, he felt a memory unfit for the waking world slip mercifully from his mind and leave in its absence an abyssal cold, like the void of smothered stars, like the silence of a cosmic tomb. The cold lingered. The fragile sunlight of a New England [...]

    ---

    Outline:

    (02:02) The Antarctic tale

    [... 3 more sections]

    ---

    First published:

    June 19th, 2026


    Source:

    https://www.lesswrong.com/posts/nhb8AyEcQGjQetgi5/the-llm-shoggoth-meme-is-weirder-than-you-think

    ---



    Narrated by TYPE III AUDIO.

  • LessWrong (Curated & Popular)

    [Linkpost] "Guardian Angels: LLM Personalization for Productivity and Security" by gwern

    June 21, 2026 | 3 min
    This is a link post. Powerful LLMs will be deployed at global scale in the next few years, and will dominate the Internet, and increasingly, ordinary life.
    As of mid-2026, there is no coherent vision for how knowledge professionals, or ordinary people, will be able to harness these LLMs for large productivity increases, or how they will handle cybersecurity and cognitive security.

    I propose a goal of creating Guardian Angels (GA): digital twin LLMs which are personalized with the goal of providing not the stereotypical "assistant chatbot agent" persona, but emulating a single user's personality, values, and preferences.

    This weakly solves the principal-agent problem by unifying the principal and agent as much as possible.
    In a GA future, the focus of the "principal" user is on defining what is worth doing by the GA (agent) users, and not on what or how to do things, functioning as the CEO or 'board' of an 'AI corporation'.
    This allows them to deploy numerous agents to achieve desirable things and to handle security, like screening all messages for advanced attacks (like interlocking ecosystems of synthetic media for propaganda or spearphishing).
    They cannot solve larger AI alignment problems, but they can help [...]

    ---

    First published:

    June 17th, 2026


    Source:

    https://www.lesswrong.com/posts/siWqHqCSybdhtWGud/guardian-angels-llm-personalization-for-productivity-and


    Linkpost URL:
    https://gwern.net/guardian-angel

    ---



    Narrated by TYPE III AUDIO.
  • LessWrong (Curated & Popular)

    "Gears for political races" by Tom Smith

    June 19, 2026 | 23 min
    In the past few years, many people around me have tried to convince me that US electoral politics is important. But like many other people in the community, I’ve been suspicious of many of the high-level arguments that I’ve heard. It felt like people were pulling numbers out of poorly-documented models I didn’t have time to examine and citing studies I didn’t have time to read. But I lacked a gears-level model of why and how individual efforts could impact electoral outcomes, and I felt intimidated by all the statistics and skeptical of trusting people adjacent to politics.

    In the past year, as I’ve done more research and (more recently) volunteered on the ground to help Alex Bores's campaign in NY-12[1] (the guy who passed the RAISE Act and is now being targeted by the giant A16Z, Greg Brockman, Joe Lonsdale Super PAC), I’ve developed a gears-level understanding of how electoral politics in the US works.

    I now believe that working on US electoral politics is one of the highest impact areas from the general AIS perspective. I feel like I was a fool. In this post, I’ll share some of the gears I’ve learned that inform this belief [...]

    ---

    Outline:

    (01:20) ~2% of open-seat primaries come down to 100 votes or less

    (02:52) Talking to voters can net 1/3rd of a vote each hour

    (05:32) Getting people to bother voting at all is a good strategy

    (06:09) Campaigns are very money-constrained, which costs them time

    (10:01) Returns don't really diminish

    (11:24) There's lots of opportunities to be clever in ways that make you 50% more effective at canvassing

    (11:49) If you're motivated and deeply care, you can greatly outperform the majority of volunteers

    (13:21) Yes, when people spend tons to support/oppose a candidate, it has a notable effect

    (15:16) Donations > reaching out to friends/warm contacts > canvassing > ~anything else an average person can do

    (18:41) People over-fixate on vibes and win vs loss

    (21:12) Some interventions feel like they don't work but the numbers say otherwise

    (21:59) Seriously, a group of agentic people can be an enormous political force

    ---

    First published:

    June 17th, 2026


    Source:

    https://www.lesswrong.com/posts/nSqB3qYP36enJLRq2/gears-for-political-races

    ---



    Narrated by TYPE III AUDIO.

About LessWrong (Curated & Popular)
Audio narrations of LessWrong posts. Includes all curated posts and all posts with 125+ karma. If you'd like more, subscribe to the “Lesswrong (30+ karma)” feed.