
LessWrong (Curated & Popular)

LessWrong
Latest Episode

767 Episodes

  • "Why You Don’t Believe in Xhosa Prophecies" by Jan_Kulveit

    February 14, 2026 | 9 min.
    Based on a talk at the Post-AGI Workshop. Also on Boundedly Rational

    Does anyone reading this believe in Xhosa cattle-killing prophecies?

    My claim is that it's overdetermined that you don’t. I want to explain why — and why cultural evolution running on AI substrate is an existential risk.
    But first, a detour.

    Crosses on Mountains

    When I go climbing in the Alps, I sometimes notice large crosses on mountain tops. You climb something three kilometers high, and there's this cross.

    This is difficult to explain by human biology. We have preferences that come from biology—we like nice food, comfortable temperatures—but it's unclear why we would have a biological need for crosses on mountain tops. Economic thinking doesn’t typically aspire to explain this either.

    I think it's very hard to explain without some notion of culture.

    In our paper on gradual disempowerment, we discussed misaligned economies and misaligned states. People increasingly get why those are problems. But misaligned culture is somehow harder to grasp. I’ll offer some speculation why later, but let me start with the basics.

    What Makes Black Forest Cake Fit?

    The conditions for evolution are simple: variation, differential fitness, transmission. Following Boyd and Richerson, or Dawkins [...]

    ---

    Outline:

    (00:33) Crosses on Mountains

    (04:21) The Xhosa

    (05:33) Virulence

    (07:36) Preferences All the Way Down

    ---

    First published:
    February 13th, 2026

    Source:
    https://www.lesswrong.com/posts/tz5AmWbEcMBQpiEjY/why-you-don-t-believe-in-xhosa-prophecies

    ---



    Narrated by TYPE III AUDIO.

    ---

  • "Weight-Sparse Circuits May Be Interpretable Yet Unfaithful" by jacob_drori

    February 13, 2026 | 26 min.
    TLDR: Recently, Gao et al trained transformers with sparse weights, and introduced a pruning algorithm to extract circuits that explain performance on narrow tasks. I replicate their main results and present evidence suggesting that these circuits are unfaithful to the model's “true computations”.

    This work was done as part of the Anthropic Fellows Program under the mentorship of Nick Turner and Jeff Wu.

    Introduction

    Recently, Gao et al (2025) proposed an exciting approach to training models that are interpretable by design. They train transformers where only a small fraction of their weights are nonzero, and find that pruning these sparse models on narrow tasks yields interpretable circuits. Their key claim is that these weight-sparse models are more interpretable than ordinary dense ones, with smaller task-specific circuits. Below, I reproduce the primary evidence for these claims: training weight-sparse models does tend to produce smaller circuits at a given task loss than dense models, and the circuits also look interpretable.
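
    To make the setup concrete, here is a minimal, hypothetical sketch of what "weight-sparse" means: a linear layer whose weights are multiplied by a fixed binary mask so that only a small fraction remain nonzero. The class name, the magnitude-based masking, and the 1% density are illustrative assumptions, not the procedure Gao et al actually use (they enforce sparsity during training rather than masking an already-trained dense layer).

    ```python
    # Hypothetical sketch of a weight-sparse linear layer: a dense nn.Linear whose
    # weights are multiplied by a fixed binary mask keeping only the top-k entries
    # by magnitude. Names and the 1% density are illustrative; Gao et al enforce
    # sparsity during training rather than pruning a trained dense layer like this.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F


    class MaskedSparseLinear(nn.Module):
        def __init__(self, in_features: int, out_features: int, density: float = 0.01):
            super().__init__()
            self.linear = nn.Linear(in_features, out_features)
            k = max(1, int(density * self.linear.weight.numel()))
            # Threshold = magnitude of the k-th largest weight; keep weights at or above it.
            threshold = self.linear.weight.detach().abs().flatten().topk(k).values.min()
            self.register_buffer("mask", (self.linear.weight.detach().abs() >= threshold).float())

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # Only the unmasked (nonzero) weights contribute to the output.
            return F.linear(x, self.linear.weight * self.mask, self.linear.bias)


    layer = MaskedSparseLinear(64, 64)
    print(f"nonzero weights: {int(layer.mask.sum())} / {layer.mask.numel()}")
    ```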

    However, there are reasons to worry that these results don't imply that we're capturing the model's full computation. For example, previous work [1, 2] found that similar masking techniques can achieve good performance on vision tasks even when applied to a [...]

    ---

    Outline:

    (00:36) Introduction

    (03:03) Tasks

    (03:16) Task 1: Pronoun Matching

    (03:47) Task 2: Simplified IOI

    (04:28) Task 3: Question Marks

    (05:10) Results

    (05:20) Producing Sparse Interpretable Circuits

    (05:25) Zero ablation yields smaller circuits than mean ablation

    (06:01) Weight-sparse models usually have smaller circuits

    (06:37) Weight-sparse circuits look interpretable

    (09:06) Scrutinizing Circuit Faithfulness

    (09:11) Pruning achieves low task loss on a nonsense task

    (10:24) Important attention patterns can be absent in the pruned model

    (11:26) Nodes can play different roles in the pruned model

    (14:15) Pruned circuits may not generalize like the base model

    (16:16) Conclusion

    (18:09) Appendix A: Training and Pruning Details

    (20:17) Appendix B: Walkthrough of pronouns and questions circuits

    (22:48) Appendix C: The Role of Layernorm

    The original text contained 6 footnotes which were omitted from this narration.

    ---

    First published:
    February 9th, 2026

    Source:
    https://www.lesswrong.com/posts/sHpZZnRDLg7ccX9aF/weight-sparse-circuits-may-be-interpretable-yet-unfaithful

    ---



    Narrated by TYPE III AUDIO.

    ---

  • "My journey to the microwave alternate timeline" by Malmesbury

    February 11, 2026 | 20 min.
    Cross-posted from Telescopic Turnip

    Recommended soundtrack for this post

    As we all know, the march of technological progress is best summarized by this meme from LinkedIn:

    Inventors constantly come up with exciting new inventions, each of them with the potential to change everything forever. But only a fraction of these ever establish themselves as a persistent part of civilization, and the rest vanish from collective consciousness. Before shutting down forever, though, the alternate branches of the tech tree leave some faint traces behind: over-optimistic sci-fi stories, outdated educational cartoons, and, sometimes, some obscure accessories that briefly made it to mass production before being quietly discontinued.

    The classical example of an abandoned timeline is the Glorious Atomic Future, as described in the 1957 Disney cartoon Our Friend the Atom. A scientist with a suspiciously German accent explains all the wonderful things nuclear power will bring to our lives:

    Sadly, the glorious atomic future somewhat failed to materialize, and, by the early 1960s, the project to rip a second Panama canal by detonating a necklace of nuclear bombs was canceled, because we are ruled by bureaucrats who hate fun and efficiency.

    While the Our-Friend-the-Atom timeline remains out of reach from most [...]

    ---

    Outline:

    (02:08) Microwave Cooking, for One

    (04:59) Out of the frying pan, into the magnetron

    (09:12) Tradwife futurism

    (11:52) You’ll microwave steak and pasta, and you’ll be happy

    (17:17) Microvibes

    The original text contained 3 footnotes which were omitted from this narration.

    ---

    First published:
    February 10th, 2026

    Source:
    https://www.lesswrong.com/posts/8m6AM5qtPMjgTkEeD/my-journey-to-the-microwave-alternate-timeline

    ---



    Narrated by TYPE III AUDIO.

    ---

  • "Stone Age Billionaire Can’t Words Good" by Eneasz

    February 10, 2026 | 23 min.
    I was at the Pro-Billionaire march, unironically. Here's why, what happened there, and how I think it went.

    Me on the far left. From WSJ.

    I. Why?

    There's a genre of horror movie where a normal protagonist is going through a normal day in a normal life. Ten minutes into the movie his friends bring out a struggling kidnap victim to slaughter, and they look at him like this is just a normal Tuesday and he slowly realizes that either he's surrounded by complete psychopaths or the world is absolutely fucked up in some way he never imagined, and somehow this has been lost on him up until this point in his life. This kinda thing happens to me more than I’d like to admit, but normally it's in a metaphorical way. Normally.

    Sometimes I’m at the goth club, fighting back The Depression (and winning tyvm), and I’ll be involved in a conversation that veers into:

    Goth 1: Man, life's tough right now.

    Goth 2: I can’t believe we’re still letting billionaires live.

    Goth 3: Seriously, how corrupt is our government that we haven’t rounded them all up yet?

    Goth 1: Maybe we should kill them ourselves.

    Goth 2 [...]

    The original text contained 2 footnotes which were omitted from this narration.

    ---

    First published:
    February 9th, 2026

    Source:
    https://www.lesswrong.com/posts/BW89BudtySvpzpYni/stone-age-billionaire-can-t-words-good

    ---



    Narrated by TYPE III AUDIO.

    ---

  • "On Goal-Models" by Richard_Ngo

    February 10, 2026 | 6 min.
    I'd like to reframe our understanding of the goals of intelligent agents to be in terms of goal-models rather than utility functions. By a goal-model I mean the same type of thing as a world-model, only representing how you want the world to be, not how you think the world is. However, note that this is still a fairly inchoate idea, since I don't actually know what a world-model is.

    The concept of goal-models is broadly inspired by predictive processing, which treats both beliefs and goals as generative models (the former primarily predicting observations, the latter primarily “predicting” actions). This is a very useful idea, which e.g. allows us to talk about the “distance” between a belief and a goal, and the process of moving “towards” a goal (neither of which makes sense from a reward/utility function perspective).

    However, I’m dissatisfied by the idea of defining a world-model as a generative model over observations. It feels analogous to defining a parliament as a generative model over laws. Yes, technically we can think of parliaments as stochastically outputting laws, but actually the interesting part is in how they do so. In the case of parliaments, you have a process of internal [...]

    ---

    First published:
    February 2nd, 2026

    Source:
    https://www.lesswrong.com/posts/MEkafPJfiSFbwCjET/on-goal-models

    ---



    Narrated by TYPE III AUDIO.


About LessWrong (Curated & Popular)

Audio narrations of LessWrong posts. Includes all curated posts and all posts with 125+ karma. If you'd like more, subscribe to the “Lesswrong (30+ karma)” feed.
