
LessWrong (Curated & Popular)

LessWrong

Available Episodes

5 of 578
  • “Narrow Misalignment is Hard, Emergent Misalignment is Easy” by Edward Turner, Anna Soligo, Senthooran Rajamanoharan, Neel Nanda
    Anna and Ed are co-first authors for this work. We’re presenting these results as a research update for a continuing body of work, which we hope will be interesting and useful for others working on related topics.
    TL;DR: We investigate why models become misaligned in diverse contexts when fine-tuned on narrow harmful datasets (emergent misalignment), rather than learning the specific narrow task. We successfully train narrowly misaligned models using KL regularization to preserve behavior in other domains. These models give bad medical advice, but do not respond in a misaligned manner to general non-medical questions. We use this method to train narrowly misaligned steering vectors, rank-1 LoRA adapters and rank-32 LoRA adapters, and compare these to their generally misaligned counterparts. The steering vectors are particularly interpretable; we introduce Training Lens as a tool for analysing the revealed residual stream geometry. The general misalignment solution is consistently more [...]
    Outline:
    (00:27) TL;DR
    (02:03) Introduction
    (04:03) Training a Narrowly Misaligned Model
    (07:13) Measuring Stability and Efficiency
    (10:00) Conclusion
    The original text contained 7 footnotes which were omitted from this narration.
    First published: July 14th, 2025
    Source: https://www.lesswrong.com/posts/gLDSqQm8pwNiq7qst/narrow-misalignment-is-hard-emergent-misalignment-is-easy
    Narrated by TYPE III AUDIO.
    (An illustrative sketch of the KL-regularization idea appears after the episode list.)
    --------  
    11:13
  • “Chain of Thought Monitorability: A New and Fragile Opportunity for AI Safety” by Tomek Korbak, Mikita Balesni, Vlad Mikulik, Rohin Shah
    Twitter | Paper PDF
    Seven years ago, OpenAI Five had just been released, and many people in the AI safety community expected AIs to be opaque RL agents. Luckily, we ended up with reasoning models that speak their thoughts clearly enough for us to follow along (most of the time). In a new multi-org position paper, we argue that we should try to preserve this level of reasoning transparency and turn chain of thought monitorability into a systematic AI safety agenda. This is a measure that improves safety in the medium term, and it might not scale to superintelligence even if somehow a superintelligent AI still does its reasoning in English. We hope that extending the time when chains of thought are monitorable will help us do more science on capable models, practice more safety techniques "at an easier difficulty", and allow us to extract more useful work from [...]
    First published: July 15th, 2025
    Source: https://www.lesswrong.com/posts/7xneDbsgj6yJDJMjK/chain-of-thought-monitorability-a-new-and-fragile
    Narrated by TYPE III AUDIO.
    --------  
    2:15
  • “the jackpot age” by thiccythot
    This essay is about shifts in risk-taking towards the worship of jackpots and its broader societal implications. Imagine you are presented with this coin flip game. How many times do you flip it? At first glance the game feels like a money printer. The coin flip has a positive expected value of twenty percent of your net worth per flip, so you should flip the coin infinitely and eventually accumulate all of the wealth in the world. However, if we simulate twenty-five thousand people flipping this coin a thousand times, virtually all of them end up with approximately 0 dollars. The reason almost all outcomes go to zero is the multiplicative property of this repeated coin flip. Even though the expected value (the arithmetic mean) of the game is positive at a twenty percent gain per flip, the geometric mean is negative, meaning that the coin [...]
    First published: July 11th, 2025
    Source: https://www.lesswrong.com/posts/3xjgM7hcNznACRzBi/the-jackpot-age
    Narrated by TYPE III AUDIO.
    (A minimal simulation sketch of this game appears after the episode list.)
    --------  
    12:39
  • “Surprises and learnings from almost two months of Leo Panickssery” by Nina Panickssery
    Leo was born at 5am on the 20th of May, at home (this was an accident, but the experience has made me extremely homebirth-pilled). Before that, I was on the minimally-neurotic side when it came to expecting mothers: we purchased a bare minimum of baby stuff (diapers, baby wipes, a changing mat, a hybrid car seat/stroller, a baby bath, a few clothes), I didn’t do any parenting classes, and I hadn’t even held a baby before. I’m pretty sure the youngest child I have had a prolonged interaction with besides Leo was two. I did read a couple of books about babies so I wasn’t going in totally clueless (Cribsheet by Emily Oster, and The Science of Mom by Alice Callahan). I have never been that interested in other people’s babies or young children, but I correctly predicted that I’d be enchanted by my own baby (though naturally I can’t wait for him to [...]
    Outline:
    (02:05) Stuff I ended up buying and liking
    (04:13) Stuff I ended up buying and not liking
    (05:08) Babies are super time-consuming
    (06:22) Baby-wearing is almost magical
    (08:02) Breastfeeding is nontrivial
    (09:09) Your baby may refuse the bottle
    (09:37) Bathing a newborn was easier than expected
    (09:53) Babies love faces!
    (10:22) Leo isn't upset by loud noise
    (10:41) Probably X is normal
    (11:24) Consider having a kid (or ten)!
    First published: July 12th, 2025
    Source: https://www.lesswrong.com/posts/vFfwBYDRYtWpyRbZK/surprises-and-learnings-from-almost-two-months-of-leo
    Narrated by TYPE III AUDIO.
    --------  
    11:55
  • “An Opinionated Guide to Using Anki Correctly” by Luise
    I can’t count how many times I’ve heard variations on "I used Anki too for a while, but I got out of the habit." No one ever sticks with Anki. In my opinion, this is because no one knows how to use it correctly. In this guide, I will lay out my method of circumventing the canonical Anki death spiral, plus much advice for avoiding memorization mistakes, increasing retention, and such, based on my five years’ experience using Anki. If you only have limited time/interest, only read Part I; it’s most of the value of this guide!
    My Most Important Advice in Four Bullets: 20 cards a day. Having too many cards and a staggering review buildup is the main reason why no one ever sticks with Anki. Setting your review count to 20 daily (in deck settings) is the single most important thing you can do [...]
    Outline:
    (00:44) My Most Important Advice in Four Bullets
    (01:57) Part I: No One Ever Sticks With Anki
    (02:33) Too many cards
    (05:12) Too long cards
    (07:30) How to keep cards short -- Handles
    (10:10) How to keep cards short -- Levels
    (11:55) In 6 bullets
    (12:33) End of the most important part of the guide
    (13:09) Part II: Important Advice Other Than Sticking With Anki
    (13:15) Moderation
    (14:42) Three big memorization mistakes
    (15:12) Mistake 1: Too specific prompts
    (18:14) Mistake 2: Putting to-be-learned information in the prompt
    (24:07) Mistake 3: Memory shortcuts
    (28:27) Aside: Pushback to my approach
    (31:22) Part III: More on Breaking Things Down
    (31:47) Very short cards
    (33:56) Two-bullet cards
    (34:51) Long cards
    (37:05) Ankifying information thickets
    (39:23) Sequential breakdowns versus multiple levels of abstraction
    (40:56) Adding missing connections
    (43:56) Multiple redundant breakdowns
    (45:36) Part IV: Pro Tips If You Still Haven't Had Enough
    (45:47) Save anything for ankification instantly
    (46:47) Fix your desired retention rate
    (47:38) Spaced reminders
    (48:51) Make your own card templates and types
    (52:14) In 5 bullets
    (52:47) Conclusion
    The original text contained 4 footnotes which were omitted from this narration.
    First published: July 8th, 2025
    Source: https://www.lesswrong.com/posts/7Q7DPSk4iGFJd8DRk/an-opinionated-guide-to-using-anki-correctly
    Narrated by TYPE III AUDIO.
    (A toy simulation of review buildup appears after the episode list.)
    --------  
    54:12
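
The "Narrow Misalignment" episode above describes training narrowly misaligned models by adding a KL-regularization term that preserves behavior in other domains. The episode summary contains no code, so the following is only a minimal sketch of that idea, assuming a Hugging Face-style causal language model; `model`, `ref_model`, `task_batch`, `general_batch`, and `kl_weight` are hypothetical names, and the authors' actual setup (steering vectors, rank-1 and rank-32 LoRA adapters) is more involved.

```python
# Illustrative sketch of KL-regularized fine-tuning (not the authors' code).
# Assumes: `model` is the model being fine-tuned, `ref_model` is a frozen copy,
# `task_batch` holds the narrow fine-tuning data (e.g. bad medical advice), and
# `general_batch` holds off-domain prompts whose behavior we want to preserve.
import torch
import torch.nn.functional as F

def kl_regularized_loss(model, ref_model, task_batch, general_batch, kl_weight=1.0):
    # Standard language-modelling loss on the narrow fine-tuning data.
    task_out = model(**task_batch, labels=task_batch["input_ids"])
    task_loss = task_out.loss

    # KL penalty: keep the fine-tuned model's next-token distribution close to
    # the frozen reference model on general, off-domain prompts.
    with torch.no_grad():
        ref_logits = ref_model(**general_batch).logits
    cur_logits = model(**general_batch).logits
    kl = F.kl_div(
        F.log_softmax(cur_logits, dim=-1),
        F.log_softmax(ref_logits, dim=-1),
        log_target=True,
        reduction="batchmean",
    )
    return task_loss + kl_weight * kl
```

The intended design point is simply that the task loss pulls the model toward the narrow behavior while the KL term anchors its predictions on off-domain prompts to the original model, which is the sense in which the resulting misalignment stays narrow.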
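
The "jackpot age" episode above contrasts the +20% arithmetic expectation of a multiplicative coin-flip game with its negative geometric mean, and cites a simulation of twenty-five thousand players flipping a thousand times in which virtually everyone ends near zero. A minimal reproduction sketch follows; the +100%/-60% payoffs are an assumption chosen only because they yield the stated +20% arithmetic expectation per flip, and may not match the essay's exact numbers.

```python
# Illustrative Monte Carlo sketch (assumed payoffs, not necessarily the essay's):
# each flip multiplies wealth by 2.0 (heads) or 0.4 (tails) with equal probability.
# Arithmetic mean per flip: 0.5*2.0 + 0.5*0.4 = 1.2   (+20%)
# Geometric mean per flip:  sqrt(2.0 * 0.4) ≈ 0.894   (-10.6%), so typical wealth decays.
import numpy as np

rng = np.random.default_rng(0)
n_players, n_flips = 25_000, 1_000

multipliers = rng.choice([2.0, 0.4], size=(n_players, n_flips))
final_wealth = multipliers.prod(axis=1)  # starting wealth normalised to 1.0

# Sample mean falls far short of the theoretical 1.2**n_flips because the
# jackpot outcomes that carry the expectation are too rare to appear in the sample.
print(f"mean final wealth:   {final_wealth.mean():.3g}")
print(f"median final wealth: {np.median(final_wealth):.3g}")  # ~0: the typical player is wiped out
print(f"players below 1% of starting wealth: {(final_wealth < 0.01).mean():.1%}")
```

Under these assumed payoffs the median ends up indistinguishable from zero and essentially every player finishes below 1% of their starting wealth, matching the episode's claim that virtually all simulated players go broke.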
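
The Anki guide above argues that adding too many new cards per day creates a review backlog that makes people quit, and recommends a cap of 20 per day. As a rough illustration (not from the guide), the sketch below simulates daily review load under a toy interval-doubling schedule for different new-card rates; the `retention` value and the doubling rule are simplifying assumptions, not Anki's actual SM-2/FSRS scheduler.

```python
# Toy model of daily Anki review load (illustrative assumptions only; Anki's
# real scheduler is more sophisticated than simple interval doubling).
import random

def daily_reviews(new_per_day, days=365, retention=0.9, seed=0):
    rng = random.Random(seed)
    due = []    # due[d] = intervals of cards scheduled for review on day d
    loads = []
    for day in range(days):
        while len(due) <= day:
            due.append([])
        due[day].extend([1] * new_per_day)   # new cards enter with a 1-day interval
        loads.append(len(due[day]))
        for interval in due[day]:
            if rng.random() < retention:
                next_interval = interval * 2  # remembered: interval doubles
            else:
                next_interval = 1             # lapsed: back to a 1-day interval
            next_day = day + next_interval
            while len(due) <= next_day:
                due.append([])
            due[next_day].append(next_interval)
    return loads

for n in (5, 20, 50):
    load = daily_reviews(n)
    avg = sum(load[-30:]) / 30
    print(f"{n:>2} new cards/day -> about {avg:.0f} reviews/day after a year")
```

In this toy model the long-run review load scales roughly in proportion to the new-card rate, which is the buildup the guide warns about when arguing for a modest daily cap.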


About LessWrong (Curated & Popular)

Audio narrations of LessWrong posts. Includes all curated posts and all posts with 125+ karma. If you'd like more, subscribe to the “Lesswrong (30+ karma)” feed.
