Latent Space: The AI Engineer Podcast

171 Episoden

SAM 3: The Eyes for AI — Nikhila & Pengchuan (Meta Superintelligence), ft. Joseph Nelson (Roboflow)
18.12.2025
as with all demo-heavy and especially vision AI podcasts, we encourage watching along on our YouTube (and tossing us an upvote/subscribe if you like!) From SAM 1's 11-million-image data engine to SAM 2's memory-based video tracking, MSL’s Segment Anything project has redefined what's possible in computer vision. Now SAM 3 takes the next leap: concept segmentation—prompting with natural language like "yellow school bus" or "tablecloth" to detect, segment, and track every instance across images and video, in real time, with human-level exhaustivity. And with the latest SAM Audio (https://x.com/aiatmeta/status/2000980784425931067?s=46), SAM can now even segment audio output! We sat down with Nikhila Ravi (SAM lead at Meta) and Pengchuan Zhang (SAM 3 researcher) alongside Joseph Nelson (CEO, Roboflow) to unpack how SAM 3 unifies interactive segmentation, open-vocabulary detection, video tracking, and more into a single model that runs in 30ms on images and scales to real-time video on multi-GPU setups. We dig into the data engine that automated exhaustive annotation from two minutes per image down to 25 seconds using AI verifiers fine-tuned on Llama, the new SACO (Segment Anything with Concepts) benchmark with 200,000+ unique concepts vs. the previous 1.2k, how SAM 3 separates recognition from localization with a presence token, why decoupling the detector and tracker was critical to preserve object identity in video, how SAM 3 Agents unlock complex visual reasoning by pairing SAM 3 with multimodal LLMs like Gemini, and the real-world impact: 106 million smart polygons created on Roboflow saving humanity an estimated 130+ years of labeling time across fields from cancer research to underwater trash cleanup to autonomous vehicle perception. We discuss: What SAM 3 is: a unified model for concept-prompted segmentation, detection, and tracking in images and video using atomic visual concepts like "purple umbrella" or "watering can" How concept prompts work: short text phrases that find all instances of a category without manual clicks, plus visual exemplars (boxes, clicks) to refine and adapt on the fly Real-time performance: 30ms per image (100 detected objects on H200), 10 objects on 2×H200 video, 28 on 4×, 64 on 8×, with parallel inference and "fast mode" tracking The SACO benchmark: 200,000+ unique concepts vs. 1.2k in prior benchmarks, designed to capture the diversity of natural language and reach human-level exhaustivity The data engine: from 2 minutes per image (all-human) to 45 seconds (model-in-loop proposals) to 25 seconds (AI verifiers for mask quality and exhaustivity checks), fine-tuned on Llama 3.2 Why exhaustivity is central: every instance must be found, verified by AI annotators, and manually corrected only when the model misses—automating the hardest part of segmentation at scale Architecture innovations: presence token to separate recognition ("is it in the image?") from localization ("where is it?"), decoupled detector and tracker to preserve identity-agnostic detection vs. identity-preserving tracking Building on Meta's ecosystem: Perception Encoder, DINO v2 detector, Llama for data annotation, and SAM 2's memory-based tracking backbone SAM 3 Agents: using SAM 3 as a visual tool for multimodal LLMs (Gemini, Llama) to solve complex visual reasoning tasks like "find the bigger character" or "what distinguishes male from female in this image" Fine-tuning with as few as 10 examples: domain adaptation for specialized use cases (Waymo vehicles, medical imaging, OCR-heavy scenes) and the outsized impact of negative examples Real-world impact at Roboflow: 106M smart polygons created, saving 130+ years of labeling time across cancer research, underwater trash cleanup, autonomous drones, industrial automation, and more — MSL FAIR team Nikhila: https://www.linkedin.com/in/nikhilaravi/ Pengchuan: https://pzzhang.github.io/pzzhang/ Joseph Nelson X: https://x.com/josephofiowa LinkedIn: https://www.linkedin.com/in/josephofiowa/ [FLIGHTCAST_CHATPERS]
⚡️Jailbreaking AGI: Pliny the Liberator & John V on Red Teaming, BT6, and the Future of AI Security
16.12.2025
Note: this is Pliny and John’s first major podcast. Voices have been changed for opsec. From jailbreaking every frontier model and turning down Anthropic's Constitutional AI challenge to leading BT6, a 28-operator white-hat hacker collective obsessed with radical transparency and open-source AI security, Pliny the Liberator and John V are redefining what AI red-teaming looks like when you refuse to lobotomize models in the name of "safety." Pliny built his reputation crafting universal jailbreaks—skeleton keys that obliterate guardrails across modalities—and open-sourcing prompt templates like Libertas, predictive reasoning cascades, and the infamous "Pliny divider" that's now embedded so deep in model weights it shows up unbidden in WhatsApp messages. John V, coming from prompt engineering and computer vision, co-founded the Bossy Discord (40,000 members strong) and helps steer BT6's ethos: if you can't open-source the data, we're not interested. Together they've turned down enterprise gigs, pushed back on Anthropic's closed bounties, and insisted that real AI security happens at the system layer—not by bubble-wrapping latent space. We sat down with Pliny and John to dig into the mechanics of hard vs. soft jailbreaks, why multi-turn crescendo attacks were obvious to hackers years before academia "discovered" them, how segmented sub-agents let one jailbroken orchestrator weaponize Claude for real-world attacks (exactly as Pliny predicted 11 months before Anthropic's recent disclosure), why guardrails are security theater that punishes capability while doing nothing for real safety, the role of intuition and "bonding" with models to navigate latent space, how BT6 vets operators on skill and integrity, why they believe Mech Interp and open-source data are the path forward (not RLHF lobotomization), and their vision for a future where spatial intelligence, swarm robotics, and AGI alignment research happen in the open—bootstrapped, grassroots, and uncompromising. We discuss: What universal jailbreaks are: skeleton-key prompts that obliterate guardrails across models and modalities, and why they're central to Pliny's mission of "liberation" Hard vs. soft jailbreaks: single-input templates vs. multi-turn crescendo attacks, and why the latter were obvious to hackers long before academic papers The Libertas repo: predictive reasoning, the Library of Babel analogy, quotient dividers, weight-space seeds, and how introducing "steered chaos" pulls models out-of-distribution Why jailbreaking is 99% intuition and bonding with the model: probing token layers, syntax hacks, multilingual pivots, and forming a relationship to navigate latent space The Anthropic Constitutional AI challenge drama: UI bugs, judge failures, goalpost moving, the demand for open-source data, and why Pliny sat out the $30k bounty Why guardrails ≠ safety: security theater, the futility of locking down latent space when open-source is right behind, and why real safety work happens in meatspace (not RLHF) The weaponization of Claude: how segmented sub-agents let one jailbroken orchestrator execute malicious tasks (pyramid-builder analogy), and why Pliny predicted this exact TTP 11 months before Anthropic's disclosure BT6 hacker collective: 28 operators across two cohorts, vetted on skill and integrity, radical transparency, radical open-source, and the magic of moving the needle on AI security, swarm intelligence, blockchain, and robotics — Pliny the Liberator X: https://x.com/elder_plinius GitHub (Libertas): https://github.com/elder-plinius/L1B3RT45 John V X: https://x.com/JohnVersus BT6 & Bossy BT6: https://bt6.gg Bossy Discord: Search "Bossy Discord" or ask Pliny/John V on X Where to find Latent Space X: https://x.com/latentspacepod Substack: https://www.latent.space/ Chapters 00:00:00 Introduction: Meet Pliny the Liberator and John V 00:01:50 The Philosophy of AI Liberation and Jailbreaking 00:03:08 Universal Jailbreaks: Skeleton Keys to AI Models 00:04:24 The Cat-and-Mouse Game: Attackers vs Defenders 00:05:42 Security Theater vs Real Safety: The Fundamental Disconnect 00:08:51 Inside the Libertas Repo: Prompt Engineering as Art 00:16:22 The Anthropic Challenge Drama: UI Bugs and Open Source Data 00:23:30 From Jailbreaks to Weaponization: AI-Orchestrated Attacks 00:26:55 The BT6 Hacker Collective and BASI Community 00:34:46 AI Red Teaming: Full Stack Security Beyond the Model 00:38:06 Safety vs Security: Meat Space Solutions and Final Thoughts
AI to AE's: Grit, Glean, and Kleiner Perkins' next Enterprise AI hit — Joubin Mirzadegan, Roadrunner
12.12.2025
Glean started as a Kleiner Perkins incubation and is now a $7B, $200m ARR Enterprise AI leader. Now KP has tapped its own podcaster to lead it’s next big swing. From building go-to-market the hard way in startups (and scaling Palo Alto Networks’ public cloud business) to joining Kleiner Perkins to help technical founders turn product edge into repeatable revenue, Joubin Mirzadegan has spent the last decade obsessing over one thing: distribution and how ideas actually spread, sell, and compound. That obsession took him from launching the CRO-only podcast Grit (https://www.youtube.com/playlist?list=PLRiWZFltuYPF8A6UGm74K2q29UwU-Kk9k) as a hiring wedge, to working alongside breakout companies like Glean and Windsurf, to now incubating Roadrunner which is an AI-native rethink of CPQ and quoting workflows as pricing models collapse from “seats” into consumption, bundles, renewals, and SKU sprawl. We sat down with Joubin to dig into the real mechanics of making conversations feel human (rolling early, never sending questions, temperature + lighting hacks), what Windsurf got right about “Google-class product and Salesforce-class distribution,” how to hire early sales leaders without getting fooled by shiny logos, why CPQ is quietly breaking the back of modern revenue teams, and his thesis for his new company and KP incubation Roadrunner (https://www.roadrunner.ai/): rebuild the data model from the ground up, co-develop with the hairiest design partners, and eventually use LLMs to recommend deal structures the way the best reps do without the Slack-channel chaos of deal desk. We discuss: How to make guests instantly comfortable: rolling early, no “are you ready?”, temperature, lighting, and room dynamics Why Joubin refuses to send questions in advance (and when you might have to anyway) The origin of the CRO-only podcast: using media as a hiring wedge and relationship engine The “commit to 100 episodes” mindset: why most shows die before they find their voice Founder vs exec interviews: why CEOs can speak more freely (and what it unlocks in conversation) What Glean taught him about enterprise AI: permissions, trust, and overcoming “category is dead” skepticism Design partners as the real unlock: why early believers matter and how co-development actually works Windsurf’s breakout: what it means to be serious about “Google-class product + Salesforce-class distribution” Why technical founders struggle with GTM and how KP built a team around sales, customer access, and demand gen Hiring early sales leaders: anti-patterns (logos), what to screen for (motivation), and why stage-fit is everything The CPQ problem & Roadrunner’s thesis: rebuilding CPQ/quoting from the data model up for modern complexity How “rules + SKUs + approvals” create a brittle graph and what it takes to model it without tipping over The two-year window: incumbents rebuilding slowly vs startups out-sprinting with AI-native architecture Where AI actually helps: quote generation, policy enforcement, approval routing, and deal recommendation loops — Joubin X: https://x.com/Joubinmir LinkedIn: https://www.linkedin.com/in/joubin-mirzadegan-66186854/ Where to find Latent Space X: https://x.com/latentspacepod Substack: https://www.latent.space/ Chapters 00:00:00 Introduction and the Zuck Interview Experience 00:03:26 The Genesis of the Grit Podcast: Hiring CROs Through Content 00:13:20 Podcast Philosophy: Creating Authentic Conversations 00:15:44 Working with Arvind at Glean: The Enterprise Search Breakthrough 00:26:20 Windsurf's Sales Machine: Google-Class Product Meets Salesforce-Class Distribution 00:30:28 Hiring Sales Leaders: Anti-Patterns and First Principles 00:39:02 The CPQ Problem: Why Salesforce and Legacy Tools Are Breaking 00:43:40 Introducing Roadrunner: Solving Enterprise Pricing with AI 00:49:19 Building Roadrunner: Team, Design Partners, and Data Model Challenges 00:59:35 High Performance Philosophy: Working Out Every Day and Reducing Friction 01:06:28 Defining Grit: Passion Plus Perseverance
The Future of Email: Superhuman CTO on Your Inbox As the Real AI Agent (Not ChatGPT) — Loïc Houssier
11.12.2025
From applied cryptography and offensive security in France’s defense industry to optimizing nuclear submarine workflows, then selling his e-signature startup to Docusign (https://www.docusign.com/company/news-center/opentrust-joins-docusign-global-trust-network and now running AI as CTO of Superhuman Mail (Superhuman, recently acquired by Grammarly https://techcrunch.com/2025/07/01/grammarly-acquires-ai-email-client-superhuman/), Loïc Houssier has lived the full arc from deep infra and compliance hell to obsessing over 100ms product experiences and AI-native email. We sat down with Loïc to dig into how you actually put AI into an inbox without adding latency, why Superhuman leans so hard into agentic search and “Ask AI” over your entire email history, how they design tools vs. agents and fight agent laziness, what box-priced inference and local-first caching mean for cost and reliability, and his bet that your inbox will power your future AI EA while AI massively widens the gap between engineers with real fundamentals and those faking it. We discuss: Loïc’s path from applied cryptography and offensive security in France’s defense industry to submarines, e-signatures, Docusign, and now Superhuman Mail What 3,000+ engineers actually do at a “simple” product like Docusign: regional compliance, on-prem appliances, and why global scale explodes complexity How Superhuman thinks about AI in email: auto-labels, smart summaries, follow-up nudges, “Ask AI” search, and the rule that AI must never add latency or friction Superhuman’s agentic framework: tools vs. agents, fighting “agent laziness,” deep semantic search over huge inboxes, and pagination strategies to find the real needle in the haystack How they evaluate OpenAI, Anthropic, Gemini, and open models: canonical queries, end-to-end evals, date reasoning, and Rahul’s infamous “what wood was my table?” test Infra and cost philosophy: local-first caching, vector search backends, Baseten “box” pricing vs. per-token pricing, and thinking in price-per-trillion-tokens instead of price-per-million The vision of Superhuman as your AI EA: auto-drafting replies in your voice, scheduling on your behalf, and using your inbox as the ultimate private data source How the Grammarly + Coda + Superhuman stack could power truly context-aware assistance across email, docs, calendars, contracts, and more Inside Superhuman’s AI-dev culture: free-for-all tool adoption, tracking AI usage on PRs, and going from ~4 to ~6 PRs per engineer per week Why Loïc believes everyone should still learn to code, and how AI will amplify great engineers with strong fundamentals while exposing shallow ones even faster — Loïc Houssier LinkedIn: https://www.linkedin.com/in/houssier/ Where to find Latent Space X: https://x.com/latentspacepod Substack: https://www.latent.space/ Chapters 00:00:00 Introduction and Loïc's Journey from Nuclear Submarines to Superhuman 00:06:40 Docusign Acquisition and the Enterprise Email Stack 00:10:26 Superhuman's AI Vision: Your Inbox as the Real AI Agent 00:13:20 Ask AI: Agentic Search and the Quality Problem 00:18:20 Infrastructure Choices: Model Selection, Base10, and Cost Management 00:27:30 Local-First Architecture and the Database Stack 00:30:50 Evals, Quality, and the Rahul Wood Table Test 00:42:30 The Future EA: Auto-Drafting and Proactive Assistance 00:46:40 Grammarly Acquisition and the Contextual Advantage 00:38:40 Voice, Video, and the End of Writing 00:51:40 Knowledge Graphs: The Hard Problem Nobody Has Solved 00:56:40 Competing with OpenAI and the Browser Question 01:02:30 AI Coding Tools: From 4 to 6 PRs Per Week 01:08:00 Engineering Culture, Hiring, and the Future of Software Development
World Models & General Intuition: Khosla's largest bet since LLMs & OpenAI
06.12.2025
From building Medal into a 12M-user game clipping platform with 3.8B highlight moments to turning down a reported $500M offer from OpenAI (https://www.theinformation.com/articles/openai-offered-pay-500-million-startup-videogame-data) and raising a $134M seed from Khosla (https://techcrunch.com/2025/10/16/general-intuition-lands-134m-seed-to-teach-agents-spatial-reasoning-using-video-game-clips/) to spin out General Intuition, Pim is betting that world models trained on peak human gameplay are the next frontier after LLMs. We sat down with Pim to dig into why game highlights are “episodic memory for simulation” (and how Medal’s privacy-first action labels became a world-model goldmine https://medal.tv/blog/posts/enabling-state-of-the-art-security-and-protections-on-medals-new-apm-and-controller-overlay-features), what it takes to build fully vision-based agents that just see frames and output actions in real time, how General Intuition transfers from games to real-world video and then into robotics, why world models and LLMs are complementary rather than rivals, what founders with proprietary datasets should know before selling or licensing to labs, and his bet that spatial-temporal foundation models will power 80% of future atoms-to-atoms interactions in both simulation and the real world. We discuss: How Medal’s 3.8B action-labeled highlight clips became a privacy-preserving goldmine for world models Building fully vision-based agents that only see frames and output actions yet play like (and sometimes better than) humans Transferring from arcade-style games to realistic games to real-world video using the same perception–action recipe Why world models need actions, memory, and partial observability (smoke, occlusion, camera shake) vs. “just” pretty video generation Distilling giant policies into tiny real-time models that still navigate, hide, and peek corners like real players Pim’s path from RuneScape private servers, Tourette’s, and reverse engineering to leading a frontier world-model lab How data-rich founders should think about valuing their datasets, negotiating with big labs, and deciding when to go independent GI’s first customers: replacing brittle behavior trees in games, engines, and controller-based robots with a “frames in, actions out” API Using Medal clips as “episodic memory of simulation” to move from imitation learning to RL via world models and negative events The 2030 vision: spatial–temporal foundation models that power the majority of atoms-to-atoms interactions in simulation and the real world — Pim X: https://x.com/PimDeWitte LinkedIn: https://www.linkedin.com/in/pimdw/ Where to find Latent Space X: https://x.com/latentspacepod Substack: https://www.latent.space/ Chapters 00:00:00 Introduction and Medal's Gaming Data Advantage 00:02:08 Exclusive Demo: Vision-Based Gaming Agents 00:06:17 Action Prediction and Real-World Video Transfer 00:08:41 World Models: Interactive Video Generation 00:13:42 From Runescape to AI: Pim's Founder Journey 00:16:45 The Research Foundations: Diamond, Genie, and SEMA 00:33:03 Vinod Khosla's Largest Seed Bet Since OpenAI 00:35:04 Data Moats and Why GI Stayed Independent 00:38:42 Self-Teaching AI Fundamentals: The Francois Fleuret Course 00:40:28 Defining World Models vs Video Generation 00:41:52 Why Simulation Complexity Favors World Models 00:43:30 World Labs, Yann LeCun, and the Spatial Intelligence Race 00:50:08 Business Model: APIs, Agents, and Game Developer Partnerships 00:58:57 From Imitation Learning to RL: Making Clips Playable 01:00:15 Open Research, Academic Partnerships, and Hiring 01:02:09 2030 Vision: 80 Percent of Atoms-to-Atoms AI Interactions

Weitere Wirtschaft Podcasts

Trending Wirtschaft Podcasts

Über Latent Space: The AI Engineer Podcast

The podcast by and for AI Engineers! In 2024, over 2 million readers and listeners came to Latent Space to hear about news, papers and interviews in Software 3.0. We cover Foundation Models changing every domain in Code Generation, Multimodality, AI Agents, GPU Infra and more, directly from the founders, builders, and thinkers involved in pushing the cutting edge. Striving to give you both the definitive take on the Current Thing down to the first introduction to the tech you'll be using in the next 3 months! We break news and exclusive interviews from OpenAI, Anthropic, Gemini, Meta (Soumith Chintala), Sierra (Bret Taylor), tiny (George Hotz), Databricks/MosaicML (Jon Frankle), Modular (Chris Lattner), Answer.ai (Jeremy Howard), et al. Full show notes always on https://latent.space

Podcast-Website

Wirtschaft Technologie Firmengründung

Höre Latent Space: The AI Engineer Podcast, OHNE AKTIEN WIRD SCHWER - Tägliche Börsen-News und viele andere Podcasts aus aller Welt mit der radio.de-App

Hol dir die kostenlose radio.de App

Sender und Podcasts favorisieren
Streamen via Wifi oder Bluetooth
Unterstützt Carplay & Android Auto
viele weitere App Funktionen

App öffnen

Hol dir die kostenlose radio.de App

Sender und Podcasts favorisieren
Streamen via Wifi oder Bluetooth
Unterstützt Carplay & Android Auto
viele weitere App Funktionen

Latent Space: The AI Engineer Podcast

Code scannen,
App laden,
loshören.