
How AI Is Built

Nicolay Gerold
Real engineers. Real deployments. Zero hype. We interview the top engineers who actually put AI in production. Learn what the best engineers have figured out through years of experience. Hosted by Nicolay Gerold, CEO of Aisbach and CTO at Proxdeal and Multiply Content.

Available Episodes

5 of 45
  • Context is King: How Knowledge Graphs Help LLMs Reason
    Robert Caulk runs Emergent Methods, a research lab building news knowledge graphs. With a Ph.D. in computational mechanics, he spent 12 years creating open-source tools for machine learning and data analysis. His work on projects like Flowdapt (model serving) and FreqAI (adaptive modeling) has earned over 1,000 academic citations.

    His team built AskNews, which he calls "the largest news knowledge graph in production." It's a system that doesn't just collect news - it understands how events, people, and places connect.

    Current AI systems struggle to connect information across sources and domains. Simple vector search misses crucial relationships. But building knowledge graphs at scale brings major technical hurdles around entity extraction, relationship mapping, and query performance.

    Emergent Methods built a hybrid system combining vector search and knowledge graphs (a code sketch of the pattern follows these notes):
    - Vector DB (Qdrant) handles initial broad retrieval
    - Custom knowledge graph processes relationships
    - Translation pipeline normalizes multi-language content
    - Entity extraction model identifies key elements
    - Context engineering pipeline structures data for LLMs

    Implementation Details:

    Data Pipeline:
    - All content normalized to English for consistent embeddings
    - Entity names preserved in original language when untranslatable
    - Custom GLiNER News model handles entity extraction
    - Retrained every 6 months on fresh data
    - Human review validates entity accuracy

    Entity Management:
    - Base extraction uses BERT-based GLiNER architecture
    - Trained on diverse data across topics and regions
    - Disambiguation system merges duplicate entities
    - Manual override options for analysts
    - Metadata tracking preserves relationship context

    Knowledge Graph:
    - Selective graph construction from vector results
    - On-demand relationship processing
    - Graph queries via standard Cypher
    - Built for specific use cases vs. general coverage
    - Integration with S3 and other data stores

    System Validation:
    - Custom "Context is King" benchmark suite
    - RAGAS metrics track retrieval accuracy
    - Time-split validation prevents data leakage
    - Manual review of entity extraction
    - Production monitoring of query patterns

    Engineering Insights:

    Key Technical Decisions:
    - English normalization enables consistent embeddings
    - Hybrid vector + graph approach balances speed and depth
    - Selective graph construction keeps costs down
    - Human-in-the-loop validation maintains quality

    Dead Ends Hit:
    - Full multi-language entity system too complex
    - Real-time graph updates not feasible at scale
    - Pure vector or pure graph approaches insufficient

    Top Quotes:

    "At its core, context engineering is about how we feed information to AI. We want clear, focused inputs for better outputs. Think of it like talking to a smart friend - you'd give them the key facts in a way they can use, not dump raw data on them." - Robert

    "Strong metadata paints a high-fidelity picture. If we're trying to understand what's happening in Ukraine, we need to know not just what was said, but who said it, when they said it, and what voice they used to say it. Each piece adds color to the picture." - Robert

    "Clean data beats clever models. You can throw noise at an LLM and get something that looks good, but if you want real accuracy, you need to strip away the clutter first. Every piece of noise pulls the model in a different direction." - Robert

    "Think about how the answer looks in the real world. If you're comparing apartments, you'd want a table. If you're tracking events, you'd want a timeline. Match your data structure to how humans naturally process that kind of information." - Nico

    "Building knowledge graphs isn't about collecting everything - it's about finding the relationships that matter. Most applications don't need a massive graph. They need the right connections for their specific problem." - Robert

    "The quality of your context sets the ceiling for what your AI can do. You can have the best model in the world, but if you feed it noisy, unclear data, you'll get noisy, unclear answers. Garbage in, garbage out still applies." - Robert

    "When handling multiple languages, it's better to normalize everything to one language than to try juggling many. Yes, you lose some nuance, but you gain consistency. And consistency is what makes these systems reliable." - Robert

    "The hard part isn't storing the data - it's making it useful. Anyone can build a database. The trick is structuring information so an AI can actually reason with it. That's where context engineering makes the difference." - Robert

    "Start simple, then add complexity only when you need it. Most teams jump straight to sophisticated solutions when they could get better results by just cleaning their data and thinking carefully about how they structure it." - Nico

    "Every token in your context window is precious. Don't waste them on HTML tags or formatting noise. Save that space for the actual signal - the facts, relationships, and context that help the AI understand what you're asking." - Nico

    Robert Caulk: LinkedIn | Emergent Methods | AskNews
    Nicolay Gerold: LinkedIn | X (Twitter)

    Chapters:
    00:00 Introduction to Context Engineering
    00:24 Curating Input Signals
    01:01 Structuring Raw Data
    03:05 Refinement and Iteration
    04:08 Balancing Breadth and Precision
    06:10 Interview Start
    08:02 Challenges in Context Engineering
    20:25 Optimizing Context for LLMs
    45:44 Advanced Cypher Queries and Graphs
    46:43 Enrichment Pipeline Flexibility
    47:16 Combining Graph and Semantic Search
    49:23 Handling Multilingual Entities
    52:57 Disambiguation and Deduplication Challenges
    55:37 Training Models for Diverse Domains
    01:04:43 Dealing with AI-Generated Content
    01:17:32 Future Developments and Final Thoughts
    --------  
    1:33:35
  • Inside Vector Database Quantization: Product, Binary, and Scalar | S2 E23
    When you store vectors, each number takes up 32 bits. With 1,000 numbers per vector and millions of vectors, costs explode. A simple chatbot can cost thousands per month just to store and search through vectors.

    The Fix: Quantization

    Think of it like image compression. JPEGs look almost as good as raw photos but take up far less space. Quantization does the same for vectors.

    Today we are back, continuing our series on search, with Zain Hasan, a former ML engineer at Weaviate and now a Senior AI/ML Engineer at Together. We talk about the different types of quantization, when to use them, how to use them, and their tradeoffs.

    Three Ways to Quantize (a binary-quantization sketch follows these notes):

    Binary Quantization
    - Turn each number into just 0 or 1
    - Ask: "Is this dimension positive or negative?"
    - Works great for 1000+ dimensions
    - Cuts memory by 97%
    - Best for normally distributed data

    Product Quantization
    - Split the vector into chunks
    - Group similar chunks
    - Store cluster IDs instead of full numbers
    - Good when binary quantization fails
    - More complex but flexible

    Scalar Quantization
    - Use 8 bits instead of 32
    - Simple middle ground
    - Keeps more precision than binary
    - Less savings than binary

    Key Quotes:

    "Vector databases are pretty much the commercialization and the productization of representation learning."

    "I think quantization, it builds on the assumption that there is still noise in the embeddings. And if I'm looking, it's pretty similar as well to the thought of Matryoshka embeddings that I can reduce the dimensionality."

    "Going from text to multimedia in vector databases is really simple."

    "Vector databases allow you to take all the advances that are happening in machine learning and now just simply turn a switch and use them for your application."

    Zain Hasan: LinkedIn | X (Twitter) | Weaviate | Together
    Nicolay Gerold: LinkedIn | X (Twitter)

    Keywords: vector databases, quantization, hybrid search, multi-vector support, representation learning, cost reduction, memory optimization, multimodal recommender systems, brain-computer interfaces, weather prediction models, AI applications
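    To make the binary case concrete, here is a minimal NumPy sketch: keep only the sign of each dimension, so every 32-bit float becomes one bit (the ~97% memory cut above), and compare vectors by Hamming distance. The corpus size and dimensionality are made up for illustration.

```python
# Sketch: binary quantization with Hamming-distance search.
import numpy as np

rng = np.random.default_rng(0)
vectors = rng.standard_normal((100_000, 1024)).astype(np.float32)  # ~400 MB raw

# "Is this dimension positive or negative?" -> one bit per dimension.
packed = np.packbits(vectors > 0, axis=1)  # 128 bytes per vector, ~13 MB total

def search(query: np.ndarray, k: int = 10) -> np.ndarray:
    q = np.packbits(query > 0)
    # XOR then popcount gives the Hamming distance to every stored code.
    dists = np.unpackbits(packed ^ q, axis=1).sum(axis=1)
    return np.argsort(dists)[:k]

print(search(rng.standard_normal(1024).astype(np.float32)))
```

    In production you would typically rescore the top candidates with the original float vectors; the bit codes act as a cheap first-stage filter.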
    --------  
    52:12
  • Local-First Search: How to Push Search To End-Devices | S2 E22
    Alex Garcia is a developer focused on making vector search accessible and practical. As he puts it: "I'm a SQLite guy. I use SQLite for a lot of projects... I want an easier vector search thing that I don't have to install 10,000 dependencies to use."

    Core Mantra: "Simple, Local, Scalable"

    Why sqlite-vec?
    "I didn't go along thinking, 'Oh, I want to build vector search, let me find a database for it.' It was much more like: I use SQLite for a lot of projects, I want something lightweight that works in my current workflow."

    sqlite-vec uses row-oriented storage with some key design choices:
    - Vectors are stored in large chunks (megabytes) as blobs
    - Data is split across 4KB SQLite pages, which affects analytical performance
    - Currently uses brute-force linear search without ANN indexing
    - Supports binary quantization for a 32x size reduction
    - Handles tens to hundreds of thousands of vectors efficiently

    Practical limits:
    - 500ms search time for 500K vectors (768 dimensions)
    - Best performance under 100ms for user experience
    - Binary quantization enables scaling to ~1M vectors
    - Metadata filtering and partitioning coming soon

    Key advantages:
    - Fast writes for transactional workloads
    - Simple single-file database
    - Easy integration with existing SQLite applications
    - Leverages SQLite's mature storage engine

    Garcia's preferred tools for local AI:
    - Sentence Transformers models converted to GGUF format
    - llama.cpp for inference
    - Small models (30MB) for basic embeddings
    - Larger models like Arctic Embed (hundreds of MB) for recent topics
    - SQLite L-Embed extension for text embeddings
    - Transformers.js for browser-based implementations

    1. Choose Your Storage (both styles are sketched in code after these notes)
    "There's two ways of storing vectors within SQLiteVec. One way is a manual way where you just store a JSON array... [second is] using a virtual table."
    - Traditional row storage: simple, flexible, good for small vectors
    - Virtual table storage: optimized chunks, better for large datasets
    - Performance sweet spot: up to 500K vectors with 500ms search time

    2. Optimize Performance
    "With binary quantization it's 1/32 of the space... and holds up at 95 percent quality."
    - Binary quantization reduces storage 32x at roughly 95% quality
    - Default page size is 4KB - plan your vector storage accordingly
    - Metadata filtering dramatically improves search speed

    3. Integration Patterns
    "It's a single file, right? So you can like copy and paste it if you want to make a backup."
    - Two storage approaches: manual columns or virtual tables
    - Easy backups: single-file database
    - Cross-platform: desktop, mobile, IoT, browser (via WASM)

    4. Real-World Tips
    "I typically choose the really small model... it's 30 megabytes. It quantizes very easily... I like it because it's very small, quick and easy."
    - Start with smaller, efficient models (30MB range)
    - Use binary quantization before trying complex solutions
    - Plan for partitioning when scaling beyond 100K vectors

    Alex Garcia: LinkedIn | X (Twitter) | GitHub | sqlite-vec | sqlite-vss | Website
    Nicolay Gerold: LinkedIn | X (Twitter)
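    Both storage styles are easy to show in Python with the sqlite-vec bindings. A minimal sketch: a manual BLOB column you manage yourself, and a vec0 virtual table queried via MATCH. The 4-dimensional vectors and table names are illustrative only.

```python
# Sketch: the two sqlite-vec storage styles.
import sqlite3
import sqlite_vec
from sqlite_vec import serialize_float32

db = sqlite3.connect(":memory:")
db.enable_load_extension(True)
sqlite_vec.load(db)
db.enable_load_extension(False)

# Style 1: manual storage, a plain column (BLOB or JSON) in a regular table.
db.execute("CREATE TABLE docs(id INTEGER PRIMARY KEY, body TEXT, embedding BLOB)")
db.execute(
    "INSERT INTO docs(body, embedding) VALUES (?, ?)",
    ("hello world", serialize_float32([0.1, 0.2, 0.3, 0.4])),
)

# Style 2: a vec0 virtual table, which stores vectors in optimized chunked blobs.
db.execute("CREATE VIRTUAL TABLE vec_docs USING vec0(embedding float[4])")
db.execute(
    "INSERT INTO vec_docs(rowid, embedding) VALUES (?, ?)",
    (1, serialize_float32([0.1, 0.2, 0.3, 0.4])),
)

# Brute-force (linear scan) nearest neighbours, ordered by distance.
rows = db.execute(
    """
    SELECT rowid, distance FROM vec_docs
    WHERE embedding MATCH ?
    ORDER BY distance LIMIT 5
    """,
    (serialize_float32([0.1, 0.2, 0.3, 0.4]),),
).fetchall()
print(rows)
```

    The whole database, vectors included, stays a single file, which is what makes the copy-the-file backup workflow from the quote possible.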
    --------  
    53:09
  • AI-Powered Search: Context Is King, But Your RAG System Ignores Two-Thirds of It | S2 E21
    Today, I (Nicolay Gerold) sit down with Trey Grainger, author of the book AI-Powered Search. We discuss the different techniques for search and recommendations and how to combine them.

    While RAG (Retrieval-Augmented Generation) has become a buzzword in AI, Trey argues that the current understanding of "RAG" is overly simplified: it's actually a bidirectional process he calls "GARRAG," where retrieval and generation continuously enhance each other.

    Trey uses a three-context framework for search architecture:
    - Content Context: traditional document understanding and retrieval
    - User Context: behavioral signals driving personalization and recommendations
    - Domain Context: knowledge graphs and semantic understanding

    Trey shares insights on:
    - Why collecting and properly using user behavior signals is crucial yet often overlooked
    - How to implement "light touch" personalization without trapping users in filter bubbles
    - The evolution from simple vector similarity to sophisticated late interaction models
    - Why treating search as a non-linear pipeline with feedback loops leads to better results

    For engineers building search systems, Trey offers practical advice on choosing the right tools and techniques, from traditional search engines like Solr and Elasticsearch to modern approaches like ColBERT, and on how to layer different techniques to make search tunable and debuggable (a toy ranking sketch follows these notes).

    Quotes:

    "I think of whether it's search or generative AI, I think of all of these systems as nonlinear pipelines."

    "The reason we use retrieval when we're working with generative AI is because a generative AI model, these LLMs, will take your query, your request, whatever you're asking for. They will then try to interpret it, and without access to up-to-date information, without access to correct information, they will generate a response from their highly compressed understanding of the world. And so we use retrieval to augment them with information."

    "I think the misconception is that, oh, hey, for RAG I can just plug in a vector database and a couple of libraries and, a day or two later, everything's magically working and I'm off to solve the next problem. Because search and information retrieval is one of those problems that you never really solve. You get it good enough and quit, or you find so much value in it, you just continue investing to constantly make it better."

    "To me, search and recommendations are fundamentally the same problem. They're just using different contexts."

    "Anytime you're building a search system, whether it's traditional search, whether it's RAG for generative AI, you need to have all three of those contexts in order to effectively get the most relevant results to solve the problem."

    "There's no better way to make your users really angry with you than to stick them in a bucket and get them stuck in that bucket, which is not their actual intent."

    Trey Grainger: LinkedIn | AI-Powered Search (Community) | AI-Powered Search (Book)
    Nicolay Gerold: LinkedIn | X (Twitter)

    Chapters:
    00:00 Introduction to Search Challenges
    00:50 Layered Approach to Ranking
    01:00 Personalization and Signal Boosting
    02:25 Broader Principles in Software Engineering
    02:51 Interview with Trey Grainger
    03:32 Understanding RAG and Retrieval
    04:35 Nonlinear Pipelines in Search
    06:01 Generative AI and Retrieval
    08:10 Search Renaissance and AI
    10:27 Misconceptions in AI-Powered Search
    18:12 Search vs. Recommendation Systems
    22:26 Three Buckets of Relevance
    38:19 Traditional Learning to Rank
    39:11 Semantic Relevance and User Behavior
    39:53 Layered Ranking Algorithms
    41:40 Personalization in Search
    43:44 Technological Setup for Query Understanding
    48:21 Personalization and User Behavior Vectors
    52:10 Choosing the Right Search Engine
    56:35 Future of AI-Powered Search
    01:00:48 Building Effective Search Applications
    01:06:50 Three Critical Context Frameworks
    01:12:08 Modern Search Systems and Contextual Understanding
    01:13:37 Conclusion and Recommendations
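    As a toy illustration of that layering, the sketch below combines a content-layer score with capped user-context and domain-context boosts, one weight per layer so each can be tuned and debugged in isolation. The signals, weights, and caps are invented for this example; they are not from the episode or the book.

```python
# Sketch: layered ranking across content, user, and domain contexts.
from dataclasses import dataclass

@dataclass
class Doc:
    id: str
    content_score: float  # from the retrieval layer (BM25, dense, or hybrid)
    click_count: int      # aggregated behavior signal for this query
    graph_hops: int       # distance from the query's entities in a domain graph

def rank(docs: list[Doc], w_content=1.0, w_user=0.3, w_domain=0.2) -> list[Doc]:
    def score(d: Doc) -> float:
        # Capped boost keeps personalization "light touch" and avoids
        # locking users into a filter-bubble bucket.
        user_boost = min(d.click_count, 100) / 100
        domain_boost = 1.0 / (1 + d.graph_hops)  # closer entities score higher
        return (w_content * d.content_score
                + w_user * user_boost
                + w_domain * domain_boost)
    return sorted(docs, key=score, reverse=True)

print(rank([Doc("a", 2.1, 500, 1), Doc("b", 2.4, 3, 4)]))
```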
    --------  
    1:14:24
  • Chunking for RAG: Stop Breaking Your Documents Into Meaningless Pieces | S2 E20
    Today we are back, continuing our series on search. We are talking to Brandon Smith about his work for Chroma, where he led one of the largest studies in the field on different chunking techniques. Today we look at how we can unfuck our RAG systems from badly chosen chunking hyperparameters.

    The biggest lie in RAG is that semantic search is simple. The reality is that it's easy to build and easy to get up and running, but it's really hard to get right. And if you don't have a good setup, it's near impossible to debug. One of the reasons it's so hard is chunking, and there are a lot of things you can get wrong.

    Even OpenAI botched it a little bit, in my opinion, by using an 800-token length for their chunks. That might work for legal documents, where you have a lot of boilerplate that carries little semantic meaning, but often you have the opposite: very information-dense content. Imagine fitting an entire Wikipedia page into the size of a tweet. A lot of information is lost, and that is what happens with long chunks.

    The next issue is overlap. OpenAI uses a 400-token overlap, or used to. The idea is to bring the important context into the chunk, but in reality we don't really know where the context is coming from. It could be from a few pages prior, not just the 400 tokens before. It could also be from a definition that's not even in the document at all. There is a really interesting solution from Anthropic, contextual retrieval, where you preprocess all the chunks to check whether information is missing and try to reintroduce it. (Both ideas are sketched in code after these notes.)

    Brandon Smith: LinkedIn | X (Twitter) | Website | Chunking Article
    Nicolay Gerold: LinkedIn | X (Twitter) | Website

    Chapters:
    00:00 The Biggest Lie in RAG: Semantic Search Simplified
    00:43 Challenges in Chunking and Overlap
    01:38 Introducing Brandon Smith and His Research
    02:05 The Motivation and Mechanics of Chunking
    04:40 Issues with Current Chunking Methods
    07:04 Optimizing Chunking Strategies
    23:04 Introduction to Chunk Overlap
    24:23 Exploring LLM-Based Chunking
    24:56 Challenges with Initial Approaches
    28:17 Alternative Chunking Methods
    36:13 Language-Specific Considerations
    38:41 Future Directions and Best Practices
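    The two ideas above, fixed-size token windows with overlap and Anthropic-style contextual retrieval, fit in a few lines. The 800/400 defaults mirror the numbers discussed; tiktoken is just one convenient tokenizer choice, and generate_context is a placeholder for an LLM call, not a real API.

```python
# Sketch: token-window chunking with overlap, plus contextual enrichment.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

def chunk(text: str, size: int = 800, overlap: int = 400) -> list[str]:
    tokens = enc.encode(text)
    step = size - overlap  # each chunk re-includes the previous 400 tokens
    return [enc.decode(tokens[i:i + size]) for i in range(0, len(tokens), step)]

def generate_context(document: str, chunk_text: str) -> str:
    # Placeholder: ask an LLM to situate the chunk within the full document
    # and return the missing context (definitions, earlier references, ...).
    raise NotImplementedError

def contextualize(document: str) -> list[str]:
    # Anthropic-style contextual retrieval: prepend the recovered context to
    # each chunk before embedding, so meaning lost at window boundaries returns.
    return [f"{generate_context(document, c)}\n\n{c}" for c in chunk(document)]
```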
    --------  
    49:13
