
How AI Is Built

Nicolay Gerold
Real engineers. Real deployments. Zero hype. We interview the top engineers who actually put AI in production. Learn what the best engineers have figured out through years of experience. Hosted by Nicolay Gerold, CEO of Aisbach and CTO at Proxdeal and Multiply Content.

Available Episodes

5 of 45
  • Context is King: How Knowledge Graphs Help LLMs Reason
    Robert Caulk runs Emergent Methods, a research lab building news knowledge graphs. With a Ph.D. in computational mechanics, he spent 12 years creating open-source tools for machine learning and data analysis. His work on projects like Flowdapt (model serving) and FreqAI (adaptive modeling) has earned over 1,000 academic citations.

    His team built AskNews, which he calls "the largest news knowledge graph in production." It's a system that doesn't just collect news - it understands how events, people, and places connect.

    Current AI systems struggle to connect information across sources and domains. Simple vector search misses crucial relationships. But building knowledge graphs at scale brings major technical hurdles around entity extraction, relationship mapping, and query performance.

    Emergent Methods built a hybrid system combining vector search and knowledge graphs (a code sketch of the pattern follows these notes):
    - Vector DB (Qdrant) handles initial broad retrieval
    - Custom knowledge graph processes relationships
    - Translation pipeline normalizes multi-language content
    - Entity extraction model identifies key elements
    - Context engineering pipeline structures data for LLMs

    Implementation Details:

    Data Pipeline:
    - All content normalized to English for consistent embeddings
    - Entity names preserved in original language when untranslatable
    - Custom GLiNER News model handles entity extraction
    - Retrained every 6 months on fresh data
    - Human review validates entity accuracy

    Entity Management:
    - Base extraction uses BERT-based GLiNER architecture
    - Trained on diverse data across topics and regions
    - Disambiguation system merges duplicate entities
    - Manual override options for analysts
    - Metadata tracking preserves relationship context

    Knowledge Graph:
    - Selective graph construction from vector results
    - On-demand relationship processing
    - Graph queries via standard Cypher
    - Built for specific use cases vs. general coverage
    - Integration with S3 and other data stores

    System Validation:
    - Custom "Context is King" benchmark suite
    - RAGAS metrics track retrieval accuracy
    - Time-split validation prevents data leakage
    - Manual review of entity extraction
    - Production monitoring of query patterns

    Engineering Insights:

    Key Technical Decisions:
    - English normalization enables consistent embeddings
    - Hybrid vector + graph approach balances speed and depth
    - Selective graph construction keeps costs down
    - Human-in-the-loop validation maintains quality

    Dead Ends Hit:
    - Full multi-language entity system too complex
    - Real-time graph updates not feasible at scale
    - Pure vector or pure graph approaches insufficient

    Top Quotes:

    "At its core, context engineering is about how we feed information to AI. We want clear, focused inputs for better outputs. Think of it like talking to a smart friend - you'd give them the key facts in a way they can use, not dump raw data on them." - Robert

    "Strong metadata paints a high-fidelity picture. If we're trying to understand what's happening in Ukraine, we need to know not just what was said, but who said it, when they said it, and what voice they used to say it. Each piece adds color to the picture." - Robert

    "Clean data beats clever models. You can throw noise at an LLM and get something that looks good, but if you want real accuracy, you need to strip away the clutter first. Every piece of noise pulls the model in a different direction." - Robert

    "Think about how the answer looks in the real world. If you're comparing apartments, you'd want a table. If you're tracking events, you'd want a timeline. Match your data structure to how humans naturally process that kind of information." - Nico

    "Building knowledge graphs isn't about collecting everything - it's about finding the relationships that matter. Most applications don't need a massive graph. They need the right connections for their specific problem." - Robert

    "The quality of your context sets the ceiling for what your AI can do. You can have the best model in the world, but if you feed it noisy, unclear data, you'll get noisy, unclear answers. Garbage in, garbage out still applies." - Robert

    "When handling multiple languages, it's better to normalize everything to one language than to try juggling many. Yes, you lose some nuance, but you gain consistency. And consistency is what makes these systems reliable." - Robert

    "The hard part isn't storing the data - it's making it useful. Anyone can build a database. The trick is structuring information so an AI can actually reason with it. That's where context engineering makes the difference." - Robert

    "Start simple, then add complexity only when you need it. Most teams jump straight to sophisticated solutions when they could get better results by just cleaning their data and thinking carefully about how they structure it." - Nico

    "Every token in your context window is precious. Don't waste them on HTML tags or formatting noise. Save that space for the actual signal - the facts, relationships, and context that help the AI understand what you're asking." - Nico

    Robert Caulk: LinkedIn | Emergent Methods | AskNews
    Nicolay Gerold: LinkedIn | X (Twitter)

    Chapters:
    00:00 Introduction to Context Engineering
    00:24 Curating Input Signals
    01:01 Structuring Raw Data
    03:05 Refinement and Iteration
    04:08 Balancing Breadth and Precision
    06:10 Interview Start
    08:02 Challenges in Context Engineering
    20:25 Optimizing Context for LLMs
    45:44 Advanced Cypher Queries and Graphs
    46:43 Enrichment Pipeline Flexibility
    47:16 Combining Graph and Semantic Search
    49:23 Handling Multilingual Entities
    52:57 Disambiguation and Deduplication Challenges
    55:37 Training Models for Diverse Domains
    01:04:43 Dealing with AI-Generated Content
    01:17:32 Future Developments and Final Thoughts
    --------  
    1:33:35
  • Inside Vector Database Quantization: Product, Binary, and Scalar | S2 E23
    When you store vectors, each number takes up 32 bits. With 1,000 numbers per vector and millions of vectors, costs explode. A simple chatbot can cost thousands per month just to store and search through vectors.

    The Fix: Quantization

    Think of it like image compression. JPEGs look almost as good as raw photos but take up far less space. Quantization does the same for vectors.

    Today we are back, continuing our series on search, with Zain Hasan, a former ML engineer at Weaviate and now a Senior AI/ML Engineer at Together. We talk about the different types of quantization, when to use them, how to use them, and their tradeoffs.

    Three Ways to Quantize (a binary-quantization sketch follows these notes):

    Binary Quantization
    - Turn each number into just 0 or 1
    - Ask: "Is this dimension positive or negative?"
    - Works great for 1000+ dimensions
    - Cuts memory by 97%
    - Best for normally distributed data

    Product Quantization
    - Split the vector into chunks
    - Group similar chunks
    - Store cluster IDs instead of full numbers
    - Good when binary quantization fails
    - More complex but flexible

    Scalar Quantization
    - Use 8 bits instead of 32
    - Simple middle ground
    - Keeps more precision than binary
    - Less savings than binary

    Key Quotes:

    "Vector databases are pretty much the commercialization and the productization of representation learning."

    "I think quantization, it builds on the assumption that there is still noise in the embeddings. And if I'm looking, it's pretty similar as well to the thought of Matryoshka embeddings that I can reduce the dimensionality."

    "Going from text to multimedia in vector databases is really simple."

    "Vector databases allow you to take all the advances that are happening in machine learning and now just simply turn a switch and use them for your application."

    Zain Hasan: LinkedIn | X (Twitter) | Weaviate | Together
    Nicolay Gerold: LinkedIn | X (Twitter)

    Keywords: vector databases, quantization, hybrid search, multi-vector support, representation learning, cost reduction, memory optimization, multimodal recommender systems, brain-computer interfaces, weather prediction models, AI applications
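    To make the binary case concrete, here is a minimal NumPy sketch: keep only the sign of each dimension, so every 32-bit float becomes one bit (the ~97% memory cut above), and compare vectors by Hamming distance. The corpus size and dimensionality are made up for illustration.

```python
# Sketch: binary quantization with Hamming-distance search.
import numpy as np

rng = np.random.default_rng(0)
vectors = rng.standard_normal((100_000, 1024)).astype(np.float32)  # ~400 MB raw

# "Is this dimension positive or negative?" -> one bit per dimension.
packed = np.packbits(vectors > 0, axis=1)  # 128 bytes per vector, ~13 MB total

def search(query: np.ndarray, k: int = 10) -> np.ndarray:
    q = np.packbits(query > 0)
    # XOR then popcount gives the Hamming distance to every stored code.
    dists = np.unpackbits(packed ^ q, axis=1).sum(axis=1)
    return np.argsort(dists)[:k]

print(search(rng.standard_normal(1024).astype(np.float32)))
```

    In production you would typically rescore the top candidates with the original float vectors; the bit codes act as a cheap first-stage filter.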
    --------  
    52:12
  • Local-First Search: How to Push Search To End-Devices | S2 E22
    Alex Garcia is a developer focused on making vector search accessible and practical. As he puts it: "I'm a SQLite guy. I use SQLite for a lot of projects... I want an easier vector search thing that I don't have to install 10,000 dependencies to use."

    Core Mantra: "Simple, Local, Scalable"

    Why sqlite-vec?
    "I didn't go along thinking, 'Oh, I want to build vector search, let me find a database for it.' It was much more like: I use SQLite for a lot of projects, I want something lightweight that works in my current workflow."

    sqlite-vec uses row-oriented storage with some key design choices:
    - Vectors are stored in large chunks (megabytes) as blobs
    - Data is split across 4KB SQLite pages, which affects analytical performance
    - Currently uses brute-force linear search without ANN indexing
    - Supports binary quantization for a 32x size reduction
    - Handles tens to hundreds of thousands of vectors efficiently

    Practical limits:
    - 500ms search time for 500K vectors (768 dimensions)
    - Best performance under 100ms for user experience
    - Binary quantization enables scaling to ~1M vectors
    - Metadata filtering and partitioning coming soon

    Key advantages:
    - Fast writes for transactional workloads
    - Simple single-file database
    - Easy integration with existing SQLite applications
    - Leverages SQLite's mature storage engine

    Garcia's preferred tools for local AI:
    - Sentence Transformers models converted to GGUF format
    - llama.cpp for inference
    - Small models (30MB) for basic embeddings
    - Larger models like Arctic Embed (hundreds of MB) for recent topics
    - SQLite L-Embed extension for text embeddings
    - Transformers.js for browser-based implementations

    1. Choose Your Storage (both styles are sketched in code after these notes)
    "There's two ways of storing vectors within SQLiteVec. One way is a manual way where you just store a JSON array... [second is] using a virtual table."
    - Traditional row storage: simple, flexible, good for small vectors
    - Virtual table storage: optimized chunks, better for large datasets
    - Performance sweet spot: up to 500K vectors with 500ms search time

    2. Optimize Performance
    "With binary quantization it's 1/32 of the space... and holds up at 95 percent quality."
    - Binary quantization reduces storage 32x at roughly 95% quality
    - Default page size is 4KB - plan your vector storage accordingly
    - Metadata filtering dramatically improves search speed

    3. Integration Patterns
    "It's a single file, right? So you can like copy and paste it if you want to make a backup."
    - Two storage approaches: manual columns or virtual tables
    - Easy backups: single-file database
    - Cross-platform: desktop, mobile, IoT, browser (via WASM)

    4. Real-World Tips
    "I typically choose the really small model... it's 30 megabytes. It quantizes very easily... I like it because it's very small, quick and easy."
    - Start with smaller, efficient models (30MB range)
    - Use binary quantization before trying complex solutions
    - Plan for partitioning when scaling beyond 100K vectors

    Alex Garcia: LinkedIn | X (Twitter) | GitHub | sqlite-vec | sqlite-vss | Website
    Nicolay Gerold: LinkedIn | X (Twitter)
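    Both storage styles are easy to show in Python with the sqlite-vec bindings. A minimal sketch: a manual BLOB column you manage yourself, and a vec0 virtual table queried via MATCH. The 4-dimensional vectors and table names are illustrative only.

```python
# Sketch: the two sqlite-vec storage styles.
import sqlite3
import sqlite_vec
from sqlite_vec import serialize_float32

db = sqlite3.connect(":memory:")
db.enable_load_extension(True)
sqlite_vec.load(db)
db.enable_load_extension(False)

# Style 1: manual storage, a plain column (BLOB or JSON) in a regular table.
db.execute("CREATE TABLE docs(id INTEGER PRIMARY KEY, body TEXT, embedding BLOB)")
db.execute(
    "INSERT INTO docs(body, embedding) VALUES (?, ?)",
    ("hello world", serialize_float32([0.1, 0.2, 0.3, 0.4])),
)

# Style 2: a vec0 virtual table, which stores vectors in optimized chunked blobs.
db.execute("CREATE VIRTUAL TABLE vec_docs USING vec0(embedding float[4])")
db.execute(
    "INSERT INTO vec_docs(rowid, embedding) VALUES (?, ?)",
    (1, serialize_float32([0.1, 0.2, 0.3, 0.4])),
)

# Brute-force (linear scan) nearest neighbours, ordered by distance.
rows = db.execute(
    """
    SELECT rowid, distance FROM vec_docs
    WHERE embedding MATCH ?
    ORDER BY distance LIMIT 5
    """,
    (serialize_float32([0.1, 0.2, 0.3, 0.4]),),
).fetchall()
print(rows)
```

    The whole database, vectors included, stays a single file, which is what makes the copy-the-file backup workflow from the quote possible.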
    --------  
    53:09
  • AI-Powered Search: Context Is King, But Your RAG System Ignores Two-Thirds of It | S2 E21
    Today, I (Nicolay Gerold) sit down with Trey Grainger, author of the book AI-Powered Search. We discuss the different techniques for search and recommendations and how to combine them.

    While RAG (Retrieval-Augmented Generation) has become a buzzword in AI, Trey argues that the current understanding of "RAG" is overly simplified: it's actually a bidirectional process he calls "GARRAG," where retrieval and generation continuously enhance each other.

    Trey uses a three-context framework for search architecture:
    - Content Context: traditional document understanding and retrieval
    - User Context: behavioral signals driving personalization and recommendations
    - Domain Context: knowledge graphs and semantic understanding

    Trey shares insights on:
    - Why collecting and properly using user behavior signals is crucial yet often overlooked
    - How to implement "light touch" personalization without trapping users in filter bubbles
    - The evolution from simple vector similarity to sophisticated late interaction models
    - Why treating search as a non-linear pipeline with feedback loops leads to better results

    For engineers building search systems, Trey offers practical advice on choosing the right tools and techniques, from traditional search engines like Solr and Elasticsearch to modern approaches like ColBERT, and on how to layer different techniques to make search tunable and debuggable (a toy ranking sketch follows these notes).

    Quotes:

    "I think of whether it's search or generative AI, I think of all of these systems as nonlinear pipelines."

    "The reason we use retrieval when we're working with generative AI is because a generative AI model, these LLMs, will take your query, your request, whatever you're asking for. They will then try to interpret it, and without access to up-to-date information, without access to correct information, they will generate a response from their highly compressed understanding of the world. And so we use retrieval to augment them with information."

    "I think the misconception is that, oh, hey, for RAG I can just plug in a vector database and a couple of libraries and, a day or two later, everything's magically working and I'm off to solve the next problem. Because search and information retrieval is one of those problems that you never really solve. You get it good enough and quit, or you find so much value in it, you just continue investing to constantly make it better."

    "To me, search and recommendations are fundamentally the same problem. They're just using different contexts."

    "Anytime you're building a search system, whether it's traditional search, whether it's RAG for generative AI, you need to have all three of those contexts in order to effectively get the most relevant results to solve the problem."

    "There's no better way to make your users really angry with you than to stick them in a bucket and get them stuck in that bucket, which is not their actual intent."

    Trey Grainger: LinkedIn | AI-Powered Search (Community) | AI-Powered Search (Book)
    Nicolay Gerold: LinkedIn | X (Twitter)

    Chapters:
    00:00 Introduction to Search Challenges
    00:50 Layered Approach to Ranking
    01:00 Personalization and Signal Boosting
    02:25 Broader Principles in Software Engineering
    02:51 Interview with Trey Grainger
    03:32 Understanding RAG and Retrieval
    04:35 Nonlinear Pipelines in Search
    06:01 Generative AI and Retrieval
    08:10 Search Renaissance and AI
    10:27 Misconceptions in AI-Powered Search
    18:12 Search vs. Recommendation Systems
    22:26 Three Buckets of Relevance
    38:19 Traditional Learning to Rank
    39:11 Semantic Relevance and User Behavior
    39:53 Layered Ranking Algorithms
    41:40 Personalization in Search
    43:44 Technological Setup for Query Understanding
    48:21 Personalization and User Behavior Vectors
    52:10 Choosing the Right Search Engine
    56:35 Future of AI-Powered Search
    01:00:48 Building Effective Search Applications
    01:06:50 Three Critical Context Frameworks
    01:12:08 Modern Search Systems and Contextual Understanding
    01:13:37 Conclusion and Recommendations
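    As a toy illustration of that layering, the sketch below combines a content-layer score with capped user-context and domain-context boosts, one weight per layer so each can be tuned and debugged in isolation. The signals, weights, and caps are invented for this example; they are not from the episode or the book.

```python
# Sketch: layered ranking across content, user, and domain contexts.
from dataclasses import dataclass

@dataclass
class Doc:
    id: str
    content_score: float  # from the retrieval layer (BM25, dense, or hybrid)
    click_count: int      # aggregated behavior signal for this query
    graph_hops: int       # distance from the query's entities in a domain graph

def rank(docs: list[Doc], w_content=1.0, w_user=0.3, w_domain=0.2) -> list[Doc]:
    def score(d: Doc) -> float:
        # Capped boost keeps personalization "light touch" and avoids
        # locking users into a filter-bubble bucket.
        user_boost = min(d.click_count, 100) / 100
        domain_boost = 1.0 / (1 + d.graph_hops)  # closer entities score higher
        return (w_content * d.content_score
                + w_user * user_boost
                + w_domain * domain_boost)
    return sorted(docs, key=score, reverse=True)

print(rank([Doc("a", 2.1, 500, 1), Doc("b", 2.4, 3, 4)]))
```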
    --------  
    1:14:24
  • Chunking for RAG: Stop Breaking Your Documents Into Meaningless Pieces | S2 E20
    Today we are back, continuing our series on search. We are talking to Brandon Smith about his work for Chroma, where he led one of the largest studies in the field on different chunking techniques. Today we look at how we can unfuck our RAG systems from badly chosen chunking hyperparameters.

    The biggest lie in RAG is that semantic search is simple. The reality is that it's easy to build and easy to get up and running, but it's really hard to get right. And if you don't have a good setup, it's near impossible to debug. One of the reasons it's so hard is chunking, and there are a lot of things you can get wrong.

    Even OpenAI botched it a little bit, in my opinion, by using an 800-token length for their chunks. That might work for legal documents, where you have a lot of boilerplate that carries little semantic meaning, but often you have the opposite: very information-dense content. Imagine fitting an entire Wikipedia page into the size of a tweet. A lot of information is lost, and that is what happens with long chunks.

    The next issue is overlap. OpenAI uses a 400-token overlap, or used to. The idea is to bring the important context into the chunk, but in reality we don't really know where the context is coming from. It could be from a few pages prior, not just the 400 tokens before. It could also be from a definition that's not even in the document at all. There is a really interesting solution from Anthropic, contextual retrieval, where you preprocess all the chunks to check whether information is missing and try to reintroduce it. (Both ideas are sketched in code after these notes.)

    Brandon Smith: LinkedIn | X (Twitter) | Website | Chunking Article
    Nicolay Gerold: LinkedIn | X (Twitter) | Website

    Chapters:
    00:00 The Biggest Lie in RAG: Semantic Search Simplified
    00:43 Challenges in Chunking and Overlap
    01:38 Introducing Brandon Smith and His Research
    02:05 The Motivation and Mechanics of Chunking
    04:40 Issues with Current Chunking Methods
    07:04 Optimizing Chunking Strategies
    23:04 Introduction to Chunk Overlap
    24:23 Exploring LLM-Based Chunking
    24:56 Challenges with Initial Approaches
    28:17 Alternative Chunking Methods
    36:13 Language-Specific Considerations
    38:41 Future Directions and Best Practices
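    The two ideas above, fixed-size token windows with overlap and Anthropic-style contextual retrieval, fit in a few lines. The 800/400 defaults mirror the numbers discussed; tiktoken is just one convenient tokenizer choice, and generate_context is a placeholder for an LLM call, not a real API.

```python
# Sketch: token-window chunking with overlap, plus contextual enrichment.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

def chunk(text: str, size: int = 800, overlap: int = 400) -> list[str]:
    tokens = enc.encode(text)
    step = size - overlap  # each chunk re-includes the previous 400 tokens
    return [enc.decode(tokens[i:i + size]) for i in range(0, len(tokens), step)]

def generate_context(document: str, chunk_text: str) -> str:
    # Placeholder: ask an LLM to situate the chunk within the full document
    # and return the missing context (definitions, earlier references, ...).
    raise NotImplementedError

def contextualize(document: str) -> list[str]:
    # Anthropic-style contextual retrieval: prepend the recovered context to
    # each chunk before embedding, so meaning lost at window boundaries returns.
    return [f"{generate_context(document, c)}\n\n{c}" for c in chunk(document)]
```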
    --------  
    49:13
