Build AI
Posts
LLM-as-a-judge: the measurement problem
You've built something and you need to know if it works. So you do what's sensible—you ask an LLM to grade it. Factual accuracy, code quality, agent outputs. The machine judges the machine, and you get a number you can act on. Except that number
Continue reading -
Claude Opus 4.5: effort control
Claude Opus 4.5 is the newest model from Anthropic, the folks behind the Claude family. Think of it as their latest and smartest tool for handling really complicated tasks: like having an assistant who can juggle lots of jobs at once and still keep everything running smoothly. So,
Continue reading -
Apps SDK Brings Custom UI to ChatGPT
This week's edition covers building custom interfaces in ChatGPT, Google's Veo 3.1 video generation with native audio, multi-turn agent evaluation, and monitoring agent reasoning.
Continue reading -
OpenAI: apps inside ChatGPT
OpenAI has just launched something called the Apps SDK, and it’s a bit like giving developers a new set of building blocks for ChatGPT. Instead of just chatting, you can now create apps that live right inside the conversation, with their own custom look and feel. The SDK builds
Continue reading -
Veo 3.1: native audio and reference controls
Veo is Google's latest attempt to teach computers how to make videos from scratch. Now in version 3.1, it's available for anyone willing to pay for early access, either through Google AI Studio or Vertex AI. You can choose between the regular version or a
Continue reading -
Pydantic: Evals
Pydantic Evals is a tool for Python that lets you watch, step by step, how your AI agents go about solving problems. It’s made by the same people who built the popular Pydantic data validation library. What makes it interesting is that it doesn’t just check if your
Continue reading -
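The idea the teaser describes can be sketched with the standard library alone: run each test case through an agent, record the steps it took along the way, and check the final output. Note this is a conceptual sketch of the pattern, not Pydantic Evals' real API; `Case`, `Result`, and the toy `shout` agent are invented names.

```python
from dataclasses import dataclass, field

@dataclass
class Case:
    name: str
    inputs: str
    expected: str

@dataclass
class Result:
    case: Case
    output: str
    trace: list = field(default_factory=list)  # steps the agent took

    @property
    def passed(self) -> bool:
        # Final-answer check; the trace shows *how* the agent got there.
        return self.output == self.case.expected

def run_case(case: Case, task) -> Result:
    trace = []
    output = task(case.inputs, trace)  # the task records its steps into trace
    return Result(case, output, trace)

# Toy "agent": uppercases its input, logging each step it takes.
def shout(text: str, trace: list) -> str:
    trace.append(f"received: {text}")
    out = text.upper()
    trace.append(f"returned: {out}")
    return out

results = [run_case(c, shout) for c in [Case("greeting", "hi", "HI")]]
print(all(r.passed for r in results), results[0].trace)
```

The point of keeping the trace alongside the pass/fail verdict is exactly what the post highlights: a case can pass while the trace reveals the agent took a wasteful or fragile path.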
LangSmith: multi-turn evaluation
Imagine you’re chatting with an AI, asking it to help you book a flight. It might give you the right answer to every single question you ask, but somehow, you still end up without a ticket. That’s where multi-turn evaluations come in. Instead of just checking if each
Continue reading -
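The flight-booking example above is precisely the gap between per-turn and trajectory-level checks, and it fits in a few lines. The conversation and both checks below are invented illustrations, not LangSmith's API.

```python
# A conversation as (user, assistant) turns.
convo = [
    ("Find flights to Paris", "Here are three options: A, B, C."),
    ("Book option B", "Option B is a great choice!"),  # polite, but no booking
]

def turn_level_ok(convo) -> bool:
    # Naive per-turn check: every reply is non-empty (each answer "looks" fine).
    return all(reply for _, reply in convo)

def trajectory_ok(convo) -> bool:
    # Multi-turn check: did the conversation as a whole achieve the goal?
    return any("booked" in reply.lower() for _, reply in convo)

print(turn_level_ok(convo), trajectory_ok(convo))  # True False
```

Every turn passes the local check, yet the trajectory check fails: the user still has no ticket.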
Give Claude Memory and Skills via API
This week's edition covers Anthropic's new memory and Agent Skills APIs for building agents, Karpathy's transparent LLM training pipeline, on-device inference with Windows ML, and circuit-based interpretability tools that cut data requirements by 150x.
Continue reading -
Karpathy: nanochat
nanochat is Karpathy’s attempt to strip LLM training down to its bare essentials. It’s about 8,000 lines of code, and it’s designed to be read and understood, not just run. Unlike the big, complicated frameworks you find in production, this one is all about showing you
Continue reading -
The health tech paradox
Picture this: you buy a shiny new health gadget that claims it will look after you, no effort required. It sounds like the dream. But there’s a problem. Even the most hands-off technology still asks something from you. Mild cognitive impairment, or MCI, is a condition in which your memory and thinking
Continue reading -
Google Research: the language of biology
Imagine you could talk to cells and ask them what they’re up to. That’s more or less what Cell2Sentence-Scale (C2S-Scale) lets you do. Built by Google Research and Yale, it’s an open-source model that takes the huge, messy data from single-cell RNA sequencing—basically, a readout of
Continue reading -
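The core trick behind cell sentences is ranking a cell's genes by expression level and reading the ordered gene names as text a language model can consume. A minimal sketch of that conversion, with made-up expression values (the gene symbols are real, the numbers are toy data):

```python
def cell_to_sentence(expression: dict[str, float], k: int = 5) -> str:
    # Rank genes by expression (highest first) and keep the top k names;
    # the ordered list of gene names is the cell's "sentence".
    ranked = sorted(expression, key=expression.get, reverse=True)
    return " ".join(ranked[:k])

# Toy expression readout for one cell (gene symbol -> expression level).
cell = {"CD3D": 8.1, "GAPDH": 12.5, "IL7R": 5.2, "MALAT1": 15.0, "CD8A": 3.3}
print(cell_to_sentence(cell, k=3))  # MALAT1 GAPDH CD3D
```

Once a cell is a sentence, standard language-model machinery (pretraining, prompting, generation) applies to it directly, which is what makes the approach appealing.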
Anthropic: Agent Skills
Agent Skills is a new way for Claude to learn new tricks. Imagine you could hand Claude a folder full of instructions, code, and resources, and Claude would know exactly when to use them. That’s what Agent Skills does: it lets you teach Claude how to handle specific jobs,
Continue reading -
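Per Anthropic's description, a skill is a folder whose `SKILL.md` opens with frontmatter (a name and a description) that tells Claude when the skill applies. Below is a rough sketch of reading that frontmatter with the standard library; the parser and the sample skill are illustrative assumptions, not Anthropic's tooling.

```python
def parse_skill(skill_md: str) -> dict:
    # SKILL.md starts with a '---'-delimited frontmatter block holding the
    # skill's name and description; the model uses these to decide when to
    # load the rest of the folder.
    head = skill_md.split("---")[1]
    meta = {}
    for line in head.strip().splitlines():
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    return meta

skill = """---
name: pdf-report
description: Fill in and summarize PDF report templates.
---
# Instructions
1. Open the template...
"""
print(parse_skill(skill))
```

The instructions, scripts, and other resources live in the body of the file and alongside it in the folder; only the small frontmatter needs to be scanned up front.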
SAE steering: Delta Token Confidence
Imagine you’re trying to understand what’s going on inside an AI’s mind. Sparse Autoencoders, or SAEs, are a tool that lets us break down the AI’s thoughts into features we can actually make sense of. Developers use these features to guide what the model does—whether
Continue reading -
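Generic SAE steering (the general recipe, not the specific Delta Token Confidence method this post covers) boils down to adding a scaled feature direction to a model activation: h' = h + α·d. A toy sketch with plain lists standing in for tensors:

```python
def steer(activation: list[float], feature_dir: list[float],
          alpha: float) -> list[float]:
    # Steering adds a scaled SAE feature direction to the activation:
    # h' = h + alpha * d. Larger alpha pushes the model harder toward
    # whatever concept the feature encodes.
    return [h + alpha * d for h, d in zip(activation, feature_dir)]

h = [0.2, -0.5, 1.0]   # toy residual-stream activation
d = [1.0, 0.0, -1.0]   # toy decoder direction for one SAE feature
print(steer(h, d, alpha=2.0))  # [2.2, -0.5, -1.0]
```

Choosing α is the hard part in practice: too small does nothing, too large degrades the model's outputs, which is the kind of problem confidence-based measures aim to address.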
Gemini: computer use
Google DeepMind has just released something called the Gemini 2.5 Computer Use model. In plain English, it’s an AI that can use computers almost like a person does. It can click buttons, type into forms, and scroll through pages – all those little things you do every day on
Continue reading -
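The loop behind a computer-use model is conceptually simple: show the model the screen, let it propose an action, execute it, and repeat until it says it is done. The sketch below mocks the model so it runs offline; the action schema here is an invented illustration, not Gemini's actual API.

```python
# Stand-in for the model: given a screenshot, propose the next UI action.
def fake_model(screenshot: str, step: int) -> dict:
    scripted = [
        {"type": "click", "x": 120, "y": 80},
        {"type": "type", "text": "hello"},
        {"type": "done"},
    ]
    return scripted[step]

def run_agent(max_steps: int = 10) -> list:
    log = []
    for step in range(max_steps):
        action = fake_model(screenshot="<pixels>", step=step)
        if action["type"] == "done":
            break
        log.append(action)  # a real harness would click/type here,
                            # then capture a fresh screenshot
    return log

print(run_agent())
```

A real harness replaces `fake_model` with a model call and actually performs each click or keystroke before taking the next screenshot; the cap on steps guards against the agent looping forever.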
Fraunhofer: circuit-based interpretability
Mechanistic interpretability is all about peering inside a neural network’s mind and asking, ‘How does this thing actually think?’ The usual way is to throw billions of words at the model and then ask another AI to explain what’s going on. That’s slow, expensive, and often gives
Continue reading -
Claude: memory tool
Imagine you’re building something with Claude, and you want it to remember things from one conversation to the next. Anthropic has just released a memory tool for their Claude API that lets you do exactly that. Instead of Claude forgetting everything between sessions, you can now give it a
Continue reading -
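The memory tool runs on the client side: the model emits file-style commands and your code executes them against storage you control. The handler below assumes command names like `create`, `view`, and `delete` modeled loosely on Anthropic's memory tool documentation; treat the exact schema as an assumption and check the docs before relying on it.

```python
# Client-side store backing Claude's memory tool calls. A real handler
# would use a directory on disk; a dict keeps the sketch self-contained.
store: dict[str, str] = {}

def handle_memory(call: dict) -> str:
    cmd = call["command"]
    if cmd == "create":   # write (or overwrite) a memory file
        store[call["path"]] = call["file_text"]
        return "ok"
    if cmd == "view":     # read a file back for the model
        return store.get(call["path"], "(empty)")
    if cmd == "delete":
        store.pop(call["path"], None)
        return "ok"
    return f"unsupported: {cmd}"

handle_memory({"command": "create", "path": "/memories/user.md",
               "file_text": "Prefers concise answers."})
print(handle_memory({"command": "view", "path": "/memories/user.md"}))
```

Because the store lives in your code, persistence across sessions is entirely yours to provide: whatever you save here is what Claude "remembers" next time.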
Harvard: ML systems textbook
Imagine you want to build machine learning systems that actually work in the real world, not just in a classroom. Harvard has put together a textbook for exactly that. It’s based on their CS249r course, and while the full book comes out in 2026, you can already download the
Continue reading -
Windows ML: on-device inference
Windows ML is a tool from Microsoft that lets developers run AI models right on the user's computer. Instead of sending data off to some distant server, everything happens on the device itself. The big news is that Windows ML is now ready for everyone to use. Developers
Continue reading -
OpenAI Open-Sources Agentic Commerce Protocol: A Standard for AI Transactions
The Agentic Commerce Protocol is a set of rules that lets AI agents buy things for you. OpenAI and Stripe built it together, along with some merchants. The idea is simple: it tells AIs, users, and businesses how to talk to each other so that buying things is easy, but
Continue reading -
Google Releases Open Protocol for Agent-Initiated Payments
The Agent Payments Protocol, or AP2, is a new set of rules for how AI agents can move money around safely. Google didn’t do this alone. They worked with more than 60 other companies—big names like Mastercard, American Express, PayPal, Coinbase, Salesforce, and ServiceNow. AP2 builds on earlier
Continue reading -
Claude Sonnet 4.5: A New AI Model That Excels at Coding and Building Agents
Claude Sonnet 4.5 is the newest AI model from Anthropic, and it’s built for people who want to create smarter apps and digital assistants. It belongs to the Claude 4 family, and you can use it through Anthropic’s website, their mobile app, or even a command-line tool
Continue reading