Stop building demos. Start shipping production-grade AI agents that actually work when things go wrong.
You get a working demo, a nicely formatted response, and a vague sense that the next step is "just scale it up." What you don't get is any idea what happens when your agent encounters a tool that fails, when you need it to remember something across sessions, or when a stakeholder asks, "Are you sure we should let it do that automatically?"
This course fills that gap.
Building Multi-Agent AI Systems with CrewAI is a set of five hands-on Jupyter notebooks developed for O'Reilly Live Training. Each module is self-contained, runs with real API calls, and teaches patterns you can adapt immediately.
Module 1 introduces the three things every agent needs: a role, a goal, and a backstory. Not as abstract concepts, but as parameters you'll tune until the agent actually behaves the way you expect. By the end, you'll have a working research agent and a clear mental model for everything that follows.
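Here's a minimal sketch of what those parameters look like, assuming a recent CrewAI release; the role, goal, and backstory strings are illustrative, not the course's exact agent:

```python
from crewai import Agent, Task, Crew

# The role/goal/backstory wording is exactly what you iterate on in Module 1.
researcher = Agent(
    role="Senior Research Analyst",
    goal="Summarize the current state of a topic with cited sources",
    backstory=(
        "You are a meticulous analyst who distrusts unsourced claims "
        "and clearly separates facts from speculation."
    ),
    verbose=True,
)

task = Task(
    description="Research recent developments in multi-agent frameworks.",
    expected_output="A five-bullet summary, one source per bullet.",
    agent=researcher,
)

crew = Crew(agents=[researcher], tasks=[task])
print(crew.kickoff())
```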
Agents without tools are just expensive chatbots. Module 2 covers web search, custom tool creation (the stock analyst example is worth the price alone), and chaining three specialized agents into a single coherent workflow. You'll see exactly how context passes between agents and why sequencing decisions matter.
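To give a flavor of custom tools, here is a hedged sketch using CrewAI's tool decorator; the yfinance lookup is a stand-in data source, not the course's exact stock analyst implementation:

```python
from crewai import Agent
from crewai.tools import tool  # older releases: from crewai_tools import tool

@tool("Stock Price Lookup")
def stock_price(ticker: str) -> str:
    """Return the latest closing price for a ticker symbol."""
    import yfinance as yf  # stand-in data source for this sketch
    price = yf.Ticker(ticker).history(period="1d")["Close"].iloc[-1]
    return f"{ticker} last close: {price:.2f}"

analyst = Agent(
    role="Stock Analyst",
    goal="Assess a ticker using current price data",
    backstory="You ground every claim in numbers pulled from your tools.",
    tools=[stock_price],
)
```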
Module 3 adds two capabilities that unlock serious applications. Hierarchical crews let a manager agent delegate to specialists dynamically, with no hardcoded task order. Memory gives agents continuity: short-term for the current session, long-term across runs, and entity memory for tracking specific people or objects. There's also a full RAG implementation using Chroma and VoyageAI embeddings if you need agents grounded in a document corpus.
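In CrewAI terms, both capabilities are switches on the Crew itself. A minimal sketch, assuming a recent release (the agents, task, and model ID are illustrative):

```python
from crewai import Agent, Crew, Process, Task

researcher = Agent(role="Researcher", goal="Gather the facts",
                   backstory="You dig up primary sources.")
writer = Agent(role="Writer", goal="Turn findings into a brief",
               backstory="You write tight, structured summaries.")

crew = Crew(
    agents=[researcher, writer],
    tasks=[Task(description="Produce a one-page market brief.",
                expected_output="A one-page brief with sources.")],
    process=Process.hierarchical,               # a manager delegates dynamically
    manager_llm="anthropic/claude-sonnet-4-5",  # illustrative model ID
    memory=True,  # enables short-term, long-term, and entity memory
)
```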
Module 4 covers ground a lot of courses skip entirely. You'll learn exactly when to pause execution for human approval, how to build a safe file-writer that shows previews before touching anything, and how to design multi-stage workflows with review checkpoints built in. The decision framework in this module is worth saving separately.
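The simplest checkpoint is built into CrewAI's Task API. A minimal sketch; the agent and task wording are illustrative:

```python
from crewai import Agent, Crew, Task

writer = Agent(
    role="File Writer",
    goal="Apply approved changes to project files",
    backstory="You never modify a file without showing a preview first.",
)

# human_input=True pauses the crew after this task and waits for
# feedback from the console before execution continues.
write_task = Task(
    description="Draft the change and show a preview before applying it.",
    expected_output="A diff-style preview followed by a confirmation.",
    agent=writer,
    human_input=True,
)

Crew(agents=[writer], tasks=[write_task]).kickoff()
```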
Module 5 is where it all becomes production-grade. Configuration-driven agents via YAML so you can change behavior without touching code. A retry-with-backoff decorator for resilient tool calls. A multi-LLM strategy that routes tasks to fast (Haiku) or precise (Sonnet) models based on what they actually need. Structured logging and a monitoring wrapper that tracks execution metrics. Everything wired together in a ProductionCrewSystem class you can use as a template.
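The retry decorator is a good example of the resilience patterns here. One common shape for it, sketched in plain Python (the course's version may differ in detail):

```python
import functools
import logging
import random
import time

def retry_with_backoff(max_attempts: int = 4, base_delay: float = 1.0):
    """Retry a flaky call with exponential backoff plus jitter."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return fn(*args, **kwargs)
                except Exception as exc:
                    if attempt == max_attempts:
                        raise  # out of retries; surface the real error
                    delay = base_delay * 2 ** (attempt - 1) + random.uniform(0, 0.5)
                    logging.warning("attempt %d failed (%s); retrying in %.1fs",
                                    attempt, exc, delay)
                    time.sleep(delay)
        return wrapper
    return decorator

@retry_with_backoff(max_attempts=3)
def flaky_search(query: str) -> str:
    ...  # wrap any network-bound tool call here
```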
This course is for developers who are comfortable with Python and have used an LLM API before, but haven't yet built anything with multiple agents coordinating toward a shared goal. You don't need a background in ML; the focus throughout is on architecture and practical patterns.
API costs for running through all notebooks are modest — roughly $1–3, depending on how much you experiment.
Everything here runs on Anthropic's Claude models (Sonnet 4.5 for most tasks, Haiku 4.5 where speed matters). CrewAI handles the multi-agent coordination layer. The stack is opinionated by design — fewer moving parts means the patterns are easier to understand and adapt.
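Routing between the two models is just a matter of which llm each agent gets. A hedged sketch, assuming a recent CrewAI release that exports the LLM class; the model IDs are illustrative, so check Anthropic's docs for current names:

```python
from crewai import LLM, Agent

FAST = LLM(model="anthropic/claude-haiku-4-5")      # quick, cheap triage
PRECISE = LLM(model="anthropic/claude-sonnet-4-5")  # deeper reasoning

triage = Agent(role="Triage", goal="Classify incoming requests",
               backstory="You sort work quickly and cheaply.", llm=FAST)
analyst = Agent(role="Analyst", goal="Produce the final analysis",
                backstory="You reason carefully and cite evidence.", llm=PRECISE)
```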
These notebooks accompany the "Building Multi-Agent AI Systems with CrewAI" live training at data4sci.com, taught by Bruno Gonçalves, a physicist turned data scientist who has held positions at NYU's Center for Data Science, JPMorgan Chase, and TRM Labs, among others. The same material has been delivered to thousands of attendees across multiple live cohorts.
Pay what you want — starting at $10