Chapter 01

Introduction

"Every great magic trick consists of three parts, or acts. The first part is called 'The Pledge.' The magician shows you something ordinary. The second act is called 'The Turn.' The magician takes the ordinary something and makes it do something extraordinary. But you wouldn't clap yet. Because making something disappear isn't enough; you have to bring it back." — The Prestige (2006)

A quantitative finance pipeline has the same three-act structure — and its third act is the one most often skipped.

The Pledge is the data: returns, prices, fundamentals, alternative sources. Cleaned, audited, in the right form. This is the act every serious team gets right; it is also the act that takes the most time to set up and the one Chapter 3 is dedicated to.

The Turn is the model: forecast, allocate, infer latent state, learn a policy. This is the act textbooks cover well, and it is what Chapters 2 and 4 through 6 develop.

The Prestige is the operationalisation: the synthetic stress test that the policy survived, the agent that wraps the pipeline in a language interface, the audit log that lets a regulator reproduce the result. Most finance textbooks stop at the Turn and leave the Prestige to folklore. This book commits to all three acts — Chapters 7 through 9 finish what the earlier chapters set up — with the same notation, the same datasets, and the same code throughout.

The cost of this breadth is real. The benefit is that by the end of the book you can build, deploy, and defend a complete financial-AI pipeline, not just one well-trained model.

The pipeline in one line

Prediction → Decision → Dynamics → Automation → Synthesis.

Forecasts (Chapter 4) feed decisions (Chapter 5); decisions live inside dynamics (Chapter 6); LLM agents and RL fine-tuning (Chapters 7–9) wrap the loop in natural-language interfaces and improve it from feedback; synthetic data (Chapter 10) closes the loop by stress-testing what has been learned.

This chapter has one section for each of those five pieces. Each section is short — the goal is to give you enough framing that the later chapters land cleanly, not to substitute for them.

How to read this book

The chapters are designed to be read in order, but each chapter index is self-contained so you can drop in to any topic and still see the plan. Three reading paths cover the most common goals:

Modelling-first (recommended for newcomers). Chapter 2 → Chapter 6 → Chapter 4 → Chapter 5, then the AI chapters last. The right path if you want to understand why a forecast moves before you act on it. Chapter 6's state-space and dynamic- factor view is the foundation everything else rests on.
Forecasting-first. Chapter 2 → Chapter 4 → Chapter 5 → Chapter 6, then Chapters 7–10. The right path if your immediate need is a defensible predictive distribution to feed an existing decision rule.
AI-and-agents-first. Chapter 7 → Chapter 8 → Chapter 9 → Chapter 10, with Chapters 4–6 as reference. The right path if the numerical core already exists and the question is what to wrap around it.

The Chapter 0 preface motivates the book; this chapter sketches the pipeline; Chapter 2 sets the mathematical conventions; Chapter 3 sets the computational ones. Skim Chapters 0–3 if the prerequisites are familiar; do not skip them — the conventions established here recur everywhere later.

Prerequisites

The book assumes:

Probability and statistics at the level of a first graduate course (joint and conditional distributions, MLE, Bayes' rule).
Linear algebra through eigenvalues and singular value decomposition.
Python at intermediate level: NumPy, pandas / Polars, basic matplotlib. Chapter 3 covers Polars in detail; the other chapters assume working comfort with the language.
One machine-learning course of exposure is sufficient. Chapter 3 has a PyTorch primer; Chapters 4–5 develop the model-specific machinery as needed.

Notation conventions

These symbols recur throughout the book and are introduced formally in their first chapter, then reused without redefinition:

Symbol	Meaning
$r_{t}$	Simple return at time $t$
$g_{t}$	Log return $lo g (1 + r_{t})$
$w_{t}$	Portfolio weight vector
$Σ$	Return covariance matrix
$F_{t}$	Information set available at time $t$
$V_{t}$ , $Q_{t} (s, a)$	Value and action-value functions in RL
$π$	Policy (a probability distribution over actions)
$f_{t}$	Latent factor vector
$η_{t}$	Innovation (shock) vector
$U (\cdot)$	Utility function

Bold lowercase letters are vectors; bold uppercase are matrices. Hatted symbols ( $\overset{μ}{^}$ , $\hat{Σ}$ ) are sample estimates of their unhatted counterparts.

Forecasting — the conditional-distribution view of prediction and the three properties (calibration, sharpness, stability) every forecast in the book has to deliver.
Optimal Decision — utility, constraints, and the static / dynamic / function-approximated trio that shapes the policy chapters.
Modelling Dynamics — why explicit dynamics matter when you want to interpret, intervene, or stress-test rather than only predict.
AI Agent — the boundary between LLM-glue tasks and the numerical core.
Synthetic Data — what generated paths buy you beyond resampling, and how they fold back into evaluation.

Forecasting

"It is difficult to make predictions, especially about the future." — attributed to Niels Bohr

Forecasting is the entry point of every quantitative pipeline in this book. Before we can talk about portfolios, dynamics, or agents, we need a defensible answer to the same primitive question: given the information set $F_{t}$ , what is the conditional distribution of the next return?

The framing matters. A forecast is not a number; it is a conditional distribution. A point estimate is one functional of that distribution (the mean), a quantile is another, a tail integral is a third. Every later layer of the book consumes some functional of $p (g_{t + h} ∣ F_{t})$ — and getting the predictive distribution right is what determines whether those downstream consumers produce sensible answers.

What a forecast is for

A forecast is a bridge between data and decision. Anything we want to do downstream — sizing a position, hedging an exposure, stress-testing a book — ultimately reduces to a question about a future quantity. Three flavours recur:

Will tomorrow's return be positive? A binary classification problem; the answer is a probability.
What is the expected log-return next month? A point-estimation problem; the answer is a mean.
What is the 5% quantile of one-week loss? A tail-quantification problem; the answer is a quantile of the predictive distribution.

These three problems are siblings, not unrelated tasks. A model that gives an honest predictive distribution can answer all three; a model trained to optimise one of them in isolation rarely transfers.

The minimum we ask of a forecast

Three properties show up in every chapter, and they are the hooks the rest of the book hangs on:

Calibration. When the model says "5% chance of a tail event," the empirical frequency in held-out data should be near 5%. We measure this with reliability diagrams and proper scoring rules (CRPS, log-score) — Chapter 2 develops the evaluation machinery.
Sharpness. Among well-calibrated forecasts, narrower predictive distributions are preferred. A forecast that is calibrated but uninformative does not move a decision.
Stability across regimes. Markets switch — calm to stressed, trending to mean-reverting. A model that wins only in the regime it was trained on is a regime-fitter, not a forecaster.

These criteria are what every forecasting recipe in Chapter 4 is benchmarked against. They are also what the synthetic-stress workflow in Chapter 10 lets us probe.

Where this leads

By the end of the forecasting track you have:

A clean return series from Chapter 2, with stylised facts (heavy tails, volatility clustering) preserved by construction.
A family of forecasters from Chapter 4 — univariate, multivariate, tree-based, deep, transformer-based, foundation — each with its scope of validity.
Probabilistic outputs that plug directly into the utility-maximising and risk-controlled decisions of Chapter 5.

Forecasting is necessary but not sufficient. The next sections in this chapter show why decisions, dynamics, and automation each need to be treated as first-class citizens, not consequences of a good forecast.

Optimal Decision

"Tomorrow is the most important thing in life. Comes into us at midnight very clean. It's perfect when it arrives and it puts itself in our hands. It hopes we've learned something from yesterday." — paraphrased from John Wayne

A forecast that nobody acts on is a curiosity, not a system. The decision layer is where the predictive distributions of the previous section turn into actual positions, with explicit accounting for utility, risk, and the cost of being wrong.

The right question is not what will happen (forecasting answers that) but what should I do, given a distribution over what will happen? That recasting is what gives the rest of Chapter 5 something to maximise.

From distribution to action

Given a predictive distribution $p (g_{t + 1} ∣ F_{t})$ from the forecasting layer and a utility $U$ over wealth or returns, the agent picks an action $a_{t} \in A$ to maximise expected utility:

a_{t}^{⋆} = ar g a \in A max E_{g \sim p (\cdot ∣ F_{t})} [U (a, g)] .

Three settings of increasing generality recur in Chapter 5:

Static, single-period. Mean–variance and CVaR-style portfolio choice. A single convex optimisation; the answer is the closed-form benchmark every later method is compared against.
Dynamic, finite-horizon. Backward induction (the principle of optimality) gives exact solutions when the state space is small. Useful as a ground-truth check on approximate methods on stylised problems.
Dynamic, function-approximated. Reinforcement learning — actor–critic, Deep Q-Networks, the recursive-utility critic — for problems where the state is high-dimensional or the dynamics are unknown.

What "optimal" actually means

"Optimal" is conditional on three modelling choices, and stating them is half the discipline of producing a defensible policy:

The utility. Logarithmic, CRRA, recursive Epstein–Zin — each encodes a different attitude toward risk and intertemporal substitution. Chapter 5-02 walks through which is appropriate when.
The constraints. Long-only, leverage caps, sector limits, turnover budgets, transaction costs. These are not optional dressing; they materially change the optimum and the stability of estimated solutions.
The reference point. Are we optimising raw returns, excess returns over a benchmark, or risk-adjusted returns? The objective changes, and so does the policy.

Three difficulties that repeat

Three problems show up in nearly every decision setting in this book, and the rest of Chapter 5 is largely about how to manage them:

Estimation noise. Mean-variance optima are notoriously sensitive to small perturbations in the input mean and covariance. Section 02-03 develops shrinkage and robust patches; Section 5-02 imports them.
Non-stationarity. A policy that was optimal last quarter may not be optimal today. Chapter 6 (dynamics) and Chapter 10 (synthetic stress) give us the tools to test policies under regime change.
Curse of dimensionality. Exact dynamic programming becomes intractable beyond a handful of state dimensions; this is precisely where Chapter 5's RL machinery earns its keep.

Chapter 5 threads these issues into a single workflow: specify utility, sample policies under both classical optimisation and RL, evaluate them out-of-sample, and report the gap between expected and realised performance honestly. The next section in this introduction explains why a good policy also needs a model of the dynamics it operates inside.

Modelling Dynamics

"All models are wrong, but some are useful." — George Box

A forecast tells you what is likely to happen next. A model of dynamics tells you why. The two are different products and conflating them produces the kind of system that performs well in-sample and falls apart the moment the regime moves.

The distinction matters because the questions a deployed quant system has to answer are not all forecasting questions. Where is the system right now? What would happen if I shocked one component? Did the underlying regime just change? These are dynamics questions, and they need a generative model of the latent process — what we will call dynamics in this book.

The state-space lens

The state-space view fixes a small set of latent variables $f_{t}$ that summarise everything relevant about the past, plus a transition map and an observation map:

f_{t + 1} = h (f_{t}, η_{t}), y_{t} = g (f_{t}) + ε_{t} .

When $h$ and $g$ are linear and Gaussian, the recursion is the classical Kalman filter. When $g$ is a neural network we recover deep state-space models. When the latent process is constrained to admit identifiable axes (a non-Gaussian innovation prior plus diagonal dynamics), we get the dynamic-factor and identifiable-VAE family that Chapter 6 develops at length.

What a "good" dynamics model adds

Beyond raw forecast accuracy, the dynamics layer is asked to deliver three things — and a deep forecaster without explicit state-space structure typically delivers none of them:

Interpretable components. Factor loadings that line up with sectors, macro themes, or risk premia, rather than rotating freely from run to run. Identifiability is the technical name for this property; Chapter 6's dynamic-factor model is built around it.
Stable inference under noise. Filtered states and uncertainty bands that respect the data-generating story rather than overfitting individual shocks. Kalman smoothing is the canonical mechanism.
Causal hooks. Coordinates that can be intervened on with a defined semantics — what happens to factor 2 if we fix factor 1 at zero? — so that scenario analysis is not just a sampling exercise.

These three properties matter most when the same system is used both for forecasting and for decision-making. Chapter 5 will plug exactly these latent states into the policy state, and Chapter 10 will use them as conditioning signals for the synthetic-data generator.

How this connects to the rest of the book

The modelling track in Chapter 6 builds up from classical state-space representations through linear dynamic factor models to deep dynamic factor models. The latent spaces it produces feed two downstream chapters:

Chapter 5 (Optimal Decision). Policies that condition on a latent state are easier to interpret and easier to constrain than policies that read raw observations directly.
Chapter 10 (Synthetic Data). Generating realistic counterfactual trajectories requires a generative model of the dynamics, not only a predictor of the next observation.

If you are reading the book non-linearly, the path Chapter 2 → Chapter 6 → Chapter 5 captures the modelling-first storyline (the recommended path in the chapter index), with forecasting and AI integration slotted in around it. The next section shows where the AI layer fits — as glue around the numerical core, not as a replacement for it.

AI Agent

"It's not the tool. It's how you use the tool." — paraphrased from countless craftsmen

Once forecasting, decisions, and dynamics are in place, a lot of the daily work in a quantitative team is glue: pulling data, running diagnostics, turning a research note into a back-test spec, summarising risk reports, drafting investment memos. Language-model agents are good at exactly this glue, and badly suited to the parts of the workflow that require numerical guarantees. The job of this section, and of Chapter 8, is to draw that line clearly.

What "agent" means here

We use agent in the LangChain/LangGraph sense: a system that wraps a language model with a fixed set of tools and a control loop, so that the model can choose actions, observe their outputs, and continue reasoning toward a goal. Concretely, an agent is the tuple $(LLM, T, C, S)$ :

$LLM$ is the underlying language model.
$T$ is the set of tools — Python execution, SQL access, retrieval over a research-note corpus, calls into the forecasting and decision models from earlier chapters.
$C$ is the control flow, often as simple as ReAct (think → act → observe) or as elaborate as a state graph in LangGraph. We deliberately use $C$ rather than $π$ here so the symbol does not collide with the policy notation Chapter 5 reserves.
$S$ is the state the agent keeps between turns: scratchpad, intermediate results, the user's goal.

Chapter 8 treats $T$ as the most important design choice. A good tool set with a mediocre control loop usually outperforms a fancy control loop over a poor tool set.

Where agents help

Three patterns recur in financial workflows and are worth wiring an agent around:

Research drafting. Pulling data, plotting, running a hypothesis test, and writing up the result. The agent is the orchestration layer; the underlying numerical work runs on deterministic tools.
Code-as-tool. Translating a researcher's English description into a forecasting model spec, a Polars query, or an evaluation pipeline — then running it under the same code paths a human would use.
Monitoring and triage. Watching for regime changes, alerting on unusual fills or risk-limit breaches, and producing a short, structured summary instead of a full incident write-up.

Where agents do not (yet) belong

Two non-uses are worth being explicit about now, because the pattern-matching from "AI is everywhere" to "let the AI run trading" is the most common mistake in this space:

The forecast itself. Numerical predictions should come from models we understand, evaluate, and version-control. The agent can call them; it should not replace them.
The execution layer. Anything with a non-trivial cost of being wrong — trade execution, capital allocation, hedging — should sit behind explicit approval gates and deterministic checks, not behind an LLM tool call.

A working rule that captures the spirit: if a step would warrant a code review when written manually, it warrants the same scrutiny when an agent writes it.

How this connects to the rest of the book

Chapter 7 covers LLM integration at the model level — reprogramming a frozen foundation model, fine-tuning approaches, and the foundation-model strategies that have stabilised since the GPT-4 era. Chapter 8 covers AI agents at the system level — the LangChain/LangGraph orchestration, tool calling, and the Model Context Protocol (MCP) plumbing that lets external systems expose themselves to a language model in a structured way. Chapter 9 covers RL fine-tuning — reward modelling, DPO, and GRPO — to improve the models and agents themselves from feedback rather than only prompting them. Together they form the automation layer that sits on top of the numerical core the rest of the book builds.

Synthetic Data

"What we observe is not nature itself, but nature exposed to our method of questioning." — Werner Heisenberg

Real market data is precious for a few reasons that all argue against using it everywhere: it is finite, it is mostly drawn from a long bull market, and the interesting tail events are exactly the regions where the data is most stingy. Synthetic data is how we fill in those gaps in a disciplined way.

The "disciplined" qualifier matters. Generating synthetic data is easy; generating synthetic data that preserves the right statistical structure for a downstream task is the hard part. Chapter 10 develops the diffusion-based generators that have become the dominant choice; this section explains why the rest of the book has been quietly saving up problems for them to solve.

Why generate, not only resample

Bootstrap and block-bootstrap methods (Section 02-05) have a place; they are easy to defend and they preserve marginals exactly. But they cannot generate scenarios that never happened. For stress testing, policy evaluation under regime shift, and counterfactual research, we need a generative model that can produce plausible but unseen trajectories.

The book uses synthetic data in three modes, each tied to a chapter that consumes it:

Stress testing. Generate paths that lie deliberately in the tail — a USD spike, a credit blow-out, a vol regime that the training set never saw — and check that the policies of Chapter 5 remain within their risk limits.
Policy evaluation. Replace held-out real trajectories with held-out synthetic trajectories from the same generative model to estimate the variance of an estimator without burning real out-of-sample data. Chapter 5's RL benchmarks do this.
Pretraining and augmentation. When a target dataset is small (a niche fund, an emerging asset class), augment with synthetic samples that share the relevant statistical structure, then fine-tune the forecaster of Chapter 4 on the real data.

What "good" synthetic data looks like

Three properties separate a useful generator from a clean-looking toy:

Stylised-fact preservation. Heavy tails, volatility clustering, and cross-asset dependence all have to survive the generative process. A generator that produces clean i.i.d. Gaussian returns is not useful for any of the three modes above.
Conditioning support. We almost always want to condition on something — a regime indicator, a calendar event, a starting state, or even a textual scenario description from an agent (Chapter 8) — and the generator must accept that conditioning without leaking information.
Identifiable axes. Where the generator exposes latent factors, those factors should mean something stable across samples; otherwise downstream evaluation becomes a moving target. This connects back to the identifiable dynamics of Chapter 6.

Diffusion-based models have become the dominant choice — they handle conditioning well and preserve sharp tails better than the GAN/VAE approaches that came before. Chapter 10 is dedicated to them.

How this closes the loop

The chapters in this book chain into a workflow:

Chapter 2 establishes the return conventions and stylised facts.
Chapter 4 produces forecasts that respect those facts.
Chapter 5 turns forecasts into decisions under explicit utility.
Chapter 6 explains the dynamics the decisions interact with.
Chapters 7–8 wrap the whole thing in AI integration.
Chapter 10 provides synthetic counterfactual data that we feed back into Chapters 4–6 to stress-test the system end-to-end.

The loop is the point. Synthetic data is what makes it possible to test the loop without exposing capital, and what closes the gap between a model that fits history and a system that survives deployment.

Introduction

The pipeline in one line

How to read this book

Prerequisites

Notation conventions

Contents

Forecasting

What a forecast is for

The minimum we ask of a forecast

Where this leads

Optimal Decision

From distribution to action

What "optimal" actually means

Three difficulties that repeat

Modelling Dynamics

The state-space lens

What a "good" dynamics model adds

How this connects to the rest of the book

AI Agent

What "agent" means here

Where agents help

Where agents do not (yet) belong

How this connects to the rest of the book

Synthetic Data

Why generate, not only resample

What "good" synthetic data looks like

How this closes the loop