This case study walks through a full reaction-yield optimization end to end, showing what the expert context looks like at three different points in the run: at setup, after a few iterations of data, and late in the optimization. It uses the Suzuki coupling benchmark that ships with the platform — a real Pd-catalyzed cross-coupling reaction — so the variables, ranges, and chemistry are concrete and reusable.
The goal is not to show a "perfect" expert context. It is to show how a competent first attempt evolves through the run as the data starts to disagree with the initial intuitions.
The Problem
A palladium-catalyzed Suzuki–Miyaura coupling between an aryl halide and phenylboronic acid, producing a biaryl product. The objective is to maximize isolated yield.
Decision variables (continuous):
Pd(PPh₃)₄ — palladium catalyst loading: 0.05–0.2 mmol
PhB(OH)₂ — phenylboronic acid: 6–24 mmol
KOH — base: 12–50 mmol
Temperature: 30–50 °C
Time: 30–120 min
Volume — solvent volume: 25–80 mL
Objective: Yield_ArPh — to be maximized.
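For readers who keep their own records alongside the platform, the search space above can be written down as plain data. This is a minimal sketch, not the platform's configuration schema: the key names, `SEARCH_SPACE`, `OBJECTIVE`, and `out_of_bounds` are all hypothetical, and only the bounds themselves come from the list above.

```python
# Hypothetical encoding of the search space; names and structure are
# illustrative and are NOT the platform's actual configuration format.
SEARCH_SPACE = {
    "Pd(PPh3)4_mmol": (0.05, 0.2),
    "PhB(OH)2_mmol": (6.0, 24.0),
    "KOH_mmol": (12.0, 50.0),
    "Temperature_C": (30.0, 50.0),
    "Time_min": (30.0, 120.0),
    "Volume_mL": (25.0, 80.0),
}
OBJECTIVE = ("Yield_ArPh", "maximize")


def out_of_bounds(candidate):
    """Return the names of variables that are missing or outside their bounds."""
    problems = []
    for name, (low, high) in SEARCH_SPACE.items():
        value = candidate.get(name)
        if value is None or not (low <= value <= high):
            problems.append(name)
    return problems
```

A check like this is mainly useful for catching transcription mistakes (for example, a temperature entered as 55 °C) before a run is logged.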
Version 1 — Expert Context at Setup
Before the first iteration, the chemist sets up the experiment and writes an expert context that combines (a) clear problem framing, (b) per-variable mechanistic context, and (c) a small set of qualitative expectations from prior experience — phrased as hypotheses, not as facts.
The goal is to maximize the yield (Yield_ArPh) of this Suzuki coupling reaction, which forms a biaryl product from an aryl halide and phenylboronic acid. This is a Pd-catalyzed cross-coupling requiring optimization of multiple interdependent parameters.
Pd(PPh₃)₄ (catalyst): Tetrakis(triphenylphosphine)palladium(0) drives the oxidative addition step. Higher loadings generally accelerate the reaction and improve conversion, but very high loadings give diminishing returns and may favour side reactions and catalyst aggregation.
PhB(OH)₂ (boronic acid): Insufficient amounts limit conversion; large excesses can lead to homocoupling and protodeboronation. A moderate excess relative to the aryl halide usually performs best.
KOH (base): Activates the boronic acid (forming the more reactive boronate), facilitates transmetalation, and neutralizes acidic byproducts. Too little slows transmetalation; too much accelerates protodeboronation and can degrade the catalyst.
Temperature: Higher temperatures accelerate every step of the catalytic cycle, but also promote β-hydride elimination, phosphine dissociation, and catalyst decomposition. Lower temperatures give incomplete conversion.
Time: Drives the extent of conversion. Excessive times may degrade product or catalyst. The optimum depends on temperature and catalyst loading — higher T or higher Pd typically needs less time.
Volume: Inversely controls concentration. Smaller volumes mean higher effective concentrations and faster bimolecular kinetics, but can cause poor stirring or heat-transfer issues. Larger volumes dilute the system.
Hypotheses to test early. Based on related Suzuki couplings, we expect:
A relatively concentrated system (low volume) and a moderate temperature window to perform well.
Yields should improve with catalyst loading up to a plateau.
PhB(OH)₂ in moderate excess (15–20 mmol) tends to outperform either limiting or large-excess regimes.
These are guesses, not confirmed facts. If early data does not support them, this section should be revised.
Notes: Measurement noise occasionally pushes the reported yield slightly above 100% — this is normal.
What this version does well:
Frames the chemistry and the catalytic cycle — gives the AI a mechanistic anchor instead of treating the variables as black-box dials.
Each variable has a description focused on its effect, not on a specific optimal value.
Mentions a known interaction (Time depends on Temperature and Pd loading), which is the kind of cross-parameter knowledge the AI cannot infer from the bounds alone.
Phrases prior intuitions as hypotheses to test and explicitly invites revision — a small bit of writing discipline that pays off later.
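One way to make that revision easy is to keep the framing and the hypotheses in separate fields and only render them into a single text at the end. The sketch below assumes the context is maintained on the user's side as structured data; `ExpertContext`, its field names, and `render` are hypothetical, and the platform may only ever see the rendered text. The example strings are abbreviated from Version 1.

```python
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class ExpertContext:
    """Expert context split into a stable part and a revisable part (hypothetical)."""

    background: str                 # problem framing, written once at setup
    variable_notes: dict            # variable name -> mechanistic description
    notes: str = ""                 # e.g. measurement caveats
    # heading -> tuple of lines; the part expected to change during the run
    revisable: dict = field(default_factory=dict)

    def render(self) -> str:
        lines = [self.background]
        lines += [f"{name}: {text}" for name, text in self.variable_notes.items()]
        for heading, items in self.revisable.items():
            lines.append(heading)
            lines.extend(items)
        if self.notes:
            lines.append(f"Notes: {self.notes}")
        return "\n".join(lines)


v1 = ExpertContext(
    background="The goal is to maximize the yield (Yield_ArPh) of this Suzuki coupling.",
    variable_notes={"Time": "Drives the extent of conversion."},
    notes="Measurement noise occasionally pushes the reported yield above 100%.",
    revisable={
        "Hypotheses to test early.": (
            "Yields should improve with catalyst loading up to a plateau.",
        )
    },
)

# A later revision swaps only the revisable part; the framing carries over.
without_hypotheses = replace(v1, revisable={})
```

Because the dataclass is frozen, every revision is a new object, so earlier versions of the context remain available for comparison.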
After 4–6 Iterations: What the Data Showed
By the end of the initial design phase and the first few model-guided iterations, the experiment has produced enough data to start adjudicating the chemist's hypotheses:
Yields climb with catalyst loading: runs at 0.18–0.2 mmol Pd consistently outperform runs at 0.05–0.1 mmol Pd. This confirms the first half of the hypothesis (yield improves with loading), but no plateau has been observed yet, even at the upper bound of the range.
Volume preference is real, but milder than expected. Low-volume runs (around 25–30 mL) outperform high-volume runs, but the effect is modest. Above ~60 mL, yields tail off as expected from dilution.
The "moderate temperature" hypothesis is partially wrong. The best runs so far are at the high end of the moderate window (38–42 °C). Runs at 30–34 °C consistently underperform, and by more than the original context suggested.
PhB(OH)₂ window confirmed and narrowed. Yields are highest around 16–19 mmol, inside the hypothesized 15–20 mmol window; the lower end of the range (around 10 mmol) is clearly limiting.
The chemist updates the expert context to reflect these findings.
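Comparisons like "runs at 0.18–0.2 mmol Pd outperform runs at 0.05–0.1 mmol Pd" can be checked with a simple bin comparison over the run log. The sketch below assumes each run is a dictionary holding the variable settings and the measured `Yield_ArPh`; the function names and the record layout are hypothetical, not a platform export format.

```python
from statistics import mean


def mean_yield_in_range(runs, variable, low, high, objective="Yield_ArPh"):
    """Mean objective value over runs whose `variable` lies in [low, high].

    Returns None when no run falls in the range, so an empty bin is not
    mistaken for a low yield.
    """
    values = [run[objective] for run in runs if low <= run[variable] <= high]
    return mean(values) if values else None


def compare_bins(runs, variable, bin_a, bin_b, objective="Yield_ArPh"):
    """Return mean(bin_b) - mean(bin_a), or None if either bin is empty."""
    a = mean_yield_in_range(runs, variable, *bin_a, objective=objective)
    b = mean_yield_in_range(runs, variable, *bin_b, objective=objective)
    if a is None or b is None:
        return None
    return b - a
```

A call such as `compare_bins(runs, "Pd", (0.05, 0.1), (0.18, 0.2))` would return a positive number if the catalyst-loading finding holds. With only a handful of runs, and with the other five variables changing at the same time, this is a rough sanity check rather than a statistical test.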
Version 2 — Expert Context Mid-Run
The chemist keeps the chemistry framing and per-variable descriptions intact (the underlying mechanism has not changed), revises the hypothesis section to match the data, and adds a "Current focus" block to steer the next iterations.
[Background, per-variable descriptions, and notes section unchanged from Version 1.]
What the data has shown so far.
Higher Pd loading (0.18–0.2 mmol) consistently outperforms low loading; no plateau has appeared yet, so if there is one it lies at or beyond the upper bound of the range.
The high end of the moderate-temperature window (38–42 °C) performs better than the lower end. The earlier expectation of a centred moderate optimum was not supported.
Low volume (~25–30 mL) outperforms high volume, but the effect is smaller than expected from concentration arguments alone.
PhB(OH)₂ around 16–19 mmol is performing best; the lower end of the range is clearly limiting.
KOH and Time have not yet shown a clear trend.
Current focus. The next iterations should probe the high-Pd (0.18–0.2 mmol), 16–19 mmol PhB(OH)₂, low-volume, 38–42 °C region more densely, while varying KOH and Time to resolve the role of those two parameters. Lower-temperature and higher-volume runs are now lower priority.
What this version does well:
It revises the intuitions instead of leaving them in. The original hypothesis about "moderate temperature" was partially wrong; leaving it untouched would keep nudging the AI toward a region the data has already moved past.
It separates the stable layer from the steering layer. Background and per-variable mechanism stay put; the new content lives in a clearly-labelled block at the bottom.
It tells the AI what is still unknown. KOH and Time haven't shown a clear trend yet — naming that explicitly invites the AI to spend effort probing them, rather than treating them as solved.
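Whether the steering is working can be checked against the proposals that come back. The sketch below reads the Version 2 findings as a four-variable focus region and deliberately leaves KOH and Time unconstrained, since the context asks for them to keep varying. The variable keys, `FOCUS`, `FREE`, and both helper functions are hypothetical names introduced for illustration.

```python
# Focus region taken from the Version 2 findings. KOH and Time are left
# out on purpose: the context asks for them to be varied, not pinned.
FOCUS = {
    "Pd_mmol": (0.18, 0.2),
    "PhBOH2_mmol": (16.0, 19.0),
    "Volume_mL": (25.0, 30.0),
    "Temperature_C": (38.0, 42.0),
}
FREE = ("KOH_mmol", "Time_min")


def in_focus(candidate):
    """True if every focus variable of the candidate lies inside its range."""
    return all(low <= candidate[name] <= high for name, (low, high) in FOCUS.items())


def focus_share(candidates):
    """Fraction of proposed candidates that fall inside the focus region."""
    if not candidates:
        return 0.0
    return sum(in_focus(c) for c in candidates) / len(candidates)


def free_variable_spread(candidates):
    """(min, max) of each free variable over the in-focus candidates."""
    inside = [c for c in candidates if in_focus(c)]
    if not inside:
        return {}
    return {
        name: (min(c[name] for c in inside), max(c[name] for c in inside))
        for name in FREE
    }
```

A high `focus_share` together with a wide `free_variable_spread` would suggest the proposals are concentrating on the region while still probing KOH and Time; a narrow spread would suggest the two unresolved parameters are not being explored.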
After 10+ Iterations: Approaching the Optimum
Several more iterations narrow the picture further:
The best yields cluster in a region around Pd ≈ 0.18–0.2 mmol, Temperature ≈ 38–42 °C, Volume ≈ 25–30 mL, PhB(OH)₂ ≈ 16–19 mmol.
KOH shows a clearer pattern: yields are highest around 32–42 mmol; below 25 mmol or above 45 mmol they decline.
Time around 75–90 min performs best; shorter times leave conversion incomplete and longer times start to lose product to degradation.
The chemist is now in exploitation mode. The expert context becomes a request for refinement rather than exploration.
Version 3 — Expert Context Late in the Run
[Background, per-variable descriptions, and notes section unchanged from Version 1.]
What the data has shown.
Best yields cluster near Pd 0.18–0.2 mmol, T 38–42 °C, volume 25–30 mL, PhB(OH)₂ 16–19 mmol, KOH 32–42 mmol, Time 75–90 min.
Yields in this region are repeatedly above 95%; the optimum appears robust to small variations rather than sharply peaked.
Current focus. The optimization is now in exploitation mode. Prioritize small variations within the region identified above rather than broader exploration. We are particularly interested in:
Whether the optimum is robust to small reductions in Pd loading (a lower-cost formulation would be desirable if yield is preserved).
Whether reaction time can be shortened toward 70 min without yield loss, since this would improve throughput.
Broad exploration of regions outside the identified optimum is no longer useful.
What this version does well:
It directs the search without restricting it. The "Current focus" block names the practical questions the chemist still cares about (cost, throughput) so the AI can propose variations that are useful, not just numerically optimal.
It explicitly says broad exploration is no longer useful. Late in the run, telling the AI to stop hunting in regions the data has ruled out keeps the remaining iteration budget focused.
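To put the narrowing in numbers, the best region reported in Version 3 can be compared with the original bounds, variable by variable. The sketch below uses only ranges stated in this case study; the dictionary keys and the helper name are hypothetical. It treats the search space as a uniform box, which is a simplifying assumption.

```python
from math import prod

# Original bounds from the problem definition.
BOUNDS = {
    "Pd_mmol": (0.05, 0.2),
    "PhBOH2_mmol": (6, 24),
    "KOH_mmol": (12, 50),
    "Temperature_C": (30, 50),
    "Time_min": (30, 120),
    "Volume_mL": (25, 80),
}

# Best region reported late in the run.
BEST_REGION = {
    "Pd_mmol": (0.18, 0.2),
    "PhBOH2_mmol": (16, 19),
    "KOH_mmol": (32, 42),
    "Temperature_C": (38, 42),
    "Time_min": (75, 90),
    "Volume_mL": (25, 30),
}


def range_fraction(name):
    """Width of the best region as a fraction of the full range."""
    low, high = BOUNDS[name]
    best_low, best_high = BEST_REGION[name]
    return (best_high - best_low) / (high - low)


fractions = {name: range_fraction(name) for name in BOUNDS}
# Share of the six-dimensional box occupied by the best region.
volume_fraction = prod(fractions.values())
```

Each variable's best range covers between roughly 9% (Volume) and 26% (KOH) of its bounds, and the product comes to about 1 part in 56,000 of the full box. That is a geometric illustration of why continued broad exploration is a poor use of the remaining budget; it says nothing about how yield behaves outside the region.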
Takeaways from This Case Study
The chemistry framing barely changes across versions. The mechanistic description of each variable is what it is; the underlying reaction doesn't shift just because new data arrived. Stable content should stay stable.
The hypothesis layer changes a lot. The original intuition about "moderate temperature" was partially wrong, and rewriting it to match the data was the right move once the data had spoken. Intuitions that don't survive contact with the data should not survive in the expert context either.
The "Current focus" block is where steering lives. It gives you a single section to revise as the optimization progresses, without disturbing the foundation. Treat it as the diff between what the AI knows and what you'd like it to do next.
Honesty about uncertainty is valuable signal. Saying "KOH and Time have not yet shown a clear trend" tells the AI where to spend effort. Hedging phrases like "we believe", "we suspect", and "based on related work" let the AI weigh the strength of a claim alongside the data.
This pattern — stable background, revisable hypothesis layer, evolving "Current focus" — works for any single-objective reaction optimization, not just Suzuki couplings. The variables and the chemistry change; the structure of the context does not.
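A quick way to confirm that a revision touched only the layers it was meant to touch is to diff two versions of the context line by line. The sketch below uses Python's standard `difflib`; the function name `changed_lines` is hypothetical, and it assumes each version is available as plain text with one statement per line.

```python
import difflib


def changed_lines(old, new):
    """Lines removed from `old` and lines added in `new`; unchanged lines are skipped."""
    old_lines, new_lines = old.splitlines(), new.splitlines()
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    removed, added = [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            removed.extend(old_lines[i1:i2])
        if tag in ("replace", "insert"):
            added.extend(new_lines[j1:j2])
    return {"removed": removed, "added": added}
```

If any background or per-variable line shows up under `removed` or `added`, the stable layer was edited, which is worth a second look before the revised context is submitted.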
