Category Properties: Helping the Optimizer Understand Your Categories

Attach numerical descriptors to categorical variables so the optimizer knows how similar your options are.

When you define a categorical variable like "Solvent" with options Ethanol, Methanol, and Toluene, the optimizer has no way to know that Ethanol and Methanol are chemically similar while Toluene is very different. Without that information, it treats all categories as equally distant, which can waste experiments.

Properties (also called descriptors) solve this. By attaching numerical values to each category, you give the optimizer a way to measure similarity and make smarter suggestions.

How It Works

You define one or more properties on a categorical variable (e.g. Molecular Weight, Boiling Point).
You assign a numerical value for each property on each category.
The optimizer uses these values as a dense numerical vector to compute distances between categories.

Instead of treating categories as unrelated labels, the optimizer now sees each one as a point in a numerical space, and can infer that nearby points are likely to behave similarly.
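The three steps above can be sketched as a small data structure. This is an illustration only: the `CategoricalVariable` class and its `set_value` and `vector` methods are hypothetical names invented for this example, not the optimizer's actual API. The molecular weights are the ones used in the solvent example on this page.

```python
from dataclasses import dataclass, field


@dataclass
class CategoricalVariable:
    """Hypothetical container for a categorical variable and its properties."""
    name: str
    categories: list[str]
    properties: list[str] = field(default_factory=list)
    values: dict[str, dict[str, float]] = field(default_factory=dict)

    def set_value(self, category: str, prop: str, value: float) -> None:
        # Step 2: assign one numerical value per (category, property) pair.
        if category not in self.categories:
            raise KeyError(f"unknown category: {category}")
        if prop not in self.properties:
            raise KeyError(f"unknown property: {prop}")
        self.values.setdefault(category, {})[prop] = float(value)

    def vector(self, category: str) -> list[float]:
        # Step 3: one number per property, always in the same order.
        row = self.values.get(category, {})
        missing = [p for p in self.properties if p not in row]
        if missing:
            raise ValueError(f"{category} is missing values for {missing}")
        return [row[p] for p in self.properties]


# Step 1: define the variable and the properties it carries.
solvent = CategoricalVariable(
    name="Solvent",
    categories=["Ethanol", "Methanol", "Toluene"],
    properties=["Molecular Weight"],
)
solvent.set_value("Ethanol", "Molecular Weight", 46.07)
solvent.set_value("Methanol", "Molecular Weight", 32.04)
solvent.set_value("Toluene", "Molecular Weight", 92.14)

print(solvent.vector("Ethanol"))
```

Raising an error when a value is missing reflects the requirement that every category needs a value for every property; a vector with gaps cannot be compared with the others.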

Example: Solvents

Without properties: Ethanol, Methanol, Acetone, and Toluene are four unrelated options. A result for one says nothing about the others, so the optimizer must evaluate each of them separately.

With properties:

Solvent	Molecular Weight	XLogP	TPSA
Ethanol	46.07	-0.31	20.23
Methanol	32.04	-0.74	20.23
Acetone	58.08	-0.24	17.07
Toluene	92.14	2.73	0.00

Now the optimizer can see that Ethanol and Methanol are similar (close values), while Toluene is very different (high XLogP, zero TPSA). If Ethanol gives a good result, the optimizer is more likely to suggest Methanol next than Toluene.
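A minimal sketch of how such distances could be computed from the table above. It assumes each property is standardized (zero mean, unit variance) and that categories are compared with a plain Euclidean distance; the optimizer's real scaling and distance measure are not described here and may differ. The function names are made up for this example.

```python
import math
from statistics import mean, pstdev

# Rows follow the column order: Molecular Weight, XLogP, TPSA.
SOLVENTS = {
    "Ethanol": [46.07, -0.31, 20.23],
    "Methanol": [32.04, -0.74, 20.23],
    "Acetone": [58.08, -0.24, 17.07],
    "Toluene": [92.14, 2.73, 0.00],
}


def standardize(table):
    """Rescale every property column to zero mean and unit variance."""
    columns = list(zip(*table.values()))
    means = [mean(col) for col in columns]
    # A constant column has zero spread; divide by 1.0 instead of 0.
    stds = [pstdev(col) or 1.0 for col in columns]
    return {
        name: [(v - m) / s for v, m, s in zip(row, means, stds)]
        for name, row in table.items()
    }


def euclidean(a, b):
    """Straight-line distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


scaled = standardize(SOLVENTS)
for other in ("Methanol", "Acetone", "Toluene"):
    d = euclidean(scaled["Ethanol"], scaled[other])
    print(f"Ethanol -> {other}: {d:.2f}")
```

Standardizing first keeps Molecular Weight, whose raw values are much larger than XLogP, from dominating the distance. Under these assumptions Toluene ends up several times farther from Ethanol than Methanol does.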

Example: Ligands

Ligand	Cone Angle (°)	% Buried Volume	TEP (cm⁻¹)
PPh3	145	27.6	2068.9
PCy3	170	32.4	2056.4
P(tBu)3	182	36.5	2056.1
dppf	180	31.2	2064.3

Steric (cone angle, buried volume) and electronic (TEP) descriptors let the optimizer explore the ligand space efficiently without testing every option.
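To illustrate what "without testing every option" can mean, the sketch below ranks the remaining ligands by how close they are to one that has already been tested. It assumes standardized descriptors and Euclidean distance, and `rank_by_similarity` is a hypothetical helper. It shows similarity only; it is not the optimizer's suggestion logic.

```python
import math
from statistics import mean, pstdev

# Rows follow the column order: Cone Angle (°), % Buried Volume, TEP (cm⁻¹).
LIGANDS = {
    "PPh3": [145, 27.6, 2068.9],
    "PCy3": [170, 32.4, 2056.4],
    "P(tBu)3": [182, 36.5, 2056.1],
    "dppf": [180, 31.2, 2064.3],
}


def standardize(table):
    """Rescale every descriptor column to zero mean and unit variance."""
    columns = list(zip(*table.values()))
    means = [mean(col) for col in columns]
    stds = [pstdev(col) or 1.0 for col in columns]  # avoid dividing by zero
    return {
        name: [(v - m) / s for v, m, s in zip(row, means, stds)]
        for name, row in table.items()
    }


def rank_by_similarity(reference, table):
    """Return the other categories, most similar to `reference` first."""
    scaled = standardize(table)
    ref = scaled[reference]
    others = [name for name in table if name != reference]
    return sorted(others, key=lambda name: math.dist(ref, scaled[name]))


ranking = rank_by_similarity("PCy3", LIGANDS)
print(ranking)
```

With the values in the table, PPh3 comes out as the ligand least similar to PCy3, so a result for PCy3 says less about PPh3 than about the other two.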

How to Choose Good Properties

Good descriptors capture the physical or chemical differences that matter for your experiment. Here are some guidelines:

For Chemical Compounds

Molecular weight — basic size descriptor, always relevant.
XLogP (partition coefficient) — captures hydrophobicity/polarity.
TPSA (topological polar surface area) — captures polarity and hydrogen bonding.
Boiling point — relevant for reactions where temperature matters.
pKa — relevant for acid/base chemistry.
Steric descriptors (cone angle, % buried volume) — crucial for catalysis.
Electronic descriptors (Hammett sigma, TEP) — for electronic effects in reactions.

For Non-Chemical Categories

Reactor type: volume (mL), max pressure (bar), max temperature (°C)
Supplier: purity (%), lead time (days), cost ($/kg)
Protocol: duration (min), number of steps, temperature range

General Principles

2–5 properties is typical. More is not always better — noisy or irrelevant descriptors can hurt performance.
Choose properties that differentiate. If all categories have the same value for a property, it adds no information.
Use properties that describe different aspects. A mix of size, polarity, and shape descriptors captures more information than three size descriptors.
Physical relevance matters. Properties related to the mechanism of your reaction work better than arbitrary numbers.
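The "choose properties that differentiate" rule is easy to check before an optimization starts. The sketch below flags properties whose value is identical for every category. `uninformative_properties` is a hypothetical helper, and the Purity column is made-up data added only so that there is something to flag.

```python
def uninformative_properties(property_names, table, tolerance=1e-9):
    """Return the properties whose values do not vary across categories."""
    columns = zip(*table.values())
    return [
        name
        for name, col in zip(property_names, columns)
        if max(col) - min(col) <= tolerance
    ]


PROPERTY_NAMES = ["Molecular Weight", "XLogP", "TPSA", "Purity (%)"]

# First three columns come from the solvent example on this page.
# The last column (Purity) is hypothetical and deliberately constant.
SOLVENTS = {
    "Ethanol": [46.07, -0.31, 20.23, 99.5],
    "Methanol": [32.04, -0.74, 20.23, 99.5],
    "Acetone": [58.08, -0.24, 17.07, 99.5],
    "Toluene": [92.14, 2.73, 0.00, 99.5],
}

flagged = uninformative_properties(PROPERTY_NAMES, SOLVENTS)
print(flagged)
```

A flagged property can be dropped without losing information, which also helps keep the property count within the typical range mentioned above.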

Without Properties vs With Properties

	Without Properties	With Properties
How categories are seen	Unrelated labels (one-hot encoded)	Points in a numerical space
Similarity	All pairs equally distant	Distance reflects real differences
Exploration	Must try every category	Can infer from similar categories
Efficiency	More experiments needed	Fewer experiments to find optimum
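The "all pairs equally distant" entry in the table can be verified directly. In a standard one-hot encoding, each category gets its own axis, so any two different categories are exactly the same distance apart. The helper names below are illustrative.

```python
import math
from itertools import combinations

categories = ["Ethanol", "Methanol", "Toluene"]


def one_hot(index, size):
    """Vector with a 1.0 at `index` and 0.0 everywhere else."""
    return [1.0 if i == index else 0.0 for i in range(size)]


encoded = {
    name: one_hot(i, len(categories)) for i, name in enumerate(categories)
}

# Distance between every pair of distinct categories.
one_hot_distances = {
    (a, b): math.dist(encoded[a], encoded[b])
    for a, b in combinations(categories, 2)
}

for pair, d in one_hot_distances.items():
    print(pair, round(d, 4))
```

Every pair is separated by the square root of 2, so this encoding cannot express that Ethanol resembles Methanol more than it resembles Toluene.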
