ChemGP 1D Plots (Convergence, RFF, Profile, FPS)
This tutorial produces four plotnine-based figures from ChemGP HDF5 data: a convergence curve, an RFF quality comparison, an energy profile, and an FPS scatter plot.
import h5py
import numpy as np
import pandas as pd
from pathlib import Path
from chemparseplot.plot.chemgp import (
    plot_convergence_curve,
    plot_rff_quality,
    plot_energy_profile,
    plot_fps_projection,
)
Load sample data
The sample HDF5 file contains a table/convergence group with the columns
oracle_calls, max_fatom, and method for three NEB methods. The convergence
tolerance is stored as the conv_tol attribute on the file root.
DATA = Path("data")

# --- Convergence data ---
# Open read-only; datasets and attributes must be read before the file closes.
with h5py.File(DATA / "sample_convergence.h5", "r") as f:
    tbl = f["table/convergence"]
    conv_df = pd.DataFrame({
        "oracle_calls": tbl["oracle_calls"][:],
        "max_fatom": tbl["max_fatom"][:],
        # String data may come back as bytes; decode to str for plotting.
        "method": [m.decode() if isinstance(m, bytes) else m for m in tbl["method"][:]],
    })
    conv_tol = float(f.attrs["conv_tol"])

conv_df.head()
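The isinstance check on the method column is there because h5py can return string datasets as bytes objects rather than str. A minimal sketch of that normalization as a standalone helper follows; decode_labels is a hypothetical name used for illustration and is not part of ChemGP or chemparseplot.

```python
def decode_labels(values, encoding="utf-8"):
    """Return a list of str, decoding any bytes entries (hypothetical helper)."""
    return [v.decode(encoding) if isinstance(v, bytes) else str(v) for v in values]


# Mixed input, as might come back from different HDF5 string storage types.
labels = decode_labels([b"GP-NEB", "AIE", b"OIE"])
print(labels)
```

Normalizing labels up front keeps later grouping and legend entries from splitting one method into a bytes variant and a str variant.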
Convergence curve
plot_convergence_curve draws oracle calls on the x-axis and a force metric
on the y-axis, with log-scale by default. Pass conv_tol to add a
convergence threshold line.
fig = plot_convergence_curve(conv_df, conv_tol=conv_tol)
fig
Per-method thresholds
Pass a dict to conv_tol to draw per-method dashed lines in matching colors:
fig = plot_convergence_curve(
    conv_df,
    conv_tol={"GP-NEB": 0.5, "AIE": 0.3, "OIE": 0.3},
)
fig
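To read the same information off numerically rather than from the plot, the sketch below finds the first oracle-call count at which each method drops below its tolerance. It accepts either a single float or a per-method dict, mirroring the two forms of conv_tol shown above. The function name, the strict less-than comparison, and the records are illustrative assumptions; none of them come from chemparseplot or from the sample file.

```python
def calls_to_converge(records, conv_tol):
    """Map each method to the first oracle-call count where the force metric
    falls below its tolerance, or None if it never does.

    records: iterable of (method, oracle_calls, max_fatom) tuples.
    conv_tol: a single float, or a dict of per-method floats.
    """
    result = {}
    # Sort by method, then by oracle calls, so the first hit is the earliest.
    for method, calls, force in sorted(records, key=lambda r: (r[0], r[1])):
        tol = conv_tol[method] if isinstance(conv_tol, dict) else conv_tol
        result.setdefault(method, None)
        if result[method] is None and force < tol:
            result[method] = calls
    return result


# Illustrative values only, not taken from the sample file.
records = [
    ("GP-NEB", 10, 2.0), ("GP-NEB", 20, 0.4), ("GP-NEB", 30, 0.1),
    ("AIE", 10, 1.5), ("AIE", 20, 0.6), ("AIE", 30, 0.35),
]
shared = calls_to_converge(records, 0.5)
per_method = calls_to_converge(records, {"GP-NEB": 0.5, "AIE": 0.3})
print(shared)
print(per_method)
```

With a tighter per-method tolerance, a method that converges under the shared threshold may report None, which corresponds to a curve that never crosses its own dashed line.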
RFF quality
plot_rff_quality takes a DataFrame with d_rff, energy_mae, and
gradient_mae columns, plus the exact GP baselines passed as exact_e_mae and
exact_g_mae. Here we use synthetic sweep data.
# Synthetic RFF sweep: MAE decreases as D_rff grows
d_vals = [50, 100, 200, 300, 500]
rff_df = pd.DataFrame({
    "d_rff": d_vals,
    "energy_mae": [0.5, 0.25, 0.12, 0.08, 0.06],
    "gradient_mae": [1.2, 0.6, 0.3, 0.18, 0.12],
})
fig = plot_rff_quality(rff_df, exact_e_mae=0.04, exact_g_mae=0.08)
fig
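One way to summarize what the figure compares is the ratio of each RFF error to the exact GP baseline, where 1.0 would mean parity. The sketch below applies that to the synthetic values used in this section; mae_ratio is a hypothetical helper, not a chemparseplot function.

```python
def mae_ratio(rff_mae, exact_mae):
    """Ratio of each RFF error to the exact GP error (1.0 means parity)."""
    return [m / exact_mae for m in rff_mae]


d_vals = [50, 100, 200, 300, 500]
energy_ratio = mae_ratio([0.5, 0.25, 0.12, 0.08, 0.06], exact_mae=0.04)
gradient_ratio = mae_ratio([1.2, 0.6, 0.3, 0.18, 0.12], exact_mae=0.08)

for d, e, g in zip(d_vals, energy_ratio, gradient_ratio):
    print(f"D_rff={d:>3}  energy x{e:.2f}  gradient x{g:.2f}")
```

Both ratios shrink toward 1 as d_rff grows, which is the trend the synthetic sweep was written to show.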
Energy profile
plot_energy_profile draws NEB image energies. The sample data below has
two methods with 7 images each.
n_img = 7
idx = list(range(n_img))
profile_df = pd.DataFrame({
    "image": idx * 2,
    "energy": [0.0, 0.2, 0.5, 0.8, 0.5, 0.2, 0.0,
               0.0, 0.15, 0.4, 0.75, 0.4, 0.15, 0.0],
    "method": ["NEB"] * n_img + ["GP-NEB"] * n_img,
})
fig = plot_energy_profile(profile_df)
fig
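A quantity often read off such a profile is the barrier height. As a rough sketch, assuming the lowest-index image is the reactant, the barrier for each method is the maximum image energy minus that first energy. The helper name barrier_heights is hypothetical, and the data are the same illustrative values as above.

```python
def barrier_heights(images, energies, methods):
    """Return {method: max energy minus the energy of the lowest-index image}."""
    by_method = {}
    for img, e, m in zip(images, energies, methods):
        by_method.setdefault(m, []).append((img, e))
    heights = {}
    for m, pts in by_method.items():
        pts.sort()  # order by image index
        reactant_energy = pts[0][1]
        heights[m] = max(e for _, e in pts) - reactant_energy
    return heights


n_img = 7
images = list(range(n_img)) * 2
energies = [0.0, 0.2, 0.5, 0.8, 0.5, 0.2, 0.0,
            0.0, 0.15, 0.4, 0.75, 0.4, 0.15, 0.0]
methods = ["NEB"] * n_img + ["GP-NEB"] * n_img

heights = barrier_heights(images, energies, methods)
print(heights)
```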
FPS projection
plot_fps_projection shows PCA coordinates of FPS-selected vs pruned points.
rng = np.random.default_rng(42)
sel_pc1 = rng.normal(0, 1, 15)
sel_pc2 = rng.normal(0, 1, 15)
prn_pc1 = rng.normal(0, 1.5, 40)
prn_pc2 = rng.normal(0, 1.5, 40)
fig = plot_fps_projection(sel_pc1, sel_pc2, prn_pc1, prn_pc2)
fig
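The random points above only stand in for real selections. Assuming FPS here refers to farthest point sampling, the sketch below shows how a selected/pruned split could arise: a greedy loop repeatedly picks the point farthest from everything chosen so far. This is a generic illustration with a hypothetical function name, not ChemGP's implementation, and it assumes the input points are distinct.

```python
import math


def farthest_point_sampling(points, k, start=0):
    """Greedy FPS: return (selected, pruned) index lists for 2D points."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    selected = [start]
    # Distance from every point to its nearest selected point so far.
    min_dist = [math.dist(p, points[start]) for p in points]
    while len(selected) < k:
        # Pick the point whose nearest selected neighbor is farthest away.
        nxt = max(range(len(points)), key=min_dist.__getitem__)
        selected.append(nxt)
        min_dist = [min(d, math.dist(p, points[nxt]))
                    for d, p in zip(min_dist, points)]
    pruned = [i for i in range(len(points)) if i not in selected]
    return selected, pruned


# Four collinear points: the sampler should spread out along the line.
points = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (5.0, 0.0)]
selected, pruned = farthest_point_sampling(points, k=3)
print(selected, pruned)
```

The coordinates of the selected and pruned indices could then be split into the four arrays that plot_fps_projection receives in the call above.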
Next steps
The 2D surface plots tutorial covers contour, GP progression, NLL landscape, variance overlay, trust region, and sensitivity plots.
The rgpycrumbs plt-gp CLI generates figures in batch from HDF5 files.