FastMDXplora usage examples

Worked examples for both the command-line interface (fastmdx) and the Python API. Everything here runs the same four-phase pipeline — setup → simulation → analysis → report — whether you drive it from one flag or a hundred-run sweep.

A note on input: a system can be a PDB/CIF file path, a 4-character PDB ID (fetched from RCSB, e.g. 1L2Y), or a sequence written in one-letter code. The form is auto-detected, so there is no separate --pdb-id flag.
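
To make the auto-detection concrete, here is a minimal sketch of how one string could be sorted into the three forms. It is not FastMDXplora's actual logic: the helper name classify_system and its rules (a .pdb/.cif suffix or an existing file, a digit followed by three alphanumerics for a PDB ID, the 20 standard amino-acid letters for a sequence) are assumptions for illustration.

```python
import re
from pathlib import Path

# Hypothetical helper, NOT part of FastMDXplora; the rules below are assumed.
STRUCTURE_SUFFIXES = {".pdb", ".cif"}
PDB_ID = re.compile(r"[0-9][A-Za-z0-9]{3}")          # classic 4-character ID
ONE_LETTER = re.compile(r"[ACDEFGHIKLMNPQRSTVWY]+")  # 20 standard residues


def classify_system(value: str) -> str:
    """Return 'file', 'pdb_id', or 'sequence' for a --system value."""
    path = Path(value)
    if path.suffix.lower() in STRUCTURE_SUFFIXES or path.is_file():
        return "file"
    if PDB_ID.fullmatch(value):
        return "pdb_id"
    if ONE_LETTER.fullmatch(value.upper()):
        return "sequence"
    raise ValueError(f"cannot interpret system input: {value!r}")


print(classify_system("protein.pdb"))           # file
print(classify_system("1L2Y"))                  # pdb_id
print(classify_system("NLYIQWLKDGGPSSGRPPPS"))  # sequence
```

Because a PDB ID in this sketch must start with a digit, a four-residue peptide such as ACDE is still read as a sequence.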


Command-line interface

The simplest run

Run the whole pipeline on a structure file:

fastmdx explore --system protein.pdb

Or fetch a structure from the PDB by ID:

fastmdx explore --system 1L2Y

-s, -system, and --system are all equivalent; the single-dash -system form matches the GROMACS/AMBER/NAMD convention. The xplore alias, kept for the X branding, works anywhere explore does:

fastmdx xplore -s 1L2Y

Output lands in ./fastmdxplora_output_<timestamp>/ unless you set --output.

Tuning the run with flags

Per-phase options are namespaced by phase (--setup-…, --simulate-…, --analyze-…, --report-…):

fastmdx explore --system protein.pdb \
    --output ./trpcage_study \
    --setup-ph 7.4 \
    --setup-ion-concentration-M 0.15 \
    --simulate-duration-ns 100.0 \
    --simulate-temperature-K 310.0 \
    --simulate-platform CUDA \
    --analyze-analyses rmsd rmsf rg cluster \
    --report-title "Trp-cage at 310 K"

--simulate-duration-ns sets the production length only; equilibration (NVT/NPT) is configured separately and has its own defaults.
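
If you move between flags and config files, it helps to see how a namespaced flag lines up with a phase.option key. The sketch below is inferred from the examples in this document, not taken from the FastMDXplora source: flag_to_config_key and the prefix table are hypothetical, and the mechanical rule (strip the prefix, turn hyphens into underscores) is an assumption.

```python
# Hypothetical mapping inferred from this document's examples;
# NOT FastMDXplora's real argument parser.
PREFIX_TO_PHASE = {
    "setup": "setup",
    "simulate": "simulation",
    "analyze": "analysis",
    "report": "report",
}


def flag_to_config_key(flag: str) -> tuple[str, str]:
    """Split '--simulate-duration-ns' into ('simulation', 'duration_ns')."""
    prefix, _, option = flag.lstrip("-").partition("-")
    if prefix not in PREFIX_TO_PHASE or not option:
        raise ValueError(f"not a phase-namespaced flag: {flag}")
    return PREFIX_TO_PHASE[prefix], option.replace("-", "_")


print(flag_to_config_key("--simulate-temperature-K"))
# ('simulation', 'temperature_K')
print(flag_to_config_key("--setup-ion-concentration-M"))
# ('setup', 'ion_concentration_M')
```

Not every option need follow this rule. For example, --analyze-analyses on the command line appears to correspond to analysis.include in the config file, so check the generated template when in doubt.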

Choosing phases

Run only part of the pipeline with --include (allowlist) or --exclude (denylist) — they are mutually exclusive:

# Only prepare and simulate; analyze later
fastmdx explore -s protein.pdb --include setup simulation

# Everything except the report
fastmdx explore -s protein.pdb --exclude report

# Convenience flag for the common case
fastmdx explore -s protein.pdb --no-report
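
The allowlist/denylist behaviour can be pictured with a short sketch. The helper resolve_phases is hypothetical and only restates the rules above (four phases in pipeline order, --include and --exclude mutually exclusive); it is not the tool's own code.

```python
# Pipeline order as described in this document.
PHASES = ("setup", "simulation", "analysis", "report")


def resolve_phases(include=None, exclude=None):
    """Return the phases to run, in pipeline order (hypothetical helper)."""
    if include and exclude:
        raise ValueError("--include and --exclude are mutually exclusive")
    unknown = (set(include or ()) | set(exclude or ())) - set(PHASES)
    if unknown:
        raise ValueError(f"unknown phase(s): {sorted(unknown)}")
    if include:
        # Keep pipeline order no matter how the user ordered the list.
        return [p for p in PHASES if p in include]
    return [p for p in PHASES if p not in (exclude or ())]


print(resolve_phases(include=["simulation", "setup"]))
# ['setup', 'simulation']
print(resolve_phases(exclude=["report"]))
# ['setup', 'simulation', 'analysis']
```

In this picture, --no-report is equivalent to excluding report.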

Running a single phase

Each phase is also its own subcommand. Here the per-phase flags are bare (no --simulate- prefix), since the phase is already chosen:

fastmdx setup    --system protein.pdb --ph 6.5 --box-shape octahedron
fastmdx simulate --output ./trpcage_study --duration-ns 50.0 --platform CUDA
fastmdx analyze  --output ./trpcage_study --analyses rmsd rg --selection "name CA"
fastmdx report   --output ./trpcage_study --no-slides

Pointing later phases at the same --output lets them pick up the artifacts the earlier phases wrote.

MD engine controls

Set the integrator, pressure (in bar or atm), GPU device, and checkpoint interval:

fastmdx explore -s protein.pdb \
    --simulate-integrator langevin_middle \
    --simulate-timestep-fs 2.0 \
    --simulate-pressure-atm 1.0 \
    --simulate-device-index 0 \
    --simulate-checkpoint-interval-steps 5000

Supported integrators: langevin_middle (default), langevin, brownian, verlet, variable_langevin, variable_verlet. Pressure can be given as --simulate-pressure-bar or --simulate-pressure-atm; atm is converted to OpenMM’s native bar (1 atm = 1.01325 bar).
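
The unit handling amounts to one multiplication. A minimal sketch, assuming that exactly one of the two pressure options is given; the function pressure_in_bar is hypothetical, and whether the CLI itself rejects both flags together is not stated here.

```python
ATM_TO_BAR = 1.01325  # 1 atm = 1.01325 bar, as stated above


def pressure_in_bar(pressure_bar=None, pressure_atm=None):
    """Return the pressure in bar from exactly one of the two inputs."""
    if (pressure_bar is None) == (pressure_atm is None):
        raise ValueError("give exactly one of pressure_bar or pressure_atm")
    if pressure_bar is not None:
        return float(pressure_bar)
    return float(pressure_atm) * ATM_TO_BAR


print(pressure_in_bar(pressure_atm=1.0))  # 1.01325
print(pressure_in_bar(pressure_bar=1.2))  # 1.2
```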

Skipping PDBFixer

If you already have a prepared structure, skip the fixer:

fastmdx explore -s raw.pdb --setup-fixed-pdb prepared.pdb

Config files

For anything beyond a quick run, put it all in a YAML file. Generate a fully-commented template to edit:

fastmdx init-config                          # writes fastmdxplora.yml
fastmdx init-config --minimal -o study.yml   # short starter
fastmdx init-config -o study.yml --force     # overwrite an existing file

Then run it:

fastmdx explore --config study.yml           # -c and -config also work

Flags still override the file, so you can reuse one config and tweak a value per invocation:

fastmdx explore --config study.yml --simulate-duration-ns 50

Preview without running — print the plan (runs, systems, swept values, output directories, phases) and exit:

fastmdx explore --config campaign.yml --dry-run

Every run writes resolved_config.yml — the exact merged configuration that ran — so you can reproduce it later with fastmdx explore --config resolved_config.yml.

Other commands

fastmdx info        # versions, detected backends (OpenMM/PDBFixer), citation
fastmdx --cite      # just the citation
fastmdx --version

Config file format

Input is always a systems: list, even for one system, so the file has the same shape whether you study one protein or a dozen.

A single study

# study.yml
systems:
  - id: trpcage
    system: trpcage.pdb        # path, PDB ID, or sequence

output: ./trpcage_study
include: [setup, simulation, analysis, report]

setup:
  ph: 7.4
  ion_concentration_M: 0.15

simulation:
  duration_ns: 100.0
  temperature_K: 310.0
  platform: CUDA

analysis:
  include: [rmsd, rmsf, rg, cluster]
  selection: "name CA"
  options:
    cluster:
      methods: [kmeans, hierarchical]
      n_clusters: 5

report:
  title: "Trp-cage at 310 K"

With one system and no sweep, output uses the familiar flat layout (trpcage_study/setup/, trpcage_study/simulation/, …).

Several systems

Add entries to the list. Each can carry its own per-phase overrides:

# compare.yml
systems:
  - id: wildtype
    system: wt.pdb
  - id: mutant
    system: mutant.pdb
    setup: { ph: 6.5 }         # this system only

output: ./comparison
simulation:
  duration_ns: 50.0            # shared by all systems

A parameter sweep

A sweep: block varies parameters across runs. Each axis is a dotted phase.option key mapped to a list of values; multiple axes form the full cross-product:

# campaign.yml
systems:
  - id: trpcage1
    system: trpcage.pdb
  - id: trpcage2
    system: trpcage.pdb
    setup: { ph: 6.5 }

output: ./trpcage_campaign

sweep:
  simulation.temperature_K: [300, 310, 320]
  simulation.pressure_bar: [1.0, 1.2]

This is 2 systems × 3 temperatures × 2 pressures = 12 runs. With more than one run, each goes in runs/<id>/ and a top-level batch_manifest.json indexes them all:

trpcage_campaign/
  batch_manifest.json
  runs/
    trpcage1__temperature_K-300__pressure_bar-1.0/
    trpcage1__temperature_K-300__pressure_bar-1.2/
    ...
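
To see where the 12 runs come from, the cross-product can be reproduced in a few lines. This is a sketch only: expand_runs is a hypothetical helper, and the run-ID format is inferred from the directory names in the listing above rather than from the FastMDXplora source.

```python
from itertools import product

systems = ["trpcage1", "trpcage2"]
sweep = {
    "simulation.temperature_K": [300, 310, 320],
    "simulation.pressure_bar": [1.0, 1.2],
}


def expand_runs(systems, sweep):
    """Yield (run_id, swept_values) for every system x sweep combination."""
    axes = list(sweep)
    for system_id in systems:
        for values in product(*(sweep[axis] for axis in axes)):
            swept = dict(zip(axes, values))
            # Assumed naming: <id>__<option>-<value>, joined by "__".
            suffix = "__".join(
                f"{key.split('.')[-1]}-{value}" for key, value in swept.items()
            )
            yield f"{system_id}__{suffix}", swept


runs = list(expand_runs(systems, sweep))
print(len(runs))   # 12
print(runs[0][0])  # trpcage1__temperature_K-300__pressure_bar-1.0
```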

Within each run, option precedence is: base config < per-system override < swept value.
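
The precedence order can be read as two successive merges. Below is a minimal sketch under that assumption; deep_merge and apply_swept are hypothetical helpers, and the base values are made up for the example.

```python
from copy import deepcopy


def deep_merge(base: dict, override: dict) -> dict:
    """Return a copy of base with override applied recursively."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


def apply_swept(config: dict, swept: dict) -> dict:
    """Set dotted 'phase.option' keys on a copy of config."""
    result = deepcopy(config)
    for dotted, value in swept.items():
        phase, option = dotted.split(".", 1)
        result.setdefault(phase, {})[option] = value
    return result


base = {"setup": {"ph": 7.4},
        "simulation": {"temperature_K": 300, "duration_ns": 50.0}}
per_system = {"setup": {"ph": 6.5}}
swept = {"simulation.temperature_K": 320}

# base config < per-system override < swept value
resolved = apply_swept(deep_merge(base, per_system), swept)
print(resolved)
# {'setup': {'ph': 6.5}, 'simulation': {'temperature_K': 320, 'duration_ns': 50.0}}
```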

Parallel execution

By default runs go one at a time. An execution: block runs several at once:

execution:
  mode: parallel        # sequential (default) | parallel
  workers: 2            # how many runs at once
  devices: [0, 1]       # GPU indices — one run pinned per device
  continue_on_error: true

On GPU, the safe pattern is one run per GPU: list your devices and each worker is pinned to a distinct index. Don’t set workers higher than the number of devices; oversubscribing a single GPU is slower than running sequentially. When workers is unset, it defaults to one per device (GPU) or to the CPU count capped at the run count (CPU).
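
The default for workers can be written out as a small function. This only restates the rule above; default_workers is a hypothetical name, and the real implementation may differ in detail.

```python
import os


def default_workers(n_runs, devices=None, cpu_count=None):
    """Default worker count when 'workers' is unset (hypothetical helper)."""
    if devices:
        # GPU: one worker per listed device.
        return len(devices)
    # CPU: the CPU count, capped at the number of runs.
    cpus = cpu_count or os.cpu_count() or 1
    return max(1, min(cpus, n_runs))


print(default_workers(12, devices=[0, 1]))  # 2
print(default_workers(3, cpu_count=8))      # 3
print(default_workers(12, cpu_count=8))     # 8
```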


Python API

A single study

from fastmdxplora import FastMDXplora

fmdx = FastMDXplora(system="protein.pdb")
results = fmdx.explore()

for run in results:                # explore() returns a list of RunResult
    for phase in run.phases:
        print(phase.name, phase.status)   # e.g. "setup ok", "simulation ok"

The recommended import alias mirrors the CLI name:

import fastmdxplora as fastmdx
fastmdx.FastMDXplora(system="1L2Y").explore()

With options and phase selection

options is keyed by phase; explore() takes include/exclude and a report convenience flag:

from fastmdxplora import FastMDXplora

fmdx = FastMDXplora(
    system="1L2Y",                       # fetched from RCSB
    output_dir="./trpcage_study",
    options={
        "setup":      {"ph": 7.4, "ion_concentration_M": 0.15},
        "simulation": {"duration_ns": 100.0, "temperature_K": 310.0,
                       "platform": "CUDA", "integrator": "langevin_middle"},
        "analysis":   {"include": ["rmsd", "rmsf", "rg", "cluster"]},
    },
)

results = fmdx.explore(include=["setup", "simulation", "analysis"])
run = results[0]                          # one study -> a list of one
print("run status:", run.status)
for phase in run.phases:
    print(" ", phase.name, phase.status)

include/exclude/options can be set on the constructor or passed to explore(); arguments to explore() take precedence.

explore() always returns a list of RunResult — a single study is a list of one, a sweep is a list of many. Each RunResult carries run_id, system, status ("ok"/"error"), output_dir, sweep_values, and phases (the list of PhaseResult for that run). The iteration idiom is the same no matter how many runs there are:

for run in results:
    print(run.run_id, run.status)
    for phase in run.phases:
        print("  ", phase.name, phase.status)
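
Because every run exposes the same attributes, small helpers work unchanged on one run or many. The sketch below collects the failed phases per run. It uses SimpleNamespace stand-ins with the attribute names listed above so it runs without FastMDXplora installed; failed_phases is a hypothetical helper, and in real code you would pass it the list returned by explore().

```python
from types import SimpleNamespace


def failed_phases(results):
    """Map run_id -> names of phases whose status is 'error'."""
    return {
        run.run_id: [p.name for p in run.phases if p.status == "error"]
        for run in results
        if run.status == "error"
    }


# Stand-ins that mimic RunResult / PhaseResult attributes (illustration only).
results = [
    SimpleNamespace(run_id="a", status="ok",
                    phases=[SimpleNamespace(name="setup", status="ok")]),
    SimpleNamespace(run_id="b", status="error",
                    phases=[SimpleNamespace(name="setup", status="ok"),
                            SimpleNamespace(name="simulation", status="error")]),
]

print(failed_phases(results))  # {'b': ['simulation']}
```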

Running a single phase

Each phase has a method that returns a PhaseResult:

from fastmdxplora import FastMDXplora

fmdx = FastMDXplora(system="protein.pdb", output_dir="./study")

setup_result = fmdx.setup(ph=6.5, box_shape="octahedron")
print(setup_result.status, setup_result.artifacts)

sim_result = fmdx.simulate(duration_ns=50.0, platform="CUDA")
fmdx.analyze(include=["rmsd", "rg"], selection="name CA")
fmdx.report(slides=False)

A PhaseResult carries name, status ("ok", "skipped", or "error"), output_dir, artifacts, and a message.

Driving from a config file (one system or many)

A config file — single system, several systems, or a sweep — runs through the same FastMDXplora(config=...).explore() interface. A single-system config writes the flat layout; many runs go in runs/<id>/.

from fastmdxplora import FastMDXplora

# One study from a file
FastMDXplora(config="study.yml").explore()

# A whole campaign (systems × sweep) from a file — same interface
results = FastMDXplora(config="campaign.yml").explore()

for run in results:
    print(run.run_id, run.status, run.sweep_values)

explore() returns the same list[RunResult] here as for a single study, with one element per run.

Building a config in code

You don’t need a file on disk — pass a config dict directly with config_data:

from fastmdxplora import FastMDXplora

config = {
    "output": "./scan",
    "include": ["setup", "simulation", "analysis"],
    "systems": [
        {"id": "trpcage", "system": "trpcage.pdb"},
    ],
    "sweep": {
        "simulation.temperature_K": [290, 300, 310, 320],
    },
    "execution": {"mode": "parallel", "workers": 2, "devices": [0, 1]},
}

results = FastMDXplora(config_data=config).explore()
n_ok = sum(r.status == "ok" for r in results)
print(f"{n_ok}/{len(results)} runs succeeded")

Previewing a run with --dry-run

To see exactly what a config will do — every run, its system, swept values, output directory, and the phases that will execute — without running anything, use a dry run. On the CLI:

fastmdx explore --config campaign.yml --dry-run

In Python, pass dry_run=True:

from fastmdxplora import FastMDXplora

planned = FastMDXplora(config="campaign.yml").explore(dry_run=True)
for run in planned:
    print(run.run_id, run.sweep_values, "->", run.output_dir)
    # run.status == "planned"; nothing was executed

A dry run prints the plan and returns a list[RunResult] with status "planned" and no populated phases. Nothing is written to disk.


Cross-run comparison report

When a study has more than one run, FastMDXplora automatically builds a comparison/ report at the batch root that aggregates the runs:

my_campaign/
  batch_manifest.json
  comparison/
    overlay_rmsd.png          # all runs' RMSD traces on one set of axes
    overlay_rg.png
    trend_rmsd.png            # mean RMSD vs the swept parameter
    trend_rg.png
    comparison_summary.csv    # one row per run, summary scalars
    comparison_report.md      # the written report
  runs/
    ...

Nothing extra is required — running a sweep produces it:

fastmdx explore --config campaign.yml

For per-frame analyses (RMSD, Rg, Q-value, total SASA) it draws an overlay of every run’s trace, and — when the sweep axis is numeric — a trend of each run’s summary scalar against that axis. The comparison_summary.csv is convenient for your own plotting:

import pandas as pd
df = pd.read_csv("my_campaign/comparison/comparison_summary.csv")
print(df[["temperature_K", "rmsd_mean", "rg_mean"]])

To turn the comparison report off, set comparison: false in the config’s report block:

report:
  comparison: false

You can also (re)build it — for instance after re-running some of the runs, or for a batch that finished earlier — with compare():

from fastmdxplora import FastMDXplora

# Right after a run, compare() operates on the study just produced:
fmdx = FastMDXplora(config="campaign.yml")
fmdx.explore()
fmdx.compare()

# Or rebuild for an existing batch directory:
FastMDXplora(config="campaign.yml").compare(output_dir="my_campaign")

Reproducibility

Every run writes a resolved_config.yml capturing the fully-merged configuration that actually executed (defaults + file + overrides). It is itself a valid config, so feeding it back reproduces the run exactly:

fastmdx explore --config some_run/resolved_config.yml

For a batch, batch_manifest.json at the output root records every run, its swept values, status, and output directory — the index for the whole campaign.