External Pipelines
Tomviz pipelines can be executed outside the GUI in two equivalent ways:
- The tomviz-pipeline command line tool.
- The tomviz.pipeline.run function from the small tomviz-pipeline Python package that ships with the application.
Both consume the same state files (.tvsm JSON or .tvh5 HDF5) you save
from the GUI, and both let you override the inputs declared in those state
files. That last bit is what makes this useful for batch processing: build a
pipeline interactively once, save it as a template or a state file, then
point the runner at any number of new datasets to get the same processing
applied to each.
Installation
Both interfaces live in the tomviz-pipeline Python package under
tomviz/python in the tomviz repository.
Create and activate a virtual environment, then clone the repository and install the package from its directory:
git clone --recursive https://github.com/openchemistry/tomviz
cd tomviz/tomviz/python
pip install .
This puts the tomviz-pipeline executable on your PATH and makes the
tomviz.pipeline module importable.
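Both halves of that claim can be checked with the standard library alone. The sketch below is a minimal example: the helper name check_tomviz_pipeline_install is hypothetical and not part of the package, and it only looks things up without running a pipeline.

```python
import importlib.util
import shutil


def check_tomviz_pipeline_install():
    """Report whether the CLI and the Python module can be located.

    Hypothetical helper for illustration; not part of tomviz-pipeline.
    """
    try:
        spec = importlib.util.find_spec("tomviz.pipeline")
    except ImportError:
        # Raised when the parent package "tomviz" itself is missing.
        spec = None
    return {
        # True when an executable named tomviz-pipeline is on PATH.
        "cli_on_path": shutil.which("tomviz-pipeline") is not None,
        # True when the import system can locate tomviz.pipeline.
        "module_found": spec is not None,
    }


if __name__ == "__main__":
    print(check_tomviz_pipeline_install())
```

If either value is False, the usual cause is that the virtual environment used for pip install is not the one currently active.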
Running a Pipeline As-Is
In its simplest form, the runner takes a state file and an output directory. The pipeline executes once, against whatever inputs are pinned in the state file:
tomviz-pipeline -s pipeline.tvsm -o results/
from tomviz.pipeline import run
run("pipeline.tvsm", "results/")
Visualization nodes are ignored. The leaves of what remains — every output
port whose data isn’t consumed by another node — are written under
results/ as typed files (EMD for image data, CSV for tables, XYZ for
molecules), named <id>_<label>__<port>.<ext>.
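Downstream scripts often need to recover the node id, label, and port from such a file name. The following is a minimal sketch of one way to do that. It assumes the id is a decimal integer and that the last double underscore separates the label from the port. The names OutputName and parse_output_name are hypothetical and are not provided by the package.

```python
import re
from typing import NamedTuple


class OutputName(NamedTuple):
    node_id: int
    label: str
    port: str
    ext: str


# Assumption: <id> is a decimal integer and the LAST "__" splits label from port.
_PATTERN = re.compile(
    r"^(?P<id>\d+)_(?P<label>.+)__(?P<port>.+)\.(?P<ext>[^.]+)$"
)


def parse_output_name(filename: str) -> OutputName:
    """Split '<id>_<label>__<port>.<ext>' into its four parts."""
    match = _PATTERN.match(filename)
    if match is None:
        raise ValueError(f"not a recognised output name: {filename!r}")
    return OutputName(
        int(match["id"]), match["label"], match["port"], match["ext"]
    )


if __name__ == "__main__":
    print(parse_output_name("3_Reconstruction__output.emd"))
```

A label that itself contains a double underscore would still parse under this assumption, because the greedy label group gives up only the final separator.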
Overriding Inputs for Batch Processing
The more interesting case is replacing the inputs declared in the state file with new data. The shape of the override depends on whether the pipeline has one source or many.
Single-Source Pipelines
When the pipeline contains exactly one source node, the override is just a file, a glob, or a list of files. Each matched file produces one run.
# One file → one run.
tomviz-pipeline -s pipeline.tvsm -o results/ --input data.emd
# Glob → one run per matched file.
tomviz-pipeline -s pipeline.tvsm -o results/ --input 'data/*.emd'
# Explicit list (comma-separated, no spaces).
tomviz-pipeline -s pipeline.tvsm -o results/ --input a.emd,b.emd,c.emd
from tomviz.pipeline import run
run("pipeline.tvsm", "results/", inputs="data.emd")
run("pipeline.tvsm", "results/", inputs="data/*.emd")
run("pipeline.tvsm", "results/", inputs=["a.emd", "b.emd", "c.emd"])
Multi-Source Pipelines
When the pipeline has more than one source, each override has to identify which source it targets by node id. Node ids are stable integers assigned by the pipeline and visible inside the state file.
On the CLI, prefix every --input value with the node id and a colon (NODE_ID:):
tomviz-pipeline -s pipeline.tvsm -o results/ \
    --input '1:data/*.emd' \
    --input '3:reference.emd'
In Python, pass a dict keyed by node id:
from glob import glob
from tomviz.pipeline import run
run("pipeline.tvsm", "results/", inputs={
    1: sorted(glob("data/*.emd")),  # five matches → five runs
    3: "reference.emd",             # broadcast across all five runs
})
A length-1 value (a single file or a glob that matches one file) is broadcast across every run, so a constant reference input can be paired with a sweep over many primary inputs. All lists of length two or more must have the same length, and that length is the number of runs.
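The pairing rule can be made concrete with a small stand-alone sketch. This illustrates the rule as described here and is not the runner's actual implementation. The function name expand_runs is hypothetical, and globs are assumed to have been expanded into lists beforehand.

```python
from typing import Dict, List, Union

PathSpec = Union[str, List[str]]


def expand_runs(inputs: Dict[int, PathSpec]) -> List[Dict[int, str]]:
    """Return one {node_id: path} mapping per run (illustrative only)."""
    # Normalise every value to a list; a bare string is a length-1 list.
    lists = {
        node: [value] if isinstance(value, str) else list(value)
        for node, value in inputs.items()
    }
    if any(len(files) == 0 for files in lists.values()):
        raise ValueError("every source needs at least one file")

    # Lists of length two or more must all agree on length.
    sweep_lengths = {len(f) for f in lists.values() if len(f) > 1}
    if len(sweep_lengths) > 1:
        raise ValueError(f"mismatched list lengths: {sorted(sweep_lengths)}")
    n_runs = sweep_lengths.pop() if sweep_lengths else 1

    # Length-1 lists are reused (broadcast) in every run.
    return [
        {
            node: files[0] if len(files) == 1 else files[i]
            for node, files in lists.items()
        }
        for i in range(n_runs)
    ]


if __name__ == "__main__":
    for run_inputs in expand_runs({1: ["a.emd", "b.emd"], 3: "reference.emd"}):
        print(run_inputs)
```

Checking the lengths before launching anything keeps a mismatched batch from failing partway through.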
Output Layout
For a single run, outputs land directly under the output directory:
results/
  3_Reconstruction__output.emd
  5_AnalyzeStructures__results.csv
For two or more runs, each run gets its own zero-padded subdirectory:
results/
  run_0/
    3_Reconstruction__output.emd
  run_1/
    3_Reconstruction__output.emd
  ...
The subdirectory prefix defaults to run and can be changed with
--run-prefix on the CLI or run_dir_prefix= in Python.
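A script that consumes the results has to handle both layouts. The sketch below shows one way to do that with pathlib. It assumes run directories are named PREFIX_INDEX as in the listing above. The function name collect_outputs is hypothetical, although its run_dir_prefix argument mirrors the option named above.

```python
from pathlib import Path
from typing import Dict, List


def collect_outputs(
    output_dir: str, run_dir_prefix: str = "run"
) -> Dict[str, List[str]]:
    """Map each run directory name to its sorted output file names.

    For a single run (no run subdirectories) the only key is ".".
    """
    root = Path(output_dir)
    marker = run_dir_prefix + "_"
    run_dirs = [
        p for p in root.iterdir()
        if p.is_dir()
        and p.name.startswith(marker)
        and p.name[len(marker):].isdecimal()
    ]
    if not run_dirs:
        # Single run: files sit directly under the output directory.
        return {".": sorted(p.name for p in root.iterdir() if p.is_file())}

    # Sort by numeric index so the order holds for any padding width.
    run_dirs.sort(key=lambda p: int(p.name[len(marker):]))
    return {
        d.name: sorted(f.name for f in d.iterdir() if f.is_file())
        for d in run_dirs
    }
```

Sorting on the parsed integer rather than the directory name means the result does not depend on how many digits the padding uses.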
Bundled State Output
By default, each leaf output port is written as its own typed file (the
port output format). Two other formats are available:
- state: write a single output_state.tvh5 per run that bundles the pipeline state with the volume payloads of every populated, non-sink output port. The resulting file can be re-opened in tomviz.
- state+port: both, that is, the bundled tvh5 plus the typed per-port files.
tomviz-pipeline -s pipeline.tvsm -o results/ --output-format state
run("pipeline.tvsm", "results/", output_format="state")