Gpredomics is a high-performance Rust implementation of the Predomics framework for discovering interpretable predictive signatures in omics data (metagenomics, microbiome, metabolomics). It learns Binary/Ternary/Ratio (BTR) models with discrete coefficients {-1, 0, 1} for maximum interpretability, making it ideal for clinical diagnostics and biomarker discovery.
- Interpretable languages: binary (subset sum), ternary (−1/0/1 algebraic sum), ratio (sum of positive terms over sum of negative terms), pow2 (ternary with power-of-two coefficients).
- Data encodings: raw values, prevalence via epsilon thresholding, and log transforms with epsilon flooring for numerical stability.
- Optimizers: Genetic Algorithm (ga2 Predomics style), Beam search (LimitedExhaustive and ParallelForward), and MCMC with Sequential Backward Selection (beta).
- Fitness targets: AUC, specificity, sensitivity, MCC, F1-score and G-mean, with optional penalties on model size (k_penalty) and false rates (fr_penalty).
- Confidence interval around the classification threshold, which helps discover divisive models and avoid uncertain classifications.
- Cross‑validation: stratified folds, Family of Best Models extraction, and MAD permutation importance aggregation across folds.
- GPU acceleration: wgpu‑based scoring with configurable memory policy and safe CPU fallback when device limits are reached.
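To make the languages and encodings above concrete, here is a minimal Rust sketch of how a single sample could be scored. It is not the gpredomics implementation: the names Encoding, Language, encode and score are hypothetical, a model is assumed to be a list of (feature index, coefficient) pairs, and adding epsilon to the ratio denominator is an assumption made to avoid division by zero.

```rust
/// Hypothetical per-value encodings mirroring the "Data encodings" bullet.
#[derive(Clone, Copy)]
pub enum Encoding {
    /// Value used as-is.
    Raw,
    /// Presence/absence: 1.0 if the value exceeds epsilon, else 0.0.
    Prevalence { epsilon: f64 },
    /// Natural log, with the value floored at epsilon so that ln(0) never occurs.
    Log { epsilon: f64 },
}

pub fn encode(value: f64, enc: Encoding) -> f64 {
    match enc {
        Encoding::Raw => value,
        Encoding::Prevalence { epsilon } => {
            if value > epsilon { 1.0 } else { 0.0 }
        }
        Encoding::Log { epsilon } => value.max(epsilon).ln(),
    }
}

/// Hypothetical language tags.
#[derive(Clone, Copy)]
pub enum Language {
    Binary,
    Ternary,
    Ratio,
    Pow2,
}

/// Score one (already encoded) sample. `coefs` lists only the selected features
/// as (feature index, coefficient) pairs.
pub fn score(sample: &[f64], coefs: &[(usize, i32)], lang: Language, epsilon: f64) -> f64 {
    // Sums of the features carrying a positive / negative coefficient.
    let pos: f64 = coefs.iter().filter(|&&(_, c)| c > 0).map(|&(j, _)| sample[j]).sum();
    let neg: f64 = coefs.iter().filter(|&&(_, c)| c < 0).map(|&(j, _)| sample[j]).sum();
    match lang {
        // Binary: subset sum of the selected features (negative coefficients are ignored).
        Language::Binary => pos,
        // Ternary: algebraic sum with coefficients in {-1, 0, 1}.
        Language::Ternary => pos - neg,
        // Ratio: positive sum over negative sum; epsilon guards the denominator (assumption).
        Language::Ratio => pos / (neg + epsilon),
        // Pow2: weighted sum where coefficients are signed powers of two (e.g. -4, 2, 8).
        Language::Pow2 => coefs.iter().map(|&(j, c)| c as f64 * sample[j]).sum(),
    }
}
```

For example, with sample values [1.0, 2.0, 4.0] and coefficients +1, -1, +1, the ternary score is 1 - 2 + 4 = 3.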
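The fitness targets listed above are standard classification metrics. The sketch below computes them from a confusion matrix, plus AUC via the pairwise (Mann-Whitney) definition. The additive size penalty and the [lower, upper] abstention band are hypothetical illustrations of k_penalty and of the threshold confidence interval; the exact formulas used by gpredomics are not described here, and all names are illustrative.

```rust
/// Confusion-matrix counts; `true` is the positive class (class 1).
#[derive(Debug, Default)]
pub struct Confusion {
    pub tp: f64,
    pub tn: f64,
    pub fp: f64,
    pub fn_: f64,
}

pub fn confusion(preds: &[bool], labels: &[bool]) -> Confusion {
    let mut c = Confusion::default();
    for (&p, &y) in preds.iter().zip(labels) {
        match (p, y) {
            (true, true) => c.tp += 1.0,
            (false, false) => c.tn += 1.0,
            (true, false) => c.fp += 1.0,
            (false, true) => c.fn_ += 1.0,
        }
    }
    c
}

impl Confusion {
    pub fn sensitivity(&self) -> f64 { self.tp / (self.tp + self.fn_) }
    pub fn specificity(&self) -> f64 { self.tn / (self.tn + self.fp) }
    pub fn f1(&self) -> f64 { 2.0 * self.tp / (2.0 * self.tp + self.fp + self.fn_) }
    /// Geometric mean of sensitivity and specificity.
    pub fn g_mean(&self) -> f64 { (self.sensitivity() * self.specificity()).sqrt() }
    /// Matthews correlation coefficient (0 when a marginal is empty).
    pub fn mcc(&self) -> f64 {
        let num = self.tp * self.tn - self.fp * self.fn_;
        let den = ((self.tp + self.fp) * (self.tp + self.fn_)
            * (self.tn + self.fp) * (self.tn + self.fn_)).sqrt();
        if den == 0.0 { 0.0 } else { num / den }
    }
}

/// AUC = P(score of a random positive > score of a random negative), ties counted as 1/2.
/// Assumes both classes are present.
pub fn auc(scores: &[f64], labels: &[bool]) -> f64 {
    let pos: Vec<f64> = scores.iter().zip(labels).filter(|&(_, &y)| y).map(|(&s, _)| s).collect();
    let neg: Vec<f64> = scores.iter().zip(labels).filter(|&(_, &y)| !y).map(|(&s, _)| s).collect();
    let mut wins = 0.0;
    for &p in &pos {
        for &n in &neg {
            if p > n { wins += 1.0; } else if p == n { wins += 0.5; }
        }
    }
    wins / (pos.len() * neg.len()) as f64
}

/// Hypothetical size penalty: the exact form of k_penalty in gpredomics is not given here.
pub fn penalized_fitness(metric: f64, k: usize, k_penalty: f64) -> f64 {
    metric - k_penalty * k as f64
}

/// Hypothetical three-way decision around the threshold: scores inside [lower, upper]
/// are treated as uncertain and left unclassified (None).
pub fn classify(score: f64, lower: f64, upper: f64) -> Option<bool> {
    if score > upper {
        Some(true)
    } else if score < lower {
        Some(false)
    } else {
        None
    }
}
```

Samples left unclassified by such a band would simply be excluded before building the confusion matrix.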
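Stratified folds and the median/MAD aggregation mentioned in the cross-validation bullet can be sketched as follows. Fold assignment here is a deterministic per-class round-robin (a real implementation would normally shuffle first with a seeded RNG), and median_mad shows how one feature's permutation importances could be summarized across folds. The function names are hypothetical and do not come from the gpredomics API.

```rust
use std::collections::HashMap;

/// Assign each sample to one of `k` folds, cycling through folds separately within each class,
/// so that every fold receives a near-equal share of every class.
/// Deterministic for clarity; a real implementation would usually shuffle first with a seeded RNG.
pub fn stratified_folds(labels: &[u8], k: usize) -> Vec<usize> {
    let mut next_fold: HashMap<u8, usize> = HashMap::new();
    let mut fold_of = Vec::with_capacity(labels.len());
    for &y in labels {
        let counter = next_fold.entry(y).or_insert(0);
        fold_of.push(*counter % k);
        *counter += 1;
    }
    fold_of
}

/// Median of a non-empty slice (NaN-free input assumed).
pub fn median(values: &[f64]) -> f64 {
    let mut v = values.to_vec();
    v.sort_by(|a, b| a.partial_cmp(b).unwrap());
    let n = v.len();
    if n % 2 == 1 { v[n / 2] } else { (v[n / 2 - 1] + v[n / 2]) / 2.0 }
}

/// Median and median absolute deviation (MAD) of one feature's importances across folds.
pub fn median_mad(values: &[f64]) -> (f64, f64) {
    let m = median(values);
    let deviations: Vec<f64> = values.iter().map(|x| (x - m).abs()).collect();
    (m, median(&deviations))
}
```

Median and MAD are used here rather than mean and standard deviation because they are robust to a single fold producing an outlying importance.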
- Quick start
- Configuration:
  - Usage
  - Data management
  - Parameters (coming soon)
  - Cross-validation
- Concepts:
  - Algorithms:
    - Genetic Algorithm (coming soon)
    - Beam Search (coming soon)
    - MCMC (coming soon)
- To go further:
  - Differences with legacy Predomics (coming soon)
  - Technical documentation
- You may be interested in:
Install a recent Rust toolchain, then build in release mode for the best performance on CPU and GPU.
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
At the root of this repository, compile gpredomics:
cargo build --release
If you want the binary to embed a git hash in its logs and experiment metadata, set GPREDOMICS_GIT_SHA at build time. If it is not set, the hash defaults to "unknown".
Example:
GPREDOMICS_GIT_SHA=$(git rev-parse --short HEAD) cargo build --release
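In Rust, a compile-time environment variable with a fallback is typically read with the standard option_env! macro. The following is a sketch of the documented behavior, not necessarily how gpredomics does it (it might use a build script instead); git_sha is a hypothetical name.

```rust
/// Git hash captured at compile time via the standard `option_env!` macro.
/// Falls back to "unknown" when GPREDOMICS_GIT_SHA was not set during `cargo build`.
pub fn git_sha() -> &'static str {
    option_env!("GPREDOMICS_GIT_SHA").unwrap_or("unknown")
}
```

Because the value is baked in at compile time, changing the variable afterwards has no effect until the binary is rebuilt.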
By default, the executable loads param.yaml from the current working directory at startup. This configuration file describes the inputs and the experiments to launch.
To launch gpredomics, simply type:
cargo run --release
Below are minimal TSV schemas that match the loader’s expectations.
X.tsv: features in rows and samples in columns; the header row lists the sample names, the first column contains feature names, and the remaining columns contain numeric values, one column per sample.
| feature | sample_a | sample_b | sample_c |
|---|---|---|---|
| feature_1 | 0.10 | 0.20 | 0.30 |
| feature_2 | 0.00 | 0.05 | 0.10 |
| feature_3 | 0.90 | 0.80 | 0.70 |
y.tsv: a two-column TSV mapping each sample to its class. The header line is ignored. Classes are 0 (negative), 1 (positive) and 2 (unknown, ignored in metrics).
| sample | class |
|---|---|
| sample_a | 0 |
| sample_b | 1 |
| sample_c | 1 |
X can also be supplied in the transposed layout (samples in rows, features in columns), which is the usual convention in machine learning. To use it, set features_in_rows to false in param.yaml.
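Both schemas, including the transposed layout, can be illustrated with a small parser. This is a sketch only: parse_x, parse_y and Matrix are hypothetical names, and the real loader may handle missing values, quoting or sample alignment between X and y differently.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory form of X: one row per feature, one column per sample.
pub struct Matrix {
    pub features: Vec<String>,
    pub samples: Vec<String>,
    pub values: Vec<Vec<f64>>, // values[feature][sample]
}

/// Parse X. With features_in_rows = true (default layout), rows are features;
/// with false, rows are samples and the parsed table is transposed.
pub fn parse_x(tsv: &str, features_in_rows: bool) -> Result<Matrix, String> {
    let mut lines = tsv.lines().filter(|l| !l.trim().is_empty());
    let header = lines.next().ok_or("empty X file")?;
    // Header: the first cell is a label, the remaining cells name the columns.
    let cols: Vec<String> = header.split('\t').skip(1).map(str::to_string).collect();
    let mut names: Vec<String> = Vec::new();
    let mut rows: Vec<Vec<f64>> = Vec::new();
    for line in lines {
        let mut fields = line.split('\t');
        let name = fields.next().unwrap_or_default().to_string();
        let row = fields
            .map(|f| f.trim().parse::<f64>())
            .collect::<Result<Vec<f64>, std::num::ParseFloatError>>()
            .map_err(|e| format!("bad value in row {name}: {e}"))?;
        if row.len() != cols.len() {
            return Err(format!("row {name} has {} values, expected {}", row.len(), cols.len()));
        }
        names.push(name);
        rows.push(row);
    }
    if features_in_rows {
        Ok(Matrix { features: names, samples: cols, values: rows })
    } else {
        // Transpose so that the in-memory layout is always features x samples.
        let values: Vec<Vec<f64>> = (0..cols.len())
            .map(|j| rows.iter().map(|r| r[j]).collect::<Vec<f64>>())
            .collect();
        Ok(Matrix { features: cols, samples: names, values })
    }
}

/// Parse y.tsv: "sample<TAB>class" per line, header ignored, classes 0, 1 or 2 (unknown).
pub fn parse_y(tsv: &str) -> Result<HashMap<String, u8>, String> {
    let mut classes = HashMap::new();
    for line in tsv.lines().skip(1).filter(|l| !l.trim().is_empty()) {
        let mut fields = line.split('\t');
        let (sample, raw_class) = match (fields.next(), fields.next()) {
            (Some(s), Some(c)) => (s, c.trim()),
            _ => return Err(format!("malformed line: {line}")),
        };
        let class: u8 = raw_class
            .parse()
            .map_err(|_| format!("bad class {raw_class} for {sample}"))?;
        if class > 2 {
            return Err(format!("unexpected class {class} for {sample}"));
        }
        classes.insert(sample.to_string(), class);
    }
    Ok(classes)
}
```

Normalizing both layouts to features-in-rows right after parsing is one way to keep the rest of a pipeline independent of features_in_rows.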
Command-line options let you use a different configuration file, reload a saved experiment, or evaluate the models selected during an experiment on a new dataset:
- Default run: execute the binary in a directory that contains param.yaml; the program initializes logging and dispatches GA, Beam or MCMC according to general.algo.
- Specific configuration: pass another configuration file with --config config.yaml.
- Reload and display: use --load <experiment.(json|mp|bin)> to deserialize a saved Experiment; the format is auto‑detected at load time.
- Evaluate on external data: combine --load with --evaluate and provide --x-test and --y-test to score the saved run on a new dataset.
Flags are defined with clap. Typical invocations:
# default execution (param.yaml in CWD)
gpredomics
# specific config
gpredomics --config ./path/config.yaml
# reload a saved experiment and print results
gpredomics --load 2025-01-01_12-00-00_run.msgpack
# evaluate on an external test set
gpredomics --load 2025-01-01_12-00-00_run.msgpack \
--evaluate --x-test /path/X_test.tsv --y-test /path/y_test.tsv
Note that --evaluate requires --load, and needs both --x-test and --y-test.
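The flag rules above can be summarized as a small decision function. To keep the example self-contained it uses a plain struct instead of clap; Cli, Action and resolve are hypothetical names that only mirror the documented flags (--config, --load, --evaluate, --x-test, --y-test).

```rust
/// Hypothetical mirror of the documented flags (the real binary parses them with clap).
#[derive(Default)]
pub struct Cli {
    pub config: Option<String>,
    pub load: Option<String>,
    pub evaluate: bool,
    pub x_test: Option<String>,
    pub y_test: Option<String>,
}

/// What the program should do, derived from the flags.
#[derive(Debug, PartialEq)]
pub enum Action {
    Run { config: String },
    Reload { experiment: String },
    Evaluate { experiment: String, x_test: String, y_test: String },
}

pub fn resolve(cli: &Cli) -> Result<Action, String> {
    match (&cli.load, cli.evaluate) {
        // --evaluate is only meaningful on a reloaded experiment.
        (None, true) => Err("--evaluate requires --load".to_string()),
        (Some(exp), true) => match (&cli.x_test, &cli.y_test) {
            (Some(x), Some(y)) => Ok(Action::Evaluate {
                experiment: exp.clone(),
                x_test: x.clone(),
                y_test: y.clone(),
            }),
            _ => Err("--evaluate needs both --x-test and --y-test".to_string()),
        },
        (Some(exp), false) => Ok(Action::Reload { experiment: exp.clone() }),
        // Default run: param.yaml in the current directory unless --config is given.
        (None, false) => Ok(Action::Run {
            config: cli.config.clone().unwrap_or_else(|| "param.yaml".to_string()),
        }),
    }
}
```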
Termination signals are handled for clean shutdown.
The supported GPUs are:
- Apple Silicon (Metal),
- any GPU supported by Vulkan.
On Apple Silicon, Metal is supported out of the box, but you need a recent version of LLVM, newer than the one installed by default. Here is the procedure:
You should already have the developer tools (installed with xcode-select --install), which you need anyway for Rust.
The recommended procedure is to use Homebrew, at least for LLVM and probably for rustup as well.
brew install rustup llvm
# the following line is only required if you had already installed Rust from the https://rustup.rs site
mv ~/.cargo ~/.cargo.backup # optionally, remove the line in .zshrc that loads the .cargo/env environment file
export PATH="/opt/homebrew/opt/llvm/bin:$PATH"
cat << 'EOF' >> ~/.zshrc
export PATH="/opt/homebrew/opt/llvm/bin:$PATH"
EOF
rustup default nightly
rustup update
Then build normally with cargo build --release.
On Linux, you must install Vulkan:
sudo apt install vulkan-tools libvulkan1
For NVIDIA cards, you will also need a driver, for instance:
sudo apt install libnvidia-gl-550-server nvidia-driver-550 nvidia-utils-550
Check with vulkaninfo that your card is correctly detected.
NB: under Linux, I always do a fully optimized build, as below, but it is not mandatory; a simple cargo build --release is enough:
RUSTFLAGS="-C target-cpu=native -C opt-level=3" cargo build --release
If you use Gpredomics in your research, please cite it as follows:
Original Method:
Prifti, E., Chevaleyre, Y., Hanczar, B., Belda, E., Danchin, A., Clément, K., & Zucker, J. D. (2020). Interpretable and accurate prediction models for metagenomics data. GigaScience, 9(3), giaa010. https://doi.org/10.1093/gigascience/giaa010
Software:
Lesage, L., de Lahondès, R., Puller, V., & Prifti, E. (2025). Gpredomics (Version 0.7.7). GMT Science / IRD. https://github.com/predomics/gpredomics
A CITATION.cff file is available in this repository. If you use GitHub, you can use the "Cite this repository" option in the "About" section to export this citation in BibTeX or APA formats.
If you have any questions, comments, or have found a bug, please contact us at the following address:
- Email: contact@predomics.com
- GitHub Issues: Gpredomics Issues
Last updated: v0.7.7
