SpecBridge: Bridging Mass Spectrometry and Molecular Representations via Cross-Modal Alignment

SpecBridge provides a DreaMS-conditioned adapter for spectrum-to-molecule mapping and a training pipeline that works with synthetic and real (MGF) data.

Installation

Quick Setup

Create conda environment:

```
conda env create -f environment.yml
conda activate specbridge
```

Install SpecBridge:
```
pip install -e .
```
Install DreaMS dependency:
```
cd DreaMS
pip install -e .
cd ..
```

For detailed setup instructions, troubleshooting, and alternative installation methods, see SETUP_ENVIRONMENT.md.

Getting Started

Pre-trained Models

DreaMS pre-trained weights are available at: https://zenodo.org/records/10997887

SpecBridge pre-trained adapters, datasets, and candidate files are available at: https://zenodo.org/records/18357418 (DOI: 10.5281/zenodo.18357418)

SpecBridge model weights are also available on Hugging Face: https://huggingface.co/Spony/SpecBridge (weights only)

Datasets and Candidate Files

The following datasets are supported with their corresponding candidate files. Download from Zenodo:

MassSpecGym (MSGYM):
- MGF: SpecBridge_MassSpecGym_dataset.mgf
- Candidates: SpecBridge_MSGYM_candidates.pkl
Spectraverse:
- MGF: SpecBridge_Spectraverse_dataset.mgf
- Candidates: SpecBridge_Spectraverse_candidates.pkl
MSnLib:
- MGF: SpecBridge_MSnLib_dataset.mgf
- Candidates: SpecBridge_MSnLib_candidates.pkl

Quick Download: Use the provided script to download all files:

```
./download_from_zenodo.sh
```

This will automatically download files to the correct directories:

- Checkpoints → runs/msgym/, runs/msnlib/, runs/spectraverse/
- Datasets → data/SpecBridge_*_dataset.mgf
- Candidates → data/SpecBridge_*_candidates.pkl

Files keep their Zenodo names. Make sure to use the matching candidate file for your MGF dataset when running evaluation.
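
To reduce the chance of pairing an MGF file with the wrong candidate file, a small lookup can keep the pairs together. The following is a minimal sketch: the file names come from the lists above, while `DATASET_FILES`, `paired_files`, and the `data_dir` default are hypothetical names chosen for illustration and are not part of the SpecBridge package.

```python
from pathlib import Path

# File names as listed above; the dictionary itself is illustrative only.
DATASET_FILES = {
    "msgym": ("SpecBridge_MassSpecGym_dataset.mgf", "SpecBridge_MSGYM_candidates.pkl"),
    "spectraverse": ("SpecBridge_Spectraverse_dataset.mgf", "SpecBridge_Spectraverse_candidates.pkl"),
    "msnlib": ("SpecBridge_MSnLib_dataset.mgf", "SpecBridge_MSnLib_candidates.pkl"),
}


def paired_files(dataset, data_dir="data"):
    """Return (mgf_path, candidates_path) for a dataset key such as 'msgym'."""
    try:
        mgf_name, cands_name = DATASET_FILES[dataset.lower()]
    except KeyError:
        known = ", ".join(sorted(DATASET_FILES))
        raise ValueError(f"unknown dataset {dataset!r}; expected one of: {known}") from None
    root = Path(data_dir)
    return root / mgf_name, root / cands_name


def missing_files(dataset, data_dir="data"):
    """Return the subset of the paired files that is not present on disk."""
    return [p for p in paired_files(dataset, data_dir) if not p.exists()]
```

The two returned paths can then be passed to `--mgf` and `--candidates` respectively.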

Training

To train the adapter on your MGF data:

```
python -m specbridge.train.train \
    --mgf path/to/your/data.mgf \
    --dreams-ckpt path/to/ssl_model.ckpt \
    --fold train \
    --batch-size 128 \
    --epochs 2 \
    --cond-dim 2048 \
    --mapper-hidden 2048 \
    --no-gaussian \
    --supcon-k 4 \
    --w-con 0 \
    --w-con-mapped 0 \
    --w-map 5.0 \
    --w-ortho 1e-3 \
    --w-supcon 1.0 \
    --supcon-temp 0.07 \
    --log-every 50 \
    --save-every 200 \
    --outdir runs/your_experiment_name \
    --mol-space chemberta \
    --chemberta-model Derify/ChemBERTa_augmented_pubchem_13m \
    --lr 1e-4 \
    --n-blocks 8 \
    --unfreeze-last 2 \
    --unfreeze-after 0
```
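
The `--w-supcon` and `--supcon-temp` flags suggest that training includes a supervised contrastive term, but this README does not show how it is implemented. For readers unfamiliar with the idea, here is a minimal NumPy sketch of the standard supervised contrastive loss, under the assumption that embeddings are L2-normalized and that samples sharing a label (for example, spectra of the same molecule) are treated as positives. The function name `supcon_loss` is hypothetical, and the actual SpecBridge loss may differ.

```python
import numpy as np


def supcon_loss(embeddings, labels, temperature=0.07):
    """Standard supervised contrastive loss, averaged over anchors with positives.

    This is a generic illustration, not the SpecBridge implementation.
    """
    z = np.asarray(embeddings, dtype=float)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # cosine similarity space
    labels = np.asarray(labels)
    n = len(labels)

    sim = z @ z.T / temperature
    self_mask = np.eye(n, dtype=bool)
    sim = np.where(self_mask, -np.inf, sim)  # an anchor never contrasts with itself

    # Numerically stable log-softmax over all other samples in the batch.
    row_max = sim.max(axis=1, keepdims=True)
    log_prob = sim - row_max - np.log(np.exp(sim - row_max).sum(axis=1, keepdims=True))

    positives = (labels[:, None] == labels[None, :]) & ~self_mask
    n_pos = positives.sum(axis=1)
    valid = n_pos > 0
    if not valid.any():
        return 0.0  # no anchor has a positive in this batch

    per_anchor = -np.where(positives, log_prob, 0.0).sum(axis=1)
    return float((per_anchor[valid] / n_pos[valid]).mean())
```

A lower temperature such as 0.07 sharpens the softmax, so hard negatives that sit close to the anchor contribute more strongly to the loss.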

Batch Evaluation of Checkpoints

After training, you can use eval_all.sh to automatically evaluate all checkpoints in a run directory:

```
# Edit eval_all.sh to configure:
# - RUN_DIR: path to your training run directory
# - MGF: path to your MGF dataset
# - CANDS: path to candidate file matching your dataset
# - FOLD: evaluation fold (train/val/test)
# - Other parameters (batch size, embedding space, etc.)

# Run evaluation
./eval_all.sh
```

The script will:

- Loop through all checkpoints (ckpt_*.pt) in the run directory
- Evaluate each checkpoint on the specified dataset
- Generate a summary CSV file (eval_summary_${FOLD}_all.csv) with metrics (R@1, R@5, R@20, MRR, median_rank)
- Skip checkpoints that have already been evaluated
- Display the top 5 checkpoints by R@5
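
For reference, the reported metrics follow the usual retrieval definitions. The sketch below computes them from a list of 1-based ranks of the correct molecule, one rank per query spectrum. It illustrates the definitions only and is not taken from `eval_all.sh`; the function name `retrieval_metrics` is hypothetical.

```python
import statistics


def retrieval_metrics(ranks, ks=(1, 5, 20)):
    """Compute R@k, MRR, and median rank from 1-based ranks of the true molecule."""
    if not ranks:
        raise ValueError("ranks must contain at least one query")
    n = len(ranks)
    # R@k: fraction of queries whose true molecule is ranked within the top k.
    metrics = {f"R@{k}": sum(r <= k for r in ranks) / n for k in ks}
    # MRR: mean of the reciprocal ranks.
    metrics["MRR"] = sum(1.0 / r for r in ranks) / n
    metrics["median_rank"] = statistics.median(ranks)
    return metrics


if __name__ == "__main__":
    print(retrieval_metrics([1, 2, 10, 40]))
```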

Evaluation

To evaluate a single checkpoint on a dataset with candidates:

```
python -m specbridge.eval.candidates \
    --mgf path/to/your/data.mgf \
    --dreams-ckpt path/to/ssl_model.ckpt \
    --adapter-ckpt path/to/adapter/ckpt.pt \
    --candidates path/to/candidates.pkl \
    --fold-query test \
    --use-mapped \
    --deterministic-map \
    --no-gaussian \
    --batch-size 32 \
    --cond-dim 2048 \
    --mapper-hidden 2048 \
    --mol-space chemberta \
    --chemberta-model Derify/ChemBERTa_augmented_pubchem_13m
```
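
Conceptually, candidate-based evaluation ranks each query's candidate molecules by how close their embeddings are to the embedding predicted from the spectrum. The sketch below shows that idea with cosine similarity. It assumes cosine scoring and precomputed embeddings; the README does not specify the scoring function used by `specbridge.eval.candidates`, and `rank_of_true` is a hypothetical helper.

```python
import numpy as np


def rank_of_true(query_emb, cand_embs, true_index):
    """Return the 1-based rank of the true candidate under cosine similarity.

    query_emb:  (d,) embedding predicted from the spectrum
    cand_embs:  (n, d) embeddings of the candidate molecules
    true_index: row of cand_embs that holds the correct molecule
    """
    q = np.asarray(query_emb, dtype=float)
    c = np.asarray(cand_embs, dtype=float)
    q = q / np.linalg.norm(q)
    c = c / np.linalg.norm(c, axis=1, keepdims=True)
    scores = c @ q
    # Rank is one plus the number of candidates that score strictly higher.
    return int(1 + np.sum(scores > scores[true_index]))
```

Ranks collected this way over all queries are the input to metrics such as R@k and MRR.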

Package Structure

specbridge/ - Core package code
- adapters/ - DreaMS adapter implementation
- data/ - Data loading and processing
- eval/ - Evaluation scripts
- models/ - Model definitions
- losses/ - Loss functions
- train/ - Training scripts
  - train.py - Main training script
- utils/ - Utility functions
DreaMS/ - DreaMS dependency

Pre-trained SpecBridge Checkpoints

Pre-trained SpecBridge adapter checkpoints, datasets, and candidate files are available on Zenodo:

📦 Download from Zenodo: https://zenodo.org/records/18357418 | DOI: 10.5281/zenodo.18357418

Model weights are also available on Hugging Face at https://huggingface.co/Spony/SpecBridge (weights only; datasets and candidates are on Zenodo).

The Zenodo dataset includes:

Best Performing Checkpoints (Validation Set)

- MSGYM: SpecBridge_MSGYM_checkpoint.pt (step 1200, R@5=0.91528, MRR=0.87518)
- MSnLib: SpecBridge_MSnLib_checkpoint.pt (step 26000, R@5=0.59368, MRR=0.56004)
- Spectraverse: SpecBridge_Spectraverse_checkpoint.pt (step 20600, R@5=0.43532, MRR=0.39055)

Datasets (MGF Files)

- SpecBridge_MassSpecGym_dataset.mgf - MassSpecGym dataset
- SpecBridge_MSnLib_dataset.mgf - MSnLib dataset with train/val/test folds
- SpecBridge_Spectraverse_dataset.mgf - Spectraverse dataset
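
Before training or evaluating, it can help to check how many spectra fall into each fold. The following dependency-free sketch scans the `BEGIN IONS` / `END IONS` blocks of an MGF file and counts a per-spectrum fold field. The field name `FOLD` is an assumption, so check the header lines of your file and pass the actual key if it differs; `count_folds` is a hypothetical helper, not part of SpecBridge.

```python
from collections import Counter


def count_folds(lines, fold_key="FOLD"):
    """Count spectra per fold from an iterable of MGF text lines.

    fold_key is an assumed header name; spectra without it are counted
    under "<missing>".
    """
    counts = Counter()
    in_block = False
    fold = None
    for raw in lines:
        line = raw.strip()
        if line == "BEGIN IONS":
            in_block, fold = True, None
        elif line == "END IONS":
            if in_block:
                counts[fold if fold is not None else "<missing>"] += 1
            in_block = False
        elif in_block and "=" in line:
            # Header lines look like KEY=VALUE; peak lines contain no "=".
            key, value = line.split("=", 1)
            if key.strip().upper() == fold_key.upper():
                fold = value.strip()
    return counts
```

Because the function accepts any iterable of lines, an open file handle can be passed directly, so large MGF files are streamed rather than loaded into memory.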

Candidate Files

- SpecBridge_MSGYM_candidates.pkl - MSGYM candidate dictionary (SMILES format)
- SpecBridge_MSnLib_candidates.pkl - MSnLib candidate dictionary
- SpecBridge_Spectraverse_candidates.pkl - Spectraverse candidate dictionary
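
A quick way to sanity-check a downloaded candidate file is to load it and summarize its size. The sketch below assumes the pickle holds a dictionary whose values are collections of candidates (for example, lists of SMILES strings); the exact key and value layout is not documented here, and `summarize_candidates` is a hypothetical helper.

```python
import pickle


def summarize_candidates(path):
    """Load a candidate pickle and report how many entries and candidates it holds."""
    with open(path, "rb") as fh:
        # Unpickling can execute arbitrary code: only load files you trust.
        cands = pickle.load(fh)
    if not isinstance(cands, dict):
        raise TypeError(f"expected a dict, got {type(cands).__name__}")
    sizes = [len(v) for v in cands.values() if hasattr(v, "__len__")]
    return {
        "n_entries": len(cands),
        "min_candidates": min(sizes, default=0),
        "max_candidates": max(sizes, default=0),
    }
```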

Note: The files are under restricted access. If you cannot download them, request access through the Zenodo record page.

Download Script: Use ./download_from_zenodo.sh to automatically download all files to the correct local directories.

Requirements

See pyproject.toml for dependencies. Main requirements:

- Python >= 3.10
- PyTorch >= 2.1
- NumPy >= 1.24
- pyteomics >= 4.7.5

Optional dependencies:

- rdkit-pypi for molecular processing
- wandb for experiment tracking

Data Files

Downloading from Zenodo

All pre-trained checkpoints, datasets, and candidate files are available on Zenodo:

📦 Zenodo Dataset: https://zenodo.org/records/18357418 | DOI: 10.5281/zenodo.18357418

Quick Download: Use the provided script to download all files:

```
./download_from_zenodo.sh
```

The script will automatically:

- Download checkpoints to runs/msgym/, runs/msnlib/, and runs/spectraverse/
- Download datasets to data/*.mgf
- Download candidates to data/*.pkl
- Create necessary directories
- Skip files that already exist

File Organization

After downloading, files will be organized as:

Checkpoints:
- runs/msgym/SpecBridge_MSGYM_checkpoint.pt
- runs/msnlib/SpecBridge_MSnLib_checkpoint.pt
- runs/spectraverse/SpecBridge_Spectraverse_checkpoint.pt
Datasets:
- data/SpecBridge_MassSpecGym_dataset.mgf
- data/SpecBridge_MSnLib_dataset.mgf
- data/SpecBridge_Spectraverse_dataset.mgf
Candidates:
- data/SpecBridge_MSGYM_candidates.pkl
- data/SpecBridge_MSnLib_candidates.pkl
- data/SpecBridge_Spectraverse_candidates.pkl

Citation

If you use SpecBridge in your research, please cite:

```
@misc{wang2026specbridgebridgingmassspectrometry,
      title={SpecBridge: Bridging Mass Spectrometry and Molecular Representations via Cross-Modal Alignment},
      author={Yinkai Wang and Yan Zhou Chen and Xiaohui Chen and Li-Ping Liu and Soha Hassoun},
      year={2026},
      eprint={2601.17204},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2601.17204},
}
```
