# Install from PyPI
pip install nomad-hpc
# Initialize configuration
nomad init
# Try the demo with synthetic data
nomad demo
# Or connect to your SLURM cluster
nomad collect --daemon
NØMAD-HPC
Node Monitoring And Diagnostics
Open-source monitoring and diagnostics
for High-Performance Computing clusters.
"Where raw infrastructure metrics become actionable guidance."
Why NØMAD?
Monitor performance, diagnose issues, and educate users, all in one place.
A complete solution for understanding and optimizing your HPC cluster.
Real-time Monitoring
Collect metrics from compute nodes, storage systems, and network infrastructure. 15+ collectors provide visibility across all three.
Smart Diagnostics
Automated health analysis with actionable recommendations. Identify problems before users report them.
Job Similarity Analysis
Graph neural networks identify patterns in job behavior. Find anomalies by comparing jobs to their "neighbors".
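The neighbor-comparison idea can be illustrated without a graph neural network. The sketch below is not NØMAD's implementation: it swaps the GNN for plain cosine similarity, and the per-job feature vectors (CPU efficiency, memory efficiency, I/O rate) are hypothetical. A job scores as anomalous when even its closest neighbors look unlike it.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def neighbor_anomaly_scores(features, k=2):
    """Return 1 - mean similarity to each job's k most similar other jobs."""
    if len(features) < 2:
        raise ValueError("need at least two jobs to compare")
    scores = []
    for i, job in enumerate(features):
        sims = sorted(
            (cosine(job, other) for j, other in enumerate(features) if j != i),
            reverse=True,
        )
        nearest = sims[:k]
        scores.append(1.0 - sum(nearest) / len(nearest))
    return scores

# Hypothetical feature vectors: [cpu_efficiency, mem_efficiency, io_rate]
jobs = [
    [0.90, 0.80, 0.10],
    [0.88, 0.82, 0.12],
    [0.91, 0.79, 0.09],
    [0.05, 0.10, 0.95],  # behaves unlike the others
]
scores = neighbor_anomaly_scores(jobs, k=2)
most_anomalous = max(range(len(scores)), key=scores.__getitem__)
```

Here the fourth job stands out because none of its neighbors resemble it. A learned model could replace `cosine` without changing the overall shape of the comparison.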
Trend Detection
Derivative-based analysis catches degradation early. Know when storage is filling or performance is dropping.
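As a rough illustration of derivative-based trend detection, the sketch below takes finite differences of storage usage samples and projects when the filesystem would fill at the most recent rate. The function names, sample values, and units (hours, TB) are assumptions made for the example, not part of NØMAD's API.

```python
def derivatives(times, values):
    """Finite-difference first derivative between consecutive samples."""
    points = list(zip(times, values))
    return [
        (v1 - v0) / (t1 - t0)
        for (t0, v0), (t1, v1) in zip(points, points[1:])
    ]

def hours_until_full(times, used, capacity):
    """Project time to full from the latest fill rate; None if not filling."""
    latest_rate = derivatives(times, used)[-1]
    if latest_rate <= 0:
        return None
    return (capacity - used[-1]) / latest_rate

# Hypothetical hourly samples of used space on a 100 TB filesystem.
times_h = [0, 1, 2, 3, 4]
used_tb = [70.0, 71.0, 72.5, 74.5, 77.0]

rates = derivatives(times_h, used_tb)     # TB per hour
accel = derivatives(times_h[1:], rates)   # change in fill rate per hour
eta_h = hours_until_full(times_h, used_tb, capacity=100.0)
```

A positive second derivative (`accel`) means the fill rate itself is growing, which is the kind of early signal a fixed usage threshold would miss.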
Educational Feedback
Help users become better HPC citizens. Proficiency scoring and suggestions improve resource efficiency.
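One simple way to picture this kind of feedback is to compare what a job requested with what it actually used. The scoring formula, the 50% cutoff, and the field names below are illustrative assumptions, not NØMAD's actual proficiency model.

```python
def efficiency(used, requested):
    """Fraction of a requested resource that was actually used, capped at 1."""
    return min(used / requested, 1.0) if requested > 0 else 0.0

def feedback(job, low=0.5):
    """Return a 0-100 score and a list of suggestions for one job."""
    cpu = efficiency(job["cpu_hours_used"], job["cpu_hours_requested"])
    mem = efficiency(job["max_mem_gb"], job["req_mem_gb"])
    score = round(100 * (cpu + mem) / 2)
    tips = []
    if cpu < low:
        tips.append("CPU time was mostly idle; consider requesting fewer cores.")
    if mem < low:
        tips.append(
            f"Used {job['max_mem_gb']} GB of {job['req_mem_gb']} GB requested; "
            "consider requesting less memory."
        )
    return score, tips

# Hypothetical job record.
job = {
    "cpu_hours_used": 36.0,
    "cpu_hours_requested": 40.0,
    "max_mem_gb": 8.0,
    "req_mem_gb": 64.0,
}
score, tips = feedback(job)
```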
Flexible Alerts
Email, Slack, and webhook notifications. Define thresholds that matter to your environment.
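Threshold-based alerting can be sketched in a few lines. The metric names and limits below are hypothetical, and the functions are not NØMAD's API. The only external convention relied on is that Slack incoming webhooks accept a JSON body with a `text` field. The sketch builds the payload but does not send it.

```python
import json

def check_thresholds(metrics, thresholds):
    """Return (name, value, limit) for every metric above its limit."""
    return [
        (name, metrics[name], limit)
        for name, limit in thresholds.items()
        if name in metrics and metrics[name] > limit
    ]

def slack_payload(breaches):
    """Build the JSON body for a Slack incoming webhook."""
    lines = [
        f"{name} = {value} (threshold {limit})"
        for name, value, limit in breaches
    ]
    return json.dumps({"text": "Cluster alert:\n" + "\n".join(lines)})

# Hypothetical thresholds and current readings.
thresholds = {"scratch_used_pct": 90, "node_load_1m": 64}
metrics = {"scratch_used_pct": 93.5, "node_load_1m": 12.0}

breaches = check_thresholds(metrics, thresholds)
payload = slack_payload(breaches)
```

Delivering the alert would then be an HTTP POST of `payload` to the webhook URL configured for your workspace.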
See It In Action
Real-time insights into your HPC cluster
Interactive Dashboard
Real-time cluster overview with job network visualization and node status
Modular Architecture
Collectors, analyzers, and alerts working together seamlessly
Educational Feedback
Help users improve their resource requests with actionable insights
Job Similarity Network
Visualize relationships between jobs to identify patterns and anomalies
NØMAD-HPC: Node Monitoring And Diagnostics for High-Performance Computing
Accepted in Journal of Open Source Software (JOSS)
15+ Data Sources
50+ Metrics Tracked
4 ML Models
100% Open Source
Transparent Tech Stack
We believe in showing exactly how things work
Core
- Python 3.9+
- Click (CLI framework)
- SQLite (embedded database)
- TOML (configuration)
Data Collection
- SLURM (sacct, squeue)
- vmstat, mpstat, iostat
- nfsiostat, zpool
- nvidia-smi (GPUs)
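To give a sense of what collecting from SLURM involves, here is a minimal sketch that parses pipe-delimited accounting output of the kind produced by `sacct --parsable2 --format=JobID,State,Elapsed,ReqMem,MaxRSS`. The sample text and helper names are made up for illustration, and this is not how NØMAD's own collectors are necessarily written.

```python
import csv
import io

# Hypothetical sample of `sacct --parsable2` output (header row included).
SAMPLE = """JobID|State|Elapsed|ReqMem|MaxRSS
1001|COMPLETED|01:30:00|16G|
1001.batch|COMPLETED|01:30:00||4194304K
1002|FAILED|00:02:10|8G|
"""

def parse_sacct(text):
    """Parse pipe-delimited sacct output into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter="|"))

def elapsed_seconds(elapsed):
    """Convert an elapsed string such as 1-02:03:04 or 02:03:04 to seconds."""
    days, _, clock = elapsed.rpartition("-")
    parts = [int(p) for p in clock.split(":")]
    while len(parts) < 3:  # tolerate MM:SS
        parts.insert(0, 0)
    hours, minutes, seconds = parts
    return int(days or 0) * 86400 + hours * 3600 + minutes * 60 + seconds

rows = parse_sacct(SAMPLE)
# Job steps have IDs like "1001.batch"; keep only the job-level rows.
jobs = [r for r in rows if "." not in r["JobID"]]
failed = [r["JobID"] for r in jobs if r["State"] == "FAILED"]
runtime_s = elapsed_seconds(jobs[0]["Elapsed"])
```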
Analysis
- NumPy & Pandas
- SciPy (statistics)
- Derivative-based trend analysis
- Graph similarity networks
ML (Optional)
- scikit-learn
- PyTorch
- PyTorch Geometric (GNN)
- LSTM autoencoders
Visualization
- Vanilla JavaScript
- React (dashboard)
- D3.js (graphs)
- Pure CSS animations
Alerts
- SMTP (email)
- Slack webhooks
- Generic webhooks
- Custom backends
Quick Start
Get up and running in minutes