NØMAD-HPC

Node Monitoring And Diagnostics

Open-source monitoring and diagnostics
for High-Performance Computing clusters.

"Where raw infrastructure metrics become actionable guidance."

Why NØMAD?

Monitor performance, diagnose issues, and educate users, all in one place.
A complete solution for understanding and optimizing your HPC cluster.

📊

Real-time Monitoring

Collect metrics from compute
nodes, storage systems, and
network infrastructure, using
more than 15 collectors.

🔍

Smart Diagnostics

Automated health analysis
with actionable recommendations.
Identify problems before
users report them.

🧠

Job Similarity Analysis

Graph neural networks identify
patterns in job behavior. Find
anomalies by comparing jobs
to their "neighbors".
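The "compare jobs to their neighbors" idea can be illustrated without a graph neural network. The sketch below is not NØMAD's model: it swaps in a plain k-nearest-neighbor comparison on cosine similarity, and the function name and the three feature columns (CPU efficiency, memory in GB, I/O rate) are hypothetical.

```python
import numpy as np

def neighbor_anomaly_scores(features, k=3):
    """Score each job by 1 minus its mean cosine similarity to its k nearest jobs.

    Illustrative stand-in for a GNN: higher scores mean a job looks
    less like its most similar neighbors.
    """
    x = np.asarray(features, dtype=float)
    if x.shape[0] <= k:
        raise ValueError("need more jobs than neighbors")
    # Standardize each feature column so no single metric dominates.
    std = x.std(axis=0)
    std[std == 0] = 1.0
    z = (x - x.mean(axis=0)) / std
    # Cosine similarity between every pair of jobs.
    norms = np.linalg.norm(z, axis=1, keepdims=True)
    norms[norms == 0] = 1.0
    unit = z / norms
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)  # a job is not its own neighbor
    # Keep the k largest similarities in each row.
    top_k = np.sort(sim, axis=1)[:, -k:]
    return 1.0 - top_k.mean(axis=1)

# Hypothetical columns: CPU efficiency, memory (GB), I/O rate.
jobs = [
    [0.90, 4.0, 10.0],
    [0.88, 4.2, 11.0],
    [0.92, 3.9, 9.5],
    [0.89, 4.1, 10.5],
    [0.91, 4.0, 10.2],
    [0.10, 32.0, 200.0],  # deliberately unusual job
]
scores = neighbor_anomaly_scores(jobs, k=3)
most_unusual = int(np.argmax(scores))
```

Here the last job gets the highest score because none of its neighbors resemble it. A learned model can capture richer structure, but the neighbor comparison is the same in spirit.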

📈

Trend Detection

Derivative-based analysis
catches degradation early.
Know when storage is filling
or performance is dropping.
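As a rough illustration of derivative-based trend detection, the sketch below estimates the rate of change of storage usage and projects when the volume would fill. The function name, the sampling in hours, and the averaging window are assumptions made for the example, not NØMAD's actual implementation.

```python
import numpy as np

def hours_until_full(timestamps_h, used_fraction, window=4):
    """Project hours until usage reaches 100%, or None if usage is not growing."""
    t = np.asarray(timestamps_h, dtype=float)
    u = np.asarray(used_fraction, dtype=float)
    # First derivative of usage with respect to time (fraction per hour).
    rate = np.gradient(u, t)
    # Average the most recent samples to damp measurement noise.
    recent_rate = rate[-window:].mean()
    if recent_rate <= 0:
        return None
    return float((1.0 - u[-1]) / recent_rate)

# Usage grows by 2 percentage points per hour, from 50% to 60%.
eta = hours_until_full([0, 1, 2, 3, 4, 5], [0.50, 0.52, 0.54, 0.56, 0.58, 0.60])
flat = hours_until_full([0, 1, 2, 3, 4, 5], [0.50] * 6)
```

Working from the rate of change rather than the current value is what lets a trend be flagged while absolute usage still looks healthy.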

🎓

Educational Feedback

Help users become better HPC
citizens. Proficiency scoring
and suggestions improve
resource efficiency.

🔔

Flexible Alerts

Email, Slack, and webhook
notifications. Define thresholds
that matter to your environment.
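A threshold check feeding a webhook might look like the following minimal sketch. The metric names, limits, and helper functions are hypothetical and do not reflect NØMAD's configuration schema. The payload uses the `{"text": ...}` JSON body that Slack incoming webhooks accept.

```python
import json
import urllib.request

def breached(metrics, thresholds):
    """Return (name, value, limit) for every metric above its limit."""
    return [
        (name, metrics[name], limit)
        for name, limit in sorted(thresholds.items())
        if name in metrics and metrics[name] > limit
    ]

def build_payload(alerts):
    """Encode the alerts as a JSON body with a single 'text' field."""
    lines = [f"{name}: {value} exceeds threshold {limit}"
             for name, value, limit in alerts]
    return json.dumps({"text": "\n".join(lines)}).encode("utf-8")

def send(url, payload):
    """POST the payload to a webhook URL and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status

# Hypothetical metrics and site-specific limits.
metrics = {"disk_used_pct": 93.0, "load_avg": 2.0}
thresholds = {"disk_used_pct": 90.0, "load_avg": 8.0}
alerts = breached(metrics, thresholds)
payload = build_payload(alerts)
# send("<your webhook URL>", payload)  # not called here; needs a real endpoint
```

Keeping the threshold check separate from delivery means the same alerts could go to email or a custom backend without changing the check.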

See It In Action

Real-time insights into your HPC cluster

Interactive Dashboard

Real-time cluster overview
with job network visualization
and node status

Modular Architecture

Collectors, analyzers, and
alerts working together
seamlessly

Educational Feedback

Help users improve their
resource requests with
actionable insights

Job Similarity Network

Visualize relationships
between jobs to identify
patterns and anomalies

📄

NØMAD-HPC: Node Monitoring And Diagnostics for High-Performance Computing

Accepted by the Journal of Open Source Software (JOSS)

  • 15+ data sources
  • 50+ metrics tracked
  • 4 ML models
  • 100% open source

Transparent Tech Stack

We believe in showing exactly how things work

Core

  • Python 3.9+
  • Click (CLI framework)
  • SQLite (embedded database)
  • TOML (configuration)

Data Collection

  • SLURM (sacct, squeue)
  • vmstat, mpstat, iostat
  • nfsiostat, zpool
  • nvidia-smi (GPUs)

Analysis

  • NumPy & Pandas
  • SciPy (statistics)
  • Derivative-based analysis (trends)
  • Graph similarity networks

ML (Optional)

  • scikit-learn
  • PyTorch
  • PyTorch Geometric (GNN)
  • LSTM autoencoders

Visualization

  • Vanilla JavaScript
  • React (dashboard)
  • D3.js (graphs)
  • Pure CSS animations

Alerts

  • SMTP (email)
  • Slack webhooks
  • Generic webhooks
  • Custom backends

Quick Start

Get up and running in minutes

# Install from PyPI
pip install nomad-hpc

# Initialize configuration
nomad init

# Try the demo with synthetic data
nomad demo

# Or connect to your SLURM cluster
nomad collect --daemon

Read the Documentation