NeuralBench: Meta AI's Unified Open-Source Framework for Benchmarking NeuroAI Models

For years, evaluating artificial intelligence (AI) models trained on brain signals has been a messy, fragmented affair. Different research groups employ distinct preprocessing pipelines, train models on diverse datasets, and report results only on narrow tasks. This inconsistency makes it nearly impossible to determine which model truly performs best, or under what conditions. Enter NeuralBench, a new open-source framework from Meta AI designed to bring standardization and clarity to this field.

What Problem Does NeuralBench Solve?

The intersection of deep learning and neuroscience, often called NeuroAI, has exploded in recent years. Techniques like self-supervised learning, originally developed for language, speech, and images, are now being adapted to build foundation models of brain activity. These models are pretrained on unlabeled recordings (e.g., EEG, MEG, fMRI) and fine-tuned for tasks ranging from clinical seizure detection to decoding what a person sees or hears. The evaluation landscape, however, has remained badly fragmented.

Existing benchmarks often fall short. MOABB, for instance, covers as many as 148 brain-computer interface (BCI) datasets but restricts evaluation to just five tasks. Other efforts, such as EEG-Bench, EEG-FM-Bench, and AdaBrain-Bench, each come with significant constraints of their own. For modalities like magnetoencephalography (MEG) and functional magnetic resonance imaging (fMRI), no systematic benchmark exists at all. The result: claims that foundation models are 'generalizable' or 'foundational' frequently rest on cherry-picked tasks with no common reference point.

Introducing NeuralBench-EEG v1.0

NeuralBench aims to end this fragmentation. Its first release, NeuralBench-EEG v1.0, is the largest open benchmark of its kind. It includes:

  • 36 downstream tasks
  • 94 unique datasets
  • 9,478 subjects
  • 13,603 hours of electroencephalography (EEG) data
  • 14 deep learning architectures evaluated under a standardized interface

This scale makes NeuralBench-EEG v1.0 a comprehensive resource for comparing and advancing NeuroAI models.

How Does NeuralBench Work?

The framework is built on three core Python packages that form a modular pipeline:

1. NeuralFetch

Handles dataset acquisition, pulling curated data from public repositories including OpenNeuro, DANDI, and NEMAR. It ensures data provenance and easy access for the benchmark.
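
The article does not show NeuralFetch's interface, so as a rough point of reference, here is how data can be pulled directly from one of the repositories it draws on, using the openneuro-py client (the accession ID and target path below are illustrative):

    # Hypothetical example: direct download from OpenNeuro, one of the
    # repositories NeuralFetch acquires data from. NeuralFetch wraps this
    # kind of acquisition behind its own interface; this is not its API.
    import openneuro

    openneuro.download(dataset="ds002778", target_dir="data/ds002778")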

2. NeuralSet

Prepares data as PyTorch-ready dataloaders. It wraps existing neuroscience tools like MNE-Python and nilearn for preprocessing, and HuggingFace for extracting stimulus embeddings (for tasks involving images, speech, or text). This step transforms raw recordings into standardized formats.
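
As a minimal sketch, assuming nothing about NeuralSet's actual API, the kind of transformation this step performs looks roughly like the following: an MNE-Python filtering pass wrapped as a PyTorch dataset of fixed-length EEG windows.

    # Illustrative only: MNE-Python preprocessing exposed as a PyTorch Dataset.
    # The class name, window length, and filter band are assumptions, not
    # NeuralSet code.
    import mne
    import torch
    from torch.utils.data import Dataset, DataLoader

    class EEGWindows(Dataset):
        """Fixed-length EEG windows with one label per window."""
        def __init__(self, raw: mne.io.BaseRaw, labels, window_s: float = 2.0):
            raw = raw.copy().filter(l_freq=1.0, h_freq=40.0)  # typical EEG band-pass
            self.data = torch.tensor(raw.get_data(), dtype=torch.float32)
            self.win = int(window_s * raw.info["sfreq"])      # samples per window
            self.labels = labels                              # one label per window

        def __len__(self):
            return self.data.shape[1] // self.win

        def __getitem__(self, i):
            x = self.data[:, i * self.win : (i + 1) * self.win]  # (channels, time)
            return x, self.labels[i]

    # raw = mne.io.read_raw_edf("recording.edf", preload=True)  # hypothetical file
    # loader = DataLoader(EEGWindows(raw, labels), batch_size=32, shuffle=True)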

3. NeuralTrain

Provides modular training code built on PyTorch-Lightning, Pydantic, and the exca execution and caching library. It handles hyperparameters, loss functions, and evaluation metrics, allowing reproducible experiments.
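
Again as a hedged sketch rather than NeuralBench's own code, the PyTorch-Lightning pattern that NeuralTrain builds on looks roughly like this; the model, loss, and hyperparameters are placeholders:

    # Illustrative LightningModule: a toy EEG classifier showing where the
    # hyperparameters, loss function, and optimizer that NeuralTrain manages
    # would plug in. Nothing here is NeuralBench's actual code.
    import torch
    import pytorch_lightning as pl
    from torch import nn

    class EEGClassifier(pl.LightningModule):
        def __init__(self, n_classes: int = 2, lr: float = 1e-3):
            super().__init__()
            self.save_hyperparameters()  # recorded for reproducibility
            self.net = nn.Sequential(nn.Flatten(), nn.LazyLinear(n_classes))
            self.loss = nn.CrossEntropyLoss()

        def training_step(self, batch, batch_idx):
            x, y = batch
            loss = self.loss(self.net(x), y)
            self.log("train_loss", loss)
            return loss

        def configure_optimizers(self):
            return torch.optim.Adam(self.parameters(), lr=self.hparams.lr)

    # trainer = pl.Trainer(max_epochs=10)
    # trainer.fit(EEGClassifier(), train_dataloaders=loader)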

Once installed via pip install neuralbench, the framework is controlled through a command-line interface (CLI). Running a task is as simple as three commands: download the data, prepare the cache, and execute. Every task is configured via a lightweight YAML file that specifies the data source, train/validation/test splits, preprocessing steps, target processing, training hyperparameters, and evaluation metrics.
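
The actual schema is defined by the framework, but since NeuralTrain uses Pydantic, a task config of the kind described above can be imagined roughly as follows; every field name here is an assumption, not NeuralBench's real schema:

    # Hypothetical task config: a YAML snippet parsed into a Pydantic model.
    # Field names mirror the elements the article lists (data source, splits,
    # preprocessing, target processing, hyperparameters, metrics) but are
    # assumed, not NeuralBench's actual schema.
    import yaml
    from pydantic import BaseModel

    class TaskConfig(BaseModel):
        dataset: str              # e.g. a repository accession ID
        splits: dict[str, float]  # train/validation/test fractions
        preprocessing: list[str]  # ordered preprocessing steps
        target: str               # target processing / label definition
        learning_rate: float      # training hyperparameter
        metrics: list[str]        # evaluation metrics to report

    EXAMPLE = """
    dataset: ds002778
    splits: {train: 0.8, validation: 0.1, test: 0.1}
    preprocessing: [bandpass_1_40hz, resample_128hz]
    target: diagnosis_label
    learning_rate: 0.0003
    metrics: [balanced_accuracy, auroc]
    """

    config = TaskConfig(**yaml.safe_load(EXAMPLE))
    print(config.dataset, config.metrics)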

Benefits for the NeuroAI Community

NeuralBench offers several key advantages:

  1. Standardized evaluation: All models are tested on the same tasks, datasets, and splits—removing cherry-picking.
  2. Reproducibility: The modular design and CLI ensure that experiments can be easily repeated by any researcher.
  3. Scalability: With 94 datasets and 36 tasks, it provides much broader coverage than any previous benchmark.
  4. Extensibility: The framework is open-source and designed to accommodate new datasets, tasks, and models as they emerge.
  5. Cross-modal potential: Although currently focused on EEG, the pipeline can be extended to MEG, fMRI, and other modalities.

Conclusion

NeuralBench by Meta AI represents a major step forward for the NeuroAI field. By unifying data acquisition, preprocessing, training, and evaluation under one open-source framework, it provides a fair and comprehensive platform for benchmarking brain-inspired models. Researchers can now move beyond fragmented efforts and focus on what truly matters: building better models that decode and understand brain activity. The framework is available on GitHub and via Meta AI's research page, inviting the entire community to collaborate and contribute.
