
Benchmark Workflow

A benchmark in MedPerf is a collection of assets, developed by the Benchmark Committee, that is used to evaluate medical ML models on the decentralized data of participating Data Providers.

The process is simple yet effective, and it is designed to scale.

Step 1. Establish Benchmark Committee

The benchmarking process starts with establishing a Benchmark Committee of healthcare stakeholders (e.g., clinical experts), which will identify a clinical problem where an effective ML-based solution can have a significant clinical impact.

Step 2. Register Benchmark

MLCubes are the building blocks of an experiment and are required in order to create a benchmark. Three MLCubes need to be submitted: the Data Preparator MLCube, the Reference Model MLCube, and the Metrics MLCube. After submitting these three MLCubes, along with a sample reference dataset, the Benchmark Committee can create the benchmark. Once the benchmark is submitted, the MedPerf admin must approve it before it becomes visible to other users. Follow our Hands-on Tutorial for detailed step-by-step guidelines.
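
As a rough sketch, these submissions are made through the MedPerf CLI. The commands below are illustrative only: all names, file paths, and UIDs are placeholders, and the exact sub-commands and flags may differ between releases, so consult the Hands-on Tutorial or medperf --help for your version.

    # Submit the three workflow MLCubes (placeholder names and manifest paths)
    medperf mlcube submit --name my-prep --mlcube-file path/to/prep/mlcube.yaml
    medperf mlcube submit --name my-refmodel --mlcube-file path/to/model/mlcube.yaml
    medperf mlcube submit --name my-metrics --mlcube-file path/to/metrics/mlcube.yaml

    # Register the benchmark, referencing the MLCube UIDs returned above
    # and a URL to the sample reference (demo) dataset
    medperf benchmark submit --name my-benchmark \
        --demo-url <demo_dataset_url> \
        --data-preparation-mlcube <prep_uid> \
        --reference-model-mlcube <refmodel_uid> \
        --evaluator-mlcube <metrics_uid>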

Step 3. Register Dataset

Data Providers that want to be part of the benchmark can register their own datasets, prepare them, and associate them with the benchmark. A dataset is prepared using the benchmark's Data Preparator MLCube, and its metadata is registered within the MedPerf server.

Figure: Data Preparation flow (flow_preparation.gif)
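
As an illustrative sketch (assuming a recent MedPerf CLI; sub-commands and flags may vary across versions, and all paths and UIDs are placeholders), dataset registration and preparation look roughly like this:

    # Register the raw dataset against the benchmark (placeholder paths/UIDs)
    medperf dataset submit -b <benchmark_uid> -d path/to/data -l path/to/labels \
        --name my-dataset

    # Run the benchmark's Data Preparator MLCube on the registered dataset
    medperf dataset prepare -d <dataset_uid>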

The Data Provider can then request to participate in the benchmark with their dataset. Requesting the association runs the benchmark's reference workflow to ensure that the prepared dataset's structure is compatible with the workflow. Once the association request is approved by the Benchmark Committee, the dataset becomes part of the benchmark.
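
The association request itself is a single CLI call, sketched here with placeholder UIDs (flags may differ by version):

    # Request to associate the prepared dataset with the benchmark;
    # this triggers the compatibility test against the reference workflow
    medperf dataset associate -b <benchmark_uid> -d <dataset_uid>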

Step 4. Register Models

Once a benchmark is submitted by the Benchmark Committee, any user can submit their own Model MLCubes and request an association with the benchmark. This association request executes the benchmark locally with the given model on the benchmark's reference dataset to ensure workflow validity and compatibility. If the model successfully passes the compatibility test, and its association is approved by the Benchmark Committee, it becomes a part of the benchmark.
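
A hedged sketch of the Model Owner's side of this step (placeholder names, paths, and UIDs; exact flags may differ by CLI version):

    # Submit the model MLCube
    medperf mlcube submit --name my-model --mlcube-file path/to/model/mlcube.yaml

    # Request association with the benchmark; this runs the model on the
    # benchmark's reference dataset as a compatibility test
    medperf mlcube associate -b <benchmark_uid> -m <model_mlcube_uid>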

Step 5. Execute Benchmark

The Benchmark Committee may notify Data Providers that models are available for benchmarking. Data Providers can then run the benchmark models locally on their data.

This procedure retrieves the model MLCubes associated with the benchmark and runs them on the indicated prepared dataset to generate predictions. The Metrics MLCube of the benchmark is then retrieved to evaluate the predictions. Once the evaluation results are generated, the data provider can submit them to the platform.
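
A minimal sketch of this step from the Data Provider's terminal (placeholder UIDs; consult medperf --help for the exact commands and flags in your version):

    # Run the benchmark's associated models on the prepared dataset
    medperf benchmark run -b <benchmark_uid> -d <dataset_uid>

    # Inspect the generated results locally, then submit the ones you approve
    medperf result ls
    medperf result submit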

Step 6. Aggregate and Release Results

The benchmarking platform aggregates the results of running the models against the datasets and shares them according to the Benchmark Committee's policy.

The sharing policy controls how much of the results data is shared, ranging from a single aggregated metric to a more detailed model-data cross product. A public leaderboard highlights the Model Owners whose models achieve the best performance.