MedPerf requires some files to be hosted on the cloud when running machine learning pipelines. Submitting MLCubes to the MedPerf server means submitting their metadata, not, for example, model weights or parameter files. MLCube files such as model weights need to be hosted on the cloud, and the submitted MLCube metadata will only contain URLs (or certain identifiers) for these files. Another example is benchmark submission, where demo datasets need to be hosted.
The MedPerf client expects files to be hosted in certain ways. Below are the options for hosting files and how MedPerf identifies them (e.g. by a URL).
Files can be hosted with any cloud hosting tool or provider you like (such as GCP, AWS, Dropbox, Google Drive, or GitHub). As long as your file can be accessed through a direct download link, it should work with MedPerf. Generating a direct download link for your hosted file can be straightforward with some providers (e.g. Amazon Web Services, Google Cloud Platform, Microsoft Azure) and a bit tricky with others (e.g. Dropbox, GitHub, Google Drive).
Direct download links must be permanent
You can check whether a URL is a direct download link using a tool such as wget. Running wget <URL> will download the file if the URL is a direct download link. If the URL is not a direct download link, running wget <URL> may fail or may download an HTML page instead.
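The wget check described above can be wrapped in a small script. This is a minimal sketch, not part of MedPerf: the function names `looks_like_html` and `check_direct_download` are hypothetical, and the HTML test is only a heuristic (it inspects the first bytes of the download, so it would misreport a file that is legitimately an HTML document).

```shell
# Heuristic: succeed (exit 0) if the first bytes of a file look like an HTML page.
looks_like_html() {
    head -c 512 "$1" | grep -qaiE '<!doctype html|<html'
}

# Download a URL to a temporary file and report whether it behaves
# like a direct download link. Returns 0 if it does, 1 otherwise.
check_direct_download() {
    tmp_file=$(mktemp) || return 2
    if ! wget -q -O "$tmp_file" "$1"; then
        echo "download failed: not a direct download link, or not publicly readable"
        rm -f "$tmp_file"
        return 1
    fi
    if looks_like_html "$tmp_file"; then
        echo "received an HTML page: probably not a direct download link"
        rm -f "$tmp_file"
        return 1
    fi
    echo "looks like a direct download link"
    rm -f "$tmp_file"
}

# Usage (replace <URL> with the link you want to check):
#   check_direct_download "<URL>"
```

Running the check from a machine where you are not logged in to the storage provider also gives a rough indication of whether the file is readable anonymously.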
When your file is hosted with a direct download link, MedPerf will be able to identify this file using that direct download link. So for example, when you are submitting an MLCube, you would pass your hosted MLCube manifest file as follows:
In this case, files are expected to allow anonymous public read access.
Direct download links of files on GitHub
Many current MedPerf users host their files on GitHub. The steps below show how to find the direct download link of a file hosted on GitHub. For other storage providers, check their documentation online.
It is important to make sure the files won't be modified after being submitted to MedPerf, which could happen through future commits. For this reason, the URLs of files hosted on GitHub must contain a reference to a specific commit hash. Below are the steps to get this URL for a specific file:
- Open the GitHub repository and ensure you are in the correct branch
- Click on “Commits” at the top-right corner of the repository explorer.
- Locate the latest commit, which is the topmost one.
- If you are targeting a previous version of your file, make sure to pick the right commit instead.
- Click on the “<>” button corresponding to the commit (Browse the repository at this point in the history).
- Navigate to the file of interest.
- Click on “Raw”.
- Copy the URL from your browser. It should be a GitHub user content URL (domain raw.githubusercontent.com).
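The URL produced by these steps follows the pattern https://raw.githubusercontent.com/OWNER/REPO/COMMIT/PATH. The sketch below builds such a URL and checks that a given URL is pinned to a full 40-character commit hash rather than a branch name. The helper names `github_raw_url` and `is_commit_pinned`, and the owner, repository, commit hash, and path in the usage comment, are hypothetical placeholders, not MedPerf commands or real resources.

```shell
# Build a commit-pinned raw URL from: owner, repository, commit hash, file path.
github_raw_url() {
    printf 'https://raw.githubusercontent.com/%s/%s/%s/%s\n' "$1" "$2" "$3" "$4"
}

# Succeed (exit 0) only if the URL is a raw.githubusercontent.com URL whose
# third path component is a full 40-character hexadecimal commit hash.
is_commit_pinned() {
    printf '%s\n' "$1" |
        grep -Eq '^https://raw\.githubusercontent\.com/[^/]+/[^/]+/[0-9a-f]{40}/.+'
}

# Usage (all values are placeholders):
#   url=$(github_raw_url example-org example-repo \
#         0123456789abcdef0123456789abcdef01234567 mlcube/mlcube.yaml)
#   is_commit_pinned "$url" && echo "pinned to a commit"
```

A URL that contains a branch name such as main in place of the hash will fail this check, which is the case the steps above are meant to avoid: the content behind a branch URL changes with every new commit.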
When your file is hosted on Synapse, MedPerf will be able to identify this file using the Synapse ID corresponding to that file. So for example, when you are submitting an MLCube, you would pass your hosted MLCube manifest file as follows (note the prefix):
Note that you need to authenticate with your Synapse credentials if you plan to use a Synapse file with MedPerf. To do so, run medperf auth synapse_login.
Authentication is required when using files hosted on Synapse. If a file can be accessed without authenticating, it has anonymous public read access. In that case, Synapse allows you to generate a permanent direct download link for your file, and you can follow the previous section.