To help innovators systematically evaluate the suitability of medical machine learning (ML) data, Schwabe, Becker, Seyferth et al. (2024) proposed the METRIC-framework, a structured set of dimensions that defines what “data quality” means in the context of medical ML. It offers a clear conceptual map of what to examine. But innovators also need concrete tools to quantify data issues, detect risks early, and justify their choices to regulators and clinical partners. Metric Hub delivers exactly that: a practical, use‑case‑driven system for selecting and applying data‑quality metrics in medical AI.

Why data quality is a make‑or‑break factor in medical AI
Machine‑learning systems in medicine have moved far beyond research prototypes – they are now being deployed to support diagnosis, monitoring, and treatment. But their acceptance by clinicians and patients hinges on one essential requirement: trustworthiness. Increasingly, international bodies and regulators emphasize that trustworthiness in medical AI is inseparable from the quality and governance of the underlying data.
It is tempting to judge a dataset by how well a model trained on it performs. Yet performance alone does not prove that your dataset is good, because it blends data properties with model design, hyperparameters, and training choices. Hidden data issues such as label noise, demographic imbalance, device heterogeneity, missingness patterns, or distribution drift can undermine safety, fairness, and external validity in clinical use. Upcoming regulatory requirements, for example under the EU AI Act, explicitly call for documented, fit‑for‑purpose datasets that are relevant, representative, and as error‑free and complete as possible for the intended use.
Introducing Metric Hub: From regulation to application
In their recent paper, “Metric Hub: A metric library and practical selection workflow for use‑case‑driven data quality assessment in medical AI”, Becker, Oppelt, Zech et al. operationalize the METRIC-framework and introduce Metric Hub, an online platform from the Physikalisch‑Technische Bundesanstalt (PTB) that makes data‑quality assessment practical and usable in real development pipelines.
The platform serves as the central access point for:
- METRIC-framework: An easy-to-read overview of the data quality framework grounded in the clustered dimensions proposed by Schwabe, Becker, Seyferth et al. (2024).
- Metric library with 60 metric cards: Concise, cheat‑sheet‑style summaries, one per quantitative metric, covering the 14 measurable dimensions of the METRIC‑framework and including definitions, applicability, pitfalls, and interpretation guidance.
- Decision trees for metric selection (coming soon): A use‑case‑driven tool to help you identify the most relevant metrics for your specific requirements, such as modality, task, annotation setup, reference availability, or expected update cadence.
Read the paper – and let’s talk about your dataset
Do you want to see how this works on a real‑world dataset? The authors demonstrate their workflow on PTB‑XL, a large 12‑lead ECG corpus, showing how selected metrics respond to changes in sex balance, device distribution, and class imbalance. For a deeper look at the methodology and results, we recommend reading the full paper.
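The paper's PTB‑XL experiments vary sex balance, device distribution, and class imbalance. To give a feel for how a balance measure reacts to such shifts, here is a minimal Python sketch of two generic, textbook measures for a categorical attribute: normalized Shannon entropy (1.0 when all observed categories are equally frequent) and the ratio of the largest to the smallest category. These are illustrative choices, not necessarily the metrics used in the paper or listed on Metric Hub; the function names and toy labels are hypothetical and are not PTB‑XL data.

```python
import math
from collections import Counter
from typing import Hashable, Iterable


def normalized_entropy(labels: Iterable[Hashable]) -> float:
    """Shannon entropy of the observed category shares divided by log(k).

    1.0 means all observed categories are equally frequent; values near 0
    mean one category dominates. Categories that never occur are not counted.
    """
    counts = Counter(labels)
    n = sum(counts.values())
    if n == 0:
        raise ValueError("no labels given")
    k = len(counts)
    if k == 1:
        return 0.0  # a single observed category: no diversity at all
    h = -sum((c / n) * math.log(c / n) for c in counts.values())
    return h / math.log(k)


def imbalance_ratio(labels: Iterable[Hashable]) -> float:
    """Size of the largest category divided by the size of the smallest one."""
    counts = Counter(labels)
    if not counts:
        raise ValueError("no labels given")
    return max(counts.values()) / min(counts.values())


# Hypothetical sex attribute for two versions of a 100-record dataset.
balanced = ["F"] * 50 + ["M"] * 50
skewed = ["F"] * 20 + ["M"] * 80

for name, labels in [("balanced", balanced), ("skewed", skewed)]:
    print(name, round(normalized_entropy(labels), 3), imbalance_ratio(labels))
```

For the balanced toy attribute both measures indicate parity (normalized entropy 1.0, ratio 1.0); for the 20/80 split, normalized entropy drops to about 0.72 and the imbalance ratio rises to 4. Normalized entropy summarizes the whole distribution, while the imbalance ratio focuses on the rarest group; neither notices a category that is absent altogether, so results are best checked against the set of categories expected for the intended use.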
Planning a validation study or preparing a regulatory submission?
We can help you select the right data quality metrics, acquire clinical‑grade datasets, and generate clear, defensible documentation for reviewers and partners. Please do reach out.





