AllMetrics

AllMetrics is an open-source Python library designed to standardize machine learning metric evaluation. It provides consistent metric implementations, robust data validation, and a modular API supporting regression, classification, clustering, segmentation, and image-to-image translation tasks.

About AllMetrics

AllMetrics is an open-source, Python-native library designed to solve the critical problem of inconsistency in machine learning performance evaluation. While existing libraries are often fragmented and plagued by “Implementation Differences” (ID) and “Reporting Differences” (RD), AllMetrics provides a single, robust ecosystem for standardized metric computation.

Designed for scalability and precision, our modular API supports a comprehensive suite of tasks, including regression, classification, clustering, segmentation, and image-to-image translation. By integrating rigorous input validation mechanisms and reconciling computational logic across disparate frameworks (Python, MATLAB, and R), AllMetrics ensures that your model performance reports are not only reproducible but mathematically trustworthy. Whether you are operating in healthcare, finance, or real estate, AllMetrics eliminates evaluation ambiguity, allowing you to focus on model optimization rather than metric validation.
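
To make the idea of input validation concrete, here is a minimal sketch in plain Python of the kind of checks a metric function might run before computing anything. It is an illustration only and does not reproduce the AllMetrics API: the names validate_pair and mean_absolute_error are hypothetical, and the specific checks (matching length, non-empty input, finite values) are assumptions about what "rigorous input validation" could cover.

```python
import math


def validate_pair(y_true, y_pred):
    """Hypothetical helper (not the AllMetrics API): basic checks before scoring."""
    y_true, y_pred = list(y_true), list(y_pred)
    if len(y_true) != len(y_pred):
        raise ValueError(f"length mismatch: {len(y_true)} vs {len(y_pred)}")
    if not y_true:
        raise ValueError("inputs must not be empty")
    for name, values in (("y_true", y_true), ("y_pred", y_pred)):
        # Reject NaN and +/-inf, which would silently corrupt the metric.
        if any(not math.isfinite(v) for v in values):
            raise ValueError(f"{name} contains NaN or infinite values")
    return y_true, y_pred


def mean_absolute_error(y_true, y_pred):
    """Mean absolute error computed only after the inputs pass validation."""
    y_true, y_pred = validate_pair(y_true, y_pred)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
```

Failing loudly on malformed input, rather than returning a number, is one way to keep a reported score from hiding a data problem.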

Resources

  • 0.0.0

    The library is continuously being updated.

Conference Article - Recommended to Cite

AllMetrics: A Unified Python Library for Standardized Metric Evaluation in Machine Learning

Machine learning (ML) models rely heavily on consistent and accurate performance metrics to evaluate and compare their effectiveness. However, existing libraries often suffer from fragmentation, inconsistent implementations, and insufficient data validation protocols, leading to unreliable results. Existing libraries have often been developed independently and without adherence to a unified standard, particularly concerning the specific tasks they aim to support. As a result, each library tends to adopt its own conventions for metric computation, input/output formatting, error handling, and data validation protocols. This lack of standardization leads to inconsistencies in both implementation and reporting, making it difficult to compare results across frameworks or ensure reliable evaluations. To address these issues, we introduce AllMetrics, a unified Python library designed to standardize metric evaluation across diverse ML tasks, including regression, classification, clustering, segmentation, and image-to-image translation. The library implements class-specific reporting for multi-class tasks through configurable parameters (e.g., average='macro'/'micro'/'none') to cover all use cases, while incorporating task-specific parameters (e.g., window_size in structural similarity index measure (SSIM)) to resolve metric computation discrepancies across implementations. Various datasets from domains like healthcare, finance, and real estate were applied to our library and compared with components in Python, MATLAB, and R to identify which yield similar results. AllMetrics combines a modular Application Programming Interface (API) with robust input validation mechanisms to ensure reproducibility and reliability in model evaluation. This paper presents its design principles, architectural components, and empirical analysis demonstrating the ability to mitigate evaluation errors and enhance the trustworthiness of ML workflows.
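
The effect of an averaging parameter such as average='macro'/'micro'/'none' can be seen in a small plain-Python sketch of multi-class precision. This is not the AllMetrics implementation; the function name precision, its signature, and the choice to score a never-predicted class as 0.0 are assumptions made for illustration. That last choice is itself the kind of convention on which implementations can differ.

```python
def precision(y_true, y_pred, average="macro"):
    """Illustrative multi-class precision (hypothetical, not the AllMetrics API)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    labels = sorted(set(y_true) | set(y_pred))
    tp = {c: 0 for c in labels}  # true positives per predicted class
    fp = {c: 0 for c in labels}  # false positives per predicted class
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[p] += 1
        else:
            fp[p] += 1
    # Assumed convention: a class that is never predicted gets precision 0.0.
    per_class = {
        c: tp[c] / (tp[c] + fp[c]) if (tp[c] + fp[c]) else 0.0 for c in labels
    }
    if average == "none":
        return per_class  # class-specific reporting
    if average == "macro":
        return sum(per_class.values()) / len(labels)  # unweighted mean over classes
    if average == "micro":
        total_tp = sum(tp.values())
        return total_tp / (total_tp + sum(fp.values()))  # pooled over all samples
    raise ValueError(f"unknown average: {average!r}")


if __name__ == "__main__":
    y_true = [0, 0, 0, 0, 1, 1, 2]
    y_pred = [0, 0, 0, 1, 1, 2, 2]
    print(precision(y_true, y_pred, average="none"))   # {0: 1.0, 1: 0.5, 2: 0.5}
    print(precision(y_true, y_pred, average="macro"))  # 2/3, about 0.667
    print(precision(y_true, y_pred, average="micro"))  # 5/7, about 0.714
```

On this imbalanced toy example the macro and micro values differ (2/3 versus 5/7), which is why a report that omits the averaging mode cannot be compared reliably across tools.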

Citations

  1. Alizadeh, M., M. Oveisi, S. Falahati, G. Mousavi, M. A. Meybodi, S. S. Mehrnia, I. Hacihaliloglu, A. Rahmim, and M. R. Salmanpour. "AllMetrics: A Unified Python Library for Standardized Metric Evaluation in Machine Learning." In 2025 IEEE Nuclear Science Symposium (NSS), Medical Imaging Conference (MIC) and Room Temperature Semiconductor Detector Conference (RTSD), pp. 1-2. IEEE, 2025.

arXiv Paper

AllMetrics: a unified python library for standardized metric evaluation and robust data validation in machine learning

Machine learning (ML) models rely heavily on consistent and accurate performance metrics to evaluate and compare their effectiveness. However, existing libraries often suffer from fragmentation, inconsistent implementations, and insufficient data validation protocols, leading to unreliable results. Existing libraries have often been developed independently and without adherence to a unified standard, particularly concerning the specific tasks they aim to support. As a result, each library tends to adopt its own conventions for metric computation, input/output formatting, error handling, and data validation protocols. This lack of standardization leads to both implementation differences (ID) and reporting differences (RD), making it difficult to compare results across frameworks or ensure reliable evaluations. To address these issues, we introduce AllMetrics, an open-source unified Python library designed to standardize metric evaluation across diverse ML tasks, including regression, classification, clustering, segmentation, and image-to-image translation. The library implements class-specific reporting for multi-class tasks through configurable parameters to cover all use cases, while incorporating task-specific parameters to resolve metric computation discrepancies across implementations. Various datasets from domains like healthcare, finance, and real estate were applied to our library and compared with Python, MATLAB, and R components to identify which yield similar results. AllMetrics combines a modular Application Programming Interface (API) with robust input validation mechanisms to ensure reproducibility and reliability in model evaluation. This paper presents the design principles, architectural components, and empirical analyses demonstrating the ability to mitigate evaluation errors and to enhance the trustworthiness of ML workflows.

Citations

  1. Alizadeh, Morteza, Mehrdad Oveisi, Sonya Falahati, Ghazal Mousavi, Mohsen Alambardar Meybodi, Somayeh Sadat Mehrnia, Ilker Hacihaliloglu, Arman Rahmim, and Mohammad R. Salmanpour. "AllMetrics: a unified python library for standardized metric evaluation and robust data validation in machine learning." arXiv preprint arXiv:2505.15931 (2025).