# Measuring AI Performance

By [Mercer AI](https://paragraph.com/@mercerai) · 2024-04-03

ai, data science, machine learning, deep learning

---

![](https://storage.googleapis.com/papyrus_images/5d11c2feead8826f11d85c15747bfb30.png)

This is your AI metrics cheat sheet:

### **Classification Metrics: When sorting data into categories.**

*   **Precision:** Out of all the items the model labeled as positive, how many truly are?  
    _\- Think of it as the model’s trustworthiness when it claims something is positive._
    
*   **Recall (Sensitivity):** Out of all actual positive items, how many did the model correctly identify?  
    _\- It’s about not missing out on true positives._
    
*   **F1 Score:** The balance between Precision and Recall.  
    _\- When you want a single metric that considers both false positives and false negatives._
    
*   **AUC-ROC:** The model’s ability to distinguish between classes.  
    _\- A score of 1.0 is perfect. Above 0.9 is excellent. Below 0.5 means the model is worse than random guessing._
    
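To make these concrete, here is a minimal scikit-learn sketch. The labels, predictions, and scores below are made up purely for illustration:

```python
from sklearn.metrics import precision_score, recall_score, f1_score, roc_auc_score

y_true  = [1, 0, 1, 1, 0, 1, 0, 0]                   # ground-truth labels
y_pred  = [1, 0, 1, 0, 0, 1, 1, 0]                   # model's hard predictions
y_score = [0.9, 0.2, 0.8, 0.4, 0.3, 0.7, 0.6, 0.1]   # predicted probabilities

precision = precision_score(y_true, y_pred)   # TP / (TP + FP)
recall    = recall_score(y_true, y_pred)      # TP / (TP + FN)
f1        = f1_score(y_true, y_pred)          # harmonic mean of precision and recall
auc       = roc_auc_score(y_true, y_score)    # how well scores rank positives above negatives
```

Here the model makes 3 true positives, 1 false positive, and 1 false negative, so precision and recall both come out to 0.75.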

### **Regression Metrics: When predicting continuous values.**

*   **Mean Absolute Error (MAE):** The average absolute difference between predicted and actual values.  
    \- _Lower is better. If MAE is zero, be cautious: it could indicate overfitting or other issues._
    
*   **Root Mean Squared Error (RMSE):** Like MAE, but punishes large errors more.  
    _\- Critical when significant mispredictions can have major consequences._
    
*   **R-squared:** How much of the variance in the real data the model’s predictions explain.  
    \- _Typically ranges from 0 to 1 (it can go negative for very poor fits). Closer to 1 means the model explains more of the variability._
    
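All three regression metrics are a few lines of NumPy. The toy values below are made up for illustration:

```python
import numpy as np

y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 3.0, 8.0])

mae  = np.mean(np.abs(y_true - y_pred))            # average absolute error
rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))    # squares punish large errors more

ss_res = np.sum((y_true - y_pred) ** 2)            # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)     # total variance around the mean
r2 = 1 - ss_res / ss_tot                           # fraction of variance explained
```

Note how the single 1.0-unit error contributes more to RMSE than to MAE, which is exactly the "punishes large errors" behavior described above.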

### **Clustering Metrics: When you’re grouping data.**

*   **Silhouette Score:** Measures how similar items are within a cluster compared to other clusters.  
    _\- Ranges from -1 to 1. Higher values indicate better-defined clusters._
    
*   **Davies-Bouldin Index:** Lower values indicate better partitioning of clusters.  
    _\- Zero is the theoretical ideal score._
    
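A quick sketch with scikit-learn, using two well-separated toy clusters (the points and labels are invented for illustration):

```python
import numpy as np
from sklearn.metrics import silhouette_score, davies_bouldin_score

# Two tight, well-separated clusters
X = np.array([[0, 0], [0, 1], [1, 0],
              [10, 10], [10, 11], [11, 10]])
labels = np.array([0, 0, 0, 1, 1, 1])

sil = silhouette_score(X, labels)        # near 1 = tight, well-separated clusters
dbi = davies_bouldin_score(X, labels)    # lower = better partitioning, 0 is ideal
```

On data this cleanly separated, the silhouette score lands close to 1 and the Davies-Bouldin index close to 0, matching the ranges described above.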

### **Data Quality Metrics:**

*   **Completeness:** Percentage of non-missing data points.  
    _\- Aim for 100%, but be wary of artificially complete data._
    
*   **Consistency:** Ensuring data doesn’t contradict itself.  
    _\- Inconsistent data can lead to misleading model results._
    
*   **Outlier Detection:** Identify data points that deviate significantly.  
    _\- Outliers can skew model training._
    
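A minimal pandas sketch covering completeness and a simple IQR-based outlier check (the table is made up, and the 1.5×IQR fence is just one common rule of thumb):

```python
import pandas as pd

df = pd.DataFrame({
    "age":    [25, 32, None, 41, 29, 120],              # 120 is a likely outlier
    "income": [40_000, 52_000, 48_000, None, 45_000, 50_000],
})

# Completeness: fraction of non-missing values per column
completeness = df.notna().mean()

# Outlier detection: flag ages outside the 1.5 * IQR fences
q1, q3 = df["age"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = df["age"][(df["age"] < q1 - 1.5 * iqr) | (df["age"] > q3 + 1.5 * iqr)]
```

Consistency checks are harder to generalize, but often amount to assertions between columns (e.g. no end date earlier than its start date).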

### **Model Robustness:**

*   **Cross-Validation Score:** A measure of a model’s performance on different subsets of data.  
    _\- Ensures the model isn’t just memorizing the training data._
    
*   **Bias-Variance Tradeoff:** Balance between a model’s flexibility (variance) and its simplifying assumptions (bias).  
    _\- Ideally a model captures real patterns without overfitting the noise or being overly simplistic._
    
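Cross-validation is one line in scikit-learn. A sketch on the built-in Iris dataset with a simple logistic regression:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# 5-fold cross-validation: five accuracy scores, one per held-out fold
scores = cross_val_score(model, X, y, cv=5)
mean_score = scores.mean()
```

A large spread across the five scores is itself a warning sign that performance depends heavily on which slice of data the model sees.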

### **Model Interpretability:**

*   **Feature Importance:** Identifies the input variables that have the most influence on the model’s predictions.  
    _\- Understanding what the model deems important._
    
*   **SHAP Values:** Breaks down the contribution of each feature to specific model predictions.  
    \- _Clarifies the reasoning behind individual predictions._
    
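Impurity-based feature importance comes for free with tree ensembles in scikit-learn (SHAP values need the separate `shap` package, so only importances are sketched here):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

data = load_iris()
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(data.data, data.target)

# Importances sum to 1 across features; higher = more influence on splits
importances = dict(zip(data.feature_names, model.feature_importances_))
```

On Iris, the petal measurements dominate, which matches the intuition that they separate the three species far better than the sepal measurements do.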

### **Overfitting & Underfitting: Ensuring the model performs well on both training and unseen data.**

*   **Training vs. Validation Error:** A low training error coupled with a high validation error suggests potential overfitting.
    
*   **Overfitting:** Model performs well on training data but poorly on new data.
    
*   **Underfitting:** Model performs poorly on both training and new data.
    
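The training-vs-validation gap is easy to demonstrate: an unconstrained decision tree will memorize a noisy training set. A sketch on synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification problem with some label noise
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

# No depth limit: the tree keeps splitting until it fits the training set perfectly
tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

train_acc = tree.score(X_train, y_train)   # near-perfect
val_acc   = tree.score(X_val, y_val)       # noticeably lower: overfitting
```

Capping `max_depth` (or using an ensemble) typically narrows this gap at the cost of a slightly higher training error.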

This guide is a simple starting point for measuring the effectiveness of different models!

---

*Originally published on [Mercer AI](https://paragraph.com/@mercerai/measuring-ai-performance)*
