The Case Against Vanity Metrics in ML

Your model has 95% accuracy. Congratulations, but that number means almost nothing without context. Fixating on a single headline metric is one of the most common failure modes in ML projects.

The Accuracy Trap

Consider a fraud detection model where only 1% of transactions are fraudulent. A model that predicts "not fraud" for everything achieves 99% accuracy while being completely useless.
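
To make the arithmetic concrete, here is a minimal sketch in plain Python. The dataset is synthetic (1,000 transactions, 10 of them fraudulent) and the helper names are illustrative assumptions, not taken from any particular library.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def recall(y_true, y_pred):
    """Fraction of actual positives (fraud) that the model caught."""
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    actual_pos = sum(1 for t in y_true if t == 1)
    return true_pos / actual_pos if actual_pos else 0.0


# Synthetic data: 1% fraud (label 1), 99% legitimate (label 0).
y_true = [1] * 10 + [0] * 990

# The "model" that always predicts "not fraud".
y_pred = [0] * len(y_true)

acc = accuracy(y_true, y_pred)
rec = recall(y_true, y_pred)
print(f"accuracy={acc:.2f} recall={rec:.2f}")  # accuracy=0.99 recall=0.00
```

Accuracy looks excellent while recall on the fraud class is zero: the model never catches a single fraudulent transaction.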

This example is obvious, but subtler versions of this trap catch experienced teams. Imbalanced classes, distribution shift, and proxy metrics all create gaps between reported performance and real-world value.

Business-Aligned Metrics

Start with the business outcome, not the technical metric. If you are building a recommendation system, care about revenue impact and customer lifetime value, not just click-through rates.

Work backwards from euros or rupees. What is the cost of a false positive? A false negative? These numbers should directly inform your evaluation criteria and threshold selection.
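
One way to let those costs drive threshold selection is sketched below. The cost figures, labels, scores, and function names are all hypothetical, chosen only to illustrate the idea; real costs would come from your own business analysis.

```python
def total_cost(y_true, scores, threshold, cost_fp, cost_fn):
    """Sum the cost of every false positive and false negative at a threshold."""
    cost = 0.0
    for label, score in zip(y_true, scores):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 0:
            cost += cost_fp  # flagged a legitimate case
        elif predicted == 0 and label == 1:
            cost += cost_fn  # missed a real positive
    return cost


def best_threshold(y_true, scores, candidates, cost_fp, cost_fn):
    """Pick the candidate threshold with the lowest total cost."""
    return min(
        candidates,
        key=lambda th: total_cost(y_true, scores, th, cost_fp, cost_fn),
    )


# Hypothetical validation labels and model scores.
y_true = [0, 0, 0, 0, 1, 0, 1, 1]
scores = [0.1, 0.2, 0.3, 0.4, 0.45, 0.6, 0.7, 0.9]
candidates = [0.25, 0.5, 0.75]

# Assumed costs: a missed positive is 20x as expensive as a false alarm.
asymmetric = best_threshold(y_true, scores, candidates, cost_fp=5, cost_fn=100)

# With equal costs, the choice shifts to a higher threshold.
symmetric = best_threshold(y_true, scores, candidates, cost_fp=1, cost_fn=1)

print(asymmetric, symmetric)  # 0.25 0.5
```

The same model and the same scores lead to different operating points once the costs change, which is why the threshold is a business decision as much as a technical one.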

Beyond the Test Set

Test set performance is necessary but not sufficient. A model that performs well in evaluation can fail spectacularly in production due to data drift, adversarial inputs, or edge cases not represented in your test data.

Implement monitoring that tracks business outcomes alongside technical metrics. When they diverge, you have an early warning of trouble.
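
A divergence check can be as simple as comparing how far each metric has moved from its baseline. The sketch below is one possible approach under stated assumptions: the function names, the 10% tolerance, and the example numbers are all hypothetical, and a production system would likely need windowing and noise handling on top.

```python
def relative_change(baseline, current):
    """Change relative to the baseline, e.g. -0.2 means a 20% drop."""
    return (current - baseline) / baseline


def metrics_diverged(tech_baseline, tech_now, biz_baseline, biz_now,
                     tolerance=0.10):
    """Flag when the two metrics have moved apart by more than `tolerance`.

    `tolerance` is an assumed value; tune it to the noise in your own data.
    """
    tech_delta = relative_change(tech_baseline, tech_now)
    biz_delta = relative_change(biz_baseline, biz_now)
    return abs(tech_delta - biz_delta) > tolerance


# Model quality looks stable, but weekly revenue fell 20%: raise an alert.
alert = metrics_diverged(tech_baseline=0.90, tech_now=0.91,
                         biz_baseline=1000.0, biz_now=800.0)

# Both metrics dipped by roughly 1%: they still move together.
quiet = metrics_diverged(tech_baseline=0.90, tech_now=0.89,
                         biz_baseline=1000.0, biz_now=990.0)

print(alert, quiet)  # True False
```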

The Human Element

The best ML teams maintain healthy skepticism of impressive numbers. They ask "what could go wrong?" before celebrating success.

Build a culture where questioning metrics is encouraged, not punished. The goal is value delivery, not dashboard vanity.
