App metrics, especially performance metrics, are only useful if they are predictive or actionable.

By predictive, I mean you can, with varying degrees of certainty, anticipate changes to other metrics or outcomes if that metric were to change, for better or for worse. If P95 request latency changes for some endpoint, does it matter?

By actionable, I mean that observing a change in that metric means you can take direct countermeasures to either remedy the regression, or somehow mitigate its impact. If your app gets rated by a set of unknown, unvetted, self-selected individuals who give it a star rating between 1 and 5 (*ahem*), would you know what to do if your rating drops from 3.8 to 3.6 month over month?

While there can be degrees of predictiveness and actionability, if a metric doesn’t provide at least a modicum of value along at least one of these dimensions, you’re probably better off not using it, lest it lead you down the wrong path. These are what I call vanity metrics – they seem good on the surface, but if you drill in, they provide little actual value other than looking good in performance theatre, i.e. the act of working on performance just so you can say you work on performance, rather than making actual, measurable impact.

Sometimes, people measure things because they are easy to measure and they *seem* to be useful. In fact, that’s the criterion for many of the statistical categories common in sports. Pitcher Wins. Quarterback Wins. Shots on target. Game-winning goals. But a lot of these stats are based as much on the context around a player or team as on the inherent performance or ability of said player or team, so assuming they are predictive of future results might lead you to bad personnel decisions.

If all you have are vanity metrics, I can’t blame you for trying to squeeze some value out of them. That shade I threw at Play/App Store ratings earlier – if you’re an indie dev with no other means of getting user feedback, store ratings may be the only indication you have of whether folks are satisfied with your app. As limited as they are, beggars can’t be choosers. But often, there are better, more useful alternatives if you only looked deeper and didn’t just go with the status quo.

In the world of mobile client performance, a popular metric is “Crash-Free Session Rate”. But what does that really tell you, in all but extreme cases? If that rate drops for your app from 99% to 98% month over month, what does it actually mean for your users and how they use your app? You can perhaps use correlational analysis to find out if other metrics changed along with the rate drop, providing some clues as to a possible causal relationship between it and those other metrics. Maybe you can even run an A/B test where you induce a change for users in an experiment bucket that makes them crash more often, and observe the difference between that bucket and the control in terms of other metrics.

But real talk: how many people who pay attention to crash-free session rates have actually applied that level of rigour when trying to assess their impact on users?

Further, if all you know is the rate, what can you do about it? You need to know the details of the crashes that are causing the rate to go down before you can look to improve it. And if you have the total number of crashes broken down by source, what value would knowing the crash-free session rate buy you?

Sure, the advantage of normalizing against usage is that you can make more apples-to-apples comparisons between time periods. But is the 99% of last month the same as the 99% of this month, given that not all crashes have the same impact on user experience? If your app crashed at startup and prevented a user from even launching it, that is probably more important than a crash that happens while your app is backgrounded. Again, to know what that 99% means, you’ll need a breakdown of the specific crashes that are happening, to figure out the composition of issues that caused 1% of sessions to end in a crash.
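To make this concrete, here’s a minimal sketch (with hypothetical crash sources and counts) of how two months with an identical crash-free rate can hide very different crash compositions:

```python
from collections import Counter

def crash_free_rate(total_sessions, crashed_sessions):
    """Fraction of sessions that ended without a crash."""
    return 1 - crashed_sessions / total_sessions

# Two months with the same 99% rate but very different compositions.
# Crash source names here are made up for illustration.
last_month = Counter({"backgrounded_oom": 900, "feed_render_npe": 100})
this_month = Counter({"startup_crash": 800, "backgrounded_oom": 200})

total_sessions = 100_000
for label, crashes in [("last month", last_month), ("this month", this_month)]:
    rate = crash_free_rate(total_sessions, sum(crashes.values()))
    print(f"{label}: rate={rate:.2%}, breakdown={dict(crashes)}")
```

Both months print a 99.00% rate, yet this month’s sessions are dominated by a startup crash that blocks users entirely – something the headline number alone would never surface.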

It is for this reason that it’s hard to find a direct relationship between crash rate and other metrics – not all crashes are created equal. Certainly at Twitter, even with the usage that app had, we weren’t able to find even a correlation between crashes and core metrics.

That’s not to say you shouldn’t work on fixing crashes just because you can’t find a direct impact on other metrics. The inability to find a statistical relationship doesn’t mean they don’t cause real user dissatisfaction. It’s just that crashes often happen at such a low rate that the sample size is too small to establish a statistically significant relationship to other metrics.

Instead, when working to reduce crashes, use more actionable metrics – like the rates for specific crashes that point to the cause. Or look for new crashes introduced by a new app version. Neither of these is predictive, probably, but both are actionable, so they provide value.
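A minimal sketch of what those two actionable views might look like – the crash signatures, counts, and session totals below are all hypothetical:

```python
from collections import Counter

# Crash signatures seen per app version (hypothetical data).
crashes_v1 = Counter({"NPE@FeedAdapter.bind": 340, "OOM@ImageCache": 120})
crashes_v2 = Counter({"NPE@FeedAdapter.bind": 310, "OOM@ImageCache": 95,
                      "IndexOOB@StoryPager.scroll": 210})

sessions_v2 = 250_000

# New crashes: signatures in v2 that never appeared in v1 -- these point
# directly at changes shipped in the new version.
new_signatures = set(crashes_v2) - set(crashes_v1)

# Per-crash rates: each line names a specific cause someone can go fix.
for sig in sorted(crashes_v2, key=crashes_v2.get, reverse=True):
    flag = " (NEW)" if sig in new_signatures else ""
    print(f"{sig}: {crashes_v2[sig] / sessions_v2:.4%}{flag}")
```

Each line of that report is something you can act on: a new signature implicates the latest release, and a per-crash rate tells you which fix buys the most.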

Crash-free session rate, though? For most, that’s a vanity metric – easy to measure, but it doesn’t provide much value. Use something better when monitoring crashes. As with all vanity metrics, find alternatives that are more predictive and/or actionable.
