Let's get something straight: observability isn't the same as monitoring. Yes, both are powered by telemetry collected from production that is crunched down into metrics and shown as some sort of time series, graph, or table of numbers. But the key difference is that o11y requires flexibility in aggregation, and on mobile in particular, o11y should give you a reasonably accurate estimate of how perf changes directly impact user behaviour and business KPIs.

The flexibility-of-aggregation bit is relatively well understood at this point by folks who pay attention to o11y: if you have to specify the dimensions of aggregation ahead of time, either at collection time or when the telemetry is processed, you'll only be able to cut the data in pre-defined ways. That's OK if all you want is to be alerted when your SLOs are violated (i.e., monitoring), but if you can't do ad hoc aggregation of your data to answer questions about WHY your dashboards are all red, that ain't o11y in the most meaningful definition of the term.
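To make that concrete, here's a minimal sketch in Kotlin using a made-up telemetry API (Metrics and Telemetry are illustrative stand-ins, not any real SDK). The pre-aggregated counter locks in its dimensions at collection time; the wide event keeps every dimension so you can choose the cut at query time.

```kotlin
// Made-up telemetry API for illustration; not any real SDK.
object Metrics {
    // Pre-aggregated: the only dimensions you'll ever get are the
    // labels chosen here, at collection time.
    fun increment(name: String, labels: Map<String, String>) =
        println("metric $name $labels +1")
}

object Telemetry {
    // Wide event: ship every field; pick group-by dimensions at query time.
    fun emit(name: String, fields: Map<String, Any>) =
        println("event $name $fields")
}

fun main() {
    // Monitoring-style: slow startups can only ever be cut by OS version.
    Metrics.increment("slow_startups", mapOf("os" to "14"))

    // O11y-style: the same startup, kept wide. Tomorrow you can ask
    // "slow startups by device model on LTE" without shipping a new build.
    Telemetry.emit("app_startup", mapOf(
        "duration_ms" to 3200,
        "os" to "14",
        "device_model" to "Pixel 8",
        "app_version" to "5.2.1",
        "cold_start" to true,
        "network" to "LTE",
    ))
}
```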

The point about the ability to estimate impact on user behaviour and business KPIs is what I'm going to expound on in greater detail here, because it's precisely why mobile o11y can be so transformative.

Since my experience with o11y in the large distributed system context is limited, I'm not going to claim that estimating user impact is a necessary part of a system being observable. After all, the focus of backend o11y is understanding the internals of a complex system, so if the slicing and dicing can tell you why parts of it are breaking down for non-obvious reasons, that's a job well done and worthy of being called o11y.

Why? Because that understanding will allow you to take action to remedy any problems you can see or anticipate. It does what it says on the box: through it, you can learn about your system and solve problems using just the data you collect.

But on mobile, the app you are observing isn't one big-ass, complex system: it's a bunch of little instances running on a heterogeneous mix of devices, under vastly different environments. Aggregates of lower-level metrics like heap size in the foreground, or even p75 app startup time, would at best tell you what is happening in the app; they don't tell you how those varying levels impact your users and their usage of your app. I mean, knowing the percentage of users who experienced a crash in the last 24 hours tells you what, exactly, other than the literal thing it's tracking?

No, the target of observation for an observable mobile app isn't just the app itself: it's the users too. Every individual person trying to order food on your app, but also the entire population in aggregate. How your users are using your app, and how perf issues impact that usage, is what you want to understand, so you need to collect both. What's more, if a piece of data or metadata doesn't give you an indication of user experience, a perf dimension that affects it, or a factor that explains the two, why are you collecting it at all?
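Here's what that can look like at the instrumentation level: a small Kotlin sketch (all names hypothetical) where every record carries all three of those things, a user-experience outcome, a perf measurement, and the factors that might explain both.

```kotlin
// Hypothetical user-centric instrumentation: each record ties a perf
// measurement to the user action it gated, plus explanatory context.
enum class Outcome { COMPLETED, ABANDONED, FAILED }

data class WorkflowTelemetry(
    val workflow: String,    // the user's goal, e.g. "order_food"
    val durationMs: Long,    // the perf dimension: time to outcome
    val outcome: Outcome,    // the user-experience dimension
    val deviceModel: String, // factors that might explain the other two
    val networkType: String,
)

fun report(t: WorkflowTelemetry) {
    // A real pipeline would batch and upload; printing stands in here.
    println("${t.workflow}: ${t.outcome} in ${t.durationMs}ms " +
        "(${t.deviceModel}, ${t.networkType})")
}
```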

If your telemetry doesn't tell you whether your users are able to load a page or order their Pad Thai, or explain directly why they aren't getting the value they expect from your app, it's little more than trivia.

I'm not saying that understanding the inner workings of a mobile app through production telemetry isn't useful. You can find regressions or exemplars of hard-to-reproduce bugs using aggregate metrics and session replay timelines, respectively. A bit of slicing and dicing can even reveal or explain hard-to-isolate cohorts that face unique perf challenges due to unexpected factors. All that is extremely useful and nothing to sneeze at.

But we can do so much more on mobile, because not only can we understand how the app is working, we also have direct access to individual users and their actions, so we can find relationships between the two. Correlation at the very least, but causation too if you play your cards right with A/B testing.

We can measure the rate of abandonment for page loads and see how it correlates with the time it takes for each page to load. Already built into this is the user's expectation of perf, not in aggregate like some arbitrary line we draw above which perf is unacceptable. No, individual perf expectations are built in by virtue of whether each user stayed long enough to allow the workflow to complete. Success rate is magical like that.
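As a rough sketch of how that falls out of the data, assuming events shaped like the hypothetical WorkflowTelemetry records above: bucket page loads by time-to-outcome and compute the abandonment rate per bucket. If the rate climbs with duration, you're reading users' perf expectations straight off their behaviour.

```kotlin
// Bucket page loads by duration and compute abandonment rate per bucket.
// Reuses Outcome and WorkflowTelemetry from the earlier sketch.
fun abandonmentByLoadTime(events: List<WorkflowTelemetry>): Map<String, Double> =
    events
        .groupBy { e ->
            when {
                e.durationMs < 1_000 -> "<1s"
                e.durationMs < 3_000 -> "1-3s"
                e.durationMs < 5_000 -> "3-5s"
                else -> ">=5s"
            }
        }
        .mapValues { (_, bucket) ->
            bucket.count { it.outcome == Outcome.ABANDONED }.toDouble() / bucket.size
        }
```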

So that's why I set a higher standard for mobile observability. Here, I think it's imperative that we be user-centric in our o11y practice. Let's observe not only the app but the users as well, so we can explain changes in user behaviour through performance. We'd be letting ourselves down if we didn't.

