The Observability-Free Zone

For most companies, “mobile observability” is a misnomer. I say this because the data that most mobile devs have to “observe” their app in production is laughably simplistic. The vast majority make do with basic crash tracking and precomputed metric aggregates as their only lens into how their app is performing in the wild.

Those who want more than the bare minimum – like actually-useful data to hunt down ANRs (Application Not Responding errors) or network request latency from the client’s perspective – can pay vendors that provide SDKs and UIs that track user and app events in greater detail. Some of these products are better than others, though very few provide the capabilities that backend SREs are used to when troubleshooting production issues.

For a long time, this kind of basic production monitoring was good enough for most companies that ship mobile apps. Simply knowing whether a new version caused crashes to spike, or whether the P50 of cold app startup was below some arbitrary value, was seen as enough. If those numbers looked good – by some definition of good – the app was considered stable.

But increasingly, people are becoming less convinced, as users complain about app quality issues that simply don’t show up in those wonderful dashboards that supposedly tell them how their app is performing in production. Because if all you have are aggregate crash counts and a handful of percentiles, your app isn’t really observable.

So when I flippantly say that mobile observability is a misnomer, that’s what I really mean: observability is much more than just having graphs of a few key metrics. It is, as Hazel Weakly puts it, about having the ability to ask meaningful questions and get useful answers that you can then act upon. 

If your tooling can only tell you that some percentage of users are experiencing slow app startups, but doesn’t give you the ability to figure out who they are or why it’s just them, that’s not observability to me – that’s just monitoring.

Some folks who are hip to this turn to vendors like my current employer to provide mobile app performance data in production that is actually actionable. Others achieve similar results by building or assembling all the pieces on their own, like my previous employer.

Still, while commercial and homegrown mobile observability solutions are becoming more prevalent, most mobile devs remain stuck in the dark ages, even as their SRE colleagues rack up eye-watering logging bills for traditional backend observability.

A large part of what has caused the arrested development of mobile observability is the lack of demand. Small, already overworked mobile teams aren’t out here demanding more problems to solve, no matter how passionate they are about performance.

But I think things are about to change: true observability of the 2.0 variety is coming to the mobile world en masse, for all who want it. Not only is demand increasing at breakneck speed, the ecosystem of tooling is mature enough for mobile solutions to work beyond their own silos and enhance existing observability data collected for the backend.

To me, mobile is the final Infinity Stone of the Observability Gauntlet.

So how did we get here? It’s simple, really: good old supply and demand.

The Demand Problem

Mobile teams are chronically understaffed. By that, I don’t mean they are always small – they are just always asked to do more than their staffing levels can actually support.

To some degree, this predicament is understandable: mobile platforms and ecosystems these days are so sophisticated that an outsider might think you can ship, maintain, and add features to an app with just a small team of relatively junior devs.

And they’re not totally wrong. Between powerful platforms and tooling, automated functional and performance testing frameworks, robust CI/CD pipelines, and hands-off distribution channels like the Play and App Stores, shipping a v1 of an app has never been easier. However, shipping v1 is just the start.

The work that consumes most mobile teams after v1 is maintaining a stable user experience as the app and the world change around it. Adding new features takes time, sure, but supporting new devices and OS versions without regressions – all while features are being added, sometimes haphazardly to meet deadlines – can be deceptively tricky, and it isn’t always accounted for when calculating staffing needs.

This is because the execution environment of mobile apps is so unpredictable that it takes an outsized effort to properly plan, create, and maintain the battery of automated tests necessary to ensure that most – not even all – code paths and workflows are properly covered.

Even when you leave out the different combinations of hardware and software an app has to run on, factors like network connection status, battery level, and available system resources (CPU, memory, disk, etc.) mean that unit, integration, and performance tests have a lot of combinations to cover.
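To make the combinatorics concrete, here is a back-of-the-envelope sketch in Kotlin. The dimensions and values are entirely illustrative – every team’s real matrix will differ – but the multiplication is the point:

```kotlin
// Illustrative only: a few runtime dimensions a mobile test suite
// might need to cover. The specific values here are made up.
val osVersions = listOf("Android 13", "Android 14", "Android 15")
val networks = listOf("wifi", "5g", "lte", "3g", "offline")
val memoryPressure = listOf("normal", "low", "critical")
val batterySaver = listOf(false, true)

fun main() {
    // Each test case potentially behaves differently under every combination.
    val combos = osVersions.size * networks.size * memoryPressure.size * batterySaver.size
    println("Runtime conditions per test case: $combos") // 3 * 5 * 3 * 2 = 90
}
```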

And that’s before you introduce the chaos that an end user can inflict on how an app runs. Or the seemingly arbitrary decisions that mobile OSes make that further add to the entropy.

“Oh, you think your background threads are going to finish running when the user takes a phone call? Sorry, there isn’t enough free memory, so I’m just going to kill your app. I sure hope you handled the case where your serialization to disk gets interrupted mid-stream!”
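That particular failure mode is common enough that there are well-known defensive patterns for it. Here is a minimal sketch in Kotlin, assuming you control the serialization path: write to a temporary file first, then atomically rename it over the destination, so a mid-write process death never corrupts the real file. (On Android, androidx.core.util.AtomicFile packages up the same idea.)

```kotlin
import java.io.File

// Minimal sketch of crash-safe serialization: write to a temp file,
// force it to disk, then atomically rename it over the destination.
// If the OS kills the process mid-write, the original file survives.
fun writeAtomically(destination: File, bytes: ByteArray) {
    val tmp = File(destination.parentFile, destination.name + ".tmp")
    tmp.outputStream().use { out ->
        out.write(bytes)
        out.fd.sync() // ensure bytes hit the disk before the rename
    }
    if (!tmp.renameTo(destination)) {
        tmp.delete()
        error("Atomic rename failed for ${destination.path}")
    }
}
```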

Simply put, if your mobile test suite does its job well, maintaining it as your code base changes is going to take up a lot of your time. If it doesn’t, production bugs and regressions are going to keep you even busier.

The hamster wheel can feel draining for mobile teams. Complaints about features taking too long to ship inevitably lead to a Sophie’s Choice between not writing enough tests and having to fix production bugs later. The last thing teams like this need is tooling that tells them their apps have more issues than what their Crashlytics dashboard shows.

Even those who crave more production data to help them debug issues tend to look for tactical solutions to specific problems, to help them fix the bugs they already know about. For that, the status quo is perfectly fine.

So what’s changed? Why is there suddenly demand for real mobile observability?

First of all, the demand has always been there. It’s just been… silo’d. Some folks in the industry have long realized that it’s essential for mobile apps to be truly observable – SLOs involving user workflows don’t make sense when you don’t include data from the apps themselves. Latency on the client is measured in seconds – shaving a couple hundred milliseconds off the backend will barely register if the request is running on a 2G network.

Big Tech, my friends, have understood the importance of client-side performance for years – they’ve just built all the tech in-house rather than use vendors. How do I know this? Because that’s what I spent a couple of years doing back at Ye Olde Hell Site, before, you know…

This is where, in a previous draft, I spent a thousand words or so talking about the cross-platform, OpenTelemetry-esque production client tracing framework that I helped conceptualize, build, and roll out, but that’s a tangent I’m skipping for now. Suffice it to say that companies with mobile performance specialists have been all over this. Slack even blogged about it.

And now, what motivated the early adopters will begin to motivate everyone else: upper management is being made to understand its importance.

You see, busy mobile teams no longer need to be internally motivated to better understand how their apps are performing in production – they’ll be explicitly told to do so by their Directors and VPs, by folks who want to know how mobile apps directly contribute to company KPIs so that they can more optimally allocate their engineering budget.

All this, because money is no longer cheap, and engineering orgs need to justify their existence and prove their value to the bottom line. The end of ZIRP is the catalyst for the beginning of real mobile observability.

The End Is the Beginning Is the End

For the uninitiated, ZIRP stands for “zero interest-rate policy”, and during that period (which ended around 2022), the cost to borrow money was very low. The effect on tech was that VC investment became abundant, as the wealthy wanted better returns than traditional vehicles could offer. Big funds began putting money into tech startups at a rapidly increasing rate, and buoyed by the successes of a parade of unicorns that made so many people rich, the money kept on coming.

In those halcyon days, VC money flowed freely, especially for darling tech companies on the come up. R&D had free rein to spend as long as the company or engineering was perceived to be heading in the right direction. Whether that meant staffing new teams to stand up new products or signing large vendor contracts for very specific services, you didn’t have to go that high up in the org chart to get approval for projects with significant financial commitments.

But coming out of COVID, with a macroeconomic climate of rising interest rates, investing got a lot more expensive for VCs, so they’ve become more discerning. With the taps turned off, tech opulence turned into austerity, starting a domino effect of budget cuts and layoffs. The industry bled, and the new normal is that if your project or team can’t justify its existence or provide a high enough ROI, you may not be around for long.

While that may seem antithetical to adding a new line item in the budget for mobile observability, it is actually quite the opposite. The reason is that mobile performance has always affected app usage – it’s just never been a very exciting thing to back.

When competing for attention and money with sexier initiatives like new products and features, whose importance and hype are tied directly to the clout of their pushers and their fancy slide decks, it’s really hard for something so relatively uninteresting to be prioritized – if anyone was even pushing for it in the first place.

In a vibes-based stack-ranking exercise, something boring like “slow apps make people use them less” doesn’t tend to end up near the top. But in a world where prioritization is actually data-driven, where you are asked to show your receipts, initiatives that can demonstrably affect the bottom line tend to win out. In that environment, people will go out of their way to find proof of ROI and efficacy for their projects.

And where would you find better ROI than in a key part of your customer’s journey – a part where you have limited performance data, if any, and where a whole class of issues creating friction for your users is invisible to you? Mobile performance is low-hanging fruit galore, and adding observability to your app will bring you truckloads.

When you ship a regression that slows down certain workflows in your mobile app, it will not be directly reflected in your dashboards if they are based solely on telemetry generated by your servers. Those well-calibrated SLO alerts that your SREs rely on to detect emerging incidents? They won’t fire.

If the only telemetry you have for your apps in production is crashes and pre-aggregated metrics, you will have blindspots everywhere – places where app performance regressions could be killing you on the margins. Even if you see your KPIs drop, you wouldn’t know they were caused by your app being materially slower for some percentage of your users, because you lack the data to diagnose that.
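This is exactly the gap that client-side instrumentation closes. As a rough sketch of the idea – using OpenTelemetry’s Java API from Kotlin, with span names and app functions that are purely illustrative – wrapping a user-facing workflow in a span makes the latency the user actually experiences a first-class signal, rather than something you try to infer from server logs:

```kotlin
import io.opentelemetry.api.GlobalOpenTelemetry
import io.opentelemetry.api.trace.StatusCode

// Hypothetical app functions, stubbed out for illustration.
fun fetchCart() { /* network call */ }
fun renderCheckout() { /* UI work */ }

// Sketch: wrap a user-facing workflow in a client-side span so the
// latency the user actually experiences – network included – gets
// recorded, not just the slice of it your servers can see.
fun loadCheckoutScreen() {
    val tracer = GlobalOpenTelemetry.getTracer("app.checkout")
    val span = tracer.spanBuilder("checkout.screen_load").startSpan()
    val scope = span.makeCurrent()
    try {
        fetchCart()
        renderCheckout()
    } catch (t: Throwable) {
        span.setStatus(StatusCode.ERROR)
        span.recordException(t)
        throw t
    } finally {
        scope.close()
        span.end()
    }
}
```

The specific API matters less than where the measurement happens: the span starts and ends on the device, where your user’s experience actually lives.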

So yeah. Every mobile team should want this. Every SRE team should demand that their mobile team use this. The question is: how? If you’re not a big tech company that can throw people and money at the problem, how can you ease yourself into mobile observability without being tied down to a specific vendor’s solution?

I’ll discuss this further in Part 2. Hint: it starts with O and ends with penTelemetry.

