On mobile devices, the lifetime of a process that backs an instance of an app isn’t necessarily mapped to its usage lifecycle. Even if you ignore implementation details like Android’s process forking and specialization via zygotes, the app process is often already created when a user taps on the app icon to launch the app, and it often stays active in some respect even after the user puts their phone back in their pocket.

Many developers who collect production telemetry from mobile apps for monitoring and observability are only interested in knowing what’s going on when the user is using the app. If there’s an error and there is no user to see it, does it really matter? Sometimes it does, but more often than not, it doesn’t. Or at least it matters a whole lot less. Mobile telemetry is most useful when it’s user-centric, after all.

This is why mobile folks have loosely converged on a concept often referred to as a “Session”, which represents a contiguous chunk of time when the app is in use. Data collected on a device during that chunk of time are usually grouped together for the purposes of visualization (e.g. “Session Replay”) and rate-based analysis (e.g. daily sessions per user metric), among others. Such a grouping has some nice properties when you dig into the data. It is often a more interesting unit of measurement for usage compared to counting raw time in app.

Grouping telemetry together as belonging to a session also allows you to better correlate app and user actions. It tells you that the things that happened share the same usage context. The sequence of app and user actions flows from one to the next, and the actions are connected not only by temporal proximity but also in the user’s brain. Frustration built up at the beginning of a session will often manifest itself as less patience towards the end. Errors that impede progress early on affect what can be done later on.

If you are wondering why conversion rate has dropped, it may be useful to look at what happened throughout the session. Did the performance of UI loads drop, leading to greater user abandonment? Were there malfunctions in the widget for adding a credit card, which prevented users from proceeding to the checkout page? Potential causation like that is difficult to tease out if you don’t directly link telemetry together in such a way that you can partially reconstruct a user’s head space; sessions allow you to do that in a crude way. (There are other means, but that’s for a different post.)

Another way a session can be useful is that it will often give insight into the context of usage that can’t be fully baked into each signal. Device metadata that is too expensive or impossible to acquire or encode when telemetry is being recorded can be applied retroactively. If the OS was throttling the CPUs during the execution of some workflow, unbeknownst to the instrumentation, having an association with a session will allow the trace to be associated indirectly with the reason that could explain its slowness.
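To make the retroactive part concrete, here is a minimal sketch of what that association could look like on the processing side. Everything in it is hypothetical: the `Span` shape, the `cpu_throttled` attribute, and the `enrich_spans` helper are names I made up for illustration, not any particular SDK’s or backend’s API.

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    """Hypothetical trace record that only knows which session it belongs to."""
    name: str
    session_id: str
    duration_ms: float
    attributes: dict = field(default_factory=dict)


def enrich_spans(spans, session_metadata):
    """Return copies of the spans with their session's metadata merged in."""
    enriched = []
    for span in spans:
        extra = session_metadata.get(span.session_id, {})
        # Attributes recorded on the span itself win over session-level ones.
        merged = {**extra, **span.attributes}
        enriched.append(Span(span.name, span.session_id, span.duration_ms, merged))
    return enriched


spans = [
    Span("checkout_load", "s1", 4200.0),
    Span("checkout_load", "s2", 800.0, {"screen": "cart"}),
]
# Metadata learned about the session after the spans were recorded
# (assumed here: the OS throttled the CPU during session "s1").
session_metadata = {"s1": {"cpu_throttled": True}}

enriched = enrich_spans(spans, session_metadata)
print(enriched[0].attributes)
```

The slow span never had to know about the throttling. The session ID is the only thing it carries, and that is enough to join it with the explanation later.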

Having a session also allows you to buffer telemetry so that you can send its data only when you know there is no more coming. The atomicity of delivery offers guarantees that could simplify your backend when processing the data, allowing you to skip tedious back-filling that might be needed otherwise.
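As a rough sketch of that buffering idea, assuming a hypothetical `SessionBuffer` class and a `send` callback that stands in for whatever actually delivers the payload:

```python
class SessionBuffer:
    """Holds events in memory and hands them off only when the session ends."""

    def __init__(self, send):
        self._send = send          # hypothetical delivery callback
        self._session_id = None
        self._events = []

    def start_session(self, session_id):
        self._session_id = session_id
        self._events = []

    def record(self, event):
        if self._session_id is None:
            raise RuntimeError("no active session")
        self._events.append(event)

    def end_session(self):
        if self._session_id is None:
            return
        payload = {"session_id": self._session_id, "events": list(self._events)}
        self._session_id = None
        self._events = []
        # One delivery per session: the receiver never sees a partial session.
        self._send(payload)


sent = []
buffer = SessionBuffer(sent.append)
buffer.start_session("s1")
buffer.record({"type": "tap", "target": "checkout"})
buffer.record({"type": "error", "message": "card widget failed"})
deliveries_before_end = len(sent)  # nothing has been sent yet
buffer.end_session()
print(len(sent), len(sent[0]["events"]))
```

This keeps everything in memory, so a process that dies mid-session would lose its buffer. Anything real would need to persist events and recover them on the next launch, which this sketch deliberately leaves out.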

But when does a session start and when does a session end? This is where it gets interesting. For me, the answer to that is basically: “when you want it to, so long as you’re consistent”.

There are a few reasonable ways to define the boundaries of a session. At Embrace, we define it as the time between when an app foregrounds and when it backgrounds. The OTel Android Agent ends a session after some period of inactivity. You can also use a strict time-based start/end scheme if that’s more conducive to how your app is used. The key is predictability, so that your analysis compares apples to apples.
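Here is a small sketch of how two of those schemes carve up the same stretch of usage. To be clear, this is not how Embrace or the OTel Android Agent implement sessions; the function names, the event shapes, and the 900-second timeout are all assumptions I picked for illustration.

```python
def sessions_by_inactivity(timestamps, timeout_s):
    """Group activity timestamps; a gap longer than timeout_s starts a new session."""
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= timeout_s:
            sessions[-1].append(ts)
        else:
            sessions.append([ts])
    return [(s[0], s[-1]) for s in sessions]


def sessions_by_foreground(lifecycle_events):
    """Pair each 'foreground' event with the next 'background' event."""
    sessions = []
    started_at = None
    for ts, kind in sorted(lifecycle_events):
        if kind == "foreground" and started_at is None:
            started_at = ts
        elif kind == "background" and started_at is not None:
            sessions.append((started_at, ts))
            started_at = None
    return sessions


# Seconds since some arbitrary origin; the data is made up.
activity = [0, 20, 45, 2000, 2010]
lifecycle = [
    (0, "foreground"), (50, "background"),
    (1990, "foreground"), (2015, "background"),
]

inactivity_sessions = sessions_by_inactivity(activity, timeout_s=900)
foreground_sessions = sessions_by_foreground(lifecycle)
print(inactivity_sessions)
print(foreground_sessions)
```

Both schemes find two sessions here, but with different start and end times. Neither is wrong; the numbers just aren’t comparable unless you stick with one definition.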

At the end of the day, it’s a matter of taste. Or it is the result of implementation details that may not apply to everyone. There’s no right answer to this. Dealer’s choice. What matters is that there ARE sessions. Even if you define a session as the lifetime of a process, you need something to tie together related telemetry from the same device. Why? If it’s not obvious to you, maybe I’ll write about it some more later.

