• On Moneyball and the Real Lessons Learned

    The mystique around the 2002 Oakland A’s was built on the foundation laid by Michael Lewis’ excellent 2003 book Moneyball. In it, A’s GM Billy Beane, handcuffed by a cheapskate owner, had to use non-traditional techniques to identify and acquire players that other organizations undervalued. Namely, he and his team used data over traditional scouting to find good players who came cheap because they had skills the market didn’t value. The movie of the same name took the hype to the next level, and soon the word “Moneyball” was on everyone’s lips, beyond sports and beyond America, used to describe anything from data-driven decision-making to exploiting market inefficiencies to out-perform your budget.

    As the term took off outside baseball, it went out of vogue within it, eventually displaced by “analytics” as the catch-all for using data in scouting, player development, and in-game decision-making. Twenty years on, using data is no longer novel within baseball – it’s de rigueur. The same goes for other sports, where the question isn’t whether data is used, but how much. And while the likes of Bill James were doing statistical analysis long before Beane flopped as a Mets draft pick, it’s hard to deny that Beane (along with Lewis) brought it to a mainstream audience.

    But here’s the thing: while Beane was notable for using On-Base Percentage as an undervalued metric for evaluating players, among other data-based revelations, and for making Scott Hatteberg more famous than he had any right to be, his real secret weapon was wage suppression. All of the team’s stars were young players whose earnings were artificially suppressed because they were under “team control” – meaning they couldn’t negotiate higher wages or move to another team because their rights were held by the A’s.

    The most egregious cases were the team’s three young star pitchers, Mark Mulder, Tim Hudson, and Barry Zito, who made less than $2M combined that season – less than 5% of what was already a paltry payroll. While the A’s were able to improve on the margins cheaply because they valued walks as much as hits, the biggest reason for their success was that their five best players – sluggers Eric Chavez and Miguel Tejada were the other two – made less than $8M combined that year. Without that level of extra value created by an unfair system, there was no way those A’s would’ve won 103 games on that payroll.

    So the moral of the story is this: just because something was genuinely groundbreaking, so much so that it indelibly changed multiple industries, doesn’t mean the story behind the story happened exactly as it was told.

    Never Stop Never Stopping

    When I first read Moneyball all those years ago, my first reaction was to find novel market inefficiencies everywhere so that I could exploit them and over-perform at whatever I was doing. But it didn’t take long for me to realize that this was just not it. Even the idea of using data to find nuggets of insight that only you would know – that’s never going to last. If what you do is profitable, important, or otherwise interesting to other people, there are bound to be other really smart people out there trying to do exactly what you’re doing. It’s only a matter of time until everyone else catches up.

    Special sauces might give you a head start in whatever field you’re in. It did for Beane – until the rest of the league caught up, OBP became properly valued, and he and the A’s had to move on to the next thing. And when the rest of the league spun up analytics departments too, the A’s actually fell behind because they lacked the resources to fully exploit the power of data. When you deal in the realm of knowledge, trade secrets can only take you so far.

    So the lesson I landed on after properly digesting Moneyball is this: in order to create sustained success, you’ve got to differentiate yourself and never stop getting better. The only sustained competitive advantage in an information economy is non-stop innovation. Lead the field and let everyone else follow.

    And on top of that, you have to take full advantage of the environment around you. I don’t want to use the word “exploit” because there is a moral line that I wouldn’t cross, but within reason, you have to leverage what’s at hand. For Beane, that meant fully leaning into the collective bargaining agreement between MLB and the players’ union that allows below-market salaries to be imposed on young studs – and staying away from inefficient contracts for free agents whose best years are behind them. Only by combining those two strategies can you maximize your outcome.

    Epilogue

    Despite Moneyballing their way to a good amount of regular season success, Billy Beane and the Oakland A’s never could quite make it over the line and win the World Series. There’s a lesson there too: just because you’ve maximized your potential doesn’t mean you’ll always achieve the kind of success you want. In addition to luck, sometimes your best just isn’t good enough, especially when you come up against folks who are also doing Moneyball things but have access to a lot more resources. Think the LA Dodgers in baseball, and Manchester City in football. But you have to be OK with that. You do the best you can and let the chips fall where they may. The only thing you can control is the process, and judging yourself only by the outcome (which you can’t control) is shortsighted.


  • Rethinking the App Startup Metric

    The granddaddy of Android performance metrics is probably application cold start. Setting aside the complications of how to precisely measure it (PY breaks it down really well in this post), let’s think about what it represents. It begins with a user-triggered action; the OS creates the app process by forking the zygote, which leads to the creation of the Application object, after which the Activity lifecycle kicks in (usually); it finishes with the UI rendered, ready to be interacted with.

    This is usually the first mobile performance metric app devs look at when they want to understand how well their app is performing in production. Google treats it as one of the golden perf metrics, important enough to be one of the few that are collected automatically and displayed on the Google Play Console. You’ll even find a litany of blog posts and conference talks about how to improve it. I’ve even spoken at droidcon SF about how improving it at Twitter helped us grow DAU tremendously.

    But do users actually care about this?

    Focus On App Launch

    I mean, of course they care about how long they have to wait for an app to become usable after they tap the app icon. But what they care about is the app launch time as they perceive it, not what is happening under the hood.

    The only time users care about cold start duration is when they are in the middle of one, and only in the context of it delaying the app launch they’ve started. And you know what is faster than a cold start? A warm or hot one. Google explains the difference between the types here, but the point is that the fastest cold start is to not have a cold start to begin with. So if you want to improve a user’s experience, don’t only focus on how to make cold starts fast – also think about how you can minimize them.

    Ultimately, what users care about is having the shortest app launch time possible. Cold, warm, or hot, they just want whatever’s the fastest. If they had a choice, they’d probably want the device to read their mind, launch the app, and have it be ready when they look down at their phone. If they don’t mind having their minds read, that is. While we can’t quite do that, we can minimize the time it takes for an app to launch – and that goes beyond improving cold start duration.

    One Metric or Many?

    So you might be asking yourself: Is this guy advocating that we collect all types of app starts under one app launch metric, irrespective of, um, temperature? Oh HELL no. A million times no. Don’t you DARE put that on me!

    Munging different operations together into one metric means we can’t understand what actually happens if that number changes. Which workflows got faster or slower? A composite number like that is much less predictive and actionable than its constituent parts, making it more of a vanity metric than if we were to track them separately. So in order for us to measure precise changes to each workflow, we cannot collect one “app launch time” metric that includes all three. These must be different metrics.

    That’s not to say we can’t remix the three app startup times into a singular metric somehow. In fact, if you do it right, it can be very powerful. Like OPS and OPS+ in baseball, there could be further insights gleaned if all the app startups are combined in the right way. But the constituent parts have to be measured separately so that you can determine the underlying changes and take appropriate action if warranted.

    So what’s a good way of doing this kind of metrics remixing?

    A Dead-End (For Me)

    One possible way is to simply smush the datasets for all three types of app startups together – basically the munging I so objected to earlier, except this time each type is also tracked separately. The biggest advantage of this is that it’s easy. However, its usefulness may be limited given that the slower startup times dominate the combined numbers; depending on the distribution for your particular app, the trend for this combined metric may not be far off from cold start alone.

    You can also do a weighted average to dampen the effects of the dominating component – but how will you choose the weighting factors? I suppose you can play around with different ones, and see if you can find correlations with other metrics, then do experiments to validate the relationships. But this process depends heavily on the underlying user base and distributions of data that it produces, so not only will the results not be generalizable, they may change without you even knowing.
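    To make the dampening idea concrete, here’s a minimal Python sketch of a weighted blend of the three startup types. The weights and the sample data are entirely hypothetical – they’re exactly the knobs you’d have to tune and validate against your own user base, as described above.

```python
# Minimal sketch: blending cold/warm/hot startup samples into one
# "blended launch time" metric. Weights are hypothetical dampening
# factors with no empirical backing.

def blended_launch_time(samples_ms, weights):
    """samples_ms: dict of startup type -> list of durations (ms).
    weights: dict of startup type -> dampening factor."""
    total_weight = 0.0
    weighted_sum = 0.0
    for kind, durations in samples_ms.items():
        if not durations:
            continue
        # Weight each type by its factor and how often it occurred.
        w = weights[kind] * len(durations)
        weighted_sum += w * (sum(durations) / len(durations))
        total_weight += w
    return weighted_sum / total_weight

samples = {
    "cold": [1800, 2100, 1950],   # cold starts dominate in magnitude...
    "warm": [600, 550],
    "hot": [120, 90, 100, 110],   # ...while hot starts dominate in count
}
# Dampen cold starts so they don't swamp the blend (arbitrary choice).
weights = {"cold": 0.2, "warm": 0.5, "hot": 1.0}
print(round(blended_launch_time(samples, weights), 1))
```

    Whether any particular weighting correlates with outcomes you care about is the open question – this only shows the mechanics.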

    Perhaps better folks than me can use this methodology to derive sustainable value from this type of Voltron-ing of the various app startup times, but it may be beyond my capabilities at this point. (If you’re able to do this, please blog about it or just ping me because I’m dying to know!)

    A Possible Way Forward?

    The one combination of startup metrics that I’ve been noodling on – one that I’ve yet to prove useful with real data, nor convinced myself won’t be too difficult to make work – is a ratio of cold starts over total app launches, tracked as a metric representing how often a full cold start is required when a user launches the app. We can call it… the cold start rate? The higher this number, the worse it is for the user, so the idea is to keep it low (or reduce it if an improvement is sought), and any material increase should be treated as a regression, much like how folks treat cold startup time regressions right now.
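    As a sketch of what computing this might look like (the event shape and names here are made up for illustration – in production these would come from whatever startup telemetry you already collect):

```python
# Hypothetical "cold start rate": the share of app launches that
# required a full cold start, computed from typed launch events.

def cold_start_rate(launch_events):
    """launch_events: iterable of startup types, e.g. "cold"/"warm"/"hot"."""
    events = list(launch_events)
    if not events:
        return 0.0
    return events.count("cold") / len(events)

# One user's launches over a week: 2 cold starts out of 10 launches.
week1 = ["cold", "hot", "hot", "warm", "hot",
         "cold", "hot", "hot", "warm", "hot"]
print(f"cold start rate: {cold_start_rate(week1):.0%}")  # prints "cold start rate: 20%"
```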

    Will cold start rate be predictive? It’s hard to say without data. Will it be actionable? Maybe – if you can see why the ratio has changed, and have other metrics that give you clues as to why that may be the case.

    Perhaps you have a memory leak, and lmkd is more aggressive about killing your app in the background because of the process’s higher oom_score_adj value. Seeing a slight increase in your OutOfMemoryError metric combined with a higher rate of cold starts might lead you to fire up LeakCanary to find and fix the leak. In that hypothetical scenario, the increase in cold start rate gave you evidence that users are impacted by the leak, and that fixing it also fixes the regression. Simply tracking how long cold starts take will not give you that insight.

    Anyway, contrived example aside, the point is not whether this cold start rate metric turns out to be something generally useful. The issue to highlight is that the way many of us monitor app launch times is kind of incomplete, focusing on the absolute value of the slowest kind of app launch, rather than optimizing for the general case. We absolutely can and should do better.


  • On Sustainability

    Watford FC has enjoyed several years of Premier League success followed by several years of what can generously be described as mediocrity. This last period of non-success (promotion notwithstanding) has been taxing not only on fans, but on the bank account of the club too.

    Every year we’re not in the Premier League, we lose out on roughly £60-70M of revenue from broadcasting, match day, and commercial sources. And that’s after parachute payments, and not counting the discount we take in player trading because we have less leverage to demand higher fees.

    And despite record turnover, we weren’t breaking even during our run in the Premier League between 2015 and 2020. In fact, Gino had to lend the club an additional £35M in the 2017/18 season so we could increase our investments in the squad, and even more after the first relegation. But we made it work, thanks in no small part to some shrewd player trading.

    Financially, we were relatively sustainable with our strategy of buying young players from undervalued markets and selling them for profit (with or without them playing for us). But that only worked because we were in the Premier League, which let us attract top players, fund the spending, and foot the bill for transfer deficits and the other costs that come with having to source club funding externally.

    It’s that last point we haven’t talked about enough as a fanbase, and it’s the reason we cannot spend very much at all in this transfer window: we have bills to pay, and no other means of paying them than reducing the total amount of funding to the club. The money we bring in from selling the likes of Joao Pedro needs to go to loan payments and past-transfer fee instalments. And we can only afford to do that while covering wages and expenses because of parachute payments.

    Club Funding and Relegation

    I can probably write a whole post or two on how football clubs are funded. I’m not talking about paying wages and operational expenses, but rather investments in infrastructure and players. There’s probably a more specific term to describe this – it’s a combination of working capital, share equity, outstanding loans, and net transfer payables, all going to fund operational assets like player registrations. It’s money tied up with the club so that it can operate.

    Premier League clubs can sustain a large amount of club funding even if their owners are wealthy benefactors with hundreds of millions invested in the club via share equity and shareholder loans. Higher revenues allow them to spend more money on player transfers, sustain a higher level of loan interest payments (and borrow more), and be given more leniency in deferring transfer fee payments – so more money can be tied up in operational assets. With relegation, the level of funding a club can sustain shrinks precipitously, and how a club manages that reduction says a lot about how financially responsible its owners are.

    Clubs that want to “bounce straight up” try to keep funding at an unsustainable level. They largely keep their squad intact, or reinvest the money they recoup on player sales in new players. They may take out riskier loans with higher interest rates to get the cash in to make up for the shortfall of revenue. They’re basically trying to hold it together so they can resume their regular levels of spending if they get promoted again.

    This is not a terrible strategy to take for one season if you’ve had a long run in the Premier League and your finances are in decent shape. This is the path Gino chose after the relegation in 2020 – at least a cautious version of it – and it succeeded! Somewhat.

    The problem with stretching yourself thin like that is it reduces your ability to improve the squad responsibly even if you achieve promotion. You’ve effectively already spent some of your future revenues to fund the promotion, so you have less to spend to stay up. If your existing squad is already pretty good, and you don’t need to buy a tonne of players in order to be competitive, then congratulations: you have a fighting chance of Premier League survival.

    The squad Watford was promoted with in 2021 was not Premier League ready. The money we were able to spend did not change things materially, and that’s with getting a bargain in Dennis and essentially having Cucho come in for free. That midfield and defense were not close to being good enough after years of poor recruitment, and we simply did not have the money to make it Premier League quality.

    With the sustained funding in 2020/21, Gino basically gambled and won. But the prize we got was a miserable campaign in the Prem that ended with an expected relegation. Insert Thanos “But at what cost?” meme.

    Relegation Redux

    With the second relegation, Gino literally could not afford to go down the same path. He kind of sort of tried, but with a squad that had to be weakened further, one that was already worse than the squad that was relegated in 2020, it was always going to be tough. The quick pivot away from Rob Edwards did not help, but that was not the root cause. The players simply weren’t good enough, and we couldn’t spend the money to make up the difference. The fact that the team underperformed even that lower bar made for a horrendous season for us fans.

    Facing a second year in the Championship, I think the overall strategy has changed, whether Gino is willing to admit it or not. Neither he nor Scott have been talking up promotion this year, and that’s because they know it’s unrealistic. We have no money to improve the squad, so the best thing to do is consolidate at this level of funding and build back up with younger players who will hopefully be ready for the Premier League if we ever get back there.

    The money from Brighton for JP will go to paying off Macquarie, reducing our club funding even further. It will also bring us to some definition of being “debt-free” in 12 months, as Gino said. We will still owe money to vendors as future payables, to other clubs for past transfer fees, and also to Gino, who will once again be the primary source of funding for the club. But there will be no money owed to banks that requires interest payments, which means that if we can keep the cashflow even with our player trading and normal Championship-level turnover, the club will remain financially sustainable going forward.

    I don’t want to gloss over the fact that this is really good news for Watford FC fans.

    The one thing that has always worried me and a lot of supporters is the existential threat of the club going out of business because of poor financial management. This is the main reason many don’t clamour loudly for the club to change owners – we don’t know what’s behind that door. You don’t have to go far in the EFL to find examples of perfectly fine clubs being run into the ground by owners who overspent and then lost interest, or just bad eggs looking to cash in on some juicy real estate. Having Watford Football Club to support in any form is the most important thing, bar none, and if that means having to adjust my expectations, so be it.

    This Season

    Now, about that. What do I expect in 2023/24? If I’m being honest: very little.

    Very little in terms of transfer spending. Very little in terms of hope for promotion. Very little in terms of chill from the fans, because Watford are reverting to what I’m mostly used to us being for the better part of the last three decades: mid-table in the second tier, with a bigger chance of a relegation fight than a promotion push.

    All this talk of South Korean internationals or up-and-coming hotshots in the EFL? It’s all just fantasy to me because I don’t think we can outbid anyone. We’ll get players no bigger clubs want – players who are unproven, have obvious flaws, or come really cheap. Manga and Costa will be shopping in the bargain bins, which will test our patience as well as theirs, but we have to live with that because we don’t have the financial means to do better. We might get one marquee signing, but I wouldn’t hold my breath for a blockbuster.

    So strap in, Watford fans. This year is going to be like no other in the Pozzo era. At best, hope for consolidation and improvement at the margins: building towards something better with a younger squad we can get behind, a distinct playing style under one head coach all year, and no financial trap doors to worry us needlessly. If we get all that but still finish 13th, I will be ecstatic.


  • On Vanity Metrics Like Crash-Free Session Rate

    App metrics, especially performance metrics, are only useful if they are predictive or actionable.

    By predictive, I mean you can, with varying degrees of certainty, anticipate changes to other metrics or outcomes if that metric were to change, for better or for worse. If P95 request latency changes for some endpoint, does it matter?

    By actionable, I mean that observing a change in that metric means you can take direct countermeasures to either remedy the regression or somehow mitigate its impact. If your app gets rated by a set of unknown, unvetted, self-selected individuals who give it a star rating between 1 and 5 (*ahem*), would you know what to do if your rating drops from 3.8 to 3.6 month to month?

    While there can be degrees of predictiveness and actionability, if a metric doesn’t provide at least a modicum of value along at least one of these dimensions, you’re probably better off not using it, lest it lead you down the wrong path. These are what I call vanity metrics – they seem good on the surface, but if you drill in, they provide little actual value other than looking good in performance theatre, i.e. the act of working on performance just so you can say you work on performance, rather than making actual, measurable impact.

    Sometimes, people measure things because they are easy to measure and *seem* to be useful. In fact, those are the criteria for many of the statistical categories common in sports. Pitcher Wins. Quarterback Wins. Shots on target. Game-winning goals. But a lot of these stats are based as much on the context around a player or team as on the inherent performance or ability of said player or team, so assuming they are predictive of future results might lead you to bad personnel decisions.

    If all you have are vanity metrics, I can’t blame you for trying to squeeze some value out of them. That shade I threw at Play/App Store ratings earlier – if you’re an indie dev with no other means of getting user feedback, store ratings may be the only indication you have of whether folks are satisfied with your app. As limited as they are, beggars can’t be choosers. But often, there are better, more useful alternatives if you only look deeper and don’t just go with the status quo.

    In the world of mobile client performance, a popular metric is “Crash-Free Session Rate”. But what does that really tell you, in all but extreme cases? If that rate drops for your app from 99% to 98% month to month, what does it actually mean for your users and how they use your app? You can perhaps use correlational analysis to find out whether other metrics changed along with the rate drop, providing some clues as to a possible causal relationship between it and those other metrics. Maybe you can even run an A/B test where you induce users in an experiment bucket to crash more often, and observe the difference between it and the control bucket in terms of other metrics.
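    The correlational check has a simple basic shape, sketched below in Python with a hand-rolled Pearson correlation over hypothetical daily series. A real analysis would need far more care (sample sizes, confounders, significance testing), and a high correlation still wouldn’t prove causation.

```python
# Sketch: does day-to-day movement in crash-free session rate track
# movement in another metric (here a made-up "sessions per user")?
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical daily series over one week.
crash_free = [0.991, 0.989, 0.992, 0.981, 0.979, 0.990, 0.993]
sessions   = [4.1,   4.0,   4.2,   3.7,   3.6,   4.1,   4.3]

# A high r hints at, but does not prove, a causal relationship.
print(f"r = {pearson(crash_free, sessions):.2f}")
```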

    But real talk: how many people who pay attention to crash-free session rates have actually applied that level of rigour when trying to assess their impact on users?

    Further, if all you know is the rate, what can you do about it? You need to know the details of the crashes that are causing the rate to go down before you can look to improve it. And if you have the total number of crashes broken down by source, what value would knowing the crash-free session rate buy you?

    Sure, the advantage of normalizing against usage is that you can make more apples-to-apples comparisons between time periods. But is the 99% of last month the same as the 99% of this month, given that not all crashes have the same impact on user experience? A crash at startup that prevents a user from even launching the app is probably more important than a crash that happens while the app is backgrounded. Again, to know what that 99% means, you’ll need a breakdown of the specific crashes that are happening, to figure out the composition of issues that led 1% of sessions to end with a crash.

    It’s for this reason that it’s hard to find a direct relationship between crash rate and other metrics – not all crashes are created equal. Certainly at Twitter, even with the usage that app had, we weren’t able to find even a correlation between crashes and core metrics.

    That’s not to say you shouldn’t work on fixing crashes just because you can’t find a direct impact on other metrics. The inability to find a statistical relationship doesn’t mean they don’t cause real user dissatisfaction. It’s just that the rate at which they happen is often so small that you may not be able to find a statistically significant relationship with other metrics, due to the small sample size.

    Instead, when working to reduce crashes, use more actionable metrics – like the rates for specific crashes that point to the cause. Or look for new crashes introduced by a new app version. Neither of these are predictive, probably, but both are actionable, so they provide value.
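    As a rough illustration of that second idea – surfacing crashes that are new in a release rather than watching one blended rate – here’s a small Python sketch. The crash signatures and data shape are invented for the example.

```python
# Sketch: find crash signatures that appear in the new app version
# but not the old one. These are immediately actionable: each maps
# to a specific cause introduced by the release.

def new_crashes(crashes_by_version, old, new):
    """crashes_by_version: version -> {crash signature: count}."""
    seen_before = set(crashes_by_version.get(old, {}))
    current = crashes_by_version.get(new, {})
    return {sig: n for sig, n in current.items() if sig not in seen_before}

# Hypothetical per-version crash counts, keyed by signature.
crashes = {
    "7.1.0": {"NPE@FeedAdapter": 120, "OOM@ImageCache": 30},
    "7.2.0": {"NPE@FeedAdapter": 95, "ANR@StartupInit": 40},
}
print(new_crashes(crashes, "7.1.0", "7.2.0"))  # only the 7.2.0 newcomer
```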

    Crash-free session rate, though? For most, that’s a vanity metric – easy to measure, but not providing much value. Like with all vanity metrics, find better alternatives that are more predictive and/or actionable when monitoring crashes.


  • Decoding Gino, 1/?

    Gino Pozzo has been in charge of Watford FC for over a decade. Success came relatively early, culminating in the FA Cup final in 2019, but the last few years have been lean, to say the least. His methods have always been curious and foreign to the fanbase, but when it worked on the pitch, we mostly didn’t care and would in fact defend them when they were routinely criticized by outsiders, gleefully doing so especially when the criticism came from reactionary British pundits.

    But since the relegation in 2020, we didn’t need Martin Samuel to talk shit about Gino and his ways: we have an increasingly vocal group of fans willing to carry the torch. Much of the criticism is merited, if not leaning a bit too much into recency bias. His seeming stubbornness in sticking to what he knows in a rapidly changing football landscape is perhaps the overarching theme that is hard to fully refute by even the most ardent Pozzo-Ins.

    Having been pretty much silent since he took over, Gino finally spoke to fans directly last month in a fans’ forum that was much more controversial than it really should have been. Coming out of it, the consensus was that he and Scott Duxbury said nothing of much interest, which didn’t surprise me. In a Q&A format like that, where question-askers couldn’t drill in with follow-ups, one that was live-blogged to the rest of the Watford world, how deep could we really get, especially with a man not known to be super open about his philosophy and the reasons behind it?

    That said, when I finally listened to the audio, I was quite surprised at how much I actually did learn. Perhaps not directly from the things that he said, but from the way he said them, the points that he emphasized – and ones he dismissed or glossed over. Basically, it was more informative than I had first thought.

    So what did I learn? I don’t think I can cover it all in one post, but what struck me the most is his conviction, stubbornness if you will, of the process by which he sees his football philosophy getting turned into reality. Let me unpack that a little.

    By process, I mean the methods he puts in place – repeatable elements that he relies on to accomplish short-term goals, which, when combined, allow him and the club to achieve longer-term goals. To him, the rightness of that process supersedes whether the results achieved actually meet his and the fans’ expectations. While randomness and swings in luck can alter on-pitch results dramatically one way or another, the methods he installs and their application are a lot less flaky if the right checks and balances are put in place.

    This is why, despite many fans’ insistence that he ought to have learned something from the results of the past two relegations, this was never going to happen. At least that’s the impression I got after listening to him speak. The aforementioned appearance of stubbornness, the lack of contrition, comes from rejecting the notion that on-pitch failures are necessarily caused by the mistakes of his process.

    Some would take this as arrogance, that he thinks his process is perfect and that the two relegations in three years is not on him. I actually don’t think he feels that way. While I don’t think he believes that his process is at fault, I think he thinks that mistakes were made in the application of it.

    Finding the right head coach who aligns with his vision of what successful football management looks like is part of the process. He felt Vladimir Ivic and Rob Edwards could bring that to the table, but for different reasons, neither met his expectations once they came in, so he dispensed with them quickly. I don’t think it’s the results that doomed them, but rather what Gino saw behind the scenes: how they ran training, managed the individuals, and so on didn’t meet his expectations. And given the results were middling, sacking them fast instead of waiting around for what was, to him, inevitable was a no-brainer.

    The process was not at fault. But hiring Ivic and Edwards was. At least that’s how I think he feels.

    Now, this may sound like I’m splitting hairs, but in fact I think this is precisely the kind of nuance you have to parse in order to understand why Gino does what he does. Am I reading the tea leaves a bit? Sure, maybe. But he’s never going to tell us directly (a point I’ll elaborate on in a later post), so I’m just going to have to do my best to decode him.