Monday, September 7, 2020

How to Misuse Global COVID-19 Statistics

Have you heard about Somalia's COVID-19 policy?

In the official statistics, Somalia has just 3,362 confirmed infections and 97 confirmed deaths from COVID-19. On a per capita basis, the country's death toll is just barely above New Zealand's and below those of other widely reported success stories such as South Korea, Japan, Norway, and Germany.

Yet my guess is that you have not heard much about how Somalia defeated the coronavirus. Why is that?

The reason is that, implicitly, no one thinks Somalia's statistics reflect the underlying reality. Still a recurring target of US drone strikes and still contending with internal conflict, Somalia, like many other impoverished or conflict-torn countries around the world, has little capacity to deal with COVID-19.

Thus, its numbers are probably low not because it came up with a uniquely successful strategy, but because it isn't counting.

The Implications of Incomplete Data

All of this may seem obvious. However, the implications of this observation are routinely forgotten in reports on the coronavirus.

The problem is twofold. First, Somalia is not an isolated example: numerous countries around the world report extremely low case numbers that almost certainly reflect a lack of testing and counting, not actual policy success. Second, all of these same statistics roll up into the official global COVID-19 totals.

While everyone seems to recognize that the numbers from Somalia and similar countries are not accurate, many forget that this necessarily means the global totals are compromised as well.

We see versions of this error constantly, but here are a couple of examples to watch out for.

CNN: The US has 4% of the world's population but 25% of its coronavirus cases

Depending on the article's date and the metric chosen (cases vs. deaths), the second percentage in this claim fluctuates somewhat. The point is to show that the US accounts for a disproportionate share of the damage caused by the coronavirus. That general claim is valid, but the uncritical use of global statistics greatly exaggerates the disparity.

According to Worldometers, approximately 1.5 billion people live in countries where very limited testing has been done (total tests amounting to less than 1% of the population). Additionally, some large countries such as India only recently ramped up their testing programs and, for the moment, still have case totals well below that of the US.
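The mechanism behind the exaggeration is just a ratio: a country's share of the world total shrinks as the denominator grows, so any undercount elsewhere inflates it. Below is a minimal Python sketch of that arithmetic. Every number in it, including the undercount factors, is hypothetical and chosen only to make the effect visible, and `reported_share` is an illustrative helper, not anything published by Worldometers or CNN.

```python
def reported_share(country_cases: float,
                   other_well_tested: float,
                   other_limited: float,
                   undercount_factor: float = 1.0) -> float:
    """Country's share of the global total when the limited-testing countries'
    reported cases are scaled by a HYPOTHETICAL undercount factor."""
    global_total = country_cases + other_well_tested + other_limited * undercount_factor
    return country_cases / global_total


# HYPOTHETICAL toy inputs, not real data: one country reports 100 cases,
# well-tested countries report 280, limited-testing countries report 20.
for factor in (1, 5, 10):
    share = reported_share(100, 280, 20, undercount_factor=factor)
    print(f"undercount factor {factor:>2}: share of global total = {share:.1%}")
```

With these toy inputs, the country's share falls from 25% to about 21% if the limited-testing group's true count were five times its reported count, even though nothing about the country itself changed.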

This line is a standard inclusion in many stories on the coronavirus. The article cited above is unusual only in that it leads with it in the headline.

NYT: America's Death Gap

Here, The New York Times can be commended for at least making a passing reference to the old missile gap canard. Hopefully, this tipped readers off to the fact that what they were about to read was not true.

In the piece, the Times offers the following thought experiment:

If the United States had done merely an average job of fighting the coronavirus — if the U.S. accounted for the same share of virus deaths as it did global population — how many fewer Americans would have died?

The answer: about 145,000. That’s a large majority of the country’s 183,000 confirmed coronavirus-related deaths.

The problem is not that their math is wrong; it's that they're relying on figures that everyone, including the Times, knows or should know to be unreliable.

As of this writing, the official global average COVID-19 death toll is 114 per 1M people. But this figure is severely diluted by the many countries that have large populations and limited testing.
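The Times' counterfactual is simple arithmetic on this diluted average: actual US deaths minus what the US would have at the global per capita rate. Below is a minimal Python sketch, assuming a US population of roughly 331 million (a figure not stated in either article). The alternative global rates in the loop are hypothetical values used only to show how the answer moves with its input, not estimates of the true global toll.

```python
US_CONFIRMED_DEATHS = 183_000   # figure quoted by the Times
US_POPULATION_MILLIONS = 331    # ASSUMPTION: approximate 2020 US population


def deaths_avoided(global_rate_per_million: float,
                   us_deaths: float = US_CONFIRMED_DEATHS,
                   us_population_millions: float = US_POPULATION_MILLIONS) -> float:
    """US deaths in excess of what the US would have at the global per capita rate."""
    counterfactual_deaths = global_rate_per_million * us_population_millions
    return us_deaths - counterfactual_deaths


# 114 is the official global average cited above; 200 and 300 are
# HYPOTHETICAL alternatives, for illustration only.
for rate in (114, 200, 300):
    print(f"global rate {rate} per 1M -> {deaths_avoided(rate):,.0f} 'avoidable' US deaths")
```

With the official 114 per 1M, the sketch lands on about 145,000, in line with the Times. If the true global rate were higher because of undercounting, the "avoidable" figure would shrink accordingly, so the headline number is only as reliable as its least reliable input.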

If we were to treat this global average as a real number, we would be left with some rather implausible conclusions.

Yes, the US, with 583 deaths per 1M, is much worse than the official average. Fair enough.

But the same goes for countries like Canada and Switzerland, which have 242 and 232 deaths per 1M, respectively. Does anyone think Canada has performed twice as badly as the average country in this pandemic?

Likewise, oft-praised countries like Germany (112) and Denmark (108) appear only marginally better than average. Are we to believe these countries weren't very successful after all?

No one thinks these other conclusions are true, yet they follow directly from the Times' line of analysis. The absurdity of the global average becomes obvious only when it is compared with the results of countries widely regarded as success stories.
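Each of these comparisons is just a country's rate divided by the global average. Below is a short Python sketch using only the per-million figures quoted in this post (as of this writing); `ratio_to_global` is an illustrative helper name, not an established metric.

```python
GLOBAL_AVERAGE = 114  # official global COVID-19 deaths per 1M, as cited above

# Deaths per 1M people, as quoted in this post.
DEATHS_PER_MILLION = {
    "US": 583,
    "Canada": 242,
    "Switzerland": 232,
    "Germany": 112,
    "Denmark": 108,
}


def ratio_to_global(rate_per_million: float, global_average: float = GLOBAL_AVERAGE) -> float:
    """How many times the global average a country's per capita death toll is."""
    return rate_per_million / global_average


for country, rate in DEATHS_PER_MILLION.items():
    print(f"{country:<12} {rate:>4} per 1M  {ratio_to_global(rate):.2f}x global average")
```

Taken at face value, Canada comes out at roughly 2.1 times the global average, and Germany and Denmark just under 1.0, which is exactly the implausible reading described above.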

Conclusion

The point here is not to defend the US's track record on the coronavirus. It has performed abysmally on every conceivable metric, even if it is not literally the worst. (That dubious honor belongs to Peru, with Belgium a close second.)

Rather, this is an argument for basic data literacy. If the individual entries in a data set are unreliable, they don't magically become reliable when you sum them up or pass them through an econometric model. Garbage in, garbage out.

We live in a time when almost every politician and pundit says we need to "follow the data". Perhaps we should start by understanding its limitations.