By Jacob L. Shapiro

The world is obsessed with data. Governments use data to design and implement optimal social and economic policies. Intelligence organizations sift through massive amounts of data to identify threats before they materialize. Sports franchises use data analytics to modify practice schedules and assemble more efficient teams. Smartphones track everything from the number of steps you take in a day to how often you move in your sleep. The idea behind this data gathering is that all problems have solutions, and accurate data is the key to discovering them. If something is real, it can be experienced and measured. And if something can be experienced and measured, it can be controlled.

It’s a seductive idea in theory, but in practice it is difficult to implement. It is not enough to have data about a problem – you have to have the right data to solve the problem. But how do you know if you have the right data? The British government, citing copious amounts of data, predicted that a vote for Brexit would result in an immediate spike in unemployment. Her Majesty’s government was wrong. Unemployment stayed relatively stable. (The value of the pound, on the other hand, plummeted.) The British government wasn’t lying – it was just looking at the wrong data. But even if you have the right data, you aren’t assured of getting the right answer. Data can be manipulated, or even fabricated, for reasons ranging from political expediency to incompetence.

There is an additional problem when it comes to gathering data on a country-size scale. Generally, the larger the sample size, the more accurate the statistics. In a small sample size, a fluke can skew an entire data set. The larger the sample size, the less likely an outlier will unduly influence the data. But the larger the sample size, the harder it is to collect the necessary data in the first place. In the 17th and 18th centuries, when newly minted nation-states in Europe were elevating the collection and analysis of data to the status of a religious principle, population sizes were large enough to produce data that could generate profound insights and yet not so large as to prevent the collection of the data in the first place. For larger countries, this kind of data collection is fraught with difficulty.

Ironically, the collection of information has always posed a special challenge for one large country whose data releases are now breathlessly anticipated worldwide: China. Many in the world look to these annual statistics as a gauge of how the global economy will perform in the year ahead. China releases its gross domestic product figures for the previous year during the third week of January (they are set to be released Jan. 18). A flurry of media reports have appeared with predictions of those GDP figures. Even Chinese Premier Li Keqiang last week predicted GDP growth of “around 6.9 percent.” Xinhua, China’s news agency, followed up with corroboration of that figure by Chinese experts on Jan. 14. Reuters conducted its own poll on Jan. 15, predicting growth of 6.8 percent.

Pitfalls of Centralized Control

China is a vast and populous country, and for the most part it always has been. As a result, its history oscillates between periods of centralized control and violent regionalism. One of the hallmarks of a period of centralized control is the ability of a Chinese political regime to collect information from around the country and to implement decisions based on that information. To do that, strong Chinese leaders depend on large, bureaucratic systems, which means the very solution to the problem of China’s vastness sows the seeds of the problem’s inevitable return. China’s rulers become tied to the effectiveness and efficiency of local bureaucratic organs, and when you start depending on bureaucracy to be effective and efficient, you’re fighting a losing battle.

Even strong Chinese regimes face this problem. Mao Zedong, perhaps the strongest Chinese leader of the last millennium, could not extricate himself from this trap. Mao detested China’s bureaucracy and destroyed it, only to find that China was ungovernable without it. So he rebuilt the bureaucracy and depended on it to implement the Great Leap Forward, his sweeping plan to transform China’s industry and agriculture practically overnight. Terrified of Mao’s retribution, the bureaucracy fed his government the data it wanted to see rather than the data that reflected reality in the countryside. This led to years of famine and the deaths of an estimated 20 million to 45 million people.

(click to enlarge)

Now Xi Jinping faces the same challenge. Last year, Liaoning province officials confessed to inflating GDP growth numbers for the province, while local officials in Heilongjiang and Jilin provinces admitted to falsifying foreign investment data. This year’s GDP figure release has been marred by provincial governments in Inner Mongolia and Tianjin admitting to data manipulation. Inner Mongolia reportedly overstated its 2016 industrial growth by a whopping 40 percent and its government revenue by 26 percent. Meanwhile, in Tianjin province, Binhai New Area – a special economic zone designed to attract foreign investment – revised its 2016 GDP figures down by 33.4 percent, or approximately $102 billion.

Other Data Misreported

The problem of misreporting in China isn’t limited to GDP figures. Last month, Xi made it clear to the Communist Party of China that Beijing was serious about reducing pollution, so much so that environmental progress would affect political advancement. Already there is evidence of local officials manipulating data to try to secure promotions. In Jiangxi and Henan provinces, officials were punished for using large mist cannons near monitoring stations to improve air quality readings. Independent reports by organizations such as Greenpeace bear this out. In some parts of China, the situation is improving. In other parts, however, minimal gains have been made, and air pollution is even worse, despite what local authorities report.

It might be tempting to ascribe these statistical variances to the economic or environmental conditions of these particular provinces. After all, the GDP manipulators are in northeastern China, where heavy manufacturing and coal mining are the dominant sectors of the economy, and both of these sectors have faced downward pressure in recent years. It might also be tempting to point out that China is a unique case, where the incentives leading to purposeful misreporting of data are particularly attractive to the bureaucratic elements that govern everyday life in China. But the issue is not confined to misreporting by a few rogue provinces. As the Brexit example shows, even without manipulation, it is difficult to gather data and correctly understand its implications.

Consider a report published on Jan. 5 by the International Monetary Fund. The report called into question China’s overall GDP figures, not because of shoddy reporting, but because of excess domestic credit. In this case, the IMF is not concerned with misreporting; it is concerned with understanding whether GDP growth rates are a reliable indicator of China’s economic trajectory in the first place. The report notes that in 2007-08, $1 trillion of new credit resulted in an additional $800 billion to China’s nominal GDP. In 2015-16, roughly $3 trillion was needed to produce the same nominal GDP growth. That credit is being used to prop up unprofitable businesses, which means there are serious inefficiencies in the Chinese economy that GDP figures don’t account for. The study estimated that China’s sustainable growth rate in 2012-16 is actually closer to 5 percent, not the government’s 7.3 percent figure.

It may seem odd to single out China for struggling to gather accurate data. China is often seen as at the forefront of collecting information about its citizens, unimpeded as it is by privacy laws (not that the existence of such laws has stopped other countries from gathering personal data about their own citizens). Just last month, China made headlines around the world when authorities in Xinjiang province began collecting DNA samples, fingerprints, eye scans and blood types of all people ages 12-65. Perhaps over time China will learn to use modern technology to overcome the perennial lack of reliable information about what’s going on in its own country. For now, however, the situation is the same as it was in the 13th century, when this problem was immortalized in a proverb: “Heaven is high, and the emperor is far away.”

GPF Team
Geopolitical Futures is a company that charts the course of the international system. It’s an ambitious mission, maybe even foolhardy, but hear us out.