Big Data From Whistleblowers Keeps Getting Bigger
We’re living in a golden age of surveillance, but we’re also living in a golden age of disclosure. Yes, governments and corporations have the power – exercised or not – to peer ever deeper into our personal lives. But ordinary citizens are also, through these massive dumps of private or classified data, able to peer right back: into the inner workings of the state, the shady dealings of international business elites, and even the mechanics of diplomacy and war. Of course, insight into what the powerful are doing is not the same as creating accountability for them – but knowing what’s going on is an important step.
The following list ranks the biggest leaks in the history of data journalism by size, not by impact, newsworthiness or importance, and it is not intended to be a comprehensive list of important leaks. It is intended to show how major data and document leaks are increasing in size over time: the raw amount of information collected and released, measured as the number of documents or the approximate size in megabytes, gigabytes or terabytes. This is interesting not only because the trendline is clearly going up, but because bigger data also touches more people’s lives, is spread more widely across the planet, and requires more effort from journalists and the public to make sense of.
The Panama Papers
Year: 2016
Size of leak: 2.6 Terabytes / 11.5 million documents
The Panama Papers – which blew the whistle on global offshore banking and shell companies, yielded not one but several major scandals, and brought down a European prime minister – are without a doubt the biggest data leak in history up to now in sheer size, and are what prompted me to make this list. Edward Snowden himself called it “the biggest leak in the history of data journalism,” and the staggering size of the trove prompted pieces like the one by Slate that helped people visualize the scale by comparing it to the contents of 200 Blu-ray discs.
It all began when an anonymous contact reached out to Bastian Obermayer at German newspaper Süddeutsche Zeitung. The leaker, an insider at Panamanian law firm Mossack Fonseca, then slowly spilled a gigantic trove of documents revealing the inner workings of nothing less than a shadow banking system, sprawling across the planet, designed for the global rich to avoid taxation and hide ownership.
The magnitude of the data led Obermayer to reach out for help, and eventually the resulting journalistic collaboration by the International Consortium of Investigative Journalists (ICIJ) made its own kind of history, as perhaps the biggest collaborative journalism effort to date.
In 2019, it was reported that more than $1.2 billion in funds that would otherwise have been lost to tax evasion through offshore banking had been recovered as a result of the Panama Papers disclosures.
Offshore Leaks
Year: 2013
Size of leak: 260 GB / 2.5 million files
Before the Panama Papers there was Offshore Leaks, which in many ways prefigured it: an enormous dump of files revealing offshore banking and tax evasion, obtained by the ICIJ and reported on through a coordinated group effort to make sense of a new era of information disclosure. Where the Panama Papers threw the door wide to the world of anonymously owned shell companies, the Offshore Leaks investigation cracked it open.
This leak revealed offshore accounts held by all sorts of characters the world over who wanted to hide their money: Chinese government elites, Danish bankers, a French aristocrat, an African televangelist, a Swiss law firm. Politicians in Malaysia, India, Pakistan, Paraguay and elsewhere were exposed for their offshore bank accounts. So were celebrities, including Paul Hogan, the Australian actor known as “Crocodile Dundee,” who was taken for upwards of $34 million by a former tax adviser who stashed the funds.
Offshore Leaks sparked dozens of investigations, and fallout is ongoing. What’s more, it resulted in a searchable online database making some of the findings public. The ICIJ database that started with Offshore Leaks has grown and now encompasses the data from other investigations, including the just-mentioned Panama Papers. But the initial trove was substantial: ICIJ claimed in 2013 it was “more than 160 times larger than the leak of U.S. State Department documents by Wikileaks in 2010.”
The NSA Files
Year: 2013
Size of leak: 60 GB in full, 636 MB fully public
Also known as the Snowden Archive, the National Security Agency (NSA) documents spilled by contractor Edward Snowden are not the biggest trove but easily among the most significant. The mind-boggling volume of data in question here is the amount of data the NSA is collecting – the data we are all leaking, which Snowden’s leak revealed. The Guardian’s NSA Files: Decoded still provides the best explanation of the mass surveillance of metadata that the documents exposed.
The story of the Snowden leaks is, at this point, part of popular mythology: his meeting in Hong Kong with The Guardian journalist Glenn Greenwald, and his subsequent stranding in Russia, where he remains both in exile and a subject of controversy – as does Greenwald, who resides in Brazil. A big part of this is that the history-making disclosures by Snowden, along with the media and government reactions, were captured in real time by documentary filmmaker Laura Poitras and released as the feature “Citizenfour”; the same events were later made into the Oliver Stone movie, “Snowden.”
Cablegate
Year: 2010
Size of leak: 1.7 GB / 250,000+ documents
The United States Diplomatic Cables leak was the biggest document dump ever at the time it was revealed by Wikileaks in November 2010. Consisting of more than 250,000 confidential documents, the communications stretched back to the 1960s and lifted the veil on American diplomacy and international political intrigue. Not all the documents were dumped straight into public view: while the entire leak was shared with journalism organizations, about 3,000 were published openly.
In the wake of Cablegate and other Wikileaks disclosures, Julian Assange became both a celebrity and a pariah, and so did Private Bradley Manning (now Chelsea Manning), identified as the source of this leak and others. Mike Huckabee called for their execution, while other U.S. politicians stopped at calling the leaks “terrorism.” Manning was imprisoned and held in solitary confinement; Assange is currently in prison in the U.K. after a long, bizarre exile in the Ecuadorian Embassy in London.
Revelations from Cablegate included news of the U.S. secret bombing campaign in Yemen; Chinese government-sponsored hacking attacks; and ties between Vladimir Putin and then-Prime Minister of Italy, Silvio Berlusconi (among countless other details). Reverberations of Cablegate, which showed the mechanics of American diplomacy in embarrassing transparency, are still felt today.
The Iraq War Logs and Afghanistan War Diary
Earlier in 2010, Wikileaks released two caches of documents, known as the Afghanistan War Diary and the Iraq War Logs, to major newspapers; they were the largest classified military leaks in history. First came the Afghanistan documents, in July, with over 92,000 files; then in October came the Iraq documents, with nearly 400,000 files. Both leaks revealed U.S. military operations and casualties, including devastating revelations of civilian deaths that shocked the world – even one jaded by years of American military intervention.
The size of these leaks in megabytes or gigabytes is not readily available, but their scale was unprecedented at the time. The Iraq and Afghanistan war leaks will almost certainly remain the biggest leaks of classified military information for a long time, and they remain a deeply disturbing look into modern warfare and U.S. interventions abroad.
The trend: data haystacks ever bigger
The past decade has been one of unprecedented growth in the size of data leaks. The following chart requires a bit of scrolling but demonstrates this clearly:
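The sizes quoted in this list can also be put on a common scale with a few lines of Python. This is only an illustrative sketch, not the original chart: it uses the four leaks for which a byte size is given above, assumes decimal units (1 TB = 1,000 GB), and the names `LEAKS`, `ratio_to_smallest` and `log_bar` are made up for the example. Because the sizes span three orders of magnitude, the bars are drawn on a logarithmic scale.

```python
import math

# (name, year, size in GB) as quoted in the list above.
# Assumption: decimal units, so 2.6 TB is taken as 2600 GB.
LEAKS = [
    ("Cablegate", 2010, 1.7),
    ("The NSA Files", 2013, 60.0),
    ("Offshore Leaks", 2013, 260.0),
    ("The Panama Papers", 2016, 2600.0),
]


def ratio_to_smallest(leaks):
    """Return each leak's size as a multiple of the smallest leak."""
    smallest = min(size for _, _, size in leaks)
    return {name: size / smallest for name, _, size in leaks}


def log_bar(size_gb, chars_per_decade=10):
    """Draw a text bar whose length grows with log10 of the size in GB."""
    length = round(math.log10(size_gb) * chars_per_decade)
    return "#" * max(1, length)  # always show at least one character


if __name__ == "__main__":
    ratios = ratio_to_smallest(LEAKS)
    for name, year, size in LEAKS:
        print(f"{year}  {name:<18} {size:>7.1f} GB  "
              f"x{ratios[name]:>7.1f}  {log_bar(size)}")
```

With these figures, Offshore Leaks comes out at roughly 153 times the size of Cablegate and the Panama Papers at roughly 1,500 times. The exact multiples depend on which size estimates are used, so they need not match comparisons published elsewhere.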
While this is still only the broadest outline of the significance of these data and document leaks, the challenge of interpreting mass amounts of information is clearly growing in proportion to the radical transparency that sometimes breaks through in an era where data – about ordinary people or about the powerful – proliferates.
Obviously, overall file size is no proxy for importance. Even as these document and data disclosures have offered journalists and the public unprecedented levels of access, their sheer size makes it easy for what’s truly important in them to get lost: the needles in increasingly large haystacks.