Lost in Big Data? The Misguided Idea Ruling the Data Universe.


“. . . In that Empire, the Art of Cartography attained such Perfection that the map of a single Province occupied the entirety of a City, and the map of the Empire, the entirety of a Province. In time, those Unconscionable Maps no longer satisfied, and the Cartographers Guilds struck a Map of the Empire whose size was that of the Empire, and which coincided point for point with it.[…]”

“On Exactitude in Science”
Jorge Luis Borges

Borges’ story imagines an Empire addicted to the idea of creating a perfect representation of its world. The fictional Empire has immersed itself completely in the task of creating a map that coincides with its land point for point. Today, I cannot help but think that we find ourselves in a very similar environment: data is profoundly changing our world and how we perceive it. We are in the midst of a data revolution so vast, pervasive and young that it is hard to take it all in. The reach of data is growing on a truly massive scale; we are striving to use big data to transform whole industries, from marketing and sales to weather forecasting, from medical diagnosis to food packaging, and from document storage and the use of software to communication. Indeed, very much like Borges’ fictional Empire, we have come to believe that the more data we collect and analyze, the more knowledge we gain of the world and the people living in it. What foolish data maniacs we have become.

The conviction now prevails that big data delivers actionable insights into nearly every aspect of life. Philip Evans and Patrick Forth contend that “information is comprehended and applied through fundamentally new methods of artificial intelligence that seek insights through algorithms using massive, noisy data sets. Since larger data sets yield better insights, big is beautiful” (from their joint article in bcg.perspectives). Along these lines, our hunger for data keeps growing and our digital ecosystem keeps feeding it: sensors, connected devices, social media and a growing number of cloud services continually produce new data for us to collect and analyze. According to a study by the International Data Corporation (IDC), the digital universe will roughly double every two years. From 2005 to 2020, the volume of data will grow by a factor of 300, to 40 zettabytes. One zettabyte is 10^21 bytes: a 1 followed by 21 zeros. In this world of exponential data growth, the ambition to accumulate data goes unchecked. As in Borges’ fictional Empire, the outer limit is the scale of 1:1, a complete digital representation of our world.

Today, companies like IBM and LinkedIn are already pushing towards that limit. IBM is training its cognitive computing system, Watson, to be able to answer virtually any question. To do so, IBM is collecting unprecedented amounts of data to form an impressive corpus of information. The company just acquired Truven Health Analytics for $2.6 billion in cash, bringing to its health unit a major repository of health data from thousands of hospitals, employers and state governments across the US. It was the fourth major acquisition of a health data company in IBM Watson Health’s 10-month life span, showing just how important a digital representation of patients, diagnoses, treatments and hospitals is to the computer giant’s artificial intelligence system. LinkedIn’s vision is equally ambitious: the company is creating an Economic Graph, which is nothing less than a digital mapping of the global economy. It aims to include a profile for every one of the 3 billion members of the global workforce. It intends to digitally represent every company, its products and services, the economic opportunities it offers and the skills required to obtain those opportunities. And it plans to include a digital presence for every higher education organization in the world. Yet the endeavors of the two companies are but the tip of the iceberg. Their pursuit of a complete digital representation of their respective fields is emblematic of a more general aspiration today towards a state of ubiquitous information.

The visions of companies like IBM and LinkedIn are thus already evoking Borges’ imagined world. The forces of big data are converging and recreating the cartographic ambitions of the Empire of his story. The world is becoming self-referential. The digital representation of our world is expanding fast, and at its outer limits, representation and reality are starting to coincide. Suddenly, we find ourselves in a world bearing a startling resemblance to Borges’ Empire.

How foolish. Borges’ story continues, and it calls into question the very purpose of such an immense representation. Whether cartographic or digital, a map at a scale of 1:1 might not be as valuable as we think.

“[…] The following Generations, who were not so fond of the Study of Cartography as their Forebears had been, saw that that vast map was Useless, and not without some Pitilessness was it, that they delivered it up to the Inclemencies of Sun and Winters. In the Deserts of the West, still today, there are Tattered Ruins of that Map, inhabited by Animals and Beggars; in all the Land there is no other Relic of the Disciplines of Geography.”

In Borges’ fictional world, the following generations discarded their forefathers’ map. They were not gripped by the same ambition as their ancestors, and they recognized that a map at a scale of 1:1 was useless. They left it to decompose, and all that remained were the “tattered ruins” of the forebears’ map. The realization that a map at a scale of 1:1 is practically pointless also echoes our experience with the expanding data universe. Professor Patrick Wolfe, Executive Director of University College London’s Big Data Institute, warns that “the rate at which we are generating data is rapidly outpacing our ability to analyze it.” Only about 0.5% of all data is currently analyzed, and Wolfe says that percentage is shrinking as more data is collected. So we, too, are beginning to realize the impracticality of the masses of data that we are amassing. Rather than gaining exponentially more knowledge about our world through data, we are creating an entity that is in danger of slipping into oblivion through its sheer size.

To prevent our perpetually accumulating digital collection from suffering the same fate as Borges’ map, left in tattered ruins by later generations, it is essential to draw actionable intelligence from it. Hence, the capacity to truly understand the full complexity of the collected data and to produce relevant knowledge from it will be the ultimate competitive advantage, today and even more so in the future.

While many already advocate turning big data into smart or intelligent data, no ready-made solution has yet emerged for how to actually achieve this transformation. Today, applied mathematics, natural language processing and machine learning dominate the field and are displacing every other tool that might be brought to bear. Behind this lies the idea that, with enough data, the numbers speak for themselves. To reiterate what Evans and Forth said, “big is beautiful”. This idea informs the culture of Silicon Valley and, by extension, that of many ventures around the world.

Other approaches, such as ontologies, taxonomies and semantics, are largely disregarded in the current spirit of discovery. Where applied mathematics, machine learning and predictive analytics stand for size, ontologies, taxonomies and semantics stand for meaning and understanding. And while the latter might seem insignificant next to the sheer scale of the former, they will play no lesser part in determining the competitive fitness of companies. After the exponential growth of the digital universe in recent years, we have reached a degree of complexity that requires a deep understanding of the matters at hand, something that will not be achieved by collecting yet more data or by implementing another algorithm. Ironically, then, it is a change of direction away from “big is beautiful” that could truly unlock the full power of big data.