Even more ado about nothing … or why the hype about big data and AI is often more about self-marketing than facts and real progress.

Every two days, we produce as much data as was produced in total from the beginning of civilization until 2003. Eric Schmidt, then CEO of Google, first presented this striking statistic in 2010. Since then, data production has certainly accelerated. Although mass data processing is nothing new, the hype surrounding the more familiar term “big data” only began in recent years [1]. Yet many people quickly get lost in this ever-growing jungle of data and in often quite abstruse data processing methods.

Coincidences cannot be calculated …

… because “more data does not mean more knowledge,” as Gerd Antes succinctly puts it in an interview with the Tagesanzeiger. The mathematician strongly criticizes the hype about big data because the sheer mass of data increases the probability of random correlations. For example, per capita cheese consumption in the USA and the number of deaths caused by becoming entangled in bedsheets follow an almost identical curve. A machine analysis might draw conclusions from this, whereas a scientist immediately recognizes it as coincidence. [2]
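
The effect Antes describes can be reproduced with nothing but noise. The following Python sketch is our own illustration, not taken from the interview: it generates 200 unrelated random-walk series (the count, the length of 10 points and the seed are arbitrary assumptions) and reports the strongest Pearson correlation found among the first 5, 50 and 200 of them.

```python
import math
import random
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def random_walk(length, rng):
    """A trending series built from independent random steps: no real signal."""
    value, series = 0.0, []
    for _ in range(length):
        value += rng.gauss(0, 1)
        series.append(value)
    return series


def strongest_spurious_correlation(all_series, k):
    """Largest |r| over every pair drawn from the first k series."""
    return max(abs(pearson(a, b)) for a, b in combinations(all_series[:k], 2))


rng = random.Random(42)
# 200 unrelated series, 10 points each (think: a decade of annual figures)
series = [random_walk(10, rng) for _ in range(200)]

for k in (5, 50, 200):
    r = strongest_spurious_correlation(series, k)
    print(f"{k:>3} series -> strongest |r| = {r:.3f}")
```

Because each larger set contains the smaller ones, the reported maximum can only rise or stay the same as more series are compared; with trending series this short, it typically gets very close to 1, even though no series has anything to do with any other.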

Nevertheless, according to many big data supporters, coincidences no longer exist. They believe that if the quantities of data available are large enough, all interrelationships can be calculated in advance with machine processing or deep learning and the right type of analysis. In their view, past experience and available training sets are sufficient, and the risk of errors due to missing or irrelevant data is negligible. This conclusion, however, is fatal. Of course, data analysis makes it easier to explore certain areas, time periods and interrelationships in which something is more or less likely to happen. But this certainly does not mean that coincidences or significant deviations are impossible. For example, how can we expect an analysis of past data to precisely predict future traffic accidents? Or diseases, when information on disease progression, and thus digital patient data, can be incomplete, inconsistent and/or inaccurate? [2]

Big, bigger, big data? Don’t exaggerate your achievements.

Data analysis can thus be life-threatening …

Gerd Antes is not alone in warning against the pitfalls of big data and AI, especially in medicine. If an incorrect treatment method is selected on the basis of big data analyses and machine learning, the effects can be devastating: for patients, for wallets and for reputations. With such enormous amounts of data, genuine correlations and inconsistencies may not even be discovered, and these inconsistencies and correlations can threaten or save lives. [2]

IBM recently made negative headlines again when the health news outlet STAT analyzed internal IBM documents and concluded that Watson for Oncology had repeatedly recommended “unsafe and incorrect” cancer treatments. The report also claimed that IBM employees and supervisors were aware of this. Although no deaths have been shown to result from these recommendations, many prestigious hospitals have decided to stop using the multimillion-dollar technology. [3]

In this respect, the first signs of a rethink and a somewhat more rational approach are already visible. The two to three years of seemingly boundless hype about IBM’s wonder computer Watson in medicine are finally coming to an end. The same will happen in many similar fields, at the latest when people realize the importance of facts, reliable results and relevance, as opposed to the self-marketing and grandiose promises of well-known global tech groups with their often still very experimental products. It is certain that these developments in medicine can be transferred almost 1:1 to the digital HR market, for example to the matching of jobs and skills.

Trustworthy knowledge comes from experts

Over five years ago, Cornel Brücher published his provocative book “Rethink Big Data,” in which he described big data supporters as fools [1]. We at JANZZ have held a similar view from the beginning. It is simply not possible to acquire knowledge about jobs and CVs, including more complex occupation data, by means of machine learning alone. Anyone who says otherwise is demonstrably wrong, and will remain wrong no matter how often the same ideas and products are advertised and marketed, and even if far more money is invested in such technologies than before.

For this reason, and despite considerable investment, results based on this “big data approach” are still largely inadequate and have barely improved in recent years, regardless of the size of the data sets used, e.g. at LinkedIn, IBM & Co. Machine learning results become increasingly error-prone as more factors and variables, and thus more complicated rules and relations, are added. This raises the risk of erroneous correlations or even wrongly assumed causality.

Knowledge graphs or ontologies, on the other hand, make it possible to map and use knowledge in a very deep and structured manner. The knowledge in a knowledge graph is highly verifiable and trustworthy because the know-how of various experts is stored and connected in a structured way, rather than being calculated by computer scientists who are experts in programming but not, for example, in medicine, engineering or investment banking. Since knowledge graphs reflect the relationships between many different areas, only they can provide relevant and precise search results and recommendations. Take occupation data: a knowledge graph recognizes the differences and connections between competencies, experience, functions, specializations and education. It takes into account, for example, that for job title “J” with apprenticeship “A,” skill “S” is very important. Or consider a Senior Cloud Architect: a knowledge graph recognizes this job title and knows that a master’s degree in computer science, for example, could one day lead to the applicant securing this job if he/she also has the skill “cloud solution development” and several years of professional experience.
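
To make the Senior Cloud Architect example more concrete, here is a minimal Python sketch of how such expert knowledge might be stored as typed relations. It is a hypothetical illustration, not JANZZ’s data model: the relation names `has_label` and `requires`, the five-year threshold and the `Concept` and `KnowledgeGraph` classes are all assumptions, and treating the degree as a hard requirement simplifies the “could one day lead to” relationship described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    """A node in the graph; `kind` separates occupations, skills, education, ..."""
    label: str
    kind: str


class KnowledgeGraph:
    def __init__(self):
        self.edges = set()   # (subject, relation, object) triples
        self.min_years = {}  # occupation -> required years of experience

    def add(self, subject, relation, obj):
        self.edges.add((subject, relation, obj))

    def related(self, subject, relation):
        return {o for s, r, o in self.edges if s == subject and r == relation}

    def resolve(self, title):
        """Map a raw job-title string to an occupation concept via its labels."""
        key = title.strip().lower()
        for s, r, o in self.edges:
            if r == "has_label" and o.lower() == key:
                return s
        return None

    def gaps(self, occupation, profile, years):
        """Requirements of `occupation` that a candidate profile does not yet meet."""
        missing = [c.label for c in self.related(occupation, "requires") if c not in profile]
        needed = self.min_years.get(occupation, 0)
        if years < needed:
            missing.append(f"{needed}+ years of experience")
        return sorted(missing)


# Hypothetical expert-curated entries for the Senior Cloud Architect example
architect = Concept("Senior Cloud Architect", "occupation")
cloud_dev = Concept("cloud solution development", "skill")
msc_cs = Concept("Master's degree in computer science", "education")

kg = KnowledgeGraph()
kg.add(architect, "has_label", "Senior Cloud Architect")
kg.add(architect, "has_label", "Sr. Cloud Architect")
kg.add(architect, "requires", cloud_dev)
kg.add(architect, "requires", msc_cs)
kg.min_years[architect] = 5  # assumed threshold standing in for "several years"

occupation = kg.resolve("Sr. Cloud Architect")
print(kg.gaps(occupation, {msc_cs}, years=2))
# ['5+ years of experience', 'cloud solution development']
print(kg.gaps(occupation, {msc_cs, cloud_dev}, years=6))
# []
```

The point of the structure is that skills, education and experience are distinct, explicitly curated relations rather than co-occurrence statistics, so every recommendation can be traced back to an entry someone with domain expertise can check.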

Google also relies on experts and a knowledge graph for occupation data

Google made this clear when it launched the “Google Cloud Jobs API,” the knowledge-graph-powered service behind its Google for Jobs search (see “Google Launches its Ontology-powered Jobs Search Engine. What Now?”). Google realized that an ontology-based approach would deliver better search results. In a semantic search based on a knowledge graph, a search for “Admin Assistant” would not return results that merely resemble the search term, such as “HR Admin” or “Software Admin.” A big data analysis, by contrast, might detect random correlations and suggest completely different jobs that merely have similar skill requirements (engineers and office assistants, for example, both need knowledge of Microsoft Office).
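
The difference between word similarity and concept identity can be sketched in a few lines of Python. This is not how Google’s API works internally; the postings, the concept IDs such as `occ:administrative_assistant` and the label lookup are hypothetical and stand in for what an occupation knowledge graph would supply.

```python
# Hypothetical postings, each tagged with an occupation concept ID from a knowledge graph
POSTINGS = [
    ("Admin Assistant", "occ:administrative_assistant"),
    ("Administrative Assistant", "occ:administrative_assistant"),
    ("HR Admin", "occ:hr_administrator"),
    ("Software Admin", "occ:software_administrator"),
]

# Hypothetical label-to-concept lookup that an occupation knowledge graph would supply
LABELS = {
    "admin assistant": "occ:administrative_assistant",
    "administrative assistant": "occ:administrative_assistant",
    "hr admin": "occ:hr_administrator",
    "software admin": "occ:software_administrator",
}


def keyword_search(query):
    """Return postings that share at least one word with the query."""
    terms = set(query.lower().split())
    return [title for title, _ in POSTINGS if terms & set(title.lower().split())]


def concept_search(query):
    """Return postings tagged with the same occupation concept as the query."""
    concept = LABELS.get(query.strip().lower())
    return [title for title, tag in POSTINGS if tag == concept]


print(keyword_search("Admin Assistant"))
# ['Admin Assistant', 'Administrative Assistant', 'HR Admin', 'Software Admin']
print(concept_search("Admin Assistant"))
# ['Admin Assistant', 'Administrative Assistant']
```

The keyword search returns the HR and software roles simply because they share the word “Admin”; the concept search returns only postings mapped to the same occupation, including the differently worded “Administrative Assistant.” In practice, the hard part is building and maintaining the label-to-concept mapping itself.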

Recognizing these differences, and thus truly understanding job search and how professions relate to one another, is therefore generally only possible with a knowledge graph. Matt Moore, product manager at Google Cloud, gave this reason for introducing the Google Cloud Jobs API: “We want to provide a better job search experience for all employers and candidates. Because, let’s face it: Hiring the right people is one of the most important things your company needs to do.” [4]

Only people have the knowledge necessary to comprehend human nature …

This raises the question of whom you can really trust with this most important task: the selection of employees. It’s a never-ending story: according to the CV, the applicant was the perfect candidate, but unfortunately he/she did not fit in personally. Drawing such conclusions, which the available (digital) data does not suggest, is the job of HR specialists, that is, of humans. Technological tools can manage CVs and rank them according to obvious criteria such as education, skills and experience, provided the flood of data is manageable and, above all, evaluated correctly. Otherwise, even the best candidate on paper can suddenly disappear into the crowd because of the many misinterpreted or misunderstood criteria. And the best CV does not always belong to the best candidate.

Convinced that even this last remaining human factor will eventually be eliminated from selection processes, more and more tech companies and start-ups are trying to digitalize this dimension and control it with artificial intelligence. Once again, this is mostly done with unsuitable methods, and before the existing, process-ready digital data has even been correctly used and evaluated. The specialists and leading technology providers who have been developing serious and resilient processes and products in digital HR for several years largely agree on this, and not only since Google entered this market segment. [5]

Big data limits knowledge development

So, more data really does not mean more knowledge. Knowledge must be structured, stored and validated, and people with the right expertise have to be involved. Caution is therefore called for when dealing with a flood of data that can no longer be structured and that produces random correlations. Alexander Wissner-Gross, a scientist at Harvard University and the Massachusetts Institute of Technology (MIT), put it aptly: “Perhaps the most important news of our day is that datasets – not algorithms – might be the key limiting factor to development of human-level artificial intelligence.” [6]

So it is above all the content of knowledge that holds promise, not the amount of data from which this knowledge is to be extracted. In the end, it is reassuring that in many important areas, such as medicine or recruitment, only experts or tools based on real expertise can make reliable and correct judgments. All this makes the hype about big data and AI in HR a little more bearable. And our mission at JANZZ.technology, “We turn big data into smart data,” is more relevant than ever.

[1] Brücher, Cornel. 2013. Rethink Big Data. Frechen: MITP-Verlag.

[2] Straumann, Felix. 2018. «Vieles ist blankes Marketing» [Much of it is pure marketing]. Big Data. In: Tagesanzeiger, No. 168, p. 32.

[3] Spitzer, Julie. 2018. IBM’s Watson recommended “unsafe and incorrect” cancer treatments, STAT report finds. URL: https://www.beckershospitalreview.com/artificial-intelligence/ibm-s-watson-recommended-unsafe-and-incorrect-cancer-treatments-stat-report-finds.html [2018.08.01].

[4] Google Cloud Platform. 2017. Google Cloud Jobs API: How to power your search for the best talent (Google Cloud Next ’17) [video]. URL: https://www.youtube.com/watch?v=Fr_8oNKtB98 [2018.08.03].

[5] Watson, Christine. 2018. RecTech is creating more – not less – need for the human touch. URL: http://www.daxtra.com/2018/08/03/rectech-creating-more-need-for-human-touch/?utm_content=75449136&utm_medium=social&utm_source=twitter [2018.08.09].

[6] Wissner-Gross, Alexander. 2016. Datasets Over Algorithms. URL: https://www.edge.org/response-detail/26587 [2018.07.27].