Why leading employment services and software providers are betting on ontologies.

Algorithms are out, datasets are in. Perhaps one of the crucial findings in data science today is that datasets, not algorithms, might be the key limiting factor in developing human-level artificial intelligence. This is especially true of solutions for the labor and recruitment market. Many companies in the recruitment market and public employment services are taking notice and investing in the ontology-based solutions of JANZZ.technology.

Therefore, we have taken a brief moment to lay out the underlying reasons why datasets have become so important, and what qualities these datasets must have for businesses to take full advantage of them.

Over the past years, machine learning, deep learning and evidence-based systems have achieved remarkable breakthroughs. These have, in turn, driven performance improvements across AI components. Perhaps the two branches of machine learning that have contributed most are deep learning, particularly in perception tasks, and reinforcement learning, especially in decision making. Interestingly, these advancements have arguably been driven mostly by the exponential growth of high-quality annotated datasets rather than by better algorithms. And the results are staggering: continuously better performance in increasingly complex tasks, often at super-human levels.

Machine learning thrives on patterns. Unfortunately, our world is full of an almost limitless number of outliers. The labor market in particular is intrinsically a very tough market for automated solutions. Neither job titles nor skills nor educational qualifications are in any way standardized across the world, or even within a particular country. A lot of company-, culture- and geography-specific language is involved in the description of jobs and qualifications. Furthermore, implicit phrases like “relevant education” or “relevant experience” are all too common, making job descriptions hard for machines to decipher. Algorithms, even sophisticated ones, have a hard time dealing with this degree of heterogeneity, implicitness and inconsistency. When one adds the factor of language, it gets even more complicated: currently, most algorithms struggle to deal with any language other than English.

Datasets, on the other hand, and annotated datasets of high quality in particular, can reflect the full breadth of labor market vocabulary. They can also capture cultural and geographical particularities. However, not all datasets are annotated and of high quality. Most datasets that companies or public employment services have at their disposal are legacies of the past and therefore often messy, incomplete or inconsistent. Nevertheless, these organizations want to leverage the power of data to improve their applications. To do so, they need a way to enhance their data with standardized, intelligent metadata.

This is where ontologies come into play. An ontology formally represents knowledge as a hierarchy of concepts within a domain. Concepts are linked to each other through different relations. Through these relations and the location of a term in the ontology, the meaning of that term becomes interpretable for a machine. An ontology is itself a dataset, but one of such high quality that it can also help improve the quality of other datasets.
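To make the idea of "concepts linked through relations" more concrete, here is a minimal Python sketch. It is not JANZZ's data model: the class `Ontology`, the relation names `is_a` and `requires_skill`, and the example entries are all assumptions chosen for illustration.

```python
from collections import defaultdict


class Ontology:
    """Toy ontology: concepts connected by typed, directed relations."""

    def __init__(self):
        # (source concept, relation name) -> set of target concepts
        self._edges = defaultdict(set)

    def relate(self, source, relation, target):
        """Record that `source` stands in `relation` to `target`."""
        self._edges[(source, relation)].add(target)

    def related(self, concept, relation):
        """Direct targets of `relation` starting at `concept`."""
        return set(self._edges.get((concept, relation), ()))

    def broader(self, concept):
        """All broader concepts reachable by following 'is_a' links."""
        seen, stack = set(), [concept]
        while stack:
            for parent in self._edges.get((stack.pop(), "is_a"), ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen


# Hypothetical example entries, for illustration only.
onto = Ontology()
onto.relate("road construction engineer", "is_a", "civil engineer")
onto.relate("civil engineer", "is_a", "engineer")
onto.relate("road construction engineer", "requires_skill", "infrastructure planning")

print(sorted(onto.broader("road construction engineer")))
print(onto.related("road construction engineer", "requires_skill"))
```

The point of the sketch is that a term's meaning comes from its position in the hierarchy (`broader`) and from its typed links to other concepts (`related`), which is what makes it interpretable for a program.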

JANZZ.technology focuses solely on the labor market and its ontology is the largest multilingual encyclopedic knowledge database in the area of occupation data, in particular jobs, job classifications, hard and soft skills and qualifications. The occupation and skills ontology can help companies and public employment services in many respects. It can serve as the basis of matching engines, parsing tools, natural language processing or classification tools, improving the results and learning of these tools significantly. More specifically, it can enhance job and candidate matching processes, CV parsing, benchmarking, statistical analyses and much more.

Convinced that high data quality will give them a competitive advantage, many stakeholders in the global labor market are currently investing in the solutions offered by JANZZ.technology, above all in its ontology. Data quality is becoming a focal point of competition in the digital labor and recruitment market.

Lost in Big Data? The Misguided Idea Ruling the Data Universe.

“. . . In that Empire, the Art of Cartography attained such Perfection that the map of a single Province occupied the entirety of a City, and the map of the Empire, the entirety of a Province. In time, those Unconscionable Maps no longer satisfied, and the Cartographers Guilds struck a Map of the Empire whose size was that of the Empire, and which coincided point for point with it.[…]”

“On Exactitude in Science”
Jorge Luis Borges

Borges’ story imagines an Empire addicted to the idea of creating a perfect representation of its world. The fictional Empire has immersed itself completely in the task of creating a map that coincides with its land point for point. Today, I cannot help but think that we find ourselves in a very similar environment: data is profoundly changing our world and how we perceive it. We find ourselves in the midst of a data revolution so vast, pervasive and young that it is hard to take it all in. The impact of data is extending on a truly massive scale; we are striving to use big data to transform whole industries, from marketing and sales to weather forecasts, from medical diagnoses to food packaging, and from the storage of documents and the use of software to communication. Indeed, very much like Borges’ fictional Empire, we have come to believe that the more data we collect and analyze, the more knowledge we gain of the world and the people living in it. What foolish data maniacs we have become.

The conviction now prevails that big data delivers actionable insights into nearly every aspect of life. Philip Evans and Patrick Forth contend that “information is comprehended and applied through fundamentally new methods of artificial intelligence that seek insights through algorithms using massive, noisy data sets. Since larger data sets yield better insights, big is beautiful” (from their joint article in bcg.perspectives). Along these lines, our hunger for data is constantly increasing and our digital ecosystem is fueling it: sensors, connected devices, social media and a growing number of clouds continually produce new data for us to collect and analyze. According to a study by the International Data Corporation (IDC), the digital universe will roughly double every two years. From 2005 to 2020, the volume of data will grow by a factor of 300, to 40 zettabytes. A zettabyte is a 1 followed by 21 zeros, counted in bytes. In this world of exponential data growth, the ambition to accumulate data goes unchecked. As in Borges’ fictional Empire, the outer limit is the scale of 1:1, a complete digital representation of our world.
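As a quick sanity check on these growth figures, the arithmetic can be sketched in a few lines of Python. The function names are our own, and the calculation assumes smooth exponential growth over the whole period, which the IDC figures do not necessarily imply.

```python
import math


def growth_factor(years, doubling_time):
    """Growth factor after `years` if the volume doubles every `doubling_time` years."""
    return 2 ** (years / doubling_time)


def implied_doubling_time(factor, years):
    """Doubling time (in years) that yields `factor` growth over `years`."""
    return years * math.log(2) / math.log(factor)


years = 2020 - 2005

# Strict doubling every two years over the 15-year span.
strict_factor = growth_factor(years, 2)

# Doubling time that a 300-fold increase over the same span would imply.
doubling_for_300 = implied_doubling_time(300, years)

# Volume at the start of the period implied by "factor 300, to 40 zettabytes".
baseline_zettabytes = 40 / 300

print(f"strict two-year doubling: x{strict_factor:.0f}")
print(f"doubling time for x300:   {doubling_for_300:.2f} years")
print(f"implied 2005 volume:      {baseline_zettabytes:.3f} ZB")
```

Under these assumptions, strict two-year doubling gives a factor of about 181 rather than 300, so a 300-fold increase corresponds to doubling somewhat faster than every two years. That is in line with the "roughly" in the statement above.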

Today, companies like IBM and LinkedIn are already pushing towards that limit. IBM is training its cognitive computing system, Watson, to be able to answer virtually any question. To do so, IBM Watson is collecting unprecedented amounts of data to form an impressive corpus of information. The company just acquired Truven Health Analytics for $2.6 billion in cash, bringing to its health unit a major repository of health data from thousands of hospitals, employers and state governments across the US. It was the fourth major acquisition of a health data company in the health unit’s 10-month life span, showing just how important a digital representation of patients, diagnoses, treatments and hospitals is to the computer giant’s artificial intelligence system. LinkedIn’s vision is equally ambitious: the company is creating an Economic Graph, which is nothing less than a digital mapping of the global economy. It aims to include a profile for every one of the 3 billion members of the global workforce. It intends to digitally represent every company, their products and services, the economic opportunities they offer and the skills required to obtain those opportunities. And it plans to include a digital presence for every higher education organization in the world. Yet the endeavors of the two companies are but the tip of the iceberg. Their pursuit of a complete digital representation of their respective fields is emblematic of a more general aspiration today towards a state of ubiquitous information.

The visions of companies like IBM and LinkedIn are thus already evoking Borges’ imagined world. The forces of big data are converging and recreating the cartographic ambitions of the Empire of his story. The world is becoming self-referential. The digital representation of our world is expanding fast, and at the outer limits, representation and reality are starting to coincide. The world and our picture of it are converging. Suddenly, we find ourselves in a world bearing a startling resemblance to Borges’ Empire.

How foolish: Borges’ story continues by calling into question the very purpose of such an immense representation. Whether cartographic or digital, a map on a scale of 1:1 might not be as valuable as thought.

“[…] The following Generations, who were not so fond of the Study of Cartography as their Forebears had been, saw that that vast map was Useless, and not without some Pitilessness was it, that they delivered it up to the Inclemencies of Sun and Winters. In the Deserts of the West, still today, there are Tattered Ruins of that Map, inhabited by Animals and Beggars; in all the Land there is no other Relic of the Disciplines of Geography.”

In Borges’ fictional world, the next generations disposed of their forefathers’ map: they had not been gripped by the same ambition as their ancestors and recognized that a map on a scale of 1:1 was useless. They left it to decompose, and all that remained were the “tattered ruins” of the forebears’ map. The realization that a map on a scale of 1:1 is practically pointless also echoes our experience with the expanding data universe. Professor Patrick Wolfe, Executive Director of University College London’s Big Data Institute, warns that “the rate at which we are generating data is rapidly outpacing our ability to analyze it.” Only about 0.5% of all data is currently analyzed, and Wolfe says that percentage is shrinking as more data is collected. So we, too, are beginning to realize the impracticality of the masses of data that we are wielding. Rather than gaining exponentially more knowledge about our world through data, we are creating an entity that is in danger of slipping into oblivion through its sheer size.

To prevent our perpetually accumulating digital collection from suffering the same fate as Borges’ map, left in tattered ruins by subsequent generations, it is essential to draw actionable intelligence from it. Hence, the capacity to really understand the full complexity of the masses of collected data and to produce relevant knowledge from them will be the ultimate competitive advantage, today and even more so in the future.

While turning big data into smart or intelligent data is already advocated by many, no ready-made solution has yet emerged for how to actually achieve this transformation. Today, applied mathematics, natural language processing and machine learning dominate the debate and are displacing every other tool that might be brought to bear. Behind this lies the idea that with enough data, the numbers speak for themselves. To reiterate what Evans and Forth said, “big is beautiful”. This idea informs the culture of Silicon Valley and, by extension, that of many ventures around the world.

Other methodologies such as ontologies, taxonomies and semantics are completely disregarded in the current spirit of discovery. Where applied mathematics, machine learning and predictive analytics stand for size, ontologies, taxonomies and semantics stand for meaning and understanding. And while the latter might seem insignificant compared to the dimensions of the former, they will play no lesser part in determining the competitive fitness of companies. After the exponential growth of the digital universe over the last years, we have reached a degree of complexity that requires a deep understanding of the matters at hand, something that will not be achieved by collecting yet more data or by implementing another algorithm. Ironically, then, it is a change of direction away from “big is beautiful” that could really leverage the full power of big data.

Effective Data Curation for Occupation Related Data: How We Are Dealing with NAICS and ISIC.

The North American Industry Classification System (NAICS) and the International Standard Industrial Classification (ISIC) are two landmarks on our way to mastering occupation data. The way we curate the data from these two classifications exemplifies our approach of putting a deep understanding of jobs, skills and industries at the center of our recruitment and employment solutions. Hence, we felt it was about time to give you a little more insight into how we deal with occupation-related data, showing you the inherent complexity of the labor market and the difficulty of preparing occupation-related data so that it can go on to drive some of today’s most powerful applications, for example public employment services, applicant tracking systems, statistical tools or job boards: solutions that help alleviate some of today’s hardest problems on the global labor market.

NAICS and ISIC

The two industrial classifications are fairly complex structures in themselves. They also take different approaches to the classification of industries. Take street construction, for example: NAICS lists a total of 38 different activities under “Highway, street and bridge construction”, among them airport runway construction, highway line painting, pothole filling and guardrail construction. ISIC, on the other hand, is less detailed; it sums up the same industry in only three bullet points: asphalt paving of roads, road painting, and installation of crash barriers and traffic signs. While ISIC contains less detailed information about activities, the underlying structure of the two classifications is the same. ISIC has provided guidance to countries in developing national activity classifications; hence, most national taxonomies have taken over its general structure and filled it with country-specific activities.

How JANZZ.technology enriches data from standard classifications

Now, what do we do with the thousands of activities and industries in these classifications? We connect each term in the classifications with terms that are already in our ontology JANZZon!: not only related industries, for example other types of civil engineering in the case of “street construction”, but also occupations, skills, specializations and educations that belong to a particular industry. SSIC, the Singapore Standard Industrial Classification, likewise adopts the basic framework and principles of ISIC. Including each of these industrial classifications in our ontology means having a greater level of detail and comprehensiveness at our fingertips than any of the taxonomies could provide on its own.

[Figure: NAICS and ISIC street construction]

It is not only industries and activities that are curated in this way, but also skills, educations, job titles and so on. All these “data trees” are in turn interconnected. “Street construction”, for example, is related to the “road construction engineer”, the “roller driver”, “infrastructure planning” and “road surface marking”.

Sometimes, skills, industries and specializations can share the same denomination: for instance, “street construction” could also be a skill or specialization of a construction worker. In these cases, NAICS, ISIC and SSIC intersect with taxonomies of skills and competencies such as ESCO. Our ontology curation team adds these intersections, thereby creating yet more cross-relations and making the ontology even smarter.

On the one hand, the ontology enriches the data from the standard classifications by establishing meaningful connections between occupations, skills, industries and so on, and it does so in multiple languages. On the other hand, another layer of detail is added to the taxonomies by including real-life data as well, for instance data from job boards. Taxonomies like NAICS and ISIC have become important tools for comparing statistical data on economic activities, but the denominations they use are not necessarily the ones used in CVs or job postings. By adding a wealth of synonyms, we make the data harvested from the taxonomies fit to be used not only for statistical purposes but also for job matching.

Finally, the effective curation of occupation-related data is ensured not only by the breadth and detail of the data entered into our ontology JANZZon! but also by the industry-specific expertise of our team. Establishing meaningful relations between occupations, skills and education requires human experts in order to guarantee the high quality of the knowledge base. At a time when machine learning, smart algorithms and predictive analytics are often held up as universal solutions to everything, we put a deep understanding of occupations, skills and industries back at the center of solving some of today’s hardest labor market issues.
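The enrichment described in this section can be sketched roughly as follows. This is a simplified illustration and not the actual structure of JANZZon!: the `Concept` class, the helper functions and the synonyms "road building" and "roadworks" are assumptions, while the listed activities are the ones quoted from NAICS and ISIC above.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    name: str
    # classification name -> activities listed there (taken from the text above)
    activities: dict[str, list[str]] = field(default_factory=dict)
    # wording as it might appear in CVs or job postings (hypothetical examples)
    synonyms: list[str] = field(default_factory=list)


def normalize(term):
    """Lower-case a term and collapse whitespace so spelling variants compare equal."""
    return " ".join(term.casefold().split())


def build_index(concepts):
    """Map every known wording (name, activity, synonym) to its concept name."""
    index = {}
    for concept in concepts:
        terms = [concept.name, *concept.synonyms]
        for listed in concept.activities.values():
            terms.extend(listed)
        for term in terms:
            index[normalize(term)] = concept.name
    return index


def lookup(term, index):
    """Return the concept a free-text term belongs to, or None if unknown."""
    return index.get(normalize(term))


street_construction = Concept(
    name="street construction",
    activities={
        "NAICS": ["airport runway construction", "highway line painting",
                  "pothole filling", "guardrail construction"],
        "ISIC": ["asphalt paving of roads", "road painting",
                 "installation of crash barriers and traffic signs"],
    },
    synonyms=["road building", "roadworks"],  # assumed real-life wording
)

index = build_index([street_construction])
print(lookup("  Pothole   FILLING ", index))
print(lookup("roadworks", index))
print(lookup("dentistry", index))
```

In this sketch, a term from either classification and a colloquial synonym from a job posting resolve to the same concept, which is the property that makes classification data usable for matching as well as for statistics.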

JANZZ Mindsetter – Interview with Dr. Chia-Jung Tsay

JANZZ Mindsetter is about critical mindsets. It provides space for critical voices to offer insights into HR, recruiting, digital transformation, labor market issues such as gender and minority discrimination and many more topical issues.

Dr. Chia-Jung Tsay on biases against strivers

Dr. Chia-Jung Tsay (UCL School of Management) studies the psychological influences on decision making and interpersonal perception, and how expertise and biases affect professional selection and advancement. Dr. Tsay’s work has been published in leading academic journals and featured in media outlets including the BBC, Economist, Harvard Business Review, Nature, and NPR, and in television programs, radio stations, and newspapers across 48 countries. For us, she answered three questions regarding her latest work titled “Naturals and strivers: Preferences and beliefs about sources of achievement”.

How do you position your argument against the idea that hard work and perseverance are key to achieving success?

There’s a lot of great research out there that suggests that differences in achievement likely reflect deliberate effort and persistence, rather than only innate talent. So it’s interesting that we may have little awareness that we actually have a preference for the natural, and we even sacrifice objective qualifications to hire the natural – and yet it may well be the consistent and persevering individual who achieves more in the long run.

Why are we willing to give up better-qualified candidates in order to hire those believed to be naturals?

Delving into how/why the naturalness bias develops is of great interest for future research. One possibility is that we have a preference for potential over even demonstrated achievement. It is also possible that natural talent is attributed more to stable internal characteristics, and is thus perceived as an immutable, more authentic, and more certain path to success.

Your research suggests that our bias for natural talent is unconscious. How do you think this bias could be circumvented then, e.g. in recruiting?

Further work would be necessary to reveal more specific levers through which we may attenuate the effects of the naturalness bias. If the way in which this bias functions overlaps with those of more established biases, we may consider several possible solutions at the point of performance evaluation. These solutions might include ensuring more precise and tangible metrics of assessment, confronting evaluators with highly achieving exemplars of both naturalness and striving, allowing evaluators to have the time and cognitive resources to fully consider the metrics that are important and valued for actual performance, or simply filtering out any candidate application materials that reference sources of achievement.

JANZZ Mindsetter – Interview with Dr. Wen Hua

Dr. Wen Hua on gender issues in the Chinese job market

Dr. Wen Hua has rich experience in research and international development in the field of gender. She obtained an M.Phil. degree in Social Anthropology from the University of Bergen, Norway, in 2005 and received a Ph.D. in Anthropology from the Chinese University of Hong Kong in 2010. She was a visiting fellow of the Gender Research Programme at Utrecht University, the Netherlands, in 2007. She has published several papers on gender issues in English and Chinese journals. She is the author of Buying Beauty: Cosmetic Surgery in China, published by Hong Kong University Press in 2013.

Why do more and more Chinese women undergo cosmetic surgeries despite a plethora of reports on the possible side effects?

Since the reforms in the early 1980s, China has been one of the fastest growing economies in the world. The uncertainty and instability created by the drastic and dramatic economic, socio-cultural and political changes in China have produced immense anxiety that is experienced by women both mentally and corporeally. The economic reform has resulted in fierce competition in the job market and put much pressure on young women to gain an edge and stand out. Meanwhile, despite dramatic social changes, some traditional gender norms that prize women’s beauty over ability remain remarkably unchanged, which leads people to value women’s physical appearance in the workplace. The rapid social transitions lead people to grasp every opportunity presented, and cosmetic surgery is therefore viewed by some women as an investment to gain “beauty capital” for one’s future life in a rapidly changing and fiercely competitive society.

How does beauty matter in job recruitment in China?

In my book, I argue that some women hold the view that “being good-looking is capital,” that is, they see an attractive appearance as a set of tangible and portable personal assets that are convertible into financial or social capital and can give them an edge in the fierce job market, where occupational segregation of female labor in the service industry and employment discrimination based on gender, appearance, height and age are widespread. In the past decade, it was not unusual to see job advertisements that specified, besides educational background and work experience, gender, age, marital status, and even height and appearance, such as “above-average looking,” “good-looking,” or “height over 1.65 meters.” Female job applicants, especially young graduates who already have fewer opportunities than their male counterparts, have to face more prejudice and discrimination based on appearance during their job hunt. With these fewer opportunities, when age and appearance matter, it is not surprising that some Chinese women regard beauty as capital in the brutal competition for jobs.

What could be done in order to reduce the pressure on graduates to undergo cosmetic surgery?

Over the years, I have seen that job advertisements openly requiring a specific gender, age, marital status, height and physical appearance have become less and less common. But I think that discrimination in employment still exists in China’s workplace. The discrimination has changed from overt to hidden, and the situation might be even worse because hidden prejudice and discrimination against women is harder to avoid and punish. According to the Third Survey of Chinese Women’s Social Status in 2010, more than 72 percent of women had a perception of “not being hired or promoted because of gender” discrimination. I think that to safeguard women’s rights and interests, the authorities should put more effort into effectively punishing gender discrimination in employment, which can also reduce the pressure on graduates to undergo cosmetic surgery.

The World Economic Forum on the Future of Jobs

Are you prepared to meet the challenges in the global labor market that lie ahead? Is your company? The World Economic Forum’s Future of Jobs report highlights the widespread disruptions in the labor markets that will be caused by developments in fields such as artificial intelligence, machine learning, genetics and nanotechnology. The report found that “technological disruption is interacting with socio-economic, geopolitical and demographic factors to create a perfect storm in labor markets in the next five years”.

The technological innovations of the coming years will lead to the automation of highly repetitive tasks, such as administrative and manufacturing tasks. At the same time, these innovations will create new jobs: most notably roles such as the data analyst, which companies expect will help them make sense of and derive insights from the torrent of data generated by technological disruptions, and the specialized sales representative, as industries will have to get more skilled at explaining the value of their new products to outsiders. However, the jobs gained over the next five years will not outweigh the expected losses. The report estimates a net loss of 5.1 million jobs in the period 2015-2020. What is worse, the impact is not distributed evenly: a total of 7.1 million jobs are expected to be lost, with routine white-collar office functions as well as manufacturing and production roles hit hardest. In contrast, 2 million jobs will be gained in highly skilled professions, predominantly in computer and mathematical as well as architecture and engineering related fields.

Importantly, the report’s claim of a net loss of jobs due to automation and technology is highly contested among economists. For instance, David Dorn, Professor of International Trade and Labor Markets at the University of Zurich, concludes that the two effects, the losses and the gains of jobs due to technological progress, will more or less balance each other out. However, he too argues that the jobs being created are not in the same pay bracket as the ones that are lost. Hence, Dorn perceives a divide that keeps widening.

To be prepared to meet these challenges, companies need to build a new approach to workforce planning and talent management, in which better forecasting data and planning metrics help anticipate the skills that will be needed in the future. According to the World Economic Forum, “HR has the opportunity to add significant strategic value in predicting the skills that will be needed, and plan for changes in demand and supply”. This means that companies will need expert tools that can generate actionable insights into the development of the labor market. Such tools could help companies base their job training investments on the skills deemed most promising, and job seekers could get customized suggestions for pursuing the best opportunities for advancement.

Not surprisingly, assessing which skills will be promising in the future is the hardest part. The Forum’s own in-depth analysis of industries, occupations and skills of the future shows that this is not as easy as it seems.

[Figure: the WEF’s top 10 skills in 2015 and 2020]

The WEF’s list of the top 10 skills for 2020 hardly seems to reflect the scope of the proclaimed disruptions. The skill sets for 2015 and 2020 contain eight identical skills; only the ranking has changed. For instance, creativity becomes much more important whereas negotiation loses relevance. More broadly, though, the skills are formulated in such general terms that they could be assigned to almost any occupation. Indeed, the World Economic Forum’s prediction reads more like a prophecy of the Oracle of Delphi, so pliable that it will come true in any case. Apart from listing promising skills, the list thus highlights the need for better analytical tools that can capture the complexity of the global labor market.

To analyze occupation data efficiently and to produce actionable insights that can prepare companies and governments for the disruption ahead, we need semantic tools that can provide context for skills and occupations on a global level. Tools that can make sense of different cultural understandings of a job. Tools that can make meaningful connections between different jobs and skills. Simply put, tools that bring the same or even a better understanding to occupations than we do.

JANZZ Highlights: How we started off 2016 successfully

2015 was an exciting and busy year for us, with projects in Europe, South East Asia and the Middle East. The complexity arising from the many different languages, cultures and labor markets demanded a lot from our database maintenance team. We are therefore all the more proud to have successfully mastered these projects and to have gained so much know-how on occupation data. Our team and our central asset, our ontology JANZZon!, have learnt so much.

Occupational classifications

  • We have integrated a major part of the Indian occupational classification NCO-2004. That includes occupations not only in English but also in Hindi.
  • The entry of JSOC 2011 (Japan) and NOC 2011 (Canada) will soon be completed.
  • We are collecting over 14,000 jobs in Dutch from the national Dutch classification BO&C. We are also enhancing this data with information from real-life job postings.

LinkedIn Skills

As the search for the perfectly matching talent or job on LinkedIn becomes more and more important, the significance of the skills you display on your LinkedIn profile increases. The network even advertises that members who register their skills will get four times more profile views. The skills users include on their profile also offer an opportunity to personalize job suggestions, adverts and search results more accurately. Companies, in turn, can search for job candidates by job title or skills.

Our ontology already included about 70% of all global LinkedIn skills. To achieve our goal “to master occupation data”, we have started to teach our ontology the remaining 30% of these skills as well, for we are serious about really knowing all the skills in the world (the same is obviously also true for jobs).

Semantic Technology

Why is it so important to include all these classifications and skills in our ontology? Why does it not suffice, for instance, that LinkedIn knows all the skills its users register? Our ontology not only registers these terms, it also interlinks them logically. In the case of the LinkedIn skills, JANZZ provides significant added value through the interlinking of different languages, which makes LinkedIn’s skills comparable on a global basis. Hence, our ontology JANZZon! offers essential context and intelligent evaluation options for applications such as information systems, matching engines, job portals, CV parsers, statistical analysis and modelling tools and much more. The ontology becomes the means to utilize an enormous amount of data intelligently. Big data becomes smart data.
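To illustrate why interlinking across languages matters for matching, consider the following minimal Python sketch. It is our own simplification, not the JANZZ matching engine: the concept identifiers, the `LABELS` table and the `coverage` score are assumptions made for this example.

```python
# concept id -> language code -> labels (illustrative entries only)
LABELS = {
    "skill:project_management": {
        "en": ["project management"],
        "de": ["Projektmanagement"],
        "fr": ["gestion de projet"],
    },
    "skill:street_construction": {
        "en": ["street construction", "road construction"],
        "de": ["Straßenbau"],
    },
    "skill:infrastructure_planning": {
        "en": ["infrastructure planning"],
    },
}


def normalize(term):
    """Case-fold and collapse whitespace so labels compare reliably."""
    return " ".join(term.casefold().split())


# Reverse index: any label in any language -> concept id
TERM_TO_CONCEPT = {
    normalize(label): concept_id
    for concept_id, by_language in LABELS.items()
    for labels in by_language.values()
    for label in labels
}


def to_concepts(terms):
    """Translate free-text skill terms into language-independent concept ids."""
    keys = (normalize(term) for term in terms)
    return {TERM_TO_CONCEPT[key] for key in keys if key in TERM_TO_CONCEPT}


def coverage(candidate_terms, required_terms):
    """Share of the required skills (as concepts) that the candidate covers."""
    required = to_concepts(required_terms)
    if not required:
        return 0.0
    return len(required & to_concepts(candidate_terms)) / len(required)


candidate_skills = ["Projektmanagement", "Straßenbau"]  # German profile
job_requirements = ["project management", "street construction",
                    "infrastructure planning"]          # English posting

print(f"{coverage(candidate_skills, job_requirements):.2f}")
```

A plain string comparison would find no overlap between the German profile and the English posting. Once both sides are mapped to shared concepts, two of the three required skills are recognized as covered.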