Laura: Hi everyone. In today’s episode, machines versus humans! Why do you think we invest so much into “human” power at JANZZ to curate all our data? Well, let me introduce you first to my guest today. She is Theodora Sykara Lekaki, one of our ontology team members. Welcome to the studio!
Theodora: Hi Laura, thank you for the invitation!
Laura: Thank you for joining!
Theodora, let’s dive right in. Intelligence is typically defined as “the capacity for learning, reasoning, and understanding”. What would you say distinguishes human intelligence from Artificial Intelligence?
Theodora: Well, artificial intelligence is not even remotely comparable to human intelligence… but to give you an answer, I’d say that artificial intelligence lacks one of the key markers of intelligence: the capacity for understanding.
Laura: I agree, an AI system is basically a sophisticated “bag of tricks” designed to make us believe that it has some understanding of the task at hand.
Theodora: Exactly. You see, intelligence in the human sense requires a key cognitive capability: storing and using commonsense knowledge, which humans develop through a combination of learning and experience.
AI systems simply don’t meet this fundamental requirement, and they won’t in the foreseeable future. Especially not the currently widespread AI systems. They rely exclusively on statistical or machine-learning methods, which is why they are, by the way, also called knowledge-lean systems. Statistics just don’t generate understanding.
Laura: Yeah, a knowledge-lean system doesn’t sound like it’s going to leverage commonsense knowledge anytime soon… How do we develop AI-based technologies here at JANZZ?
Theodora: We take a knowledge-based approach, which means that we feed these technologies with extensive human knowledge. And it’s precisely this human-generated knowledge representation that ensures we can develop technologies that actually deserve the name artificial intelligence.
Laura: Today’s topic is very broad, so let’s focus on our own expertise: the world of semantic technologies related to human resources and recruitment… Why is your job at JANZZ so important?
Theodora: Well, especially in the field of HR or job searching, the reliability of data and evaluations is crucial, which is why counting on a human team is necessary.
Laura: Could you give me an example?
Theodora: Sure. Real people are affected by underperforming AI technology all too often.
For example, perfectly suitable candidates for a position are discarded by AI-based systems such as an ATS (applicant tracking system) just because their resume doesn’t contain the exact keywords specified in the filter, or because it is associated with false contexts. That would most likely never happen if a conscientious human recruiter had seen the CV.
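The keyword problem can be made concrete with a minimal sketch. Everything in it is hypothetical: the required keywords, the small synonym table (standing in for human-curated knowledge), and the function names are invented for illustration and do not describe any real ATS or JANZZ product.

```python
# Hypothetical illustration: none of these names come from a real ATS.
REQUIRED_KEYWORDS = {"software engineer", "python"}

# Hand-written synonym table standing in for curated knowledge (assumed data).
SYNONYMS = {
    "software engineer": {"software engineer", "software developer", "programmer"},
    "python": {"python"},
}


def exact_keyword_filter(cv_text: str) -> bool:
    """Pass only if every required keyword appears verbatim in the CV."""
    text = cv_text.lower()
    return all(keyword in text for keyword in REQUIRED_KEYWORDS)


def synonym_aware_filter(cv_text: str) -> bool:
    """Pass if, for every required keyword, at least one listed variant appears."""
    text = cv_text.lower()
    return all(
        any(variant in text for variant in SYNONYMS[keyword])
        for keyword in REQUIRED_KEYWORDS
    )


cv = "Software developer with six years of Python experience."
print(exact_keyword_filter(cv))   # False: the exact phrase "software engineer" is missing
print(synonym_aware_filter(cv))   # True: "software developer" is accepted as a variant
```

The same CV is rejected by the verbatim filter and accepted once a person has recorded that the two job titles mean roughly the same thing.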
Laura: Of course, machines lack commonsense knowledge.
Theodora: Exactly, and commonsense knowledge is absolutely essential when it comes to understanding natural language. We humans acquire this linguistic competence at an early age and can use it to discern the meaning of arbitrary linguistic expressions. Knowledge-lean AI models will never be able to do so to the same extent, because they work on a purely quantitative basis: their ‘intelligence’ is based on statistical approximations and memorization of text-based data.
Laura: So what do AI machines understand?
Theodora: AI, or more specifically, ML (machine learning) systems can, at times, sidestep the problem of understanding and give the impression that they are behaving intelligently. But they will never actually understand the meaning of words; they simply lack the connection between form, namely, language, and content: the relation to the real world.
Laura: So machines don’t understand the words they are processing.
Theodora: No, they don’t. Think about it: commonsense knowledge comprises an inconceivable number of facts about how the world works.
We humans have internalized these facts through lived experience and can use them in expressing and understanding language.
Laura: Yes, we humans simply don’t have to encode this staggering amount of knowledge. We effortlessly understand the world.
Theodora: Yes, and precisely because this tacit knowledge is not captured systematically, knowledge-lean AI systems have no access to it.
Laura: So the connections are based purely on cooccurrence, on what shows up together often. And because these systems do not store the meaning or meanings of a word, they often have great difficulty in discerning the nuances of everyday language.
Theodora: Yes, and there’s more, Laura. Machines make connections that we just don’t understand! Which makes the results difficult, if not impossible, to explain. And explainability is really important when you’re dealing with HR tech because of its direct impact on real people’s lives.
Laura: Absolutely. So tell me, how do we use humans at JANZZ to make our AI products smarter and more reliable?
Theodora: First of all, our machine learning is supervised by people. Let’s take our job and CV parsing tool as an example. The JANZZparser! relies on natural language processing, a branch of machine learning, but always combined with human input: our data analysts and the team I am part of (the ontology curator team) carefully and continuously train and adapt the language-specific deep learning models, supervising at every relevant step. And we are very careful with the training data.
Laura: Aha… I understand that many automated tasks in big data require a significant amount of human labor and intelligence… So tell me more…
Theodora: Our NLP models are trained using our in-house corpus of gold standard training data, which we hand-select. Gold standard means that this collection must be the most accurate, reliable and unbiased dataset you can get. So a fair amount of our work involves collecting, combining, and annotating work-related data, and making sure that the corpus as a whole represents the highly diverse world of jobs, skills and other related concepts as accurately as possible, in particular also covering areas where online data is rarer.
Laura: And of course, this is very important because to train language models we require VAST amounts of text, mainly from the internet. And if we just fed this data to the machine without human filtering, the bias in the data – and therefore in the algorithms – would be huge!
Theodora: Yes, this is why even the most advanced language model is still biased, even though teams of experts have gone to great lengths to try and tackle this problem through tuning the algorithms.
Curating this training data requires strong reasoning skills and real-world understanding, as well as problem-solving and creative thinking. Reasoning is already a huge challenge for machines, and real-world understanding… Well, that’s just impossible for a machine without substantial human input. So there is no way around it: if you want reliable, explainable, unbiased AI systems, you need people.
Laura: Makes sense. So we use people to supervise our machine learning models and to collect and annotate the right training data. What else do we use people for in our AI technologies?
Theodora: Well, keeping with our example of the JANZZparser!, another very important step is that the parsed information is normalized and contextualized, which is key for further processing like matching or analytics. To do this reliably, the parser really needs to understand the content in some sense. This is where our hand-curated ontology JANZZon! comes in.
Laura: Yes, we have talked about our ontology before on this podcast, but let me remind our listeners that JANZZon! is the most comprehensive multilingual knowledge representation for job-related data that exists worldwide.
Theodora: Yes, this machine-readable knowledge base contains millions of concepts such as occupations, skills, specializations, educations and experiences that we humans manually link according to their relations with each other.
This is basically how we convey the context of a given concept to our machine learning models. The model can access the ontology to look up a concept and can see how it is linked to other concepts, how strong those links are, what type of relation two concepts share, and so on. So it’s a machine-readable representation of our knowledge of all these work-related concepts.
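As a rough sketch of what such a lookup could look like, the snippet below models concepts with typed, weighted links. The data structure, the relation names, the 0-to-1 strength scale and the sample entries are all assumptions made for illustration; the episode does not describe how JANZZon! is actually stored or queried.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Relation:
    target: str      # label of the linked concept
    kind: str        # type of relation, e.g. "requires_skill" (invented names)
    strength: float  # assumed scale: 0.0 (weak) to 1.0 (strong)


# Toy, hand-made graph; concepts, relation types and strengths are invented.
ONTOLOGY: Dict[str, List[Relation]] = {
    "software developer": [
        Relation("programming", "requires_skill", 0.9),
        Relation("project management", "requires_skill", 0.3),
        Relation("computer science degree", "requires_education", 0.7),
    ],
}


def related(concept: str, kind: str, min_strength: float = 0.0) -> List[str]:
    """Return concepts linked to `concept` by `kind` with at least `min_strength`."""
    return [
        relation.target
        for relation in ONTOLOGY.get(concept, [])
        if relation.kind == kind and relation.strength >= min_strength
    ]


print(related("software developer", "requires_skill"))
# ['programming', 'project management']
print(related("software developer", "requires_skill", min_strength=0.5))
# ['programming']
```

Because every link carries a type and a strength, a downstream model can ask not only whether two concepts are connected but also how, and how strongly.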
Laura: Theodora, you work mainly on the Greek ontology. Could you give some practical examples of why these links, or relations, are needed?
Theodora: Yes! For example, in Greek, we use the same phrase to describe a middle school math teacher and a professor of mathematics. So, if you were searching for a math professor in Greek using only keywords, you would get many irrelevant results.
At JANZZ, we teach our tools to understand the context of a phrase and then make informed decisions that significantly improve the accuracy of our search.
This is why we need the linguistic expertise and the cultural understanding of our people. Humans can read between the lines, and understand nuances and cultural differences. By adding all these aspects to our knowledge representation, we achieve accurate matching.
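One simple way to picture context-based disambiguation is sketched below. It is only an illustration under stated assumptions: the Greek phrase is an assumed example (the episode does not spell out the wording), the context cues are invented and given in English for readability, and counting overlapping cue words is a deliberately naive stand-in for whatever JANZZ actually does.

```python
import re
from typing import Optional

# Assumed example phrase; the episode does not spell out the Greek wording.
AMBIGUOUS_PHRASE = "καθηγητής μαθηματικών"

# Invented context cues for each candidate concept (hypothetical data).
CANDIDATES = {
    "middle school math teacher": {"school", "pupils", "classroom", "curriculum"},
    "professor of mathematics": {"university", "research", "phd", "publications"},
}


def disambiguate(phrase: str, context: str) -> Optional[str]:
    """Pick the candidate concept whose cue words overlap most with the context."""
    if phrase != AMBIGUOUS_PHRASE:
        return None
    words = set(re.findall(r"\w+", context.lower()))
    scores = {concept: len(cues & words) for concept, cues in CANDIDATES.items()}
    best = max(scores, key=scores.get)
    # With no cue at all, refuse to guess rather than return an arbitrary concept.
    return best if scores[best] > 0 else None


posting = "University post, PhD and research publications required"
print(disambiguate(AMBIGUOUS_PHRASE, posting))  # professor of mathematics
```

A keyword search sees one identical phrase in both cases; the surrounding context is what separates the two occupations.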
Laura: Interesting! Could you tell me more?
Theodora: Inside the ontology, the concepts are logically organized under main branches or categories, such as occupations, skills, industries and several others. These main branches are central to the matching process. Now, each concept has several codes and labels assigned to it that further optimize the matching accuracy.
Laura: OK, so, like occupations, skills, experiences, and that kind of thing.
Theodora: Yes, and each concept is annotated with various attributes like translations in different languages or classification codes.
Laura: Yes, we have over 100 official occupation classification systems, such as ISCO-08, the International Standard Classification of Occupations or ESCO, which is the multilingual classification of European Skills, Competences, Qualifications and Occupations.
Theodora: Yes, and we also have available over 60 other standardized reference systems as well as a variety of occupation classes.
Laura: What are these exactly? The occupation classes…
Theodora: That’s a type of classification we’ve introduced to quantify how specific a given job title is. Some job titles are very specific like Android App Developer, right?
Laura: Right, the job title says it all.
Theodora: Exactly, we don’t really need much additional information to match such a job to a person. But a title like Consultant or Manager carries very little matchable information. It’s just too vague, or unspecific.
Laura: I imagine we could find such roles in practically ALL industries…
Theodora: Yes, and for precise matching or classification, we would need additional information like, as you just mentioned, the industry, or others like specialization, skills, experience, etcetera.
This is why at JANZZ we created these occupation classes. We use them to influence the weighting of the relations between concepts in the knowledge graph and in the matching process.
So, when comparing a job posting and a resume: the less specific the job title, the more weight other factors are given by the matching algorithm.
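The weighting idea can be sketched in a few lines. The specificity values, the neutral default and the linear blend are assumptions chosen to illustrate the principle; the episode does not give the actual occupation classes or the formula used by the matching algorithm.

```python
# Assumed specificity per job title, from 0.0 (vague) to 1.0 (very specific).
# In the episode this role is played by occupation classes; the numbers are invented.
TITLE_SPECIFICITY = {
    "Android App Developer": 0.9,
    "Consultant": 0.2,
}


def match_score(title: str, title_similarity: float, other_similarity: float) -> float:
    """Blend title similarity with other factors (skills, industry, experience).

    The less specific the title, the more weight the other factors receive.
    """
    title_weight = TITLE_SPECIFICITY.get(title, 0.5)  # neutral weight for unknown titles
    return title_weight * title_similarity + (1.0 - title_weight) * other_similarity


# Same inputs for both: identical title, but only a weak overlap on everything else.
print(round(match_score("Android App Developer", 1.0, 0.4), 2))  # 0.94
print(round(match_score("Consultant", 1.0, 0.4), 2))             # 0.52
```

With a specific title, an exact title match dominates the score; with a vague title, the same weak overlap in skills and industry pulls the score down sharply.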
Laura: OK, now I understand how we manage to get more accurate results for any kind of job title or role… BUT, how are the concepts in the knowledge representation related?
Theodora: Every concept is part of a vast map and all these concepts are interconnected in a meaningful way that reflects common understanding and job market practices.
Going back to the math professor example, a professor of mathematics is linked to very different skills, diplomas, experience and even soft skills from a middle school math teacher.
Every connection in our ontology is a conscious decision taken after careful consideration and is based on our understanding of language, culture and, most importantly, the real world.
Again, because our ontology is a representation of human knowledge.
Laura: So if I understand it correctly, we need to ensure that it contains the connections we humans make, our common understanding and contextualization of the world, as opposed to the unpredictable connections a machine might make.
Theodora: Exactly, as you can imagine, there is a great level of analytical thinking involved, and we need to consider a variety of factors before making even the smallest alterations. That’s why humans are important!
Laura: Yes, that is why we need curator experts like you, Theodora. At JANZZ we count on the knowledge of 10 curators for every software engineer or data scientist we have, and thanks to the diversity in the team, we can cover more than 40 languages and almost 100 different industries!
Theodora: Yes, everyone in the team brings a lot of experience and knowledge from different fields, and on top of that, we also cover different cultures and languages.
Laura: Cross-lingual and cross-cultural differences are one of our key success factors. If you want to learn more about our human team of experts, visit our website or contact us. We will for sure come back with more episodes soon. And if you’d like to read more about knowledge-lean and knowledge-based AI, check out our article on the topic. I’ll leave the link in the description of this episode. Thank you, Theodora, for bringing your intelligence to the podcast today!
Theodora: A pleasure, Laura!
Laura: Stay tuned and goodbye!