
JANZZ.technology – The Podcast

Season 2

Season 2, Episode 3

Why not purely machine-generated? It’s just common sense…

Discover in this episode how and why JANZZ.technology uses human intelligence to create artificial intelligence that truly deserves its name.


Contributor: Theodora Sykara Lekaki, Ontology Maintenance & Support, JANZZ.technology
Host: Laura Flamarich


Laura: Hi everyone. In today’s episode: machines versus humans! Why do you think we invest so much in “human” power at JANZZ to curate all our data? But first, let me introduce my guest today, Theodora Sykara Lekaki, one of our ontology team members. Welcome to the studio!

Theodora: Hi Laura, thank you for the invitation!

Laura: Thank you for joining!

Theodora, let’s dive right in. Intelligence is typically defined as “the capacity for learning, reasoning, and understanding”. What would you say distinguishes human intelligence from Artificial Intelligence?

Theodora: Well, artificial intelligence is not even remotely comparable to human intelligence… But to give you an answer, I’d say that artificial intelligence lacks one of the key markers of intelligence: the capacity for understanding.

Laura: I agree, an AI system is basically a sophisticated “bag of tricks” designed to make us believe that it has some understanding of the task at hand.

Theodora: Exactly. You see, intelligence in the human sense requires a key cognitive capability: storing and using commonsense knowledge, which humans develop through a combination of learning and experience.

AI systems simply don’t have this fundamental requirement, and they won’t achieve it in the foreseeable future, especially not the AI systems that are currently widespread. These rely exclusively on statistical or machine-learning methods; by the way, systems like these are also called knowledge-lean systems. Statistics just don’t generate understanding.

Laura: Yeah, a knowledge-lean system doesn’t sound like it’s going to leverage commonsense knowledge anytime soon… How do we develop AI-based technologies here at JANZZ?

Theodora: We take a knowledge-based approach, which means that we feed these technologies with extensive human knowledge. And it’s precisely this human-generated knowledge representation that ensures we can develop technologies that actually deserve the name artificial intelligence.

Laura: Today’s topic is very broad, so let’s focus on our own area of expertise: the world of semantic technologies related to human resources and recruitment… Why is your job at JANZZ so important?

Theodora: Well, especially in the field of HR or job searching, the reliability of data and evaluations is crucial, which is why counting on a human team is necessary.

Laura: Could you give me an example?

Theodora: Sure. Here is a typical example of how underperforming AI technology affects real people, and it happens all too often:

Perfectly suitable candidates for a job position are discarded by AI-based systems such as applicant tracking systems (ATS), just because their resume doesn’t contain the exact keywords specified in the filter or because it is associated with false contexts. That would most likely never happen if a conscientious human recruiter had seen the CV.
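To make the keyword problem concrete, here is a minimal sketch in Python of an exact-keyword filter next to a concept-based one. It assumes a small, hypothetical synonym table; in practice that knowledge would have to come from a curated knowledge representation, and the function names are invented for illustration, not taken from any ATS product.

```python
# Hypothetical synonym table mapping surface terms to one canonical concept.
# In a real system this knowledge would come from a curated ontology.
SYNONYMS = {
    "software engineer": "software developer",
    "programmer": "software developer",
    "software developer": "software developer",
}


def canonical(term):
    """Map a term to its canonical concept, or keep it (lowercased) if unknown."""
    key = term.lower()
    return SYNONYMS.get(key, key)


def exact_keyword_filter(resume_terms, required_terms):
    """Pass only if every required keyword appears verbatim in the resume."""
    return all(term in resume_terms for term in required_terms)


def concept_filter(resume_terms, required_terms):
    """Pass if every required concept is present, whatever words were used."""
    resume_concepts = {canonical(t) for t in resume_terms}
    return all(canonical(t) in resume_concepts for t in required_terms)


resume = {"programmer", "python"}
required = {"software developer", "python"}
print(exact_keyword_filter(resume, required))  # False: candidate discarded
print(concept_filter(resume, required))        # True: same concept recognized
```

The exact filter rejects the candidate only because "programmer" is not the literal string "software developer"; the concept filter succeeds solely because someone encoded that equivalence beforehand.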

Laura: Of course. Machines lack commonsense knowledge.

Theodora: Exactly, and commonsense knowledge is absolutely essential when it comes to understanding natural language. We humans acquire this linguistic competence at an early age and can use it to discern the meaning of arbitrary linguistic expressions. Knowledge-lean AI models will never be able to do so to the same extent, because they work on a purely quantitative basis: their ‘intelligence’ is based on statistical approximations and memorization of text-based data.

Laura: So what do AI machines understand?

Theodora: AI, or more specifically, ML (machine learning) systems can, at times, sidestep the problem of understanding and give the impression that they are behaving intelligently. But they will never actually understand the meaning of words; they simply lack the connection between form, namely, language, and content: the relation to the real world.

Laura: So machines don’t understand the words they are processing.

Theodora: No, they don’t. Think about it: commonsense knowledge comprises an inconceivable number of facts about how the world works.

We humans have internalized these facts through lived experience and can use them in expressing and understanding language.

Laura: Yes, we humans simply don’t have to encode this staggering amount of knowledge. We effortlessly understand the world.

Theodora: Yes, and precisely because this tacit knowledge is not captured systematically, knowledge-lean AI systems have no access to it.

Laura: So the connections are based purely on co-occurrence, on what shows up together often. And because these systems do not store the meaning or meanings of a word, they often have great difficulty discerning the nuances of everyday language.
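As a rough illustration of what "based purely on co-occurrence" means, the Python sketch below counts how often pairs of words appear in the same document. The three-line corpus is invented for illustration. The counts record only that words appeared together; nothing in them says what a "teacher" or a "professor" is.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(documents):
    """Count how often each pair of distinct words appears in the same document."""
    counts = Counter()
    for doc in documents:
        # Sorting makes each pair key order-independent, e.g. ("math", "teacher").
        words = sorted(set(doc.lower().split()))
        for pair in combinations(words, 2):
            counts[pair] += 1
    return counts


# Toy corpus, invented for illustration.
docs = [
    "math teacher school",
    "math professor university",
    "math teacher school pupils",
]
counts = cooccurrence_counts(docs)
print(counts[("math", "teacher")])      # 2
print(counts[("professor", "school")])  # 0
```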

Theodora: Yes, and there’s more, Laura. Machines make connections that we just don’t understand! And that makes the results difficult, if not impossible, to explain. And explainability is really important when you’re dealing with HR tech because of its direct impact on real people’s lives.

Laura: Absolutely. So tell me, how do we use humans at JANZZ to make our AI products smarter and more reliable?

Theodora: First of all, our machine learning is supervised by people. Let’s take our job and CV parsing tool as an example. The JANZZparser! relies on natural language processing based on machine learning, but always combined with human input: our data analysts and the team I am part of (the ontology curator team) carefully and continuously train and adapt the language-specific deep learning models, supervising every relevant step. And we are very careful with the training data.

Laura: Aha… I understand that many automated tasks in big data require a significant amount of human labor and intelligence… So tell me more…

Theodora: Our NLP models are trained on our in-house corpus of gold standard training data, which we hand-select. Gold standard means that this collection must be the most accurate, reliable and unbiased dataset you can get. So a fair amount of our work involves collecting, combining, and annotating work-related data, and making sure that the corpus as a whole represents the highly diverse world of jobs, skills and other related concepts as accurately as possible, including areas where online data is scarce.

Laura: And of course, this is very important, because training language models requires VAST amounts of text, mainly from the internet. And if we just fed this data to the machine without human filtering, the bias in the data – and therefore in the algorithms – would be huge!

Theodora: Yes, and this is why even the most advanced language models are still biased, even though teams of experts have gone to great lengths to tackle this problem by tuning the algorithms.

Curating this training data requires strong reasoning skills and real-world understanding, as well as problem-solving and creative thinking. Reasoning is already a huge challenge for machines, and real-world understanding… well, that’s just impossible for a machine without substantial human input. So there is no way around it: if you want reliable, explainable, unbiased AI systems, you need people.

Laura: Makes sense. So we use people to supervise our machine learning models and to collect and annotate the right training data. What else do we use people for in our AI technologies?

Theodora: Well, keeping with our example of the JANZZparser!, another very important step is that the parsed information is normalized and contextualized, which is key for further processing like matching or analytics. To do this reliably, the parser really needs to understand the content in some sense. This is where our hand-curated ontology JANZZon! comes in.

Laura: Yes, we have talked about our ontology before on this podcast, but let me remind our listeners that JANZZon! is the most comprehensive multilingual knowledge representation for job-related data that exists worldwide.

Theodora: Yes, this machine-readable knowledge base contains millions of concepts such as occupations, skills, specializations, education and experience, which we humans manually link according to their relations with each other.

This is basically how we convey the context of a given concept to our machine learning models. The model can access the ontology to look up a concept and can see how it is linked to other concepts, how strong those links are, what type of relation two concepts share, and so on. So it’s a machine-readable representation of our knowledge of all these work-related concepts.
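The lookup described above can be pictured as a graph of concepts with typed, weighted links. The Python sketch below assumes a tiny, hypothetical mini-ontology: the concept IDs, relation types and weights are invented for illustration and do not reflect JANZZon!'s actual schema or data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    target: str    # ID of the linked concept
    kind: str      # type of relation, e.g. "requires_skill"
    weight: float  # strength of the link, 0.0 to 1.0


# Hypothetical mini-ontology; IDs, relation types and weights are invented.
ONTOLOGY = {
    "occupation:android_app_developer": [
        Relation("skill:kotlin", "requires_skill", 0.9),
        Relation("skill:java", "requires_skill", 0.7),
        Relation("industry:software", "belongs_to_industry", 0.8),
    ],
}


def related(concept_id, kind=None, min_weight=0.0):
    """Look up a concept and return its links, optionally filtered by type and strength."""
    return [
        r for r in ONTOLOGY.get(concept_id, [])
        if (kind is None or r.kind == kind) and r.weight >= min_weight
    ]


for rel in related("occupation:android_app_developer", kind="requires_skill", min_weight=0.8):
    print(rel.target, rel.weight)  # skill:kotlin 0.9
```

A matching model querying such a structure can ask not just "which concepts are linked?" but "how, and how strongly?", which is the context the curators encode by hand.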

Laura: Theodora, you work mainly on the Greek ontology. Could you give some practical examples of why these links, or relations, are needed?

Theodora: Yes! For example, in Greek, we use the same phrase to describe a middle school math teacher and a professor of mathematics. So, if you were searching for a math professor in Greek using only keywords, you would come up with various irrelevant results.

At JANZZ, we teach our tools to understand the context of a phrase and then make informed decisions that significantly improve the accuracy of our search.

This is why we need the linguistic expertise and the cultural understanding of our people. Humans can read between the lines and understand nuances and cultural differences. By adding all these aspects to our knowledge representation, we achieve accurate matching.
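A minimal Python sketch of the difference between keyword lookup and context-aware lookup for the Greek example. We assume here, for illustration, that the shared label is "καθηγητής μαθηματικών", and that each of the two concepts it can denote is linked to a few context terms; the concept IDs, context terms and the overlap-counting rule are hypothetical simplifications, not JANZZ's actual method.

```python
# Hypothetical: one surface label maps to two distinct concepts.
LABEL_TO_CONCEPTS = {
    "καθηγητής μαθηματικών": [
        "occupation:secondary_math_teacher",
        "occupation:university_math_professor",
    ],
}

# Hypothetical context terms linked to each concept in the knowledge representation.
CONTEXT = {
    "occupation:secondary_math_teacher": {"middle school", "pupils", "teaching degree"},
    "occupation:university_math_professor": {"university", "phd", "research"},
}


def keyword_search(label):
    """Keyword-only lookup: every concept sharing the label is returned."""
    return LABEL_TO_CONCEPTS.get(label, [])


def contextual_search(label, context_terms):
    """Pick the candidate concept sharing the most context terms (None if no candidates)."""
    candidates = keyword_search(label)
    return max(candidates, key=lambda c: len(CONTEXT[c] & context_terms), default=None)


print(keyword_search("καθηγητής μαθηματικών"))  # both concepts
print(contextual_search("καθηγητής μαθηματικών", {"phd", "research"}))
# occupation:university_math_professor
```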

Laura: Interesting! Could you tell me more?

Theodora: Inside the ontology, the concepts are logically organized under main branches or categories, such as occupations, skills, industries and several others. These main branches are central to the matching process. Now, each concept has several codes and labels assigned to it that further optimize the matching accuracy.

Laura: OK, so things like occupations, skills, experiences, and so on.

Theodora: Yes, and each concept is annotated with various attributes like translations in different languages or classification codes.

Laura: Yes, we have over 100 official occupation classification systems, such as ISCO-08, the International Standard Classification of Occupations or ESCO, which is the multilingual classification of European Skills, Competences, Qualifications and Occupations.

Theodora: Yes, and we also have over 60 other standardized reference systems available, as well as a variety of occupation classes.

Laura: What are these exactly? The occupation classes…

Theodora: That’s a type of classification we’ve introduced to quantify how specific a given job title is. Some job titles are very specific like Android App Developer, right?

Laura: Right, the job title says it all.

Theodora: Exactly, we don’t really need much additional information to match such a job to a person. But a title like Consultant or Manager carries very little matchable information. It’s just too vague, or unspecific.

Laura: I imagine we could find such roles in practically ALL industries…

Theodora: Yes, and for precise matching or classification, we would need additional information such as the industry, as you just mentioned, or other aspects like specialization, skills, experience, etc.

This is why at JANZZ we created these occupation classes. We use them to influence the weighting of the relations between concepts in the knowledge graph and in the matching process.

So, when comparing a job posting and a resume: the less specific the job title, the more weight other factors are given by the matching algorithm.
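The principle "the less specific the job title, the more weight other factors get" can be sketched as a simple weighted sum. This is a hypothetical simplification: we assume the occupation class can be expressed as a specificity score between 0 and 1 and that the remaining weight goes to all other factors combined. The actual JANZZ weighting scheme is not described in the episode.

```python
def match_score(title_sim, other_sim, specificity):
    """Combine title similarity with similarity on other factors.

    specificity: hypothetical 0..1 stand-in for an occupation class
    (near 1.0 for a title like "Android App Developer",
    near 0.0 for a vague title like "Manager").
    """
    if not 0.0 <= specificity <= 1.0:
        raise ValueError("specificity must be between 0 and 1")
    title_weight = specificity
    other_weight = 1.0 - specificity  # vaguer title -> more weight on other factors
    return title_weight * title_sim + other_weight * other_sim


# Identical titles, but weak overlap on industry, skills and experience:
print(round(match_score(title_sim=1.0, other_sim=0.2, specificity=0.9), 2))  # 0.92
print(round(match_score(title_sim=1.0, other_sim=0.2, specificity=0.1), 2))  # 0.28
```

With a specific title, a matching title is strong evidence; with "Manager", the same title match barely moves the score and the other factors decide.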

Laura: OK, now I understand how we manage to get more accurate results for any kind of job title or role… BUT how are the concepts in the knowledge representation related?

Theodora: Every concept is part of a vast map and all these concepts are interconnected in a meaningful way that reflects common understanding and job market practices.

Going back to the math professor example, a professor of mathematics is linked to very different skills, diplomas, experience and even soft skills from a middle school math teacher.

Every connection in our ontology is a conscious decision taken after careful consideration and is based on our understanding of language, culture and, most importantly, the real world.

Again, because our ontology is a representation of human knowledge.

Laura: So if I understand it correctly, we need to ensure that it contains the connections we humans make, our common understanding and contextualization of the world, as opposed to the unpredictable connections a machine might make.

Theodora: Exactly. As you can imagine, there is a great deal of analytical thinking involved, and we need to consider a variety of factors before making even the smallest alterations. That’s why humans are important!

Laura: Yes, that is why we need expert curators like you, Theodora. At JANZZ we count on the knowledge of 10 curators for every software engineer or data scientist we have, and thanks to the diversity in the team, we can cover more than 40 languages and almost 100 different industries!

Theodora: Yes, everyone in the team brings a lot of experience and knowledge from different fields, and on top of that, we also cover different cultures and languages.

Laura: Handling cross-lingual and cross-cultural differences is one of our key success factors. If you want to learn more about our human team of experts, visit our website or contact us. We will certainly be back with more episodes soon. And if you’d like to read more about knowledge-lean and knowledge-based AI, check out our article on the topic. I’ll leave the link in the description of this episode. Thank you, Theodora, for bringing your intelligence to the podcast today!

Theodora: A pleasure, Laura!

Laura: Stay tuned and goodbye!


Covered in this episode:
JANZZ.technology: ‘So clever I don’t understand a word of what I am saying’ – AI’s potential for handling text-based data is far from unlimited.

Season 2, Episode 2

More on matchmaking

Join us in this follow-up episode to find out more about what we need to get close to the perfect (job) match. Which aspects to consider – and which NOT to consider.


Contributor: Yahel Appenzeller, Ontology Maintenance & Support, JANZZ.technology
Host: Laura Flamarich


Laura: Hi everyone. Welcome back! Today we continue with part 2 of our matching episode, which is why I have our Ontology Curator Yahel Appenzeller here with me!

Yahel: Hi everyone!

Laura: Thank you for joining us again!

In the last episode, we were saying that when it comes to skills matching, many, many details need to be considered. One category of information we didn’t mention last time is personal characteristics, i.e. information like name, age, nationality, gender, religion, civil status, appearance, etc. And I want to point out that we don’t use that kind of data at all. When it comes to matching, what matters are the skills and other purely job-related characteristics we were talking about in the last episode. Personal characteristics are at best irrelevant and at worst introduce bias into the matching process.
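One simple way to guarantee that personal characteristics cannot influence a score is to remove them before any matching runs. A minimal Python sketch, assuming a profile stored as a dictionary; the field names and the sample profile are hypothetical.

```python
# Personal characteristics listed in the episode, as hypothetical field names.
EXCLUDED_FIELDS = {"name", "age", "nationality", "gender", "religion",
                   "civil_status", "appearance"}


def job_relevant_view(profile):
    """Return a copy of the profile without personal characteristics."""
    return {k: v for k, v in profile.items() if k not in EXCLUDED_FIELDS}


candidate = {
    "name": "A. Example", "age": 42, "nationality": "X",
    "skills": ["welding", "blueprint reading"], "experience_years": 12,
}
print(job_relevant_view(candidate))
# {'skills': ['welding', 'blueprint reading'], 'experience_years': 12}
```

Dropping the fields up front, rather than merely giving them zero weight, also makes the exclusion easy to verify.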

Yahel: True, Laura. Fighting discrimination in the hiring process is a very important matter that we haven’t mentioned yet. And it’s not just potential candidates who benefit… Nowadays, businesses and the recruiting industry are placing increased emphasis on company branding to attract talent. At the same time, inclusive hiring strategies have become very important to the public. So deploying bias-free matching in the hiring process puts hiring companies in a very favorable light.

Laura: Good point!

Yahel: And when using AI technology, unbiased, explainable matching is not just nice to have; it’s going to be mandatory in Europe very soon with the upcoming AI Act, which means it’ll also be mandatory for any company that operates and recruits globally.

Laura: Yes, we had a post on this topic recently, discussing the EU white paper “On Artificial Intelligence – A European approach to excellence and trust”. It’s about using AI in Human Resources… I’ll put the link to that post on our website janzz dot technology in the description of this episode. We’ll be dedicating a podcast episode to this topic soon, too…

Yahel: I’ll stay tuned then!

Laura: Excellent! And picking up where we left off last time… We were saying that we need to take care not to compare apples with oranges. Right, Yahel?

Yahel: Exactly, let’s look at an example. Say we have a manager who is open-minded, communicative, strong in leadership and good at solving problems.
Without a meaningful context for this person’s role as a manager, we don’t know which industry the potential candidate is in. Retail, construction, finance, clothing… it could be anything. Without more specific information, the candidate would probably match to vacancies in any one of these industries, despite each such job most likely requiring specific industry insights or knowledge. There is no relevant experience to put the skills into a meaningful context.

Laura: So it’s not enough to just consider skills, we also have to take experience into account.

Yahel: Yes, experience is another part of the puzzle… and, by the way, the most complicated entity in terms of “matchability”. However, adding experience to the mix is still not enough to accurately match people to jobs and applicants to positions. As I mentioned in the last episode, there may also be other, let’s say, formal requirements like education or licenses, which clearly need to be taken into account as well. But we need more.

Laura: What else is needed?

Yahel: Personality! Aspects like skills, experience and formal requirements don’t determine whether the new copywriter will fit into the team well, whether the new nurse will arrive at the hospital on time, or the new PR executive will perform well under pressure. A match only becomes truly successful if the applicant’s personality is considered too. My resume details a wide range of things I have done, but how I have done them is just as relevant.

Laura: So if we take job title, skills, experience and personality of a candidate into account, do we get the perfect match?

Yahel: It depends on what you match this against. After all, when you have a perfect match, the skills and personality of a new employee will complement the skills and personalities of co-workers. If I’m the only software engineer in a company, I need to be an all-rounder and take initiative with ease. If I am hired in a team with two others – one of them more familiar with field X, the other with field Y, both introverts – I’ll need to bring skills that complement and enhance those of the team as a whole. That way, our collaboration can create something completely new.

Laura: Oh, that’s a big challenge. How is it possible to know the team (if there is one) or the environment where a candidate will be working? It’s just not possible, right?

Yahel: Some clues may be found in the information contained in a job posting, but in general, this data isn’t readily available. That’s why matching software usually focuses on matching candidates and jobs without taking the team, the environment, etc. into consideration at all. This means you might get the perfect match for the role, but a very poor match for the team or company.

Laura: So can this be fixed? If you’re expected to fit in well with a team, then the prospective co-workers should also influence the perfect match.

Yahel: That’s right, Laura. Ideally, the co-workers’ profiles should be matched as well. And although this is perfectly feasible, there are surprisingly few matching solutions out there that actually incorporate this. But it’s definitely an interesting approach. With explainable matching software, you could even reverse-engineer this. You can create a profile for the team, based on their tasks and responsibilities as a collective, then use the matching for a gap analysis. Once you see the missing skills, experience and other aspects, you can determine what qualities you should be looking for in a new team member.
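The reverse-engineered gap analysis can be sketched as a set difference: what the team is collectively responsible for, minus what its current members already cover. A minimal Python sketch, assuming skills are represented as plain strings; the team profile and member skill sets are hypothetical.

```python
def gap_analysis(required, team_members):
    """Return required skills not covered by any current team member."""
    covered = set().union(*team_members)  # empty set if the team is empty
    return set(required) - covered


# Hypothetical team profile: what the team is responsible for as a collective.
team_requirements = {"python", "cloud infrastructure", "ux design", "data modeling"}
team = [
    {"python", "data modeling"},         # member 1
    {"python", "cloud infrastructure"},  # member 2
]
print(gap_analysis(team_requirements, team))  # {'ux design'}
```

The result is a starting point for the profile of the next hire; a fuller version would also compare experience, personality and other aspects mentioned above, which plain skill sets do not capture.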

Laura: That sounds awesome! So what’s the catch?

Yahel: Well, in a nutshell: because the problem is so complex, there’s simply no one-size-fits-all algorithm. Matching a nurse or a kindergarten teacher is very different from matching a marketing manager or a picker packer. So we need very different matchings based on the different activities, roles and occupations, surrounding teams and company environment. This is why we use multiple approaches at JANZZ. And then there are the individual expectations for the outcome. Matching is driven by expectations, and expectations change constantly. So we can only evaluate all the factors as far as possible in order to best approximate the perfect match. But one thing is sure: matching isolated data fragments through arbitrary keywords will only produce the perfect match by pure coincidence. It’s like picking a candidate out of a hat. Except that would at least be cheaper…

Laura: I think I’d prefer a good approximation…

Yahel: Then you should go with a matching engine that considers all these factors we’ve been talking about – in an appropriate way: skills, experience, personality and former job titles. If these criteria are properly interpreted and weighted appropriately, you’ll get the best possible starting point to bring people and jobs together using technology.

Laura: You say ‘weighted appropriately’. What do you mean by that?

Yahel: How much emphasis is put on the various criteria in the matching process. For instance, the more senior the position, the more important experience becomes. If we think about a career starter, then education or other criteria play a more important role. Hiring managers and recruiters generate varying sets of must-have and nice-to-have criteria depending on the specific position. And job seekers have their own set of preferences and requirements too.
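A minimal Python sketch of weighting that depends on the position: must-have criteria act as a hard filter, and the remaining criteria are combined with weights that shift with seniority. The two seniority levels, the weights, the 0 to 1 criterion scores and the license names are all hypothetical values chosen for illustration, not JANZZ parameters.

```python
# Hypothetical weights per seniority level: experience gains weight with
# seniority, education matters more for career starters.
WEIGHTS = {
    "entry":  {"education": 0.5, "skills": 0.4, "experience": 0.1},
    "senior": {"education": 0.1, "skills": 0.3, "experience": 0.6},
}


def score(candidate, must_haves, seniority):
    """Return None if a must-have is missing, otherwise a weighted score (0..1)."""
    if not must_haves <= candidate["qualifications"]:
        return None  # hard requirement not met
    weights = WEIGHTS[seniority]
    return sum(w * candidate["scores"][k] for k, w in weights.items())


candidate = {
    "qualifications": {"driver's license"},
    "scores": {"education": 0.9, "skills": 0.6, "experience": 0.2},  # each 0..1
}
print(round(score(candidate, {"driver's license"}, "entry"), 2))   # 0.71
print(round(score(candidate, {"driver's license"}, "senior"), 2))  # 0.39
print(score(candidate, {"nursing license"}, "entry"))              # None
```

The same candidate ranks well for an entry-level role and poorly for a senior one, which is the effect of weighting described above.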

Laura: So perspective matters in matching too?

Yahel: Yes. It’s quite possible that a candidate thinks they’re a good enough match for a position and is happy to apply for it, perhaps partly because the workplace is close to their home and they like the brand.
It can look quite different from the perspective of the potential employer. They’re interested in quality over quantity and want candidates who meet their requirements as closely as possible – which often don’t include criteria like how long the candidate’s commute is. When we perform comparative matching tests for POCs or benchmarking, clients often look at conversion metrics such as: did this candidate actually apply for this position? But even a candidate who, looking at the job-related data we discussed, could be a perfect match from the employer’s perspective may not apply because of personal preferences, company branding, the later steps of the recruiting process like automated interviews or in-depth assessments, etc. So this is more a question of behavioral analysis and, in a sense, market research on the entire recruiting process. Matching as we view it is just one part of this.

Laura: Do you think, possibly by extending the scope of the matching, that this technology will replace recruiters anytime soon?

Yahel: I imagine it’ll be another few decades before these kinds of algorithms can identify the perfect hire on their own and truly replace the human element in the recruiting process. But I’m not sure if that should even be the goal. Sure, you can reduce bias significantly if you go about it the right way, and machines tend to be faster and cheaper than humans. But do we really want machines to make hiring decisions for us? Decisions that affect people’s lives to this extent? As far as we’re concerned here at JANZZ, our technology isn’t about getting rid of humans. It’s about getting rid of mistakes. We simply want to enable more efficient, unbiased processes that help recruiters and hiring managers make better decisions and help people get the jobs that fit them best… no more and no less…

Laura: Thank you so much Yahel. This ethical aspect is another super interesting topic in itself. We’ll be talking about that in an upcoming episode this season.

Yahel: You’re welcome!

Laura: Thank you for listening and be sure to tune in for the next episode of our podcast. Goodbye!


Covered in this episode:
White Paper on Artificial Intelligence: a European approach to excellence and trust.

Season 2, Episode 1

It’s a MATCH!

But not like the ones you can get on dating apps… In this new episode, Yahel Appenzeller from our Ontology Curator team helps us understand how matching works – one of JANZZ.technology’s core competences.


Contributor: Yahel Appenzeller, Ontology Maintenance & Support, JANZZ.technology
Host: Laura Flamarich


Laura: Hi everyone and welcome to the second season of this podcast! Today’s episode is about matching! And to speak about it I have with me Yahel Appenzeller from our Ontology Curator team!

Yahel: Hi everyone!

Laura: Matching is one of our core competencies, and no, we don’t make dating apps like Tinder or Bumble. But I bet if we did, the chances of finding the love of your life would be way higher than with the current apps…

Yahel: Haha. Well, I’m not sure how exactly matching works in dating apps, but I’d imagine that the challenge of finding the right fit for a vacancy or a job candidate is a little more clear-cut than finding a true life partner… What I do know is that, unlike dating apps or many job search sites, JANZZ matching never generates complete mismatches.

Laura: See, this is exactly why you’re on the podcast today. Wouldn’t the world be a better place if we could set some filters and parameters in an application and swipe for jobs or candidates that fit?

Yahel: It would definitely save time and effort for recruiters and employment services.

Laura: When I think about it… finding a job is already a job in itself. And it’s an ordeal we all go through at some point in life – or even several times…

Yahel: Yes, and not just for people looking for jobs. Recruiters, headhunters, hiring managers and employment agencies spend hours going through resumes and applications too. And with the current tight labor markets across the globe, finding and retaining talent is key – but to retain talent, you need to make sure the candidates are a great match from the start. And when it comes to finding candidates, recruiters are typically faced with either far too many applicants or far too few…

Laura: Right, and nowadays we all have access to a vast amount of information… you can search for jobs and candidates all over the world!

Yahel: Yes, the market is now global and thus much larger – and much more visible. As a result, attractive positions get a lot of applicants and attractive candidates get a lot of offers. The question is, how many of those applicants or offers really are a good fit? Regardless of whether you use a national or global site, be it a job site or a professional networking site, or any other source, the results are in many cases inadequate, if not downright irrelevant. A lot of this has to do with the fact that the majority of search algorithms are based on fairly crude keyword-based matching, which typically not only produces a slew of irrelevant results, but also misses many good results because of the myriad vocabularies people use. There is also the challenge of varying interpretations of the different terms, and the many variations in how different aspects are weighted. What’s important in a candidate for me might be very different for you – even if we’re looking to fill the same position. The same goes for different people looking for the same type of job. There are also plenty of hidden expectations, like criteria that aren’t mentioned explicitly. All these aspects combined make matching a highly complex task and the right match very much a moving target.

Laura: Makes sense. And as you said, matching good candidates is very important. So how can applicants, recruiters or companies get a suitable match?

Yahel: Well, first let’s clarify that matching is the act of pairing entities that suit each other in some predefined sense, in our case a job and person.

Laura: Ok.

Yahel: Even in this context, the word ‘matching’ can have various meanings. For some jobs, determining whether a candidate is suitable can be fairly simple. If you are physically healthy, for example, have some stamina and can show up on time, you should be able to pick strawberries. Other jobs, however, require a variety of certificates, specializations and experience. Try to match candidates to a position as a neonatal surgeon in a hospital and this becomes clear. For some positions, experience is more important; for others, the training, the skills, the soft skills or the values… And the importance of the various criteria can also change depending on the stakeholders. So the process gets tricky pretty quickly.

Laura: True, there are a lot of details to be considered during the matching process…

Yahel: And the prevailing conditions are constantly changing. Requirements that were commonplace yesterday no longer apply today, and in turn, today’s requirements will no longer be valid tomorrow. The same is true for the jobs themselves.

Laura: Of course. Who would have thought that one day you could be a content creator for TikTok? And who would have listed such a specialization on their CV?

Yahel: How we define jobs, prospective employees and the labor market shifts all the time and this is a huge challenge when a machine has to deal with the task. Because there are very few constants and little to no universal “truths”.

Laura: Can a machine or an algorithm satisfy our matching expectations?

Yahel: Machines have to apply all the experience and knowledge of the recruiting specialist, in much the same way, paying attention to the smallest details, filtering the relevant information and adapting to changes in the labor market. As I said at the beginning, talent and businesses also use quite different vocabularies and descriptions, place emphasis on different aspects, have diverging interpretations of skills and levels of skills, and so on. And the datasets we want to match, say a candidate profile and a job description, are often very asymmetric, meaning that one tends to carry a lot more information or criteria than the other. So there are multiple dimensions of complexity that come into play and need to be resolved by the software, or the underlying machine or algorithm.

Laura: Right.

Yahel: Suppliers of this kind of software focus on different dimensions of the data to resolve this highly complex task. For example, former job titles of applicants or the associated skills are taken into account. An algorithm then compares job postings and resumes, and a match is made. Successful?

Laura: A match based on former job titles or skills? I don’t know – you tell me…

Yahel: Imagine: if a candidate had position X at company A, they can also hold position X at company B, right?

Laura: Yes?

Yahel: This may have held true in the past, to some extent. For a long time, jobs and job requirements were fairly clear-cut, with more or less standard job titles like bricklayer, medical secretary, patent lawyer, janitor, etc. Today we have happiness heroes instead of customer service operators, digital overlords as website managers, and accounting ninjas as financial managers. So your algorithm needs to know what all these new, at times pretty obscure job titles correspond to. And there are challenges even with more traditional job titles: is a sales consultant someone who works in retail and advises customers? Or someone who prepares offers, takes orders and negotiates contracts with customers? Do they sell carpets or sensors for digital twin systems in manufacturing? Many job titles are either too generic, too specific or too obscure. And the titles companies come up with often describe functions as opposed to occupations. Without a more detailed description of the position and its context, we wouldn’t know whether an applicant is really suitable for a position, or vice versa – so how would a machine know?

Laura: Well, context definitely makes a big difference, I can see that. And much has changed in individual jobs, too, over time.

Yahel: Exactly. So, a good match also depends on how up to date or relevant a candidate’s experience is. After a certain number of years, your experience in some jobs might not be relevant anymore, or not as relevant, because the skillset has changed, the technology, the research, or the market. Whereas for other jobs this might not be an issue. If you were a programmer in the 80s and then switched careers, that experience will certainly not be relevant now. But if you were an experienced legal secretary in the mid 2000s, you will most likely still find a good match in a similar role. And of course, more recent experience should carry more weight.
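One way to picture the idea that recent experience should count more, and that some fields date faster than others, is a time-decay weight per occupation. The sketch below is a minimal illustration, assuming exponential decay with hypothetical half-lives; the function `experience_weight` and the half-life values are invented for illustration and are not JANZZ's matching logic.

```python
# Hypothetical half-lives (in years): after this many years, experience
# counts for half as much. Values are illustrative assumptions only.
HALF_LIFE_YEARS = {
    "programmer": 3.0,         # fast-moving technology
    "legal secretary": 15.0,   # comparatively stable skill set
}


def experience_weight(occupation: str, years_since: float) -> float:
    """Return a weight in (0, 1] that halves every half-life years."""
    half_life = HALF_LIFE_YEARS[occupation]
    return 0.5 ** (years_since / half_life)


print(round(experience_weight("programmer", 40), 2))       # 0.0
print(round(experience_weight("legal secretary", 17), 2))  # 0.46
```

With these assumed half-lives, programming experience from 40 years ago contributes essentially nothing, while legal-secretary experience from 17 years ago keeps close to half its weight. The decay rates themselves would have to come from knowledge about how quickly each field changes.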

Laura: How can machines then make these distinctions efficiently?

Yahel: Well… Some job matching providers solve the matching problem by using other parameters – they look at skills and competencies since these represent the ‘content’ behind job titles. So, the matching algorithm considers the candidate’s skills and the skills required for a job and matches them. Skills-based or competency-based matching is more meaningful and thus promises better results because it takes into account not only a title previously held by an applicant but also that person’s knowledge, talents, insights and education. And many skills are transferable from one role to another, which can widen the talent pool in a tight market.
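The difference between title-based and skills-based matching can be sketched in a few lines. This is a minimal illustration, assuming skills are already standardized to identical labels on both sides; the function names `title_match` and `skill_coverage` and the example profiles are hypothetical, not any provider's actual engine.

```python
def title_match(candidate_title: str, job_title: str) -> bool:
    """Naive title-based matching: case-insensitive exact equality of job titles."""
    return candidate_title.strip().lower() == job_title.strip().lower()


def skill_coverage(candidate_skills: set[str], required_skills: set[str]) -> float:
    """Share of the job's required skills that the candidate already has (0.0 to 1.0)."""
    if not required_skills:
        return 1.0  # nothing required, so nothing is missing
    return len(candidate_skills & required_skills) / len(required_skills)


# Illustrative profiles (hypothetical): a project manager applying for an account manager role.
candidate_title = "Project Manager"
candidate_skills = {"stakeholder management", "budgeting", "team leadership", "scheduling"}
job_title = "Account Manager"
required_skills = {"stakeholder management", "negotiation", "budgeting", "client relations"}

print(title_match(candidate_title, job_title))            # False
print(skill_coverage(candidate_skills, required_skills))  # 0.5
```

The title rule rejects the candidate outright, while the skills view shows that half of the requirements are already covered. The overlap only works here because both sides use identical labels; making sure the same label means the same thing in both contexts is the harder problem.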

Laura: So the key is the skills.

Yahel: They certainly play a very important role. And apart from widening the talent pool for recruiters, skills-based matching also opens up opportunities for new career paths. For instance, a project manager could transition to a leadership role like general manager, or become an account manager, a business developer, or a consultant. With a few additional skills, an engineer could switch to technical project management, and a good lawyer could be a great lobbyist.

Laura: I see your point. Skills like critical thinking, management and leadership, analytical skills and creativity are transferable to other management and leadership positions.
So, can we say that skills are a reliable factor for machines to evaluate the perfect match for a vacancy?

Yahel: Well, it depends. When talking about skills, we need to make sure that we’re not comparing apples with oranges. The sales skills of a retail store assistant are very different from those of a sales executive in the medtech industry. This issue certainly has to be addressed before you can get meaningful results from a skills-based matching engine.
As I said, skills certainly play a very important role. But so does knowledge, education, and experience. And then there are jobs that require authorizations or licenses, like medical professionals or lawyers. All these various elements need to be taken into account – in the right context and with the right weighting. Something a good recruiter or hiring manager will do – consciously or not. So now all these different aspects need to be explained to a machine. And we haven’t even started on what it takes to match a candidate to a team…

Laura: Ok, I’d love to know more and I bet I’m not the only one. Let’s dive into this in the next episode.

Yahel: Let’s do that!

Laura: Thank you very much for coming.

Yahel: A pleasure!

Laura: And to our listeners, thank you too. Be sure to join us for our next conversation with Yahel.


Host: Laura Flamarich

Season 1

Season 1, Episode 7

Analyzing skills data

Skills are getting a lot of attention these days – from businesses, government institutions, policymakers, researchers, and many others. But what exactly is a skill? In this episode, we speak with Stefan Winzenried, the CEO and founder of JANZZ.technology, about skills data.

Contributor: Stefan Winzenried, CEO and Founder of JANZZ.technology

Laura: Hi everyone! Welcome to another episode of this series of podcasts from JANZZ.technology. This is Laura Flamarich and today we’ve invited Stefan Winzenried again, the CEO and founder of JANZZ.technology – this time to talk about skills data. Hi Stefan!

Stefan: Hello!

Laura: Let’s jump right into today’s topic. Skills are getting a lot of attention these days – from businesses, government institutions, policymakers, researchers, and many others. Gartner has called them “the new currency for talent”, and there have been many, many posts on reskilling, upskilling, future-proof skills, top post-COVID skills, etcetera, etcetera. I’ve heard that especially these posts, even the ones from seemingly reliable sources like Forbes or the World Economic Forum, are a bit of a red rag to you, Stefan – why is that?

Stefan: For one thing, these posts hardly ever provide any kind of information as to what data their claims are based on. And the ones that do are almost always based on data that’s really biased. Take the LinkedIn reports on in-demand skills. Every time they’re published, you see countless articles and posts reproducing these lists blindly, just taking them at face value. Nobody stops to think about the data behind them, even though it’s clear that, for instance, blue-collar professions and industries are massively underrepresented on their network. The reports generate this huge buzz about how we need to reskill and upskill everyone to become IT professionals just because LinkedIn says that blockchain and cloud computing are most in demand. But that’s just not true.

Laura: So what is in demand?

Stefan: Well, before we even think about answering that question, we need to work out something much more fundamental: what exactly is a skill? Off the top of your head: what do you think a skill is?

Laura: Erm… I’d say a capability… something you learn and you are really good at.

Stefan: Ok. So with that definition, being a quick study isn’t a skill because it’s not something that you can learn. But being able to tie a knot in a cherry stalk with your tongue is.

Laura: (laughs) Yeah, ok, but that’s probably not very relevant for anyone’s job.

Stefan: Exactly. But being a quick study can be. Ok, the cherry stalk example is a bit silly, but it turns out that finding a robust definition of skills is quite a challenge. And there are many, many definitions floating around. For instance, ESCO, the European job and skills classification system, defines a skill as an ability to apply knowledge and know-how to complete tasks. Which includes, say, being able to use specific tools or software. But O*Net in the US, on the other hand, makes a distinction between these kinds of abilities and what they call skills. And Indeed summarizes anything that can be useful in a job under the term skill: competencies, abilities and even knowledge.

Laura: I see, there really is a lot of variation.

Stefan: Yes, and those are just the differences in the definition of a skill as a general concept. Then you get all sorts of interpretations when it comes to concrete examples. What are project management skills? Or carpentry skills? Those terms can mean very different things depending on who you ask. And all these variations will cause discrepancies in data collection and analysis, along with many other serious challenges. So, to get back to your original question as to what skills really are in demand: with the techniques most commonly used today, especially those that rely on online data, like data from online job postings or professional networks, it’s almost impossible to say. What we do know is that when you look at studies based on more representative data, like the ManpowerGroup Talent Shortage surveys, it’s positions in skilled trades that are hardest to fill – and have been for years, along with truck drivers and healthcare professionals. Which suggests that we should be training people for these occupations – where they clearly don’t need knowledge of blockchain and cloud computing…

Laura: But most reports on skills determine a really high demand for digital skills in some shape or form. How does that compute?

Stefan: One reason is that most of these reports are based on data from online job postings or professional networks, which tends to be severely flawed for several reasons. First off, not all job vacancies are advertised online; in fact, certain markets are so dried up that vacancies aren’t advertised at all. And some types of jobs are more likely to be advertised online than others – typically, it seems, jobs that require “digital skills”. Whatever that’s supposed to mean…

Laura: So the data is biased.

Stefan: Yes, severely so. Very often, large firms in certain industries are completely overrepresented, even though they usually make up only a small part of the labor market. In Switzerland, for example, over 90% of all jobs are in small and medium-sized firms. In fact, the majority of them are in small companies, which have completely different roles, hierarchies and therefore skill sets than large companies. And they use a different language and different terms, too. Take “HR generalist”. That’s a typical term that you rarely find in small companies, because it is obvious that that’s what they’re looking for. And if it does show up, it’s understood differently than in a large company. And this also affects the desired or required skills, experience, etc… And there are many more problems that make OJA data mostly useless or at least not very meaningful when they’re not addressed. To put it bluntly, this crystal ball of OJA data is about as reliable as the ones at fairs…

Laura: OJA as in online job advertising.

Stefan: Right. The thing is, the many companies and institutions that handle OJA data and use it to derive, say, predictions and training programs from it are very rarely aware of the true extent of these issues.

Laura: So what other issues are there with online data?

Stefan: Well, apart from bias and confusion around what actually constitutes a skill, there’s the issue of duplicates. There’s a lot of overlap in the various sources of online job postings and tackling deduplication really is far from trivial. Another, very underestimated issue is granularity.
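To see why deduplication is not just a matter of comparing strings, consider the same posting scraped from two job boards with trivial formatting differences. The sketch below uses a generic near-duplicate technique (word shingles plus Jaccard similarity); it is not JANZZ's method, and the threshold of 0.6 and the example postings are assumptions for illustration.

```python
import re


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Break a posting into overlapping k-word shingles after light normalization
    (lowercasing and splitting on word characters, which drops punctuation)."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))}


def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by the size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def is_near_duplicate(post_a: str, post_b: str, threshold: float = 0.6) -> bool:
    """Treat two postings as duplicates if their shingle sets overlap enough."""
    return jaccard(shingles(post_a), shingles(post_b)) >= threshold


post_a = "Sales consultant wanted for our retail store in Zurich, full time"
post_b = "Sales consultant wanted for our retail store in Zurich - full-time!"
post_c = "Carpenter needed for cabinetry and furniture production"

print(post_a == post_b)                   # False
print(is_near_duplicate(post_a, post_b))  # True
print(is_near_duplicate(post_a, post_c))  # False
```

This only catches near-identical reposts. Postings that describe the same vacancy in different words, or in another language, slip through, which is one reason deduplication across many sources is far from trivial.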

Laura: What do you mean by that?

Stefan: The level of detail – both in the gathered data and in the communicated results. Digital skills, for instance, is a classic example of grouping more and more skills together until you end up with a completely meaningless umbrella term. Of course there’s a high demand for digital skills if you lump together everything from being able to use digital devices, through handling social media accounts professionally, to programming in Java. But what are you going to do with this? You can’t perform sound analyses with data this coarse. It’s useless for any kind of meaningful statistics or matching, let alone defining hiring strategies or policymaking.

Laura: I see your point. So why do you think this is done? I mean, if the data’s collected, say, from online job postings, then surely it has a higher level of detail, right? I’ve never seen the term digital skills in a job ad…

Stefan: Yes and no. The level of detail in online job postings varies considerably, and not just depending on the country or sector. Even across postings for the same profession in the same country, you’ll find anything from page-long detailed accounts of all responsibilities and required skills and qualifications, to a post with the same information encoded implicitly in three sentences. But the real clustering usually happens when the collected data is processed. It needs to be standardized to make the data points comparable. And this standardization is typically based on simplified classifications or taxonomies instead of leveraging comprehensive ontologies with a high level of detail. For instance, the ESCO and O*Net taxonomies are used in a lot of these projects. ESCO currently has about 13,500 skills concepts and O*Net about 9,000. Sounds like a lot, right?

Laura: Right!

Stefan: But it actually isn’t. Our ontology, for instance, includes over a million skills concepts. So if you use ESCO or O*Net, you already lose a lot of detail. But it’s not just about the number of skills. More skills doesn’t necessarily mean better information. It’s more about having a way to compare the right things with each other. If you just standardize and cluster detailed information into oblivion, you end up comparing apples and oranges without even realizing it. And these mistakes get carried along and multiplied over all downstream processes. But some processes are only possible if you have the context and the degree of complexity of a skill in its description. And that’s why you need a much larger number of mappable skills.

Laura: Can you give an example?

Stefan: Take knowledge of TensorFlow as a skill. TensorFlow is a software library used for machine learning and AI. Now if you standardize this using ESCO, the closest you’ll get is the term utilize machine learning. But that’s an umbrella term for a whole host of skills and knowledge. Not just all sorts of other software libraries, like PyTorch, but also specializations in the different branches and subfields of machine learning: supervised or unsupervised learning, deep learning, and so on. How are you going to find the right talent for your project efficiently with a term that broad? Or design effective L&D strategies and government training programs? It’s just useless.
(pauses) But of course, if you just want to generate attention with your results, then oversimplification is definitely the way to go. A short list of buzzwords is much easier to digest. So after standardization, the data is often clustered and simplified even further.
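The loss of detail from coarse standardization can be shown with a toy mapping. The mapping below is hypothetical, written in the spirit of the TensorFlow example and not taken from ESCO's actual data; it only shows how collapsing distinct skills into one umbrella label makes two different profiles look identical.

```python
# Hypothetical mapping from fine-grained skills to a single coarse label
# (illustrative only, not ESCO's real mapping).
COARSE_MAP = {
    "TensorFlow": "utilize machine learning",
    "PyTorch": "utilize machine learning",
    "deep learning": "utilize machine learning",
    "unsupervised learning": "utilize machine learning",
}


def standardize(skills: list[str]) -> set[str]:
    """Replace each skill with its coarse label; unmapped skills are kept as-is."""
    return {COARSE_MAP.get(skill, skill) for skill in skills}


job_requirement = ["TensorFlow", "deep learning"]
candidate_profile = ["PyTorch", "unsupervised learning"]

# At the original level of detail, the two profiles share nothing...
print(set(job_requirement) & set(candidate_profile))  # set()
# ...but after coarse standardization they look like a perfect match.
print(standardize(job_requirement) == standardize(candidate_profile))  # True
```

Whatever is compared downstream, whether for matching or for demand statistics, inherits this false equivalence.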

Laura: Haha, ok. So it’s not just the taxonomies then.

Stefan: No sensible skills taxonomy would list “digital skills” as an individual skill; they do provide more detail than that. But bias and granularity aren’t the only issues with skills data. Remember I said that we don’t have a common understanding of skills as a general concept?

Laura: Sure.

Stefan: I also pointed out that the same goes for the individual terms we use to describe a supposedly concrete skill. If I give you a term like project management, you’ll have an idea of what that is based on your own knowledge and experiences. So you’ll give the term some kind of meaning. And I’ll do the same, based on my knowledge and experiences. We agree that we both know what project management is – and then human nature kicks in: We assume we’re talking about the same thing.

Laura: And… we’re not?

Stefan: Nine times out of ten? No, we’re not. And the fact that different people have different notions of any given skill is a huge issue for data collection and analysis. Let’s take another example. One of the most common so-called skills required in job postings anywhere on the planet, and included in most taxonomies, is use Microsoft Office. This may sound like a fairly specific skill at first, but the informative value of this term is zero.

Laura: Really?! How so?

Stefan: It’s completely unclear which applications in this large family of software are meant, and to what extent a person is supposed to be able to use them. If you think about what an employer is looking for when they use this term in a job ad, what they actually want is a whole set of much more concrete skills and knowledge, depending on the job description. You might need to be able to structure a document, or create auto-calculating spreadsheets or good presentations – which requires skills in storytelling and visual communication. And the skill set will typically be very different for an office help in a small business compared to a marketing specialist in a large corporation. So basically, saying someone can use Microsoft Office is about as helpful as saying they can use a toolbox. But that doesn’t stop it from showing up in over 80% of all job ads and close to 100% of all resumes and CVs…

Laura: I’d never thought of it like that… But a lot of this implicit information can be worked out from the context, right?

Stefan: Humans can – to a certain extent and with the right background knowledge. But for any kind of meaningful analysis, we’re talking about extracting both explicit and implicit skills from huge data sets. For that you need an AI-based tool that actually understands the content and the context of, say, a job description. Which is really only possible if it has access to an extensive knowledge representation that includes information not only on skills or jobs, but also education, work experiences, certifications and much more. As well as required levels and the complex relations between all these different concepts.

Laura: This is starting to sound like a huge challenge. Wouldn’t it be easier to collect data on jobs instead? And then work out the skills demand using… standard skill profiles for jobs?

Stefan: It would – if there were such a thing as a standard skill profile for a given occupation. But several studies, including one we did here at JANZZ, show that there’s just too much variation: there are national and regional differences, differences across industries, even across teams within a single business. The job description often factors in what skills are already covered by other team members, which will change the required skill set for that particular point in time. Or the company advertising the job specializes in a very specific activity that requires a different skill set. Take carpentry, for example. A company that specializes in cabinetry and furniture production will have little use for a carpenter skilled in drywalling and roof carpentry; the common denominator in terms of skills is just too small. So there’s no way around collecting data on the skills themselves.

Laura: Ok, so we need to collect skills data. But to recap, the data’s typically incomplete and biased, the extracted skills are taken out of context and often generalized into meaninglessness and in fact, we can’t really agree on the notion of a skill in the first place.

Stefan: Exactly. And despite these issues, people draw all sorts of wild conclusions from their shaky data and propagate unfounded claims about current – or even worse: future – skills demand. And if this kind of information comes from a seemingly reliable source like, say, the World Economic Forum, the ILO or the World Bank Group, then chances are the results will be used as a basis for far-reaching decisions: which training projects a government should fund, how students are advised on their careers, or even how companies design their recruiting strategies.

Laura: So it could have pretty dramatic consequences for the labor market, right?

Stefan: Yes! The interpretations currently flying around are encouraging politicians to carry on potentially misallocating billions in funding for upskilling and reskilling in the wrong areas. And they actively steer even more young talent away from occupations and industries in dire need of new talent: skilled trades and construction, nurses, care workers and more – many of which are still clearly “future proof”. Just look at the situation in the UK with truck drivers. The very obvious failure to attract new workers has had dramatic systemic consequences. And yet, the key players in government and business are still going on about upskilling and reskilling workers for a digital world. Completely undeterred. It’s just bizarre.

Laura: So how can we do better?

Stefan: For one thing, we need to move away from easy statements and generic lists of glorified buzz skills and towards differentiated interpretations and communication – even if it is less sexy. But for that, we need to gather the right data. We have to start by somehow reaching a common understanding, which means agreeing on a definition of a skill, standardizing skill designations and levels. And we need to determine key skills: skills that are truly relevant to the job in question. Which requires analyzing skills in their context – a challenge that knowledge-lean systems based purely on machine learning will never overcome – and understanding that the most frequently mentioned skills aren’t necessarily the important ones, that many key skills are implicit. We need to understand the limitations of data from online sources and gather and provide additional information when this data falls short. In short, we need to generate smart, unbiased data for smart, unbiased decisions.

Laura: On that note, thank you Stefan for joining us.

Stefan: Thank you too.

Laura: We have covered a lot of ground today, but we could carry on for hours. This really is the hot topic at the moment and we will certainly delve deeper in the next season – and some upcoming whitepapers, I believe…

Stefan: Right!

Laura: This episode concludes the first season of the podcast. Follow us on LinkedIn or on your favorite podcast platform to make sure you don’t miss the next season.


Host: Laura Flamarich

Season 1, Episode 6

Smart data – Unlocking the value of your data assets

You know that new data alone does not necessarily equal new intelligence. In this episode, Jennifer Jayne Jakob, Technical Writer and Solution Documentalist, deconstructs and delves into the world of Smart Data.

Contributor: Jennifer Jayne Jakob, Technical Writer & Solution Documentalist, JANZZ.technology

Laura: Hi everyone, welcome to another episode of JANZZ.technology’s podcast. With me today is Jennifer Jayne Jakob, our Tech Writer.

JJ: Hi!

Laura: JJ, you recently wrote about how important data is these days for any successful organization. So why is that?

JJ: Well, the main idea is that big data is supposed to help organizations make better decisions based on actual evidence, i.e. backed by data – rather than just following gut feelings or assumptions.

Laura: But is it all about just having more and more data?

JJ: No. I mean, of course you need a lot of data if you want to draw any kind of meaningful conclusions from it. For instance, if you want to know why your customers prefer product X over product Y, you wouldn’t just ask three random customers and then base your strategy decisions on that. You’d want to ask as many as possible. Right?

Laura: Right!

JJ: And you’d want to find out what factors influence their decision. Back in the days before the internet became commercial, this information was gathered in time-consuming market research studies, customer surveys and the like. But at some point, when businesses moved online and mass data collection and storage became feasible, people realized that they could track not just what customers bought, but how they navigated through a site, how they were influenced by promotions, reviews and page layouts, and all sorts of other behavior. So they started gathering data – more and more and more data about more and more aspects of customer behavior. The thing is, businesses don’t grow simply because they’re sitting on a mountain of data. You have to do something with that data; it needs to be analyzed. And you clearly can’t analyze hundreds of terabytes of data by hand. You need analytic tools, you need algorithms, machine learning models, the works.

Laura: Sure, data’s not magic. You need tools to do something with it… But surely the more data you feed those tools, the better the outcome, right?

JJ: It’s not quite that simple. New data itself doesn’t necessarily equal new intelligence – no matter how good your tools are. You see, big data is messy. If you don’t have the right strategy, your data can end up just being a great big, expensive, useless jumble. Put bluntly, even the best tools can’t turn a stinky mess into sweet-smelling roses.

Laura: Hahaha okay, got it. So if more isn’t more, then what is?

JJ: Well, let’s go back to the old-school market research studies, pre-internet. They were about collecting data too. But no one would’ve just collected all sorts of random data on their customers. Apart from being really inefficient, any market researcher will tell you that to get meaningful insights, you need a well-designed study. You’ll want to ask the right people the right questions and somehow organize the results so that you can compare them and verify them, etcetera. And we’re speaking about a time when data was analyzed more or less exclusively by human, intelligent beings. These days, we want data to be understood by machines, or algorithms, as well. And contrary to what many people think, algorithms – even AI-based algorithms – are actually pretty dumb, at least to start with. They’re much faster than humans, sure, but if you want them to be smart in any sense of the word, you need to feed them smart data.

Laura: What do you mean by that?

JJ: Well, think of an AI or machine learning model as a child. Under the right circumstances, children love learning. They soak up any information you offer and constantly try to make connections and find patterns to understand how all sorts of things work. And their understanding grows with the feedback they get. But if you just throw anything and everything at them, unfiltered random titbits taken out of context, they’ll get overwhelmed and confused pretty quickly. They can also develop biases and prejudice. So ideally, if we want kids to grow up to be smart, unprejudiced human beings, we need to give them information that’s somehow curated: the right amount, on appropriate topics, put into context, with the right annotations – in other words, we want to pass on our knowledge. And the same is true for AI or machine learning models.

They need curated, smart data to become what we want them to be: models that can find and use the most impactful data faster to learn more accurately how the world works and provide meaningful insights to us. That way, we and our businesses can really use these tools and all that data to move towards truly evidence-based action instead of just spending our days guessing.

Laura: Ok, so I get why we need this smart data, but I’m still not quite sure what it is…

JJ: Smart data is data that’s based on standards and explicit semantics, i.e. on the actual meaning of concepts. It’s both machine- and human-interpretable, linkable, contextualized, and reusable. It’s neither redundant, unnecessary, nor duplicate. And its value is that it can be used directly to answer specific business needs or to accomplish pre-defined outcomes.

Laura: Good, so if we want our business to become data-driven, what do we need?

JJ: The first thing you need is a well-designed data strategy. Collecting data is not about filling a magic pot and expecting whatever comes out to grow your business. Think about what you want to do with the data first. What information is valuable to your business, what do you need to know for your decision-making processes? In other words, what is smart data for you? This will determine what data you collect and how, your data architecture, and the team you need to get it done.

Laura: So use the outcome to decide the income.

JJ: Exactly! Smart data needs to be right at the start of the design process; this really can’t be stressed enough. If you just start collecting data without a good strategy, you’re very likely going to end up paying a lot of money for a highly skilled data team to spend around 80% of their time cleaning and renaming JSON files. And just think about the storage for possibly terabytes of unproductive and redundant data. On top of being really expensive, it’s not exactly environmentally friendly…

Laura: Ok, I’m sold. So how do we get smart data or how do we convert big data into smart data?

JJ: As I mentioned before, you need to have a clear understanding of what your individual smart data looks like from the get-go. And go for complete control of your data structure. Then you need to find the right data sources and aim to really collect only the data that’s truly relevant to your use case. Most raw data from these data sources will need to be cleaned, verified, and standardized, so you should make sure to design your data ingestion with a clear vision of your input data in mind. You’ll want to automate this as much and as well as possible. You’ll want to add filters and annotations to the data too. The crucial step, though, is contextualizing the data.

Laura: Meaning?

JJ: Suppose you have a car. One thing you’ll want to measure is the motor temperature because this will give you information on how that motor is doing – you definitely don’t want your engine to overheat. But rising temperature alone does not necessarily imply impending doom and engine failure. Higher speeds cause higher temps, too. So contextualizing temp data with speed data can offer more insight. And if your engine really does overheat, you’ll probably want to know why so that you can fix it. So you could add data from sensors in the cooling system, maintenance history, etc. All these additional layers of data will give you a multi-dimensional context along with useful insights to help diagnose issues and correct unfavorable behavior. And the more dimensions you have, the more relationships you’ll see and the more context you can create – giving richer, more meaningful insights.
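The motor example can be turned into a small rule comparison: a temperature-only alert versus one that takes speed into account. This is a minimal sketch under loudly stated assumptions: the linear relation between speed and normal temperature and all constants (`BASE_TEMP_C`, `TEMP_PER_KMH`, `MARGIN_C`, the 95 °C limit) are invented for illustration, not real engine values.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    speed_kmh: float
    temp_c: float


# Hypothetical model: normal engine temperature rises linearly with speed.
# All constants are illustrative assumptions, not real engine data.
BASE_TEMP_C = 85.0
TEMP_PER_KMH = 0.1
MARGIN_C = 10.0


def naive_alert(r: Reading, limit_c: float = 95.0) -> bool:
    """Temperature-only rule: alert whenever temperature passes a fixed limit."""
    return r.temp_c > limit_c


def contextual_alert(r: Reading) -> bool:
    """Alert only if temperature is well above what the current speed explains."""
    expected = BASE_TEMP_C + TEMP_PER_KMH * r.speed_kmh
    return r.temp_c > expected + MARGIN_C


highway = Reading(speed_kmh=120, temp_c=100)  # hot, but explained by speed
city = Reading(speed_kmh=30, temp_c=105)      # hot at low speed: suspicious

print(naive_alert(highway), contextual_alert(highway))  # True False
print(naive_alert(city), contextual_alert(city))        # True True
```

Adding further layers, such as cooling-system readings or maintenance history, would extend the same idea with more context dimensions.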

Laura: Sounds great in theory, but how does it work in the real world? How are you supposed to track down all the relationships for each datapoint?

JJ: Definitely not by hand! Basically, what we’re talking about here is saving data as knowledge as opposed to simply saving information. So your contextualization tool has to somehow or other be able to draw on knowledge. In our motor example, this tool needs to know what all the different factors like temperature, speed, cooling elements are and how they interact with each other. And because the tool is just a mindless tool that knows only what we tell it, we need to feed it expert knowledge in the form of bits and bytes. So all those different data tags need to be mapped into a kind of model of interconnected dots that the tool can read and act on. Or in more fancy terms, the application needs to have access to a knowledge representation in the form of a domain-specific knowledge graph – or ontology.

Laura: Ontologies! We spoke a lot about them in the last episodes of this podcast. Tell me more… how does this combination of humans and machines work?

JJ: Well, as I said, they are a way of representing a certain area of knowledge in a way that machines can use it. The more “intelligent” – and I really mean that in inverted commas – your application needs to be, the more important it is to have a good representation of the knowledge it needs. In these kinds of applications, knowledge is typically represented as a directed graph. This is a collection of nodes, or dots, representing the individual concepts in your domain of interest, for instance temp, speed, cooling elements and the like in the motor example, and edges, or lines, connecting the nodes. The edges represent the relations between the concepts. And the graph, so, the collection of nodes and edges, is directed. This means that the edges have a direction (think of an arrow), because the relation typically doesn’t go both ways. For instance, increasing speed leads to rising temperature, but rising temperature doesn’t necessarily mean the speed has or will increase. The application can then walk along this graph, learning the concepts and how they relate to each other.
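The directed graph described here can be sketched as an adjacency list. The concepts and relations below are an illustrative toy version of the motor example, not an excerpt from any real ontology; the walk is a plain breadth-first traversal, and it shows that direction matters: speed leads to temperature, but not the other way around.

```python
from collections import deque

# A tiny directed knowledge graph for the motor example (illustrative only).
# Each edge reads: source --relation--> target.
EDGES = [
    ("speed", "increases", "motor temperature"),
    ("cooling system", "reduces", "motor temperature"),
    ("motor temperature", "can cause", "overheating"),
]


def build_graph(edges):
    """Turn (source, relation, target) triples into an adjacency list."""
    graph = {}
    for source, relation, target in edges:
        graph.setdefault(source, []).append((relation, target))
        graph.setdefault(target, [])  # make sure every concept is a node
    return graph


def reachable(graph, start):
    """Walk the directed edges breadth-first; return every concept reachable from start."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for _relation, target in graph.get(node, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen - {start}


graph = build_graph(EDGES)
print(sorted(reachable(graph, "speed")))              # ['motor temperature', 'overheating']
print(sorted(reachable(graph, "motor temperature")))  # ['overheating']
```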

Laura: And where do humans come in?

JJ: Someone has to build this knowledge representation! There are a lot of knowledge graphs or ontologies out there that are based solely on machine learning and other automation techniques. But generally speaking, the outcome with these is typically what you’d expect from the blind leading the blind. The really good ones are primarily hand-curated by a team of subject matter experts – humans! – with some support from supervised machine learning systems, say, to help integrate new or changed real-world data. Like our job and skills ontology here at JANZZ.technology!

Laura: So – maybe following with the JANZZ.technology example – how do the domain experts help? Why are human-curated ontologies better than machine-generated ones?

JJ: Unlike machines, human subject matter experts are aware that there are such things as subtleties and grey areas, ambiguities, implicit skills, typos, etc. For a job and skills ontology, these experts can go and research an obscure new term like “data ninja” to work out what kind of job this is, whether it corresponds or relates to one that’s already in the ontology, what skills you need for that job, and so on. Of course, machine learning algorithms can be trained to work out some of this too, but so far, the quality is simply not comparable.

Laura: Why do you think that is?

JJ: One of the key reasons is that machine learning is based on statistics. So it can only ever be about recognizing patterns and guessing the statistically most probable meaning. This can go comically wrong. Take, for instance, the word pen. The most common use of this word is as a writing utensil. So what do you think a machine learning system would do with the sentence “the sheep is in the pen”? Even a 3-year-old child would get the right meaning immediately. That’s because we humans have common sense and common knowledge and can make use of this to actually understand the meaning. And remember, ontologies are about representing knowledge. You simply can’t have knowledge without meaning.
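A deliberately simplified sketch of the “sheep in the pen” problem, assuming a frequency-only baseline that ignores context. Real statistical models do use context, so treat this as a caricature of the failure mode JJ describes. The sense frequencies in `PEN_SENSES` and the context words in `COMPATIBLE` are invented for illustration; the second function shows how even a tiny amount of encoded commonsense knowledge changes the answer.

```python
# Hypothetical sense inventory for "pen" with made-up relative frequencies.
PEN_SENSES = {"writing_utensil": 0.9, "animal_enclosure": 0.1}

# Hypothetical commonsense facts: context words that fit each sense.
COMPATIBLE = {
    "animal_enclosure": {"sheep", "pig", "cattle", "farm"},
    "writing_utensil": {"ink", "write", "paper", "signature"},
}


def most_frequent_sense(senses):
    """Purely statistical baseline: always pick the most probable sense."""
    return max(senses, key=senses.get)


def knowledge_based_sense(senses, sentence):
    """Prefer a sense whose known context words appear in the sentence."""
    words = set(sentence.lower().replace(".", "").split())
    for sense in senses:
        if COMPATIBLE.get(sense, set()) & words:
            return sense
    return most_frequent_sense(senses)  # fall back when no knowledge applies


print(most_frequent_sense(PEN_SENSES))
print(knowledge_based_sense(PEN_SENSES, "The sheep is in the pen."))
```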

Laura: But ontologies are not a new thing, right?

JJ: Right, ontologies have been around in AI research for the last 40 years and they’ve had their ups and downs like many other approaches. But there are a lot of powerful ontologies out there for specific domains; for instance, the Financial Industry Business Ontology (FIBO) has been around for quite some time. We’ve been developing and improving our job and skills ontology here at JANZZ.technology for the past decade now. And there are numerous other ontologies for healthcare, geography, systems engineering, you name it. I’d say one of the main drivers of the current comeback of ontologies was probably the launch of Google’s Knowledge Graph in 2012, which is actually not quite the same thing. Still, these days all the big players are using ontologies in all sorts of domains to organize and contextualize their information to boost performance: social media networks, streaming providers, electronics manufacturers, supply chain logistics companies. They’re even used in forensic accounting. These organizations have caught on to the fact that big data alone is not enough, and that, for successful AI-based technologies, smart data is simply indispensable. And this is where ontologies come in: by representing contextualized information, they’re basically smart data management systems. This can drive performance by feeding the machines with real knowledge in readable form to produce meaningful and actionable insights.

Laura: So, in a nutshell, start using applications based on knowledge graphs if you want to unlock the full potential of your data.

JJ: Exactly.

Laura: Thank you, JJ, for joining me today!

JJ: My pleasure, Laura.

Laura: Thank you for listening and goodbye!


Host: Laura Flamarich

Season 1, Episode 5

Education Zones – Bridging the gap between candidate education and employer requirements in online job matching

Anyone who uses online job search services has undoubtedly encountered seemingly obvious or even ridiculous mismatches. Many of these mismatches are based on the inadequate handling of education-related information, as JANZZ ontologist Timothy Sperisen explains in this episode.


Contributor: Timothy Sperisen, Ontology Maintenance & Support, JANZZ.technology


Jingyu: Hi, welcome to another episode of the JANZZ podcast. Today we have invited Timothy Sperisen, co-author of one of our white papers, “Education zones – Bridging the gap between candidate education and employer requirements in online job matching”.

Hi Tim, welcome!

Tim: Good day everyone. Glad to be here.

Jingyu: Tell us about your education and what you do at JANZZ.

Tim: After finishing my Master of Accounting and Finance at the University of St.Gallen and adding the teacher certificate at the University of Zurich, I have been working at JANZZ for the last three and a half years, curating the ontology with an emphasis on education.

I have looked after the education sections in our occupation and skill branches and taken care of the education branch itself in our ontology. The latter in particular is an ongoing process, because new educations keep coming in that need to be categorized and classified correctly.

Jingyu: What motivated you to write this paper?

Tim: As JANZZ wants to stay ahead of the competition, we have to tackle the challenges that haven’t been tackled before. The matching process is full of such challenges, for example, the proper management of skills and soft skills in the ontology or the matching algorithm based on the ontology – those are huge challenges, out of my league.

So, since I come from an educational background, I focused on solving a challenge related to education. As a consequence, we had a go at one of the oldest unsolved issues in job matching: the education gap.

Jingyu: What kinds of challenges or issues did you observe?

Tim: Let me ask you a question in return.

Jingyu: Ok.

Tim: Have you ever found your exact education wanted in a job ad?

Jingyu: No, not that I can remember.

Tim: So, that is where we start. Companies tend to use very general phrases when it comes to the desired education, whereas the CVs of job applicants show very clear and precise titles of their completed educations. So it is very difficult, nigh impossible, to match job requirements from the employer’s side and educations from the potential employee’s side one to one. They literally don’t match.

Jingyu: So the problem you are tackling lies between candidate education and employer requirements. Could you give an example, so we can better understand how this mismatch occurs?

Tim: There are two classic mismatches: on the one hand, the too-wide requirement and, on the other hand, the too-narrow requirement. Both are a challenge for job-matching AI.

Let’s make it more concrete and start with “too wide”: if a potential employer just asks for “a bachelor’s degree”, AI may serve them a Bachelor of Theology as well as a Bachelor of Chemistry or IT as potential candidates, because they’re all Bachelors, and that is all that is being asked for.

Now for the “too-narrow” issue: if a potential employer asks for “a degree in tourism or related field”, AI may serve them only Bachelors of Tourism as potential candidates and, because it relies on keyword matching, ignore many others, since plenty of degrees that are useful in tourism are not titled anything like tourism.

Jingyu: I see. To address these challenges, JANZZ.technology has created the concept of Education Zones. Can you explain to us how Education Zones work?

Tim: In our ontology, we add an additional classification to every education. These classifications allow us to mark any education (like Bachelor of Computer Science) as part of one (or several) Education Zones, but not part of other specific education zones.

Jingyu: What criteria do you follow to divide the Education Zones?

Tim: If you look at the “Tourism” industry, it quickly becomes clear that many paths lead to a career in Tourism. Tourism is not a classic field of education but a multifaceted zone that unites many different educational backgrounds: there are no entrance exams and, unlike in some medical fields, no specific degree is required. So our aim is to capture everything that could be considered a valuable entry point into work in tourism.

Jingyu: So, following this example, which educations get overlooked even though we should consider them a valuable way into tourism?

Tim: For example, there are Cooks, Hospitality Managers, Travel Agents. None of them has the word tourism in their degree or education titles, but they all end up in the tourism sector, so it has to be our aim to make these connections visible for the JANZZ AI and avoid the errors connected with keyword matching. This way, JANZZ can construct Zones filled with different educations related to fields like “Tourism” or “Computer Science” and thereby prevent its matching process from proposing educations outside this zone for the offered job.
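The difference between keyword matching and zone-based matching can be illustrated with a small sketch. The education titles and zone assignments in `EDUCATION_ZONES` are hypothetical examples, not entries from the JANZZ ontology, and `keyword_match` and `zone_match` are illustrative names. The point is only that a curated zone label reaches educations whose titles never mention “tourism”.

```python
# Hypothetical, hand-curated zone assignments; an education may belong to several zones.
EDUCATION_ZONES = {
    "Bachelor of Tourism": {"Tourism"},
    "Culinary Arts Diploma": {"Tourism", "Gastronomy"},
    "Hospitality Management Degree": {"Tourism"},
    "Travel Agent Certificate": {"Tourism"},
    "Bachelor of Theology": {"Theology"},
    "Bachelor of Computer Science": {"Computer Science"},
}


def keyword_match(keyword, educations):
    """Naive baseline: keep educations whose title contains the keyword."""
    return [e for e in educations if keyword.lower() in e.lower()]


def zone_match(zone, educations):
    """Keep educations that curators have placed in the requested zone."""
    return [e for e in educations if zone in EDUCATION_ZONES.get(e, set())]


all_educations = list(EDUCATION_ZONES)
print(keyword_match("tourism", all_educations))  # too narrow: only the tourism degree
print(zone_match("Tourism", all_educations))     # also cooks, hospitality, travel agents
```

For the “too wide” case, `keyword_match("bachelor", all_educations)` returns the theology, tourism and computer science degrees alike; filtering that result through a zone keeps only the relevant ones.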

Jingyu: So to label an education, you have to add the label on both the CV side and the job description side, is that correct?

Tim: I think the answer to that question is yes. We can add the information to both parts (CV and job description) via our ontology that works in both directions and gathers the information coming from both sides.

The concept of Education Zones is most beneficial when companies want to use general phrases like “educational background in …”, which is a very popular case. And at the same time, they are well-suited to help the matching algorithm. The Education Zones provide a more effective categorization of educations, creating the basis for better and more accurate matches.

Jingyu: Very impressive. When you were creating these Education Zones, which part did you find the most challenging?

Tim: The creation of the zones and the exact categorization of the corresponding educations. As you know, JANZZ’s ontology already provides a huge and very well-organised data set consisting not only of educations and occupations, but also of skills, industries, authorizations, licences, specialisations and much more. But at the same time, we have to keep pushing by developing more instruments and connections to increase the quality of our matching. The more information the ontology contains, the better the instruments we need to extract the maximum out of it in terms of matching. Education Zones are only one of many approaches we use at JANZZ in our mission to continuously improve job matching.

Jingyu: Is creating the zones and categorizing the educations exactly a purely manual process?

Tim: The creation itself is purely manual data entry, yes. The implementation later requires more complicated integrations and calculations, but for a closer look at this, we would need another entire podcast. Through Education Zones, we generate helpful additional information that can be used as a filter or a funnel in order to prevent mismatches or make matches that would normally be overlooked.

Jingyu: Are you in the process of adding more features to the Education Zones, or of improving matching with respect to education?

Tim: Right now, the biggest challenge is making sure that we organise the huge variety of educations as well as possible. This is harder than it sounds, because even within educations, the same word in the same language can mean something different in another country. The best example is the Spanish “Bachillerato”, which can mean very different things depending on the country.

So, the challenges in the education sector are never ending. We strive to continuously integrate all possible concepts and to put them into the right relations so our matching results can be the best possible.

And to add one more point, experience and education sometimes overlap, especially for someone who left university more than 20 years ago. In this case, education doesn’t matter that much anymore and experience should be taken into consideration.

Jingyu: What do you see as the benefit of having a concept like Education Zones?

Tim: I believe it gives JANZZ an advantage in matching, because we have thought of things and twists that the competition hasn’t.

Jingyu: Thank you Tim for joining us today.

Tim: Thank you to you.

Jingyu: You can find the link to the white paper in the description of this episode or on our website. Thank you for joining us today. We encourage you to keep listening to this podcast to learn more about the challenges of working with skills, competencies and qualifications, among other topics. Subscribe and see you soon!


Host: Jingyu Sun


Covered in this episode:

JANZZ.technology white paper: Education Zones – Bridging the Gap Between Candidate Education and Employer Requirements in Online Job Matching.

Season 1, Episode 4

How to generate meaningful skills and job data.

If you are at a loss on how to generate useful data for job searches, this episode is for you. Join Alejandro Jesús Castañeira Rodríguez, Principal Data Scientist, as he delves into the complexities of mastering occupation data.


Contributor: Alejandro Jesús Castañeira Rodríguez, Principal Data Scientist, JANZZ.technology


Laura: Hi everyone, welcome to a new episode. Today’s topic is how to generate useful data for job matching, and to tackle it I have with me Alejandro Jesús Castañeira Rodríguez, our Principal Data Scientist.

Alejandro: Hi everyone, thank you for having me.

Laura: At JANZZ.technology, we are dealing with occupation data every day and Alejandro is our go-to person when it comes to innovative approaches and data analysis.
Alejandro, please, tell us a bit about your job… what do you do as a principal data scientist at JANZZ.technology?

Alejandro: As a data scientist, I work through all stages of the product development cycle, starting with data collection. We collect our own training data for our deep learning models, and we do this in a Swiss way: specialized domain curators collect data on a daily basis, and I help organize and monitor this process.

Once the data is collected, I clean and normalize it, and after that, I perform the training of our Deep Learning models for the Human Resources domain. A big part of the work consists of constantly keeping our technology on par with the current state of the art, so I constantly research new deep learning models and test new architectures. If they work better than the ones we use, we update them.

I also work in other areas of the development cycle, such as the deployment and maintenance of our services. Also, some of our B2B customers have very specific and complex requirements, so I assist them by developing customized APIs that meet their needs.

Laura: Our topic today is how to generate useful data for job matching, can you give us an example of useless data?

Alejandro: For example, the raw text of a job description, a resume or a profile from an HCM system. If it is not structured in a usable way, it is just text, so if you use data like this for comparison, you are basically just comparing text against text. You also get a lot of noise, because many of the strings are not relevant for the comparison.

Laura: Why isn’t a text-only, or text-based, approach useful when it comes to comparison?

Alejandro: Well, sometimes completely different words are used to describe the same occupation, education or skill, and they all refer to the same concepts; so these words are semantically the same, or at least related. In a text-only approach, these kinds of relationships are lost. There are also many partial relationships that a text-based approach might overlook. For example, an occupation such as software developer is related to Java frontend specialist, even though they are not exactly the same. Java frontend specialists are more specific, but the two occupations still have some competencies in common.

Laura: Following this example Alejandro, what innovative approach did JANZZ.technology come up with to capture such relationships of common skills?

Alejandro: Well, Java frontend specialists and software developers are both concepts in our knowledge graph or ontology, as we call it. So we can estimate the degree of similarity of these two concepts by the relation that they share on the JANZZ knowledge graph. Then we transform the job advertisement, candidate’s resume or a profile from a human capital management system, into a structured and explainable representation composed of ontology concepts.
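One simple way to turn “the relations two concepts share” into a number is Jaccard similarity over their neighbours in the graph. The speaker does not say which measure JANZZ uses, so this is a swapped-in technique applied to a hypothetical graph fragment (`RELATED_SKILLS`). It also contrasts the result with a text-only word overlap, which scores the two job titles as completely unrelated.

```python
# Hypothetical knowledge-graph fragment: each occupation concept maps to the
# skill concepts it is linked to. Real ontologies have many more relation types.
RELATED_SKILLS = {
    "software developer": {"programming", "version control", "testing", "Java"},
    "java frontend specialist": {"programming", "version control", "Java", "JavaScript", "CSS"},
    "cook": {"food preparation", "food hygiene"},
}


def jaccard(a, b):
    """Shared elements divided by all elements (0.0 = nothing shared, 1.0 = identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def concept_similarity(c1, c2):
    """Graph-based: compare the sets of concepts each occupation is related to."""
    return jaccard(RELATED_SKILLS[c1], RELATED_SKILLS[c2])


def word_overlap(t1, t2):
    """Text-only baseline: compare the words in the two titles."""
    return jaccard(set(t1.lower().split()), set(t2.lower().split()))


print(concept_similarity("software developer", "java frontend specialist"))
print(word_overlap("software developer", "java frontend specialist"))
```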

At JANZZ.technology, we have developed a structure where we store competencies such as occupation, skills, education, etc. in a vector we call a JANZZ (don’t confuse it with the name of our company). We can compare the vectors between candidates and jobs to get a match score. It is good to note that in the JANZZ vector, we only store the job-related attributes, so other factors such as gender, nationality, ethnicity, etc. are completely discarded.
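A minimal sketch of comparing such vectors, assuming a sparse binary vector keyed by field and concept, with cosine similarity as the match score. Both choices are assumptions; the actual vector format and scoring are not described here. Because `to_vector` only reads the hypothetical `JOB_RELATED_FIELDS`, attributes such as gender or nationality cannot influence the score.

```python
import math

# Hypothetical list of job-related fields; everything else on a profile is ignored.
JOB_RELATED_FIELDS = ("occupations", "skills", "educations")


def to_vector(profile):
    """Turn a profile into a sparse binary vector of ontology concepts.

    Only job-related fields are read, so attributes such as gender,
    nationality or ethnicity never enter the vector.
    """
    vec = {}
    for field in JOB_RELATED_FIELDS:
        for concept in profile.get(field, []):
            vec[f"{field}:{concept}"] = 1.0
    return vec


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0


job = {"occupations": ["software developer"], "skills": ["Java", "testing"]}
candidate = {"occupations": ["software developer"], "skills": ["Java"], "gender": "f"}
print(round(cosine(to_vector(job), to_vector(candidate)), 3))
```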

Laura: It’s interesting that you mention that, can you clarify why it’s so important that the matching score doesn’t take those irrelevant factors into account?

Alejandro: They aren’t used for matching because at JANZZ.technology, we believe that anonymized procedures at the very beginning of the job application process can greatly reduce discrimination and bias and improve equal opportunities. Transparency is paramount to ensuring that AI is not biased, and we see the growing importance of this reflected in the increasing number of EU regulations and guidelines on explainable AI. The regulations focus strongly on avoiding unfair bias when AI software is deployed in production. In this sense, AI systems designed to replace or guide human decisions should be fully explainable, and each prediction these systems make should be fully traceable and transparent.

One of the important rights established by the EU is the right to explanation. This primarily refers to the right of an individual to be given an explanation for decisions taken by an algorithm that significantly affect them legally or financially. At our company, we have developed our matching system to be fully compliant with all the EU regulations: it is completely explainable, transparent and interpretable, and you can describe in granular detail how a matching score is produced, which skills, educations, etc. a candidate is missing for a given position, how many years of experience are still required, and so on.
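To make “explainable” concrete, here is a sketch of a match explanation that records every gap instead of returning only a score. The `MatchExplanation` fields and the `explain` helper are hypothetical and simply mirror the items Alejandro lists (missing skills and educations, years of experience still required); they are not the JANZZ report format.

```python
from dataclasses import dataclass, field


@dataclass
class MatchExplanation:
    # Hypothetical structure; the actual JANZZ report format is not described here.
    matched_skills: list = field(default_factory=list)
    missing_skills: list = field(default_factory=list)
    missing_educations: list = field(default_factory=list)
    years_still_required: int = 0


def explain(job, candidate):
    """Compare job requirements with a candidate and record every gap explicitly."""
    job_skills, cand_skills = set(job["skills"]), set(candidate["skills"])
    return MatchExplanation(
        matched_skills=sorted(job_skills & cand_skills),
        missing_skills=sorted(job_skills - cand_skills),
        missing_educations=sorted(set(job["educations"]) - set(candidate["educations"])),
        years_still_required=max(0, job["min_years"] - candidate["years"]),
    )


job = {"skills": ["Java", "SQL", "testing"], "educations": ["Bachelor of Computer Science"], "min_years": 5}
candidate = {"skills": ["Java", "testing"], "educations": ["Bachelor of Computer Science"], "years": 3}
print(explain(job, candidate))
```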

Laura: How does JANZZ.technology transform job ads, resumes, or a profile from an HCM system into this structured and explainable form?

Alejandro: To generate this structure, JANZZ has developed its own parsing system that is able to process several document types in multiple languages and transform them into structured information. This system uses a combination of Natural Language Processing methods and several Deep Learning models that have been specifically trained to process human resources data.

Laura: Alejandro, you mentioned using machine learning in data processing, can you tell us more about it?

Alejandro: Yes, we apply deep learning and AI methods in some of our processes, but I think it is important to note that our main objective with this technology is to recommend, say, better candidates for vacancies or employees for projects or training initiatives within enterprises in much less time, with the final hiring decision still made entirely by humans. So, the main objective of our deep learning and AI methods isn’t to get rid of human intervention but to empower it. For this purpose, at JANZZ.technology we have developed a unique and patented end-to-end recommender system for the HR field.

Laura: Can you explain a little more about this end-to-end recommender system?

Alejandro: Sure, this system comprises a series of Natural Language Processing techniques and Deep Learning models, to achieve a fully automated process that will propose semantically related and explainable suggestions to job seekers and companies in real-time. The architecture of the system combines the JANZZ Ontology with several NLP techniques like Named Entity Recognition, Text Classification, and Entity Relationships based on HR data collected by Domain Specialized Curators and it also includes Language Detection, specific Text Preprocessing and Language Models pre-trained over domain-specific data, which allows the system to extract and process more than 50 different characteristics from job postings and resumes such as occupations, skills, education, etc., with various levels of granularity across multiple languages.

Laura: And just to clarify, all this training data is gathered by people?

Alejandro: Yes, as I said before all data used to train these models is completely collected by hand by our domain specialized curators on a daily basis, so our models are constantly updated and improved.

Laura: Which JANZZ application are you most involved in?

Alejandro: I am mostly involved in our parsing services. Resumes and job descriptions, as well as performance reports and employee profiles in global HCM and corporate training solutions, can appear in multiple formats and writing styles. So we have to structure this information in a convenient way, which is why we have developed our own parser, available in multiple languages for job descriptions and resumes. The structured records can then be indexed into a database for statistical analysis, identifying potential trends, etc. I think this product saves a lot of time for recruiters and HR specialists who have to filter hundreds of resumes for a single job offer, or find exactly the right person with the right skills out of all employees worldwide for a specific training… and it also helps companies to store their data in an organized way.

I also work on the development of the JANZZ classifier. This is a family of tools to classify, standardize and annotate complex sets of occupation-related data, such as job title, skills, function or industry. We cover over 100 official occupation classification systems, such as O*NET, ISCO-08, ESCO, NOC, KLdB, ROME or SSOC 2015, and more than 60 standardized reference systems. In addition, I am also in charge of developing several customized B2B solutions for multiple JANZZ customers.
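A very small sketch of what classifying a free-text job title into an official classification can look like: normalize the title, then look it up in a synonym table that maps to an ISCO-08 unit group. The table `TITLE_TO_ISCO08` and both helpers are hypothetical and cover only two codes (2512 Software developers, 5120 Cooks). A production classifier would draw on the ontology and trained models rather than a fixed dictionary.

```python
import re

# Hypothetical synonym table: normalized job titles -> ISCO-08 unit group code.
TITLE_TO_ISCO08 = {
    "software developer": "2512",  # Software developers
    "cook": "5120",                # Cooks
    "line cook": "5120",
}


def normalize(title):
    """Lower-case, drop bracketed markers like '(m/f/d)' and collapse whitespace."""
    title = re.sub(r"\(.*?\)", " ", title.lower())
    return " ".join(title.split())


def classify(title):
    """Return the ISCO-08 code for a title, or None if the title is not yet curated."""
    return TITLE_TO_ISCO08.get(normalize(title))


print(classify("Software Developer (m/f/d)"))
print(classify("Data Ninja"))  # unknown titles need a human curator
```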

Laura: Speaking about machine learning/deep learning and natural language processing… How do you think these technologies will evolve?

Alejandro: Deep learning is a rapidly changing field at the intersection of computer science and mathematics. The purpose of machine learning is to teach the computer to do various tasks based on the data. Deep learning has become an important area of artificial intelligence because of its success in many different fields.

The natural language processing field has also seen considerable growth over the past few years, owing to affordable, scalable computational power, the increasing digitization of data, and the merger of NLP with deep learning (DL) and machine learning (ML).

I’m confident that the future of deep learning and NLP will be every bit as diverse as its past and we will see some truly fascinating, and quite frankly world-changing advances in the coming years — it’s a very exciting space.

Laura: Why do JANZZ’s ML and DL models achieve the best results in the market?

Alejandro: I think that one of the most important reasons why our solutions achieve such good results is data quality. We manually collect data on a daily basis to enrich our ontology and to train our deep learning models. In the AI and deep learning field, your systems will only be as good as the data you have trained them on, so JANZZ’s approach of strictly hand-selecting data makes a big difference, as it ensures that our systems are based on the highest data-quality standards.

Laura: Which NLP & AI problems in the HR field have you already solved?

Alejandro: One of the biggest problems in today’s HR sector is having information structured in a connected and usable way. We have solved this problem at JANZZ by creating the most comprehensive ontological database of HR data in the world, which is available in multiple languages and involves millions of nodes and relations. The ontology can be used as the basis for many different tasks such as matching, statistical analysis, typeaheads and auto-completion, skills suggestions, etc. We have also been able to solve the problem of precisely extracting information from job postings, candidate resumes, HCM profiles, etc. For this we have developed the parser, a system involving NLP and deep learning models trained on supervised data to extract occupations, educations, skills, experiences, etc. The parser is also capable of assigning different granularity levels to the extracted entities, like the education level for a certain college title, the level of proficiency in any given skill, the number of years of experience, etc. Additionally, the parser can solve the problem of sectioning job descriptions into several regions of interest like company description, requirements, duties, offer, application, etc. This is especially helpful for companies that want to store and display their job offers with a consistent structure.
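Sectioning a job ad into regions of interest can be roughly approximated with heading rules. The parser described here uses trained models; the sketch below swaps in a simple keyword heuristic, and the heading lists in the hypothetical `SECTION_HEADINGS` are English examples only, so it will miss anything phrased differently.

```python
# Hypothetical heading keywords per region; a trained model would generalize far beyond these.
SECTION_HEADINGS = {
    "company": ("about us", "who we are"),
    "duties": ("your tasks", "responsibilities"),
    "requirements": ("your profile", "requirements"),
    "offer": ("we offer", "benefits"),
    "application": ("how to apply", "apply"),
}


def section_job_ad(text):
    """Split a job ad into regions: heading lines switch the region, other lines join it."""
    sections, current = {}, "preamble"
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        heading = stripped.lower().rstrip(":")
        label = next((name for name, keys in SECTION_HEADINGS.items() if heading in keys), None)
        if label:
            current = label
        else:
            sections.setdefault(current, []).append(stripped)
    return sections


ad = "Software Developer\nAbout us:\nWe build HR software.\nYour tasks:\nWrite code.\nYour profile:\n3+ years of Java."
print(section_job_ad(ad))
```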

Laura: Alejandro, thank you very much for joining us today. It was a great pleasure talking to you. We hope to talk with you again on some other topics related to AI, ML/DL and NLP.

Alejandro: Thanks for having me. I am glad to join you anytime.

Laura: In the next episode we will talk about the challenges of comparing the education of candidates with employer requirements in online job matching. Thank you for listening and goodbye!


Host: Laura Flamarich

Season 1, Episode 3

The challenges of translating multilingual knowledge graphs

Understand why it is essential to have a strong team of translators like Sara Noriega, specialized in the HR field, to supervise and train all deep learning (DL) processes at JANZZ.technology.


Contributor: Sara Noriega Turatti, Ontology Maintenance & Support, JANZZ.technology


Laura: Hi everyone, welcome to a new episode. Today we will speak about the challenges we face when translating knowledge graphs, also called ontologies, and for that I have here with me one of our linguistic specialists, Sara Noriega. Welcome to the studio!

Sara: Hello, I’m very happy to be here today.

Laura: For context, Sara’s background is in translation and interpreting, and she has also worked as a Spanish teacher. Sara, I asked you to come today to speak about the challenges of building a multilingual knowledge graph in the fields of education, skills and job data.
Let’s start with the basics. Translation and interpreting: what is the difference?

Sara: The main difference is that interpreters convert messages from one language to another orally, while translators do so in a written text.

Laura: So in our multilingual ontology, where we have millions of related concepts, what are we doing? Translating or interpreting?

Sara: We are translating. Keep in mind that the main skill of translators is the ability to understand the source language and the culture of the country where the text originated, and to reflect all this information in the target language without losing the meaning, while still making sense to the culture of the target country. For example, suppose you have to translate a text from a language and culture that has at least nine distinct synonyms for the same word. In English, you would have to either describe them or leave them in the original language and then explain what they mean.

Laura: Oh, do you have an example in mind?

Sara: The Greek language, for instance, has been spoken in the same geographical region for at least 4,000 years, so it’s only natural that it has a very rich vocabulary. For example, the word “love” has at least nine distinct synonyms conveying different hues of the concept. The word “agape”, for example, expresses a selfless and idealized kind of love, while “eros” suggests feelings of romantic, passionate love. On the other hand, “storge” indicates a kind of protective, kinship-based love, especially between family members, and “philia” signifies the deep and authentic love towards a friend.

Laura: I see, then as a translator, you have to find the most natural and perfect translation possible.

Sara: Yes. Translators also make use of different materials, like dictionaries, translation software, etc., and have a very good knowledge of the target language, usually their mother tongue.
On the other hand, coming back to the comparison we started before, interpreters have no material to rely on; they simply use their mental skills and resourcefulness. They therefore translate the message into the target language, but not necessarily in the same way and style as the original, as they always prioritise the meaning of the message.

Laura: From my perspective, the difference is clear, as their role is not the same, nor are their language skills trained in the same way. How do we handle translations here at JANZZ.technology, Sara?

Sara: Well, at JANZZ we use translation on a daily basis in our ontology, in other words, a large database covering up to 57 different languages. The main function of our ontology is to organise and categorise all terminology (occupations, skills, educations, etc.) so that it is structured into specific branches and families in order to guarantee a precise match on our platform.

Laura: How is it possible to work with so many languages at the same time? Do you speak that many languages?

Sara: Ha, ha, no. I wish! But, as you know, we are a multicultural and multilingual team that translates these terms into the different languages available. This seems easy at first glance but, precisely because of the cultural problem mentioned before, we sometimes have to deal with a term widely used in Chinese, for example, that has no translation into Spanish.

Laura: What happens when we have region-specific jobs and we want to translate them?

Sara: In that case, it depends on each concept. Take, for instance, the job “percebeiro”, a typical profession in northern Spain and Portugal. It is a type of fisherman who fishes for barnacles, crustaceans that live on the rocks of the cliffs and beaches of these areas. In the UK this profession is called “barnacle fisherman”, so “percebeiro” is a single word in Spanish and Portuguese that has no one-word equivalent in English.

Laura: Oh this is so particular. I imagine there are so many jobs in the world that are specific to the region that we would never finish…

Sara: And it doesn’t end there; we also find translation problems with educations. A very clear example is the “bachillerato”. Its meaning varies across Spanish-speaking countries, but in Spain, for example, it refers to the last two years of high school, and not to a Bachelor’s degree, as it might seem at first glance in English.

Laura: Uh! We are opening Pandora’s box! Do you only apply translations to the ontology?

Sara: No, we also act as translators when teaching the artificial intelligence, not in the same way as with the ontology, but we help it to understand that, in the different languages it works in, concepts or segments should not always be taken literally; sometimes there are nuances that it must learn to distinguish.

Laura: What are the challenges you are facing at JANZZ?

Sara: The classification of professions and studies changes from country to country, in Europe and around the world in general, both in terms of the duration and type of training and in terms of the skills and tasks performed in the occupation. For instance, a carpenter who completed an 18-month apprenticeship in the UK will have a different skill set than a carpenter with a 4-year apprenticeship in Austria, even though both completed standardized training for the same skilled trade.

Laura: But Sara, let me ask you; if there is no universal consensus, how can you then classify such a variety of terms?

Sara: We must take into account the different common reference frameworks for qualifications and studies, such as the European Qualifications Framework (EQF) or the European Credit System for Vocational Education and Training (ECVET), as well as the classifications of occupations, such as ESCO (for Europe) or ISCO-08, which is used internationally. So the classification and translation of occupations, skills, educations and other terminology collected in our ontology is a real challenge for us.
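The “bachillerato” example suggests why a curated lookup should be keyed on the term and the country together, not on the word alone. The sketch below is hypothetical (`QUALIFICATIONS`, `lookup` and `is_false_friend` are illustrative names, and only two entries are included), using EQF levels as these qualifications are commonly referenced: the Spanish Bachillerato at level 4, a bachelor’s degree at level 6.

```python
# Hypothetical curated entries; keys pair a national term with its country,
# because the same word can denote different qualifications in different places.
QUALIFICATIONS = {
    ("bachillerato", "ES"): {"meaning": "final two years of upper secondary school", "eqf_level": 4},
    ("bachelor's degree", "GB"): {"meaning": "first-cycle higher education degree", "eqf_level": 6},
}


def lookup(term, country):
    """Return the curated entry for a term in a given country, or None if not curated yet."""
    return QUALIFICATIONS.get((term.lower(), country))


def is_false_friend(term_a, country_a, term_b, country_b):
    """True when two similar-looking qualifications sit at different EQF levels."""
    a, b = lookup(term_a, country_a), lookup(term_b, country_b)
    return bool(a and b and a["eqf_level"] != b["eqf_level"])


print(is_false_friend("Bachillerato", "ES", "Bachelor's degree", "GB"))
```

Unknown combinations return `None` rather than a guess, leaving them for a human curator to resolve.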

Laura: Uh-huh, it keeps getting more complicated. Why are ontologies and semantic technologies the solution to such challenges?

Sara: Because even though these international classification systems are very helpful, they are not 100% reliable in every language; our team often finds mistakes in the translations. Only a human expert curator can compare differences and similarities in education and qualifications across borders and language divides. The curation team can understand the correct meaning of a term thanks to their background knowledge and the context in which the term is used. A machine, of course, lacks this ability.

Laura: What if you find a term that cannot be translated? Does this happen?

Sara: Of course. The beauty and fascination of translation is precisely that it is not like maths; it is not a question of A + B equals C, so it is not all black and white. In translation, there are always different brushstrokes. The first example that comes to mind is “Au pair”. This is a French term for a young foreign person, typically a woman, who helps with housework or childcare in exchange for food, a room, and some money. This job title cannot be translated into English, so we would leave it in French.

Laura: Right, in Spain we also use the French term for this job. But what about nannies? Aren’t they doing the same job?

Sara: Yes, they share most of the skills but have different working conditions. However, what is really important for us is to understand that their main skills are almost the same, in order to provide an accurate match.

Laura: Alright, that makes sense. If I were a nanny, my skills would also be suitable for an “Au pair” job… Sara, one last question. You usually work on the English, Spanish and Catalan ontology. Do all languages face the same issues?

Sara: Oh, haha, no. In fact, I always laugh with my coworkers about how many differences we find in each language. For example, in Vietnamese and in Greek, a single word covers the occupations “consultant”, “counselor” and “advisor”. Or in Arabic, the occupation bayie alzuhur (baear alzuhur) is translated into English with two different words, “florist” and “flower seller”, which have different meanings in English.

Laura: I get the impression that we could fill many more episodes with the challenges our team is facing. Sara, thank you so much for your time and your explanations.

Sara: Thank you!

Laura: To sum up what we’ve learned today: ontologies and semantic technologies are extremely important (and basically the solution) for overcoming linguistic challenges. Now we can understand how important it is to have a powerful human team of translators like Sara, specialized in the HR domain, to supervise and train all the deep learning (DL) processes at JANZZ.technology. If you want to learn more about the big world of occupational data, subscribe so you don’t miss the next episode. Once again, thank you for listening! Goodbye!


Host: Laura Flamarich

Season 1, Episode 2

What does an ontology curator do?

What is an ontology or knowledge graph? Ontologist and terminologist Yeonjoo Oh helps us understand what they are and why they matter.

Contributor: Yeonjoo Oh, Ontology Maintenance & Support, JANZZ.technology

Laura: Hi everyone! Welcome to a new episode. Today, here with me, is Yeonjoo Oh.

Yeonjoo: Hi

Laura: I asked her to join me to explain what ontologists do. At JANZZ.technology, the ontology team currently consists of 40 people, and it’s growing… They come from all over the world, and together they speak more than 40 languages. Yeonjoo is not only part of this team but also has a technical background. She studied Multilingual Text Analysis and Computational Linguistics at the University of Zurich and has also worked as a terminologist. Anytime I need to understand something, I ask her…

When was the first time that you heard about ontologies?

Yeonjoo: The first time I heard about ontologies was when I took the “Semantic Analysis” course during my studies in Computational Linguistics. We had to read “Speech and Language Processing” by Dan Jurafsky and James Martin, and in chapter 15, I came across the term “ontology” for the first time. There, I learned that a set of categories or concepts is called a terminology, and that an ontology is a hierarchical organization that shows the relations among these concepts.

Laura: Was it so remarkable that you even remember the chapter in the book?

Yeonjoo: Well, I had to write a summary of that chapter for an assignment, and that’s probably why I remember it.

Laura: How would you explain what an ontologist is?

Yeonjoo: In the simplest terms, we are like librarians, and our ontology database is like a huge library. A library gives users access to books, and a librarian helps users find books that interest them. We ontologists guide our users through the ontology database of HR data. Every book has a classification code so that it can be sorted and organized properly, and the same applies to our ontology work. We constantly work on various occupation concepts that need to be classified, and we organize them according to ontology rules.

Laura: I love the example 🙂

Yeonjoo: And we are also teachers for artificial intelligence, because we select relevant content and information for the ontology and the parser to improve the accuracy and quality of matching.

Laura: Where do you get these occupation concepts from?

Yeonjoo: Job ads and CVs from everywhere, plus imported or mapped databases, official data collections, taxonomies, etc.

Laura: So what is the procedure when you find a new concept that is not in the database?

Yeonjoo: First, we find these concepts, which are marked as “not categorized”. The first thing we check is whether the concept can be merged into an existing one. If not, we have to consider how specific or generic the concept is in order to classify it precisely. We also check its similarity to other concepts.

Laura: How do you contribute to our JANZZ ontology?

Yeonjoo: Ontologies provide semantic modelling that can detect the underlying meanings and similarities in CVs and job descriptions. To achieve this, I contribute by creating the right structure among concepts and sub-concepts, by managing quality, and by keeping the multilingual ontology, or knowledge representation database, consistent.

Laura: What does a typical day look like for an ontology curator?

Yeonjoo: My daily tasks mainly consist of selecting concepts based on ISCO-08 codes, organizing the child terms related to those specific concepts, and ensuring that they are in the right hierarchical relations. The most important task is to check whether the terms are correct in the corresponding languages, including German, English, and the languages you are in charge of, in my case Japanese and Korean.

Laura: For those who are not familiar with the term ISCO-08, let me clarify that ISCO is the International Standard Classification of Occupations, created by the International Labour Organization (ILO). So the codes Yeonjoo was referring to are used to classify occupations.
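ISCO-08 is itself hierarchical: a 4-digit unit group code sits inside a 3-digit minor group, a 2-digit sub-major group and a 1-digit major group, and each parent code is a prefix of its children. A minimal sketch of how that chain of parent groups could be derived is below; the function name is hypothetical and this is only an illustration of the ISCO-08 code structure, not JANZZ’s tooling.

```python
LEVELS = ["major group", "sub-major group", "minor group", "unit group"]


def isco08_ancestors(code: str) -> list[str]:
    """Return the ISCO-08 group codes from the major group down to `code`.

    ISCO-08 has four levels (1, 2, 3 and 4 digits). Each level's code is a
    prefix of the codes below it, so the parent chain is just the prefixes.
    """
    if not (code.isascii() and code.isdigit() and 1 <= len(code) <= 4):
        raise ValueError(f"not an ISCO-08 group code: {code!r}")
    return [code[:n] for n in range(1, len(code) + 1)]


# 2512 is the ISCO-08 unit group "Software developers".
for level, group in zip(LEVELS, isco08_ancestors("2512")):
    print(f"{level:>16}: {group}")
```

Because the hierarchy is encoded in the digits themselves, checking whether a concept sits under the right parent group is a simple prefix comparison.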

Yeonjoo: Exactly!

Laura: But, Yeonjoo, are the concepts you are working on only about occupations?

Yeonjoo: No, the ontology is made up of various branches, including skill, specialization, education, authorization, and industry. For ‘skill’, we distinguish between ‘hard skills’, which are the job-related knowledge and abilities employees need to perform their duties, and ‘soft skills’, which describe personal qualities that help employees thrive in the workplace. The ‘education’ branch represents a set of concepts based on field of study, qualification, degree, certificate, vocation, or online courses and trainings. Since the arrival of the current pandemic, more and more online courses and trainings have been released on platforms such as Coursera, Udemy, and LinkedIn, which has triggered a significant digital transformation in the education field.

Laura: Of course, this makes sense!

Yeonjoo: When parsing, we annotate these kinds of courses, for instance, ‘Clinical Natural Language Processing’, ‘Python for Data Analysis and Visualization’, ‘Digital Marketing’, or online SAP trainings. Labeling and correlating these courses/trainings to specific occupations helps us create a better overview of potential matching results. We also check and edit ‘education level’ and ‘experience level’ concepts related to occupations.

When we work on parsing, we also annotate data to recognize and identify specific types of entities in job descriptions and CVs, in order to create ‘Gold Standard’ data for AI systems.

Laura: What exactly is the Gold Standard?

Yeonjoo: In NLP and computational linguistics, the Gold Standard is a set of data that has been manually prepared or labelled so that it represents the desired results as closely as possible. It can serve as background knowledge for teaching and training AI systems, with machine learning algorithms, about basic concepts such as occupations, geographical objects, people, companies, or experience.
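Besides training, a common use of Gold Standard data in NLP is to score a system’s output against it. The sketch below assumes entities are stored as character spans with a type label; the labels, the example ad and the “predicted” output are all hypothetical and do not reflect JANZZ’s annotation format. It computes exact-match precision, recall and F1.

```python
from typing import NamedTuple


class Entity(NamedTuple):
    start: int   # character offset where the entity begins
    end: int     # character offset just past the entity
    label: str   # entity type (hypothetical label set)


def span_prf(gold: set[Entity], predicted: set[Entity]) -> tuple[float, float, float]:
    """Exact-match precision, recall and F1 of predicted entities vs. gold."""
    true_pos = len(gold & predicted)  # same span AND same label
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


text = "Firefighter wanted in Zurich. Must have watched Gundam."

# Manually annotated gold standard: the anime requirement is tagged as not relevant.
gold = {
    Entity(0, 11, "OCCUPATION"),    # "Firefighter"
    Entity(22, 28, "LOCATION"),     # "Zurich"
    Entity(48, 54, "NOT_RELEVANT"), # "Gundam"
}
# Hypothetical model output: misses the location, mislabels "Gundam" as a skill.
predicted = {
    Entity(0, 11, "OCCUPATION"),
    Entity(48, 54, "SKILL"),
}

p, r, f1 = span_prf(gold, predicted)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

The mislabelled “Gundam” span counts against both precision and recall, which is one concrete way false tags translate into inaccurate entity identification.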

Laura: Aha, so we’re back to teaching the AI applications… to create intelligent results.

Yeonjoo: Yes, our mission is to give them plenty of ‘quality’ data. False tags and annotations can lead to false and inaccurate identification of entities. If we want AI applications to perform like humans, we need to be excellent annotators and teachers to get truly human-like intelligent results.

Laura: What kind of challenges do you face?

Yeonjoo: We are dealing with a multilingual database, so we face many challenges, from consistency issues to ambiguity problems. Just think: we have to check whether all terms in all languages are semantically consistent.

Laura: Do you have an example in mind that could help us understand?

Yeonjoo: So for example we have an occupation concept called “Geisha”. Do you know it?

Laura: Oh, you mean the Japanese women… like the ones in the book and movie Memoirs of a Geisha.

Yeonjoo: Exactly, Laura: the Japanese performance artists and entertainers trained in traditional Japanese performing arts. So… I had this concept.

Laura: Yes

Yeonjoo: In our database, we have different hierarchical trees. At first, I put the “Geisha” concept under the “performer and entertainer” tree. But we also have a specific tree for “dancers”. And geishas do many things: they dance, play music and sing, and they are also skilled conversationalists and hosts. So in this specific case, I had to classify it under both trees.
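Placing one concept under two trees turns a strict hierarchy into a polyhierarchy: a concept can have several parents. A minimal sketch, assuming a simple in-memory map of hypothetical is-a links (not JANZZ’s actual data model), shows why this matters: a lookup that starts from either tree still reaches “geisha”.

```python
# Hypothetical is-a links: child concept -> set of parent concepts.
PARENTS: dict[str, set[str]] = {}


def add_is_a(child: str, parent: str) -> None:
    """Classify `child` under `parent`; a concept may have several parents."""
    PARENTS.setdefault(child, set()).add(parent)


def ancestors(concept: str) -> set[str]:
    """Every concept reachable by following is-a links upward from `concept`."""
    seen: set[str] = set()
    stack = [concept]
    while stack:
        for parent in PARENTS.get(stack.pop(), set()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def descendants(concept: str) -> list[str]:
    """All concepts classified directly or indirectly under `concept`, sorted."""
    return sorted(c for c in PARENTS if concept in ancestors(c))


add_is_a("geisha", "performer and entertainer")
add_is_a("geisha", "dancer")
add_is_a("ballet dancer", "dancer")

print(sorted(ancestors("geisha")))              # ['dancer', 'performer and entertainer']
print(descendants("dancer"))                     # ['ballet dancer', 'geisha']
print(descendants("performer and entertainer"))  # ['geisha']
```

If “geisha” had been filed under only one tree, a search starting from the other tree would miss it entirely.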

Laura: I see, so it’s really important to understand every single concept…

Yeonjoo: Yes, because we always have to consider which concepts can belong under each tree. That makes it very challenging.

Laura: What do you like about this job?

Yeonjoo: You can apply computational linguistics knowledge to various fields, but the fact that we are dealing with HR data is very fascinating. We can also capture the newest trends in the HR market while parsing CVs and job descriptions. Before joining this company, I wasn’t aware that such a tremendous number of occupations exists in the world. What makes my job even more interesting is that we integrate country- or culture-specific occupation concepts into our database, such as Wagashi-Shokunin from Japan, a traditional Japanese confectionery craftsman, or Haenyeo from Korea, the female shellfish divers who originated in the Korean province of Jeju. Also, when it comes to parsing resumes and job descriptions, it is frequently quite challenging because they are becoming more and more creative.

Laura: What do you mean when you say creative?

Yeonjoo: For example, I was recently parsing a job ad for firefighters, and it asked whether the applicant had seen the anime series “Gundam”, a Japanese military fiction series with robots… as a sort of preferred requirement.

Laura: OK, I see. Maybe the head of the fire department was a big fan… but still…

Yeonjoo: I was thinking: how do you put such a thing in a job description, right?

Laura: I don’t think I would have written that on the job ad… What did you do?

Yeonjoo: Well, in that case it was not a skill or a genuine requirement for being a firefighter, so we annotated it as a non-relevant skill… As you can see, working with job ads every day can sometimes be surprisingly funny… but at the same time it makes things very complicated.

Laura: Why?

Yeonjoo: Well, think about the huge number of irrelevant requirements written in job ads. For us humans, it can be a bit confusing to read that “watching an anime series is a requirement for being a firefighter”, but in the end we know that it is not. A machine cannot make that distinction.

Laura: I see your point. I remember a paper by the International Labour Organization, “The feasibility of using big data in anticipating and matching skills needs” (I will leave the link in the podcast description). If we think of parsing job ads on a larger scale, the more job ads we give the machine, the more likely it is to encounter such irrelevant information, and the more confusion and wrong results we get.

Yeonjoo: Exactly, Laura. This is why it is so important to explore the dataset in detail: there is enormous variance in information density and relevance across vacancy postings.

Laura: So you carefully annotate the irrelevant information, and with enough annotations, the machine gets better at extracting such entities. Am I right?

Yeonjoo: Yes, and this doesn’t only apply to relevant versus irrelevant information. We also help recognize other information about the advertised job that is frequently implicit, hidden in stipulations about education, training and qualifications, or about experience. If this information is not represented accurately and semantically in a knowledge representation, the collected data is distorted.

Laura: Thank you, Yeonjoo, for joining us today. Thanks to you, I think we now have a better understanding of why it will mainly be experts, and not algorithms, who continue to be responsible for such data modelling in the future.

Yeonjoo: Anytime! It was a pleasure 😉

Laura: In the next episode, we will talk about the big world of translation and its challenges for big data and HR technology. Subscribe if you don’t want to miss it! Thank you for listening, and goodbye!


Host: Laura Flamarich


Covered in this episode:

ILO report: The feasibility of using big data in anticipating and matching skills needs.

Season 1, Episode 1

The deep understanding of HR data and the expertise of JANZZ.technology

Interested in the world of HR tech and semantic technologies? This first episode of the JANZZ.technology podcast gives you insights from a market leader’s perspective.

Contributor: Stefan Winzenried, CEO and Founder of JANZZ.technology

Laura: Hi everyone. Welcome to the first episode of this podcast series from JANZZ.technology. This is Laura Flamarich, video producer and digital media specialist, and with me today is the CEO and founder of JANZZ.technology, Stefan Winzenried!

Stefan: Good morning!

Laura: Stefan, thank you so much for joining me. In this episode, I wanted to go through the origins of JANZZ and explain how we ended up creating the largest multilingual knowledge representation for occupation data, among other interesting products and services we offer that we will try to tackle over the course of this podcast series.

Stefan: Alright

Laura: Stefan, I remember perfectly when you asked me during my job interview two years ago whether I understood what JANZZ.technology does…

Stefan: Did you have an answer?

Laura: I did, but of course, it was superficial. One of the problems we constantly face is that explaining what we do and why is a bit of a challenge, right?

Stefan: Right! Even after over a decade I’m still looking for the perfect way to explain what we do in short and simple terms…

Laura: How do you usually introduce the company?

Stefan: We’re a tech and consulting company based in Switzerland that specializes in parsing, classifying and matching job and skills data. Our technologies are based on semantics, that is, the meaning of words and language. Our goal is to develop tools that really understand the content and context of the data they’re processing, instead of making guesses based on statistics and probabilities like most of the other software on the market. Also, and this is really important to us, we make sure that all our software, especially our matching software, is explainable and bias-free. We want the best people to match with the best jobs, regardless of age, gender, ethnicity, or any other characteristics not related to job performance.

Laura: Our specialties are semantic search & matching, parsing and classification, smart data and knowledge graph (or ontology) building – for occupation data like jobs, skills, qualifications and more. Who can benefit from our products and services?

Stefan: Our white label products range from classifiers and parsers to semantic matching tools, mainly as cloud solutions for HR tech users and providers. They’re used across the globe in job portals, HR management systems, ATS, public employment services, and by large organizations and businesses that have their own job sites. What we offer is unique in that our products work quite differently from those of our competitors: ours are based on a powerful representation of real-world knowledge gathered and curated by humans. Thanks to this, our solutions are in a class of their own when it comes to quality, performance and digital end-to-end processes in HR tech that actually work.

Laura: So you’re basically saying that all the other providers selling similar services and solutions are making promises they can’t live up to, that it’s all just marketing. Why do we assert that we are the only ones in our class?

Stefan: Apart from us, there are two classes of competitors. The ones offering services based on keyword matching, and the ones using some kind of taxonomies and ontologies – also known as knowledge graphs. Even in the first group you’ll find providers claiming they use semantics. But anything based on matching keywords simply has nothing to do with semantics. This kind of technology just compares words with each other letter by letter, there’s no attempt at understanding the input at all. So it wouldn’t even be able to match two synonyms: words that describe the same thing, like a custodian and a property caretaker, or a supervisor and a line manager. It will also give you irrelevant results like “assistant to the manager” if you search for the term “manager” just because there’s a partial match.
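To make the contrast concrete, here is a toy sketch of the two behaviours described above: letter-by-letter (substring) keyword matching versus matching on shared concepts. The synonym table and concept IDs are hypothetical stand-ins for a knowledge representation; a real ontology would also carry hierarchy and many other relations, which this flat table omits.

```python
def keyword_match(query: str, titles: list[str]) -> list[str]:
    """Naive keyword matching: a title matches if the query string occurs in it."""
    q = query.lower()
    return [t for t in titles if q in t.lower()]


# Hypothetical surface form -> concept ID table standing in for an ontology.
CONCEPTS = {
    "custodian": "C_CARETAKER",
    "property caretaker": "C_CARETAKER",
    "supervisor": "C_LINE_MANAGER",
    "line manager": "C_LINE_MANAGER",
    "manager": "C_MANAGER",
    "assistant to the manager": "C_MANAGER_ASSISTANT",
}


def concept_match(query: str, titles: list[str]) -> list[str]:
    """Match titles that map to the same concept as the query, whatever the wording."""
    target = CONCEPTS.get(query.lower())
    if target is None:
        return []
    return [t for t in titles if CONCEPTS.get(t.lower()) == target]


titles = ["Property caretaker", "Supervisor", "Assistant to the manager", "Manager"]

print(keyword_match("manager", titles))       # ['Assistant to the manager', 'Manager']
print(keyword_match("custodian", titles))     # []
print(concept_match("custodian", titles))     # ['Property caretaker']
print(concept_match("line manager", titles))  # ['Supervisor']
print(concept_match("manager", titles))       # ['Manager']
```

The keyword matcher both misses synonyms and returns the partial match “Assistant to the manager”; the concept matcher avoids both problems, but only because someone populated the table, which is the curation work the rest of this conversation is about.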

Laura: Ok, we can definitely do better than that! And the second group of competitors?

Stefan: The second group are the providers offering products that they call semantic solutions because they use some form of knowledge representation, say, taxonomies and maybe even ontologies. These competitors all have at least one of two problems: one, the knowledge representations are not granular enough, so they don’t provide enough detail or context. Then you’ll get generic terms like communication skills or flexibility, which are completely useless for any kind of meaningful skills analysis, say for skill-based job matching and talent management, career guidance, or whatever. Take flexibility. Do you mean time flexibility, say, willing to work shifts or irregular hours? Or do you mean cognitive flexibility, so being able to adapt your behaviour and thinking?

Laura: I see what you mean, those really are two very different things. What’s the second problem you mentioned?

Stefan: The second, very common problem is that they lack the truly smart data that we feed our applications, because their knowledge representations are not curated by humans. In fact, I’m not convinced these should even be called knowledge representations… You see, most of our competitors rely almost exclusively on machine learning techniques and other big-data-driven, statistical approaches in their solutions.

Laura: What’s wrong with that? These days, we’re all trying to automate as much as possible, no?

Stefan: Yes, as much as possible. And of course we also use cutting-edge deep learning techniques, natural language processing, named entity recognition, etcetera, etcetera. But there are limits. HR and, on a larger scale, labor market management is such a deeply human, complex field; there are so many different stakeholders with so many different vocabularies and ideas and interpretations of all the various jobs and related skills, educations and the like. You need systems that can draw on real-world knowledge and somehow grasp the meaning, the content of job ads, resumes, worker profiles. Approaches based purely on data and statistics – machine learning, deep learning, call it whatever you want – just can’t solve this task. To model human knowledge, you need humans. If you then feed this to your machine learning or AI-based algorithms, that’s when you’ll get truly meaningful, reliable results.

Laura: Modeling human knowledge with humans – that’s exactly what we do at JANZZ.technology…

Stefan: Yes – and have been doing for over a decade now. Our ontology is by far the most comprehensive, granular knowledge representation in our field. It’s constantly growing and curated by a team of multilingual, multinational humans with a deep understanding of the subject. That’s why we’re the only ones who really deliver what we promise: meaningful and accurate results in job and skills classification, parsing and matching.

Laura: You say for over a decade. Do you want to explain how everything started? What was your initial idea?

Stefan: When we first started out in the late 2000s, there had already been some digitalization in HR tech and labor market management. But for the most part, the systems were made up of disconnected processes and still required a lot of manual input. And what was really missing was some kind of deep tech to match people with jobs in an even remotely satisfactory way. That’s what we set out to change. We wanted to use AI to create high-performing, bias-free matching tools and end-to-end HR and labor market management solutions that are easy to use regardless of the user’s digital background, and that really match the right people with the right jobs. We were also strongly inspired by the work of Diamond, Mortensen and Pissarides on how mismatch problems affect the labor market, which was later awarded the Nobel Prize in Economics. We wanted to address this challenge by helping employment agencies find suitable jobs for jobseekers with fair, non-discriminatory and affordable matching tools.

Laura: And did it all go as planned?

Stefan: (laughs) Does it ever? I don’t have an engineering background so, to be honest, I dramatically underestimated the challenge at first. When we began, I never would have thought that it would take us 10 to 12 years just to lay the necessary groundwork for true end-to-end AI-based processes that actually work. But here we are now, and because we believed in what we do, and did what needed to be done, we now have great products and solutions – and a head start of several years over the competition.

Laura: When did we introduce AI for the first time?

Stefan: I’m a bit reluctant to use the word AI because there’s a lot of confusion and discussion around what really constitutes AI. I do think that our solutions have more of the I in them than those of our competitors, and this developed gradually over the years with the continuous improvement of our technologies and our ontology. We’ve been using machine learning techniques since around 2016/2017, which is certainly viewed as a part of AI.

Laura: Why is it only possible to get smart outcomes through ontologies?

Stefan: As I explained earlier, the key to meaningful results is to build systems that can process the actual content, the meaning behind the words, or strings, in the data you feed them, say, from job ads, resumes, worker profiles. This requires a deep understanding of jobs, skills, education and professional experiences and the many relationships between them that even the most intelligent algorithms can’t acquire by themselves. Knowledge graphs – or ontologies – are currently the only way to represent this understanding in a way that machines can process. Basically, ontologies turn data – especially big data – into smart data. And if you want smart outcomes, feeding your algorithms smart data is essential.

Laura: Ah, smart data. We’ll have an episode all about that later on in the season… For now, can you tell us a bit more about the ontology team at JANZZ and why they’re so important?

Stefan: Our curation team is composed of linguists, professional and educational experts, experienced specialists from domains such as medicine, engineering, IT, banking and finance, trade, and much more. These people are from all over the world with pooled knowledge of many different cultures and languages. And they work continuously to expand and improve our ontology JANZZon!. Thanks to their amazing work, JANZZon! already covers over 40 languages including regional variations and dialects, integrating cultural characteristics and cross-lingual differences in job descriptions, qualification requirements and more into the knowledge representation. And the subtleties and grey areas, the ambiguities and implicit information they deal with in their ontology work every day simply cannot be resolved by machine learning models or other techniques. This makes their work truly invaluable for the performance of our technologies.

Laura: Thank you so much Stefan for participating in this first episode.

Stefan: Thank you. And I’m looking forward to listening to the upcoming episodes!

Laura: Of course, and to our listeners: if you’re interested in knowing how we develop these products and solutions or want to understand more about the world of HR tech and semantic technologies, make sure you catch our next episode. We’ll find out what exactly an ontology curator does and why it’s so important to have such a team training the “machines”. Remember to subscribe so you don’t miss it! Thank you for listening. Goodbye.


Host: Laura Flamarich

Season 1, Introduction

JANZZ.technology: the podcast – Introduction

Introduction to the JANZZ.technology podcast series where we give you an insight into the world of occupational data and recruitment technologies.

Laura: Hi everyone, welcome and thank you for listening. This is the short introduction to our first podcast from JANZZ.technology, the leading technology company in occupation data. I am Laura Flamarich, video blogger and producer, and digital media specialist.

Jingyu: And I am Jingyu Sun, Junior Business Development Manager. Welcome! This first season will have seven episodes. Our topics will focus on big data, real artificial intelligence, natural language processing, the future of work, and of course our core asset: the ontology.

Laura: Many of you might not know what an ontology is. Ontologies, also known as knowledge graphs or knowledge representations, belong, in short, to the field of artificial intelligence (AI) dedicated to representing information about the world in a form that a computer system can use to solve complex tasks.

Jingyu: Still confusing? Don’t worry, we will have two episodes on that, in which our ontology curators Sara Noriega and Yeonjoo Oh from our headquarters in Zurich will explain the complexity of their jobs and the challenges they deal with every day.

Laura: Alejandro Jesús Castañeira Rodríguez, the Principal Data Scientist of JANZZ.technology, will also join us to talk about how JANZZ generates useful data using the latest machine learning techniques and deep learning models. With our tech writer, Jennifer Jayne Jakob, we will talk about smart data.

Jingyu: As an introduction to our company, we invited Stefan Winzenried, the CEO of JANZZ.technology, to share our story with you and to explain what we do. With him, we will also talk about the issues surrounding skills data and the unique approach that JANZZ.technology is taking to tackle them.

Laura: Last but not least, we will invite Timothy Sperisen, the author of one of our white papers, to talk about education zones, the innovative approach we created at JANZZ.technology to classify education data… So, lots of interesting content for you, our listeners.

Jingyu: And you might be wondering: why are we doing this? Well, for several reasons. One is to help people understand what we really do at JANZZ.technology… Sometimes even our colleagues have trouble explaining it.

Laura: Yes, it’s not simple. We also wanted to produce more content for our social media channels, and we were thinking… what else can we do to make our voice heard?

Jingyu: So, it seems like the podcast is the new fashion! If you look around, it is the new medium for companies to connect to the world and of course, we want to be part of it.

Laura: But most importantly, we want to share our stories at JANZZ.technology with you: what we do, what we care about, what we value and what we want to do for society as a technology company.

Jingyu: Every start-up or company has a story and we think that as providers of cutting-edge solutions in the field of HR tech and the global public employment service and labor market, it will be interesting to share our experience and expertise.

Laura: The truth is, we have been blogging and sharing our thoughts since the beginning on our blog, Knowledge Base, and on our website, www.janzz.technology.

Jingyu: By the way, you can find a broad range of articles, posts and helpful information not only in English but in many other languages, like Spanish, Chinese, Vietnamese, French, Arabic and more. Now we also want to build a community of listeners and involve our colleagues, clients and partners.

Laura: So thank you very much for listening to this short introduction and we invite you to listen to our next episode.

Jingyu: And if you haven’t already, subscribe to this podcast so that you don’t miss any of the stories, conversations, or practical tips we’re putting together for you.

Laura: Bye, see you soon!

Jingyu: Goodbye…


Hosts: Jingyu Sun and Laura Flamarich