Skills are getting a lot of attention these days – from businesses, government institutions, policymakers, researchers, and many others. But what exactly is a skill? In this episode, we speak with Stefan Winzenried, the CEO and founder of JANZZ.technology, about skills data.
Laura: Hi everyone! Welcome to another episode of this series of podcasts from JANZZ.technology. This is Laura Flamarich and today we’ve invited Stefan Winzenried again, the CEO and founder of JANZZ.technology – this time to talk about skills data. Hi Stefan!
Laura: Let’s jump right into today’s topic. Skills are getting a lot of attention these days – from businesses, government institutions, policymakers, researchers, and many others. Gartner has called them “the new currency for talent”, and there have been many, many posts on reskilling, upskilling, future-proof skills, top post-COVID skills, etcetera, etcetera. I’ve heard that especially these posts, even the ones from seemingly reliable sources like Forbes or the World Economic Forum, are a bit of a red rag to you, Stefan – why is that?
Stefan: For one thing, these posts hardly ever provide any kind of information as to what data their claims are based on. And the ones that do are almost always based on data that’s really biased. Take the LinkedIn reports on in-demand skills. Every time they’re published, you see countless articles and posts reproducing these lists blindly, just taking them at face value. Nobody stops to think about the data behind them, even though it’s clear that, for instance, blue-collar professions and industries are massively underrepresented on their network. The reports generate this huge buzz about how we need to reskill and upskill everyone to become IT professionals just because LinkedIn says that blockchain and cloud computing are most in demand. But that’s just not true.
Laura: So what is in demand?
Stefan: Well, before we even think about answering that question, we need to work out something much more fundamental: what exactly is a skill? Off the top of your head: what do you think a skill is?
Laura: Erm… I’d say a capability… something you learn and you are really good at.
Stefan: Ok. So with that definition, being a quick study isn’t a skill because it’s not something that you can learn. But being able to tie a knot in a cherry stalk with your tongue is.
Laura: (laughs) Yeah, ok, but that’s probably not very relevant for anyone’s job…
Stefan: Exactly. But being a quick study can be. Ok, the cherry stalk example is a bit silly, but it turns out that finding a robust definition of skills is quite a challenge. And there are many, many definitions floating around. For instance, ESCO, the European job and skills classification system, defines a skill as an ability to apply knowledge and know-how to complete tasks. Which includes, say, being able to use specific tools or software. O*NET in the US, on the other hand, makes a distinction between these kinds of abilities and what they call skills. And Indeed summarizes anything that can be useful in a job under the term skill: competencies, abilities and even knowledge.
Laura: I see, there really is a lot of variation.
Stefan: Yes, and those are just the differences in the definition of a skill as a general concept. Then you get all sorts of interpretations when it comes to concrete examples. What are project management skills? Or carpentry skills? Those terms can mean very different things depending on who you ask. And all these variations will cause discrepancies in data collection and analysis. Along with many other serious challenges… So, to get back to your original question as to what skills really are in demand: With the techniques most commonly used today, especially those that rely on online data, like data from online job postings or professional networks, it’s almost impossible to say. What we do know is that when you look at studies based on more representative data like the ManpowerGroup Talent Shortage surveys, it’s positions in skilled trades that are hardest to fill – and have been for years, along with truck drivers and healthcare professionals. Which suggests that we should be training people for these occupations – where they clearly don’t need knowledge in blockchain and cloud computing…
Laura: But most reports on skills find a really high demand for digital skills in some shape or form. How does that compute?
Stefan: One reason is that most of these reports are based on data from online job postings or professional networks, which tends to be severely flawed for several reasons. First off, not all job vacancies are advertised online; in fact, certain markets are so dried up that vacancies aren’t advertised at all. And some types of jobs are more likely to be advertised online than others – as it seems, typically jobs that require “digital skills”. Whatever that’s supposed to mean…
Laura: So the data is biased.
Stefan: Yes, severely so. Very often, large firms in certain industries are vastly overrepresented, even though they usually make up only a small part of the labor market. In Switzerland, for example, over 90% of all jobs are in small to medium-sized firms. In fact, the majority of them are in small companies. Which have completely different roles, hierarchies and therefore skill sets than large companies. And they use a different language and different terms, too. Take “HR generalist”. That’s a typical term that you rarely find in small companies, because it is obvious that that’s what they’re looking for. And if it does show up, it’s understood differently than in a large company. And this also affects the desired or required skills, experience, etc… And there are many more problems that make OJA data mostly useless or at least not very meaningful when they’re not addressed. To put it bluntly, this crystal ball of OJA data is about as reliable as the ones at fairs…
Laura: OJA as in online job advertising.
Stefan: Right. The thing is, the many companies and institutions that handle OJA data and use it to derive, say, predictions and training programs from it are very rarely aware of the true extent of these issues.
Laura: So what other issues are there with online data?
Stefan: Well, apart from bias and confusion around what actually constitutes a skill, there’s the issue of duplicates. There’s a lot of overlap in the various sources of online job postings and tackling deduplication really is far from trivial. Another, very underestimated issue is granularity.
Laura: What do you mean by that?
Stefan: The level of detail – both in the gathered data and in the communicated results. Digital skills, for instance, is a classic example of grouping more and more skills together until you end up with a completely meaningless umbrella term. Of course there’s a high demand for digital skills if you summarize everything from being able to use digital devices, to handling social media accounts professionally, to programming in Java. But what are you going to do with this? You can’t perform sound analyses with data this coarse. It’s useless for any kind of meaningful statistics or matching, let alone defining hiring strategies or policymaking.
Laura: I see your point. So why do you think this is done? I mean, if the data’s collected, say, from online job postings, then surely it has a higher level of detail, right? I’ve never seen the term digital skills in a job ad…
Stefan: Yes and no. The level of detail in online job postings varies considerably, and not just depending on the country or sector. Even across postings for the same profession in the same country, you’ll find anything from page-long detailed accounts of all responsibilities and required skills and qualifications, to a post with the same information encoded implicitly in three sentences. But the real clustering usually happens when the collected data is processed. It needs to be standardized to make the data points comparable. And this standardization is typically based on simplified classifications or taxonomies instead of leveraging comprehensive ontologies with a high level of detail. For instance, the ESCO and O*NET taxonomies are used in a lot of these projects. ESCO currently has about 13,500 skills concepts and O*NET about 9,000. Sounds like a lot, right?
Stefan: But it actually isn’t. Our ontology, for instance, includes over a million skills concepts. So if you use ESCO or O*NET, you already lose a lot of detail. But it’s not just about the number of skills. More skills doesn’t necessarily mean better information. It’s more about having a way to compare the right things with each other. If you just standardize and cluster detailed information into oblivion, you end up comparing apples and oranges without even realizing it. And these mistakes get carried along and multiplied over all downstream processes. But some processes are only possible if you have the context and the degree of complexity of a skill in its description. And that’s why you need a much larger number of mappable skills.
Laura: Can you give an example?
Stefan: Take knowledge of TensorFlow as a skill. TensorFlow is a software library used for machine learning and AI. Now if you standardize this using ESCO, the closest you’ll get is the term utilize machine learning. But that’s an umbrella term for a whole host of skills and knowledge – not just all sorts of other software libraries, like PyTorch, but also things like specializations in the different branches and subfields of machine learning: supervised or unsupervised learning, deep learning, and so on. How are you going to find the right talent for your project efficiently with a term that broad? Or design effective L&D strategies and government training programs? It’s just useless.
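The information loss Stefan describes can be sketched in a few lines of Python. This is a toy illustration, not any real taxonomy: the mapping below is invented for illustration (only “utilize machine learning” comes from the ESCO example above), and a real pipeline would map against thousands of entries.

```python
# Toy illustration of lossy skill standardization. The taxonomy below is
# invented; it only mimics the coarse-grained mapping described above.
COARSE_TAXONOMY = {
    "TensorFlow": "utilize machine learning",
    "PyTorch": "utilize machine learning",
    "supervised learning": "utilize machine learning",
    "deep learning": "utilize machine learning",
    "Java": "computer programming",
    "Python": "computer programming",
}

def standardize(extracted_skills):
    # Map each fine-grained skill onto its umbrella term, dropping duplicates.
    return sorted({COARSE_TAXONOMY.get(s, s) for s in extracted_skills})

# Two very different candidate profiles...
profile_a = ["TensorFlow", "deep learning", "Python"]
profile_b = ["PyTorch", "supervised learning", "Java"]

# ...become indistinguishable after standardization.
print(standardize(profile_a))  # ['computer programming', 'utilize machine learning']
print(standardize(profile_b))  # ['computer programming', 'utilize machine learning']
```

Once two very different profiles standardize to the same coarse terms, no downstream matching step can tell them apart – exactly the apples-and-oranges problem Stefan mentions.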
Stefan: (pauses) But of course, if you just want to generate attention with your results, then oversimplification is definitely the way to go. A short list of buzzwords is much easier to digest… So after standardizing it, the data is often clustered and simplified even further.
Laura: Haha, ok. So it’s not just the taxonomies then.
Stefan: No sensible skills taxonomy would list “digital skills” as an individual skill; they do provide more detail than that. But bias and granularity aren’t the only issues with skills data. Remember I said that we don’t have a common understanding of skills as a general concept?
Stefan: I also pointed out that the same goes for the individual terms we use to describe a supposedly concrete skill. If I give you a term like project management, you’ll have an idea of what that is based on your own knowledge and experiences. So you’ll give the term some kind of meaning. And I’ll do the same, based on my knowledge and experiences. We agree that we both know what project management is – and then human nature kicks in: We assume we’re talking about the same thing.
Laura: And… we’re not?
Stefan: Nine times out of ten? No, we’re not. And the fact that different people have different notions of any given skill is a huge issue for data collection and analysis. Let’s take another example. One of the most common so-called skills required in job postings anywhere on the planet, and included in most taxonomies, is use Microsoft Office. This may sound like a fairly specific skill at first, but the informative value of this term is zero.
Laura: Really?! How so?
Stefan: It’s completely unclear which applications in this large family of software are meant, and to what extent a person is supposed to be able to use them. If you think about what an employer is looking for when they use this term in a job ad, what they actually want is a whole set of much more concrete skills and knowledge, depending on the job description. You might need to be able to structure a document, or create auto-calculating spreadsheets or good presentations – which requires skills in storytelling and visual communication. And the skill set will typically be very different for an office assistant in a small business compared to a marketing specialist in a large corporation. So basically, saying someone can use Microsoft Office is about as helpful as saying they can use a toolbox. But that doesn’t stop it from showing up in over 80% of all job ads and close to 100% of all resumes and CVs…
Laura: I’d never thought of it like that… But a lot of this implicit information can be worked out from the context, right?
Stefan: Humans can – to a certain extent and with the right background knowledge. But for any kind of meaningful analysis, we’re talking about extracting both explicit and implicit skills from huge data sets. For that you need an AI-based tool that actually understands the content and the context of, say, a job description. Which is really only possible if it has access to an extensive knowledge representation that includes information not only on skills or jobs, but also education, work experiences, certifications and much more. As well as required levels and the complex relations between all these different concepts.
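To make that concrete, here is a deliberately naive sketch of implicit-skill extraction as a hand-written phrase lookup. The trigger phrases and skill names are invented for illustration – a real system, as Stefan says, would need a full ontology and genuine context understanding, not a dictionary.

```python
# Naive illustration of extracting implicit skills from a job description.
# The rules below are invented toy examples; a production system would
# reason over a comprehensive knowledge representation instead.
IMPLICIT_SKILL_RULES = {
    "auto-calculating spreadsheets": ["Microsoft Excel", "formula design"],
    "good presentations": ["Microsoft PowerPoint", "storytelling", "visual communication"],
    "structure a document": ["Microsoft Word", "document formatting"],
}

def extract_implicit_skills(job_description: str) -> list:
    # Return every skill whose trigger phrase appears in the posting text.
    text = job_description.lower()
    found = []
    for phrase, skills in IMPLICIT_SKILL_RULES.items():
        if phrase in text:
            found.extend(skills)
    return sorted(set(found))

ad = ("We are looking for someone who can create auto-calculating "
      "spreadsheets and good presentations for client meetings.")

print(extract_implicit_skills(ad))
```

Even this toy version shows the idea: the ad never says “Excel” or “PowerPoint”, yet the required skills are recoverable from context. It also shows why a dictionary can’t scale to the open-ended ways real postings encode skills implicitly.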
Laura: This is starting to sound like a huge challenge. Wouldn’t it be easier to collect data on jobs instead? And then work out the skills demand using… standard skill profiles for jobs?
Stefan: It would – if there were such a thing as a standard skill profile for a given occupation. But several studies, including one we did here at JANZZ, show that there’s just too much variation: there are national and regional differences, differences across industries, even across teams within a single business. The job description often factors in what skills are already covered by other team members, which will change the required skill set for that particular point in time. Or the company advertising the job is specialized in a very specific activity that requires a different skill set. Take carpentry, for example. A company that specializes in cabinetry and furniture production will have little use for a carpenter skilled in drywalling and roof carpentry – the common denominator in terms of skills is just too small. So there’s no way around collecting data on the skills themselves.
Laura: Ok, so we need to collect skills data. But to recap, the data’s typically incomplete and biased, the extracted skills are taken out of context and often generalized into meaninglessness and in fact, we can’t really agree on the notion of a skill in the first place.
Stefan: Exactly. And despite these issues, people draw all sorts of wild conclusions from their shaky data, and propagate unfounded claims on current – or even worse: future skills demands. And if this kind of information comes from a seemingly reliable source like, say, the World Economic Forum, the ILO or the World Bank Group, then chances are, the results will be used as a basis for far-reaching decisions. Like which training projects a government should allocate funds to, for student career counseling or even company recruiting strategies.
Laura: So it could have pretty dramatic consequences for the labor market, right?
Stefan: Yes! The current interpretations flying around are encouraging politicians to carry on potentially misallocating billions in funding for upskilling and reskilling in the wrong areas. And to actively steer even more young talent away from occupations and industries in dire need of new talent: skilled trades and construction, nursing, care work and more – many of which are still clearly “future proof”. Just look at the situation in the UK with truck drivers. The very obvious failure to attract new workers has had dramatic systemic consequences. And yet, the key players in government and business are still going on about upskilling and reskilling workers for a digital world. Completely undeterred. It’s just bizarre…
Laura: So how can we do better?
Stefan: For one thing, we need to move away from easy statements and generic lists of glorified buzz skills and towards differentiated interpretations and communication – even if that is less sexy. But for that, we need to gather the right data. We have to start by somehow reaching a common understanding, which means agreeing on a definition of a skill and standardizing skill designations and levels. And we need to determine key skills: skills that are truly relevant to the job in question. Which requires analyzing skills in their context – a challenge that knowledge-lean systems based purely on machine learning will never overcome – and understanding that the most frequently mentioned skills aren’t necessarily the important ones, that many key skills are implicit. We need to understand the limitations of data from online sources and gather and provide additional information when this data falls short. In short, we need to generate smart, unbiased data for smart, unbiased decisions.
Laura: On that note, thank you Stefan for joining us.
Stefan: Thank you too.
Laura: We have covered a lot of ground today, but we could carry on for hours. This really is the hot topic at the moment and we will certainly delve deeper in the next season – and some upcoming whitepapers, I believe…
Laura: This episode wraps up the first season of the podcast. Follow us on LinkedIn or on your favorite podcast platform to make sure you don’t miss the next season.