How an Ontology Can Help with Content-based Matching

Job and candidate search, job recommendations and automated candidate evaluations have one thing in common: each is a matching problem.

Simply put, given a set of CVs and a set of vacancies, the most similar items should match. In other words, they should come out at the top of the search, recommendation or evaluation. Most applications use one of two high-level approaches to achieve this: behavior-based or content-based. Each has pros and cons, and there are also ways to combine the two to take advantage of both.


Behavior-based approaches leverage user behavior to generate recommendations or suggestions. These approaches are domain agnostic, meaning the same algorithms that work on music or movies can be applied to the jobs domain. They do, however, suffer from a cold-start problem: with little user activity, it is much harder to generate good-quality results.
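To make the cold-start problem concrete, here is a minimal Python sketch of one simple behavior-based technique: recommending jobs that were clicked by the same users. The data layout, the function name and the job IDs are assumptions made for this illustration.

```python
from collections import Counter

def co_click_recommendations(history: dict[str, set[str]], job_id: str, top_n: int = 3) -> list[str]:
    """Recommend the jobs most often clicked by users who also clicked job_id.

    history maps a (hypothetical) user ID to the set of job IDs that user clicked.
    """
    counts: Counter = Counter()
    for clicked in history.values():
        if job_id in clicked:
            counts.update(clicked - {job_id})
    # Sort by descending co-click count, then by job ID for a stable order.
    return sorted(counts, key=lambda job: (-counts[job], job))[:top_n]

history = {
    "u1": {"j1", "j2"},
    "u2": {"j1", "j2", "j3"},
    "u3": {"j1", "j2"},
}
print(co_click_recommendations(history, "j1"))     # jobs co-clicked with j1
print(co_click_recommendations(history, "j_new"))  # a job nobody has clicked yet
```

Nothing in the function looks at what a job is about, which is why the same code would work for music or movies. It also shows the cold-start problem: a newly posted job that no one has clicked yet produces an empty list.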

Content-based approaches use data, such as user preferences and features of the items being matched or recommended, to determine the best matches. For recommending jobs, using keywords of the job description to match keywords in a user’s resume is one content-based approach. Using keywords in a job to find other similar jobs is another way to implement content-based recommendations.
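As a rough illustration of the keyword approach, the following Python sketch ranks vacancies against a resume by keyword overlap (Jaccard similarity). The tokenizer, the function names and the sample texts are assumptions made for this example; real systems typically use more careful keyword extraction.

```python
import re

def keywords(text: str) -> set[str]:
    """Lowercase word tokens; a simple stand-in for real keyword extraction."""
    return set(re.findall(r"[a-z0-9+#]+", text.lower()))

def jaccard(a: set[str], b: set[str]) -> float:
    """Share of keywords the two sets have in common (0.0 if both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def rank_jobs(resume: str, jobs: dict[str, str]) -> list[tuple[str, float]]:
    """Return (job_id, score) pairs, best match first."""
    resume_kw = keywords(resume)
    scored = [(job_id, jaccard(resume_kw, keywords(text))) for job_id, text in jobs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

ranked = rank_jobs(
    "Python developer with SQL",
    {"a": "SQL Python developer", "b": "hospital nurse"},
)
print(ranked)
```

The same scoring function can be applied to two job descriptions to find similar jobs. Note that it only rewards identical tokens: a resume that says "programmer" gets no credit against a vacancy that says "software developer".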

The real difficulty, however, lies in determining the similarity between two items. How can the similarity between, for instance, a resume and a vacancy be determined effectively when the two are often structured very differently? All too often, simple keyword-based matching is used, which means that many similarities go undetected because keyword variations, synonyms and alternative phrasings are not matched. With a content-based approach, it is important to compare the semantics (the underlying meaning) of two items rather than their wording.

This is where ontologies come into play. They can provide a relational model that captures the underlying meanings and similarities in CVs and job descriptions. Ontologies enable a digital representation of implicit knowledge: humans usually understand the correct meaning of a term thanks to their background knowledge and the context in which the term is used. A machine, on the other hand, lacks this ability. It can, however, learn about the meaning of a term by means of the concepts and relations stored in an ontology. By using an occupation and skills ontology as an intermediary, content-based approaches for job recommendations, job and candidate search and automated candidate evaluations can recognize matches that keyword comparison alone would miss.
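The idea of using an ontology as an intermediary can be sketched in a few lines of Python. The toy term-to-concept table and the relations below are entirely hypothetical and are not taken from JANZZon! or any real ontology; they only show the principle of comparing concepts instead of words.

```python
# Hypothetical toy ontology: surface term -> concept ID.
TERM_TO_CONCEPT = {
    "software engineer": "occ:software_developer",
    "software developer": "occ:software_developer",
    "programmer": "occ:software_developer",
    "registered nurse": "occ:registered_nurse",
    "rn": "occ:registered_nurse",
    "coding": "skill:programming",
}

# Hypothetical relations: concept -> directly related concepts.
RELATED = {
    "occ:software_developer": {"skill:programming"},
}

def concepts(terms: list[str]) -> set[str]:
    """Map known terms to concept IDs; unknown terms are ignored."""
    return {TERM_TO_CONCEPT[t.lower()] for t in terms if t.lower() in TERM_TO_CONCEPT}

def expand(found: set[str]) -> set[str]:
    """Add the concepts directly related to those already found."""
    expanded = set(found)
    for concept in found:
        expanded |= RELATED.get(concept, set())
    return expanded

def semantic_overlap(cv_terms: list[str], job_terms: list[str]) -> float:
    """Jaccard overlap of the expanded concept sets (0.0 if either is empty)."""
    a, b = expand(concepts(cv_terms)), expand(concepts(job_terms))
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

print(semantic_overlap(["Programmer"], ["Software Engineer"]))  # synonyms
print(semantic_overlap(["coding"], ["software developer"]))     # skill vs. occupation
print(semantic_overlap(["RN"], ["programmer"]))                 # unrelated
```

"Programmer" and "Software Engineer" share no keywords, yet both resolve to the same concept, so they match fully. "Coding" matches "software developer" partially through the relation between the occupation and the skill. A production ontology would add weighted relations, multiple languages and disambiguation by context, none of which is modeled here.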

JANZZon!, a comprehensive ontology of occupations and skills, is one example. It offers a large number of poly-directional concepts pertaining to the global labor market. With its extraordinary range of concepts, this ontology offers essential context as well as intelligent evaluation and enhancement options for applications such as information systems, matching engines, job portals, CV parsers, and statistical analysis and modelling tools.

Industry Taxonomies Enhanced by JANZZ’s Occupation Ontology

At the heart of JANZZ’s ontology of occupations and skills are over 35 taxonomies, including occupation, skills and industry taxonomies such as O*Net, ESCO, NAICS and ISCO-08. The JANZZ curation team maps them to form a single entity that serves as a relational model for a large part of the world’s economic activity. As one of the latest additions to JANZZon!, the curation team has integrated the two industry classifications GICS and ICB, thereby extending the scope of the ontology and enhancing the intelligence of the two classifications.

GICS, the Global Industry Classification Standard, is a standardized classification system for equities developed jointly by Morgan Stanley Capital International (MSCI) and Standard & Poor’s. The GICS methodology is used by the MSCI indexes, which include domestic and international stocks, as well as by a large portion of the professional investment management community. The GICS hierarchy begins with 10 sectors, followed by 24 industry groups, 67 industries and 147 sub-industries. Each classified stock is assigned a code at all four of these levels.
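The four-level coding lends itself to a simple prefix comparison. The article does not describe the code format; the sketch below assumes the publicly documented GICS layout, in which a sub-industry code has 8 digits and each level adds two digits (sector, industry group, industry, sub-industry). The function names and the sample codes are illustrative only.

```python
from typing import NamedTuple, Optional

class GicsLevels(NamedTuple):
    sector: str          # first 2 digits
    industry_group: str  # first 4 digits
    industry: str        # first 6 digits
    sub_industry: str    # all 8 digits

def split_gics(code: str) -> GicsLevels:
    """Split an 8-digit code into its four hierarchy levels (assumed layout)."""
    if len(code) != 8 or not (code.isascii() and code.isdigit()):
        raise ValueError(f"expected an 8-digit code, got {code!r}")
    return GicsLevels(code[:2], code[:4], code[:6], code)

def deepest_shared_level(code_a: str, code_b: str) -> Optional[str]:
    """Name of the most specific level two codes share, or None if none."""
    a, b = split_gics(code_a), split_gics(code_b)
    for name in reversed(GicsLevels._fields):
        if getattr(a, name) == getattr(b, name):
            return name
    return None

print(split_gics("45102010"))
print(deepest_shared_level("45102010", "45103010"))
print(deepest_shared_level("45102010", "10101010"))
```

A matching engine could use the deepest shared level as a coarse similarity signal between two companies: the more specific the shared level, the more closely related their principal business activities.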

ICB (Industry Classification Benchmark) is an industry classification taxonomy launched by Dow Jones and FTSE in 2005. It is used to divide markets into sectors within the macro economy and categorizes over 290,000 securities worldwide. The ICB uses a system of 10 industries, partitioned into 18 supersectors, which are further divided into 41 sectors, which in turn contain 114 subsectors.

Both classifications allocate each company to the lowest-level category (sub-industry in GICS, subsector in ICB) that most closely represents the nature of its principal business activity. This allows companies to be compared across national and linguistic boundaries.

Industry taxonomy enhanced by the ontology network

Once mapped to the semantic network of JANZZon!, however, the two classifications become considerably more intelligent and more widely applicable. As part of the semantic database JANZZon!, the taxonomies are connected to a dense web of relations between occupations, skills, specializations and industries. The information on individual companies from the two taxonomies and the relational model of occupations, skills, specializations and industries are intertwined to form an even larger knowledge base. With the added knowledge about companies and how they relate to industry sectors, the ontology JANZZon! can serve its purpose even better, namely to provide an accurate relational model for parsing, matching, benchmarking and classification.
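One way to picture this web of relations is as a graph that can be traversed from a company to its industry and on to related occupations and skills. The following Python sketch uses a breadth-first search over a tiny, entirely hypothetical graph; the node names, the edges and the function are assumptions for illustration and do not reflect the actual structure of JANZZon!.

```python
from collections import deque

# Hypothetical relations: node -> directly connected nodes.
EDGES = {
    "company:ExampleBank": ["industry:banks"],
    "industry:banks": ["occ:credit_analyst", "occ:bank_teller"],
    "occ:credit_analyst": ["skill:financial_analysis"],
    "occ:bank_teller": ["skill:cash_handling"],
}

def reachable(start: str, max_hops: int) -> dict[str, int]:
    """Return every node within max_hops of start, with its hop distance."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in EDGES.get(node, []):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    del distance[start]
    return distance

print(reachable("company:ExampleBank", 2))
```

With a hop limit of 2, the company is linked to its industry and to the occupations typical of that industry; a limit of 3 would also bring in the associated skills. The hop distance could serve as a crude measure of how closely two entries are related.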