Everything you ever wanted to know about ontologies, taxonomies, parsing, …
… job classifications from ISCO-08 through BO&C, ASCO and ROME V3 to KldB 2010, semantic matching, SKOS or ISCED. Or what the difference is between an ontology and a taxonomy. Or what a friend of a friend has to do with ontology modelling. You will find a number of selected answers to these and further questions associated with such topics here in our glossary.
AdaBoost is a boosting algorithm which constructs a classifier.
In linguistics, an allomorph is a variant form of a morpheme (the smallest grammatical unit in a language). The concept occurs when a unit of meaning can vary in sound without changing meaning. The term allomorph explains the comprehension of phonological variations for specific morphemes.
Ammattiluokitus 2010 (Finland)
Ammattiluokitus 2010 is the name of the Finnish occupation classification, latest version in 2010. Its structure is based on ISCO-08 and it is available in three languages (FI, S, EN). It follows as closely as possible the structure and the definitions of ISCO-08 in 1-4 digit levels. National circumstances were taken into consideration by adding 5-digit level occupational groups where necessary. The 5-digit level contains 103 groups.
ANZSCO (Australia & New Zealand)
The Australian and New Zealand Standard Classification of Occupations (ANZSCO) is based on ISCO-08, its latest version from 2009. It has a hierarchical structure with 8 Major Groups, 43 Sub-Major Groups, 97 Minor Groups, 358 Unit Groups, and 1014 Occupations.
An Application Programming Interface (API) specifies how some software components should interact with each other. A program part of a software system can be offered to other programs to establish a connection. An API only defines a program binding on the level of the source code. In addition to accessing databases or computer hardware, such as hard disk drives or video cards, an API can be used to ease the work of programming graphical user interface components. In practice, many times an API comes in the form of a library that includes specifications for routines, data structures, object classes, and variables. [Wikipedia]
The Apriori algorithm learns association rules and is applied to a database containing a large number of transactions.
ASCO (Arab countries)
The Arab Standard Classification of Occupations (ASCO) 2008 was created through the cooperation of various Arab countries. Its structure is based on ISCO-08. Still, some Arab countries use their own national occupation classification.
ASOC (Saudi Arabia)
ASOC stands for the Arab Standard Occupational Classification and is known as Tasneef in Arabic. It is the occupational classification system that is being built in the Kingdom of Saudi Arabia in 2015. ASOC covers all occupations and jobs in the national economy, including occupations in the public and private sectors. The classification system is built based on ISCO-08 and is available in Arabic and English.
Association rule learning is a data mining technique for learning correlations and relations among variables in a database.
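The two measures usually used to judge an association rule, support and confidence, can be sketched as follows. The transactions (skills listed in job postings) and all function names are made up for illustration; the pair search only mimics the pruning idea behind Apriori (an item that is infrequent on its own cannot be part of a frequent pair) and is not a full implementation.

```python
from itertools import combinations

# Hypothetical transactions: each set lists the skills found in one job posting.
transactions = [
    {"python", "sql", "statistics"},
    {"python", "sql"},
    {"python", "statistics"},
    {"sql", "excel"},
    {"python", "sql", "excel"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item of the itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent) for the rule antecedent -> consequent."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

def frequent_pairs(transactions, min_support):
    """Apriori-style pruning: only items that are frequent alone can form frequent pairs."""
    items = sorted({item for t in transactions for item in t})
    frequent_items = [i for i in items if support({i}, transactions) >= min_support]
    return [
        pair
        for pair in combinations(frequent_items, 2)
        if support(pair, transactions) >= min_support
    ]

pairs = frequent_pairs(transactions, min_support=0.6)
rule_confidence = confidence({"python"}, {"sql"}, transactions)
```

With this toy data, only the pair python and sql reaches the minimum support, and the rule "python implies sql" holds in three of the four postings that mention python.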
ATS (Applicant Tracking System)
An Applicant Tracking System (ATS) is a software application that enables the electronic handling of recruitment needs. An ATS can be implemented on an enterprise or small business level, depending on the needs of the company. An ATS is very similar to a customer relationship management (CRM) system, but is designed for recruitment tracking purposes. In many cases, it filters applications automatically based on given criteria such as former employers, years of experience and schools attended. This has caused many applicants to adopt techniques similar to those used in search engine optimization when creating and formatting their résumé. [Wikipedia]
Big data is a blanket term for any collection of data sets so large and complex that it becomes difficult to process using on-hand data management tools or traditional data processing applications. The challenges include capture, curation, storage, search, sharing, transfer, analysis and visualization. The trend to larger data sets is due to the additional information derivable from analysis of a single large set of related data, as compared to separate smaller sets with the same total amount of data, allowing correlations to be found to e.g. spot trends in the labour market. [Wikipedia]
BO&C (beroepen, opleidingen en competentieregister) is a register on jobs, educations and skills by the employee insurance agency in the Netherlands.
Boosting is an ensemble learning algorithm which takes multiple learning algorithms (e.g. decision trees) and combines them. The goal is to take an ensemble or group of weak learners and combine them to create a single strong learner.
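A minimal sketch of this idea, using the AdaBoost weighting scheme with one-threshold decision stumps as weak learners. The data set, the function names and the number of rounds are assumptions chosen for illustration; real implementations handle many features and edge cases that are left out here.

```python
import math

# Toy 1-D data set (made up): no single threshold separates the two classes.
X = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1, 1, -1, -1, 1, 1]

def stump_predict(x, threshold, polarity):
    """Weak learner: predicts `polarity` up to the threshold and `-polarity` above it."""
    return polarity if x <= threshold else -polarity

def best_stump(X, y, weights):
    """Return (weighted_error, threshold, polarity) of the stump with the lowest weighted error."""
    best = None
    for threshold in X:
        for polarity in (1, -1):
            error = sum(
                w for x, label, w in zip(X, y, weights)
                if stump_predict(x, threshold, polarity) != label
            )
            if best is None or error < best[0]:
                best = (error, threshold, polarity)
    return best

def adaboost(X, y, rounds):
    """Combine `rounds` stumps; each round re-weights the examples the last stump got wrong."""
    weights = [1 / len(X)] * len(X)
    ensemble = []
    for _ in range(rounds):
        error, threshold, polarity = best_stump(X, y, weights)
        error = max(error, 1e-10)  # avoid division by zero for a perfect stump
        alpha = 0.5 * math.log((1 - error) / error)  # vote strength of this stump
        ensemble.append((alpha, threshold, polarity))
        weights = [
            w * math.exp(-alpha * label * stump_predict(x, threshold, polarity))
            for x, label, w in zip(X, y, weights)
        ]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def predict(ensemble, x):
    """Strong learner: sign of the weighted vote of all stumps."""
    score = sum(alpha * stump_predict(x, t, p) for alpha, t, p in ensemble)
    return 1 if score >= 0 else -1

ensemble = adaboost(X, y, rounds=3)
predictions = [predict(ensemble, x) for x in X]
```

On this toy data every single stump misclassifies at least two of the six points, while the weighted vote of three stumps reproduces all labels.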
BRC 2014 (Netherlands)
The occupation classification of the Netherlands, Beroepenindeling ROA-CBS 2014 (BRC 2014), is based on ISCO-08 and used for statistical purposes.
CART stands for classification and regression trees. It is a decision tree learning technique that outputs either classification or regression trees. Like C4.5, CART is a classifier.
CBO 2002 (Brazil)
CBO 2002 is the latest version of the Brazilian Occupational Classification (Classificação Brasileira de Ocupações). It classifies jobs in a hierarchical ordering by their task and skill content.
The entries in the occupation database of the Federal Statistical Office of Switzerland are all tagged with an 8-digit stem code. Various additional codes and nomenclatures (e.g. SBN 2000 or ISCO-08) can then be linked to it. Thus, each occupation entry possesses exactly one stem code, but can possess different nomenclatures at the same time (e.g. male/female labels, different spellings etc.).
A classifier is a tool in data mining that takes a bunch of data representing things we want to classify and attempts to predict which class the new data belongs to.
Cluster analysis is a family of algorithms designed to form groups such that the members of a group are more similar to each other than to non-members. Clusters and groups are synonymous in the world of cluster analysis.
The Clasificación Nacional de Ocupaciones 2011 (CNO-11) is the Spanish occupation classification, latest version of 2011. It is based on ISCO-08, but it adds a fifth level between the first and second level in the ISCO-08 structure to smoothen it.
CNP V 2010 (Cape Verde)
The Classificação Nacional das Profissões (CNP) 2010 is the occupation classification of Cape Verde, available in Portuguese, with the latest version of 2010. Its structure is based on ISCO-08, but it has a fifth level to represent national reality.
Cognitive biases are systematic patterns of deviation from norm or rationality in judgment, which are often studied in humans. We can roughly classify cognitive biases into the following categories: decision-making, belief and behavioral biases, social biases, and memory biases. A bias in AI refers to a computer system reflecting the implicit values of the humans who created it. Because no technology is free from human influence, and every technology is an extension of its creators, it is extremely difficult to keep the algorithm, data or tools completely unbiased. At JANZZ.technology, we are aware of how our actions affect our customers and all the users of our AI services and tools. We try our best to make sure that our solutions don’t have a disproportionate effect on some groups of users with respect to others and are keen to take fairness analytics in job matching to the highest level. Typical biases, which include age, gender, origin etc., have already been completely eliminated from JANZZ applications.
Cognitive computing systems are designed to collaborate naturally with people and to amplify the possibilities of what either machines or humans could do on their own. Cognitive computing systems can process an incredibly large and complex volume of data (Big Data) in real time. These systems constantly accumulate more knowledge and experience; therefore they become increasingly accurate over time. Deep analytics applied to an understanding of context allow cognitive computing systems to understand their environment, learn for themselves, and act autonomously.
Conceptual graphs are a formalism for knowledge representation; they form a logical system for the semantic description of knowledge. Conceptual graphs are often applied in the fields of artificial intelligence, computer science or cognitive science. Relations between two concepts can, for example, be displayed as follows: a [concept] is linked to another [concept] through a (relation), where concepts are graphically represented as rectangles, and relations as ovals.
The Clasificarea ocupatiilor din Romania (COR) is Romania’s occupation classification, latest version 2011. Its structure is based on ISCO-08 with four levels.
CP 2011 (Italy)
The Italian occupation classification Classificazione delle Professioni (CP) 2011 is based on ISCO-08 and available in two languages (IT, EN), latest version 2011.
CPO (Clasificación Paraguaya de Ocupaciones) is the current national classification of occupations in Paraguay. It is based on the structure of ISCO.
CPP 2010 (Portugal)
The Portuguese occupation classification Classificação Portuguesa das Profissões (CPP) 2010 is based on ISCO-08, adding a fifth level for the occupations. It is available in Portuguese and English and its latest version is from 2010.
CZ ISCO/COFOG (Czech Republic)
The occupation classification of the Czech Republic, Klasifikace funkcí vládních institucí, is based on ISCO-08, but adds a fifth (national) level. Its latest version was issued in 2011, and it is available in Czech.
C4.5 constructs a classifier in the form of a decision tree. In order to do this, C4.5 is given a set of data representing things that are already classified.
Data mining is the computational process of discovering patterns in large data sets involving methods at the intersection of artificial intelligence, machine learning, statistics, and database systems. The overall goal of the data mining process is to extract information from a data set and transform it into an understandable structure for further use. [Wikipedia]
The structure of the Danish occupation classification Danmarks Statistiks fagklassifikation is based on ISCO-08. It is a hierarchically structured 6-digit classification with 5 levels (a national level was added). Its latest version is from 2011.
The European Dictionary of Skills and Competences (DISCO) II is an online thesaurus with currently more than 104,000 skill and competence terms and about 36,000 phrases in the fields of health, computing, social services, environmental protection and non-domain-specific skills and competences. Available in eleven European languages, DISCO is one of the largest collections of its kind in the education and labour market.
A distribution represents the probabilities for all measurable outcomes. For example, the grades for an exam could fit a normal distribution. This normal distribution represents all the probabilities of a grade.
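As a small illustration, the exam example can be expressed with the NormalDist class from Python's standard library. The mean of 70 points and the standard deviation of 10 points are assumed values, not figures from any real exam.

```python
from statistics import NormalDist

# Hypothetical exam: scores are assumed to follow a normal distribution
# with a mean of 70 points and a standard deviation of 10 points.
exam_scores = NormalDist(mu=70, sigma=10)

# Probability that a score is at most 80 points (one standard deviation above the mean).
share_up_to_80 = exam_scores.cdf(80)

# Probability that a score lies between 60 and 80 points.
share_between_60_and_80 = exam_scores.cdf(80) - exam_scores.cdf(60)
```

Under these assumptions roughly 84% of the scores are at most 80 points, and about 68% fall within one standard deviation of the mean.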
The U.S. Equal Employment Opportunity Commission (EEOC) is responsible for enforcing federal laws that make it illegal to discriminate against a job applicant or an employee because of the person’s race, color, religion, sex (including pregnancy, gender identity, and sexual orientation), national origin, age (40 or older), disability or genetic information.
The EEO-1 compliance survey is mandated by the Equal Employment Opportunity Commission. The survey has companies submit employment data categorized by ethnicity, race, gender and job category. The EEO-1 collects data on nine major job categories.
In data mining, expectation-maximization (EM) is generally used as a clustering algorithm (like k-means) for knowledge discovery.
ERP (Enterprise resource planning)
Enterprise resource planning (ERP) is a business management software—usually a suite of integrated applications—that a company can use to collect, store, manage and interpret data from many business activities, including:
- Product planning, cost and development
- Manufacturing or service delivery
- Marketing and sales
- Inventory management
- Shipping and payment
ERP provides an integrated view of core business processes, often in real-time, using common databases maintained by a database management system. ERP systems track business resources (cash, raw materials, production capacity) and the status of business commitments: orders, purchase orders, and payroll. The applications that make up the system share data across the various departments (manufacturing, purchasing, sales, accounting, etc.) that provide the data. ERP facilitates information flow between all business functions, and manages connections to outside stakeholders.
ESCO is a classification by the European Commission for Skills, Competences, Qualifications and Occupations. It is of particular interest regarding transversal skills/competences.
The French Nomenclature des familles professionnelles (FAP) is the result of a harmonization/approximation between PCS (Professions et Catégories Socioprofessionnelles) and ROME (Répertoire Opérationnel des Métiers et des Emplois). Occupations are assembled to occupation groups by common competences and skills. The classification is especially important for educational and statistical purposes.
The Friend of a Friend (FOAF) project is one of the first applications for the Semantic Web, creating a Web of machine-readable pages describing people, the links between them and the things they create and do. Thousands of people already do this on the Web by describing themselves and their lives on their home page. Using FOAF, one can help machines understand his/her home page, and through doing so, learn about the relationships that connect people, places and things described on the Web. [FOAF Project]
GB/T 6565-2015 (China)
The Classification and Codes of Occupations was issued in 2015 and it replaces the old Classification GB/T 6565-2009.
GICS (Global Industry Classification Standard) is an industry taxonomy developed in 1999 by MSCI and Standard & Poor’s. The system is similar to ICB (Industry Classification Benchmark), a classification structure maintained by FTSE Group.
Homonymy, Homography and Homophony
In linguistics, a homonym is, in the strict sense, one of a group of words that share the same spelling and pronunciation but may have different meanings. Thus homonyms are simultaneously homographs (words that share the same spelling, regardless of their pronunciation) and homophones (words that share the same pronunciation, regardless of their spelling). The state of being a homonym is called homonymy.
- stalk (1): part of a plant
- stalk (2): to follow or harass a person
- further examples: left, skate, etc.
Although it is not always very clear, a distinction is sometimes made between “true” homonyms, which are unrelated in origin, and polysemous homonyms, or polysemes, which have a shared origin.
HSCO 08/FEOR-08 (Hungary)
The Hungarian occupation classification Hungarian Standard Classification of Occupation (HSCO) is based on ISCO-08 and is available in two languages (EN, H). It consists of four levels and was put into force in 2011.
A hyperplane is a function like the equation for a line, y = mx + b. In fact, for a simple classification task with just 2 features, the hyperplane can be a line.
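For two features this can be sketched as follows: the line y = mx + b is rewritten as m*x - y + b = 0, and a point is classified by the sign of the left-hand side. The function name, the slope and the sample points are assumptions for illustration.

```python
def side_of_hyperplane(weights, bias, point):
    """Return +1 or -1 depending on which side of the hyperplane w . x + bias = 0 the point lies."""
    score = sum(w * x for w, x in zip(weights, point)) + bias
    return 1 if score >= 0 else -1

# The line y = m*x + b, rewritten as m*x - y + b = 0, gives weights (m, -1) and bias b.
m, b = 1.0, 0.0
weights, bias = (m, -1.0), b

below_line = side_of_hyperplane(weights, bias, (2.0, 1.0))  # point below y = x
above_line = side_of_hyperplane(weights, bias, (1.0, 2.0))  # point above y = x
```

With more than two features only the length of the weight vector changes; the sign test stays the same.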
Hyponymy and Hypernymy
A hyponym is a word or phrase whose semantic field (a set of words grouped by meaning referring to a specific subject) is included within that of another word, the hypernym. Thus, it is a hierarchy type: the hyponym shares a type-of relationship with its hypernym. It is one of the central semantic relations between concepts in semantic webs, taxonomies and thesauri. [Wikipedia]
Dermatologist, Gynaecologist, Immunologist are all hyponyms of their hypernym Medical Specialist, whereas Medical Specialist is again a hyponym of Doctor of Medicine.
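Because the type-of relationship is a hierarchy, it can be followed upwards step by step. A minimal sketch using the terms from the example above; the dictionary layout and the function names are assumptions.

```python
# Each hyponym points to its direct hypernym (terms taken from the example above).
hypernym_of = {
    "Dermatologist": "Medical Specialist",
    "Gynaecologist": "Medical Specialist",
    "Immunologist": "Medical Specialist",
    "Medical Specialist": "Doctor of Medicine",
}

def hypernym_chain(term):
    """All hypernyms of a term, from the most specific to the most general."""
    chain = []
    while term in hypernym_of:
        term = hypernym_of[term]
        chain.append(term)
    return chain

def is_hyponym_of(term, candidate):
    """True if `candidate` is a direct or indirect hypernym of `term`."""
    return candidate in hypernym_chain(term)

chain = hypernym_chain("Dermatologist")
```

Here a Dermatologist is found to be a hyponym of Doctor of Medicine even though the two are not linked directly.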
ICB (Industry Classification Benchmark) is an industry classification taxonomy launched by Dow Jones and FTSE in 2005. It is used to segregate markets into sectors within the macroeconomy and categorizes over 290’000 securities worldwide. Hence, it enables comparison of companies across four levels of classification and national boundaries.
The International Standard Classification of Education (ISCED) was developed by UNESCO to enable a classification and characterization of school types and schooling systems. It distinguishes between various levels and is useful for stating the level of education (the highest completed level of education) for international comparison.
The International Standard Classification of Occupations is a universal mono-hierarchical classification scheme of occupation groups, compiled by the International Labour Organization (ILO). Four versions of the classification have been published since 1957, abbreviated to ISCO-58, ISCO-68, ISCO-88 and ISCO-08, respectively. Among others, the ISCO classification is used by the European Union and by its member nations as a basis for constructing their own (national) occupation classifications.
Íslensk starfaflokkun 1995 is the occupation classification of Iceland with a structure based on ISCO-88, the forerunner of ISCO-08.
JANZZ.jobs: All the Skills of the World on one single Platform.
JANZZ.jobs is a unique global occupation data matching platform where all the skills of the world are captured, stored, and compared across languages and borders. People, jobs and projects are matched anonymously and securely with unparalleled accuracy in real time thanks to the latest semantic technology. JANZZ.jobs is the only place on the web where the efficient matching of comparable and meaningful results can be found.
JANZZclassifier! analyzes complex sets of occupation related data, such as job title, skills, function or industry, and annotates them with intelligent and standardized meta-data. Over 30 official classification systems, such as O*Net, ISCO-08, BO&C, ASOC or SSOC 2015, and other standardized occupation titles are available. Through its precise classification and standardization, JANZZclassifier! makes your data comparable and thereby creates the ideal starting point for further processes such as benchmarking, matching or statistical analyses.
JANZZon! is the largest multi-lingual encyclopaedic knowledge base of occupation data ever. With an extraordinary volume of inter-related concepts, this unique ontology makes sense of complex occupation data and provides the essential context needed for many potential applications by information systems, matching engines, and statistical analysis and modelling tools to gain insight from their occupation data.
JANZZjobsAPI is the direct and uncomplicated interface to our products and SaaS solutions, in particular to the matching engine JANZZsme! and the ontology JANZZon!. At the heart of the Jobs API offered by JANZZ is therefore a comprehensive jobs and skills ontology that represents knowledge about occupations and skills as well as the ways in which they relate to one another. By means of the Jobs API, developers can connect to JANZZ’s extensive knowledge base of occupation related data as well as to its smart matching engine, which allows them to integrate semantic job search functionality with third party applications. The JANZZjobsAPI integrates seamlessly with labor market solutions such as job boards, applicant tracking systems or company career sites.
The system of JANZZon! (the organization of concepts) follows logical criteria, but is also strongly content-driven, focusing on the content of the particular concepts. Background knowledge about the individual terms (their specific content) and the respective indexing of the concepts in the ontology are crucial, because a machine can only “learn” about terms and their significance if this knowledge is available.
Occupations and functions can be organized relatively easily following logical criteria. Hierarchies are rather clear and the structuring of branches is not very complicated. Occupations can therefore be classified and indexed according to different industries and especially existing (international) occupation classifications. Somewhat more complicated to organize are the specializations and skills. A strict logical order is often difficult or not possible at all. For indexing and organizing in the ontology, a specific skill, specialization and the person holding this skill or specialization is the focus and starting point.
Microsoft Word is clearly related to the concept Microsoft Office, a software suite. Microsoft Word is also often associated with Text processing, but it does not necessarily make sense to subordinate Microsoft Word to Text processing. So that the machine can still establish the connection between Microsoft Word and Text processing, both concepts can be linked to each other through relations.
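The difference between subordinating a concept and merely relating it can be sketched with two separate structures. This is a simplified illustration and not the actual JANZZon! data model; the parent concept "Software", the variable names and the function names are assumptions.

```python
# Hierarchy: each concept points to the concept it is subordinated to.
# "Software" as the parent of Microsoft Office is an assumption for this sketch.
broader = {
    "Microsoft Word": "Microsoft Office",
    "Microsoft Office": "Software",
}

# Non-hierarchical links: concepts that are associated without one being above the other.
related = {
    ("Microsoft Word", "Text processing"),
}

def ancestors(concept):
    """All concepts above `concept` in the hierarchy, nearest first."""
    result = []
    while concept in broader:
        concept = broader[concept]
        result.append(concept)
    return result

def are_related(a, b):
    """True if the two concepts are linked by a relation, in either direction."""
    return (a, b) in related or (b, a) in related

word_ancestors = ancestors("Microsoft Word")
linked = are_related("Text processing", "Microsoft Word")
```

Text processing never shows up among the ancestors of Microsoft Word, yet a lookup through the relation still connects the two concepts.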
JANZZsme! is the newest generation semantic matching engine solution for exploiting any occupation Big Data, such as unemployed workforce profiles, and candidate and job offer databases. Superior search and match functionality enables data mining and querying for occupation concept comparison, profiling, gap analysis and benchmarking.
With JANZZon! in the background, JANZZsme! provides full semantic matching offering transparency of information with unparalleled precision.
The Serbian occupation classification Jedinstvena nomenklatura zanimanja / Klasifikacija zanimanja is based on ISCO-08, consisting of four levels. It is available in Serbian.
JSOC 2011 (Japan)
The Japan Standard Occupational Classification (JSOC) is available in Japanese and English. It is not directly linked to the International Standard Classification of Occupations (ISCO-08). JSOC reflects the occupational structure in Japan; its classification structure therefore differs from that of ISCO.
KldB 2010 (Germany)
The Klassifikation der Berufe (KldB) 2010 is the German occupation classification of the Federal Agency of Labour / Institute for Labour Market and Occupation Research. It is structured hierarchically with five levels: fields of occupations, main occupation groups, occupation groups, sub-groups of occupations, occupation categories. In addition, there is a classification on the basis of the level of requirement: assistant and apprenticeship occupations, technical occupations, complex specialist occupations, highly complex occupations.
K-means creates k groups from a set of objects so that the members of a group are more similar to each other than to members of other groups. It’s a popular cluster analysis technique for exploring a dataset.
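A minimal one-dimensional sketch of the procedure: assign every point to its nearest centroid, move each centroid to the mean of its group, and repeat. The data, the starting centroids and the fixed number of iterations are assumptions chosen to keep the example short.

```python
def k_means(points, centroids, iterations=10):
    """Plain k-means on 1-D data; k is the number of starting centroids."""
    for _ in range(iterations):
        # Assignment step: put every point into the group of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its group
        # (an empty group keeps its previous centroid).
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical values, e.g. years of professional experience in a set of profiles.
points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centroids, clusters = k_means(points, centroids=[1.0, 2.0])
```

Even with both starting centroids placed in the lower group, the procedure settles on one group around 2 and one around 11 for this data.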
kNN, or k-Nearest Neighbors, is a classification algorithm. However, it differs from the classifiers previously described because it’s a lazy learner.
The occupation classification of Bosnia and Herzegovina, Klasifikacije Zanimanja, is based on ISCO-08.
KZiS 2010 / COFOG (Poland)
The Polish occupation classification Klasyfikacja Funkcji Rządu 2010 is available in two languages (PL, EN) and its structure is based on ISCO-08. It consists of five levels (with an additional local level) and was put in force in 2010.
A lazy learner doesn’t do much during the training process other than store the training data. Only when new unlabeled data is input does this type of learner look to classify.
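The k-Nearest Neighbors classifier illustrates this behaviour well. In the sketch below, training only stores the examples and all computation happens when a new item is classified. The class name, the features (years of experience, number of skills) and the labels are made up for illustration.

```python
from collections import Counter

class KNearestNeighbors:
    """Lazy learner: fit() only stores the data, predict() does all the work."""

    def __init__(self, k):
        self.k = k
        self.examples = []

    def fit(self, features, labels):
        # "Training" is nothing more than remembering the labelled examples.
        self.examples = list(zip(features, labels))

    def predict(self, query):
        # Sort the stored examples by squared Euclidean distance to the query ...
        by_distance = sorted(
            self.examples,
            key=lambda example: sum((a - b) ** 2 for a, b in zip(example[0], query)),
        )
        # ... and let the k closest ones vote on the label.
        votes = Counter(label for _, label in by_distance[: self.k])
        return votes.most_common(1)[0][0]

# Hypothetical profiles: (years of experience, number of listed skills).
features = [(1, 2), (2, 1), (2, 3), (8, 9), (9, 8), (10, 10)]
labels = ["junior", "junior", "junior", "senior", "senior", "senior"]

model = KNearestNeighbors(k=3)
model.fit(features, labels)
first = model.predict((3, 2))
second = model.predict((9, 9))
```

An eager learner such as a decision tree would instead build its model during training and consult only that model afterwards.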
LB 501-2002 (China)
The Classification and Codes of Occupations was issued by the Ministry of Human Resources and Social Security of the People’s Republic of China in 2005 and it is based on the International Standard Classification of Occupations (ISCO).
Link analysis is a type of network analysis that explores the associations (a.k.a. links) among objects.
LKP 2010 (Albania)
The Albanian classification of occupations is based on ISCO-08 and is called Lista Kombëtare e Profesioneve (LKP) 2010. It was put in force in 2010 and consists of four levels.
Machine learning is a type of artificial intelligence (AI) that focuses on the development of computer systems that can act without being explicitly programmed. These computer programs learn to do something on their own by repeated training using big data. Ultimately, these systems are able to grow and change when exposed to new data. The process of machine learning is similar to data mining, as both search for patterns within big volumes of data. While data mining aims to increase human understanding of an issue by revealing patterns within data, machine learning uses pattern detection in order to improve the program’s actions.
Machine learning is so pervasive today that it is hard to get around: web search, practical speech recognition and self-driving cars are just a few examples.
Mechanical Turk, or The Turk, is the colloquial name for an automaton chess player that was constructed in 1769 by Wolfgang von Kempelen, an Austro-Hungarian court official and mechanic. To spectators the machine gave the impression that it was playing chess on its own. In fact, there was a relatively talented human chess player hiding inside to operate the machine. Copies of the machine were used in various presentations and exhibitions until 1929, when the hoax was discovered. In German, the phrase etwas türken (‘faking something’) has remained in use up to the present day.
Today, the so-called Mechanical Turk method is being used for projects that are much more complicated to program or solve for a computer than for humans. At JANZZ.technology, this method substitutes machine learning in areas where the latter is insufficient by our standards, as it would be too imprecise, too flawed or too slow. By means of human labor—so-called Human Intelligence Tasks (HITs)—those tasks that a computer cannot solve at all or only with disproportionately high efforts are performed much more precisely and faster. At JANZZ, such human experts include linguists, professional and educational experts, experienced specialists from domains such as medicine, engineering, IT, banking and finance, trade, as well as people knowledgeable of JANZZon!’s languages, industries and occupations. This approach ensures the unmatched semantic quality of JANZZon! and the applications based on it.
MOL (Saudi Arabia)
Besides the ASCO scheme, various Arab countries use their own national occupation classification. Saudi Arabia used to operate a classification called MOL, which was not based on ISCO-08 but could be mapped partly to the ISCO scheme. However, the MOL classification has been replaced by ASOC/NES.
The Statistical Classification of Economic Activities in the European Community (in French: Nomenclature statistique des activités économiques dans la Communauté européenne), commonly referred to as NACE, is a European industry standard classification system consisting of a 6-digit code. NACE is similar in function to the SIC (Standard Industrial Classification) and NAICS (North American Industry Classification System) systems.
The first four digits of the code, which correspond to the first four levels of the classification system, are the same in all European countries. The fifth digit might vary from country to country, and further digits are sometimes added by suppliers of databases.
NAICS (North America)
NAICS (North American Industry Classification System) is used by business and government to classify business establishments according to type of economic activity in Canada, Mexico and the United States of America. It has largely replaced the older Standard Industrial Classification system (SIC).
Naive Bayes is not a single algorithm, but a family of classification algorithms that share one common assumption: Every feature of the data being classified is independent of all other features given the class.
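The independence assumption means that the probability of a class is estimated by multiplying one factor per feature. A minimal sketch for categorical features with add-one (Laplace) smoothing; the training rows, the labels and the function names are made up for illustration.

```python
import math
from collections import Counter, defaultdict

def train(rows, labels):
    """Count how often each feature value occurs together with each class."""
    class_counts = Counter(labels)
    # feature_counts[label][feature_index][value] = number of occurrences
    feature_counts = defaultdict(lambda: defaultdict(Counter))
    values = defaultdict(set)  # distinct values seen per feature
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            feature_counts[label][i][value] += 1
            values[i].add(value)
    return class_counts, feature_counts, values

def predict(model, row):
    """Pick the class with the highest log prior plus summed log likelihoods."""
    class_counts, feature_counts, values = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)  # log prior of the class
        for i, value in enumerate(row):
            # Naive assumption: each feature contributes its own independent factor.
            # Add-one smoothing keeps unseen values from producing a zero probability.
            numerator = feature_counts[label][i][value] + 1
            denominator = count + len(values[i])
            score += math.log(numerator / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical training data: (type of training, working time) -> job title.
rows = [
    ("degree", "full-time"),
    ("degree", "part-time"),
    ("apprenticeship", "full-time"),
    ("apprenticeship", "full-time"),
]
labels = ["engineer", "engineer", "joiner", "joiner"]

model = train(rows, labels)
first = predict(model, ("degree", "full-time"))
second = predict(model, ("apprenticeship", "part-time"))
```

Logarithms are summed instead of multiplying probabilities, which avoids very small numbers when many features are involved.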
NCO-2004 is the national classification of occupations issued by the government of India. It is based on the International Standard Classification of Occupations ISCO-88, the forerunner of ISCO-08.
NKZ 10 (Croatia & Macedonia)
The Nacionalna klasifikacija zanimanja (NKZ) 2010 (occupation classification of Croatia and Macedonia) is based on ISCO-08. It consists of four levels and is available in Croatian and English. Its latest version is from 2011.
The Canadian occupation classification National Occupational Classification (NOC) is based on the structure of ISCO-08. It is a four-level hierarchical arrangement of occupational groups, available in French and English, with its latest version from 2016.
The Bulgarian classification of occupations НКПД-2011 is based on ISCO-08. It consists of four levels and is available in Bulgarian. It was put in force in 2011.
OECD AI Principles
The OECD AI Principles were adopted on 22 May 2019 by OECD member countries when they approved the OECD Council Recommendation on Artificial Intelligence. The Recommendation identifies five complementary values-based principles for the responsible stewardship of trustworthy AI:
- AI should benefit people and the planet by driving inclusive growth, sustainable development and well-being.
- AI systems should be designed in a way that respects the rule of law, human rights, democratic values and diversity, and they should include appropriate safeguards – for example, enabling human intervention where necessary – to ensure a fair and just society.
- There should be transparency and responsible disclosure around AI systems to ensure that people understand AI-based outcomes and can challenge them.
- AI systems must function in a robust, secure and safe way throughout their life cycles and potential risks should be continually assessed and managed.
- Organisations and individuals developing, deploying or operating AI systems should be held accountable for their proper functioning in line with the above principles.
The Occupational Information Network (O*NET) in the USA is supported by the US Department of Labor/Employment and Training Administration (USDOL/ETA). It is related to the UK occupation classification SOC. O*NET is a broad database of occupation descriptions and the respective skills, competences, etc. needed for each of them. Each occupation is structured into Tasks, Tools used, Knowledge, Skills, Ability, Work Activities, Work Context, Job Zone.
Occupation classifications are classification systems to structure occupations according to different attributes and characteristics. An important example of this is the International Standard Classification of Occupations (ISCO), developed by the International Labour Organization (ILO) for the first time in the 1960s as an international classification system of occupation groups. It has been adapted twice to the changes in the working environment in the industrial nations, in 1988 and 2008 (ISCO-88 and ISCO-08). Based on this classification, international comparison enables the determination of different positions in a society’s hierarchy, including comparable statistics about different labour markets, education systems, unemployment rates, etc. There are nine main occupation categories (without armed forces occupations) in the International Standard Classification of Occupations of 1988 by Eurostat (for the purpose of the European Union). These main occupation categories are organized into occupation groups, sub-groups and types, which leads to a four-digit code for each occupation to enable assigning it to an occupation type.
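The way a four-digit code encodes the four levels can be sketched as follows: each additional digit narrows the group down by one level. The level names follow the ISCO-08 terminology (major, sub-major, minor and unit group); the function name and the sample code are only illustrative.

```python
def isco_levels(code):
    """Split a four-digit ISCO code into the codes of its four hierarchy levels."""
    if len(code) != 4 or not (code.isascii() and code.isdigit()):
        raise ValueError(f"expected a four-digit code, got {code!r}")
    return {
        "major_group": code[:1],      # first digit
        "sub_major_group": code[:2],  # first two digits
        "minor_group": code[:3],      # first three digits
        "unit_group": code,           # all four digits
    }

# Sample code used purely for illustration.
levels = isco_levels("2512")
```

National classifications that add a fifth level, as several of the entries in this glossary do, would extend the same scheme by one more digit.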
UK & Ireland
In addition to ISCO, there are other classification systems in use, e.g. the Standard Occupational Classification (SOC) 2010 in the UK and Ireland. SOC has its own structure, which is not based on ISCO-08. However, a mapping to ISCO-08 is available, which enables comparison between the two classification systems. Other national classification systems such as NOC (Canada) are based on ISCO.
The Occupational Information Network (O*Net) in the USA is supported by the US Department of Labor/Employment and Training Administration (USDOL/ETA). It is related to the UK occupation classification SOC. O*Net is a broad database of occupation descriptions and the respective skills, competences, etc. needed for each of them. Each occupation is structured into Tasks, Tools used, Knowledge, Skills, Ability, Work Activities, Work Context, Job Zone.
Additional important classification systems exist in various regions and countries worldwide, e.g. ANZSCO, ASCO, BO&C, KldB, Ö-ISCO, and many more.
Occupation data is a general term for data related to occupation and profession concepts, such as competences, soft and transversal skills, functions, specializations, education/qualification data, etc.
Occupation vs. Profession
Many people think that “occupation” and “profession” are synonyms, but this is only very rarely the case. “Occupation” denotes an activity somebody exercises to earn a living. This can be through self-employment or employment and it does not necessarily entail specific qualifications, training or professional experience. “Profession,” on the other hand, describes an activity that requires special training, knowledge, qualifications and skills – in short, something that needs to be learned. This means that a CEO ought to be classified as a function or an occupation rather than a profession, whereas a joiner is a typical example of a profession that frequently involves an apprenticeship and/or qualification. However, the boundaries between occupations and professions are often fluid: a profession can sometimes also be an occupation, but an occupation can almost never be considered a profession. These are the main differences between the two terms:
- An activity performed by a person for monetary compensation is normally known as an occupation. A profession is a vocation that requires a high degree of education or skill.
- Unlike occupations, professions have a code of conduct.
- An occupation does not require any kind of training in a particular field, but a profession requires specialization in a specific area which is why it necessitates training.
- Professions are generally regulated by a particular statute or professional body, which is not the case for occupations.
- In an occupation, people are paid for what they produce, whereas the salaries of those in a profession depend on their knowledge, skills and professional experience.
- A profession is also an occupation if the person is paid for utilizing his or her skills and expertise.
Ontology / Knowledge Graph
In computer science and information science, an ontology or knowledge graph formally represents knowledge as a hierarchy of concepts within a domain, using a shared vocabulary to denote the types, properties and interrelationships of those concepts. These interrelationships distinguish an ontology from a taxonomy, which only builds a hierarchical order without interrelationships between the individual concepts. Ontologies/knowledge graphs are the structural frameworks for organizing information and are used in artificial intelligence, the Semantic Web, systems engineering, software engineering, biomedical informatics, library science, enterprise bookmarking, and information architecture as a form of knowledge representation about the world or some part of it. The creation of domain ontologies is also fundamental to the definition and use of an enterprise architecture framework.
Knowledge graphs enable the representation of knowledge: humans usually understand the correct meaning of a term thanks to their background knowledge and the context in which the term is used. A machine naturally lacks this ability. It can, however, “learn” about the semantic meaning of a term. Often, so-called Conceptual Graphs are used to depict this meaning: concepts are linked to each other through different relations. Through the relations that have been set and the location of the term in the ontology, the meaning of a specific term becomes interpretable for a machine.
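The difference between a pure hierarchy and a graph of typed relations can be made concrete with a small sketch. All concepts and relation names below (`requires_skill`, `uses_tool`) are hypothetical and chosen only for illustration.

```python
# Hypothetical concepts and relation names, chosen for illustration only.
TAXONOMY = {
    "civil engineer": "engineer",   # child -> parent (hierarchy only)
    "engineer": "occupation",
}

# An ontology adds typed relations between concepts on top of the hierarchy.
ONTOLOGY = [
    ("civil engineer", "is_a", "engineer"),
    ("engineer", "is_a", "occupation"),
    ("civil engineer", "requires_skill", "structural analysis"),
    ("civil engineer", "uses_tool", "CAD software"),
]


def relations_of(concept: str) -> dict[str, list[str]]:
    """Group every outgoing relation of a concept by relation type."""
    grouped: dict[str, list[str]] = {}
    for subject, relation, obj in ONTOLOGY:
        if subject == concept:
            grouped.setdefault(relation, []).append(obj)
    return grouped


print(TAXONOMY["civil engineer"])       # the taxonomy only knows the parent
print(relations_of("civil engineer"))   # the ontology also knows skills and tools
```

The location of a term (its `is_a` chain) together with its other relations is what gives a machine something to interpret.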
Opposites are words that lie in an inherently incompatible binary relationship, as in the opposite pairs:
big : small
long : short
Over-(Under-) qualification/education occurs when the level of qualification/education is higher (lower) than required to perform the job adequately.
Over-(Under-) skilling means the level of skill is higher (lower) than required to adequately perform the job.
The Web Ontology Language (OWL) is a family of knowledge representation languages or ontology languages for authoring ontologies or knowledge bases. The languages are characterized by formal semantics and RDF/XML-based serializations for the Semantic Web. OWL is endorsed by W3C and has attracted academic, medical and commercial interest.
OXML is the native ontology representation format of the ontology development environment OntoEdit from ontoprise GmbH. It is an XML application that is defined via an XML Schema definition. It provides the basic mechanisms to describe an ontology, its meta-data and especially its components, such as concepts, relations and axioms.
The Austrian occupation classification ÖISCO-08 (Österreichische Systematik der Berufe 2008) uses the same codes and structure as ISCO-08, with four levels. The structure and explanatory notes are available in English and German; the alphabetical index is available only in German. It came into force in 2011.
PageRank is a link analysis algorithm designed to determine the relative importance of some object linked within a network of objects.
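A minimal sketch of the idea, using power iteration on a tiny link graph. The graph, the damping factor of 0.85 and the fixed number of iterations are assumptions made for this example, not details given in this glossary.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 50) -> dict[str, float]:
    """Power iteration over a small link graph (node -> outgoing links).

    Every link target must itself be a key of `links`.
    """
    nodes = list(links)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node receives a baseline share from the random jump.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node, targets in links.items():
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A node without outgoing links spreads its rank evenly.
                for other in nodes:
                    new_rank[other] += damping * rank[node] / n
        rank = new_rank
    return rank


# Hypothetical four-page web: every other page links to "home".
graph = {
    "home": ["about"],
    "about": ["home"],
    "blog": ["home", "about"],
    "contact": ["home"],
}
ranks = pagerank(graph)
print(max(ranks, key=ranks.get))
```

An object becomes important when many objects link to it, and even more so when those objects are themselves important.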
Parsing or syntactic analysis is the process of analysing a string of symbols, either in natural language or in computer languages, according to the rules of a formal grammar. The term parsing comes from Latin pars (orationis), meaning part (of speech).
Within computational linguistics the term is used to refer to the formal analysis by a computer of a sentence or other string of words into its constituents, resulting in a parse tree showing their syntactic relation to each other, which may also contain semantic and other information.
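For a computer language, the process can be sketched with a small recursive-descent parser. The grammar (integers, `+`, `*` and parentheses) is an assumption made for this example; the parse tree is represented as nested tuples.

```python
import re

# Grammar (an assumption for this sketch):
#   expr   -> term ('+' term)*
#   term   -> factor ('*' factor)*
#   factor -> NUMBER | '(' expr ')'


def tokenize(text: str) -> list[str]:
    """Split the input string into numbers, operators and parentheses."""
    return re.findall(r"\d+|[+*()]", text)


def parse(text: str):
    """Return a parse tree as nested tuples: (operator, left, right)."""
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        return token

    def factor():
        token = advance()
        if token == "(":
            node = expr()
            advance()  # consume the closing ')'
            return node
        return int(token)

    def term():
        node = factor()
        while peek() == "*":
            advance()
            node = ("*", node, factor())
        return node

    def expr():
        node = term()
        while peek() == "+":
            advance()
            node = ("+", node, term())
        return node

    tree = expr()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return tree


print(parse("1 + 2 * (3 + 4)"))
```

Each function corresponds to one rule of the grammar, so the shape of the resulting tree mirrors the syntactic relation of the constituents.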
Mereology describes a type of hierarchy, focusing on the relation of parts and the wholes they form. Mereology is an important part of ontology.
A book is part of a book collection. If A is part of B and B is part of C, then A is also part of C.
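The transitivity of the part-of relation can be sketched as a walk up a chain of direct part-of facts. The facts below are hypothetical, and the sketch assumes each part has a single direct whole and that the chain contains no cycles.

```python
# Hypothetical direct part-of facts: part -> whole (no cycles assumed).
PART_OF = {
    "page": "book",
    "book": "book collection",
    "book collection": "library",
}


def is_part_of(part: str, whole: str) -> bool:
    """Follow the part-of chain upwards; transitivity falls out of the walk."""
    current = PART_OF.get(part)
    while current is not None:
        if current == whole:
            return True
        current = PART_OF.get(current)
    return False


print(is_part_of("page", "library"))   # derived through two intermediate wholes
print(is_part_of("library", "page"))   # the relation does not run backwards
```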
The French classification Professions et Catégories Socioprofessionnelles (PCS) consists of four aggregation levels.
PCS-ESE 2003 (professions and socio-professional categories / employees in companies) is a special type of nomenclature. It is used as a description of employee activities in private or semi-public companies and enterprises.
The Latvian occupation classification Profesiju klasifikators is based on ISCO-08. It consists of five levels (an additional national occupation level) and is available in Latvian. It was adopted in 2010.
Polysemy (from Greek: πολυ-, poly-, “many” and σῆμα, sêma, “sign”) is the capacity for a sign (e.g., a word, phrase, etc.) or signs to have multiple related meanings (sememes), i.e., a large semantic field. It is usually regarded as distinct from homonymy, in which the multiple meanings of a word may be unconnected or unrelated. Polysemes have the same etymology, which is not the case for homonyms. Sources of polysemy can be found in different figures of speech, such as metaphor, metonymy (associated meaning), etc.
Bank: a financial institution or the building where a financial institution offers services (river bank is a homonym to financial institution and the building as they do not share etymologies).
A qualification mismatch occurs if the level of qualification and/or the field of qualification is different from that required to perform the job adequately.
The Resource Description Framework (RDF) is an extensible knowledge representation language. Resource Description Framework Schema (RDFS) is a set of classes with certain properties built on RDF, providing basic elements for the description of ontologies, otherwise called RDF vocabularies, intended to structure RDF resources. These resources can be stored in a triplestore and retrieved with the query language SPARQL. The first version was published by the World Wide Web Consortium (W3C) in April 1998, and the final W3C recommendation was released in February 2004. Many RDFS components are included in the more expressive Web Ontology Language (OWL).
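The triple model and the idea of querying it by pattern can be sketched in plain Python. This is a toy in-memory store and not a real triplestore or SPARQL engine; the `ex:` names are hypothetical, while `rdf:type` and `rdfs:subClassOf` are standard RDF/RDFS terms.

```python
# A toy in-memory triple store; the "ex:" names are hypothetical.
TRIPLES = {
    ("ex:CivilEngineer", "rdfs:subClassOf", "ex:Engineer"),
    ("ex:Engineer", "rdfs:subClassOf", "ex:Occupation"),
    ("ex:alice", "rdf:type", "ex:CivilEngineer"),
    ("ex:alice", "ex:worksFor", "ex:Acme"),
}


def match(subject=None, predicate=None, obj=None):
    """Return all triples matching the pattern, sorted for stable output.

    None acts as a wildcard, much like a variable in a SPARQL triple pattern.
    """
    return sorted(
        t for t in TRIPLES
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    )


# Roughly: SELECT ?s ?o WHERE { ?s rdfs:subClassOf ?o }
for triple in match(predicate="rdfs:subClassOf"):
    print(triple)
```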
Alongside SOAP and XML-RPC, Representational State Transfer (REST) is likely the most important alternative for the realization of web services. REST is based on principles that are already applied extensively in the largest distributed application of all, the World Wide Web. The WWW itself thus represents a gigantic REST application. Many search engines, shops, portals, networks and booking systems are therefore already available as REST-based web services. Their design means they can be connected quickly and without difficulty to the JANZZ products and SaaS solutions.
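The core REST principles, resources addressed by URL and manipulated through the standard HTTP verbs, can be sketched with the Python standard library. The base URL and the `occupations` resource are hypothetical and do not describe any actual JANZZ endpoint; the requests are only constructed, never sent.

```python
import json
import urllib.request

# Hypothetical endpoint and resource; no request is actually sent here.
BASE_URL = "https://api.example.com/v1"


def build_get(resource: str, identifier: str) -> urllib.request.Request:
    """Address a resource by URL and read it with the uniform HTTP verb GET."""
    return urllib.request.Request(
        f"{BASE_URL}/{resource}/{identifier}",
        headers={"Accept": "application/json"},
        method="GET",
    )


def build_post(resource: str, payload: dict) -> urllib.request.Request:
    """Create a new resource by POSTing a JSON representation of it."""
    return urllib.request.Request(
        f"{BASE_URL}/{resource}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_get("occupations", "2221")
print(request.get_method(), request.full_url)
```

Passing such a request object to `urllib.request.urlopen` would send it; that step is left out because the endpoint does not exist.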
RIASEC (Holland Codes)
RIASEC or the Holland Occupational Themes refers to a theory of careers and vocational choice (based upon personality types) that was initially developed by American psychologist John L. Holland (1919-2008).
In the RIASEC model, Holland distinguished between 6 types: Realistic (Doers), Investigative (Thinkers), Artistic (Creators), Social (Helpers), Enterprising (Persuaders), and Conventional (Organizers). An updated and expanded version of the RIASEC model is used in the “Interests” section of the free online database, the Occupational Information Network (O*Net).
ROME V3 (France)
The Répertoire Opérationnel des Métiers et des Emplois, version 3 (ROME V3), is the French occupation classification. It has its own structure and is thus not based on ISCO-08. The structure contains three levels: occupation categories, fields of work and the occupation term, to which all the occupations are assigned.
The term Software as a Service (SaaS) is part of the nomenclature of cloud computing. It is a software licensing and delivery model in which software is licensed on a subscription basis and is centrally hosted. SaaS is typically accessed by users using a thin client via a web browser. SaaS has become a common delivery model for many business applications and has been incorporated into the strategy of all leading enterprise software companies. One of the biggest selling points for these companies is the potential to reduce IT support costs by outsourcing hardware and software maintenance and support to the SaaS provider. [Wikipedia]
SBC / BO&C (Netherlands)
The Standaard Beroepenclassificatie (SBC) is the Dutch standard occupation classification. It is available in two versions, SBC 1992 and SBC 2010. It has its own structure, but a mapping to ISCO-08 is available. In contrast to ISCO-08, it has an additional level and explicitly operationalises the criterion of skill specialisation as a list of 87 fields of skill specialisation. Alongside these criteria, it uses 128 task clusters to differentiate further between job titles with a similar level and skill specialisation. It was adopted in 1992, but since 2012, ISCO-08 has been used for statistical purposes in the Netherlands. In 2014, the Netherlands issued its own statistical classification system based on ISCO-08, BRC 2014.
Additionally, there is a database for occupations and education, Beroeps- en Opleidingsgegevens or Beroepen, Opleidingen & Competenties (BO&C).
The directory of the Swiss State Secretariat for Education, Research and Innovation (SERI) offers an overview of all the officially recognized occupations based on the basic education, the higher vocational training as well as the curricula, courses of study, post graduate programs of the technical colleges.
SBN 2000 (Switzerland)
The Swiss occupation classification Schweizer Berufsnomenklatur 2000 (SBN 2000) by the Swiss Federal Statistical Office is available in three languages (GER, FR, IT). It is structured into divisions, classes, groups and types and has been in use since 2000. There is no link to ISCO. However, each individual occupation in the Swiss Occupational Database is registered with an eight-digit, non-speaking code, and each of these codes is transcoded to ISCO.
Semantic Matching (High Quality Matching)
Semantic matching is a technique used in computer science to identify information which is semantically related. Given any two graph-like structures, e.g. classifications, database or XML schemas and ontologies, matching is an operator which identifies those nodes in the two structures which semantically correspond to one another. For example, applied to file systems it can identify that a folder labeled “Medical Practitioner” is semantically equivalent to another folder “Medical Doctor” because they are synonyms in English.
Semantic matching represents a fundamental technique in many applications in areas such as resource discovery, HR and recruitment, data integration, data migration, query translation, peer-to-peer networks, agent communication, and schema and ontology merging. Its use is also being investigated in other areas such as event processing. In fact, it has been proposed as a valid solution to the semantic heterogeneity problem, namely managing the diversity in knowledge. Interoperability among people of different cultures and languages, having different viewpoints and using different terminology, has always been a huge problem. With the advent of the Web and the consequent information explosion, the problem has become even more pronounced. People face the concrete problem of retrieving, disambiguating and integrating information coming from a wide variety of sources.
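The matching operator described above can be sketched as a comparison of concepts rather than strings. The synonym sets below are hypothetical; a real system would draw them from an ontology, and this sketch covers only synonymy, not the other semantic relations such a system would use.

```python
# Hypothetical synonym sets; a real system would draw these from an ontology.
SYNONYM_SETS = [
    {"medical practitioner", "medical doctor", "physician"},
    {"software developer", "programmer"},
]


def concept_id(label: str):
    """Map a label to the index of its synonym set, or to itself if unknown."""
    normalized = label.strip().lower()
    for index, synonyms in enumerate(SYNONYM_SETS):
        if normalized in synonyms:
            return index
    return normalized


def match_nodes(left: list[str], right: list[str]) -> list[tuple[str, str]]:
    """Pair up nodes from two structures whose labels denote the same concept."""
    return [(a, b) for a in left for b in right if concept_id(a) == concept_id(b)]


folders_a = ["Medical Practitioner", "Invoices"]
folders_b = ["Medical Doctor", "Programmer", "invoices"]
print(match_nodes(folders_a, folders_b))
```

A purely string-based comparison would miss the first pair, which is exactly the gap semantic matching is meant to close.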
Semantic Web (Web 3.0)
By encouraging the inclusion of semantic content in web pages, the Semantic Web aims at converting the current web, dominated by unstructured and semi-structured documents into a “web of data”. The main purpose of the Semantic Web is driving the evolution of the current Web by enabling users to find, share, and combine information more easily. However, machines cannot accomplish all of these tasks without human direction, because web pages are designed to be read by people, not machines. The Semantic Web is a vision of information that can be readily interpreted by machines, so machines can perform more of the tedious work involved in finding, combining, and acting upon information on the web. The Semantic Web, as originally envisioned, is a system that enables machines to “understand” and respond to complex human requests based on their meaning. Such an “understanding” requires that the relevant information sources be semantically structured. [Wikipedia]
Semantics (from Ancient Greek: σημαντικός sēmantikós, “significant”) is the study of meaning. It focuses on the relation between signifiers, like words, phrases, signs, and symbols, and what they stand for, their denotation. Linguistic semantics is the study of meaning that is used for understanding human expression through language. Other forms of semantics include the semantics of programming languages, formal logics, and semiotics. [Wikipedia]
The Slovakian occupation classification Štatistická klasifikácia zamestnaní is based on ISCO-08, with five levels. It is available in Slovak and English and has been in use since the end of 2011.
The term skill gap is used when the type or level of skills is different from that required to perform the job adequately.
Skill shortage occurs if the demand for a particular type of skill exceeds the supply of people with that skill at equilibrium rates of pay.
The Simple Knowledge Organization System (SKOS) is a W3C recommendation designed for the representation of thesauri, classification schemes, taxonomies, subject-heading systems, or any other type of structured controlled vocabulary. SKOS is part of the Semantic Web family of standards built upon RDF and RDFS, and its main objective is to enable easy publication and use of such vocabularies as linked data. [Wikipedia]
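The kind of vocabulary SKOS describes can be sketched with a small data structure that mirrors three of its properties: `skos:prefLabel`, `skos:altLabel` and `skos:broader`. The class, the helper function and the two concepts are hypothetical; real SKOS data would be expressed in RDF rather than in Python objects.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """Mirrors skos:prefLabel, skos:altLabel and skos:broader."""
    pref_label: dict[str, str]                       # language tag -> preferred label
    alt_labels: dict[str, list[str]] = field(default_factory=dict)
    broader: "Concept | None" = None


def find_concept(vocabulary: list[Concept], label: str, lang: str):
    """Resolve a preferred or alternative label to its concept."""
    wanted = label.lower()
    for concept in vocabulary:
        labels = list(concept.alt_labels.get(lang, []))
        if lang in concept.pref_label:
            labels.append(concept.pref_label[lang])
        if wanted in (candidate.lower() for candidate in labels):
            return concept
    return None


# Hypothetical two-concept vocabulary with English and German labels.
engineer = Concept(pref_label={"en": "Engineer", "de": "Ingenieur"})
civil = Concept(
    pref_label={"en": "Civil engineer", "de": "Bauingenieur"},
    alt_labels={"en": ["Structural engineer"]},
    broader=engineer,
)
vocabulary = [engineer, civil]

hit = find_concept(vocabulary, "structural engineer", "en")
print(hit.pref_label["de"], "->", hit.broader.pref_label["de"])
```

Separating the concept from its labels is what lets one vocabulary serve several languages and spelling variants at once.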
The occupation classification of Montenegro (SKZ) is based on ISCO-08. It is available in two languages (MN, EN).
Smart data is not about data per se but rather refers to the ways to analyze and make sense of it. While big data is an unstructured, raw amount of data that reflects consumer behavior, smart data is how we discover the underlying rationale and predict repetition of such behavior.
Technically, it is about separating and ignoring the noise, finding relevant data points, and extracting signals of higher value. Practically, it is about generating meaningful information, change recommendations and interactive visualization, all for a specific business context.
In short, smart data means adding advanced business intelligence on top of big data, in order to provide actionable insights.
SOC 2010 (UK & Ireland)
The Standard Occupational Classification (SOC) 2010 of Great Britain has its own structure, but a mapping to ISCO-08 is available and where possible during the 2010 revision it was aligned to ISCO-08. It has four levels and is available in English. Its latest version is from 2010.
SOC 2018 (USA)
The Standard Occupational Classification (SOC) 2018 of the United States of America has its own structure, but a mapping to ISCO-08 is available. It has four levels and is available in English and Spanish. Its latest version is from 2018.
SSOC 2010 & 2015 (Singapore)
The Singapore Standard Occupational Classification (SSOC) 2010 and 2015 is based on SOC and ISCO-08.
Additionally, there is an education classification Singapore Standard Educational Classification (SSEC) and an industries classification Singapore Standard Industrial Classification (SSIC) in Singapore.
The Swedish occupation classification Standard för svensk yrkesklassificering is based on ISCO-08. It consists of four levels and is available in two languages (S, EN). It was officially adopted in 2012.
We see a model as something that describes how observed data is generated. For example, the grades for an exam could fit a bell curve, so the assumption that the grades are generated via a bell curve (a.k.a. normal distribution) is the model.
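The exam example can be sketched with the `NormalDist` class from Python's standard `statistics` module. The grades are made up for illustration.

```python
from statistics import NormalDist

# Hypothetical exam grades on a 0-100 scale.
grades = [52, 61, 64, 68, 70, 71, 73, 75, 78, 82, 85, 91]

# Fitting the model means estimating its parameters from the observed data.
model = NormalDist.from_samples(grades)
print(f"mean={model.mean:.1f}, stdev={model.stdev:.1f}")

# Once fitted, the model can answer questions about data it has not seen,
# e.g. the share of grades it expects to fall below 60.
print(f"P(grade < 60) = {model.cdf(60):.2f}")
```

Whether the bell curve is a good model is a separate question; the fit only says which bell curve matches these grades best.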
The Norwegian occupation classification Standard for Yrkesklassifisering (STYRK) is based on ISCO-08. It consists of four levels and is available in Norwegian. It was officially adopted in 2011.
Support vector machines
A support vector machine (SVM) learns a hyperplane to classify data into two classes. At a high level, SVM performs a task similar to C4.5, except that SVM doesn’t use decision trees at all.
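A minimal sketch of a linear SVM, trained by subgradient descent on the regularised hinge loss. This is one simple way to find such a hyperplane and not the method of any particular library; the toy data, learning rate, regularisation strength and number of epochs are all assumptions made for this example.

```python
def train_linear_svm(points, labels, epochs=500, learning_rate=0.01, reg=0.01):
    """Learn w and b of the hyperplane w.x + b = 0 by subgradient descent
    on the regularised hinge loss. Labels must be +1 or -1."""
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(points, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:
                # Point is inside the margin or misclassified: move the plane.
                w = [wi - learning_rate * (reg * wi - y * xi)
                     for wi, xi in zip(w, x)]
                b += learning_rate * y
            else:
                # Point is safely classified: only apply the regularisation shrink.
                w = [wi - learning_rate * reg * wi for wi in w]
    return w, b


def predict(w, b, x):
    """Classify a point by the side of the hyperplane it falls on."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


# Hypothetical, linearly separable toy data.
points = [(1.0, 1.0), (2.0, 1.5), (1.5, 2.0), (5.0, 5.0), (6.0, 5.5), (5.5, 6.0)]
labels = [-1, -1, -1, 1, 1, 1]
w, b = train_linear_svm(points, labels)
print([predict(w, b, p) for p in points])
```

The hinge loss only penalises points that lie on the wrong side of the hyperplane or too close to it, which is what pushes the hyperplane towards a wide margin between the two classes.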
A synonym (also metonym and poecilonym) is a word with the same or a similar meaning as another word. Words that are synonyms are said to be synonymous, and the state of being a synonym is called synonymy. The word comes from Ancient Greek syn (σύν) (“with”) and onoma (ὄνομα) (“name”). The words begin and commence are an example of synonyms. Likewise, if we talk about a long time or an extended time, long and extended become synonyms. In the figurative sense, two words are often said to be synonymous if they have the same connotation.
Taxonomy is the practice and science of classification. The word finds its roots in the Greek τάξις, taxis (meaning ‘order’ or ‘arrangement’) and νόμος, nomos (meaning ‘law’ or ‘science’). Taxonomy uses taxonomic units, known as taxa and is arranged in a hierarchical structure. Typically this is organized by supertype-subtype relationships, also called generalization-specialization relationships, or less formally, parent-child relationships. In such an inheritance relationship, the subtype by definition has the same properties, behaviors, and constraints as the supertype plus one or more additional properties, behaviors, or constraints.
Civil engineer is a subconcept of engineer, so any civil engineer is also an engineer, but not every engineer is also a civil engineer.
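The supertype-subtype relationship and the inheritance of properties can be sketched as follows. The concepts and their properties are hypothetical and serve only to illustrate the mechanism.

```python
# Hypothetical supertype links and properties, for illustration only.
PARENT = {"civil engineer": "engineer", "engineer": "occupation"}
OWN_PROPERTIES = {
    "occupation": {"earns income"},
    "engineer": {"applies engineering principles"},
    "civil engineer": {"designs infrastructure"},
}


def ancestors(concept: str) -> list[str]:
    """List all supertypes from the direct parent up to the root."""
    chain = []
    while concept in PARENT:
        concept = PARENT[concept]
        chain.append(concept)
    return chain


def is_a(concept: str, supertype: str) -> bool:
    """True if the concept equals the supertype or specialises it."""
    return concept == supertype or supertype in ancestors(concept)


def properties(concept: str) -> set[str]:
    """A subtype has its own properties plus everything it inherits."""
    result = set(OWN_PROPERTIES.get(concept, set()))
    for supertype in ancestors(concept):
        result |= OWN_PROPERTIES.get(supertype, set())
    return result


print(is_a("civil engineer", "engineer"))   # every civil engineer is an engineer
print(is_a("engineer", "civil engineer"))   # but not the other way round
print(sorted(properties("civil engineer")))
```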
A thesaurus lists terms/words grouped together based on the similarity of their meaning and their interrelationship. It lists synonyms as well as superordinate and subordinate concepts. In contrast to a dictionary, the definition and pronunciation of the words are not the focus of a thesaurus.
UK skills taxonomy
The UK skills taxonomy was created by Nesta, an innovation foundation based in the United Kingdom. It is the first data-driven skills taxonomy that helps measure skill shortages in the UK. Currently, it maps 10,500 unique skills that are mentioned in 41 million UK job adverts from between 2012 and 2017. The skills include specific tasks, knowledge, software programs and personal attributes. JANZZon! is regularly mapped with these UK skills taxonomy data and constantly enhanced with further semantic content and languages.
Underemployment is a measure of employment and labor utilization in the economy that looks at how well the labor force is being utilized in terms of skills, experience and availability to work. Labor that falls under the underemployment classification includes those workers that are highly skilled but working in low paying jobs, workers that are highly skilled but work in low skill jobs and part-time workers that would prefer to be full-time. This is different from unemployment in that the individual is working but isn’t working at their full capability. [Investopedia]
The World Wide Web Consortium (W3C) is an international community that develops open standards to ensure the long-term growth of the Web. Their mission is to lead the World Wide Web to its full potential by developing protocols and guidelines that ensure the long-term growth of the Web.
A white label product or service is a product or service produced by one company (the producer) that other companies (the marketers) rebrand to make it appear as if they made it. Thus, the same generic product can be sold under different brands, each with the Corporate Identity of its marketer. White label production is often used for consumer products (e.g. blank CDs) or web applications.
The Work Skills Qualifications (WSQ) is a credentialing system of the Singapore Workforce Development Agency. Its aim is to train, develop, assess and recognise individuals for the key competencies that companies look for in potential employees.