Automated Structuring and Processing of Job Ad Information

Job advertisements are a fairly standardized type of text that regularly contains basic information about the job, its requirements, the desired incumbent, and the company aiming to fill the vacancy. Automated information processing is therefore a promising approach to retrieving information from job ads.

The SJMM intends, firstly, to automate the collection of many variables that are currently part of the Scientific Use File, and secondly, to provide rich, additional information extracted from job ads.

To this end, we have implemented a high-performing text zoning solution based on neural networks (Gnehm, 2018). Text zoning identifies the parts of a job ad dedicated to specific subjects, such as the company description, the job description, or the skill requirements. It is the natural starting point for our information extraction (IE) work because structuring job ad texts facilitates all subsequent steps. For instance, we built on text zoning to develop a sophisticated ICT skill extraction tool (Buchmann, Buchs & Gnehm, 2020).
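To make the idea of text zoning concrete, the following minimal sketch assigns each line of a job ad to a zone. It is a deliberately simple keyword-based stand-in, not the neural model described above; the zone labels, the cue words in `ZONE_CUES`, and the function names are all assumptions chosen for illustration.

```python
from typing import List, Tuple

# Hypothetical zone labels and cue words, chosen only for illustration.
# The actual system uses a neural network, not keyword lists.
ZONE_CUES = {
    "company": {"we", "our", "company", "team", "founded"},
    "job_description": {"tasks", "responsibilities", "develop", "manage", "support"},
    "skill_requirements": {"experience", "degree", "skills", "knowledge", "fluent"},
}


def zone_of(line: str) -> str:
    """Return the zone whose cue words occur most often in the line."""
    tokens = [t.strip(".,;:!?()").lower() for t in line.split()]
    scores = {
        zone: sum(1 for t in tokens if t in cues)
        for zone, cues in ZONE_CUES.items()
    }
    best_zone = max(scores, key=scores.get)
    # Lines without any cue word fall into a catch-all zone.
    return best_zone if scores[best_zone] > 0 else "other"


def zone_ad(text: str) -> List[Tuple[str, str]]:
    """Label every non-empty line of a job ad with a zone."""
    return [(zone_of(line), line) for line in text.splitlines() if line.strip()]


if __name__ == "__main__":
    ad = (
        "We are a young company founded in 2010.\n"
        "Your tasks: develop and support our web shop.\n"
        "You have a degree and experience with SQL.\n"
        "Apply now!"
    )
    for zone, line in zone_ad(ad):
        print(f"{zone:20s} {line}")
```

Once each line carries a zone label, later extraction steps can restrict themselves to the relevant zone, for example searching for skills only in lines labeled `skill_requirements`.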

A major goal of our NRP77 study "Monitoring Task and Skill Profiles in the Digital Economy" is to identify information on job tasks and required skills (e.g., formal education and training, language skills, social, cognitive, and personal skills as well as ICT-complementary skills) in job ad texts. Data-driven approaches then turn this information into structured data aligned with taxonomies and ontologies established in the field.
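As a rough illustration of what aligning extracted information with a taxonomy can mean, the sketch below maps free-text skill mentions to entries of a tiny made-up taxonomy using fuzzy string matching from Python's standard library. The taxonomy entries, the codes, and the `align` function are hypothetical, and plain string similarity is only a stand-in for the data-driven approaches used in the project.

```python
import difflib
from typing import Optional, Tuple

# Hypothetical mini taxonomy: concept label -> made-up code.
TAXONOMY = {
    "project management": "S1",
    "english language": "S2",
    "teamwork": "S3",
}


def align(mention: str, cutoff: float = 0.8) -> Optional[Tuple[str, str]]:
    """Map a skill mention to the closest taxonomy label, or None.

    The cutoff is an assumed similarity threshold; mentions that are not
    similar enough to any label stay unaligned.
    """
    key = mention.strip().lower()
    match = difflib.get_close_matches(key, list(TAXONOMY), n=1, cutoff=cutoff)
    if not match:
        return None
    return match[0], TAXONOMY[match[0]]


if __name__ == "__main__":
    for mention in ["Project Management", "team work", "welding"]:
        print(mention, "->", align(mention))
```

Returning `None` for unmatched mentions keeps them visible for later review instead of forcing them into a poorly fitting category.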

A further long-term goal is to extract information on the self-representation of the company placing the ad, as well as (mostly implicit) information on the gender and age of the desired jobholder.


Schultheiss, Tobias, Curdin Pfister, Uschi Backes-Gellner & Ann-Sophie Gnehm. 2018. Tertiary education expansion and task demand: Does a rising tide lift all boats? Economics of Education Working Paper Series 0154, University of Zurich, Department of Business Administration (IBW), revised Jul 2019.

Gnehm, Ann-Sophie & Simon Clematide. 2020. Text Zoning and Classification for Job Advertisements in German, French and English. Proceedings of the Fourth Workshop on Natural Language Processing and Computational Social Science 2020:83–93.

Kiener, Fabienne, Ann-Sophie Gnehm & Uschi Backes-Gellner. 2020. Non-Cognitive Skills in Training Curricula and Heterogeneous Wage Returns. Economics of Education Working Paper Series 0175, University of Zurich, Department of Business Administration (IBW).