The SJMM data infrastructure

The basis of all analyses and publications of the Swiss Job Market Monitor (SJMM) is a complex data infrastructure. This infrastructure includes (1) the collection of data, (2) data processing, (3) quality management tools, and (4) the SJMM's publications. 

These four main components are depicted in the graphic below and described in more detail in the sections that follow.

Additional information on the Adecco Swiss Job Market Index can be found here; details on the Scientific-Use-File can be accessed here.

 

Annual methodological cycle of the SJMM data infrastructure


1. Data collection

The SJMM collects job ads that appear on the web, either on corporate websites or on dedicated job portals. Data from job portals are aggregated continuously by automated software. In this step, job ads from 11 job portals are scraped three times per day, leading to the detection and storage of about 1.3 million new ads per year. Data from corporate websites are collected in a computer-assisted process. Each quarter, we store about 4,700 ads from 1,300 firms.

1. Panel of job portals

Vacancies advertised on job portals have been collected since 2006. We identify the most relevant portals through our annual corporate survey (see below). This way, we ensure that we cover about 95% of all job ads published on job portals in Switzerland.

2. Corporate panel

Data collection from corporate websites has been ongoing since 2001. For this purpose, we drew a representative panel of over 1,500 firms from the Business & Employment Register (BER) of the Federal Statistical Office. Since then, we have used dedicated quality management tools (see below) and a yearly BER update to renew the sample.

The panel is stratified proportionally by economic sector and disproportionally by firm size. Through this approach, we make sure that all larger companies, which are particularly important for the Swiss job market, are included in our sample. In our analyses, we employ weighting techniques to adjust for the stratified sampling.

The corporate panel is the starting point for both the job ad collection from websites and the annual corporate survey. For the former, however, we can only include firms that have a website.
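To illustrate why weighting is needed after disproportional sampling, here is a minimal sketch using standard design weights (stratum population size divided by stratum sample size). The function names, the two strata, and all numbers are hypothetical and chosen for illustration only; they do not describe the SJMM's actual weighting procedure.

```python
from collections import Counter

def design_weights(population_strata, sample_strata):
    """Return the design weight N_h / n_h for every sampled stratum h."""
    pop_counts = Counter(population_strata)
    sample_counts = Counter(sample_strata)
    return {h: pop_counts[h] / n for h, n in sample_counts.items()}

def weighted_mean(values, strata, weights):
    """Mean of `values`, each weighted by the design weight of its stratum."""
    total_weight = sum(weights[h] for h in strata)
    return sum(v * weights[h] for v, h in zip(values, strata)) / total_weight

# Hypothetical register: 90 small firms and 10 large firms.
population = ["small"] * 90 + ["large"] * 10
# Hypothetical sample: 10% of small firms, but all large firms.
sample = ["small"] * 9 + ["large"] * 10
# Hypothetical ads per sampled firm: 1 for small firms, 10 for large firms.
ads_per_firm = [1] * 9 + [10] * 10

weights = design_weights(population, sample)
unweighted = sum(ads_per_firm) / len(ads_per_firm)
weighted = weighted_mean(ads_per_firm, sample, weights)
```

In this toy example the unweighted mean overstates the number of ads per firm because large firms are oversampled; the weighted mean corrects for this by letting each sampled small firm stand in for ten firms in the register.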

2. Data processing and information extraction

The SJMM data infrastructure relies on several tools that process job ad data and extract information from them.

1. Coding of SUF variables

Each year, we publish a new and updated Scientific-Use-File (SUF) that includes data from the previous year. The coding of the variables included in the SUF starts in April and is carried out by several trained research assistants. They receive in-depth training, and we provide them with annually updated coding schemas.

2. Automatic coding of ASJMI data

The quarterly Adecco Swiss Job Market Index is based on extractions from an automatic coding tool that generates regional and occupational information for each job ad.

3. Other information extraction tools

Information extraction is a central part of several of our past, current, and planned projects. For example, we extracted IT skill requirements in a project on wage returns to IT skills. In a new and ongoing project, we are expanding this kind of work by analysing skill and task profiles in the digital economy.

3. Quality management

Collecting data from the web enables us to provide a dataset of job ads that is unique worldwide. At the same time, the use of such big data poses new methodological challenges. For example, web-based data sources are not always representative of the populations under analysis. Another potential problem is drift, which arises when data collection is affected by structural changes in the observed websites.

To minimize these risks, we employ several quality management (QM) tools:

1. Temporal comparability

To ensure that we provide consistent time series, we use information from the annual corporate survey on how and where firms publish their vacant positions. Moreover, by continuously inspecting the time series with a dedicated tool, we detect outliers and correct for them at an early stage.
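As an illustration of how outliers in such a time series might be flagged, the following sketch applies a common robust rule based on the median absolute deviation (MAD). This is an assumption made for illustration; the source does not state which method the SJMM's tool uses. The function name, the threshold, and the example counts are hypothetical.

```python
from statistics import median

def flag_outliers(series, threshold=3.5):
    """Flag points whose modified z-score (based on the MAD) exceeds `threshold`."""
    med = median(series)
    mad = median(abs(x - med) for x in series)
    if mad == 0:
        # No spread around the median, so the rule cannot rank deviations.
        return [False] * len(series)
    # 0.6745 rescales the MAD so that scores are comparable to standard z-scores.
    return [abs(0.6745 * (x - med) / mad) > threshold for x in series]

# Hypothetical daily counts of newly detected ads; the fifth value is a spike.
daily_counts = [100, 104, 98, 101, 350, 99, 102]
flags = flag_outliers(daily_counts)
```

A median-based rule is used here because a single extreme value, for instance one caused by a redesigned portal, would inflate the mean and standard deviation and could thereby mask itself.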

2. Representativity & corporate survey

A key quality feature of the SJMM data is their representativity. The main tool used to maintain representativity is our corporate survey, which runs between January and March each year. In this survey, we ask firms which recruitment channels they use and which job portals are most relevant to them. Using these data, we adapt our sampling each year to ensure that our analyses cover the entire Swiss job market. Carried out since 2001, the annual survey has usually reached a response rate of about 60%. Beyond the survey, we also use an internal feedback system and a yearly update from the Business & Employment Register (BER) to keep our corporate panel up to date.

3. Other quality management

Other QM tools are used at several steps in our data collection process. These include, for example, a tool to inspect the consistency of time series, a tool to validate the quality of automatically collected job ad texts, and a tool to detect and correct for duplicate job ads in our corpus.
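The idea behind duplicate detection can be sketched as follows: each ad text is normalised and hashed, and only the first ad with a given fingerprint is kept. This is a deliberately simple exact-match approach under stated assumptions, not the SJMM's actual tool, which is not described in the source; the function names and example ads are hypothetical.

```python
import hashlib
import re

def fingerprint(text):
    """Hash the ad text after lowercasing and collapsing punctuation and whitespace."""
    normalised = re.sub(r"\W+", " ", text.lower()).strip()
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()

def deduplicate(ads):
    """Keep the first occurrence of each ad, judged by its normalised fingerprint."""
    seen = set()
    unique = []
    for ad in ads:
        key = fingerprint(ad)
        if key not in seen:
            seen.add(key)
            unique.append(ad)
    return unique

# Hypothetical ads: the first two differ only in case, spacing, and punctuation.
ads = [
    "Software Engineer (m/f), Zurich",
    "software engineer  (M/F) -- Zurich",
    "Nurse, Bern",
]
unique_ads = deduplicate(ads)
```

Exact fingerprints only catch ads that are identical after normalisation. Ads that were reworded between portals would require a fuzzy similarity measure instead.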

4. Publications

As a scientific project, we regard publications as our most important output. We address the scientific community, but also policymakers, businesses, associations, and the wider public.

Each year, we publish a new version of the SUF, updated with the job ad data from the previous year. Recurring publications in cooperation with the Adecco Group Switzerland are the quarterly Adecco Swiss Job Market Index and the annual Skills Shortage Index. These two formats generate hundreds of media responses each year.

Beyond the periodical publications, we write articles in scientific journals, policy papers and expert opinions commissioned by public or private sector actors (see here).

5. Press data collection (discontinued)

A completed part of our data collection efforts is the harvesting of job ad data from the printed press.

This entailed the collection of job ads from a representative sample of Swiss newspapers, starting in 1950 for German-speaking Switzerland and in 2001 for all of Switzerland. The sample was stratified by size (3 categories) and region (7 categories), and changes in the media landscape (mergers, bankruptcies, etc.) were taken into account in an annual sample update. The sample included all titles with at least 75,000 printed issues, while smaller newspapers were represented through random sampling.

The press panel included about 90 newspapers. Job ads were sampled each year in the second week of March. Until 2000, we collected about 500 ads each year; from 2001, this number was increased to 650.

After results from our corporate survey had strongly indicated for several years that the printed press included only a minimal proportion of unique ads, we decided to stop this part of our data collection in 2018.