One way to estimate and track employer demand for data science software is to analyze which skills job ads ask for. We did this using job ads on Indeed and showed which data science software skills are most in demand in Germany and worldwide. In this post, we describe the methods behind these analyses. We worked in R, which offers convenient packages for the task.
There are two ways to search for jobs on Indeed: directly on the website or through the API (for which registration is required). We used the API together with the helpful jobbR package. First, we searched the job ads for data-science-related keywords such as Data Scientist, Data Analyst, Big Data or Machine Learning. For each search result, jobbR returns relevant information such as the company, job title, location and a link to the job ad, which we stored in a data frame.
The search for these terms also returned jobs that are only marginally related to the ones we intended to analyze (e.g., programmer, business analyst and scientist positions). To discard them, we used the stringr package to filter the data frame so that it only includes jobs with “data” or “machine learning” in the job title.
The filtered data frame left us with 28’732 jobs from 59 countries. Most jobs are located in the US (13’095), followed by Great Britain (2’921), Germany (1’840), France (1’425), and India (1’240). To analyze the requirements for different jobs, we further filtered for specific job titles, which gave us 5’988 “Data Scientist”, 7’497 “Data Analyst”, 2’197 “Data Engineer” and 1’716 “Machine Learning” positions.
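The title filter can be sketched roughly as follows. This is a minimal illustration rather than our original script: the toy data frame `jobs` and its column names `title` and `country` are assumptions standing in for the real search results, and matching is done case-insensitively.

```r
library(stringr)

# Toy stand-in for the search results; the column names are assumptions.
jobs <- data.frame(
  title   = c("Senior Data Scientist", "Business Analyst",
              "Machine Learning Engineer", "Java Programmer",
              "Data Analyst (m/f)", "Research Scientist"),
  country = c("US", "GB", "DE", "US", "FR", "IN"),
  stringsAsFactors = FALSE
)

# Keep only jobs whose title contains "data" or "machine learning".
keep <- str_detect(jobs$title, regex("data|machine learning", ignore_case = TRUE))
jobs_filtered <- jobs[keep, ]

# Narrow further to one specific job title, here "Data Scientist".
is_ds <- str_detect(jobs_filtered$title, regex("data scientist", ignore_case = TRUE))
data_scientists <- jobs_filtered[is_ds, ]

# Number of remaining jobs per country.
table(jobs_filtered$country)
```

In this toy example, the business analyst, programmer and research scientist titles are dropped, while the three data and machine learning titles remain.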
Finally, we counted the number of job ads in which each software is mentioned. To make the job titles easier to compare, we converted these counts to percentages in the worldwide data. We then plotted the results with ggplot2.
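As a rough sketch of the counting and plotting step, assuming the ad texts are available in a column called `description` (a hypothetical name, as are the toy ads and the short software list below), one could proceed like this. Percentages are computed within each job title here, which is one plausible reading of the step above. Word boundaries (`\\b`) keep a short name such as R from matching inside other words.

```r
library(stringr)
library(ggplot2)

# Toy ads; texts, column names and software list are illustrative assumptions.
ads <- data.frame(
  job_title   = c("Data Scientist", "Data Scientist", "Data Analyst", "Data Analyst"),
  description = c("Experience with R and Python required.",
                  "Python, Spark and SQL.",
                  "Strong SQL and Excel skills; R is a plus.",
                  "Excel and Tableau."),
  stringsAsFactors = FALSE
)

# One regular expression per software, matched case-sensitively.
software <- c(R = "\\bR\\b", Python = "\\bPython\\b",
              SQL = "\\bSQL\\b", Excel = "\\bExcel\\b")

# For each software: share of ads per job title that mention it, in percent.
results <- do.call(rbind, lapply(names(software), function(s) {
  mentioned <- str_detect(ads$description, software[[s]])
  pct <- tapply(mentioned, ads$job_title, mean) * 100
  data.frame(job_title = names(pct), software = s, percent = as.numeric(pct),
             stringsAsFactors = FALSE)
}))

# Grouped bar chart: one bar per software and job title.
p <- ggplot(results, aes(x = software, y = percent, fill = job_title)) +
  geom_col(position = "dodge") +
  labs(x = NULL, y = "Job ads mentioning the software (%)", fill = "Job title")
print(p)
```

Using the mean of a logical vector gives the share of ads with a mention directly, so each ad is counted once per software no matter how often the name appears in its text.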