Google has become the main starting point for our online activities. The search engine processes about 40,000 searches every second, or roughly 3.5 billion searches per day. It records what people are interested in, what they worry about, and where they want to travel. In this way, the search engine captures trends in interests and behavior like few other sources can. Hidden racism, sexual orientation, or ad returns: check out the work by Seth Stephens-Davidowitz for some inspiration on the huge potential of Google Trends data.
While the Google Trends cockpit offers a user-friendly tool to compare the popularity of keywords over time, accessing the data directly in R can make life easier, especially when you want to match the data to other datasets. This tutorial offers an example of how Google Trends data can be retrieved directly in R using the package gtrendsR.
The
install.packages('gtrendsR')
library(gtrendsR)
The package offers the same selection options as the Google Trends interface on the website. First, you select the keywords. Even at this stage, it is important to keep in mind that the returned values are always relative: they are scaled to the peak of the query, that is, the highest volume reached by any one keyword in any one period, and do not show the absolute search volume. Thus, if the analysis contains a highly popular keyword, less popular keywords will have values close to 0 and it will be hard to analyze any variation over time.
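To make the scaling issue concrete, here is a minimal sketch with invented numbers. The counts and the helper `rescale_hits()` are hypothetical and only imitate the 0 to 100 scaling described above; Google's actual procedure also involves sampling and is not reproduced here.

```r
# Hypothetical absolute search counts (invented for illustration only)
raw <- data.frame(
  month   = rep(c("2018-01", "2018-02", "2018-03"), times = 2),
  keyword = rep(c("popular", "niche"), each = 3),
  count   = c(80000, 100000, 90000, 900, 1400, 1200)
)

# Hypothetical helper: rescale to 0-100 relative to the largest value passed in
rescale_hits <- function(count) round(100 * count / max(count))

# Joint query: both keywords share one scale, set by the overall peak
raw$hits_joint <- rescale_hits(raw$count)

# Separate queries: each keyword is scaled to its own peak
raw$hits_separate <- ave(raw$count, raw$keyword, FUN = rescale_hits)

print(raw)
```

In the joint column the niche keyword is squeezed into almost the same value every month, while the separate column still shows its variation. If the less popular keyword is the one you care about, it may be worth querying it on its own.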
Region: Set the region of the query. The default is 'all'. For specific countries use the country code. The
Time window: Set the time window: "today+5-y" for the last five years (default), "all" for everything since 2004, or a specific time span in the form "Y-m-d Y-m-d".
Further, the package lets you specify the channel: "web" (default), "news", "images", "froogle" (shopping), or "youtube".
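If the start and end dates come from elsewhere in your script, the "Y-m-d Y-m-d" string can be built from Date objects instead of being typed by hand. The helper `make_time_span()` below is a hypothetical convenience function, not part of gtrendsR.

```r
# Hypothetical helper: build the "Y-m-d Y-m-d" string for a custom time span
make_time_span <- function(start, end) {
  start <- as.Date(start)
  end   <- as.Date(end)
  stopifnot(start <= end)  # guard against swapped dates
  paste(format(start, "%Y-%m-%d"), format(end, "%Y-%m-%d"))
}

time_span <- make_time_span("2010-01-01", "2018-08-27")
print(time_span)
```

The resulting string can then be passed as the time argument of the query.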
Let us analyze the search volume in Germany for three big cities and major tourist destinations:
#define the keywords
keywords = c("Paris", "New York", "Barcelona")
#set the geographic area: DE = Germany
country = c('DE')
#set the time window
time = ("2010-01-01 2018-08-27")
#set channels
channel = 'web'
Now we are ready to run the query. For popular queries, it returns not only the trend over time but also interest by city and further details. We select interest over time and obtain a data frame that contains our selected settings and the column 'hits', the relative search volume for each month.
trends = gtrends(keywords, gprop = channel, geo = country, time = time)
#select only interest over time
time_trend = trends$interest_over_time
head(time_trend)
date | hits | keyword | geo | gprop | category |
---|---|---|---|---|---|
2010-01-01 | 22 | Paris | DE | web | 0 |
2010-02-01 | 22 | Paris | DE | web | 0 |
2010-03-01 | 24 | Paris | DE | web | 0 |
2010-04-01 | 25 | Paris | DE | web | 0 |
2010-05-01 | 24 | Paris | DE | web | 0 |
2010-06-01 | 22 | Paris | DE | web | 0 |
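Since one motivation is matching the data to other datasets, a wide layout with one column per keyword is often handier than the long layout shown above. The sketch below uses a small invented stand-in for the query result. It also assumes that very low volumes can arrive as the text "<1", which would turn the whole hits column into character; both the helper `to_numeric_hits()` and the choice to map "<1" to 0.5 are assumptions for illustration.

```r
# Stand-in for trends$interest_over_time (values invented for illustration)
time_trend_demo <- data.frame(
  date    = rep(as.Date(c("2010-01-01", "2010-02-01")), times = 2),
  hits    = c("22", "22", "<1", "3"),
  keyword = rep(c("Paris", "Barcelona"), each = 2),
  stringsAsFactors = FALSE
)

# Hypothetical helper; assumption: "<1" is a tiny non-zero volume, mapped to 0.5
to_numeric_hits <- function(hits) {
  hits <- as.character(hits)
  hits[hits == "<1"] <- "0.5"
  as.numeric(hits)
}
time_trend_demo$hits <- to_numeric_hits(time_trend_demo$hits)

# Reshape to one row per date and one hits column per keyword
wide <- reshape(time_trend_demo, idvar = "date", timevar = "keyword",
                direction = "wide")
print(wide)
```

The wide data frame can then be joined to another dataset by date, for example with `merge()`.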
The data frame contains one value (hits) for each month and keyword. We plot the result over time to get a first impression.
library(ggplot2)
plot<-ggplot(data=time_trend, aes(x=date, y=hits,group=keyword,col=keyword))+
geom_line()+xlab('Time')+ylab('Relative Interest')+ theme_bw()+
theme(legend.title = element_blank(),legend.position="bottom",legend.text=element_text(size=12))+ggtitle("Google Search Volume")
plot
We can see that one event is dominating the figure: The November 2015 Paris attacks caused a spike in search volume. This example demonstrates that outliers can dominate the analysis as the hits are displayed relative to the highest search volume.
Let’s remove the November 2015 spike to get a better idea of the overall trend. We do this by dropping every observation with a value of 45 or above.
time_trend2=time_trend[time_trend$hits<45,]
plot<-ggplot(data=time_trend2, aes(x=date, y=hits,group=keyword,col=keyword))+
geom_line()+xlab('Time')+ylab('Relative Interest')+ theme_bw()+
theme(legend.title = element_blank(),legend.position="bottom",legend.text=element_text(size=12))+ggtitle("Google Search Volume")
plot
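A threshold on hits also drops any other observation that happens to reach 45. If only one specific month should go, filtering on the date is an alternative. The sketch below uses an invented stand-in data frame rather than the real query result.

```r
# Stand-in for time_trend (values invented for illustration)
demo <- data.frame(
  date    = as.Date(c("2015-10-01", "2015-11-01", "2015-12-01")),
  hits    = c(24, 100, 30),
  keyword = "Paris"
)

# Drop the month by its date rather than by a threshold on hits
is_nov_2015   <- format(demo$date, "%Y-%m") == "2015-11"
demo_no_spike <- demo[!is_nov_2015, ]
print(demo_no_spike)
```

This removes November 2015 for every keyword and leaves all other months untouched, whatever their value.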
Apart from spike for Paris, we see that
We now apply some smoothing to remove the seasonality:
plot<-ggplot(data=time_trend2, aes(x=date, y=hits,group=keyword,col=keyword))+
geom_smooth(span=0.5,se=FALSE)+xlab('Time')+ylab('Relative Interest')+
theme_bw()+theme(legend.title = element_blank(),legend.position="bottom",
legend.text=element_text(size=12))+ggtitle("Google Search Volume")
plot
`geom_smooth()` using method = 'loess'
The smoothed lines show clearly that the volume for 'New York' has decreased while that for 'Barcelona' has increased since 2015. 'Paris' had a lower volume after the attacks but recovered within two years.
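Loess smoothing is one way to look past the seasonality. A different technique, shown here only as a sketch on synthetic data, is a centered 12-month moving average, which averages out a yearly pattern exactly when that pattern repeats every 12 months. The series below is invented and is not Google Trends data.

```r
# Synthetic monthly series: linear trend plus a 12-month seasonal pattern
months   <- 1:48
seasonal <- 10 * sin(2 * pi * months / 12)
series   <- 50 + 0.5 * months + seasonal

# Centered 2x12 moving average: half weight on the two end months,
# so every calendar month contributes equally to each window
weights <- c(0.5, rep(1, 11), 0.5) / 12
trend   <- stats::filter(series, weights, sides = 2)

# The first and last six values are NA because the window is incomplete there
print(round(as.numeric(trend), 2))
```

On real data the seasonal pattern is rarely this regular, so the moving average will leave some residual wiggle. The explicit `stats::` prefix avoids a clash with `filter()` from other packages.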
The package also lets you plot the data directly, which is handy if you just want to test some keywords.
plot(gtrendsR::gtrends(keyword = c("New York","Paris","Barcelona"), geo = "DE", time = "2010-01-01 2018-08-27"))
plot(gtrendsR::gtrends(keyword = c("Berlin","München","Frankfurt","Hamburg","Köln"), geo = "DE", time = "2010-01-01 2018-08-27"))