Among the variety of open source relational databases, PostgreSQL is probably one of the most popular due to its functional capacities. That is why it is frequently used among all the areas of work where databases are involved.
In this article, we will go through connection and usage of PostgreSQL in R. R is an open source language for statistical and graphics data analysis providing scientists, statisticians, and academics powerful tools for various manipulations. Besides, it allows creating and running emulations of the real-world data. Usually, R comes with an RStudio IDE, so that will be used while connecting and using PostgreSQL.
One of the great things about R language is that it has numerous packages for almost every kind of needs. Moreover, the package library is constantly growing, as the packages are set up and developed by the community.
Two main packages can be found in the library for connecting PostgreSQL in R environment: RPostgreSQL and RPostgres. Both of them provide great functionality for database interactions, the difference is only in the way of installation.
The RPostgreSQL package is available on the CRAN, a Comprehensive R Archive Network, and is installed with the following command run in the IDE:
install.packages('RPostgreSQL')
As for the RPostgres package, it can be installed in two ways: cloning from Github and installing directly from CRAN.
To install the package from Github, first, devtools and remotes packages must be installed with the commands.
install.packages('devtools')
install.packages(‘remotes’)
Then, for installing package, run
remotes::install_github("r-dbi/RPostgres")
To install package from CRAN, the next basic command is used:
install.packages(‘RPostgres’)
The difference in these two ways is that in CRAN the latest stable version of a package is stored while on Github users can find the latest development version.
The truth is, RPostgreSQL and RPostgres packages have no difference in the way they connect to the PostgreSQL database. They both use a special DBI package in R that provides a wide range of methods and classes to establishing connection with DBs.
Note: we used RPostgres package for establishing the connection.
The Postgres package comes with the next command:
con<-dbConnect(RPostgres::Postgres())
With the following steps you can set up the connection to a specific database:
library(DBI) |
To check if the connection is established, we can run the
dbListTables(con)
function that returns the list of the tables in our database.
As you can see, no tables are stored in our database, so now it’s time to create one.
As we’ve already mentioned, the R language provides a great pack of simulated datasets, that can be directly used from the IDE without downloading them previously.
For our examples, we will use a popular “mtcars” dataset example, which contains data from the 1974 Motor Trend magazine car road test. Let’s first add it to the database and then check whether it has appeared in our database.
The basic command to add “mtcars” to our database is
dbWriteTable(con, "mtcars", mtcars)
But we will do a little trick, that can make our table a little bit more readable.
What we’ve done, is set up the table as a dataframe in R, renamed the first column to ‘carname’ and then removed initial dataset with the rm(mtcars) command as it is stored in the variable my_data.
Using the
dbWriteTable
method, we can write our dataframe to a PostgreSQL table. Then, let’s check how our table looks.
Having a table in the database, we can now explore queries.
For working with queries, two basic methods are needed:
dbGetQuery
and
dbSendQuery
The dbGetQuery method returns all the query results in a dataframe. The dbSendQuery registers the request for the data that has to be called by
dbFetch
for RPostgres to receive data. The dbFetch method allows setting parameters to query your data in some batches.
The database table must have some primary key, basically, a unique identifier for every record in the table.
Let’s assign the names of the cars in our table as a primary key using dbGetQuery method.
dbGetQuery(con, 'ALTER TABLE cars ADD CONSTRAINT cars_pk PRIMARY KEY ("carname")') |
We have already used the dbReadTable method, but let’s return to it for a little bit to clarify the way it works.
The
dbReadTable
method returns an overview of the data stored in the database and basically does the same function as
dbGetQuery(con, ‘SELECT * FROM cars’)
method.
It should be noted that after using dbSendQuery requests, the
dbClearResult
method must be called, to remove any pending queries from the database to the current working environment. The dbGetQuery method does this by default and therefore there is no need to call dbClearResult after the execution.
The way of creating queries for a customized data table is basically the same as in SQL. The only difference is that the results of queries in R are stored as a variable.
First, we extracted the query with the needed data from our cars table to a new variable. Then, we fetched it to the resulting variable, from which we can create a new table in our database and analyze the output of our query.
Finally, the connection must be closed with the
dbDisconnect(con)
method.
In this article, we tried to cover the basis of connecting and using PostgreSQL in the R environment. Knowing the essentials of the SQL syntax, querying and modifying data in R is enough to connect to any standard database.. Nevertheless, we suggest reading through the package documentation, which will give you more insights on how to query data from PostgreSQL to the R environment.