Blog > API

Using Python to access the Bank of England's Statistical Interactive Database (IADB)

  

 

 

 

In this tutorial we will retrieve data from the Bank of England's Statistical Interactive Database (IADB) using Python 3. Please take a look at the following sites to get a feel for what kind of data is available:

  1. https://www.bankofengland.co.uk
  2. http://www.bankofengland.co.uk/boeapps/iadb/

As an excample, we will take a look at the fixed mortgage interest rates in the UK since the year 2000. The full description of the series is:

Monthly interest rate of UK monetary financial institutions (excl. Central Bank) sterling 2, 3, 5 and 10 year (75% LTV) fixed rate mortgage to households (in percent) not seasonally adjusted.

We will use Python to (re)construct the following example URL and retrieve the data. If you click on this link, it will download the data:

http://www.bankofengland.co.uk/boeapps/iadb/fromshowcolumns.asp?csv.x=yes&Datefrom=01/Jan/2000&Dateto=01/Oct/2018&SeriesCodes=IUMBV34,IUMBV37,IUMBV42,IUMBV45&CSVF=TN&UsingCodes=Y&VPD=Y&VFD=N

In the above URL, we used CSV as the data format. It is possible to download data in other formats, like XML, HTML and Excel. In this tutorial we will work with Pandas, so we prefer to retrieve the data in CSV format. Please note that the API takes the following parameters:

Parameter name Required format Example
Datefrom DD/MON/YYYY (mandatory parameter) 01/Feb/2006
Dateto DD/MON/YYYY or "now" (mandatory parameter) 01/Oct/2007
SeriesCodes Comma separated list of full series codes (mandatory parameter) LPMAUZI,LPMAVAA
UsingCodes "Y" (mandatory parameter) Y
CSVF "TT", "TN", "CT" or "CN" (mandatory parameter) TN
VPD "Y" or "N" (optional parameter) Y
 

NB:

  • The "SeriesCodes" parameter supports up to 300 comma separated codes in a single request.
  • The "CSVF" format parameter options are "TT" (Tabular with titles), "TN" (Tabular no titles), "CT" (Columnar with- titles) or "CN" (Columnar no titles).
  • The "VPD=Y" parameter need only be included if provisional data is desired.


As always, let's start with importing some packages we will use for this exercise.

In [1]:
import pandas as pd # 0.23.0
import requests     # 2.19.1
import io
 

First, we define the URL endpoint:

In [2]:
url_endpoint = 'http://www.bankofengland.co.uk/boeapps/iadb/fromshowcolumns.asp?csv.x=yes'
 

Secondly, we define the parameters of our request:

In [3]:
payload = {
    'Datefrom'   : '01/Jan/2000',
    'Dateto'     : '01/Oct/2018',
    'SeriesCodes': 'IUMBV34,IUMBV37,IUMBV42,IUMBV45',
    'CSVF'       : 'TN',
    'UsingCodes' : 'Y',
    'VPD'        : 'Y',
    'VFD'        : 'N'
}
 

We can use the requests package to construct the URL with the dictionary of parameters:

In [4]:
response = requests.get(url_endpoint, params=payload)
In [5]:
# Check if the response was successful, it should return '200'
print(response.status_code)
 
200
 

The URL that was constructed by the requests package is:

In [6]:
print(response.url)
 
http://www.bankofengland.co.uk/boeapps/iadb/fromshowcolumns.asp?csv.x=yes&Datefrom=01%2FJan%2F2000&Dateto=01%2FOct%2F2018&SeriesCodes=IUMBV34%2CIUMBV37%2CIUMBV42%2CIUMBV45&CSVF=TN&UsingCodes=Y&VPD=Y&VFD=N
 

We can use Pandas' read_csv function to load the data. The function takes a file (or an URL) as an argument. We already downloaded the data (as bytes), so we can pass it with 'io.BytesIO'.

In [7]:
df = pd.read_csv(io.BytesIO(response.content))
In [8]:
# Print the first 5 rows to the screen for inspection
df.head()
Out[8]:
 
  DATE IUMBV34 IUMBV37 IUMBV42 IUMBV45
0 31 Jan 2000 6.78 6.78 7.09 7.29
1 29 Feb 2000 6.79 6.80 7.11 0
2 31 Mar 2000 6.45 6.70 6.94 6.54
3 30 Apr 2000 6.31 6.68 6.92 6.54
4 31 May 2000 6.39 6.64 6.90 6.69
 

This already looks pretty good. But, notice the value of '0' at '29 Feb 2000' for the 10-year mortgage rate ('IUMBV45'), which looks suspicious. Please inspect the series df['IUMBV45'] and you'll see there are missing values between index 116 to 174, represented by '..' (two dots). Strangely, at index 157, there is a value. If you are working in Jupyter Notebooks and cannot print the entire series to your screen, you can use this little hack (converting the series to string)

print(df['IUMBV45'].to_string())

Can you see the series? Let's continue the inspection.

In [9]:
# Check the DataFrame's info
df.info()
 
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 225 entries, 0 to 224
Data columns (total 5 columns):
DATE       225 non-null object
IUMBV34    225 non-null float64
IUMBV37    225 non-null float64
IUMBV42    225 non-null float64
IUMBV45    225 non-null object
dtypes: float64(3), object(2)
memory usage: 8.9+ KB
 

Checking the DataFrame's information, it seems there are three things that need to be done:

  1. Convert the 'DATE' column to 'datetime' format (now it is 'object)
  2. Set 'DATE' to be the index (now it is a 'RangeIndex')
  3. Convert the 10-year mortgage rate 'IUMBV45' to be numeric
In [10]:
# Step 1: Convert 'DATE' to datetime format
df['DATE'] = pd.to_datetime(df['DATE'])
In [11]:
# Step 2: Set 'DATE' to be the index
df = df.set_index('DATE')
In [12]:
# Check the new index
df.index
Out[12]:
DatetimeIndex(['2000-01-31', '2000-02-29', '2000-03-31', '2000-04-30',
               '2000-05-31', '2000-06-30', '2000-07-31', '2000-08-31',
               '2000-09-30', '2000-10-31',
               ...
               '2017-12-31', '2018-01-31', '2018-02-28', '2018-03-31',
               '2018-04-30', '2018-05-31', '2018-06-30', '2018-07-31',
               '2018-08-31', '2018-09-30'],
              dtype='datetime64[ns]', name='DATE', length=225, freq=None)
 

Series 'IUMBV45' is of type "object" (and not "float64") because of the missing values are represented by '..' (two dots). We can convert it to float64 and will empty all the cells with dots by setting the parameter errors='coerce' in Pandas 'to_numeric' function.

In [13]:
# Step 3: Convert the 10 year mortgage rate 'IUMBV45' to numeric
df['IUMBV45'] = pd.to_numeric(df['IUMBV45'], errors='coerce')
In [14]:
# Check the DataFrame's info again 
df.info()
 
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 225 entries, 2000-01-31 to 2018-09-30
Data columns (total 4 columns):
IUMBV34    225 non-null float64
IUMBV37    225 non-null float64
IUMBV42    225 non-null float64
IUMBV45    167 non-null float64
dtypes: float64(4)
memory usage: 8.8 KB
 

This looks better, the dates are the index now and there are 167 values left for 'IUMBV45'. Remember the value of '0' at '29 Feb 2000'? Let's see if there are any other dates with a value of '0'.

In [15]:
df.loc[df['IUMBV45'] == 0]
Out[15]:
 
  IUMBV34 IUMBV37 IUMBV42 IUMBV45
DATE        
2000-02-29 6.79 6.8 7.11 0.0
 

It happens only once. For sake of simplicity of this tutorial, let's not worry too much about it and just get rid of this one occurrence. (We did report this anomaly to the Bank of England, so perhaps it is not there anymore when you run the code).

In [16]:
# Remove the value by setting it to None
df.loc['2000-02-29','IUMBV45'] = None
In [17]:
# Finally, check the statistics of the table
df.describe()
Out[17]:
 
  IUMBV34 IUMBV37 IUMBV42 IUMBV45
count 225.000000 225.000000 225.000000 166.000000
mean 3.955422 4.305067 4.590000 4.998976
std 1.554679 1.492455 1.389015 1.326059
min 1.350000 1.600000 1.900000 2.660000
25% 2.540000 2.960000 3.440000 3.475000
50% 4.060000 4.650000 4.980000 5.570000
75% 5.160000 5.470000 5.680000 5.890000
max 6.790000 6.800000 7.110000 7.290000
 

Excellent, the data does seem to make sense. Longer mortgages normally have higher rates than shorter ones. The mean interest rates from 2, 3, 5 and 10 mortgages (left to right respectively) are indeed increasing, which we expected. To conclude, let's visualize this data.

In [18]:
# IPython's magic to plot inline
%matplotlib inline
In [19]:
df.plot()
Out[19]:
<matplotlib.axes._subplots.AxesSubplot at 0x1162eeeb8>
 
 

Hopefully you enjoyed this tutorial!

 

About the author:
Joris H., Python & open source enthusiast. Entrepreneur @ Automation Wizards