– Public data using a Python API
Use-case:
In this tutorial, we show how to
- Download the metadata for all publicly available datasets (data tables) from Statistics Canada in Python
- Download any data table from Statistics Canada in Python in a few simple lines of code
Video Tutorial for this blog is available here:
Step1: packages
import requests
import json
import pandas as pd
import numpy as np
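# the central package ("!" runs a shell command from a notebook cell;
# in a terminal, run "pip install stats_can" instead)
!pip install stats_can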
from stats_can import StatsCan
sc = StatsCan()
Step2: functions to download the metadata from Statistics Canada
def get_statcan_dict(url_cube):
    # request the list of all cubes (tables) and parse the JSON response
    with requests.Session() as c:
        var1 = c.get(url_cube)
        # stop early on an HTTP error instead of parsing an error page
        var1.raise_for_status()
        dict1 = json.loads(var1.text)
    return dict1

def get_statcan_metadata(js_dict, product_id_path):
    # collect the downloaded metadata into a list of one-row dataframes
    list1 = []
    for i in js_dict:
        list1.append(pd.DataFrame.from_dict(i, orient='index').T)
    # concat that list and then create the dataframe
    df1 = pd.concat(list1)
    # create the download url
    df1['url'] = df1['productId'].apply(lambda x: f'{product_id_path}/{x}-eng.zip')
    return df1
url_cube = "https://www150.statcan.gc.ca/t1/wds/rest/getAllCubesList"
product_id_path = "https://www150.statcan.gc.ca/n1/tbl/csv"
stcan_dict = get_statcan_dict(url_cube)
df = get_statcan_metadata(stcan_dict,product_id_path)
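Once the metadata is downloaded, a common next step is finding the product ID of the table you want. The sketch below searches the raw list of records (the value returned by get_statcan_dict) by title keyword. The helper name search_cubes is hypothetical, the title field name cubeTitleEn is an assumption about the response layout, and the sample records are made up so the example runs without a network call.

```python
def search_cubes(cubes, keyword, title_field="cubeTitleEn"):
    """Return (productId, title) pairs whose title contains keyword, ignoring case."""
    needle = keyword.lower()
    matches = []
    for cube in cubes:
        # A missing or null title is treated as an empty string.
        title = cube.get(title_field) or ""
        if needle in title.lower():
            matches.append((cube.get("productId"), title))
    return matches


# Made-up records that only mimic the shape of the real response.
sample_cubes = [
    {"productId": 99990001, "cubeTitleEn": "Example table about labour"},
    {"productId": 99990002, "cubeTitleEn": "Example table about housing"},
    {"productId": 99990003, "cubeTitleEn": None},
]

print(search_cubes(sample_cubes, "labour"))
# [(99990001, 'Example table about labour')]
```

With the real data, the call would be search_cubes(stcan_dict, "labour"). If the same field ends up as a column of df, a filter such as df[df["cubeTitleEn"].str.contains("labour", case=False, na=False)] should give a comparable result.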
An overview of what this metadata would look like:
Step3: downloading and reading any Statistics Canada data table in Python
data = sc.table_to_df('10100001')
The data overview of this sample table looks like:
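As an alternative to table_to_df, the url column built in Step2 points at a zip archive that can be read directly. The sketch below shows one way to do that with the standard library only. The helper names build_zip_url and read_csv_from_zip are hypothetical, and the archive member name and the sample columns are assumptions for illustration: the demo builds a tiny zip in memory instead of downloading a real table.

```python
import csv
import io
import zipfile

# Same value as product_id_path in Step2.
BASE_URL = "https://www150.statcan.gc.ca/n1/tbl/csv"


def build_zip_url(product_id, base_url=BASE_URL):
    """Build the download URL the same way the 'url' column is built in Step2."""
    return f"{base_url}/{product_id}-eng.zip"


def read_csv_from_zip(zip_bytes, member_name):
    """Read one CSV file out of an in-memory zip archive as a list of dicts."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        with archive.open(member_name) as raw:
            # utf-8-sig drops a byte-order mark if the file starts with one.
            text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
            return list(csv.DictReader(text))


# Demo with a made-up archive, so no network access is needed.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("99990001.csv", "REF_DATE,GEO,VALUE\n2020,Canada,1.5\n")

rows = read_csv_from_zip(buffer.getvalue(), "99990001.csv")
print(build_zip_url("10100001"))
print(rows)
```

For a real table, the bytes would come from something like requests.get(build_zip_url("10100001")).content. The name of the CSV file inside the archive should be checked with archive.namelist() before relying on it.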
Related Links
- code solution script: https://colab.research.google.com/drive/18zNRPnn9WDD3qniy2DaqkrOfQ-Kvu0j5?usp=sharing
- stats_can documentation: https://github.com/ianepreston/stats_can
- Statistics Canada information on how to access their public datasets: https://www.statcan.gc.ca/en/developers/wds/user-guide