Use-case:
A tutorial for how to cross reference, combine or merge two or more data tables or sheets, and specify whether to include all data rows or include only matched data rows from one or both data tables. We provide a practical conceptual tutorial of different types of merge ( left, right, inner and outer merge) with example. This is a beginner solution and accessible for every background, specially for less technical professionals or learners. We apply this solution with Python, but you do not need to have any pre-knowledge of using Python.
You may have your data in excel, google sheets, or any other source. This solution tutorial is applicable for you. In this example, our data exists in two sheets of a google sheets spreadsheet. Data and solution code all available to download here in below.
A simplified example: Assume we want to merge or cross reference two tables, the red or left and the blue or right tables. The common column, which is a unique identifier, for both tables is id column which has the same name in both tables. There are data rows with id values which are only exist in one of tables (highlighted with yellow) and there are other id values which are common in both tables. We can conduct different types of merge to combine and get a table such as the one in bottom. Different types of merge specifies whether we want to include all data rows, exclude unmatched data rows from one or both data tables.
Video Tutorial for this blog is available here:
- Follow this tutorial video to understand how the below steps work together for technical details.
Step1: package and data access to google sheets
I have two tables , profiles and orders of some employees in google sheets. I need to input some python packages we need and allow access between google colab and google sheets.
- Google Colab is the online editor freely available for all to use python portable without the need of local installation.
# required package and initial settings for reading a data file from google sheets in google colab
from google.colab import auth
auth.authenticate_user()
import gspread
from google.auth import default
creds, _ = default()
gc = gspread.authorize(creds)
# other required packages depending on the need
import numpy as np
import pandas as pd
The data tables look like this:
Step2: Possible additional steps on your sheets data before merging (filter or slice data, sort data, drop duplicated rows)
- sort sheets
refer to step by step tutorial with example in https://www.youtube.com/watch?v=qccaiFxptGA
- filter sheets or slice data
refer to step by step tutorial with example in https://www.youtube.com/watch?v=oQ4ZXvyoCig
- remove duplicates data
refer to step by step tutorial with example in https://www.youtube.com/watch?v=XFQdbuptG8A
Step3: Merge types or methods
Step4: Apply any type of merge on two data tables
#
method1 = 'left'
method2 = 'right'
method3 = 'inner'
method4 = 'outer'
output = pd.merge(profiles, orders,
left_on=['id'], right_on=['id'],
how=method1)
# or merged based on two columns first name and last name
output = pd.merge(profiles, orders,
left_on=['last_name','first_name'], right_on=['last_name','first_name'],
how=method1)
output.head()
Step5: merge with a third or more data tables
output = pd.merge(output, third_table,
left_on=['id'], right_on=['id'],
how=method1)
Step 6: Store and save the result
output = output.fillna('')
# Open our new sheet and add some data.
sheet = gc.open('Data_sample1')
sheet.add_worksheet('merged_data',rows=250, cols=20)
worksheet3 = gc.open('Data_sample1').worksheet('merged_data')
worksheet3.update([output.columns.values.tolist()] + output.values.tolist())
Related Links
- code solution script: https://colab.research.google.com/drive/11lfDJ34btjWdJywtW5PrGmgUWLpiEVyH?usp=sharing
- data example https://docs.google.com/spreadsheets/d/1P4s3vFSeJB8XIfKpuos0ZMOcOlOykqSazl1Vl-zzTzk/edit?usp=sharing
Check out these related tutorial in your convenient time:
- For python related tutorials, see this playlist of video tutorials: https://www.youtube.com/playlist?list=PL_b86y-oyLzAQOf0W7MbCXYxEkv27gPN6
- For statistical and econometric related tutorials, see this playlist of video tutorials: https://www.youtube.com/watch?v=aHBquefG6dQ&list=PL_b86y-oyLzDTtPT8Y1zTt4kLpdOBhxOZ