Skip to content

Step-by-step Data Table Merging or Cross Referencing Tutorial for Beginners – covering all types of merges

Particular useful for non-technical professionals or beginners.

What are different types of marges for data tables and how we can apply them on any data tables or sheets, for example on excel or google sheets files

Use-case:

A tutorial for how to cross reference, combine or merge two or more data tables or sheets, and specify whether to include all data rows or include only matched data rows from one or both data tables. We provide a practical conceptual tutorial of different types of merge ( left, right, inner and outer merge) with example. This is a beginner solution and accessible for every background, specially for less technical professionals or learners. We apply this solution with Python, but you do not need to have any pre-knowledge of using Python.

You may have your data in excel, google sheets, or any other source. This solution tutorial is applicable for you. In this example, our data exists in two sheets of a google sheets spreadsheet. Data and solution code all available to download here in below.

A simplified example: Assume we want to merge or cross reference two tables, the red or left and the blue or right tables. The common column, which is a unique identifier, for both tables is id column which has the same name in both tables. There are data rows with id values which are only exist in one of tables (highlighted with yellow) and there are other id values which are common in both tables. We can conduct different types of merge to combine and get a table such as the one in bottom. Different types of merge specifies whether we want to include all data rows, exclude unmatched data rows from one or both data tables.

Video Tutorial for this blog is available here:

  • Follow this tutorial video to understand how the below steps work together for technical details.


Step1: package and data access to google sheets

I have two tables , profiles and orders of some employees in google sheets. I need to input some python packages we need and allow access between google colab and google sheets.

  • Google Colab is the online editor freely available for all to use python portable without the need of local installation.
# required package and initial settings for reading a data file from google sheets in google colab
from google.colab import auth
auth.authenticate_user()

import gspread
from google.auth import default
creds, _ = default()

gc = gspread.authorize(creds)

# other required packages depending on the need
import numpy as np
import pandas as pd

The data tables look like this:

profile table
orders data

Step2: Possible additional steps on your sheets data before merging (filter or slice data, sort data, drop duplicated rows)

  • sort sheets

refer to step by step tutorial with example in https://www.youtube.com/watch?v=qccaiFxptGA

  •  filter sheets or slice data 

refer to step by step tutorial with example in https://www.youtube.com/watch?v=oQ4ZXvyoCig

  • remove duplicates data

refer to step by step tutorial with example in https://www.youtube.com/watch?v=XFQdbuptG8A 

Step3: Merge types or methods

Step4: Apply any type of merge on two data tables

#
method1 = 'left'
method2 = 'right'
method3 = 'inner'
method4 = 'outer'

output = pd.merge(profiles, orders, 
         left_on=['id'], right_on=['id'],
         how=method1)
# or merged based on two columns first name and last name 
output = pd.merge(profiles, orders, 
         left_on=['last_name','first_name'], right_on=['last_name','first_name'],
         how=method1)

output.head()

Step5: merge with a third or more data tables

output = pd.merge(output, third_table, 
         left_on=['id'], right_on=['id'],
         how=method1)

Step 6: Store and save the result

output = output.fillna('')

# Open our new sheet and add some data.
sheet = gc.open('Data_sample1')
sheet.add_worksheet('merged_data',rows=250, cols=20) 
worksheet3 = gc.open('Data_sample1').worksheet('merged_data')
worksheet3.update([output.columns.values.tolist()] + output.values.tolist())

Related Links

Check out these related tutorial in your convenient time:

  • For python related tutorials, see this playlist of video tutorials: https://www.youtube.com/playlist?list=PL_b86y-oyLzAQOf0W7MbCXYxEkv27gPN6
  • For statistical and econometric related tutorials, see this playlist of video tutorials: https://www.youtube.com/watch?v=aHBquefG6dQ&list=PL_b86y-oyLzDTtPT8Y1zTt4kLpdOBhxOZ

Leave a Reply

Your email address will not be published. Required fields are marked *