– cross reference and apply shapefile polygons on coordinates in data – Python/GIS | Geopandas & shapley
Use-case:
Applying reverse-geocoding free of charge using an appropriate shapefile (including polygons data) in Python. Let’s say you have some geographical coordinates in your data (latitudes and longitudes). The questions is how to identify the city name, province name or other geographical levels depending on which country or regions we are looking at. This creates a nice consistent location identification in your data. In this tutorial, I use the shapefile polygons from Statistics Canada – Census boundary files. You can find the shapefiles here for download: source: https://www12.statcan.gc.ca/census-recensement/2011/geo/bound-limit/bound-limit-eng.cfm
In the case of Canada and this shapefile, there are geographical information related to Census Division, Census SubDivision, Census Metropolitan Area, Economic Region, Province, among some other levels. We are going to identify all these information using coordinates data.
- For python related tutorials, see this playlist of video tutorials: https://www.youtube.com/playlist?list=PL_b86y-oyLzAQOf0W7MbCXYxEkv27gPN6
- For GIS/Spatial/Map related tutorials, see this playlist of video tutorials: https://www.youtube.com/watch?v=PKiCAcnBQvE&list=PL_b86y-oyLzC52Hn6GuKC5DpdjKrqiOQo
Video Tutorial for this blog is available here:
Step1: package
import numpy as np
import pandas as pd
from datetime import datetime
import re
from tqdm import tqdm
#
from google.colab import drive
drive.mount("/content/drive")
#
# required libraries for shape file cross referencing
!pip install shapely
!pip install geopandas
import shapefile
from shapely.geometry import Point # Point class
from shapely.geometry import shape # shape() is a function to convert geo objects through the interface
from shapely.geometry import Polygon
import geopandas as gpd
Step2: data input
!cd '/content/drive/MyDrive/Data/'
# get raw data with coordinates
data = pd.read_csv("spatial_data - points.csv")
data.head()
Data sample for coordinate to reverse geocode:
One more thing here is to separate latitude and longitude from comma separated coordinates column:
data[['Latitude','Longitude']] = data['points_coords'].str.split(',', expand=True).astype(np.float64)
data.head()
Step3: Polygon data in shapefiles
### stat can census 2016 cartographic boundary _ census subdivition shape file
polygondata = gpd.read_file('/content/drive/MyDrive/Data/lcsd000a16a_e/lcsd000a16a_e.shp')
# convert statcan type of coordinate to normal coordinates
polygondata = polygondata.to_crs({'init': 'epsg:4326'})
polygondata.head(5)
We use geopandas package to read the shapefile data.
An important note here is to have mind your coordinate type should be same and consistent between data and shapefile or polygon. Here my data is in normal coordinate type that would normally see using google maps, but not my shapefile coordinate. Hence, I use to_crs function to fix it. More context on that can be found here https://gis.stackexchange.com/questions/276940/re-projecting-lat-and-long-in-python-geopandas-but-geometry-unchanged
Here is the previous of shapefile data:
Step4: test the reverse geocoding method: Point function from shapley package
polygondata[polygondata['CSDNAME'].str.contains('Toronto')].contains(Point(-79.3761569,43.6501307))
you get a true or false value from this.
Step5: the main step, scaled application of reverse geocoding on all data
Final data review:
Related Links
- code solution script: https://colab.research.google.com/drive/1jkoSHIA4nmXEvbHyuDCcbf_3gBAcCPBA?usp=sharing
Appreciate the recommendation. Let me try it out.