Saturday, November 18, 2023

Geo-spatial Analysis- An Introduction

Part 1-Basic

Introduction

Geo-spatial analysis is process of analyzing information by studying features such as location, attributes and their relationship that helps us uncover certain patterns happening at geographical level. For example, a company might want to look at how different retail stores are spread across a state and is there a possible white space opportunity available.Below are some of the other use cases where geo-spatial analysis can be used.

  • Map certain metrics such as footprint, store productivity for a given location to draw a comparison among various stores
  • White space opportunities
  • Understand store coverage using catchment analysis(using constant time or constant area as catchment metrics)
  • Competitor Intelligence: Understand the coverage between yours and competitor’s by answering key questions such as where are they located, what is their coverage, has their been a shift in certain location preferences (CBD Vs Heartland), etc

In this blog we will look at how to perform geo-spatial analysis through a step by step approach. We will discuss several use cases and how we answer certain key business questions.


Step 1:Libraries required in R

We would require libraries such as sf, leaflet, osrm, etc to play around with geo-spatial objects.These libraries also tend to use OpenStreetMap.It is a free, open geographic database updated and maintained by a community of volunteers via open collaboration. Contributors collect data from surveys, trace from aerial imagery and also import from other freely licensed geodata sources.

Lets install the libraries

## Loading required package: tidyverse
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.4.1     ✔ purrr   1.0.1
## ✔ tibble  3.1.8     ✔ dplyr   1.1.0
## ✔ tidyr   1.3.0     ✔ stringr 1.5.0
## ✔ readr   2.1.4     ✔ forcats 1.0.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## Loading required package: sf
## Warning: package 'sf' was built under R version 4.2.3
## Linking to GEOS 3.9.3, GDAL 3.5.2, PROJ 8.2.1; sf_use_s2() is TRUE
## Loading required package: osrm
## Warning: package 'osrm' was built under R version 4.2.3
## Data: (c) OpenStreetMap contributors, ODbL 1.0 - http://www.openstreetmap.org/copyright
## Routing: OSRM - http://project-osrm.org/
## Loading required package: leaflet
## Warning: package 'leaflet' was built under R version 4.2.3
## Loading required package: RColorBrewer


Step 2: Defining the business problem

Before doing any analysis, it is very important to identify and define the business problem.This ensures that the analysis is precise and targeted.

Lets say that McDonald want to identify some location in Indonesia where they can open their outlets.To approach this, we would be using population data and will try to narrow down on places that high population density.This would be our starting point. At the end of the analysis, we should be able to provide names of few cities which can be considered for such expansion.


Indonesia has a total of 38 provinces and then each province has several cities and towns. For our analysis, we would look at analyzing the following:

  • Province level data
  • Zooming on city level info for ach province

For instance, Bali is a province in Indonesia and it has a total of 8 cities as shown below:


Step 3: Get the shape file for Indonesia Province Level Map

Many BI experts that work in tableau know that to plot a geo-heat map, we need to have a shape file in place. The shape file is then read into tableau which has all the geographical details to plot the map. However, in R, I have experienced that it is very difficult to find a shape file online.In order to circumvent this situation, we leverage geo-json or json files. These files ar easily available on github repo and can be ready into R and converted into shape files.

Lets say we have to plot the population of various provinces in Indonesia.For this, we would require province level geo-json or json objects which can be read into R.

If you just search this on google, you might get the following options

Just open this repo


access the geo-json file


Download the file


Just follow these steps and download the file.The name of the file is “indonesia.geojson” as shown below


I have added the file to my repository. You can download from this link.

Step 4: Read the geo-json file into R

We will now read the file into R

nm_json<-"indonesia.geojson"

indo_prov <- read_sf(nm_json )
indo_prov
## Simple feature collection with 34 features and 5 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: 95.01083 ymin: -11.00972 xmax: 141.0118 ymax: 5.906388
## Geodetic CRS:  WGS 84
## # A tibble: 34 × 6
##    cartodb_id country    id_1 slug               state                  geometry
##         <int> <chr>     <int> <chr>              <chr>        <MULTIPOLYGON [°]>
##  1         16 Indonesia     1 indonesia-aceh     Aceh  (((97.97681 4.627501, 98…
##  2          2 Indonesia    31 indonesia-sumater… Suma… (((99.17167 -1.502501, 9…
##  3          8 Indonesia    34 indonesia-yogyaka… Yogy… (((110.702 -8.185054, 11…
##  4         20 Indonesia    33 indonesia-sumater… Suma… (((98.71384 3.769471, 99…
##  5          7 Indonesia     3 indonesia-bangkab… Bang… (((105.3475 -1.84469, 10…
##  6          5 Indonesia     7 indonesia-papuaba… Papu… (((134.2333 -1.741944, 1…
##  7          4 Indonesia    12 indonesia-jawatim… Jawa… (((113.5921 -7.714859, 1…
##  8         10 Indonesia    13 indonesia-kaliman… Kali… (((108.9246 0.558608, 10…
##  9          3 Indonesia    14 indonesia-kaliman… Kali… (((114.5128 -3.54225, 11…
## 10          6 Indonesia    16 indonesia-kaliman… Kali… (((117.0173 -1.164211, 1…
## # … with 24 more rows


There are couple of features in the data read from geo-json file. Most peculiar feature is the geometry column which contains contains coordinates of each province.If we look at the class of the data read, it comes as sf object

class(indo_prov)
## [1] "sf"         "tbl_df"     "tbl"        "data.frame"


Step 5: Plotting the shape file

leaflet(indo_prov) %>%
  addTiles() %>%
  addPolygons(fillColor = "blue",
              color = "#000000",
              weight = .5,
              fillOpacity = 0.5
              ,
              popup = ~glue::glue("<b>{state}</b><br>{state}")
  )


For the sake of this blog,we will create a column with random values and then add this column to the shape file.We will also add another column “Type” to indicate Low, Medium and High values.

Step 5: Adding a dummy column to the shape file

indo_prov2<-indo_prov%>%
  mutate(Random_Value = rnorm(nrow(indo_prov),5,2))

indo_prov2$Type <- ifelse(indo_prov2$Random_Value < quantile(indo_prov2$Random_Value,0.30),"Low",
                          ifelse(indo_prov2$Random_Value < quantile(indo_prov2$Random_Value,0.80),"Medium","High"))

head(indo_prov2)
## Simple feature collection with 6 features and 7 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: 95.01083 ymin: -8.203611 xmax: 135.2577 ymax: 5.906388
## Geodetic CRS:  WGS 84
## # A tibble: 6 × 8
##   cartodb_id country    id_1 slug  state                  geometry Rando…¹ Type 
##        <int> <chr>     <int> <chr> <chr>        <MULTIPOLYGON [°]>   <dbl> <chr>
## 1         16 Indonesia     1 indo… Aceh  (((97.97681 4.627501, 98…    1.76 Low  
## 2          2 Indonesia    31 indo… Suma… (((99.17167 -1.502501, 9…    4.15 Low  
## 3          8 Indonesia    34 indo… Yogy… (((110.702 -8.185054, 11…    7.28 High 
## 4         20 Indonesia    33 indo… Suma… (((98.71384 3.769471, 99…    4.64 Medi…
## 5          7 Indonesia     3 indo… Bang… (((105.3475 -1.84469, 10…    6.23 Medi…
## 6          5 Indonesia     7 indo… Papu… (((134.2333 -1.741944, 1…    6.92 High 
## # … with abbreviated variable name ¹​Random_Value


Step 6: Plotting Random Value in the Map

Plotting the map based on Type column with categorical values

We will create three separate sections of addpolygon depending upon Low, Medium and High value in the Type column

leaflet(indo_prov2) %>%
  addTiles() %>%
  addPolygons(
    # fillColor = "blue",
              color = "#000000",
              weight = .5,
              fillOpacity = 0.5
              ,
              popup = ~glue::glue("<b>{state}</b><br>{state}")
  )%>%
  addPolygons(data=indo_prov2%>%
                filter(Type=="Low"),
              fillColor = "green",
              fill=T,
              weight = .5,
              popup = ~glue::glue("<b>{state}</b><br>{state}"))%>%
  addPolygons(data=indo_prov2%>%
                filter(Type=="Medium"),
              fillColor = "blue",
              fill=T,
              weight = .5,
              popup = ~glue::glue("<b>{state}</b><br>{state}"))%>%
   addPolygons(data=indo_prov2%>%
                filter(Type=="High"),
              fillColor = "red",
              fill=T,
              weight = .5,
              popup = ~glue::glue("<b>{state}</b><br>{state}"))

We can see that there are some region with red shades such as Sumatera Utara, Sulawesi Tengara, etc. We will now zoom in on these region using the city level Map


Step 6: Get the shape file for Indonesia City Level Map

Similar to the step 3 that we did for getting the province level Map, we will now download the city level Map for Indonesia.

Go to the github repository as shown below


Open and download the Indonesia cities json file as shown below


The file will be downloaded


City levl file is very big in size and hence I am unable to upload to my repository. Instead, I can point you to the repo I downloaded from. Please download it from here

Step 7: Read the geo-json file into R

We will now read the file into R

nm2_json<-"indonesia-cities.json"

indo_city <- read_sf(nm2_json )
indo_city
## Simple feature collection with 514 features and 8 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: 95.01079 ymin: -11.00762 xmax: 141.0194 ymax: 6.07693
## Geodetic CRS:  WGS 84
## # A tibble: 514 × 9
##    Kind  Code  Name          Year  Source Parent Province Country  
##    <chr> <chr> <chr>         <chr> <chr>   <int> <chr>    <chr>    
##  1 City  1101  SIMEULUE      2015  BPS        11 ACEH     INDONESIA
##  2 City  1102  ACEH SINGKIL  2015  BPS        11 ACEH     INDONESIA
##  3 City  1103  ACEH SELATAN  2015  BPS        11 ACEH     INDONESIA
##  4 City  1104  ACEH TENGGARA 2015  BPS        11 ACEH     INDONESIA
##  5 City  1105  ACEH TIMUR    2015  BPS        11 ACEH     INDONESIA
##  6 City  1106  ACEH TENGAH   2015  BPS        11 ACEH     INDONESIA
##  7 City  1107  ACEH BARAT    2015  BPS        11 ACEH     INDONESIA
##  8 City  1108  ACEH BESAR    2015  BPS        11 ACEH     INDONESIA
##  9 City  1109  PIDIE         2015  BPS        11 ACEH     INDONESIA
## 10 City  1110  BIREUEN       2015  BPS        11 ACEH     INDONESIA
## # … with 504 more rows, and 1 more variable: geometry <MULTIPOLYGON [°]>


Step 8: Plotting the shape file

leaflet(indo_city) %>%
  addTiles() %>%
  addPolygons(fillColor = "blue",
              color = "#000000",
              weight = .5,
              fillOpacity = 0.5
              ,
              popup = ~glue::glue("<b>{Name}</b><br>{Name}")
  )


Step 9: Plotting the cities of Bali

indo_city2<-indo_city%>%
  mutate(Random_Value = rnorm(nrow(indo_city),5,2))

indo_city2$Type <- ifelse(indo_city2$Random_Value < quantile(indo_city2$Random_Value,0.30),"Low",
                          ifelse(indo_city2$Random_Value < quantile(indo_city2$Random_Value,0.80),"Medium","High"))

head(indo_city2)
## Simple feature collection with 6 features and 10 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: 95.38908 ymin: 2.004084 xmax: 98.19707 ymax: 5.249793
## Geodetic CRS:  WGS 84
## # A tibble: 6 × 11
##   Kind  Code  Name          Year  Source Parent Province Country  
##   <chr> <chr> <chr>         <chr> <chr>   <int> <chr>    <chr>    
## 1 City  1101  SIMEULUE      2015  BPS        11 ACEH     INDONESIA
## 2 City  1102  ACEH SINGKIL  2015  BPS        11 ACEH     INDONESIA
## 3 City  1103  ACEH SELATAN  2015  BPS        11 ACEH     INDONESIA
## 4 City  1104  ACEH TENGGARA 2015  BPS        11 ACEH     INDONESIA
## 5 City  1105  ACEH TIMUR    2015  BPS        11 ACEH     INDONESIA
## 6 City  1106  ACEH TENGAH   2015  BPS        11 ACEH     INDONESIA
## # … with 3 more variables: geometry <MULTIPOLYGON [°]>, Random_Value <dbl>,
## #   Type <chr>


color_palet<-colorNumeric(palette = "YlOrRd",domain = indo_city2%>%
               filter(Province=="BALI")%>%
               select(Random_Value)%>%
               pull(Random_Value))

Plotting the Map

indo_city_bali<-indo_city2%>%
                filter(Province=="BALI")

leaflet(indo_city_bali) %>%
  addTiles() %>%
  addPolygons(
    # fillColor = "blue",
              color = "#000000",
              weight = .5,
              fillOpacity = 0.5
              ,
              
  )%>%
  addPolygons(data=indo_city_bali,
              fillColor = ~color_palet(Random_Value),
              fill=T,
              weight = .5,
              popup = ~glue::glue("<b>{Name}</b><br>{Name}"),
              highlightOptions = highlightOptions(color = "black", 
                                                  weight = 2,
                                                  bringToFront = TRUE))


Concluding thoughts

The purpose of the blog was to provide a step by step guide on how to access publicly available geographical maps at various levels of granularity(province vs city) and then use them to analyse a metric such as population density. In the next few blogs we will look at how to further leverage maps along with certain demographic information to analyse various use cases.

Web Scraping Tutorial 2 - Getting the Avg Rating and Reviews Count

Web Scrapping Tutorial 2: Getting Overall rating and number of reviews ...