Web Scraping Tutorial 2: Getting the Overall Rating and Number of Reviews
In the first tutorial, we looked at how to use RSelenium to extract content from the web. Specifically, we looked at how to use the XPath of a web element (such as the store name) to scrape information from Google reviews. We used the following functions to extract data:
- web_driver$navigate(l1)
- web_driver$findElements
- getElementAttribute("href")
- web_driver$findElements(using = "xpath", value = nm)[[1]]$getElementText()
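As a reminder of how these calls fit together, here is a minimal sketch that collects the link of every element matching an XPath. The function name `get_store_links` and its `link_xpath` argument are hypothetical; pass in the XPath you identified for the store links in the first tutorial.

```r
# Hypothetical helper: return the href of every element matching an XPath.
# `link_xpath` is an assumption; use the XPath found in the first tutorial.
get_store_links <- function(web_driver, link_xpath) {
  # findElements() returns a (possibly empty) list of web elements
  elements <- web_driver$findElements(using = "xpath", value = link_xpath)
  # getElementAttribute() returns a list, so keep its first entry
  vapply(elements,
         function(el) el$getElementAttribute("href")[[1]],
         character(1))
}
```

Usage, once `web_driver` is on the Google Maps results page: `links <- get_store_links(web_driver, "<your XPath>")`.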
Moving on, in this post we will see how to extract the average Google rating and the total number of reviews for each store from the previous examples.
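Because getElementText() returns text, the rating and review count need converting to numbers before they can be analysed. The two helpers below are a hypothetical sketch; they assume the rating text looks like "4.3" and that the review-count text holds only the count, with optional thousands separators, such as "(1,234)" or "1,234 reviews". Check the actual text on the page before relying on them.

```r
# Hypothetical helper: pull the first number (e.g. "4.3") out of each string.
# Returns NA where no number is found.
parse_rating <- function(x) {
  m <- regexpr("[0-9]+(\\.[0-9]+)?", x)
  out <- rep(NA_real_, length(x))
  hit <- !is.na(m) & m > 0
  out[hit] <- as.numeric(regmatches(x, m))
  out
}

# Hypothetical helper: keep only the digits, so "(1,234)" becomes 1234.
# Assumes the string contains the review count and nothing else numeric.
parse_review_count <- function(x) {
  digits <- gsub("[^0-9]", "", x)
  out <- rep(NA_real_, length(x))
  has_digits <- nzchar(digits)
  out[has_digits] <- as.numeric(digits[has_digits])
  out
}
```

For example, `parse_rating("4.3")` gives 4.3 and `parse_review_count("(1,234)")` gives 1234.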
Step 0: How RSelenium will scrape these two stores
We will use the following steps to get the information:
- Start a headless browser
- Navigate to the Google Maps page (shown above)
- Get the URL (link) of each of these stores
- Navigate to each of these links
- Get the XPath for the store name and address
- For each XPath (name and address), get the element located there
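The last three steps above can be sketched as a single loop. This is a minimal sketch, not the author's code: `scrape_stores`, its `wait` argument, and the shape of `xpaths` (a named character vector such as `c(name = "...", address = "...")`) are assumptions. The same loop works for the rating and review-count XPaths.

```r
# Hypothetical helper: visit each link and read the text at each XPath.
# `xpaths` must be named; the names become the columns of the result.
scrape_stores <- function(web_driver, links, xpaths, wait = 2) {
  stopifnot(length(links) > 0)
  rows <- lapply(links, function(link) {
    web_driver$navigate(link)
    Sys.sleep(wait)  # crude pause so the page can finish loading
    vals <- vapply(xpaths, function(xp) {
      found <- web_driver$findElements(using = "xpath", value = xp)
      # NA when nothing matches, otherwise the first element's text
      if (length(found) == 0) NA_character_
      else found[[1]]$getElementText()[[1]]
    }, character(1))
    as.data.frame(as.list(vals), stringsAsFactors = FALSE)
  })
  out <- do.call(rbind, rows)  # one row per store
  out$link <- links
  out
}
```

Returning NA for a missing element, instead of stopping with an error, keeps one odd page from aborting the whole run.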
As the first step, we will start a headless browser. Firefox works fine on my system, so I will use Firefox. Before that, let's import the required libraries:
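package.name <- c("RSelenium")  # add any other packages you use here
for (i in package.name) {
  # install the package only if it cannot be loaded, then attach it
  if (!require(i, character.only = TRUE)) install.packages(i)
  library(i, character.only = TRUE)
}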
Step 1: Start a headless Firefox browser
The syntax for starting a headless Firefox browser is shown below:
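driver <- rsDriver(
  browser = "firefox",
  chromever = NULL,  # no ChromeDriver is needed when using Firefox
  verbose = FALSE,
  # geckodriver reads browser arguments from "moz:firefoxOptions"
  extraCapabilities = list("moz:firefoxOptions" = list(args = list("--headless")))
)
web_driver <- driver[["client"]]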
Once I execute this, a Firefox session starts in the background, as shown below.
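When scraping is finished, it is worth shutting the session down so the browser and Selenium server do not keep running. A minimal sketch, assuming `driver` is the list returned by rsDriver() above; the wrapper name `stop_driver` is hypothetical, while `close()` and `server$stop()` are the RSelenium calls it wraps.

```r
# Hypothetical wrapper: close the browser session, then stop the server.
stop_driver <- function(driver) {
  driver[["client"]]$close()  # end the Firefox session
  driver[["server"]]$stop()   # stop the Selenium server process
  invisible(NULL)
}
```

Call `stop_driver(driver)` at the end of the script.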