Thursday, May 23, 2024

Web Scraping Tutorial 3 - Getting Detailed Google Reviews, Star Rating and Time Stamp

Web Scraping Tutorial 3: Scrolling Down and Expanding Reviews by Pressing More

Introduction

In this tutorial, we will look at how we can use RSelenium to:

  • Scroll Down the review page
  • Expand reviews by pressing More

We will also extract the review text along with its time stamp and the star rating given.

Step 0: Installing Libraries

package.name <- c("tidyverse", "RSelenium")

# Install each package if it is missing, then attach it
for (i in package.name) {
  if (!require(i, character.only = TRUE)) {
    install.packages(i)
  }
  library(i, character.only = TRUE)
}
Loading required package: tidyverse
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.0     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Loading required package: RSelenium


Step 1: Start a headless Firefox browser

The syntax for starting a headless Firefox browser is shown below:

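# Start a Selenium server together with a Firefox client
driver <- rsDriver(
  browser = "firefox",
  chromever = NULL,   # skip the Chrome driver, since only Firefox is used
  verbose = FALSE,
  extraCapabilities = list("firefoxOptions" = list(args = list("--headless")))
)

# The client object is what we use to control the browser
web_driver <- driver[["client"]]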


Once I execute this, a Firefox browser session starts in the background.
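
A side note on the `extraCapabilities` argument: geckodriver documents its Firefox-specific settings under the vendor-prefixed key `moz:firefoxOptions`. If a visible window opens even though `--headless` was requested, the prefixed key may be worth trying. The sketch below is only an illustration; `headless_firefox_caps` is a hypothetical helper name, not part of RSelenium.

```r
# Hypothetical helper: build the extra capabilities for a Firefox session.
# Assumption: geckodriver reads Firefox settings from "moz:firefoxOptions".
headless_firefox_caps <- function(headless = TRUE) {
  args <- if (headless) list("--headless") else list()
  list("moz:firefoxOptions" = list(args = args))
}

caps <- headless_firefox_caps()
str(caps)

# Usage with RSelenium (not run here, because it starts a Selenium server):
# driver <- rsDriver(browser = "firefox", chromever = NULL, verbose = FALSE,
#                    extraCapabilities = caps)
# web_driver <- driver[["client"]]
#
# When the scraping is finished, close the browser and stop the server:
# web_driver$close()
# driver[["server"]]$stop()
```

Keeping the capabilities in a small function makes it easy to switch between a headless run and a visible one while debugging.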


Step 2: Navigate to the web page

Once the browser is running, we need to go to the Google Maps results for “patanjali store in powai”. To do this, we will use the following lines of code:

# Search term, without stray leading or trailing spaces
nm <- "patanjali store in powai"
ad_url <- str_c("https://www.google.co.id/maps/search/", nm)

# Navigate to the URL so the browser opens that location
web_driver$navigate(ad_url)


Once I execute the above, the Firefox browser goes to the Google Maps results for the Patanjali store in Powai.
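
The search URL above is built by pasting the search term onto the Google Maps base address. Because the term contains spaces, one option is to percent-encode it first. The following is a minimal sketch of that idea, assuming base R's `utils::URLencode()`; the function name `build_maps_search_url` is hypothetical and only used for illustration.

```r
# Hypothetical helper: build a Google Maps search URL from a search term.
# trimws() drops leading/trailing spaces and URLencode() turns the
# remaining spaces into %20.
build_maps_search_url <- function(query,
                                  base = "https://www.google.co.id/maps/search/") {
  clean_query <- trimws(query)
  paste0(base, utils::URLencode(clean_query, reserved = TRUE))
}

ad_url <- build_maps_search_url("patanjali store in powai ")
print(ad_url)

# Usage with the browser session from Step 1 (not run here):
# web_driver$navigate(ad_url)
# web_driver$getCurrentUrl()   # check where the browser actually landed
```

Checking the current URL after navigating is a quick way to confirm that the browser reached the intended search page before any scraping starts.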