Tuesday, December 31, 2019

Blog 7- Creating Functions in R


Functions

Functions are one of the most widely used concepts in programming. A function, sometimes also called a module, helps us automate a repetitive task and so reduces time and effort. In this blog, we will look at how to create functions in R. We will go from simple examples to more complex ones and also discuss the ... (three-dot notation) concept.



Basic syntax of a function

myfunction <- function(arg1, arg2, ... ){
  statements
  return(object)
}


Function to calculate Product of two numbers

func_product<-function(x,y){
  
  z<-x*y
  return(z)
  
}

func_product(2,3)
[1] 6
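
A small aside, offered as a sketch: R returns the value of the last evaluated expression in a function body, so the explicit return() call is optional. The name func_product_short below is hypothetical and only used for illustration.

```r
# Hypothetical shorter version: the last expression, x * y, is returned automatically
func_product_short <- function(x, y) {
  x * y
}

func_product_short(2, 3)  # gives the same result as func_product(2, 3)
```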


Function to calculate Product of n numbers

Using the three-dot notation (...), we can create functions that accept any number of arguments without having to name them explicitly when defining the function.
We will also install the purrr package, since its is_empty() function is used to check whether a list is empty.

if(!require("purrr")){
  
  # require() loads purrr when it is available; otherwise install it, then load it
  install.packages("purrr")
  library(purrr)
}

# Declaring the function
func_product_n<-function(x,y,...){

  z<-x*y

  # Storing elements other than x and y
  params<-list(...)

  # Multiplying by the extra arguments only if any were supplied
  if(!is_empty(params)){

    for(i in params){

      z<-z*i

    }
  }# End of If statement

  return(z)

}

func_product_n(1,2,3,4,5)
[1] 120

This is a very simple yet powerful example of how we can pass a variable number of arguments in R. It is similar to ‘*args’, which is used extensively in Python.
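
As a complementary sketch, the arguments captured by ... can also be forwarded unchanged to another function. The example below assumes that base R's prod() is an acceptable substitute for the explicit loop; the name func_product_dots is hypothetical and not part of the example above.

```r
# Hypothetical helper: forwards every argument it receives to base R's prod()
func_product_dots <- function(...) {
  # prod() multiplies all the values passed to it; with no values it returns 1
  prod(...)
}

func_product_dots(1, 2, 3, 4, 5)  # same arguments as the func_product_n call
```

Forwarding ... this way avoids having to unpack the arguments into a list first, at the cost of leaving the argument handling to the function being called.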

Let's see some more examples.


Write a function to remove NA from a vector

s<-c(1,2,3,4,NA,4,5,6,7)
func_Remove_NA<-function(x){
  
  pos<-!is.na(x)
  y<-x[pos]
  return(y)
  
}

func_Remove_NA(s)
[1] 1 2 3 4 4 5 6 7


Write a function to replace NA with 0 within a vector

s<-c(1,2,3,4,NA,4,5,6,7)
func_Replace_NA<-function(x){
  
  pos<-is.na(x)
  x[pos]<-0
  return(x)
  
}

func_Replace_NA(s)
[1] 1 2 3 4 0 4 5 6 7
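
The replacement value does not have to be fixed at 0. A minimal sketch, assuming we want the caller to choose it: the function below adds a hypothetical argument named value with a default of 0, so calling it with one argument behaves like func_Replace_NA.

```r
# Hypothetical generalisation: 'value' is an assumed extra argument, defaulting to 0
func_Replace_NA_with <- function(x, value = 0) {
  x[is.na(x)] <- value  # overwrite only the NA positions
  x
}

s <- c(1, 2, 3, 4, NA, 4, 5, 6, 7)
func_Replace_NA_with(s)              # NA is replaced with the default, 0
func_Replace_NA_with(s, value = -1)  # NA is replaced with -1
```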


Let's create a function that reads all the csv files from a folder

read.csv.files<-function(x){

  # Getting all the csv files within the folder
  # (the pattern is a regular expression, so the dot is escaped and anchored to the end)
  input.file.type<-c("\\.csv$")
  csv.files<- list.files(x,pattern=input.file.type)

  # Reading the csv files into a list
  l1<-list()

  for(i in csv.files){

    # file.path builds the path with the correct separator on any operating system
    l1[[i]]<-read.csv(file.path(x,i),stringsAsFactors=FALSE)

  }

  print("The names of the csv files imported from the folder are:")
  print(csv.files)

  return(l1)
}

Here the argument supplied to the function is the folder path. Let's use it to read all the csv files from the ‘Parag’ folder inside the main Documents folder (the working directory here).

folder_path<-file.path(getwd(),"Parag")
l2<-read.csv.files(folder_path)
[1] "The names of the csv files imported from the folder are:"
[1] "file2.csv"

Let's check one of the csv files

names(l2)[1]
[1] "file2.csv"

names(l2)[1] gives the name of the first csv file stored in l2. Let's extract that file into a data frame, df2

nms<-names(l2)[1]
df2<-l2[[nms]]
head(df2)
                                                 No              Comment Type
0                                he said:wonderful.                    A     
1 The problem is: reading table, and also a problem  yes. keep going on.    A
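
The folder-reading function above can also be written without an explicit loop. The following is a sketch under stated assumptions, not a replacement for it: read_csv_folder is a hypothetical name, and the demo writes a small made-up csv file into a temporary folder so that the example runs without the ‘Parag’ folder.

```r
# Hypothetical variant of the folder reader, built on lapply()
read_csv_folder <- function(folder) {
  # full.names = TRUE returns complete paths, so no manual pasting is needed
  paths <- list.files(folder, pattern = "\\.csv$", full.names = TRUE)
  files <- lapply(paths, read.csv, stringsAsFactors = FALSE)
  names(files) <- basename(paths)  # name each element after its file
  files
}

# Demo on a throwaway folder with one made-up file
demo_dir <- file.path(tempdir(), "csv_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(a = 1:2, b = c("x", "y")),
          file.path(demo_dir, "demo.csv"), row.names = FALSE)

demo <- read_csv_folder(demo_dir)
names(demo)
```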


Creating a function to standardise file names

File names often contain numbers in addition to text. For instance, in ‘CLI_16572948.csv’ the text is separated from the numbers by a ‘_’. Such numbers are usually related to a time stamp and are appended to the file name when the file is generated by a ‘job’ (normally in a production environment). We need to remove them before doing any further analysis.

if(!require("stringr")){
  
  # require() loads stringr when it is available; otherwise install it, then load it
  install.packages("stringr")
  library(stringr)
}


function_standardise_names<-function(x){

  ls.names<-list()

  # Looping over the names of the list passed in as x
  for(i in names(x)){

    # Keeping only the text before the first "_"
    new.nms<-str_split(i,"_")[[1]][1]
    ls.names[[new.nms]]<-new.nms

  }
  return(names(ls.names))

}
correctd.nms<-function_standardise_names(l2)
correctd.nms
[1] "file2.csv"

Assigning correctd.nms as the names of l2

names(l2)<-correctd.nms
names(l2)
[1] "file2.csv"
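
Splitting on ‘_’ also drops everything after it, including the file extension. If the extension should be kept, one possible alternative is a regular expression. This is only a sketch: strip_numeric_suffix is a hypothetical name, and it assumes the numbers always sit directly between a ‘_’ and the extension, as in ‘CLI_16572948.csv’.

```r
# Hypothetical helper: removes "_<digits>" when it sits right before the file extension
strip_numeric_suffix <- function(file_names) {
  # The group in parentheses captures the extension so it can be put back via \\1
  sub("_[0-9]+(\\.[A-Za-z0-9]+)$", "\\1", file_names)
}

strip_numeric_suffix(c("CLI_16572948.csv", "file2.csv"))
```

Names without a matching suffix, such as ‘file2.csv’, are returned unchanged.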


Final Comments

We saw that creating functions is the natural way to handle a repetitive task. It not only saves time but also lends structure to the entire code. We also went through an example where the ... (three-dot notation) was used to make a function more dynamic. Understanding functions is very important because, in order to create a library in R, we first need to create functions. A library is created when we want to share our work.

