Sunday, February 23, 2020

Blog 13: Read and Write huge files in R

Read and Write Huge Datasets


Introduction

In this blog my aim is to introduce a quick hack in R: reading and writing huge datasets from/to disk. The data.table package provides two very fast functions, fread() and fwrite(), which import and export even datasets with millions of records with ease. We will use a dataset that ships with an R package and record how long each operation takes.

Installing the libraries: dplyr and tidyr

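# require() loads a package and returns FALSE if it is not installed,
# so install the package and then load it only in that case
if (!require("dplyr")) {
  install.packages("dplyr")
  library(dplyr)
}

if (!require("tidyr")) {
  install.packages("tidyr")
  library(tidyr)
}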
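The same install-then-load pattern is repeated for every package in this post. Below is a minimal sketch of a helper that handles a whole vector of package names in one call; the function name `load_or_install` is hypothetical and not part of any package.

```r
# Hypothetical helper (not from any package): install any missing packages,
# then attach all of them.
load_or_install <- function(pkgs) {
  for (pkg in pkgs) {
    # requireNamespace() only checks availability; it does not attach the package
    if (!requireNamespace(pkg, quietly = TRUE)) {
      install.packages(pkg)
    }
    # character.only = TRUE lets library() take the name from a variable
    library(pkg, character.only = TRUE)
  }
  invisible(pkgs)
}
```

With it, the two blocks above reduce to a single call such as `load_or_install(c("dplyr", "tidyr"))`.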


Importing the dataset

For this exercise we will use the Birthdays dataset from the mosaicData package, which records daily US births by state from 1969 to 1988. It has 372,864 records and 7 columns.

# The mosaicData package provides the Birthdays dataset
if (!require("mosaicData")) {
  install.packages("mosaicData")
  library(mosaicData)
}

data(Birthdays)
df<-Birthdays
dim(df)
[1] 372864      7


Writing the Dataset to Disk

# Install data.table if it is missing, then load it
if (!require("data.table")) {
  install.packages("data.table")
  library(data.table)
}

# Record the start time, write the CSV, then print the elapsed time
t1 <- Sys.time()

data.table::fwrite(df, "dummy.csv", row.names = FALSE)

Sys.time() - t1
Time difference of 0.02991986 secs

It took about 0.03 seconds to write the dataset.
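The Birthdays data has well under a million rows. To try fwrite() at the "millions of records" scale mentioned in the introduction, here is a sketch that writes a synthetic one-million-row table to a temporary file. The column names and values are made up for illustration, and base R's `system.time()` is used instead of subtracting `Sys.time()` values; its `elapsed` entry is wall-clock seconds. Timings will vary by machine.

```r
library(data.table)

# Hypothetical synthetic data: 1 million rows, made-up column names
set.seed(42)
n <- 1e6
big <- data.table(
  id    = seq_len(n),
  group = sample(letters, n, replace = TRUE),
  value = runif(n)
)

# Write to a temporary file so the working directory stays clean
out_file <- tempfile(fileext = ".csv")

# system.time() times a single expression; "elapsed" is wall-clock seconds
write_time <- system.time(fwrite(big, out_file))
print(write_time["elapsed"])
```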


Reading the Dataset from Disk

t1<-Sys.time()

data.table::fread("dummy.csv")
        state year month day                 date wday births
     1:    AK 1969     1   1 1969-01-01T00:00:00Z  Wed     14
     2:    AL 1969     1   1 1969-01-01T00:00:00Z  Wed    174
     3:    AR 1969     1   1 1969-01-01T00:00:00Z  Wed     78
     4:    AZ 1969     1   1 1969-01-01T00:00:00Z  Wed     84
     5:    CA 1969     1   1 1969-01-01T00:00:00Z  Wed    824
    ---                                                      
372860:    VT 1988    12  31 1988-12-31T00:00:00Z  Sat     21
372861:    WA 1988    12  31 1988-12-31T00:00:00Z  Sat    157
372862:    WI 1988    12  31 1988-12-31T00:00:00Z  Sat    167
372863:    WV 1988    12  31 1988-12-31T00:00:00Z  Sat     45
372864:    WY 1988    12  31 1988-12-31T00:00:00Z  Sat     18
Sys.time()-t1
Time difference of 0.1196802 secs

It took around 0.12 seconds to read the dataset back.
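Two points are worth keeping in mind when reading large files. The fread() call above is not assigned, so the measured time also includes printing the table; assigning the result to a variable times the read alone. fread() can also load only the columns or rows you need through its `select` and `nrows` arguments. A small sketch on a hypothetical three-row file (the data here is made up, loosely modelled on the output above):

```r
library(data.table)

# Hypothetical small file written just for this example
tmp <- tempfile(fileext = ".csv")
fwrite(data.table(state  = c("AK", "AL", "AR"),
                  year   = 1969L,
                  births = c(14L, 174L, 78L)), tmp)

# Assign instead of printing, so only the read itself is done here
full <- fread(tmp)

# Read only the listed columns
subset_cols <- fread(tmp, select = c("state", "births"))

# Read at most the first two data rows
first_two <- fread(tmp, nrows = 2)
```

On a large file, `select` and `nrows` avoid parsing data you will not use, which is often the easiest speed-up after switching to fread().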

Final Comments

We saw how fread() and fwrite() from data.table can read and write a file of decent size (372,864 rows) from/to the working directory in a fraction of a second.
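To see how much fread() and fwrite() gain over base R on your own machine, a rough benchmark sketch against read.csv() and write.csv() is below. The data is synthetic and hypothetical, a single run is noisy, and no particular speed-up is claimed here.

```r
library(data.table)

# Hypothetical synthetic data: 200,000 rows, 3 made-up columns
set.seed(1)
n <- 2e5
cmp_df <- data.frame(
  id = seq_len(n),
  x  = runif(n),
  g  = sample(c("a", "b", "c"), n, replace = TRUE)
)

base_file <- tempfile(fileext = ".csv")
dt_file   <- tempfile(fileext = ".csv")

# Elapsed (wall-clock) seconds for each approach
timings <- c(
  base_write = system.time(write.csv(cmp_df, base_file, row.names = FALSE))[["elapsed"]],
  dt_write   = system.time(fwrite(cmp_df, dt_file))[["elapsed"]],
  base_read  = system.time(read.csv(base_file))[["elapsed"]],
  dt_read    = system.time(fread(dt_file))[["elapsed"]]
)
print(round(timings, 3))
```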

