Saturday, November 19, 2022

Summarize Multiple Columns using across and everything functions

Summarise Multiple Columns


Introduction

Lets say we have an excel file with multiple columns for various weeks. In various scenarios, we are required to summarize these columns and report mean/sum/mode values.In this blog we will look at how to create such summaries using combination of across and everything functions

package.name<-c("dplyr")

for(i in package.name){

  if(!require(i,character.only = T)){

    install.packages(i)
  }
  library(i,character.only = T)

}


Step 1: Creating a data frame

Lets create a dummy data frame with details of Product Sales across various weeks .

df<-data.frame(Item_Num = c(1:5),
               Product = c("A","A","A","B","B"),
               Week1_Sales = c(1:5),
               Week2_Sales = c(11:15)
               )
head(df)
  Item_Num Product Week1_Sales Week2_Sales
1        1       A           1          11
2        2       A           2          12
3        3       A           3          13
4        4       B           4          14
5        5       B           5          15

We need to created summary at a Product Level.So for each Product, we need mean sales for each of the three weeks.

Now lets try and get the summary using dplyr and a combination of across and everything functions.

Step 2: Summarizing information at Product Level

Lets try and get the total sales for Product A and B for the three weeks.

df2<-df%>%
  select(-Item_Num)%>%
  group_by(Product)%>%
  summarise(across(everything(),list(sum)))%>%
  ungroup()
`summarise()` ungrouping output (override with `.groups` argument)
df2
# A tibble: 2 x 3
  Product Week1_Sales_1 Week2_Sales_1
  <chr>           <int>         <int>
1 A                   6            36
2 B                   9            29


Now lets try and also get the mean of the sales

df2<-df%>%
  select(-Item_Num)%>%
  group_by(Product)%>%
  summarise(across(everything(),list(sum,mean)))%>%
  ungroup()
`summarise()` ungrouping output (override with `.groups` argument)
df2
# A tibble: 2 x 5
  Product Week1_Sales_1 Week1_Sales_2 Week2_Sales_1 Week2_Sales_2
  <chr>           <int>         <dbl>         <int>         <dbl>
1 A                   6           2              36          12  
2 B                   9           4.5            29          14.5


The output summary has _1(represent sum which is the first argument in the list function) and _2(represent mean which is the second argument in the list function) extensions. The column names requires some clean up.


Cleaning up ‘1’ suffixes

nm<-colnames(df2)
nm2<-gsub("_1","_Sum",nm)
nm2
[1] "Product"         "Week1_Sales_Sum" "Week1_Sales_2"   "Week2_Sales_Sum"
[5] "Week2_Sales_2"  


Cleaning up ‘2’ suffixes

nm3<-gsub("_2","_Mean",nm2)
nm3
[1] "Product"          "Week1_Sales_Sum"  "Week1_Sales_Mean" "Week2_Sales_Sum" 
[5] "Week2_Sales_Mean"


Assigning nm3 to the colnames for df2

colnames(df2)<-nm3
head(df2)
# A tibble: 2 x 5
  Product Week1_Sales_Sum Week1_Sales_Mean Week2_Sales_Sum Week2_Sales_Mean
  <chr>             <int>            <dbl>           <int>            <dbl>
1 A                     6              2                36             12  
2 B                     9              4.5              29             14.5


My Youtube Channel

Web Scraping Tutorial 2 - Getting the Avg Rating and Reviews Count

Web Scrapping Tutorial 2: Getting Overall rating and number of reviews ...