Summarise Multiple Columns

Introduction

Lets say we have an excel file with multiple columns for various weeks. In various scenarios, we are required to summarize these columns and report mean/sum/mode values.In this blog we will look at how to create such summaries using combination of across and everything functions

package.name<-c("dplyr")

for(i in package.name){

  if(!require(i,character.only = T)){

    install.packages(i)
  }
  library(i,character.only = T)

}

Step 1: Creating a data frame

Lets create a dummy data frame with details of Product Sales across various weeks .

df<-data.frame(Item_Num = c(1:5),
               Product = c("A","A","A","B","B"),
               Week1_Sales = c(1:5),
               Week2_Sales = c(11:15)
               )
head(df)

  Item_Num Product Week1_Sales Week2_Sales
1        1       A           1          11
2        2       A           2          12
3        3       A           3          13
4        4       B           4          14
5        5       B           5          15

We need to created summary at a Product Level.So for each Product, we need mean sales for each of the three weeks.

Now lets try and get the summary using dplyr and a combination of across and everything functions.

Step 2: Summarizing information at Product Level

Lets try and get the total sales for Product A and B for the three weeks.

df2<-df%>%
  select(-Item_Num)%>%
  group_by(Product)%>%
  summarise(across(everything(),list(sum)))%>%
  ungroup()

`summarise()` ungrouping output (override with `.groups` argument)

df2

# A tibble: 2 x 3
  Product Week1_Sales_1 Week2_Sales_1
  <chr>           <int>         <int>
1 A                   6            36
2 B                   9            29

Now lets try and also get the mean of the sales

df2<-df%>%
  select(-Item_Num)%>%
  group_by(Product)%>%
  summarise(across(everything(),list(sum,mean)))%>%
  ungroup()

`summarise()` ungrouping output (override with `.groups` argument)

df2

# A tibble: 2 x 5
  Product Week1_Sales_1 Week1_Sales_2 Week2_Sales_1 Week2_Sales_2
  <chr>           <int>         <dbl>         <int>         <dbl>
1 A                   6           2              36          12  
2 B                   9           4.5            29          14.5

The output summary has _1(represent sum which is the first argument in the list function) and _2(represent mean which is the second argument in the list function) extensions. The column names requires some clean up.

Cleaning up ‘1’ suffixes

nm<-colnames(df2)
nm2<-gsub("_1","_Sum",nm)
nm2

[1] "Product"         "Week1_Sales_Sum" "Week1_Sales_2"   "Week2_Sales_Sum"
[5] "Week2_Sales_2"

Cleaning up ‘2’ suffixes

nm3<-gsub("_2","_Mean",nm2)
nm3

[1] "Product"          "Week1_Sales_Sum"  "Week1_Sales_Mean" "Week2_Sales_Sum" 
[5] "Week2_Sales_Mean"

Assigning nm3 to the colnames for df2

colnames(df2)<-nm3
head(df2)

# A tibble: 2 x 5
  Product Week1_Sales_Sum Week1_Sales_Mean Week2_Sales_Sum Week2_Sales_Mean
  <chr>             <int>            <dbl>           <int>            <dbl>
1 A                     6              2                36             12  
2 B                     9              4.5              29             14.5

Link to Previous Blogs

https://www.aimlmadeeasy.com

My Youtube Channel

Link

Machine Learning Made Easy

Saturday, November 19, 2022

Summarize Multiple Columns using across and everything functions