Sunday, October 16, 2022

Market Mix Modelling in R

Market Mix Modelling


Introduction

Once a company manufactures its product, it needs a sound channel to sell it, which includes making customers aware of the product. This is done through two broad types of marketing channels:

  • Direct Marketing: Includes promotional channels such as mail, newsletters, ads, websites, etc.
  • Indirect Marketing: Includes social media, referrals, live events, etc.

A company uses both of these strategies to market its products. All of these marketing programs incur costs, so it is important to understand how effective each channel is at increasing sales. Once we know how much each marketing tactic influences sales, the spend can be optimized.

Some commonly used tactics for Direct and Indirect Marketing are shown below.


Step 0: Importing the libraries

In this blog, we will take the sample marketing dataset from the datarium package and use adstock modelling to attribute product sales to different marketing activities.

If you are unable to run the code below, you can import the dataset from my GitHub repository.

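# Packages used in this post ("stats" ships with base R)
package.name <- c("dplyr", "tidyr", "datarium", "stats",
                  "ggplot2", "plotly", "corrplot", "ggcorrplot", "RColorBrewer")

for (i in package.name) {
  # Install the package only if it cannot be loaded
  if (!require(i, character.only = TRUE)) {
    install.packages(i)
  }
  library(i, character.only = TRUE)
}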


Step 1: Importing the dataset

data("marketing")
df<-marketing%>%
  select(sales,everything())
head(df)
  sales youtube facebook newspaper
1 26.52  276.12    45.36     83.04
2 12.48   53.40    47.16     54.12
3 11.16   20.64    55.08     83.16
4 22.20  181.80    49.56     70.20
5 15.48  216.96    12.96     70.08
6  8.64   10.44    58.68     90.00

The attributes are as follows:

  • youtube: Advertising budget in thousand dollars
  • facebook: Advertising budget in thousand dollars
  • newspaper: Advertising budget in thousand dollars
  • sales: Sales figures in thousand dollars

Let's assume that these are the data points for the last 200 weeks for a small market.


Step 2: Correlation Between Variables

Since we are trying to find the impact of the various marketing channels on sales, let's start by looking at the correlation between the variables.

cor_matrix <-round(cor(df),2)
cor_matrix
          sales youtube facebook newspaper
sales      1.00    0.78     0.58      0.23
youtube    0.78    1.00     0.05      0.06
facebook   0.58    0.05     1.00      0.35
newspaper  0.23    0.06     0.35      1.00


ggcorrplot(cor_matrix, hc.order = TRUE, type = "lower",
           lab = TRUE,insig = "blank")

We can see that

  • Sales has a high correlation with youtube spend
  • Sales has a medium correlation with facebook spend
  • Sales has a low correlation with newspaper spend

Since the correlations among facebook, youtube and newspaper spend are low (the largest is 0.35, between facebook and newspaper), multicollinearity is unlikely to be an issue.
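
The pairwise correlations can be turned into variance inflation factors (VIFs) as a quick cross-check. The sketch below rebuilds the predictor correlation matrix from the rounded values printed above, so the result is approximate. The names channels, pred_cor and vif are introduced here for illustration and are not part of the original analysis.

```r
# Correlations among the three spend variables, copied from the rounded
# cor_matrix output above (pred_cor is a name introduced for this sketch)
channels <- c("youtube", "facebook", "newspaper")
pred_cor <- matrix(c(1.00, 0.05, 0.06,
                     0.05, 1.00, 0.35,
                     0.06, 0.35, 1.00),
                   nrow = 3, byrow = TRUE,
                   dimnames = list(channels, channels))

# The VIF of each predictor is the matching diagonal entry of the
# inverse of the predictor correlation matrix
vif <- diag(solve(pred_cor))
round(vif, 2)
```

A VIF close to 1 means that a channel's spend is barely explained by the other two channels.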

Now let's convert the marketing expenses into adstock variables.


Step 3: Defining Adstock Function

Any marketing activity has an impact on its end customer, and this effect decays with time. Some channels, such as TV, have a higher impact that decays gradually, whereas the impact of other channels, such as YouTube ads, decays very rapidly.

Our data has three marketing channels:

  • facebook: ads on facebook are very limited, so we will assume a rapid decline rate
  • youtube: ads on youtube are colorful videos, so we will assume a moderate decline rate
  • newspaper: newspaper ads are front-page displays and hence have a relatively higher retention rate

We will now create the decay rate for each of these three channels


Step 3A: Defining Adstock Rate for facebook ads

In modelling adstock, we will also assume that the effect of ad exposure decays in the form of a moving average. Hence it is important to define how many past periods to include in the moving average term.

Let's say we take a decay rate of 0.1 for fb and the spend for three consecutive periods is as follows:

  • Period 1: 10
  • Period 2: 15
  • Period 3: 20

Then the adstock for Period 3 is calculated as 10 * 0.1^2 + 15 * 0.1 + 20, which equals 21.6.
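
The same arithmetic can be written as a small helper. This is a minimal sketch: adstock_last is a hypothetical name, and it assumes that spend is ordered from the oldest to the most recent period.

```r
# Adstock of the most recent period: each earlier spend is discounted
# by the decay rate once for every period that has passed since
adstock_last <- function(spend, rate) {
  lags <- rev(seq_along(spend) - 1)  # oldest spend gets the largest lag
  sum(spend * rate^lags)
}

period3 <- adstock_last(c(10, 15, 20), rate = 0.1)
period3  # 10 * 0.1^2 + 15 * 0.1 + 20
```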

decay_rate_fb <- 0.1
past_memory <- 2
get_adstock_fb <- rep(decay_rate_fb, past_memory+1) ^ c(0:past_memory)
get_adstock_fb
[1] 1.00 0.10 0.01

In short, the effect decays to 10% in the subsequent period and then becomes 1% in the period after that.


Let's look at the first few records of the facebook column and work out the transformed variable.

df[["facebook"]][1:10]
 [1] 45.36 47.16 55.08 49.56 12.96 58.68 39.36 23.52  2.52  3.12


Let's compute the third term by hand:

45.36 * 0.01 + 47.16 * 0.1 + 55.08, which gives 60.2496

We will check whether the adstock transformation gives the same value.

ads_fb <- stats::filter(c(rep(0, past_memory), df[["facebook"]]),
                        filter = get_adstock_fb,
                        method = "convolution",
                        sides = 1) # use current and past values only
ads_fb <- ads_fb[!is.na(ads_fb)] # Removing the leading NAs
ads_fb[1:5]
[1] 45.3600 51.6960 60.2496 55.5396 18.4668

We have padded the dataset with two zeroes (rep(0, past_memory)) so that we get a valid term for the first facebook expense, which in our data is 45.36. With the two zeroes added, the moving average term is 0.01 * 0 + 0.1 * 0 + 45.36. Without zero padding, the moving average for the first record would be NA, as there would be no past records for the first instance of facebook spend.

Now we can check the third term: it matches our calculation of 60.2496.
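
Since the same steps are repeated for every channel, they can be wrapped in a helper. The function below is only a sketch: the name adstock_transform and its arguments are introduced here, and it sets sides = 1 explicitly so that the filter uses only current and past values. It is checked against the first three facebook spends printed earlier.

```r
# Hypothetical helper: geometric adstock with a finite memory window
adstock_transform <- function(spend, rate, memory = 2) {
  weights <- rate^(0:memory)            # 1, rate, rate^2, ...
  padded  <- c(rep(0, memory), spend)   # zero history before the first period
  out <- stats::filter(padded, filter = weights,
                       method = "convolution", sides = 1)
  # The first `memory` results are NA because they lack history; drop them
  as.numeric(out[!is.na(out)])
}

fb_first3 <- adstock_transform(c(45.36, 47.16, 55.08), rate = 0.1)
fb_first3
```

The three values should agree with the first three entries of ads_fb shown above.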


Let's plot the facebook adstock

fb_df<-data.frame(Week=1:nrow(df),
                  Fb_Spend=df[["facebook"]],
                  Fb_Adstock=ads_fb)


head(fb_df)
  Week Fb_Spend Fb_Adstock
1    1    45.36    45.3600
2    2    47.16    51.6960
3    3    55.08    60.2496
4    4    49.56    55.5396
5    5    12.96    18.4668
6    6    58.68    60.4716


p1 <- ggplot(data = fb_df, aes(x = Week, y = Fb_Spend)) +
  geom_segment(aes(xend = Week, yend = 0), color = "blue") +
  # A constant colour goes outside aes(); inside aes() it is treated as data
  geom_line(aes(y = Fb_Adstock), colour = "red",
            size = 1) +
  xlab("Week") + ylab("Facebook Adstock") +
  theme(text = element_text(size = 15),
        axis.text.x = element_text(size = 15),
        axis.text.y = element_text(size = 15))
  
p1


The blue segments represent the original spend, whereas the red line represents the adstock-transformed spend.

We will repeat the above transformation for youtube and newspaper ads.


Step 3B: Defining Adstock Rate for YouTube ads

We will assume that YouTube ads decay more slowly than facebook ads and set the decay rate to 0.15, so that 15% of the effect carries over to the next period.

decay_rate_yt <- 0.15
past_memory <- 2
get_adstock_yt <- rep(decay_rate_yt, past_memory+1) ^ c(0:past_memory)
get_adstock_yt
[1] 1.0000 0.1500 0.0225
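
To see what the two rates imply, the sketch below traces a single hypothetical burst of 100 units of spend over the current period and the two that follow. The burst size and the names rates, lags and carry_over are illustrative assumptions; only the rates 0.1 and 0.15 come from the text.

```r
rates <- c(facebook = 0.1, youtube = 0.15)
lags <- 0:2  # same memory window as past_memory = 2

# Rows are channels, columns are periods since the spend occurred
carry_over <- 100 * outer(rates, lags, `^`)
colnames(carry_over) <- paste0("lag_", lags)
carry_over

# Total effect attributed to the burst over the memory window
rowSums(carry_over)
```

The larger rate for youtube leaves more of the original burst in the two following periods.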


ads_yt <- stats::filter(c(rep(0, past_memory), df[["youtube"]]),
                        filter = get_adstock_yt,
                        method = "convolution",
                        sides = 1) # use current and past values only
ads_yt <- ads_yt[!is.na(ads_yt)] # Removing the leading NAs
ads_yt[1:5]
[1] 276.1200  94.8180  34.8627 186.0975 244.6944


Let's plot the YouTube adstock

yt_df<-data.frame(Week=1:nrow(df),
                  Yt_Spend=df[["youtube"]],
                  Yt_Adstock=ads_yt)


head(yt_df)
  Week Yt_Spend Yt_Adstock
1    1   276.12   276.1200
2    2    53.40    94.8180
3    3    20.64    34.8627
4    4   181.80   186.0975
5    5   216.96   244.6944
6    6    10.44    47.0745


p2 <- ggplot(data = yt_df, aes(x = Week, y = Yt_Spend)) +
  geom_segment(aes(xend = Week, yend = 0), color = "blue") +
  # A constant colour goes outside aes(); inside aes() it is treated as data
  geom_line(aes(y = Yt_Adstock), colour = "red",
            size = 1) +
  xlab("Week") + ylab("Youtube Adstock") +
  theme(text = element_text(size = 15),
        axis.text.x = element_text(size = 15),
        axis.text.y = element_text(size = 15))
  

p2


The blue segments represent the original spend, whereas the red line represents the adstock-transformed spend.