Monday, November 8, 2021

Introduction to Probability and Statistics-L1

Introduction to Random Variables


Most things in this world have an element of uncertainty in them. For instance, consider the examples below:

  • Will BJP win the 2024 Lok Sabha election?
  • Will Roger Federer win another Grand Slam?
  • Will Covid end by 2022?

We can't say with certainty, because each of these has an element of randomness in it. Unless we attach a number to this randomness, it is difficult to answer these questions. Probability theory helps quantify this randomness mathematically, and a Random Variable is the tool that aids this process. Before we delve further, let's take a look at some basic terms that are used very frequently.

Let's say I am observing the process of tossing a coin. There are certain things I can define that will help me understand the process better:

  • Outcome: One of the mutually exclusive results of the process (for a coin toss, Heads or Tails)
  • Sample Space: Set of all the possible outcomes
    • Example: {H,T}
  • Probability: Proportion of the time an outcome occurs in the long run
  • Event: Set of one or more outcomes
    • Example: All Heads when a coin is tossed twice
    • Sample space in this case: {HH,HT,TH,TT}; the event itself is {HH}
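The coin-toss definitions above can be made concrete with a small sketch. It assumes a fair coin, which the text does not state, and the variable names are purely illustrative.

```python
from fractions import Fraction
from itertools import product

# Sample space for tossing a coin twice: every ordered pair of H and T.
sample_space = ["".join(pair) for pair in product("HT", repeat=2)]

# Event "all heads": the subset of outcomes that contain only H.
all_heads = [outcome for outcome in sample_space if set(outcome) == {"H"}]

# Assumption: the coin is fair, so every outcome is equally likely.
prob_all_heads = Fraction(len(all_heads), len(sample_space))

print(sample_space)    # the four possible outcomes
print(all_heads)       # the outcomes that make up the event
print(prob_all_heads)  # probability of the event under the fair-coin assumption
```

Note that the event is a subset of the sample space, not the sample space itself.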

Defining a Random Variable

In a process (like tossing a coin), there is a set of outcomes, and each outcome has a probability. We use probability because the outcomes are random in nature. A Random Variable (RV) helps capture this randomness with a mathematical description. In short, a Random Variable is a numerical summary of a random outcome. Some examples of random outcomes are:

  • Next toss of a coin
  • India winning the Cricket World Cup
  • Next Gold Medal for India in the Olympics
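To see what "a numerical summary of a random outcome" can look like, here is a minimal sketch. It assumes a fair coin tossed twice and uses a hypothetical random variable, number_of_heads, that maps each outcome to a number. Neither the function name nor the fair-coin assumption comes from the text above.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All outcomes of tossing a coin twice.
outcomes = ["".join(pair) for pair in product("HT", repeat=2)]

def number_of_heads(outcome):
    """Hypothetical random variable: maps an outcome such as 'HT' to a number."""
    return outcome.count("H")

# Count how many outcomes lead to each value of the random variable.
counts = Counter(number_of_heads(o) for o in outcomes)

# Assumption: fair coin, so each outcome has probability 1/len(outcomes).
distribution = {
    value: Fraction(n, len(outcomes)) for value, n in sorted(counts.items())
}

for value, prob in distribution.items():
    print(value, prob)
```

The random variable turns outcomes like HT and TH, which are not numbers, into the single value 1, and the probabilities of those outcomes add up.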


Types of Random Variables

There are two types of Random Variables:

  • Continuous: Takes on a continuum of values. For example, the time taken to commute from your home to office can be 20, 21, 21.5 or 30 mins, and so on
  • Discrete: Takes on only a discrete set of values. For example, the gender of the next person you meet (coded as a number, say 0 or 1), runs scored by Virat Kohli, ICC Test Ranking, etc.

For the next few sections, we will use the Discrete Random Variable as a base and try to understand certain key metrics associated with a random variable. A Discrete RV is easy to visualize, which makes it a good candidate to consolidate our understanding.

Distribution of a Discrete RV

Let's say there is an RV M which denotes the number of times a computer crashes during the online CAT Examination. For simplicity, let's assume that over a duration of 3 hours it can crash anywhere between 0 and 4 times, and that the probabilities of M taking the values 0 to 4 are as shown below:


Outcome_M Probability
0 0.50
1 0.20
2 0.15
3 0.10
4 0.05


The above table gives the probability of your computer crashing M times.

Let's look at the probabilities of some of the events associated with the process:

Pr(M=0) is 0.50
Pr(M=1) is 0.20


Probability that the computer crashes at most 1 time
Pr(M=0 or M=1) is 0.50 + 0.20 = 0.70


Probability that the computer crashes once or twice
Pr(M=1 or M=2) is 0.20 + 0.15 = 0.35


If we add up the probabilities of all the possible values of M, the total is equal to 1
Pr(M=0) + Pr(M=1) + Pr(M=2) + Pr(M=3) + Pr(M=4) = 1
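The calculations above can be reproduced with a short sketch. The probabilities are copied from the table; the names pmf and prob_of are illustrative and not part of the original discussion.

```python
import math

# Probability distribution of M, copied from the table above.
pmf = {0: 0.50, 1: 0.20, 2: 0.15, 3: 0.10, 4: 0.05}

def prob_of(values):
    """Probability that M takes any one of the given values.

    Different values of M are mutually exclusive, so their probabilities add.
    """
    return sum(pmf[v] for v in values)

at_most_once = prob_of([0, 1])   # Pr(M=0 or M=1)
once_or_twice = prob_of([1, 2])  # Pr(M=1 or M=2)
total = prob_of(pmf.keys())      # all possible values of M

print(round(at_most_once, 2), round(once_or_twice, 2))
# A valid distribution sums to 1 (up to floating-point rounding).
print(math.isclose(total, 1.0))
```

The comparison uses math.isclose because adding decimal fractions in floating point can be off by a tiny rounding error.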


Let's plot the values of M on the X axis and the probability of occurrence on the Y axis


Cumulative Probability Distribution of a Discrete RV

It is defined as the probability that the RV is less than or equal to a particular value. Let's look again at the probability distribution table:


Outcome_M Probability
0 0.50
1 0.20
2 0.15
3 0.10
4 0.05


Pr(M <= 0) is 0.50

Pr(M <= 1) is 0.50 + 0.20 = 0.70

Pr(M <= 2) is 0.50 + 0.20 + 0.15 = 0.85

Pr(M <= 3) is 0.50 + 0.20 + 0.15 + 0.10 = 0.95

Pr(M <= 4) is 0.50 + 0.20 + 0.15 + 0.10 + 0.05 = 1.00

If we add the above values to the probability distribution table, it will look something like this:

Outcome_M Probability Cumulative_Probability
0 0.50 0.50
1 0.20 0.70
2 0.15 0.85
3 0.10 0.95
4 0.05 1.00
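The cumulative column is just a running total of the probability column. A minimal sketch, reusing the probabilities from the table (the names pmf and cdf are illustrative):

```python
from itertools import accumulate

# Probability distribution of M, copied from the table above.
pmf = {0: 0.50, 1: 0.20, 2: 0.15, 3: 0.10, 4: 0.05}

values = sorted(pmf)

# Running total of the probabilities, in increasing order of M.
running_totals = accumulate(pmf[v] for v in values)

# Round to 2 decimals to hide floating-point noise in the sums.
cdf = {v: round(total, 2) for v, total in zip(values, running_totals)}

print("Outcome_M Probability Cumulative_Probability")
for v in values:
    print(v, pmf[v], cdf[v])
```

Each entry cdf[m] is Pr(M <= m), so the last entry is always 1 for a valid distribution.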


Plotting the CDF gives a curve (a step function for a discrete RV) that rises as M goes from lower to higher values and eventually saturates at a probability of 1.

Probability Distribution and Cumulative Probability Distribution of a Continuous RV

The same explanation carries over to a continuous RV. The only difference is that for a continuous RV, the probability of any single point is zero, so probability is defined over an interval rather than at a point. The probability that the RV falls between two points is obtained by calculating the area under the probability density curve between the two points.
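As a rough illustration of "probability as area under the curve", the sketch below assumes the commute time is normally distributed with a mean of 25 minutes and a standard deviation of 3 minutes. These numbers are made up for the example and do not come from the text. It approximates the area between 22 and 28 minutes with the trapezoid rule and compares it with the difference of the CDF at the two points.

```python
from statistics import NormalDist

# Assumed model, for illustration only: commute time in minutes,
# normally distributed with mean 25 and standard deviation 3.
commute = NormalDist(mu=25, sigma=3)

def area_under_pdf(dist, a, b, steps=1000):
    """Approximate the area under dist.pdf between a and b (trapezoid rule)."""
    width = (b - a) / steps
    heights = [dist.pdf(a + i * width) for i in range(steps + 1)]
    return width * (sum(heights) - 0.5 * (heights[0] + heights[-1]))

approx = area_under_pdf(commute, 22, 28)     # numerical area under the curve
exact = commute.cdf(28) - commute.cdf(22)    # same probability via the CDF

print(round(approx, 4), round(exact, 4))

# A single point is an interval of zero width, so its probability is zero.
print(area_under_pdf(commute, 25, 25))
```

The two numbers should agree closely, which is the continuous counterpart of adding up table entries in the discrete case.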

