Introduction to Random Variables
Parag Verma
8th November, 2021
Most things in this world have an element of uncertainty in them. For instance, consider the following questions:
- Will BJP win the 2024 Lok Sabha election?
- Will Roger Federer win another Grand Slam?
- Will Covid end by 2022?
We can't answer these with certainty because each of them involves an element of randomness. Unless we attach a number to this randomness, it is difficult to answer such questions. Probability theory helps quantify randomness mathematically, and a Random Variable is a tool that aids this entire process. Before we delve further, let's take a look at some basic terms that are used very frequently.
Let's say I am observing the process of tossing a coin. There are certain things I can define that will help me understand the process better:
- Outcome: One of the mutually exclusive results of tossing a coin (Heads or Tails)
- Sample Space: Set of all the possible outcomes
  - Example: {H, T}
- Probability: Proportion of times an outcome occurs in the long run
- Event: Set of one or more outcomes
  - Example: All Heads when a coin is tossed twice
  - Sample space in this case: {HH, HT, TH, TT}; the event itself is {HH}
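The two-toss example can be sketched in a few lines of Python. This is only an illustration: the names `sample_space` and `all_heads` are made up here and are not from the original post.

```python
from itertools import product

# Sample space for tossing a coin twice: every ordered pair of H and T.
sample_space = ["".join(tosses) for tosses in product("HT", repeat=2)]

# The event "all heads" is the subset of outcomes that contain only H.
all_heads = [outcome for outcome in sample_space if set(outcome) == {"H"}]

print(sample_space)  # ['HH', 'HT', 'TH', 'TT']
print(all_heads)     # ['HH']
```

Note that the event is a subset of the sample space, not the sample space itself.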
Defining a Random Variable
In a process (like tossing a coin), there is a set of outcomes, and each outcome has a probability. We use probability because the outcomes are random in nature. A Random Variable (RV) helps capture this randomness in a mathematical description. In short, a Random Variable is a numerical summary of an outcome that is random in nature. Some examples of random outcomes are:
- Next toss of a coin
- India winning the cricket world cup
- Next Gold Medal for India at the Olympics
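To make "numerical summary of an outcome" concrete, here is a minimal sketch that treats a random variable as a function from outcomes to numbers. The function `num_heads`, the name `distribution`, and the assumption of a fair coin tossed twice (so each outcome has probability 1/4) are illustrative choices, not part of the original post.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Assumption: a fair coin tossed twice, so the four outcomes are equally likely.
sample_space = ["".join(t) for t in product("HT", repeat=2)]
outcome_prob = Fraction(1, len(sample_space))

def num_heads(outcome):
    """Hypothetical random variable: maps an outcome such as 'HT' to a number."""
    return outcome.count("H")

# Probability of each value of the random variable, summed over outcomes.
distribution = Counter()
for outcome in sample_space:
    distribution[num_heads(outcome)] += outcome_prob

for value in sorted(distribution):
    print(value, distribution[value])
```

Two different outcomes (HT and TH) map to the same value 1, which is why the value 1 ends up with twice the probability of 0 or 2.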
Types of Random Variable
There are two types of Random Variable:
- Continuous: Takes on a continuum of values. For example, the time taken to commute from your home to the office can be 20, 21, 21.5 or 30 minutes, and so on
- Discrete: Takes on a discrete set of values. For example, the gender of the next person you meet, runs scored by Virat Kohli, ICC Test Ranking, etc.
For the next few sections, we will use a Discrete Random Variable as the base and try to understand certain key metrics associated with a random variable. A Discrete RV is easy to visualize, which makes it a good candidate to consolidate our understanding.
Distribution of a Discrete RV
Let's say that there is an RV M which denotes the number of times a computer crashes during the online CAT Examination. For simplicity, let's assume that over a duration of 3 hours it can crash anywhere between 0 and 4 times. Let's assume that the probabilities of the events where M takes values from 0 to 4 are as shown below:
Outcome_M | Probability |
---|---|
0 | 0.50 |
1 | 0.20 |
2 | 0.15 |
3 | 0.10 |
4 | 0.05 |
The above table gives the probability of your computer crashing M times.
Let's look at the probabilities of some events associated with this process:
Pr(M=0) is 0.50
Pr(M=1) is 0.20
Probability that the computer crashes at most 1 time:
Pr(M=0 or M=1) is 0.50 + 0.20 = 0.70
Probability that the computer crashes once or twice:
Pr(M=1 or M=2) is 0.20 + 0.15 = 0.35
If we add the probabilities of all the possible values of M, the total equals 1:
Pr(M=0) + Pr(M=1) + Pr(M=2) + Pr(M=3) + Pr(M=4) = 1
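The same arithmetic can be checked with a short Python sketch. The dictionary `pmf` and the helper `prob` are hypothetical names used only for illustration; the probabilities are the ones from the table for M.

```python
import math

# Probability table for M (number of crashes), taken from the table for M.
pmf = {0: 0.50, 1: 0.20, 2: 0.15, 3: 0.10, 4: 0.05}

def prob(event):
    """Probability that M falls in `event`, a collection of values of M."""
    return sum(pmf[m] for m in event)

at_most_one = prob([0, 1])    # Pr(M=0 or M=1)
once_or_twice = prob([1, 2])  # Pr(M=1 or M=2)
total = prob(pmf)             # iterating a dict yields its keys: all values of M

print(round(at_most_one, 2))    # 0.7
print(round(once_or_twice, 2))  # 0.35
# Compare with a tolerance because floating-point sums are not exact.
print(math.isclose(total, 1.0))  # True
```

Adding probabilities like this works because the values of M are mutually exclusive: the computer cannot crash exactly once and exactly twice in the same exam.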
Let's plot the values of M on the X axis and their probabilities on the Y axis.
Cumulative Probability Distribution of a Discrete RV
It is defined as the probability that the RV is less than or equal to a particular value. Let's look at the probability distribution table again:
Outcome_M | Probability |
---|---|
0 | 0.50 |
1 | 0.20 |
2 | 0.15 |
3 | 0.10 |
4 | 0.05 |
Pr(M <= 0) is 0.50
Pr(M <= 1) is 0.50 + 0.20 = 0.70
Pr(M <= 2) is 0.50 + 0.20 + 0.15 = 0.85
Pr(M <= 3) is 0.50 + 0.20 + 0.15 + 0.10 = 0.95
Pr(M <= 4) is 0.50 + 0.20 + 0.15 + 0.10 + 0.05 = 1.00
If we add the above values to the probability distribution table, it looks like this:
Outcome_M | Probability | Cumulative_Probability |
---|---|---|
0 | 0.50 | 0.50 |
1 | 0.20 | 0.70 |
2 | 0.15 | 0.85 |
3 | 0.10 | 0.95 |
4 | 0.05 | 1.00 |
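The cumulative column is just a running sum of the probability column. A minimal Python sketch, assuming the same hypothetical `pmf` dictionary as a representation of the table:

```python
from itertools import accumulate

# Probability table for M, taken from the table for M.
pmf = {0: 0.50, 1: 0.20, 2: 0.15, 3: 0.10, 4: 0.05}

# Running sum of the probabilities, in increasing order of M.
values = sorted(pmf)
running = accumulate(pmf[m] for m in values)
# Round to 2 decimals to hide floating-point noise in the sums.
cdf = {m: round(total, 2) for m, total in zip(values, running)}

for m in values:
    print(f"Pr(M <= {m}) = {cdf[m]:.2f}")
```

The last cumulative value is always 1, since it covers every possible value of M.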
Plotting the CDF gives a step-shaped curve that rises as M increases and eventually saturates at a probability of 1.
Probability Distribution and Cumulative Probability Distribution of a Continuous RV
The same explanation carries over to a continuous RV. The only difference is that for a continuous RV, probability is defined over an interval rather than at a single point (the probability of any one exact value is zero). The probability that the RV falls between two points is the area under the probability density curve between those two points.
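As a rough illustration, suppose the commute time from the earlier example is approximately normal with a mean of 25 minutes and a standard deviation of 5 minutes. These numbers are an assumption made up for this sketch, not data from the post. The area under the curve between two points can then be computed as a difference of cumulative probabilities, using `NormalDist` from Python's standard `statistics` module:

```python
from statistics import NormalDist

# Assumption (not from the post): commute time in minutes is roughly
# normal with mean 25 and standard deviation 5.
commute = NormalDist(mu=25, sigma=5)

# Pr(20 <= T <= 30) is the area under the density between 20 and 30,
# which equals the difference of the cumulative distribution at the endpoints.
p_interval = commute.cdf(30) - commute.cdf(20)

# A single point is an interval of zero width, so its probability is 0.
p_point = commute.cdf(25) - commute.cdf(25)

print(round(p_interval, 4))  # 0.6827
print(p_point)               # 0.0
```

This mirrors the discrete case: there we added table entries, here we take an area, and in both cases the cumulative distribution does the bookkeeping.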
List of Datasets for Practise
https://vincentarelbundock.github.io/Rdatasets/datasets.html