# Introduction to Random Variables

#### Parag Verma

#### 8th November, 2021

Most things in this world have an element of uncertainty in them. For instance, consider the examples below:

- Will the BJP win the 2024 Lok Sabha election?
- Will Roger Federer win another Grand Slam?
- Will Covid end by 2022?

We can't say with certainty, because each of these has an element of randomness in it. Unless we attach some number to this randomness, it is difficult to answer these questions. Probability theory helps to quantify this randomness mathematically. A Random Variable is something that aids this entire process. Before we delve further, let's take a look at some basic terms that are used very frequently.

Let's say I am observing the process of tossing a coin. There are certain things I can define that will help me understand the process better:

**Outcome**: Mutually exclusive results of tossing a coin (Heads and Tails)

**Sample Space**: Set of all the possible outcomes
- Example: {H, T}

**Probability**: Proportion of the time an outcome occurs in the long run

**Event**: Set of one or more outcomes
- Example: all Heads when a coin is tossed twice
- Sample space in this case: {HH, HT, TH, TT}, of which the event "all Heads" is the subset {HH}
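The coin-toss terms above can be made concrete with a small sketch. It enumerates the sample space for one and two tosses and picks out the event "all Heads". The variable names are made up for illustration, and the probability calculation assumes a fair coin, which the text does not state.

```python
from itertools import product

# Sample space for a single toss, and for two tosses of the same coin.
single_toss = ["H", "T"]
two_tosses = ["".join(p) for p in product(single_toss, repeat=2)]

# An event is a subset of the sample space; here, "all Heads".
all_heads = [outcome for outcome in two_tosses if set(outcome) == {"H"}]

# Assumption: a fair coin, so every outcome is equally likely.
prob_all_heads = len(all_heads) / len(two_tosses)

print(two_tosses)      # ['HH', 'HT', 'TH', 'TT']
print(all_heads)       # ['HH']
print(prob_all_heads)  # 0.25
```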

**Defining a Random Variable**

In a process (like tossing a coin), there is a set of outcomes. Each outcome has a probability. We use probability because the outcomes are random in nature. A Random Variable (RV) is something that helps capture this probability using a mathematical description. In short, a Random Variable is a numerical summary of an outcome (which is random in nature). Some examples of random outcomes are:

- Next toss of a coin
- India winning the cricket world cup
- Next Gold Medal for India in the Olympics
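To see what "a numerical summary of an outcome" can look like, here is a minimal sketch that treats "number of Heads in two tosses" as a random variable. The function `number_of_heads` and the fair-coin assumption are illustrative choices, not something taken from the examples above.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Outcomes of tossing a coin twice (fair coin assumed).
outcomes = ["".join(p) for p in product("HT", repeat=2)]

def number_of_heads(outcome):
    """Hypothetical random variable: maps an outcome to a number."""
    return outcome.count("H")

# Each outcome has probability 1/4; count how many outcomes map to each value.
counts = Counter(number_of_heads(o) for o in outcomes)
distribution = {
    value: Fraction(n, len(outcomes)) for value, n in sorted(counts.items())
}

for value, prob in distribution.items():
    print(value, prob)
```

The random variable itself is just the mapping from outcomes to numbers. The probabilities attached to each number follow from the probabilities of the underlying outcomes.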

**Types of Random Variable**

There are two types of Random Variable:

**Continuous**: Takes on a continuum of values. For example, the time taken to commute from your home to the office can be 20, 21, 21.5, 30 minutes and so on.

**Discrete**: Takes on a discrete set of values. For example, the gender of the next person you meet, runs scored by Virat Kohli, ICC Test Ranking, etc.

For the next few sections, we will use the Discrete Random Variable as a base and try to understand certain key metrics associated with a random variable. A Discrete RV is easy to visualize, and hence it is a good candidate to consolidate our understanding.

**Distribution of a Discrete RV**

Let's say that there is an RV **M** which denotes the number of times a computer crashes during the online CAT Examination. For simplicity, let's assume that in a duration of 3 hours, it can crash anywhere between 0 and 4 times. Let's assume that the probabilities of the events where M takes values from 0 to 4 are as shown below:

Outcome_M | Probability |
---|---|
0 | 0.50 |
1 | 0.20 |
2 | 0.15 |
3 | 0.10 |
4 | 0.05 |

The above table gives the probability of your computer crashing M times.

Let's look at the probabilities of some of the events associated with the process

Pr(M=0) is 0.50

Pr(M=1) is 0.20

Probability that the computer crashes at most 1 time

Pr(M=0 or M=1) is 0.50 + 0.20 = 0.70

Probability that the computer crashes once or twice

Pr(M=1 or M=2) is 0.20 + 0.15 = 0.35

If we add the probabilities of all the outcomes, the total is equal to 1

Pr(M=0) + Pr(M=1) + Pr(M=2) + Pr(M=3) + Pr(M=4) = 1
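The same arithmetic can be checked in code. This is a minimal sketch that stores the table as a dictionary. The helper name `prob` is invented for illustration, and `math.isclose` is used for the total because floating-point sums are not always exactly 1.

```python
import math

# Probability distribution of M (number of crashes), copied from the table.
pmf = {0: 0.50, 1: 0.20, 2: 0.15, 3: 0.10, 4: 0.05}

def prob(values):
    """Pr(M takes one of the given values): add the individual probabilities."""
    return sum(pmf[v] for v in values)

at_most_one = prob([0, 1])    # Pr(M=0 or M=1)
once_or_twice = prob([1, 2])  # Pr(M=1 or M=2)
total = prob(pmf)             # all outcomes together

print(round(at_most_one, 2))     # 0.7
print(round(once_or_twice, 2))   # 0.35
print(math.isclose(total, 1.0))  # True
```

Adding probabilities like this works because the outcomes are mutually exclusive: the computer cannot crash exactly once and exactly twice in the same exam.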

Let's plot the events on the X axis and the probability of occurrence on the Y axis
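As a rough stand-in for such a plot, the sketch below draws the distribution as text bars using only the standard library. Note that it is rotated compared to the description: the values of M run down the side and the bar length shows the probability. The function `text_bars` and its `scale` parameter are illustrative, and a plotting library could be used instead.

```python
# Probability distribution of M, copied from the table above.
pmf = {0: 0.50, 1: 0.20, 2: 0.15, 3: 0.10, 4: 0.05}

def text_bars(distribution, scale=40):
    """Return one line per outcome; bar length is proportional to probability."""
    lines = []
    for value, p in distribution.items():
        bar = "#" * round(p * scale)
        lines.append(f"M={value} | {bar} {p:.2f}")
    return lines

for line in text_bars(pmf):
    print(line)
```

The tallest bar sits at M=0 and the bars shrink as M grows, which matches the table: no crash is the most likely outcome and four crashes the least likely.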

**Cumulative Probability Distribution of a Discrete RV**

It is defined as the probability that the RV is less than or equal to a particular value. Let's look at the probability distribution table again:

Outcome_M | Probability |
---|---|
0 | 0.50 |
1 | 0.20 |
2 | 0.15 |
3 | 0.10 |
4 | 0.05 |

Pr(M <= 0) is 0.50

Pr(M <= 1) is 0.50 + 0.20 = 0.70

Pr(M <= 2) is 0.50 + 0.20 + 0.15 = 0.85

Pr(M <= 3) is 0.50 + 0.20 + 0.15 + 0.10 = 0.95

Pr(M <= 4) is 0.50 + 0.20 + 0.15 + 0.10 + 0.05 = 1.00

If we add the above values to the probability distribution table, it looks like this:

Outcome_M | Probability | Cumulative_Probability |
---|---|---|
0 | 0.50 | 0.50 |
1 | 0.20 | 0.70 |
2 | 0.15 | 0.85 |
3 | 0.10 | 0.95 |
4 | 0.05 | 1.00 |
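The cumulative column is simply a running total of the probability column. A minimal sketch using `itertools.accumulate` from the standard library follows. The list names are illustrative, and the rounding to two places is only there to hide floating-point noise.

```python
from itertools import accumulate

outcomes = [0, 1, 2, 3, 4]
probabilities = [0.50, 0.20, 0.15, 0.10, 0.05]

# Running total of the probabilities gives Pr(M <= m) for each m.
cumulative = [round(c, 2) for c in accumulate(probabilities)]

for m, p, c in zip(outcomes, probabilities, cumulative):
    print(f"{m} | {p:.2f} | {c:.2f}")
```

The last cumulative value is always 1, because by then every possible outcome has been included.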

Plotting the CDF gives a step curve that rises from the lower values of M and eventually saturates at a probability of 1.
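The shape of that curve can also be explored by evaluating the CDF at arbitrary points. The sketch below is one possible way to do this with `bisect_right` from the standard library. The function name `cdf` is illustrative, and the values are copied from the cumulative table.

```python
from bisect import bisect_right

# Values of M and their cumulative probabilities, from the table above.
outcomes = [0, 1, 2, 3, 4]
cumulative = [0.50, 0.70, 0.85, 0.95, 1.00]

def cdf(x):
    """Pr(M <= x) for any real x."""
    # Number of possible outcomes that are less than or equal to x.
    k = bisect_right(outcomes, x)
    return 0.0 if k == 0 else cumulative[k - 1]

print(cdf(-1))   # 0.0  (below the smallest outcome)
print(cdf(2))    # 0.85
print(cdf(2.5))  # 0.85 (flat between outcomes)
print(cdf(10))   # 1.0  (saturated)
```

The CDF of a discrete RV jumps at each possible value and stays flat in between, which is why `cdf(2)` and `cdf(2.5)` agree.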

**Probability Distribution and Cumulative Probability Distribution of a Continuous RV**

The same explanation can be carried over to a continuous RV. The only difference is that for a continuous RV, probability is defined over an interval rather than at a single point (the probability of any single exact value is zero). The probability that the RV falls between two points is obtained by calculating the area under the probability density curve between those points.
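The "area under the curve" idea can be sketched numerically. The example below assumes, purely for illustration, that commute time in minutes follows a normal distribution with mean 25 and standard deviation 5. It approximates the area between 20 and 30 minutes with the trapezoidal rule and compares it with the difference of two CDF values. `NormalDist` is part of Python's standard `statistics` module, while the helper `area_under_pdf` is a made-up name.

```python
from statistics import NormalDist

# Assumption for illustration only: commute time (minutes) is normally
# distributed with mean 25 and standard deviation 5.
commute = NormalDist(mu=25, sigma=5)

def area_under_pdf(dist, a, b, steps=10_000):
    """Approximate Pr(a <= X <= b) with the trapezoidal rule."""
    width = (b - a) / steps
    total = 0.5 * (dist.pdf(a) + dist.pdf(b))
    for i in range(1, steps):
        total += dist.pdf(a + i * width)
    return total * width

approx = area_under_pdf(commute, 20, 30)
exact = commute.cdf(30) - commute.cdf(20)

print(round(approx, 4))  # 0.6827
print(round(exact, 4))   # 0.6827
```

Calling `area_under_pdf(commute, 25, 25)` returns 0.0, since an interval of zero width has no area. This is the sense in which a continuous RV has no probability at a single point.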

**List of Datasets for Practice**

https://vincentarelbundock.github.io/Rdatasets/datasets.html