Basics of Decision Tree
Parag Verma
9th April, 2021
Introduction to Decision Trees
Decision Tree is a Supervised Machine Learning algorithm which can be used for classification as well as regression problems.It uses a tree representation of the dataset at hand where each record/case is assigned to the node of a tree.The arrangement represents a tree like structure but with roots at the top.The tree is read from root to the leaf node which contains the classification of the instance
Simple Decision Tree
Lets consider a situation where we want to play tennis and we have a model to help us out here. Now depending on the weather condition, we will be using a decision tree to predict whether we can play or not.A simple illustration is represented below
- Tree classifies Sunday morning according to whether they are suitable for playing tennis
- Classification is made by traversing from root node to leaf node
- Each node represents an attribute
- Each branch descending from the node represents one of the possible values of the attribute
- Sunny, Overcast and Rain are the unique values for 'Outlook'
- If Outlook = Sunny,Humidity = High.This will be classified as a negative example
- The output expression are presented only with respect to yes cases:
- (Outlook = Sunny; Humidity = Normal )
- (Outlook = Overcast)
- (Outlook = Rain ; Wind = Weak)
ID3 Algorithm
Most implementations of decision trees are based on the ID3 algorithm.Salient features of the ID3 are:
- A decision tree is constructed from top to bottom
- Root node is determined by attribute that best splits the data
- A descendant of the root node is created using all possible values of the attribute and the entire process is repeated
- It is called greedy as it never goes back to the previous choices
Which attribute to pick ?
Attribute that is most useful for classifying examples should be used at the node.This is quantitatively determined by Information Gain criteria.To understand Information gain we should first study Entropy that measures purity of an arbitrary collection of examples
Entropy
To understand entropy,the following illustration should be kept in mind
- Given a sample S having some positive and negative examples Entropy is given by
- S = -(p+)(log2p+) - (p-)(log2p-)
- p+ represent proportion of +ve examples in S and p- represent proportion of -ve examples
- Suppose in a sample of 14 cases,there are 9 +ve and 5 –ve instances
- S = -(9/14) log2(9/14) - (5/14) log2 (5/14)
- S = 0.94
- S = 0 when all member belong to the same class
- S = 1 when the proportions are equal
- S is between 0 and 1 when sample contains unequal number of +ve and –ve cases
Information Gain
Measures the expected reduction in Entropy.Entropy measures the impurity in a sample. Higher the impurity, higher is the entropy of the system and conversely lower is the information gain.This is used by the algorithm to determine which attribute to pick for splitting.
As a general rule of thumb,If we partition the sample based on a attribute, then there should be reduction in entropy.The attribute that results in maximum reduction in entropy is used for splitting.
Gain (S,A) = Entropy (S) - ∑ (|Sv|/|S|)* (Entropy Sv) for all values in A(A is an attribute)
Dataset for our Analysis
The example that we discussed above about playing Tennis on a Sunday morning us based on the dataset shown below.There are a total of 9 instances where we can play tennis and there are 5 instances where we cant.