
Decision Trees – Handling Overfitting using Forests

June 11, 2015

In previous posts, we discussed the construction and scoring of Decision Tree models. We also noted that, despite their many advantages, they are prone to overfitting and sampling error. In this post in the series, we will talk about one of the state-of-the-art methods of overcoming some of those limitations.

But first, a small probability puzzle.

Let’s say that you are a police inspector trying to solve a murder mystery. You have a crude lie detector which, as is the case with most lie detectors, is not fool-proof. In fact, it is barely better than random, with an accuracy of 65%. Fig. 1 shows how likely you are to make the right decision if you rely on this tool. You want a better tool, but you cannot afford one.

Figure 1: probability example

Thankfully, a newfangled business-analyst friend of yours suggests that you test the suspect repeatedly, and then make your decision based on the majority of the results. So if the lie detector indicates that the suspect is lying in two or more out of three tests, you can have more confidence in the result. This sounds doable, and you happily run the tests and almost feel you’ve solved the case. But then your analyst friend calls and tells you that he made a mistake! (Can you figure out what?)

He says that his conclusion is only valid if each run of the lie detector is independent of the others, which may not be the case since it is the same piece of equipment. You feel dejected, but then realize that you can easily borrow other lie detectors from nearby police stations. Each is only as good (or bad) as yours, but at least they will give independent assessments. You ask your friend how combining many bad tools can make a good tool, and how good? (What do you think?)

Your friend quickly runs some numbers and presents you with Fig. 2, showing the accuracy rate if you run the test on three independent lie detectors.

Figure 2

While the gain, from 65% to about 72%, looks marginal, you quickly realize the possibilities: every additional independent detector makes the majority vote more reliable.
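The analyst's arithmetic is just a binomial tail. A minimal sketch in Python (the helper name `majority_accuracy` is our own, not from the post):

```python
from math import comb

def majority_accuracy(p, n):
    """Probability that a majority of n independent tests,
    each correct with probability p, gives the right answer.
    n is assumed odd so there are no ties."""
    k_min = n // 2 + 1  # smallest winning majority
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(k_min, n + 1))

print(f"{majority_accuracy(0.65, 3):.4f}")    # 3 detectors: ~0.7183
print(f"{majority_accuracy(0.65, 101):.4f}")  # 101 detectors: nearly certain
```

Three 65%-accurate detectors already push the accuracy to about 72%, matching Fig. 2, and the tail keeps growing as independent votes are added.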

That, coming back to Decision Trees, is in a nutshell a forest of Decision Trees. Such models are called ensemble models (or, when the trees are built on bootstrap samples, bagging models) because they combine many not-so-good models into one very good model. A random forest is a combination of hundreds of Decision Trees, each imperfect in its own way, probably overfitted, probably affected by random sampling, and yet collectively they improve the overall accuracy significantly. While a forest here is made of Decision Trees, you can see that the concept applies to any other type of model too. You may have an ensemble of linear regression models, or neural network models, or what not.

Since we’d prefer that each model in the ensemble be different from, and independent of, the others, we can achieve this by the following two methods:

1. Different training data for each model – In practice, the whole development data is first split into train and test sets, and then multiple random samples are drawn from within the train set. A common way is to sample N observations from a train set of N observations with replacement, which effectively picks up only about 2/3 of the unique observations, some of them more than once. This is called bootstrap sampling, the "bagging" in bootstrap aggregating, and it effectively boosts the weight of the observations that are sampled more than once. If N is sufficiently large, a smaller sub-sample can be drawn without replacement instead, so sampling with replacement is not crucial to ensemble models.

2. Different feature set for each model – Apart from sampling observations, another way to introduce variety among the models is to limit the set of feature variables each model sees: only a small random sample of all variables is made available per model.
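Both sources of variety can be sketched with nothing but the standard library; the feature names below are made up purely for illustration:

```python
import random

random.seed(42)
n = 1000
rows = range(n)

# 1. Bootstrap sample: draw n rows *with replacement* from n rows.
#    Roughly 1 - 1/e, about 63%, of the unique rows get picked.
bootstrap = [random.choice(rows) for _ in range(n)]
unique_share = len(set(bootstrap)) / n
print(f"unique rows seen by this tree: {unique_share:.1%}")

# 2. Feature subset: each tree only sees a random subset of columns.
features = ["age", "income", "tenure", "balance", "region",
            "segment", "channel", "score", "visits"]
k = max(1, int(len(features) ** 0.5))  # square-root rule of thumb
print(f"features for this tree: {sorted(random.sample(features, k))}")
```

Repeating this draw once per tree gives every member of the ensemble its own view of the data, which is exactly where the independence of the "lie detectors" comes from.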

Since we rely on the majority vote of an ensemble of trees to provide robust model output, we don’t have to worry about the overfitting of each individual model. In fact, overfitting is a desired property of the models in an ensemble, the belief being that the combination will take care of generalization. This has been proven mathematically as well (by Leo Breiman), which is beyond the scope of this blog post.

While the more trees in the forest, the better the model will be, we need to allow for practical run-time and scoring-time constraints as well. The following decisions have to be made when building an ensemble of Decision Trees:

1. Number of trees to build – depending on the accuracy desired and on hardware and runtime constraints; 100–1000 is a typical number

2. When to stop growing each individual tree – while overfitting is not a concern, a stopping criterion is still required; a minimum number of observations per leaf is usually a good one

3. Number of observations to sample for each tree – the default is the same as the number of observations in the train data

4. Whether to sample observations for each tree with or without replacement – the usual practice is with replacement

5. Number of features to sample for each tree – a rule of thumb is about the square root of the total number of features in the train data, but there is no theoretical best
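Assuming scikit-learn is available (the post itself names no library), its `RandomForestClassifier` exposes each of the five decisions above as a constructor parameter; a minimal sketch on synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

forest = RandomForestClassifier(
    n_estimators=300,      # 1. number of trees to build
    min_samples_leaf=5,    # 2. stop growing when leaves get this small
    max_samples=None,      # 3. None = as many rows as the train data has
    bootstrap=True,        # 4. sample rows with replacement
    max_features="sqrt",   # 5. square-root rule for features
    random_state=0,
)
forest.fit(X_train, y_train)
print(f"test accuracy: {forest.score(X_test, y_test):.3f}")
```

One design note: scikit-learn applies `max_features` at every split rather than once per tree, a refinement of the same idea, and `max_samples` requires a reasonably recent version of the library.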

Ensemble models are really good at handling many of the disadvantages of Decision Trees, while still retaining many of their advantages. However, nothing in life is free! Ensemble models lose out on one key benefit of Decision Trees: interpretability and ease of implementation. Since the outcome is a vote among hundreds of trees, the clear IF-THEN-ELSE intuition is lost. Determining which variables have high predictive power is also harder, and requires aggregate measures such as variable-importance scores rather than a glance at a single tree.

In the next and last post in this series we will talk about a different kind of ensemble model. We will also briefly touch upon using Decision Trees for regression problems, which we’ve skipped so far.


About the Author

Ashish Gupa has over 8 years’ experience in analytics, machine learning, and business consulting, spanning the banking and retail domains and the Asia and Americas geographies.

