November 28, 2014
Decision trees are a simple but powerful form of multiple-variable analysis.
Decision trees are produced by algorithms that identify various ways of splitting a data set into branch-like segments. These segments form an inverted
decision tree that originates with a root node at the top of the tree. The object of analysis is reflected in this root node as a simple, one-dimensional
display in the decision tree interface. The name of the field of data that is the object of analysis is usually displayed, along with the spread or
distribution of the values that are contained in that field. A sample decision tree is illustrated in Figure 1.1.
Figure 1.1 shows that a decision tree can reflect both continuous and categorical objects of analysis. All
the data set records, fields and field values that are observed in the object of analysis are reflected in the display of the node. The discovery of the
decision rule to form the branches or segments underneath the root node is based on a method that extracts the relationship between the object of analysis
(that serves as the target field in the data) and one or more fields that serve as input fields to create the branches or segments. The values in the input
field are used to estimate the likely value in the target field. The target field is also called an outcome, response, or dependent field or variable.
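To make the idea of discovering a decision rule concrete, here is a minimal sketch of how one splitting rule might be found for a single numeric input field. It assumes Gini impurity as the measure of how well a split separates the target values; the text does not name a criterion, and real tools may use others such as entropy or chi-square tests. The field names and records are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of target values (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split(rows, input_field, target_field):
    """Return (threshold, weighted_impurity) for the best binary split of the
    form `input_field <= threshold`. The threshold is None when no candidate
    split improves on the unsplit node."""
    values = sorted({row[input_field] for row in rows})
    best_threshold = None
    best_score = gini([row[target_field] for row in rows])
    # Candidate thresholds are midpoints between adjacent distinct values.
    for low, high in zip(values, values[1:]):
        threshold = (low + high) / 2
        left = [r[target_field] for r in rows if r[input_field] <= threshold]
        right = [r[target_field] for r in rows if r[input_field] > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
        if score < best_score:
            best_threshold, best_score = threshold, score
    return best_threshold, best_score

# Hypothetical toy data: "age" is the input field, "buys" is the target field.
records = [
    {"age": 22, "buys": "no"},
    {"age": 25, "buys": "no"},
    {"age": 31, "buys": "yes"},
    {"age": 38, "buys": "yes"},
    {"age": 45, "buys": "yes"},
]
threshold, impurity = best_split(records, "age", "buys")
```

A full tree-growing algorithm would repeat this search over every input field and then apply it again inside each resulting branch.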
The general form of this modeling approach is illustrated in Figure 1.1. Once the relationship is extracted, then one or more decision rules can be derived
that describe the relationships between inputs and targets. Rules can be selected and used to display the decision tree, which provides a means to visually
examine and describe the tree-like network of relationships that characterize the input and target values. Decision rules can predict the values of new or
unseen observations that contain values for the inputs, but might not contain values for the targets.
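As an illustration of rules predicting a target for an unseen observation, the sketch below stores a few decision rules as (condition, predicted value) pairs. The fields "age" and "income", the thresholds, and the predicted values are all hypothetical, chosen only to show the mechanism.

```python
# Each rule pairs a condition on the input fields with a predicted target value.
# These rules are hypothetical, not derived from real data.
rules = [
    (lambda obs: obs["age"] <= 28, "no"),
    (lambda obs: obs["age"] > 28 and obs["income"] <= 40000, "no"),
    (lambda obs: obs["age"] > 28 and obs["income"] > 40000, "yes"),
]

def predict(observation, rules):
    """Return the predicted target of the first rule whose condition holds."""
    for condition, predicted in rules:
        if condition(observation):
            return predicted
    return None  # no rule covers this observation

# A new observation has inputs but no target value yet.
new_observation = {"age": 35, "income": 52000}
predicted_target = predict(new_observation, rules)
```

Because the conditions do not overlap and together cover every combination of inputs, each observation matches exactly one rule.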
Each rule assigns a record or observation from the data set to a node in a branch or segment based on the value of one of the fields or columns in the data
set. Fields or columns that are used to create the rule are called inputs. Splitting rules are applied one after another, resulting in a
hierarchy of branches within branches that produces the characteristic inverted decision tree form. The nested hierarchy of branches is called a decision tree, and each segment or branch is called a node. A node with all its descendent segments forms an additional segment or a
branch of that node. The bottom nodes of the decision tree are called leaves (or terminal nodes). For each leaf, the decision rule
provides a unique path for data to enter the class that is defined as the leaf. Sibling nodes, and therefore the bottom leaf nodes, have mutually exclusive
assignment rules; as a result, each record or observation from the parent data set ends up in exactly one leaf. Once the decision rules have been
determined, it is possible to use the rules to predict new node values based on new or unseen data. In predictive modeling, the decision rule yields the predicted value.
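The hierarchy of nodes and leaves described here can be sketched as a small nested structure. In this hypothetical example, internal nodes hold a splitting rule on one field and leaves hold a name; following the rules from the root gives the single path by which a record reaches its leaf. The field names, thresholds, and leaf names are assumptions made for illustration.

```python
# Internal nodes carry a splitting rule; leaves carry only a name.
tree = {
    "field": "age", "threshold": 28,
    "left": {"leaf": "A"},
    "right": {
        "field": "income", "threshold": 40000,
        "left": {"leaf": "B"},
        "right": {"leaf": "C"},
    },
}

def find_leaf(node, record, path=()):
    """Follow splitting rules from the root; return (leaf_name, rules_on_path)."""
    if "leaf" in node:
        return node["leaf"], path
    field, threshold = node["field"], node["threshold"]
    if record[field] <= threshold:
        return find_leaf(node["left"], record, path + (f"{field} <= {threshold}",))
    return find_leaf(node["right"], record, path + (f"{field} > {threshold}",))

sample_records = [
    {"age": 22, "income": 30000},
    {"age": 35, "income": 35000},
    {"age": 35, "income": 52000},
]
# Every record lands in exactly one leaf because each split sends it one way only.
leaf_names = [find_leaf(tree, record)[0] for record in sample_records]
```

The returned path is the decision rule for that leaf, read as a conjunction of the conditions along the way.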
Although decision trees have been in development and use for over 50 years (one of the earliest uses of decision trees was in the study of television
broadcasting by Belson in 1956), many new forms of decision trees are evolving that promise to provide exciting new capabilities in the areas of data
mining and machine learning in the years to come. For example, one newer form of the decision tree involves the creation of random forests. Random
forests are multi-tree committees that use randomly drawn samples of data and inputs to develop multiple trees that, when
combined, provide stronger prediction and better diagnostics on the structure of the decision tree; related committee methods such as boosting rely on reweighting techniques instead. Besides modeling, decision trees can be used to
explore and clarify data for dimensional cubes that can be found in business analytics and business intelligence.
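The committee idea behind random forests can be sketched in a few lines. This is a deliberately simplified illustration, not a faithful random forest: each "tree" is a one-split stump with its threshold set at the mean of a randomly chosen input, whereas real implementations grow deeper trees and search for the best split. The function names, fields, and data are hypothetical.

```python
import random
from collections import Counter

def majority(labels):
    """Most common value in a non-empty list of labels."""
    return Counter(labels).most_common(1)[0][0]

def train_stump(rows, field, target):
    """One-split tree: threshold at the mean of `field`, majority class per side."""
    threshold = sum(r[field] for r in rows) / len(rows)
    left = [r[target] for r in rows if r[field] <= threshold]
    right = [r[target] for r in rows if r[field] > threshold]
    fallback = majority([r[target] for r in rows])
    return {
        "field": field,
        "threshold": threshold,
        "left": majority(left) if left else fallback,
        "right": majority(right) if right else fallback,
    }

def train_forest(rows, fields, target, n_trees, seed=0):
    """Build a committee of stumps from random samples of rows and inputs."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        sample = [rng.choice(rows) for _ in rows]  # drawn with replacement
        field = rng.choice(fields)                 # randomly drawn input
        forest.append(train_stump(sample, field, target))
    return forest

def forest_predict(forest, obs):
    """Combine the committee by majority vote."""
    votes = [
        t["left"] if obs[t["field"]] <= t["threshold"] else t["right"]
        for t in forest
    ]
    return majority(votes)

# Hypothetical toy data.
data = [
    {"age": 22, "income": 30000, "buys": "no"},
    {"age": 25, "income": 36000, "buys": "no"},
    {"age": 31, "income": 48000, "buys": "yes"},
    {"age": 38, "income": 52000, "buys": "yes"},
    {"age": 45, "income": 61000, "buys": "yes"},
]
forest = train_forest(data, ["age", "income"], "buys", n_trees=25)
prediction = forest_predict(forest, {"age": 40, "income": 55000})
```

Each stump sees a different sample and possibly a different input, so individual errors tend to differ, and the vote smooths them out.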