What are decision trees?
In decision analysis, a decision tree can be used to visually and explicitly represent decisions and decision-making. As the name suggests, it uses a tree-like model of decisions. Though it is a commonly used tool in data mining for deriving a strategy to reach a particular goal, it is also widely used in machine learning, which will be the main focus of this article.
It is drawn upside down, with its root at the top. In the classic example of predicting whether a Titanic passenger survived, each condition (internal node) is a point where the tree splits into branches (edges); the end of a branch that does not split any further is the decision (leaf), in this case whether the passenger died or survived.
Although a real dataset will have many more features and this would be just one branch in a much larger tree, the simplicity of the algorithm is hard to ignore. Feature importance is clear and relations can be viewed easily. In general, decision tree algorithms are referred to as CART, or Classification and Regression Trees.
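As a small illustration, here is how such a tree might be fit and inspected with scikit-learn's CART implementation. The "survival" data below is synthetic, invented purely for this sketch; it is not the actual Titanic dataset:

```python
# A minimal sketch of fitting a CART classification tree (synthetic data).
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical features: [age, fare]; labels: 1 = survived, 0 = died
X = [[4, 20.0], [8, 30.0], [35, 7.0], [52, 8.0], [29, 80.0], [61, 10.0]]
y = [1, 1, 0, 0, 1, 0]

tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(X, y)

# Print the learned conditions (internal nodes) and decisions (leaves)
print(export_text(tree, feature_names=["age", "fare"]))
```

The `export_text` output mirrors the drawing described above: one indented line per condition, ending in a class label at each leaf.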
What’s the difference between classification and regression trees?
Primarily, there are two fundamental differences between classification and regression trees. A classification tree splits on a categorical response variable, most commonly two classes such as Yes or No, which can also be coded numerically as 1 or 0. When the target category can take multiple values and multiway splits are needed, the C4.5 algorithm is often leveraged.
For simple binary splits, the CART algorithm is used. This is why a classification tree is applied when the outcome is categorical. Regression trees, by contrast, are used when the response variable is continuous or numeric rather than categorical, for example prices, quantities, or other data involving amounts.
Both regression and classification trees are machine-learning methods for building prediction models from data. The data is split into partitions recursively, and a simple prediction model is fit within each partition. The result can be represented as a graphical decision tree.
The primary difference is that classification trees are built for dependent variables that take unordered (categorical) values, while regression trees handle ordered or continuous ones. Classification decision tree algorithms offer several features, such as pruning, unbiased splits, choice of split type, user-specified priors and costs, variable ranking, missing-value handling, and bagging and ensembles.
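The contrast can be sketched in code: the same CART machinery fits a classifier for categorical targets and a regressor for continuous ones. The toy data below is invented for illustration:

```python
# Sketch: classification vs. regression trees in scikit-learn (toy data).
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

X = [[1], [2], [3], [10], [11], [12]]

# Categorical (unordered) response -> classification tree
y_class = ["low", "low", "low", "high", "high", "high"]
clf = DecisionTreeClassifier(random_state=0).fit(X, y_class)

# Continuous response (e.g. a price) -> regression tree
y_reg = [5.0, 6.0, 5.5, 50.0, 52.0, 51.0]
reg = DecisionTreeRegressor(max_depth=1, random_state=0).fit(X, y_reg)

print(clf.predict([[2.5]]))   # a class label
print(reg.predict([[2.5]]))   # a number (the mean of the matching leaf)
```

The classifier returns a class label at each leaf, while the regressor returns the mean of the training targets that fell into that leaf.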
What are the 3 components of a decision tree?
- The Decision: displayed as a square node with two or more arcs (called “decision branches”) pointing to the options.
- The Event sequence: displayed as a circle node with two or more arcs pointing out the events. Probabilities may be displayed with the circle nodes, which are sometimes called “chance nodes”.
- The Consequences: the costs or utilities associated with the different pathways of the decision tree. The endpoint is called a “terminal” and is usually drawn as a triangle or bar.
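These three components can be evaluated numerically by "rolling back" the tree: average over the probabilities at chance nodes, and take the best option at decision nodes. A minimal sketch with an invented example (all node types, payoffs, and probabilities here are hypothetical):

```python
# Sketch: evaluating a decision-analysis tree by expected value.
# Node kinds mirror the three components above: decisions pick the best
# arc, chance (event) nodes average over probabilities, and terminals
# hold the consequences (payoffs).

def expected_value(node):
    kind = node["type"]
    if kind == "terminal":              # consequence / terminal node
        return node["value"]
    if kind == "chance":                # event sequence / chance node
        return sum(p * expected_value(child)
                   for p, child in node["branches"])
    if kind == "decision":              # decision node: choose best option
        return max(expected_value(child) for child in node["options"])
    raise ValueError(f"unknown node type: {kind}")

# Hypothetical decision: launch a product (uncertain payoff) or do nothing.
tree = {"type": "decision", "options": [
    {"type": "chance", "branches": [
        (0.6, {"type": "terminal", "value": 100.0}),  # success
        (0.4, {"type": "terminal", "value": -50.0}),  # failure
    ]},
    {"type": "terminal", "value": 0.0},               # do nothing
]}

print(expected_value(tree))  # 0.6*100 - 0.4*50 = 40.0
```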
What are the benefits of using decision trees?
1. Easy to read and interpret
One of the advantages of decision trees is that their outputs are easy to read and interpret without statistical knowledge. For example, when decision trees are used to present demographic information on customers, marketing department staff can read and interpret the graphical representation of the data directly.
The data can also generate important insights on the probabilities, costs, and alternatives to various strategies formulated by the marketing department.
2. Easy to prepare
Compared to other decision techniques, decision trees take less effort for data preparation. Users do need information at hand to create new variables with the power to predict the target variable, but they can classify data without performing complex calculations. For complex situations, users can combine decision trees with other methods.
3. Less data cleaning required
Another advantage of decision trees is that less data cleaning is required once the variables have been created. Missing values and outliers have less influence on a decision tree's results.
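One concrete reason trees tolerate unnormalized data is that splits depend only on the ordering of feature values, so monotonic transforms such as rescaling leave the fitted tree's predictions unchanged. A small sketch on toy data:

```python
# Sketch: a decision tree's predictions are unchanged by feature scaling,
# because splits depend only on value ordering (toy data for illustration).
from sklearn.tree import DecisionTreeClassifier

X = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]
y = [0, 0, 0, 1, 1, 1]

raw = DecisionTreeClassifier(random_state=0).fit(X, y)
scaled = DecisionTreeClassifier(random_state=0).fit(
    [[1000 * v for v in row] for row in X], y)

test = [[2.5], [4.5]]
print(raw.predict(test))                                  # e.g. [0 1]
print(scaled.predict([[1000 * v for v in row] for row in test]))
```

The two models predict identically once the inputs are transformed the same way, which is why standardization steps that other models require can often be skipped for trees.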
How are decision trees used?
1. Assessing prospective growth opportunities
One of the applications of decision trees involves evaluating prospective growth opportunities for businesses based on historical data. Historical sales data can be used in decision trees that may lead to radical changes in a business's strategy to aid expansion and growth.
2. Using demographic data to find prospective clients
Another application of decision trees is in the use of demographic data to find prospective clients. They can help streamline a marketing budget and make informed decisions on the target market that the business is focused on. In the absence of decision trees, the business may spend its marketing budget without a specific demographic in mind, which will affect its overall revenues.
3. Serving as a support tool in several fields
Lenders also use decision trees to predict the probability of a customer defaulting on a loan, building a predictive model from the client's past data. The use of a decision tree support tool can help lenders evaluate a customer's creditworthiness to prevent losses.
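A lender's workflow along these lines might look as follows in scikit-learn, where `predict_proba` yields a default probability per applicant. The features (`income_k`, `debt_ratio`) and all the data here are invented for illustration:

```python
# Sketch: estimating default probability with a decision tree (synthetic data).
from sklearn.tree import DecisionTreeClassifier

# Hypothetical features: [income_k, debt_ratio]; label 1 = defaulted
X = [[30, 0.9], [25, 0.8], [80, 0.2], [95, 0.1], [40, 0.7], [70, 0.3]]
y = [1, 1, 0, 0, 1, 0]

model = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# Probability that a new applicant defaults (class 1)
applicant = [[28, 0.85]]
p_default = model.predict_proba(applicant)[0][1]
print(f"estimated default probability: {p_default:.2f}")
```

In practice a lender would train on far more records and features, and would likely prefer an ensemble of trees, but the probability-based evaluation step is the same.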
Decision trees can also be used in operations research in planning logistics and strategic management. They can help in determining appropriate strategies that will help a company achieve its intended goals. Other fields where decision trees can be applied include engineering, education, law, business, healthcare, and finance.