Decision Trees: A powerful and intuitive method for predicting churn (Case Study)
There are many techniques available for the analysis of large-scale data, and churn data of Telco companies can be analysed in various ways. Some methods focus on understanding the key characteristics of customers who will end their contract in the near future. Others concentrate on predicting as precisely as possible which customers are expected to churn. The drawback of the second group of methods is that they are often very complex and hard to follow for people without a data-science background. We believe decision trees fit perfectly in between these two groups. They make it possible to predict churn on large-scale data with many different attributes. Furthermore, the model can be visualized as a tree, which displays very clearly how it is built and how it is used to make predictions.
An example of a decision tree for a sample dataset of a Telco company is displayed below. The target variable of this tree is customer churn. The tree can be read as a kind of flow-chart: you start at the top and answer a series of questions to end up in a certain bucket at the bottom. The tree could be followed manually, so that each customer is placed in one of the buckets. The splits in the tree are questions about the values of the customer's attributes. All customers that end up in the same bucket have certain characteristics in common. In this way, we can assign a probability of churn to each bucket.
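To make the flow-chart idea concrete, here is a toy tree written out as plain Python questions. The attributes, thresholds, bucket names, and churn probabilities are purely illustrative and not taken from the case-study data:

```python
def churn_bucket(customer):
    """Walk a toy decision tree by hand: each question routes the
    customer further down until a leaf 'bucket' is reached.
    All attributes and numbers here are made up for illustration."""
    if customer["contract"] == "month-to-month":
        if customer["tenure_months"] < 12:
            return ("bucket_A", 0.60)   # short tenure, flexible contract: high churn
        return ("bucket_B", 0.30)
    if customer["monthly_charges"] > 80:
        return ("bucket_C", 0.20)
    return ("bucket_D", 0.05)           # long contract, low charges: rarely churns

bucket, p_churn = churn_bucket(
    {"contract": "month-to-month", "tenure_months": 6, "monthly_charges": 70}
)
```

Every customer answers the same questions, so every customer lands in exactly one bucket, and the bucket's churn rate serves as that customer's predicted probability.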
The most complex part of the decision tree is the algorithm used to generate it, although the main idea is quite intuitive. The goal is to make each bucket as pure as possible: in a perfectly pure bucket, either all customers churn or none do. We start with the whole dataset. The algorithm then searches for the attribute and split point that make the two resulting subsets as pure as possible. After selecting the first split, the same process is repeated for each subset one step lower in the tree, and so on recursively.
Once the model is created, the most difficult part of the work is already done. We used data from the past to build the tree; now we can apply everything learned from historical data to new data on other customers. This enables us to run each customer through the tree and predict whether they are likely to end their contract in the near future. Of course, we are not going to follow the tree by hand but let the computers do the work. The model can be applied to an unlimited number of customers, each of whom ends up in one of the buckets.
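Scoring new customers then amounts to a simple loop over the fitted tree. In the sketch below the tree is stored as nested dictionaries; its structure and probabilities are hypothetical stand-ins for whatever the training step produced:

```python
# A fitted tree stored as nested dicts: internal nodes ask one question,
# leaves carry the churn probability learned from historical data.
# All attribute names and numbers here are illustrative.
TREE = {
    "attr": "tenure_months", "threshold": 12,
    "left": {"leaf": 0.55},                     # tenure <= 12
    "right": {"attr": "monthly_charges", "threshold": 80,
              "left": {"leaf": 0.10},           # charges <= 80
              "right": {"leaf": 0.25}},
}

def predict(tree, customer):
    """Follow the questions down to a leaf and return its churn probability."""
    while "leaf" not in tree:
        branch = "left" if customer[tree["attr"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree["leaf"]

new_customers = [
    {"tenure_months": 6, "monthly_charges": 70},
    {"tenure_months": 30, "monthly_charges": 95},
]
scores = [predict(TREE, c) for c in new_customers]   # [0.55, 0.25]
```

Because each prediction is just a short walk down the tree, scoring millions of customers is cheap, and the path taken explains every individual score.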
It is important to keep in mind that we are working with a model. The goal is to predict as correctly as possible which customers are going to end their contract, but we can never achieve 100% accuracy. A crucial factor for predictive accuracy is the number of nodes in the tree. Of course, the tree could be split so finely that there is a separate bucket for practically every customer. This would give nearly 100% accuracy when run over the same data used for building the tree, but for new data it would be the wrong approach. A tree with too few nodes, on the other hand, is not specific enough, resulting in a model that performs poorly on both the old and the new data. The trick is to find a sweet spot between these two extremes.
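The two extremes can be shown numerically on invented toy data: a "memorizer" that gives every training customer their own bucket scores perfectly on the data it was built from, yet falls apart on customers it has never seen, while even a single sensible split generalizes:

```python
# Toy data: short-tenure customers churn (label 1), long-tenure ones stay (0).
# All rows and labels are invented for illustration.
train = [({"tenure": 2}, 1), ({"tenure": 5}, 1), ({"tenure": 20}, 0), ({"tenure": 30}, 0)]
test = [({"tenure": 3}, 1), ({"tenure": 25}, 0)]

memory = {tuple(x.items()): y for x, y in train}

def memorizer(x):
    # One bucket per training customer: perfect on rows it has seen,
    # but it can only guess (here: 0) for anyone new.
    return memory.get(tuple(x.items()), 0)

def one_split(x):
    # A single split of the tree: "tenure <= 12" predicts churn.
    return 1 if x["tenure"] <= 12 else 0

def accuracy(model, data):
    return sum(model(x) == y for x, y in data) / len(data)

print(accuracy(memorizer, train))   # 1.0 — looks perfect on old data
print(accuracy(memorizer, test))    # 0.5 — no better than a coin flip on new data
print(accuracy(one_split, test))    # 1.0 — the simple split generalizes here
```

In practice this trade-off is controlled by limiting the tree's depth or minimum bucket size and checking accuracy on data held out from training.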
We obtained a maximum accuracy of 79% for predicting churn on a test dataset. In other words, the model correctly predicts in almost eight out of ten cases whether a customer is going to end his or her contract soon. This shows Telco companies which customers are likely to end their contract, and they can use these insights to target those customers in order to retain them for a longer period.