Project Description

Predicting Customer Churn in Power BI

Power BI: Identifying key factors determining Churn for a Telco company

Modern Telco companies gather lots of data on their customers. Varying from their gender to their monthly payments to the type of contract they have. One aspect that is especially important to keep track of is the group of customers that ends or will end their contract soon. Cybiant can help Telco companies to obtain valuable insights on how to identify customers that are likely to end their contract and to find which factors are indicating such behaviour. It is really important to understand which factors contribute to churn.  Therefore we built a showcase model to make clear what we can do. To transform, perform calculations and visualize we use Power BI in combination with the R programming language. Power BI is an excellent way to visualize data and it easier to understand. We integrate R into Power BI to have more flexibility to provide measures and visualizations that fit better to the client and problem at hand.

To build this example we used a sample dataset for a telco company. The data consists of 7032 unique customers and 20 different attributes with data for each customer. The variable of interest is called churn, which can be either ‘yes’ or ‘no’, and says whether a customer has ended his/her contract in the last month or not. One can divide the attributes into three categories: services that each customer signed up for, customer account information and demographic information about each customer. Examples of information in the dataset, are the monthly payments of each customer, whether they have internet, phone and/or tv contracts, whether they have a partner or not and many more. Most of the data is of categorical nature. To clean and prepare the data for analysis, the data is first loaded into the Power Query editor. This window, which is included in the Power BI software enables us to transform and edit parts of the data to make it more suitable for calculations and visualizations. An example of a transformation in this showcase splitting of the numeric variables, such as the monthly payments per customer, in different segments.

The next step is to define some interesting measures which can be used later in visualizations on the dashboard. In this case, the most important measure is the so-called “churn-rate”. This is the percentage of the people that ended their contract in the last month.  This allows us to see the general churn rate and to segment the churn-rate on the attributes included in the dataset.

Power BI has a broad range of different visualizations to obtain new insights into your data. Because our main objective is understanding which customers will be likely to end their contract and which characteristics are typical that customers. We create bar charts which show the percentage of people that churn in the last month of the different categories.

Calculating Customer Churn in PowerBI

Figure 1: Power BI Dashboard with most important Telco Churn information

In figure 1, we show some simple Power BI visualizations that can already give very interesting insights alone and combined. The percentage of all customers that ended their contract is shown in the top-left corner. The bar chart at the bottom-left provides information on the churn rate of different customers who have different types of internet services or no internet service at all. Remarkably, the churn rate of Fiber-Optic customers more than 40% is more than twice as high as for the customers who have DSL. The churn rate for customers who do not buy internet services at the company is even below 10%. By common-sense one could interfere that it is more difficult to end your contract and look for a new telco provider when a customer has no access to the internet. This could be a cause for the low churn-rate of this group of customers. Although one has to keep in mind customers could buy internet services at other companies. The two pictures next to the bar chart help to place the numbers in perspective. The circle in the middle shows the total monthly charges per segment, which is the income the company obtains monthly from each group. The customers who have Fiber Optic bring together by far the most revenue. Since they have also by far the highest churn rate, it can be concluded that this is a problem the company has to look into. The number of customers who has Fiber optic is only 44% of the total number of customers as can be observed in the pie chart in the bottom left. This indicates that on average Fiber Optic costumers pays more per month than others.

The middle-top bar chart shows that customers who have a more flexible contract have a higher average percentage of churn. The sum of monthly charges is added as a line to the graph, which shows that the group with the most flexible contract also contributes the most to the total monthly income. So for the contract type, the segment which has the highest churn rate does also account for the largest part of monthly revenue. The company needs to be aware of this problem. The chart at the top-right informs us that the churn rate is much higher for people who are a client of the company for a longer period. These kinds of insights can already be really valuable for a company.

Churn-Variables-Power-BI

Figure 2: Calculating Variable Importance for Customer Churn

The next step is to determine which attributes are the most important in predicting which customers are going to end their contract. Nowadays businesses have access to such large amounts of data and so many different attributes of information, that it is difficult to determine where to start. Figure 2 shows the importance of each attribute for predicting whether a customer will churn or not. The type of contract is the most important attribute as shown in the figure above. This figure provides a direction to start looking for more information. It makes clear that the contract attribute has the highest predictive value for churn.  Not too much time should be spent on analysing the differences between genders since that attribute has no significant predictive value for churn. Figure 2 is just an example and can be easily extended to datasets with many more attributes.

This sounds all promising, but where do the numbers come from and what do they mean? The measure used in this graph is called Information Gain (IG). The IG of an attribute is higher when it becomes more straightforward to predict the variable of interest (churn) when the data is segmented based on that attribute. Let’s take a step away from this data to make the concept more clear. In an ideal situation, we would segment on an attribute such that all customers in one group would churn and all costumers in the other group would not(note that there can be more than two groups). A group which is homogeneous concerning the target variable (churn) is called a pure group.  Real-life Big Data is rarely this clean. Therefore, we want to segment our data, such that the groups will be as pure as possible. The measure we use for purity is called “entropy”. The formula is:

     \[$entropy = -p_1 \cdot log(p_1) - p_2 \cdot log(p_2) - ... = \sum_{i=1}^n p_i \cdot log(p_i)$ \]

Pi is the proportion of property i of the target variable in the group or set. n is the number of different properties of the target variable.  A perfectly pure set has an entropy of zero. A set that is perfectly divided over the n values, has an entropy of 1. A population which has 2 different properties in the attribute of interest is displayed in figure 3. It is a simple example which uses a small dataset. The entropy of the entire population is 0,997. Then parent set (the entire population) is split into two child sets, segmented on a balance higher or lower than 50k. The entropy of the <50k set is 0,402, and >50k set has entropy 0,795. So segmenting on this attribute helped to make the segments purer than the whole populations, hence there should be information gained. Now we want to quantify how much is gained by segmenting on different attributes. This is where the information gain (IG) comes into play. This is a function of the entropy of both the parents and the child sets. The formula is as follows:

     \begin{align*} IG  &= entropy(parent) - [p(c_1) \cdot entropy(c_1) + p(c_2) \cdot entropy(c_2) + ... ] \\ &= entropy(parent) - \sum_{i=1}^n p(c_i) \cdot entropy(c_i) \\ \end{align*}

Ci refers to child set I. The entropy of each child set is weighted by the proportion of all instances which belongs to that child. The number of child sets is n. The information gain of the example in figure 3 is:

     \[0.977 - [\frac{13}{30} \cdot 0.403 + \frac{17}{30} \cdot 0.795] = 0.352\]

Basic Concept of Entropy

Figure 3: Basic Concept of Entropy

The IG of segmentations of different attributes can be used to decide which are the most important to consider. Figure 2 can be used to make decisions on which attributes to analyse further, or where to start for building more complex segmentations of which split into different attributes. Furthermore, it will be important for companies to look in which categories of a particular attribute point at certain values of the target variable. The correction for proportion in this information gain places the entropy gain from segments in perspective.

Turning back to our Telco dataset, it will mean that a very small group of a certain attribute which contains almost no people that churn, is not making the information gain unreasonably high. The proportionality is calculated based on the number of cases in the data, which corresponds to the number of customers for the Telco dataset. We could change the information gain formula into another that uses a custom-made measure of proportionality that is more appropriate for a certain business. Think of the proportion of revenue that a certain group accounts for.

We use R to calculate the information gain of each attribute and to create a chart, such as figure 2. This can easily be extended to even larger and more complicated datasets. The data is loaded from Power BI into R. R provides a good environment to perform many kinds of calculations on the data and plot those calculations in pictures as the one above. Then we load the R script into Power BI, such that all information and visualizations can be easily viewed in one report. Note all data used is from Power BI, R is only used to perform additional operations on the data, which go beyond the functionalities of Power BI.

Back to the Telco data. We found that the type of contract that people have is the most important attribute for predicting churn. So this would be the first attribute to review more carefully. The bar chart in figure 1 shows us that people who have a more flexible month-to-month contract, are more likely to churn. Which makes perfect sense, since their contract allows them to churn in a shorter period of time. We also found that the group of people with a flexible contract is responsible for the largest part of income. The Telco company could use these insights in various ways. For instance try to move people from month-to-month to yearly contracts or to keep more in touch with clients in that group, to make sure that they will be less likely to churn. These kinds of insights can also be found for the other attributes which are at the top of figure 2.

Correlation between attributes

Figure 4 – Correlation between attributes

Thanks to the information gain values we know more about the coherence of the attributes with the churn variable. To see how the attributes relate to each other, we have created a plot of “correlation” between the attributes, which is shown in figure 4. The plot below shows how much the attributes are related to each other. The values differ from 1 (perfect correlation) to 0 (no correlation at all). It is not a correlation as one would normally see with numerical variables, but a different measure for categorical variables, which is called Cramer’s V. We are not going to go in-depth on the calculation of the variables. It is good to know that each number is based on a contingency table of the two attributes. Such a table shows all combinations of the categories within the attributes, and shows the number of customers which belongs in those combinations.

But what can we do with this information? In the attribute importance chart can be observed that OnlineSecurity and TechSupport are both important attributes. In the chart above we can see that those are highly correlated. Therefore, it is plausible that one causes another or the two go almost always together. Then they can be viewed as a one combined phenomenon. Besides the value for predicting churn, this chart can be used to develop a better understanding of the behaviour of customers. It can be easily observed which attributes are interrelated.

We can use this customer dataset to study at the variable churn in two ways. The first is to develop a better understanding of the characteristics of customers who end their contract. As second, this data could be used to predict and identify the customers who are going to end their contract in the near future. The focus of this article is on the first use. We used a combination of Power BI and R. Power BI provides powerful functionalities to transform and visualize data, in order to make the data insightful. Furthermore, the software could be used to gather and connect data from different sources and display the visualizations in one report, which is easy to understand. R offers us the flexibility to perform operations on the data and make charts that are more custom-made and go beyond the capabilities of Power BI. It is manifest that this can provide insights in which attributes are the most important, which classes of those attributes are performing good or bad and how the attributes within the data are interrelated. The figures in this article are just examples of what we can do with this software on data of a company in the Telco-industry and could easily be extended to other target variables or other businesses.

Download Case Study

References: