Retention Rates: A Brief Case Study in User Segmentation

Codeflix Retention Rates Q1 2017

Introduction

Summary

Codeflix offers a subscription program to its customers. To understand the success of the subscription model we will be examining active subscriptions and subscriptions churn rate for Q1 2017. We will also be examining possible churn rate differences in two previously identified user segments, segments 87 and 30.

Conclusions

The churn rate for users in segment 87 is unacceptably high and should be our first priority. While the churn rates for users in segment 30 is still higher than we might like, addressing issues with segment 87 will result in the greatest impact.

We recommend Codeflix conducts user interviews since this is more likely to provide clear insight into how we can or if we can improve our subscriptions product for users in segment 87. Since promotions and technical issues do not account for this churn, we should focus our research on usability and product features.

Analysis

A. Subscriptions Churn

Our findings show subscription numbers have increased 119.16% from the start of January to the start of March. To get a sense of how many active users we are losing we will start by finding the overall churn rates by month. Churn rate is calculated by dividing the total number of subscriptions canceled during the month and dividing the total active subscriptions at the start of the month.

Table 1 shows the churn rate for subscriptions is much higher than desired. Codeflix aims for a churn rate of approximately 7%. The typically churn benchmark is 5%, however, Codeflix is in its early growth stage and does not invest much in marketing efforts. On the other hand, high churn costs Codeflix a significant amount of revenue, which is particularly true due to Codeflix's small user base.

We also see that the churn rate in March is significantly higher than in January and February. The reason for this difference is unclear, no promotions were run over this time period and there were no major technical issues either. However, updates are constantly being made to the UX via design and feature additions. Further inquiry into these changes could yield reasons for the observed disparity.

Table 1.

image

image

B. Churn by Segment

In order to understand the churn rate differences between types of users, we divide users into segments based on similarities (demographic, behavioral or otherwise). Previous analysis of Codeflix subscribers has found two main user segments which we have labeled 87 and 30.

Table 2 shows the segmented churn rates. From this table, we can clearly see that the majority of total churn is coming from segment 87. The churn rate for segment 87 is significantly higher than total churn and ranges between 25% and 49%.

Churn rates for segment 30 are closer to acceptable levels and range between 7% and 12%. Further research to determine possible reason or reasons for the extreme differences in churn is necessary and user activity research, as well as user interviews, should be considered.

image

image

Table 2.

image

Apendix

A. SQL query

In [ ]:
/* table of first day and last day of first 3 months in 2017
*/  

WITH months AS
    (SELECT 
    '2017-01-01' as first_day,
    '2017-01-31' as last_day
  UNION 
  SELECT 
    '2017-02-01' as first_day,
    '2017-02-31' as last_day
  UNION 
  SELECT 
    '2017-03-01' as first_day,
    '2017-03-31' as last_day
  FROM subscriptions s),
     
cross_join AS 
  (SELECT *
  FROM subscriptions
  CROSS JOIN  months),

/* boolean is_active and is_canceled separated by segment
*/  

status AS (
  SELECT id,
    first_day AS month,
  
  --is_active_87
  CASE
    WHEN (subscription_start < first_day)
    AND (
     subscription_end > first_day OR
     subscription_end IS NULL
       )
    AND (segment = 87) THEN 1
    ELSE 0
  END AS is_active_87,
  
  --is_active_30
  CASE
    WHEN (subscription_start < first_day)
    AND (
        subscription_end > first_day OR
        subscription_end IS NULL
   )
    AND (segment = 30) THEN 1
    ELSE 0
  END AS is_active_30,
  
  --is_canceled_87
  CASE
    WHEN (subscription_end BETWEEN first_day AND last_day )
    AND (segment = 87) THEN 1
    ELSE 0
  END AS is_canceled_87,
  

  --is_canceled_30
  CASE
    WHEN (subscription_end BETWEEN first_day AND last_day )
    AND (segment = 30) THEN 1
    ELSE 0
  END AS is_canceled_30
FROM cross_join
),

/* sums for status by month for each segment
*/  
status_aggregate AS (
    SELECT month, 
    SUM(is_active_87) AS sum_active_87,
    SUM(is_active_30) AS sum_active_30, 
    SUM(is_canceled_87 ) AS sum_canceled_87,
    SUM(is_canceled_30) AS sum_canceled_30
  FROM status
  GROUP BY month
)

/* calculate churn rates for each segment each month
*/  
SELECT month,
  CAST (sum_canceled_87 AS FLOAT) / sum_active_87 AS churn_rate_87,
  CAST (sum_canceled_30 AS FLOAT) / sum_active_30 AS churn_rate_30
FROM status_aggregate
GROUP BY month;

B. Visualizations

In [ ]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
table_df = pd.read_csv('churn_rates.csv', header = 1, 
                       usecols = ['month', 'churn_rate_87', 'churn_rate_30']).dropna()
table_df['month'] = ['Jan', 'Feb', 'March']
In [ ]:
table_df.to_html(index=False)