How to Build a Cohort Analysis in Google Sheets
Building a cohort analysis can be a helpful way of understanding user behavior over time, typically to look at retention in groups of users who sign up for a service within a given timeframe like a month or a week. Learning how to build a cohort analysis in Google Sheets will help you understand your customer retention, and gain a better understanding of how to use any spreadsheet software to analyze user data.
Some modern analytics tools can display information about cohorts, but other times you might find yourself needing to export data from a system to build a cohort analysis yourself. This post will teach you how to build a cohort analysis in Google Sheets using your own data, and what you learn here can easily be applied to Excel or any other spreadsheet program.
We will be referencing and showing screenshots from an example cohort analysis that we’ve set up, which you can view here. This can be a helpful resource when building your own too, since you can directly reference (or copy and paste!) all of the functions that are used.
Prepare the Data
Export or otherwise collect the data that you want to use to create your cohort analysis. Typically this is information about the people who use your product or service over a period of time, with information on either their engagement how long they’ve been a customer. In this example, we will look at the signup date and cancel date for each customer to look at the percentage of customers in each cohort who still have active accounts.
Create one worksheet for your raw data, and copy in all of the customer data into this sheet. Create two new columns after the registration date. In the first one, add the formula MONTH() an add the cell of the registration date into the MONTH formula. In the second, do the same thing except using the YEAR() formula.
This allows us to individually check for the month and year of customers in a particular cohort. Now do the same thing with cancel date.
Finally, we’ll want to add another column at the end titled Months as Customer. This formula will compare the cancel date and the registration date for any customer who has canceled, and it will ignore those who are still active customers. The function starts with an “if” statement that checks if the cancel date column is not equal to empty (if there is a cancel date), and then if there is a cancel date it compares the cancel and registration dates. The way it compares the dates is simple subtraction, which gets you the number of days between the two. We then divide it by 30 to get the number of months and truncate so that it doesn’t produce any decimals in the number of months.
The second part of the if statement is left blank so that the function just produces an empty cell if the customer does not have a cancel date. This if statement is helpful because it allows us to ignore the customers that haven’t canceled yet so that our formula isn’t trying to compare signup date to a blank cancel date value.
Creating the Cohort Analysis
Now that our data is in place and in the format that we need, it’s time to build our cohort analysis. We’ll add a new sheet so that we can keep the analysis and our data separate, now we’ll have worksheet tabs at the bottom named Data (with the data we just input) and Cohort Analysis (blank for now). In our example, we’re looking at monthly cohorts so we start by inputting each cohort month into its own row in our new sheet:
Next, we want to add a column titled Cohort Size in column B. We’re going to use this column to see how many people are in a particular cohort. The logic behind this formula can be summarized as:
Count a cell in the data worksheet if the month and year match the cohort month and year in cell A of the same row.
The formula for the first cohort (in our example it’s in row 3 of our sheet) is:
=countifs(Data!C:C, Month(A3), Data!D:D, Year(A3))
The countifs function will count the number of cells in the Data worksheet in which the signup month (Data worksheet column C) and the signup year (Data worksheet column D) match the month and year for that cohort (Cohort Analysis worksheet column A, in this case, cell A3).
The formula is built so that we can easily apply it to all of the other cohorts as well. Simply click on the small blue square in the bottom corner of the cell when you have B3 selected and drag it down to apply the formula to all the rest of your cohort rows:
Having this column with the size of our cohort is helpful for a couple of reasons. First, it is good for us to keep cohort size in mind, in case of substantial differences in sizes among cohorts. An example of this would be a seasonal product with a few months out of the year that have significantly larger cohorts, this is something that the cohort size column helps us to keep in mind. We will also use it in the formula we’re about to build for our cohort analysis.
Now we’re ready to build the main part of the analysis. The idea is that we want to find the percentage of each cohort that meets certain criteria (in this case, not having canceled) in a certain timeframe, which in our case is the number of months that they’ve been a customer. The logic of this formula can be summarized as:
Calculate the percentage of customers in this cohort who do not have a cancel date before this many months as a customer.
The formula is:
=($B3-(countifs(Data!$C:$C, Month($A3), Data!$D:$D, YEAR($A3), Data!$H:$H, "<1")))/$B3
The core of this formula is the countifs function, which counts the number of customers in the given cohort (notice it’s the same logic for checking month and year as our earlier formula) who’s Months as Customer value is lower than the number of months they’ve been a customer (column H in the Data worksheet). At the very beginning of our function, we subtract this from the total number of people in the cohort (cell B3) and this gives us the number of customers who are still active in that time period. Then at the end of the function, we divide that by the total size of the cohort (cell B3 again), and this leaves us with the percentage who are still active customers.
With the way this formula is built, we can copy what we have in cell C3 down to the rest of our cohorts first months in column C:
From here it’s pretty easy to apply this formula to the other columns for different months as a customer (columns D-H in this example). The first step is copying cell C4 (May 2019, 1 month as a customer) over to cell D4. All we have to change is the “<1” to “<2” since we now want to find those who have been customers at least two months in that cohort.
Now we can copy this formula down to the rest of column D:
We repeat this same process to copy the formula to columns E-H:
At this point, the bulk of our cohort analysis is built. If you stopped here and didn’t do any of the rest of the tutorial, you would still know how to build a cohort analysis in Google Sheets, but there are some simple additions we can make to We can see the percentage of each cohort that has remained a customer n-months after signing up.
In order to gain a better understanding of our data, we’re going to add some conditional formatting to apply some colors to our analysis that highlight patterns. Select all of the cells of the analysis (C3-H8 in our example) and then click on Format at the top:
Next click on the right where it says “Cell is not empty” to edit the conditional formatting.
From here, select “Color scale” at the top:
Under Format rules, change the color for “Minpoint” to be red, and the color for Maxpoint to be Green. Also, set the midpoint to use Percentile and set it to yellow:
You can customize these colors to your liking, the main idea is that they give you a way to visualize the areas where customers drop off and make it easier to spot patterns. In our example, we see that retention starts to drop off around month three, and then has significantly dropped off by month six, with some cohorts performing better than others:
Let’s go one step further by also looking at the average and median values for each month as a customer. This is another good way to visualize the important patterns in our analysis and it’s quick and simple to add.
Under our analysis, let’s add text that says Average and Median in rows 10 and 11 for Column A. In C10 we add a formula for calculating the average of all values in our analysis in Column C:
We’ll need a similar formula to calculate the median for Column C in cell C11:
From here we can simply copy over these two formulas to find the average and median for each column:
This is valuable because it helps us understand the larger pattern that is occurring across cohorts. We can also now make a graph that shows the average retention over a customer’s lifetime. To do this, select the cells for your averages and then click the graph icon at the top:
To edit the chart, click on the icon with three dots in the top right of the chart and then select Edit chart:
We can add the values from row 2 in as labels for the x-axis for our graph by entering in C2:H2 into the data range field before C10:H10 so that our full value under Date range is C2:H2, C10:H10 (see how the x-axis now has labels for the number of months as a customer):
Using this graph, we can now see the pattern in our retention cohort data very clearly. Using the analysis we can also look deeper at trends that happen in different cohorts. This is very helpful in spotting the impact of different initiatives that you undertake to improve retention because you can see the effects on different cohorts as they are implemented.
It’s simple to add more data into the Data worksheet and view our updated analysis since it is built to look at all rows of that Data worksheet. We can easily add new cohorts to the analysis as time passes as well, and maintain an up-to-date understanding of our retention trends. Now that you know how to build a cohort analysis in Google Sheets, you can easily do the same in Excel or any other spreadsheet program, and gain important insights into trends that are crucial to your business.