using Turing
using Gadfly
using DataFrames, DataFramesMeta
Gadfly.set_default_plot_size(900px, 300px)
ENV["COLUMNS"] = 120;

Warning: Still a draft (note the tragic lack of delightful visuals).

Customer Lifetime Value (LTV or CLTV) is the total dollar value a consumer will spend at a business throughout their life. The concept is as important as the definition is straightforward - businesses very often want to know which consumers are their whales and which are eating up their marketing or infrastructure budgets with little or no value returned. This is pretty tricky and there are a few approaches you can take:

Observational

Naive calculation. The following will give you an average that is delightfully simple but tragically wrong:

$$\mathrm{LTV} = \frac{1}{|\mathrm{Customers}|}\sum_{\mathrm{orders}} \mathrm{Order\ Value}$$

Assuming (hmm) that LTV is constant over time, this will converge to the true average LTV as customers churn (and thus reach their final lifetime value). In the meantime, new customers will continue to weigh the average down and make it an underestimate. There are several equations of this sort floating around the tubes.
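A minimal sketch of that calculation, assuming a hypothetical orders DataFrame with one row per order (a customer id and a usd order value):

using DataFrames

# hypothetical order-level data: one row per order
orders = DataFrame(customer = [1, 1, 2, 3, 3, 3],
                   usd      = [11.77, 12.0, 77.0, 20.76, 20.76, 9.99])

# naive LTV: total order value divided by the number of distinct customers
naive_ltv = sum(orders.usd) / length(unique(orders.customer))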

Wait and see. Similar to the above, except applied only to a small cohort from a brief window in time. Just follow along with that group and add up how much they spend. This is simple and gets to the true LTV of that cohort faster, but it's still typically too slow to be useful. By the time you know, it's months/quarters/years later (depending on the churn / repurchasing characteristics of your product) and most insights you might glean are no longer relevant to your product roadmap.
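And a rough sketch of the wait-and-see version, again on a hypothetical orders frame that also carries the purchase date (the cohort here is everyone whose first purchase landed in January):

using DataFrames, Dates

orders = DataFrame(customer = [1, 1, 2, 3, 3],
                   date     = Date.(["1997-01-05", "1997-03-02", "1997-01-20",
                                     "1997-02-11", "1997-06-30"]),
                   usd      = [11.77, 12.0, 77.0, 20.76, 9.99])

# cohort = customers whose first purchase was in January
firsts = combine(groupby(orders, :customer), :date => minimum => :first_dt)
cohort = Set(firsts.customer[month.(firsts.first_dt) .== 1])

# then just follow them and add up everything they ever spend
cohort_orders = filter(:customer => in(cohort), orders)
cohort_ltv = sum(cohort_orders.usd) / length(cohort)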

Modeled

Machine Learning :tada:. There are a bunch of ML approaches that can be found relatively easily online (but apparently not easily enough for me to find them again to include here). IIRC, one was using a random forest (or GBM, or whatever) to predict

$$P(\mathrm{purchase\ in\ next\ period}|\mathcal{D})$$

and then in a second stage model (conditioned on the purchase outcome) predict the order value of said purchase.
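Putting those two stages together, next-period value decomposes the way you'd expect:

$$E[\mathrm{value\ in\ next\ period}\,|\,\mathcal{D}] = P(\mathrm{purchase\ in\ next\ period}\,|\,\mathcal{D}) \cdot E[\mathrm{Order\ Value}\,|\,\mathrm{purchase},\mathcal{D}]$$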

It's a reasonably standard approach: decompose the problem into churn, expected future purchases, and expected value per purchase. There are a bunch of approaches tailored to this decomposition that break the inputs down into the so-called RFM metrics:

  • Recency: time since the last purchase,
  • Frequency: number of purchases per time period,
  • Monetary value: average order value.

Note that we'll use days for the time scale.

Buy 'til You Die. This type of model was popularized by Schmittlein, Morrison, and Colombo in 1987 but was apparently not very easy to implement. A simpler version was later created by Fader, Hardie, and Lee, and Fader at least went on to build a company, Zodiac, that expanded quite a bit on these sorts of models.

Custom Model. That's what we're going to do! Fader and Hardie do a great job of making their work look harder than necessary so I can't be bothered to decode it (and anyway, Alex did a great job). That said, I'm going to take what seems to be a similar approach, one that takes advantage of some more modern techniques (Julia and Turing.jl!):

  1. Estimate churn based on Recency and Frequency.
  2. Set up a super simple survival model to understand the expected number of future purchases, using samples from (1) as the churn signal.
  3. Scale by Monetary value.

By building these submodels out independently we can understand the whole model by figuring it out component-by-component. It also provides a quick way to make single-component adjustments that might be important. There will be many such adjustments; this model has some real obvious deficiencies even though it captures the right ideas.

For instance, some retailers have an extremely wide spread of possible order values (e.g. Walmart, where you can buy a stick of gum or probably a boat or something). If there are orders-of-magnitude differences in purchase value then you'd better model that out so you know exactly which consumers are likely to find themselves in that lucrative long tail. In my experience, lognormal is a decent start but the tail is still too light.
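As a sketch of that starting point (hypothetical order values; fit_mle is from Distributions.jl):

using Distributions

# hypothetical order values with a long right tail
order_values = [3.0, 5.5, 12.0, 23.0, 40.0, 75.0, 120.0, 950.0]

dist = fit_mle(LogNormal, order_values)   # fits μ and σ on the log scale
mean(dist), quantile(dist, 0.99)          # eyeball the bulk vs the tail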

Our Models

Active from RF

We sample when we expect the customer's next purchase to occur based on what we've observed of their frequency, then we compare that to how long it's been since they purchased. If we expected them to have purchased already but they haven't then we count them as churned.

$$ \begin{aligned} \mathrm{next\ purchase} &\sim \mathrm{Exponential}(F) \\ \mathrm{active} &= R < \mathrm{next\ purchase} \end{aligned} $$

Note that we don't have any kind of regularization and just assume F is a fine number for us. Exercise for the reader to make that more stable :smile:. And also, what should we do about customers with only 1 purchase? :scream:
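One parameterization note before the code: the Exponential above is written with rate $F$ (so the mean wait is $1/F$ days), but Distributions.jl's Exponential(θ) takes the scale, i.e. the mean, which is why the models below pass 1.0 / frequency. A tiny sanity check:

using Distributions

F = 0.2                  # purchases per day
d = Exponential(1 / F)   # Distributions.jl takes the scale, i.e. the mean
mean(d)                  # 5.0 expected days until the next purchase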

Future Purchases from RF+Active

We'd like to then take the inferences above and use them to understand churn as a function of time, or perhaps number of orders. In other words:

$$P(\mathrm{churned}_{t=i} | \mathrm{active}_{t=i-1})$$

Here we find some wrinkles. Most notably: what do we do with consumers who purchased recently, where we don't yet know whether they'll churn before the next purchase? This is called censoring, which comes in several directional varieties; this particular one is right-censoring (on the "right" side of our time interval, we don't yet have data on the outcome). We'll ignore that for now, and instead assume "constant hazard" on the data we can observe, i.e. the per-period churn probability ($\rho$) is constant across all time points.

$$ \begin{aligned} \rho &\sim \mathrm{Beta}(1,1)\\ \mathrm{purchases}_{\mathrm{uncensored}} &\sim \mathrm{Geometric}(\rho)\\ \mathrm{Future\ purchases} &\sim \begin{cases} \mathrm{Geometric}(\rho) & \mathrm{if\ active} \\ \mathrm{Dirac}(0) & \mathrm{otherwise} \end{cases} \end{aligned} $$
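For reference, Distributions.jl parameterizes Geometric(ρ) as the number of failures before the first success, so with $\rho$ as the per-period churn probability an active customer's expected number of future purchases is $(1-\rho)/\rho$:

using Distributions

ρ = 0.3              # per-period churn probability
mean(Geometric(ρ))   # (1 - ρ) / ρ ≈ 2.33 expected future purchases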

LTV from M+Future Purchases

$$ \begin{aligned} \mathrm{Future\ value} &= \mathrm{Future\ purchases} * \mathrm{AOV}\\ \mathrm{Lifetime\ value} &= \mathrm{Future\ value} + \mathrm{Past\ value} \end{aligned} $$

Setting up some Daaaataaa

A little toy dataset to see if the Active model makes any kind of sense.

struct CustomerData
    days_since_last_purchase::Int64
    days_since_first_purchase::Int64
    total_purchases::Int64
    monetary_value::Float64
end

struct RFM
    raw::CustomerData
    recency::Int64
    frequency::Float64
    monetary_value::Float64
    total_purchases::Int64
end

function rfm(c::CustomerData)
    active_days = c.days_since_first_purchase - c.days_since_last_purchase
    # frequency ends up NaN for single-purchase customers and Inf when every
    # purchase shares a date; the CDNow model below special-cases both
    period = active_days / (c.total_purchases - 1)  # wcgw?
    RFM(
        c,
        c.days_since_last_purchase,
        1 / period,
        c.monetary_value,
        c.total_purchases)
end;
rfm_data = [rfm(c) for c in [
    CustomerData(2,   60,  5,   3),   
    CustomerData(10, 305, 10,  23),
    CustomerData(53, 100, 40, 123),    # definitely churned!
    CustomerData(2,   29,  3, 123),
    CustomerData(10, 200,  5,  23),
    CustomerData(23, 222, 20,   3),    # probably churned..
]];

Active Model

@model function active(custs::Array{RFM})
    predicted_purchase_days = Vector(undef, length(custs))
    active = Vector{Bool}(undef, length(custs))

    for i in 1:length(custs)
        predicted_purchase_days[i] ~ Exponential(1.0 / custs[i].frequency)  # scale = mean = 1/F days
        active[i] = predicted_purchase_days[i] > custs[i].recency
    end
    
    return active
end;

iterations = 1000
ϵ = 0.05
τ = 10;

chain_ltv = sample(
    active(rfm_data), 
    HMC(ϵ, τ), iterations, 
    progress=false, drop_warmup=true);

Let's see how the model's output matches up with our expectations:

active_samples = DataFrame(hcat(generated_quantities(active(rfm_data), chain_ltv)...)')
combine(active_samples, :x1 => mean, :x2 => mean, :x3 => mean, :x4 => mean, :x5 => mean, :x6 => mean)

1 rows × 6 columns

    x1_mean  x2_mean  x3_mean  x4_mean  x5_mean  x6_mean
    Float64  Float64  Float64  Float64  Float64  Float64
 1  0.891    0.675    0.0      0.943    0.779    0.111

Here, x1_mean is the probability that the first customer is still active. These numbers look pretty reasonable to me, even though we didn't account for any uncertainty around F (or, like, what to do with customers that only purchased 1 time... alas).

Notice that I used generated_quantities here. This is possible because we have return active in the model block. Turing's handling of generated quantities is... just OK; it's sort of awkward to work with. :grimacing:

And with the CDNow dataset...

The raw data can be found here and represents a cohort of users that made their first purchase at CDNow in Q1 of 1997.

using CSV
cdnow = CSV.read("/Users/brad/data/cleaned_cdnow.csv", DataFrame) # oh no now you know where my filez
first(cdnow, 5)

5 rows × 4 columns

    customer  date        count  usd
    Int64     Date        Int64  Float64
 1  1         1997-01-01  1      11.77
 2  2         1997-01-12  1      12.0
 3  2         1997-01-12  5      77.0
 4  3         1997-01-02  2      20.76
 5  3         1997-03-30  2      20.76

We also need to get it a little closer to RFM format, which gives us the following table:

using Dates

cutoff_date = Date("1997-04-01")

cdnow_gdf = @linq cdnow |>
    where(:date .< cutoff_date) |>
    groupby(:customer) 

pre_rfm = combine(cdnow_gdf, 
    nrow  => :total_purchases, 
    :date => minimum => :first_purchase_dt,
    :date => maximum => :latest_purchase_dt,
    :usd  => sum     => :monetary_value)

function days_val(days)
    return days.value
end

rfm_df = @linq pre_rfm |> 
    transform(
        days_since_first_purchase = days_val.(cutoff_date - :first_purchase_dt),
        days_since_last_purchase  = days_val.(cutoff_date - :latest_purchase_dt)
    )

first(rfm_df, 5)

5 rows × 7 columns

    customer  total_purchases  first_purchase_dt  latest_purchase_dt  monetary_value  days_since_first_purchase  days_since_last_purchase
    Int64     Int64            Date               Date                Float64         Int64                      Int64
 1  1         1                1997-01-01         1997-01-01          11.77           90                         90
 2  2         2                1997-01-12         1997-01-12          89.0            79                         79
 3  3         2                1997-01-02         1997-03-30          41.52           89                         2
 4  4         2                1997-01-01         1997-01-18          59.06           90                         73
 5  5         3                1997-01-01         1997-02-04          82.2            90                         56

Which we can blast into RFM format. The model requires some tiny adjustments because in this dataset we have:

  • Customers with only one purchase,
  • Customers with only one purchase date but multiple purchases.

rfm_cdn = [
    rfm(
        CustomerData(
            row.days_since_last_purchase,
            row.days_since_first_purchase,
            row.total_purchases,
            row.monetary_value)
    ) for row in eachrow(rfm_df)];
@model function active_cdn(custs::Array{RFM})
    predicted_purchase_days = Vector(undef, length(custs))
    active = Vector{Bool}(undef, length(custs))

    for i in 1:length(custs)
        if isnan(custs[i].frequency) || isinf(custs[i].frequency)
            predicted_purchase_days[i] ~ Exponential(15.0) # median period for multiple purchasers 
        else
            predicted_purchase_days[i] ~ Exponential(1.0 / custs[i].frequency) 
        end
        active[i] = predicted_purchase_days[i] > custs[i].recency
    end
    
    return active
end;
chain_ltv_cdn = sample(
    active_cdn(rfm_cdn[1:100]), 
    HMC(ϵ, τ), iterations, 
    progress=true, drop_warmup=true);
Sampling: 100%|█████████████████████████████████████████| Time: 0:00:24

So yeah, it's on the slow side... well, not slow given how much cool stuff is happening. There are something like 20k customers in this dataset and the first 100 alone took ~30s.

Future Purchases and LTV Model

@model function future_purchases_cdn(custs::Array{RFM}, total_purchases::Array{Float64})

    # Active submodel
    predicted_purchase_days = Vector(undef, length(custs))
    active = Vector{Bool}(undef, length(custs))

    for i in 1:length(custs)
        if isnan(custs[i].frequency) || isinf(custs[i].frequency)
            predicted_purchase_days[i] ~ Exponential(15.0) # median period for multiple purchasers 
        else
            predicted_purchase_days[i] ~ Exponential(1.0 / custs[i].frequency) 
        end
        active[i] = predicted_purchase_days[i] > custs[i].recency
    end
    
    # Future purchases submodel
    churn_rate ~ Beta(1,1)
    future_purchases = Vector(undef, length(custs))
    for i in 1:length(custs)
        if !active[i]
            total_purchases[i] ~ Exponential(churn_rate)  # observed purchase counts from churned customers inform churn_rate
            future_purchases[i] ~ Exponential(1e-3)  
            # Turing fails without this ^ because I lazily didn't declare a prior
            # and if not active the right answer is 0 so... 
            # we'll clean it up in post ;)
        else
            future_purchases[i] ~ Exponential(churn_rate)
        end
    end
    
    # LTV "model"
    ltv = future_purchases .* [c.monetary_value for c in custs]
    return active, ltv
end;
total_purchases = [float(r.total_purchases) for r in rfm_cdn]

chain_future_purchases = sample(
    future_purchases_cdn(rfm_cdn[1:100], total_purchases[1:100]), 
    HMC(ϵ, τ), iterations, 
    progress=true, drop_warmup=true);
Sampling: 100%|█████████████████████████████████████████| Time: 0:02:17

So there's the model! Unfortunately, wrangling generated_quantities can be annoying so making nice plots will have to wait (more to come!).
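That said, here's a minimal sketch (using the chain and model instance from above) of pulling posterior LTV draws out of the (active, ltv) tuples that generated_quantities returns, zeroing out the inactive customers as promised:

gq = vec(generated_quantities(
    future_purchases_cdn(rfm_cdn[1:100], total_purchases[1:100]),
    chain_future_purchases))

# each element is an (active, ltv) tuple for one posterior sample;
# zero out the "clean it up in post" customers before summarizing
ltv_draws = hcat([ltv .* act for (act, ltv) in gq]...)'   # samples × customers
mean_future_value = vec(mean(ltv_draws; dims=1))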