using Turing
using Bijectors
using DataFrames, DataFramesMeta
using StatsPlots
using Statistics  # mean and std, used when standardizing the data below


Changepoint models are those in which an underlying process exhibits a discrete change in its behavior partway through your data. In other words, you have a specific point at which the behavior before and after differ in some material way. As an example, consider the following data taken from this post on the Julia discourse:

τ_true, μ₁, μ₂, σ_true = 500, 45, 30, 4
c = vcat(rand(Normal(μ₁, σ_true), τ_true),
         rand(Normal(μ₂, σ_true), 1000 - τ_true));



Creating a model for this can be tricky, not least because MCMC, and especially HMC, requires differentiable (i.e. continuous!) parameters, and the changepoint is discrete in the simplest formulation:

\begin{aligned} \tau &\sim \mathrm{DiscreteUniform}(1, \mathrm{length}(\mathcal{D}))\\ \mu_1, \mu_2 &\sim \mathrm{Normal}(\mu_0, \sigma_0)\\ d_i &\sim \begin{cases} \mathrm{Normal}(\mu_1, \sigma) & i \leq \tau \\ \mathrm{Normal}(\mu_2, \sigma) & i > \tau \\ \end{cases} \end{aligned}
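For concreteness, here's roughly what that discrete model would look like in Turing (a sketch, reusing the priors from the smooth model further down; since $\tau$ has no gradient, HMC and NUTS can't sample it, and you'd need something like MH or particle Gibbs, or to marginalize $\tau$ out by hand):

```julia
# Sketch only: τ is discrete, so this model is off-limits to gradient-based samplers.
@model function changepoint_discrete(data)
    τ ~ DiscreteUniform(1, length(data))
    μ_1 ~ Normal(0, 2)
    μ_2 ~ Normal(0, 2)
    σ ~ truncated(Normal(1, 2), 0, 20)
    for i in 1:length(data)
        # up to the changepoint draw from μ_1, after it from μ_2
        data[i] ~ Normal(i <= τ ? μ_1 : μ_2, σ)
    end
end;
```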

From the title of the post, you might have inferred that the changepoint doesn't actually need to be discrete. Here's the core idea, laid out on Gelman's blog:

The other thing I think you can do is a basis expansion in a basis that includes functions like -C/(1+exp(-(x-k)/a)) (ie. a shifted rescaled sigmoid), put priors on the location for the shift and the degree of the shift (C) and the quickness of the shift (a)… There’s something to be said for approximating things as step functions, and there’s also something to be said for approximating things as infinitely smooth approximations to step functions :-) Getting outside the rigidity of your model and thinking about approximations to your model can be helpful.

To create a smooth approximation, we look to a scaled, shifted version of the sigmoid function:

\begin{aligned} \mathrm{sigmoid}(x) &= \frac{e^x}{1+e^x} \\ l(a, b, x) &= a(x-b)\\ \mathrm{sigm}(a, b, x) &= \mathrm{sigmoid}\circ l(a,b,x) \end{aligned}

I think of $a$ as the "specificity" of the step change, and call it "spec" below. Perhaps naming isn't my strong suit. In any case...

logit = bijector(Beta())   # bijection:  (0, 1)  →  ℝ
inv_logit = inv(logit)     # bijection:       ℝ  →  (0, 1)

function sigm(μ, σ, x)
    return inv_logit(σ * (x - μ))
end;
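As a quick sanity check, here's the same function written with the closed-form sigmoid (so it stands alone without Bijectors): at $x = \mu$ it's exactly $1/2$, and $\sigma$ controls how fast it saturates on either side.

```julia
# Closed-form equivalent of sigm for checking: sigmoid(σ * (x - μ)).
sigmoid_cf(x) = 1 / (1 + exp(-x))
sigm_check(μ, σ, x) = sigmoid_cf(σ * (x - μ))

sigm_check(10, 0.5, 10)   # exactly 0.5 at the shift point
sigm_check(10, 0.5, -10)  # ≈ 4.5e-5, effectively 0
sigm_check(10, 0.5, 30)   # ≈ 0.99995, effectively 1
```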

x = -10:0.1:20
df = DataFrame(y = sigm.(10, 0.5, x), x = x)
@df df plot(:x, :y, legend = false)


See? It's a smooth step function! That means we'll get gradients galore! We use that to interpolate between the two likelihoods we want to step between:

\begin{aligned} \tau &\sim \mathrm{Uniform}(1, \mathrm{length}(\mathcal{D}))\\ \mu_1, \mu_2 &\sim \mathrm{Normal}(\mu_0, \sigma_0)\\ \sigma &\sim \mathrm{HalfNormal}(0, 10)\\ \zeta_i &= \mathrm{sigm}(a, \tau, i)\\ d_i &\sim \mathrm{Normal}((1 - \zeta_i) \mu_1 + \zeta_i \mu_2, \sigma) \end{aligned}

In Turing:

@model function changepoint(data)
    # priors
    spec = 0.01
    μ_1 ~ Normal(0, 2)
    μ_2 ~ Normal(0, 2)
    τ ~ Uniform(1, 1000)
    σ ~ truncated(Normal(1, 2), 0, 20)

    # likelihood
    for i in 1:length(data)
        switch = sigm(τ, spec, i)
        z = (1 - switch) * μ_1 + switch * μ_2
        data[i] ~ Normal(z, σ)
    end
end;


Now we're ready to roll - I had to fuss with both $\epsilon$ and spec before landing on standardizing the data as the most important first step, so you may find better hyperparameter values.

std_c = (c .- mean(c)) ./ std(c);

# Settings of the Hamiltonian Monte Carlo (HMC) sampler.
iterations = 500
ϵ = 0.005
n_leapfrog = 10;  # leapfrog steps (renamed to avoid clashing with the model's τ)

cp_chain2 = sample(
    changepoint(std_c),
    HMC(ϵ, n_leapfrog), iterations,
    progress=true, drop_warmup=false);

Sampling: 100%|█████████████████████████████████████████| Time: 0:00:10


StatsPlots.plot(cp_chain2, size=(3500,2200), titlefont = (34), tickfont=(18), guide="")


Voilà!
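To read the fit back on the original scale, one way (a sketch, assuming the `cp_chain2` and `c` objects above are in scope) is to take posterior means from the chain and undo the standardization; the changepoint $\tau$ is already an index, while the location and scale parameters need shifting and scaling back.

```julia
# Posterior means on the standardized scale:
τ̂  = mean(cp_chain2[:τ])    # estimated changepoint index
μ̂₁ = mean(cp_chain2[:μ_1])
μ̂₂ = mean(cp_chain2[:μ_2])
σ̂  = mean(cp_chain2[:σ])

# Undo the standardization: locations shift and scale, σ only scales.
μ̂₁ * std(c) + mean(c), μ̂₂ * std(c) + mean(c), σ̂ * std(c), τ̂
```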