Context trees

Variable Length Markov Chain (VLMC) models provide parsimonious high-order Markov chains which can have a finite but long memory without suffering from the computational and estimation problems associated with dense high-order Markov chains. This is achieved using the notion of context.

Contexts

Definition

We consider a time series \(x=(x_i)_{i\geq 1}\) with values in a finite set \(S\), its state space. A context \(c\) is a finite sequence of elements of \(S\), \(c=(c_1, \ldots, c_k)\), observed in \(x\). The sequence \(c\) is observed in \(x\) if there is \(t\) such that \[ x_{t-1}=c_1, x_{t-2}=c_2, \ldots, x_{t-k}=c_k. \] Notice that \(c\) is written in reverse order, in the sense that \(c_1\) is the most recent value in \(x\) while \(c_k\) is the oldest one. Thus the sub-sequence observed in \(x\) is \((c_k, c_{k-1}, \ldots, c_1)\). We follow the convention of Bühlmann and Wyner.

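For instance, if \(S=\{0, 1\}\) and \(x=(0, 0, 0, 1, 1, 1)\), then \((0, 0)\) is observed in \(x\) for \(t=3\) and \(t=4\), while \((1, 0)\) is observed only for \(t=5\), where it corresponds to the sub-sequence \((0, 1)\).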
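The definition can be checked mechanically in base R. The sketch below uses a hypothetical helper, context_positions (not part of mixvlmc), that returns every \(t\) at which a context written in reverse order is observed. It assumes that \(t\) ranges over \(k+1, \ldots, n\), where \(n\) is the length of the series, so that \(x_t\) itself exists.

```r
# Hypothetical helper (not a mixvlmc function): returns all t such that
# x[t-1] == ctx[1], x[t-2] == ctx[2], ..., x[t-k] == ctx[k].
# Assumption: t is restricted to (k+1):n so that x[t] exists.
context_positions <- function(x, ctx) {
  k <- length(ctx)
  n <- length(x)
  if (k >= n) {
    return(integer(0))
  }
  candidates <- (k + 1):n
  # x[t - seq_len(k)] reads the series backwards from position t-1
  keep <- vapply(
    candidates,
    function(t) all(x[t - seq_len(k)] == ctx),
    logical(1)
  )
  candidates[keep]
}

x <- c(0, 0, 0, 1, 1, 1)
context_positions(x, c(0, 0)) # 3 4
context_positions(x, c(1, 0)) # 5
context_positions(x, c(0, 1)) # integer(0): (1, 0) never occurs in x
```

The last call returns no position because the context \((0, 1)\) would require the sub-sequence \((1, 0)\), which does not occur in \(x\).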

Context trees

The contexts of a time series can be represented by a tree. The root of the tree stands for the empty context. The children of the root represent the contexts of length 1. In general, if a node represents the context \(c=(c_1, \ldots, c_k)\), then contexts of the form \(c'=(c_1, \ldots, c_k, c_{k+1})\) are represented by the children of the node.
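As an illustration of this parent/child structure, the following sketch stores contexts in a nested list, where each level of nesting appends one older value to the context. The helper insert_context is hypothetical and only meant to show the data structure; it is not how mixvlmc represents its trees internally.

```r
# Hypothetical helper: inserts a context (most recent value first) into a
# tree stored as nested named lists. The root is an empty list and each
# child is named after the value it appends to its parent's context.
insert_context <- function(node, ctx) {
  if (length(ctx) == 0) {
    return(node)
  }
  key <- as.character(ctx[1])
  child <- node[[key]]
  if (is.null(child)) {
    child <- list()
  }
  node[[key]] <- insert_context(child, ctx[-1])
  node
}

tree <- list()
for (ctx in list(c(0), c(1), c(0, 0))) {
  tree <- insert_context(tree, ctx)
}
str(tree)
```

Here the root has two children, named "0" and "1", and the node "0" has a single child "0" which stands for the context \((0, 0)\).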

Let us consider again \(x=(0, 0, 0, 1, 1, 1)\) and all contexts that appear at least twice in \(x\) (i.e. which are observed for at least two different values of \(t\)). An ASCII art representation of the corresponding context tree is:

*
+-- 0
|   '-- 0
'-- 1

The tree represents two contexts of size 1, \((0)\) and \((1)\), which are the direct children of the root (shown as a star *). In addition, it represents one context of size 2, \((0, 0)\). Notice that, for instance, \((0, 1)\) is not a context in the tree, as the node of context \((0)\) has only one child, labelled \(0\).
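For such a short series, the content of the tree can be verified by brute force. The sketch below counts every context of length up to 3 and keeps those observed at least twice. The helper frequent_contexts is hypothetical, its parameter names merely echo those of ctx_tree, and it assumes that \(t\) ranges over \(k+1, \ldots, n\). It is a naive enumeration and not the algorithm used by mixvlmc.

```r
# Hypothetical brute-force helper: counts all contexts of length 1 to
# max_depth and keeps those observed at least min_size times.
frequent_contexts <- function(x, min_size = 2, max_depth = 3) {
  n <- length(x)
  found <- c()
  for (k in seq_len(min(max_depth, n - 1))) {
    # one key per position t, written in reverse order (most recent first)
    keys <- vapply(
      (k + 1):n,
      function(t) paste(x[t - seq_len(k)], collapse = ", "),
      character(1)
    )
    counts <- table(keys)
    keep <- as.vector(counts) >= min_size
    found <- c(found, setNames(as.vector(counts)[keep], names(counts)[keep]))
  }
  found
}

x <- c(0, 0, 0, 1, 1, 1)
frequent_contexts(x)
# contexts "0" (3 occurrences), "1" (2 occurrences) and "0, 0" (2 occurrences)
```

On this example, the three contexts that remain are exactly the three nodes of the tree drawn above.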

Extracting contexts from a time series in mixvlmc

Mixvlmc can be used to compute all the contexts of a time series using the ctx_tree function/constructor as follows:

x <- c(0, 0, 0, 1, 1, 1)
library(mixvlmc)
x_ctx <- ctx_tree(x)
x_ctx
#> Context tree on 0, 1 
#>  Number of contexts: 3 
#>  Maximum context length: 2

The result of ctx_tree() is a ctx_tree object. It can be drawn using ASCII art:

draw(x_ctx)
#> *
#> +-- 0
#> |   '-- 0
#> '-- 1

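The default extraction is done with min_size=2 and max_depth=10, which means that a context is kept only if it appears at least twice in the time series and that contexts have length at most 10.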

Notice that the number of potential contexts grows exponentially with the length of the time series and it is therefore advisable to keep max_depth to a reasonable value. Let us consider a simple example.

set.seed(0)
y <- sample(c("a", "b", "c"), 100, replace = TRUE)
y_ctx_def <- ctx_tree(y)
y_ctx_def
#> Context tree on a, b, c 
#>  Number of contexts: 77 
#>  Maximum context length: 6

With the default parameters, we already end up with 77 contexts. Setting min_size=1 gives an unreasonable number of contexts:

y_ctx_min_1 <- ctx_tree(y, min_size = 1)
y_ctx_min_1
#> Context tree on a, b, c 
#>  Number of contexts: 602 
#>  Maximum context length: 10

If we raise the depth limit, the number of contexts keeps increasing:

y_ctx_min_1_d_15 <- ctx_tree(y, min_size = 1, max_depth = 15)
y_ctx_min_1_d_15
#> Context tree on a, b, c 
#>  Number of contexts: 1037 
#>  Maximum context length: 15
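To put these numbers in perspective, the following base R sketch compares, for each depth \(k\), the number of potential contexts (\(|S|^k\)) with a simple upper bound on the number of contexts that can actually be observed in a series of length \(n\). The bound \(n-k\) assumes that \(t\) ranges over \(k+1, \ldots, n\); the variable names are illustrative only.

```r
n_states <- 3 # size of the state space {a, b, c}
n <- 100      # length of the time series
depth <- 1:15

# number of distinct sequences of length k over the state space
potential <- n_states^depth
# at most n - k positions t can carry a context of length k
observable <- pmin(potential, n - depth)

growth <- data.frame(
  depth = depth,
  potential = potential,
  observable = observable,
  cumulative = cumsum(observable)
)
growth[c(5, 10, 15), ]
```

Under this assumption, at most 675 contexts can be observed up to depth 10 and at most 1110 up to depth 15, whereas the number of potential contexts up to depth 10 is already 88572. The counts of 602 and 1037 reported above with min_size=1 are close to these bounds, which suggests that almost every position in the series produces its own deep context.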

Contexts can be extracted from a context tree using the contexts function as follows:

contexts(x_ctx)
#> [[1]]
#> [1] 0 0
#> 
#> [[2]]
#> [1] 0
#> 
#> [[3]]
#> [1] 1

In general, the raw list of contexts is not very useful, and the contexts function can be parametrized to return additional information about them. In the simple case of ctx_tree, setting the frequency parameter to "total" or "detailed" gives access to the number of occurrences of each context and, for "detailed", to the distribution of \(x_t\) for all the \(t\) at which a context appears.

contexts(x_ctx, frequency = "total")
context  freq
0, 0        2
0           3
1           2

With frequency = "total", we obtain a data frame with a column freq that contains the number of occurrences of each context.

contexts(x_ctx, frequency = "detailed")
context  freq  0  1
0, 0        2  1  1
0           3  2  1
1           2  0  2

With frequency = "detailed", we obtain in addition one column for each value in the state space \(S\), which contains the number of times \(x_t\) takes this value over the occurrences of each context. For instance, in the table above, the context \((0, 0)\) appears twice in \(x\) and is followed once by \(0\) and once by \(1\).
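The detailed counts can be reproduced by hand in base R, which may help to see what each column means. The helper next_value_counts below is hypothetical (it is not a mixvlmc function) and assumes, as before, that \(t\) ranges over \(k+1, \ldots, n\).

```r
# Hypothetical helper: for a context written in reverse order, tabulates
# the values x[t] taken at the positions t where the context is observed.
next_value_counts <- function(x, ctx) {
  k <- length(ctx)
  n <- length(x)
  stopifnot(k < n)
  pos <- Filter(function(t) all(x[t - seq_len(k)] == ctx), (k + 1):n)
  # use the full state space as levels so that zero counts are reported
  table(factor(x[pos], levels = sort(unique(x))))
}

x <- c(0, 0, 0, 1, 1, 1)
next_value_counts(x, c(0, 0)) # 0: 1, 1: 1
next_value_counts(x, c(0))    # 0: 2, 1: 1
next_value_counts(x, c(1))    # 0: 0, 1: 2
```

Dividing each table by its sum gives the empirical conditional distribution of \(x_t\) given the context, for instance next_value_counts(x, c(0)) / 3 for the context \((0)\).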