---
title: "3 - Exploring Outputs from segclust2d"
author: "R. Patin"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{3 - Exploring Outputs from segclust2d}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---


```{r, echo = FALSE}
options(Encoding="UTF-8")
knitr::opts_chunk$set(
  fig.width = 8,
  fig.height = 5,
  collapse = TRUE,
  comment = "#>"
)
```

```{r library and data, fig.show='hold'}
library(segclust2d)
```

# Possible outputs and general functioning

Both `segmentation()` and `segclust()` return objects of `segmentation-class`
for which several functions are available (see [below](#list-of-functions)).

## General functioning

There are two types of function: (1) some are general and show likelihood for
all the different segmentations; (2) other are specific to a given segmentation
and requires selecting a number of segments and of clusters (if applicable).

### Default values for nseg and ncluster

For the functions specific to a given segmentation, if you do not provide as
argument the number of segments and of clusters, the functions will automatically
select the best arguments based on a penalized log-likelihood as following:

- for outputs of `segmentation()` the optimal number of segments is selected with
[Lavielle's criterium](https://rpatin.github.io/segclust2d/articles/v02_run_segclust2d.html#selecting-the-number-of-segments-1). 
Other numbers of segments may be provided with arguments `nseg`.

- for outputs of `segclust()` the optimal numbers of clusters and segments are selected with 
a [BIC-based penalized criterium](https://rpatin.github.io/segclust2d/articles/v02_run_segclust2d.html#selecting-the-number-of-segments-2). 
Other parameters may be provided with arguments `nseg` and `ncluster`. It is
recommended to manually choose the number of clusters based on biological
knowledge or careful exploration of the BIC-based penalized likelihood. Once the number of clusters was chosen (either manually or automatically) it is recommended to select the number of segments using the automatic BIC-based penalized likelihood criterium.

### Graphical outputs

All plot methods use `ggplot2` package and return `ggplot` objects that can be
further modified and customized using classical `ggplot2`
(see [ggplot2 function reference](https://ggplot2.tidyverse.org/reference/)).

### Default value for `order`

If you provide argument `order = TRUE` to a function specific to a segmentation,
then the different segments or clusters will be numbered ordered by the variable
provided as `order.var` in the `segmentation()` or `segclust()` call.

## List of functions

1. **Graphical outputs**

*For a specific segmentation:*

- `plot.segmentation` to show the segmented time-series, and clusters if applicable.
- `segmap`  to show the results of the segmentation as a labelled path (if applicable).
- `stateplot` plot summary statistics for all segments or clusters.

*Summary for all segmentations:*

- `plot_likelihood` for segmentation() show the log-likelihood of the
segmentation for all numbers of segments. 
- `plot_BIC` for segclust() show the
BIC-based penalized log-likelihood of the segmentation.clustering for all
numbers of segments and clusters.


2. **Extracting results**

*For a specific segmentation:*

- `augment` returns a data.frame with the original data as well as the segment
or cluster associated for each data point
- `segment` returns a data.frame with the beginning and end of each segment
- `states` for `segclust` provides a data.frame with summary statistics for all clusters

*Summary for all segmentations:*

- `logLik` for `segmentation()` returns a data.frame with the  log-likelihood
for all numbers of segments.
- `BIC` for `segclust()` returns a data.frame with the BIC-based penalized
log-likelihood for all numbers of clusters     and segments.

# Examples

As functions for segmentation and segmentation/clustering are very similar,
we will show examples mostly for the segmentation/clustering outputs, but the
use is very similar, argument `ncluster` just need to be omitted for obtaining
outputs for segmentation.

```{r loading data and segclust, fig.show='hold', message = FALSE}
data(simulmode)
simulmode$abs_spatial_angle <- abs(simulmode$spatial_angle)
simulmode <- simulmode[!is.na(simulmode$abs_spatial_angle), ]
mode_segclust <- segclust(simulmode,
                          Kmax = 20, lmin=10, ncluster = c(2,3),
                          seg.var = c("speed","abs_spatial_angle"),
                          scale.variable = TRUE)
```


## `plot.segmentation` for segmented time-series

```{r plot.segmentation, fig.show='hold', message = FALSE}
plot(mode_segclust, ncluster = 3)
```

## segmap - map the segmentation

`segmap()` plots the results of the segmentation as a labelled path. This can be done only
if data have a geographic meaning. Coordinate names are by default "x" and "y"
but they can be provided through argument `coord.names`.

```{r segmap, fig.show='hold', message = FALSE, fig.width=5, fig.height=5}
segmap(mode_segclust, ncluster = 3)
```

## stateplot - plot states statistics

`stateplot()` shows statistics for each state or segment.
```{r stateplot, fig.show='hold', message = FALSE, fig.width=3, fig.height=3}
stateplot(mode_segclust, ncluster = 3)
```

## Extract information from segmentation

### augment - get data.frame with segment/cluster information for all points

`augment.segmentation()` is a method for `broom::augment`. It returns an
augmented data.frame with outputs of the model - here, the attribution to
segment or cluster.

```{r augment, fig.show='hold', message = FALSE, eval = FALSE}
augment(mode_segclust, ncluster = 3)
```

### segment - Extract each segment (begin, end, statistics)

`segment()` makes it possible to retrieve information on the different segments for a given segmentation. Each segment is associated with the mean and standard deviation
for each variable, the state (equivalent to the segment number for
`segmentation`) and the state ordered given a variable - by default the first
variable given by `seg.var`. One can specify the variable for ordering states
through the `order.var` of `segmentation()` and `segclust()`.

```{r segment, fig.show='hold', results = "hide", message = FALSE}
segment(mode_segclust, ncluster = 3)
```

### states - statistics about each states.

`states()` returns information on the different states of the segmentation. For
`segmentation()` it is quite similar to `segment()`. For `segclust`, however it
gives the different clusters found and the statistics associated.

```{r states, fig.show='hold', results = "hide", message = FALSE}
states(mode_segclust, ncluster = 3)
```

## Get likelihood for all segmentation or segmentation/clustering

### log-Likelihood (segmentation)

`logLik.segmentation()` return information on the log-likelihood of the
different segmentations possible. It returns a data.frame with the number of
segments and the log-likelihood.

```{r simulshift, fig.show='hold', message = FALSE}
data("simulshift")
shift_seg <- segmentation(simulshift,
                          seg.var = c("x","y"),
                          lmin = 240, Kmax = 25,
                          subsample_by = 60)
```


```{r logLik, eval = FALSE}
logLik(shift_seg)

```

`plot_likelihood()` plots the log-likelihood of the segmentation for all the tested numbers of segments and clusters.

```{r plot_likelihood, fig.show='hold', message = FALSE}
plot_likelihood(shift_seg)
```

### BIC-based penalized likelihood (segclust)

`BIC.segmentation()` returns information on the BIC-based penalized
log-likelihood of the different segmentations possible. It returns a data.frame
with the number of segments, the BIC-based penalized log-likelihood and the
number of cluster. For `segclust()` only. Note that this does not truly return
a BIC. Here highest values are favored (in opposition to BIC)

```{r BIC, fig.show='hold', results = "hide", message = FALSE}
BIC(mode_segclust)
```

`plot_BIC()` plots the BIC-based penalized log-likelihood of the segmentation for
all the tested numbers of segments and clusters.

```{r plot_BIC, fig.show='hold', message = FALSE}
plot_BIC(mode_segclust)
```