Module 8 Model sequences

  • Describe the model sequence (model_seq) template
  • Create and use model sequences to implement a workflow
  • Access individual workflow steps
  • Apply a model sequence to an example dataset

8.1 The model_seq template

The purpose of the model sequence template is to allow you to chain together model objects into a workflow (or sequence) and then apply the entire sequence to a dataset in a single step.

In Modules 6 and 7 you will have applied several models in a row by calling model_apply multiple times, passing the output data from one model to the next using the predicted method. Model sequences automate this for you.

We can create a model sequence by “adding” models together. For example, it is common practice to mean centre the data before applying PCA:

# prepare model sequence
MS = mean_centre() + PCA()

# summarise sequence
show(MS)
A model_seq object containing:

[1]
A "mean_centre" object
----------------------
name:          Mean centre
description:   The mean sample is subtracted from all samples in the data matrix. The features in the centred
                 matrix all have zero mean.
input params:  mode 
outputs:       centred, mean_data, mean_sample_meta 
predicted:     centred
seq_in:        data

[2]
A "PCA" object
--------------
name:          Principal Component Analysis (PCA)
description:   PCA is a multivariate data reduction technique. It summarises the data in a smaller number of
                 Principal Components that maximise variance.
input params:  number_components 
outputs:       scores, loadings, eigenvalues, ssx, correlation, that 
predicted:     that
seq_in:        data

If you are familiar with ggplot, adding models to a sequence is similar to adding layers to a plot. When models are added together, the result is automatically a model_seq object.

class(MS)[1]
[1] "model_seq"
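The “adding” behaviour relies on operator overloading, much as ggplot does for layers. The base R sketch below imitates the idea with S3 classes so you can see the mechanism; toy_step, toy_seq and the +.toy_step method are hypothetical names invented for this illustration and are not how struct implements model_seq.

```r
# Hypothetical toy classes illustrating how "adding" steps can build a sequence.
# These are NOT struct classes; they only mimic the idea behind mean_centre() + PCA().

# A single processing step: a name plus a function applied to a numeric matrix
toy_step <- function(name, fun) {
  structure(list(name = name, fun = fun), class = "toy_step")
}

# Adding two steps (or a sequence and a step) returns a flat sequence of steps.
# A toy_seq also inherits from toy_step so both operands dispatch to this method.
"+.toy_step" <- function(e1, e2) {
  as_steps <- function(x) if (inherits(x, "toy_seq")) unclass(x) else list(x)
  structure(c(as_steps(e1), as_steps(e2)), class = c("toy_seq", "toy_step"))
}

# Build a two-step sequence, analogous to mean_centre() + PCA()
centre <- toy_step("mean centre", function(X) sweep(X, 2, colMeans(X)))
pca    <- toy_step("PCA", function(X) prcomp(X, center = FALSE)$x)
S <- centre + pca

class(S)[1]                                       # "toy_seq"
vapply(unclass(S), `[[`, character(1), "name")    # "mean centre" "PCA"
```

Giving toy_seq the toy_step class as well is a choice made for this sketch only: it means a sequence plus a step (e.g. S + centre) dispatches to the same method and simply appends another step.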

8.2 Applying model sequences

When model_apply is used with a model sequence, the data is passed to the first model of the sequence. That model is applied to the data, and its output is then used as the input to the next model. For our PCA example this means that the data is mean centred, and then the mean centred data is used as the input to PCA.

# apply model sequence to iris data
MS = model_apply(MS, iris_DatasetExperiment())
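To make the chaining explicit, the sketch below performs the same two steps by hand in base R, using the built-in iris data frame and prcomp() as a stand-in for the struct PCA object. It is only an approximation of what the sequence does; struct's PCA may differ in details such as the number of components retained.

```r
# Manual, base-R analogue of applying mean_centre() + PCA() to the iris data.
# prcomp() stands in for the struct PCA object; results may differ in detail.
X <- as.matrix(iris[, 1:4])   # numeric measurements only (drop Species)

# Step 1: mean centre, i.e. subtract the column means so every feature has zero mean
centred <- sweep(X, 2, colMeans(X))

# Step 2: the output of step 1 becomes the input of step 2
pca <- prcomp(centred, center = FALSE)   # data already centred, so don't centre again
scores <- pca$x

colMeans(centred)   # effectively zero for every feature
dim(scores)         # 150 4
```

Each line of this manual version corresponds to one step of the sequence; model_apply on MS performs both steps in a single call.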

The output that a model object passes on is specified by its predicted slot (listed as predicted: in the summaries above), which names the output used as the input to the next model in a sequence. This output must always be a DatasetExperiment object, otherwise the model sequence will fail.

The name of the output that a model passes on can be displayed using the predicted_name function.

# example object
M = mean_centre()

# default output for sequences
predicted_name(M)
[1] "centred"

Note that chart objects cannot be used in a model sequence.
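The role of the predicted slot can be illustrated with a small base R sketch. Here each hypothetical step returns a named list of outputs plus the name of the one to pass on, loosely mirroring the outputs and predicted fields shown by show(MS). The runner stops with an error when the named output is not a data matrix, echoing the requirement that the predicted output be a DatasetExperiment. The step structure and run_sequence() are assumptions made for illustration, not struct code.

```r
# Hypothetical steps: 'fun' returns a named list of outputs, and 'pred' names
# the output that is passed to the next step (cf. predicted_name() in struct).
mean_centre_step <- list(
  pred = "centred",
  fun = function(X) {
    m <- colMeans(X)
    list(centred = sweep(X, 2, m), mean_data = m)
  }
)

# A badly configured step whose 'pred' points at a vector, not a data matrix
bad_step <- list(pred = "mean_data", fun = mean_centre_step$fun)

# Apply the steps in order, passing only the predicted output along
run_sequence <- function(steps, X) {
  for (step in steps) {
    out <- step$fun(X)
    X <- out[[step$pred]]
    # The next step needs a data matrix, so anything else is an error
    if (!is.matrix(X)) {
      stop("predicted output '", step$pred, "' is not a data matrix")
    }
  }
  X
}

X <- as.matrix(iris[, 1:4])
res <- run_sequence(list(mean_centre_step), X)         # returns the centred matrix
try(run_sequence(list(bad_step, mean_centre_step), X))  # fails at the first step
```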

8.3 Indexing steps in a sequence

Formally, the steps of a sequence are stored as a list in the model_seq object, which you can access using the models method. However, it is often easier to access individual steps by indexing. For example, the first model of the sequence can be extracted using square brackets:

# get first model of sequence
M = MS[1]
M
A "mean_centre" object
----------------------
name:          Mean centre
description:   The mean sample is subtracted from all samples in the data matrix. The features in the centred
                 matrix all have zero mean.
input params:  mode 
outputs:       centred, mean_data, mean_sample_meta 
predicted:     centred
seq_in:        data

If the model sequence has been trained (or applied), then the indexed model will contain all of the outputs for that step in the sequence. This is useful if, for example, we want to produce charts of the data and objects at different stages of a workflow.

# prepare a plot
C = DatasetExperiment_factor_boxplot(
        feature_to_plot = 1,
        factor_names = 'Species')

# get the mean centred data from the first model step
DEmc = predicted(MS[1])

# plot the mean centred data using chart C
chart_plot(C, DEmc)

This approach also means that the results of all steps are available after applying the workflow. We can therefore branch off from the workflow at any point and, for example, use the partly processed data as the input to a different sequence. We can also keep a record of all processing steps and explore their impact on the data. This comes at the cost of the additional memory needed to store all of the results and pass them between methods.
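Keeping every intermediate result is what makes branching possible. The base R sketch below uses Reduce(..., accumulate = TRUE) to keep the input and the output of every step, then reuses the mean centred data as the start of a different, hypothetical branch; object.size() gives a rough idea of the extra memory this costs. The step functions are simple stand-ins, not struct models.

```r
# Stand-in steps: mean centring followed by PCA scores (base R, not struct)
steps <- list(
  centre = function(X) sweep(X, 2, colMeans(X)),
  pca    = function(X) prcomp(X, center = FALSE)$x
)

X <- as.matrix(iris[, 1:4])

# accumulate = TRUE keeps every intermediate result, not just the final one:
# stages[[1]] is the input, stages[[2]] the centred data, stages[[3]] the scores
stages <- Reduce(function(data, step) step(data), steps, X, accumulate = TRUE)

# Branch off after the first step: scale the centred data to unit variance
# instead of running PCA (a hypothetical alternative workflow)
branch <- scale(stages[[2]], center = FALSE, scale = apply(stages[[2]], 2, sd))

# Cost of keeping everything versus keeping only the final result
object.size(stages)
object.size(stages[[3]])
```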

8.4 Exercise

Model sequences

In this exercise you will construct a workflow, apply it to some data and then generate some plots for the different steps. You will create a number of models that might be unfamiliar to you. Don’t worry if you don’t know what these steps are yet; the idea is to give you practice using model sequences, and the details of the steps used are not critical for this exercise.

Use the default input values for objects unless specified.