Module 5 DatasetExperiment objects

  • learn about the DatasetExperiment object
  • find out why we use the DatasetExperiment object
  • use a DatasetExperiment object containing example metabolomics data

5.1 The DatasetExperiment template

The DatasetExperiment template defines the format that data must follow to be compatible with other struct objects. By requiring input data to follow a strict template, struct can ensure that the data is compatible with every step in a workflow.

The DatasetExperiment template consists of three key elements: data, sample_meta and variable_meta.

  • data: a table of peak areas/heights
    The DatasetExperiment template defines that the data should be formatted as a data.frame with features (metabolites) in the columns, and samples in the rows. Each row of the data.frame therefore contains the peak areas for all metabolites measured in that sample.
  • sample_meta: meta data for the samples
    Contains information about the samples beyond their names. It can take many forms: categorical (e.g. whether a sample was a control or a treated sample) or continuous (e.g. the BMI of the subject). The DatasetExperiment template defines that the sample meta data should be a data.frame where the samples are in rows, and each column corresponds to one piece of meta data.
  • variable_meta: meta data for the features
    Like the sample meta data, except that it contains additional information about each feature (variable, metabolite), such as m/z and retention time, or annotation information. The DatasetExperiment template defines that this should be a data.frame where the features are in rows and each column corresponds to one piece of meta data.

The above definitions are for untargeted metabolomics, but the format is compatible with other data types provided you can arrange the data into a table with samples in rows and variables in columns. For example, the data for an NMR study might have total areas for each ppm bucket in the data table instead of metabolite peak areas, and ppm ranges in the variable meta data instead of m/z and retention time.
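As an illustration of the three elements described above, the following is a minimal sketch that builds a small DatasetExperiment object by hand. All sample names, feature names and values are invented for the example, and the object names (peak_data, sample_info, feature_info, DE_example) are hypothetical. It assumes the struct package is installed and uses its DatasetExperiment constructor.

```r
library(struct)

# data: samples in rows, features in columns (invented peak areas)
peak_data = data.frame(
  feature_1 = c(1200.5, 980.2, 1105.7),
  feature_2 = c(300.1, 410.8, 295.4),
  row.names = c('sample_1', 'sample_2', 'sample_3')
)

# sample_meta: one row per sample, one column per piece of meta data
sample_info = data.frame(
  group = factor(c('control', 'treated', 'treated')),  # categorical
  bmi   = c(22.4, 27.1, 24.9),                         # continuous
  row.names = rownames(peak_data)
)

# variable_meta: one row per feature, one column per piece of meta data
feature_info = data.frame(
  mz = c(180.063, 146.046),  # invented m/z values
  rt = c(65.2, 48.9),        # invented retention times
  row.names = colnames(peak_data)
)

# the names should line up across the three tables
stopifnot(identical(rownames(peak_data), rownames(sample_info)))
stopifnot(identical(colnames(peak_data), rownames(feature_info)))

# combine the three tables into a single object
DE_example = DatasetExperiment(
  data          = peak_data,
  sample_meta   = sample_info,
  variable_meta = feature_info,
  name          = 'Toy dataset',
  description   = 'Three samples and two features with invented values'
)
DE_example
```

Checking that the row and column names agree before building the object is a design choice for this sketch: it makes a mismatch between the tables fail early with a clear message.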

5.2 Iris DatasetExperiment object

For the struct package, Fisher’s classic Iris dataset has been converted to DatasetExperiment format. You can import it into your environment as follows:

# load the struct package, then import Fisher's iris data
library(struct)
DE = iris_DatasetExperiment()

You can examine the DE object in your environment using the show function.

show(DE)
A "DatasetExperiment" object
----------------------------
name:          Fisher's Iris dataset
description:   This famous (Fisher's or Anderson's) iris data set gives the measurements in centimeters of
                 the variables sepal length and width and petal length and width,
                 respectively, for 50 flowers from each of 3 species of iris. The species are
                 Iris setosa, versicolor, and virginica.
data:          150 rows x 4 columns
sample_meta:   150 rows x 1 columns
variable_meta: 4 rows x 1 columns

From this output you can see that the DE object has a name and a description defined. These fields are common to all struct objects and can be accessed using dollar notation:

DE$name
[1] "Fisher's Iris dataset"
DE$description
[1] "This famous (Fisher's or Anderson's) iris data set gives the measurements in centimeters of the variables sepal length and width and petal length and width, respectively, for 50 flowers from each of 3 species of iris. The species are Iris setosa, versicolor, and virginica."

Fields (formally called “slots”) in a struct object can be assigned new values using the same notation:

# change the name
DE$name = 'Fisher/Anderson Iris dataset'
# show updated object
DE
A "DatasetExperiment" object
----------------------------
name:          Fisher/Anderson Iris dataset
description:   This famous (Fisher's or Anderson's) iris data set gives the measurements in centimeters of
                 the variables sepal length and width and petal length and width,
                 respectively, for 50 flowers from each of 3 species of iris. The species are
                 Iris setosa, versicolor, and virginica.
data:          150 rows x 4 columns
sample_meta:   150 rows x 1 columns
variable_meta: 4 rows x 1 columns

Typing the name of an object at the console prints its contents, just as show does, so here we were lazy and skipped the explicit call to show.

For this DatasetExperiment object we can see that there are 4 columns of data for 150 samples. There is a single column of meta data for the samples, and a single column of meta data for the variables. These slots can be accessed in the same way as the other slots, using dollar notation. Here we show the first 6 rows of the data.

head(DE$data)
  Sepal.Length Sepal.Width Petal.Length Petal.Width
1          5.1         3.5          1.4         0.2
2          4.9         3.0          1.4         0.2
3          4.7         3.2          1.3         0.2
4          4.6         3.1          1.5         0.2
5          5.0         3.6          1.4         0.2
6          5.4         3.9          1.7         0.4
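The sample and variable meta data can be inspected in the same way. The following is a short sketch, assuming the struct package is installed. Output is not shown here; the sizes noted in the comments are taken from the summary printed by show earlier in this section.

```r
library(struct)
DE = iris_DatasetExperiment()

# sample meta data: one row per sample
head(DE$sample_meta)

# variable meta data: one row per feature
DE$variable_meta

# the table sizes should agree with the summary printed by show
dim(DE$data)           # 150 rows x 4 columns
dim(DE$sample_meta)    # 150 rows x 1 column
dim(DE$variable_meta)  # 4 rows x 1 column
```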

5.3 Exercise

MTBLS79 DatasetExperiment

In this exercise you will be able to test your understanding of DatasetExperiment objects.