- learn about the DatasetExperiment object
- find out why we use the DatasetExperiment object
- use a DatasetExperiment object containing example metabolomics data
Module 5 DatasetExperiment objects
5.1 The DatasetExperiment template
The DatasetExperiment template defines the format that data must follow to be compatible with other struct objects. By requiring input data to follow a strict template, struct can ensure that the data is compatible with every step in a workflow.
The DatasetExperiment template consists of three key elements: data, sample_meta and variable_meta.
- data: a table of peak areas/heights
The DatasetExperiment template defines that the data should be formatted as a data.frame with features (metabolites) in the columns and samples in the rows. Each row of the data.frame therefore contains the peak areas for all metabolites measured in that sample.
- sample_meta: meta data for the samples
Contains information about the samples in addition to the sample names. It can come in many forms. For example, it could be categorical (e.g. whether the sample was a control sample or a treated sample) or continuous (e.g. the BMI of the subject). The DatasetExperiment template defines that the sample meta data should be a data.frame where the samples are in rows, and each column corresponds to one piece of meta data.
- variable_meta: meta data for the features
Like the sample meta data, except that it contains additional information about each feature (variable, metabolite), such as m/z and retention time, or annotation information. The DatasetExperiment template defines that this should be a data.frame where the features are in rows and each column corresponds to one piece of meta data.
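The layout of the three tables described above can be sketched in base R. All values, sample names and feature names below are hypothetical and only illustrate the orientation of each table. The final step assumes that the struct package is installed and that its DatasetExperiment() constructor accepts arguments named after the three elements; it is skipped otherwise.

```r
# Hypothetical toy values: 3 samples (rows) x 2 features (columns).
peak_data = data.frame(
  feature_1 = c(1520.3, 1388.9, 1604.1),
  feature_2 = c(310.7, 295.2, 402.8),
  row.names = c("sample_A", "sample_B", "sample_C")
)

# One row per sample, one column per piece of sample meta data.
sample_meta = data.frame(
  class = factor(c("control", "treated", "treated")),  # categorical
  bmi   = c(22.4, 27.1, 24.8),                          # continuous
  row.names = rownames(peak_data)
)

# One row per feature, one column per piece of feature meta data.
variable_meta = data.frame(
  mz = c(180.0634, 132.0768),
  rt = c(65.2, 48.9),
  row.names = colnames(peak_data)
)

# The three tables must agree on which samples and features they describe.
stopifnot(
  identical(rownames(peak_data), rownames(sample_meta)),
  identical(colnames(peak_data), rownames(variable_meta))
)

# Optional step (assumption: struct is installed and exports DatasetExperiment()).
if (requireNamespace("struct", quietly = TRUE)) {
  library(struct)
  toy_DE = DatasetExperiment(
    data = peak_data,
    sample_meta = sample_meta,
    variable_meta = variable_meta,
    name = "Toy dataset",
    description = "Hypothetical values for illustration only"
  )
}
```

Checking that row and column names line up before building the object makes a transposed or misaligned table fail early rather than part-way through a workflow.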
The above definitions are for untargeted metabolomics, but the format is compatible with other data types provided you can arrange the data into a table with samples in rows and variables in columns. For example, the data for an NMR study might have total areas for each ppm bucket in the data table instead of metabolite peak areas, and ppm ranges in the variable meta data instead of m/z and retention time.
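As a rough illustration of the NMR case, the sketch below arranges bucketed spectra in the same samples-by-variables layout using base R. The bucket boundaries, bucket width, areas and naming scheme are all hypothetical.

```r
# Hypothetical ppm buckets of width 0.04.
ppm_lower = c(0.98, 1.02, 1.06)
ppm_upper = ppm_lower + 0.04
bucket_id = sprintf("ppm_%.2f_%.2f", ppm_lower, ppm_upper)

# Data table: samples in rows, one column of total area per ppm bucket.
nmr_data = data.frame(
  matrix(
    c(12.1, 30.5, 8.7,
      11.4, 28.9, 9.9),
    nrow = 2, byrow = TRUE,
    dimnames = list(c("sample_A", "sample_B"), bucket_id)
  ),
  check.names = FALSE
)

# Variable meta data: one row per bucket, holding the ppm range
# instead of m/z and retention time.
nmr_variable_meta = data.frame(
  ppm_lower = ppm_lower,
  ppm_upper = ppm_upper,
  row.names = bucket_id
)

# Columns of the data table and rows of the variable meta data must match.
stopifnot(identical(colnames(nmr_data), rownames(nmr_variable_meta)))
```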
5.2 Iris DatasetExperiment object
For the struct package, Fisher’s classic Iris dataset has been converted to DatasetExperiment format. You can import it into your environment as follows:
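A minimal sketch of the import, assuming the struct package is installed and that it provides the converted dataset through a function named iris_DatasetExperiment() (the function name is not stated in the text, so treat it as an assumption). The object is stored as DE, the name used throughout this section.

```r
# Assumption: struct (Bioconductor) is installed and exports
# iris_DatasetExperiment(); the block is skipped if the package is missing.
if (requireNamespace("struct", quietly = TRUE)) {
  library(struct)
  DE = iris_DatasetExperiment()
}
```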
You can examine the DE object in your environment using the show function.
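A minimal sketch of that call. The import is repeated so the block runs on its own, and iris_DatasetExperiment() is an assumed function name from the struct package.

```r
# Assumption: struct is installed and exports iris_DatasetExperiment().
if (requireNamespace("struct", quietly = TRUE)) {
  library(struct)
  DE = iris_DatasetExperiment()
  # Print a summary of the object: name, description and table dimensions.
  show(DE)
}
```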
A "DatasetExperiment" object
----------------------------
name: Fisher's Iris dataset
description: This famous (Fisher's or Anderson's) iris data set gives the measurements in centimeters of
the variables sepal length and width and petal length and width,
respectively, for 50 flowers from each of 3 species of iris. The species are
Iris setosa, versicolor, and virginica.
data: 150 rows x 4 columns
sample_meta: 150 rows x 1 columns
variable_meta: 4 rows x 1 columns
From this output you can see that the DE object has a name and a description defined. These fields are common to all struct objects and can be accessed using dollar notation:
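For example, assuming DE holds the Iris object (created here with iris_DatasetExperiment(), an assumed function name from the struct package), the two fields could be read like this:

```r
# Assumption: struct is installed and exports iris_DatasetExperiment().
if (requireNamespace("struct", quietly = TRUE)) {
  library(struct)
  DE = iris_DatasetExperiment()
  # Dollar notation returns the value stored in the named field.
  print(DE$name)
  print(DE$description)
}
```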
[1] "Fisher's Iris dataset"
[1] "This famous (Fisher's or Anderson's) iris data set gives the measurements in centimeters of the variables sepal length and width and petal length and width, respectively, for 50 flowers from each of 3 species of iris. The species are Iris setosa, versicolor, and virginica."
Fields (formally called “slots”) in a struct object can be assigned new values using similar notation:
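A sketch of such an assignment, again assuming the Iris object is created with iris_DatasetExperiment() from the struct package (an assumed function name). The new name matches the one in the output shown in this section.

```r
# Assumption: struct is installed and exports iris_DatasetExperiment().
if (requireNamespace("struct", quietly = TRUE)) {
  library(struct)
  DE = iris_DatasetExperiment()
  # Assign a new value to the name field using dollar notation.
  DE$name = 'Fisher/Anderson Iris dataset'
  # Evaluating the object on its own prints its summary.
  DE
}
```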
A "DatasetExperiment" object
----------------------------
name: Fisher/Anderson Iris dataset
description: This famous (Fisher's or Anderson's) iris data set gives the measurements in centimeters of
the variables sepal length and width and petal length and width,
respectively, for 50 flowers from each of 3 species of iris. The species are
Iris setosa, versicolor, and virginica.
data: 150 rows x 4 columns
sample_meta: 150 rows x 1 columns
variable_meta: 4 rows x 1 columns
Typing the name of an object on its own shows its contents by default, so here we skipped the explicit call to show.
For this DatasetExperiment object we can see that there are 4 columns of data for 150 samples. There is a single column of meta data for the samples, and a single column of meta data for the variables. These slots can be accessed in the same way as the other slots, using dollar notation. Here we show the first 6 rows of the data.
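One way to obtain those rows is with head(), which returns the first six rows of a data.frame by default. The sketch assumes the Iris object is created with iris_DatasetExperiment() from the struct package (an assumed function name).

```r
# Assumption: struct is installed and exports iris_DatasetExperiment().
if (requireNamespace("struct", quietly = TRUE)) {
  library(struct)
  DE = iris_DatasetExperiment()
  # DE$data is a data.frame, so the usual data.frame tools apply.
  first_rows = head(DE$data)
  print(first_rows)
  # The meta data tables can be inspected in the same way.
  print(head(DE$sample_meta))
  print(DE$variable_meta)
}
```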
Sepal.Length Sepal.Width Petal.Length Petal.Width
1 5.1 3.5 1.4 0.2
2 4.9 3.0 1.4 0.2
3 4.7 3.2 1.3 0.2
4 4.6 3.1 1.5 0.2
5 5.0 3.6 1.4 0.2
6 5.4 3.9 1.7 0.4
5.3 Exercise
MTBLS79 DatasetExperiment
In this exercise you will be able to test your understanding of DatasetExperiment objects.
The structToolbox package provides the MTBLS79 dataset as an example DatasetExperiment object.
- find out about MTBLS79_DatasetExperiment() using the package documentation.
- import MTBLS79 into your environment and display a summary of the contents.
- update the description to better reflect the documentation.
- compare the data with filtered = TRUE to the data when filtered = FALSE.
- You can view the help by preceding the function name with a question mark
- all slots of a DatasetExperiment can be accessed using $ notation
- how many rows/columns does each dataset have?
We can place a question mark in front of the function name to obtain documentation for a function.
Note that in RStudio the documentation will be displayed in the Help tab (by default, a tab in the bottom-right panel).
The show function summarises the contents of a struct object.
A "DatasetExperiment" object
----------------------------
name: MTBLS79
description: Converted from SE provided by the pmp package
data: 172 rows x 2063 columns
sample_meta: 172 rows x 7 columns
variable_meta: 2063 rows x 0 columns
We can use dollar notation to get and set values for a slot.
# update description e.g.
DE$description = 'A systematic evaluation of the reproducibility of a multi-batch DIMS metabolomics study of cardiac tissue extracts'
show(DE)
A "DatasetExperiment" object
----------------------------
name: MTBLS79
description: A systematic evaluation of the reproducibility of a multi-batch DIMS metabolomics study of cardiac tissue extracts
data: 172 rows x 2063 columns
sample_meta: 172 rows x 7 columns
variable_meta: 2063 rows x 0 columns
We can use filtered = TRUE as an input to the function and compare the output of show for the filtered and unfiltered data.
A "DatasetExperiment" object
----------------------------
name: MTBLS79
description: Converted from SE provided by the pmp package
data: 172 rows x 1579 columns
sample_meta: 172 rows x 7 columns
variable_meta: 1579 rows x 0 columns
The unfiltered data has 172 samples and 2063 features. After filtering, the dataset has 1579 features.
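The exercise steps can be pulled together in one sketch. It assumes that the structToolbox and pmp packages are installed (the object description indicates the data comes from pmp); the variable names DE_unfiltered, DE_filtered and dims are introduced here for illustration only.

```r
# Assumption: structToolbox and pmp are installed; skipped otherwise.
if (requireNamespace("structToolbox", quietly = TRUE) &&
    requireNamespace("pmp", quietly = TRUE)) {
  library(structToolbox)

  # In an interactive session the help page can be opened with:
  # ?MTBLS79_DatasetExperiment

  # Import both versions of the dataset.
  DE_unfiltered = MTBLS79_DatasetExperiment(filtered = FALSE)
  DE_filtered = MTBLS79_DatasetExperiment(filtered = TRUE)

  # Tabulate the dimensions of the data slot side by side.
  dims = rbind(
    unfiltered = dim(DE_unfiltered$data),
    filtered = dim(DE_filtered$data)
  )
  colnames(dims) = c("samples", "features")
  print(dims)

  # Number of features removed by filtering.
  print(ncol(DE_unfiltered$data) - ncol(DE_filtered$data))
}
```

Given the dimensions reported in the show output for the two objects, the last line would correspond to 2063 - 1579 = 484 features removed, with the number of samples unchanged.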