BEME: Microbiome data analysis, Facility effect
BINF90004 assignment prepared by Kim-Anh Lê Cao
September 21, 2020
Contents
Guidelines
1 Background
2 Load and normalise data
2.1 R libraries
2.2 Load data
2.3 Normalise data
3 Data analysis aims
3.1 Exploratory analysis
3.2 Discriminant analysis and feature selection
3.3 Conclusion and discussion
4 Session information
Guidelines
Ensure that the captions, legends and titles of your plots are informative. ‘Less is more’: comment your code appropriately, but do not overload it. You will be assessed on your critical thinking, your interpretation and your ability to reuse the R code provided. We advise you to use the .Rmd format to generate your report in a reproducible way (our motto), and we provide such a template on the LMS. You may choose not to use the template, as long as you respect the 10-page limit (excluding the sessionInfo() output).
General feedback from past assignments: comment on and interpret your figures and outputs. Your assessor will not second-guess what the outputs mean, and neither will any collaborator or client you hand a professional report to in your future career.
If you are stuck in R, check the help file of any function by typing ?function_name to see its arguments and usage, then run example(function_name) or scroll down to the Examples section of the help file.
1 Background
BEME stands for Batch Effect Microbiome experiment, a project that was conducted at the University of
Queensland (UQ), led by Dr Kim-Anh Lê Cao with Dr Muralidhara Maradana and Mr Nicholas Matigian.
BEME was designed to study potential confounding (batch) effects in murine microbiome experiments, as well as the effect of a high fat high sugar diet on the microbiome. Batch effects are defined as systematic non-biological variation between groups of samples (or batches) due to experimental artefacts.
Melbourne Integrative Genomics, School of Mathematics and Statistics | The University of Melbourne, VIC 3010
http://mixomics.org/ | http://lecao-lab.science.unimelb.edu.au/
These technical sources of variation arise when measurements are affected by laboratory conditions, reagent lots, sequencing runs, differences in sequencing techniques or protocols, and technicians.
Microbiome data were generated using 16S rRNA sequencing, a targeted approach that sequences particular regions of the 16S rRNA gene, which is found exclusively in bacteria and archaea. The 16S rRNA method is a cost-effective way of assessing the composition of the microbiota. The BEME experiment was carefully designed to record potential sources of batch effects, including cage, animal facility and sequencing run (see Figure 1).
Figure 1: BEME Experimental Design. Female C57BL/6 (black) mice were housed in two animal facilities at UQ (PACE and TRI), in different cages (3 animals per cage; 24 mice per facility, except for TRI where we lost one mouse). They were fed either a high fat high sugar (HFHS) diet or a normal diet. Stool samples were collected at Day 0, 1, 4 and 7. The 16S data were obtained with Illumina MiSeq sequencing.
2 Load and normalise data
The data are available as .RData.
2.1 R libraries
You will need to install the following packages to analyse the BEME data (mixOmics is available from Bioconductor, kableExtra from CRAN):
library(mixOmics)
library(kableExtra) # for fancy tables
2.2 Load data
The data contain the microbiome composition of all mice, with samples in rows and taxa (Operational Taxonomic Units, OTUs) in columns. We also have several types of meta data on each sample. The different taxonomy levels of each OTU are also available. Note that for 16S data we may not always have the Genus and Species information of the OTUs, so we often work at the Family level.

Table 1: Snippet of Taxonomy ranks information
        Kingdom   Phylum         Class        Order          Family         Genus  Species
OTU_13  Bacteria  Bacteroidetes  Bacteroidia  Bacteroidales  S24-7
OTU_21  Bacteria  Bacteroidetes  Bacteroidia  Bacteroidales  S24-7
OTU_7   Bacteria  Bacteroidetes  Bacteroidia  Bacteroidales  S24-7
OTU_29  Bacteria  Bacteroidetes  Bacteroidia  Bacteroidales  Rikenellaceae
OTU_30  Bacteria  Firmicutes     Clostridia   Clostridiales
OTU_14  Bacteria  Bacteroidetes  Bacteroidia  Bacteroidales  S24-7

Table 2: Taxonomy ranks information for the whole data set
Rank      Length  Class      Mode
Kingdom   419     character  character
Phylum    419     character  character
Class     419     character  character
Order     419     character  character
Family    419     character  character
Genus     419     character  character
Species   419     character  character
load('BEME_data.RData')  # provides, among others, data.TSS, sample.info and taxonomy_tab
#ls()                    # list the objects that were loaded
#colnames(sample.info)   # meta data variables
dt = head(taxonomy_tab) # first 6 OTUs only, see Table 1
kable(dt, caption = 'Snippet of Taxonomy ranks information') %>%
  kable_styling(bootstrap_options = "striped", font_size = 7)
# summary() of character columns only reports their length, class and mode
# (see Table 2); to count the OTUs at each level use table(), e.g.
# table(taxonomy_tab$Family, useNA = 'ifany')
dt = summary(taxonomy_tab)
kable(dt, caption = 'Taxonomy ranks information for the whole data set') %>%
  kable_styling(bootstrap_options = "striped", font_size = 5)
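Because Genus and Species are often missing, signatures are usually compared at the Family level. As a minimal sketch in base R (toy counts, not the BEME data), OTU columns can be summed by their Family label with rowsum(). It assumes, as Table 1 suggests, that the row names of the taxonomy table are the OTU names used as column names in the data; the helper collapse_to_family() and the "Unassigned" label for OTUs without a Family are hypothetical choices made for illustration. Sum counts or TSS proportions this way, before any log-ratio transformation, since sums of log values are not meaningful abundances.

```r
# Hypothetical toy data (NOT BEME): 3 samples x 4 OTUs, plus a taxonomy
# table whose row names are the OTU names (layout assumed from Table 1)
counts <- matrix(c(10, 0, 5,  5,
                    2, 8, 0, 10,
                    0, 4, 6, 10),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(paste0("mouse", 1:3),
                                 c("OTU_13", "OTU_21", "OTU_29", "OTU_30")))
taxonomy <- data.frame(
  Family = c("S24-7", "S24-7", "Rikenellaceae", NA),
  row.names = c("OTU_13", "OTU_21", "OTU_29", "OTU_30"),
  stringsAsFactors = FALSE
)

# Sum the OTU columns that share a Family; missing or empty Family labels
# are pooled as "Unassigned" so that their counts are kept
collapse_to_family <- function(counts, taxonomy) {
  fam <- taxonomy[colnames(counts), "Family"]
  fam[is.na(fam) | fam == ""] <- "Unassigned"
  # rowsum() sums rows by group, so transpose to put OTUs in rows, then back
  t(rowsum(t(counts), group = fam))
}

family_counts <- collapse_to_family(counts, taxonomy)
family_counts
```

Each sample keeps its total (row sums are unchanged); only the OTU columns are merged into one column per Family.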
2.3 Normalise data
The data were scaled for sequencing library size (Total Sum Scaling, or TSS) and log ratio transformed, as
described in http://mixomics.org/mixmc/pre-processing/.
dim(data.TSS)      # number of samples (rows) x number of OTUs (columns)
# snippet of data: first 5 samples and first 5 OTUs
data.TSS[1:5,1:5]
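The pre-processing can be written out in a few lines of base R. This sketch assumes the mixMC approach described at the link above (an offset of 1 added to the raw counts, TSS, then a centred log ratio, CLR); the helper names tss() and clr() and the toy counts are hypothetical. In mixOmics the CLR step is, to our understanding, usually requested through the logratio = 'CLR' argument of functions such as pca() and splsda() rather than computed by hand.

```r
# Hypothetical raw counts (NOT BEME): 2 samples (rows) x 3 OTUs (columns)
raw <- matrix(c(9, 0, 0,
                3, 3, 1),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("s1", "s2"), c("OTU_a", "OTU_b", "OTU_c")))

# Offset + Total Sum Scaling: add 1 so that zeros can later be logged, then
# divide each sample (row) by its total so that every row sums to 1
tss <- function(x, offset = 1) {
  x <- x + offset
  sweep(x, 1, rowSums(x), "/")
}

# Centred log ratio: log of each value minus the mean log of its sample,
# i.e. log(x_ij / geometric mean of row i)
clr <- function(x) {
  lx <- log(x)
  sweep(lx, 1, rowMeans(lx), "-")
}

data_tss <- tss(raw)
data_clr <- clr(data_tss)
round(data_clr, 3)
```

Because CLR subtracts each sample's mean log value, it cancels any per-sample scaling: clr(raw + 1) gives the same values as clr(tss(raw)). TSS still matters for methods applied to the proportions themselves.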
3 Data analysis aims
In this assignment, we are interested in comparing the two facilities using multivariate dimension reduction
methods. Justify any parameter you choose for each method, and detail your interpretations of the graphical
or numerical results.
3.1 Exploratory analysis
Conduct an in-depth exploratory analysis with PCA, with a particular focus on identifying any confounding or batch effects in this experiment. Remember that batch effects are technical effects that we do not expect to observe. Do not forget to describe the data set and the meta data.
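One possible numerical complement to the PCA sample plots is to ask how much of the variance along each principal component is explained by each meta data variable. The minimal base R sketch below uses simulated data: a shift is deliberately added to a few OTUs in one hypothetical facility, and none of the numbers are BEME results. It runs prcomp() on CLR data and computes the R² of a one-way linear model of PC1 on each variable. This R² screen is an illustrative addition, not a step prescribed by the assignment; for the sample plots themselves, mixOmics::pca() and plotIndiv() (colouring samples by one variable and using symbols for another) would be the usual route.

```r
set.seed(1)

# Simulated toy data (NOT BEME): 12 samples x 20 OTUs, two hypothetical
# facilities and two diets; OTUs 1-5 are inflated 4-fold in "TRI" to mimic
# a technical shift between facilities
n <- 12
p <- 20
meta <- data.frame(
  Facility = factor(rep(c("PACE", "TRI"), each = n / 2)),
  Diet     = factor(rep(c("HFHS", "Normal"), times = n / 2))
)
counts <- matrix(rpois(n * p, lambda = 50), nrow = n)
is_tri <- meta$Facility == "TRI"
counts[is_tri, 1:5] <- counts[is_tri, 1:5] * 4

# Offset, TSS and CLR (log of each proportion minus the sample's mean log)
prop <- (counts + 1) / rowSums(counts + 1)
clr_data <- log(prop) - rowMeans(log(prop))

# PCA on centred (unscaled) CLR data; proportion of variance per component
pc <- prcomp(clr_data, center = TRUE, scale. = FALSE)
var_expl <- pc$sdev^2 / sum(pc$sdev^2)
round(var_expl[1:3], 3)

# R^2 of PC1 ~ each meta data variable: a high value for a technical
# variable (facility, cage, sequencing run...) flags a potential batch effect
r2_pc1 <- sapply(meta, function(v) summary(lm(pc$x[, 1] ~ v))$r.squared)
round(r2_pc1, 2)
```

With the real data the same loop could be run over the columns of sample.info and over the first few components, keeping in mind that variables confounded with each other (for example cage nested within facility) cannot be separated this way.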
3.2 Discriminant analysis and feature selection
We are interested in whether the facility has an effect on the microbiome. Subset the data to focus on one facility and one day (see the R code below for subsetting). Perform a sparse PLS-DA on the Day 7 samples from the TRI facility to discriminate Diet, and identify a microbial signature that characterises the HFHS vs normal diet. Set the number of OTUs to select to 50 per component. Perform a similar analysis for the PACE facility. Discuss the differences and commonalities between the results you obtain in TRI and PACE, at both the sample and the OTU signature level (use the Family taxonomy level when you compare the signatures).
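To compare the two signatures at the Family level, the OTU names selected by each model can be mapped to their Family and compared with set operations. In mixOmics these names are assumed to come from selectVar(model, comp = 1)$name for each sPLS-DA model; the OTU lists and Family labels below are hypothetical placeholders, and compare_signatures() is an illustrative helper, not part of mixOmics.

```r
# Hypothetical OTU signatures (NOT real results), e.g. the names returned by
# selectVar() on the TRI and PACE sPLS-DA models for one component
sig_TRI  <- c("OTU_13", "OTU_21", "OTU_29", "OTU_30")
sig_PACE <- c("OTU_7", "OTU_14", "OTU_29", "OTU_42")

# Hypothetical Family lookup (in practice taken from taxonomy_tab$Family)
family <- c(OTU_13 = "S24-7", OTU_21 = "S24-7", OTU_29 = "Rikenellaceae",
            OTU_30 = NA, OTU_7 = "S24-7", OTU_14 = "S24-7",
            OTU_42 = "Lachnospiraceae")

compare_signatures <- function(a, b, family, labels = c("A", "B")) {
  # Map OTUs to Family; unannotated OTUs are labelled "Unassigned"
  fam_a <- unname(family[a]); fam_a[is.na(fam_a)] <- "Unassigned"
  fam_b <- unname(family[b]); fam_b[is.na(fam_b)] <- "Unassigned"
  list(
    shared_otus     = intersect(a, b),
    shared_families = intersect(fam_a, fam_b),
    only_a_families = setdiff(fam_a, fam_b),
    only_b_families = setdiff(fam_b, fam_a),
    # number of selected OTUs per Family in each signature
    counts = table(signature = rep(labels, c(length(a), length(b))),
                   family    = c(fam_a, fam_b))
  )
}

res <- compare_signatures(sig_TRI, sig_PACE, family, labels = c("TRI", "PACE"))
res$shared_families
res$counts
```

Two signatures can share few OTUs yet point to the same Families, which is one reason the comparison is made at the Family level.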
Example code to subset the data to the TRI facility on a specific day:
# check your numbers first!
kable(table(sample.info$Day, sample.info$Facility))
        PACE   TRI
Day0      24    23
Day1      24    23
Day4      24    23
Day7      24    23
# keep the TRI samples collected on Day 7 only
keep.TRID7 = which(sample.info$Facility == 'TRI' & sample.info$Day == 'Day7')
data.TSS2 = data.TSS[c(keep.TRID7),]
sample.info2 = sample.info[c(keep.TRID7),]
# drop the now-unused levels (e.g. PACE, Day0) of every factor column
for(k in 1:ncol(sample.info2)){
  if(is.factor(sample.info2[,k])) sample.info2[,k] = droplevels(sample.info2[,k])
}
# check that the dimensions of the data and meta data match
dim(data.TSS2)
## [1] 23 419
dim(sample.info2)
## [1] 23 9
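The same steps are needed again for PACE, so it can help to wrap them in a function. subset_facility_day() below is a hypothetical helper, not part of the assignment code; it assumes the meta data columns are named Facility and Day as above, and calls droplevels() on the whole data frame, which drops unused levels from every factor column in one call (equivalent to the loop above). The toy data are made up for the check.

```r
# Hypothetical helper: keep the samples of one facility on one day and
# drop unused factor levels from every factor column of the meta data
subset_facility_day <- function(data, info, facility, day) {
  keep <- which(info$Facility == facility & info$Day == day)
  if (length(keep) == 0) stop("no samples for ", facility, " / ", day)
  list(data = data[keep, , drop = FALSE],
       info = droplevels(info[keep, , drop = FALSE]))
}

# Toy data (NOT BEME): 2 facilities x 2 days x 2 mice, 3 OTUs
toy_info <- data.frame(
  Facility = factor(rep(c("PACE", "TRI"), each = 4)),
  Day      = factor(rep(c("Day0", "Day7"), times = 4)),
  Diet     = factor(rep(c("HFHS", "HFHS", "Normal", "Normal"), times = 2))
)
toy_data <- matrix(seq_len(8 * 3), nrow = 8,
                   dimnames = list(NULL, c("OTU_1", "OTU_2", "OTU_3")))

pace_d7 <- subset_facility_day(toy_data, toy_info, "PACE", "Day7")
dim(pace_d7$data)
table(pace_d7$info$Diet)
```

With the BEME objects, the PACE subset would be obtained with subset_facility_day(data.TSS, sample.info, 'PACE', 'Day7'); checking table() of Diet on the result confirms that both diets are represented before fitting the sPLS-DA.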
3.3 Conclusion and discussion
Provide an overall conclusion about the different types of analysis you conducted. Do you think it would be
appropriate to rename this experiment SBEME (Surprising Batch Effect Microbiome Experiment)?
4 Session information
sessionInfo()