Explicet Basic Tutorial
The information below comes from the Explicet tutorial, which is automatically installed on your computer by the Windows, Mac, and Linux installation procedures (Documents or My Documents/Explicet_User_Data on Windows, and Users/Shared/Explicet_Documents on Macs). The tutorial PDF is also available from the Explicet Downloads page.
The sections below build on one another: some examples depend on setup steps from earlier sections, so jumping between non-consecutive sections may skip something you need. Simply scroll down to start at the beginning of the tutorial!
The data used in this tutorial come from an analysis of 16S ribosomal RNA gene sequences obtained from many distinct skin sites of healthy humans (Grice EA, et al. (2009) Topographical and Temporal Diversity of the Human Skin Microbiome. Science 324(5931): 1190–1192). To produce a concise tutorial, the data have been reduced from the original dataset and may not represent the findings of the original study.
I. Begin a New Project
- An Explicet project is a single file that contains all of the OTU data, sample names (a.k.a. library names) and metadata that are to be analyzed as a unit. In other words, all data analyzed for one publication are drawn together into a single Explicet project, independent of how many 454/MiSeq runs are involved.
- We will begin by creating a project and importing an OTU table. The tutorial example we have selected is based on the Human Skin Microbiome paper published by Grice, et al. This example was picked because it is relatively small and has a nice set of intuitive metadata available.
Please do not hesitate to ask questions or make suggestions via our Explicet Forum.
A. Create a New Project
- Open Explicet
- A pop-up window will open with several different options
Click Create Project
Enter Project Name, "Tutorial_HSM", when prompted
Click OK
- We now have a blank project in Explicet, and the name of the current project is displayed in the upper left corner of the current workspace window.
II. Import OTU Data
The first step in a new project is to import the data that come out of the 16S pipeline runs into Explicet. In general, OTU tables are the most convenient form of data commonly generated by pipelines. For detailed information on how OTU tables are formatted, please see the Explicet Handbook. In short, an OTU table is a delimited file (tab-separated or comma-separated) in which each row is an OTU, each column is a sample, and each cell holds the number of times that OTU was seen in that sample.
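As a rough illustration of that layout, the Python sketch below parses a small tab-separated OTU table into per-library counts. The library names (Lib_A, Lib_B), the taxonomy strings, the counts, and the header labels are invented for illustration, and the function name is hypothetical; the exact layout Explicet expects is described in the Explicet Handbook. A summary column named Total is skipped here on the assumption that such totals can be recomputed from the counts.

```python
import csv
import io

# Hypothetical OTU table: rows are OTUs, columns are libraries (samples).
# All names and counts are invented for illustration only.
OTU_TABLE_TEXT = (
    "OTU\tTotal\tLib_A\tLib_B\n"
    "Bacteria/Actinobacteria\t12\t10\t2\n"
    "Bacteria/Proteobacteria\t7\t3\t4\n"
    "Bacteria/Firmicutes\t6\t0\t6\n"
)


def read_otu_table(text, skip_columns=("Total",)):
    """Return {library: {otu_name: count}} from a tab-separated OTU table."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)
    # Column 0 holds the OTU name; the remaining columns are libraries,
    # except summary columns such as "Total", which are skipped.
    library_columns = [
        (index, name)
        for index, name in enumerate(header)
        if index > 0 and name not in skip_columns
    ]
    counts = {name: {} for _, name in library_columns}
    for row in reader:
        if not row:
            continue  # ignore blank lines
        otu_name = row[0]
        for index, library in library_columns:
            counts[library][otu_name] = int(row[index])
    return counts


otu_counts = read_otu_table(OTU_TABLE_TEXT)
for library, otus in otu_counts.items():
    print(library, sum(otus.values()))
```

For a comma-separated file, the same sketch would work with `delimiter=","`.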
Explicet supports many other formats for importing the OTU data. For more details on the other OTU import formats, please see the Explicet Handbook. Later, we will discuss more data management tools that allow you to explore and modify subsets of the dataset without disrupting the larger project.
- Now we will import the data that will belong to the new project. Once data are imported to a project, they are permanently associated with the project. Additional data can be incrementally imported to the same project. Thus, the Explicet project file can grow as a project evolves.
A. Import the OTU Data
File → Import → File → OTU Table Counts
- Select “Tutorial_HSM_OTU_2_Explicet”
Click Open
- A dialog box will open
On this dialog, Explicet tells the user how it is interpreting the rows and columns in the OTU table. The user needs to verify that Explicet has interpreted the table correctly. Note that in this case, Explicet is telling the user that it is not going to import column 2, “Total”, as it will generate that sort of information itself. If Explicet gets it wrong, the user can adjust the interpretation using the provided pull down lists under Column Type.
Click Import
- The OTU data now appear in the main Explicet window
III. Import Metadata
Now we will import the metadata associated with the OTU data. Metadata refers to information about the sequence data - in this case, a description of the samples and subjects from which the sequence data were generated. In our nomenclature, a “library” represents all of the sequences generated from a single sample (multiple libraries may be generated from a given sample, for example through multiple PCR reactions, but for this tutorial we will assume a one-to-one relationship between libraries and samples). In this study, the metadata for each library includes the anatomical position, microenvironment description, sample acquisition method, and side of the body associated with each skin sample. Just like the OTU data, metadata need be imported only once (unless you choose to add more metadata) - imported metadata are also incorporated into the Explicet project file. For detailed information on how to format metadata files, please see the Explicet Handbook. In short, the metadata file is a tab-separated or comma-separated file organized by columns, generally prepared with a spreadsheet package like Microsoft Excel. The first column contains the names of the libraries in the dataset; all subsequent columns are metadata items and their values associated with each library.
A. Import the Metadata
File → Import → Metadata
- Select “Tutorial_HSM_Metadata”
Click Open
- A pop-up window will open
- Make sure that the column containing the library name is selected
- Explicet searches all of the columns in the metadata file looking for the library names that were found when the taxonomy data were imported. In all but rare cases (e.g., when only a small portion of the sample names are present in the imported taxonomy data), Explicet will find the library column automatically.
Click Import
- A new pop-up window will open which displays the imported metadata
Click Done
For our example dataset, all of the library names were found in the metadata file, as indicated in the left-hand pane: i.e., the number under Used (30) matches the total number of libraries shown above the two panes (30 Total Libraries).
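The matching step described above can be pictured with a short Python sketch: read a metadata file whose first column holds the library names, then count how many of the project's libraries were found in it. The file contents, the column names other than Anatomy, and the function names are hypothetical, and this is only a sketch of the idea, not Explicet's own matching logic.

```python
import csv
import io

# Hypothetical metadata file: the first column is the library name and the
# remaining columns are metadata items. All values are invented.
METADATA_TEXT = (
    "Library\tAnatomy\tSide\n"
    "Lib_A\tback\tleft\n"
    "Lib_B\tumbilicus\tmidline\n"
    "Lib_C\tback\tright\n"
)


def read_metadata(text):
    """Return {library_name: {metadata_item: value}}."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    library_column = reader.fieldnames[0]
    metadata = {}
    for row in reader:
        name = row[library_column]
        metadata[name] = {k: v for k, v in row.items() if k != library_column}
    return metadata


def match_libraries(project_libraries, metadata):
    """Split the project's libraries into those with and without metadata."""
    used = [name for name in project_libraries if name in metadata]
    missing = [name for name in project_libraries if name not in metadata]
    return used, missing


metadata = read_metadata(METADATA_TEXT)
used, missing = match_libraries(["Lib_A", "Lib_B", "Lib_C"], metadata)
print(f"Used: {len(used)} of 3 total libraries; missing: {missing}")
```

When the Used count equals the total number of libraries, as in the tutorial dataset, every library has metadata attached.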
IV. Save the Project
- Now that all of the data associated with the project are imported, the file should be saved. Explicet does not auto-save, so remember to save your project frequently!
A. Save the Project
Click the Save button in upper right corner of the window
- Enter desired project name and location when prompted
- The default file name is the project name with an “_Explicet_Project” extension.
Click Save
- All of the imported information is now saved within the project file.
V. Adjust the Display
- Now we will adjust the project window display for ease of use.
A. Hierarchy, OTU, or Both
Both is the default
- This option creates two panes on the workspace screen; the upper pane shows the Hierarchy, and the lower pane shows the OTUs. The Hierarchy pane allows exploration of the dataset in a “big tree” hierarchical context, whereas the OTU pane shows a more literal view of the data from the 16S pipeline. The information in the OTU pane is used for input into the statistics and most of the plots (except for pie charts, which are graphical depictions of the Hierarchy pane).
B. Counts, % of Library, % of Total
Select % of Library (Counts is the default)
- While Counts is the default (raw sequence data counts in integers), % of Library tends to be more useful. % of Library is relative abundance, which is important since the total number of Counts received from any library is beyond our control. Using the relative abundance, or % of Library, allows us to fairly compare libraries. Otherwise, the libraries that have a very large number of counts will skew conclusions.
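To see why relative abundance allows a fair comparison, consider the minimal Python sketch below. Two hypothetical libraries have the same composition but very different sequencing depths; their raw counts differ a hundredfold, yet their % of Library values are the same. The OTU names, counts, and function name are invented for illustration.

```python
def percent_of_library(counts):
    """Convert raw counts {otu: count} to percent of the library total."""
    total = sum(counts.values())
    if total == 0:
        return {otu: 0.0 for otu in counts}
    return {otu: 100.0 * count / total for otu, count in counts.items()}


# Hypothetical libraries: same community, different sequencing depth.
deep_library = {"OTU_A": 900, "OTU_B": 100}
shallow_library = {"OTU_A": 9, "OTU_B": 1}

print(percent_of_library(deep_library))
print(percent_of_library(shallow_library))
```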
C. OTU Displays
- These options control the manner in which the taxonomy lines are displayed on the OTU pane.
OTU Start: 1 is the default
This is the position (counting from one) of the first taxonomic category that the user desires to be displayed. In our tutorial example, the taxonomy lines in the OTU pane display will start with Bacteria (Bacteria is the “1”st lineage level).
Set OTU Width to 2 (“all” is the default)
- This is the number of positions on the line to be displayed. To save space on the screen, now only 2 taxonomic levels will be displayed in the OTU taxonomy line. Taxonomies with more than 2 levels will be shown with an embedded ellipsis; for example, “Bacteria/Actinobacteria/Acidimicrobiia/Acidimicrobiales” becomes “Bacteria/Actinobacteria/…/Acidimicrobiales”.
OTU Show Last on is the default
- This option appends the last item in the taxonomic line onto a truncated OTU lineage.
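The combined effect of OTU Start, OTU Width, and OTU Show Last can be sketched in a few lines of Python. The function below is a hypothetical reconstruction based only on the example given above; how Explicet treats edge cases (for instance, a lineage with exactly one level beyond the width, or Show Last turned off) is an assumption here.

```python
ELLIPSIS = "\u2026"  # the single-character ellipsis used in the display


def display_lineage(lineage, start=1, width=None, show_last=True):
    """Shorten a '/'-separated taxonomy line for display.

    start counts from one; width=None means show all levels from start.
    """
    levels = lineage.split("/")
    first = start - 1
    if width is None or first + width >= len(levels):
        return "/".join(levels[first:])
    shown = levels[first:first + width]
    if not show_last:
        return "/".join(shown)  # assumption: plain truncation
    hidden = levels[first + width:-1]
    if hidden:
        shown.append(ELLIPSIS)  # mark the omitted middle levels
    shown.append(levels[-1])
    return "/".join(shown)


line = "Bacteria/Actinobacteria/Acidimicrobiia/Acidimicrobiales"
print(display_lineage(line, start=1, width=2))
```

With Start 1 and Width 2, this reproduces the shortened line shown in the example above.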
D. Hierarchy Level
Hierarchy Level: 3 is the default
- This controls the number of taxonomic categories that will be opened on the hierarchy pane.
- Since libraries are often cryptically named, it’s nice to add a readable metadata tag in the view so that we have some context for the libraries we are viewing. To do this, we will sort the libraries in the view based on anatomical position.
E. Sort Libraries Based on a Metadata Tag (Anatomical Position)
View → Sort Libraries
- A pop-up window will open
In left panel, select Anatomy
Click Add button between panels
- Name of metadata descriptor will appear in the right panel
Click Sort
- Pop-up window will disappear
- Both the hierarchy and OTU tables are now sorted by anatomical position
VI. Make an OTU Stacked Bar Chart
- Before diving into a detailed analysis, generating an overview of the dominant organisms that exist in the dataset can be useful. One way to do this is through an OTU stacked bar chart.
A. Create an OTU Stacked Bar Chart of the Top 10 Most Prevalent Taxa
Tools → Plot → OTU Stacked Bar
- A new window will appear with the OTU data available in the workspace
Click the Total column header to re-sort the OTUs by decreasing abundance
To display only the top 10 taxa in the project, note that the Total value of the 1st OTU in the column is 31.35
Note that the Total value of the 10th OTU in the column is 1.41
In the Include items between field, enter “1.41” into the first box (the lower bounding limit)
In the Include items between field, enter “31.35” into the second box (the upper bounding limit)
Click Select Range
- The top 10 OTUs are now highlighted
Click Plot
- A new window will appear containing stacked bar display options
To create a stacked bar chart which displays a big picture of the project components, select % of Total
Click OK
- A pop-up window with the OTU stacked bar chart appears
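The range-selection trick used above (read off the Total of the 1st and the 10th OTU, then select everything between those bounds) can be sketched in Python as follows. The OTU names and values are invented, and the function names are hypothetical; the example picks the top 3 of 5 OTUs to stay short.

```python
# Hypothetical "Total" values per OTU, invented for illustration.
percent_of_total = {
    "OTU_A": 40.0,
    "OTU_B": 25.0,
    "OTU_C": 15.0,
    "OTU_D": 12.0,
    "OTU_E": 8.0,
}


def top_n_bounds(totals, n):
    """Return (lower, upper) bounds that bracket the n largest values."""
    ranked = sorted(totals.values(), reverse=True)
    if not ranked or n < 1:
        raise ValueError("need at least one value and n >= 1")
    n = min(n, len(ranked))
    return ranked[n - 1], ranked[0]


def select_range(totals, lower, upper):
    """Return names whose value lies in [lower, upper], largest first."""
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return [name for name, value in ranked if lower <= value <= upper]


lower, upper = top_n_bounds(percent_of_total, 3)
top_three = select_range(percent_of_total, lower, upper)
print(lower, upper, top_three)
```

Because the range is inclusive, OTUs tied with the lower bound would all be selected, so the selection can contain more than n items.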
- We will now change the default title of the stacked bar chart and add axis labels.
B. Change the Title and Label the Axes
In the Plot Results window, click Plot Attributes
- A pop-up window will appear
On the Titles/Axes tab, enter “Top 10 Taxa” into the Plot field
Enter “OTU % of Total” into the X Axis field
Enter “Library Name” into the Y Axis field
Click Save
Plot Attributes window will disappear; labels appear on the plot
- Red and brown appear to be dominant colors in this plot. According to the legend, these colors belong to the “Actinobacteria” phylum. This information may be useful in guiding us toward a hypothesis involving the dominant taxa.
- Saving figures in Explicet is easy and convenient. Figures are saved within the larger project, so they stay linked to the data from which they were created and do not create additional files on your computer.
C. Save the OTU Stacked Bar Chart as a Figure
Click Done in the stacked bar chart Plot Results window
- The OTU Stacked Bar setup window is back on the screen
Click Save as Figure
- Enter stacked bar chart name in pop-up window
Click OK
Click Done
Once saved, the stacked bar chart and associated figure data can be recalled at any point by clicking the Figures button on the workspace window. This provides a convenient mechanism for editing figures during manuscript preparation. Figures can also be exported in a format suitable for further modification in dedicated drawing software.
VII. Make a Pie Chart
- Another useful way to generate an overview of the organisms that exist in the dataset is through a pie chart, which allows graphical depictions of the taxonomic hierarchy.
A. Create a Pie Chart of the Project Components
Tools → Plot → Pie Chart
- A new window will appear with the hierarchical data available in the workspace
- Shift-click all of the phyla in the list
Click Add to Pie
- The selected phyla which were added to the pie are now bold
Click Plot
- A new window will appear containing pie chart display options
To create only a single pie chart displaying the combined libraries’ data, select 30 Total Libraries
Click OK
- A pop-up window with the pie chart appears
- By looking at the pie chart of the phyla, it is clear that the brown wedge, Actinobacteria, is the most prevalent phylum in the data.
- Additionally, we can see that the green wedge, Proteobacteria, makes up the second largest portion of the total. To visualize the classes present within the Proteobacteria phylum, we can create pie chart sub-wedges.
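A pie chart of phyla is, in effect, the OTU counts summed up to one level of the taxonomic hierarchy. The Python sketch below shows that roll-up under the assumption that lineages are "/"-separated with Bacteria at level 1, so that level 2 is the phylum and level 3 the class. The lineages, counts, and function name are invented for illustration.

```python
from collections import defaultdict

# Hypothetical OTU counts summed over all libraries (invented values).
otu_totals = {
    "Bacteria/Actinobacteria/Actinobacteria": 50,
    "Bacteria/Proteobacteria/Alphaproteobacteria": 10,
    "Bacteria/Proteobacteria/Betaproteobacteria": 15,
    "Bacteria/Proteobacteria/Gammaproteobacteria": 5,
    "Bacteria/Firmicutes/Bacilli": 20,
}


def aggregate_by_level(totals, level):
    """Sum counts over lineages truncated to the given level (from one)."""
    summed = defaultdict(int)
    for lineage, count in totals.items():
        key = "/".join(lineage.split("/")[:level])
        summed[key] += count
    return dict(summed)


phyla = aggregate_by_level(otu_totals, 2)    # the wedges
classes = aggregate_by_level(otu_totals, 3)  # candidate sub-wedges
print(phyla)
```

Sub-wedges for one phylum are then the level-3 entries whose lineage starts with that phylum, and their counts add up to the phylum's wedge.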
B. Make a Pie Chart with Sub-Wedges
In the pie chart Plot Results window, click Done
The Taxonomy Pie Chart setup window is back on the screen
- Use the drop down arrow to the left of “Proteobacteria” to find the classes within the phylum
- Shift-click all of the classes in the list
Click Add to Pie
- The selected classes that were added to the pie are now bold
Click Plot
- A new window will appear containing pie chart display options
Again, we will create only a single pie chart displaying the combined libraries’ data, so click OK
- A pop-up window with the pie chart appears. We now see the classes within Proteobacteria represented as sub-wedges.
- In order to better differentiate between the different classes, we can change the color of the sub-wedges.
C. Change Wedge Colors in the Pie Chart
In the pie chart Plot Results window, click Plot Attributes
- A pop-up window will appear
Click on the Colors tab
- To pick a different wedge color, click on the color, and select a new color from the pop-up display
When finished, click Save in upper right corner of window
Plot Attributes window will disappear; changes will be shown on the plot
You may choose to save the pie chart as a figure. To do so, continue as shown earlier in the stacked bar chart example; close the graphics window, and select Save As Figure in the Pie Chart window.
VIII. Create a Workspace
- A workspace is a way for users to run experiments on copies or subsets of their entire dataset, while keeping the original data fully intact.
- Although the skin is a single organ, it harbors microbial communities that live in a range of physiologically and topographically distinct niches. The back is typically a sebaceous region, whereas the umbilicus is often a moist region of the body. Therefore, these two niches may have different taxa present. We will create a workspace for a mini-experiment to compare data from only these two anatomical positions.
A. Create a New Workspace
File → New → Workspace from Current Workspace
- “from Current Workspace” allows us to copy all of the display changes we’ve already made to the new workspace.
- Enter desired workspace name in the pop-up window
Click OK to create the new workspace
- The name of the current workspace is displayed in the upper left corner of the window
IX. Apply a Filter
- To compare data from only the back and umbilicus, we need to separate these libraries from the other body parts. This is done in Explicet via “filters”.
A. Create a Filter
Data → Select Libraries
- New pop-up window appears for creation of filters
Click New on far right side of window
- Enter desired filter name in the pop-up window
Click OK
- The filter name will appear in upper left corner of window
- Now that we have created a new filter, we need to set up the parameters to filter by. We will select for all libraries that were sampled from the “back” or “umbilicus” anatomical sites.
B. Set Up the Filter Parameters
Click Add in the Metadata Criteria pane
Use the first pull-down menu to select “Anatomy” (Metadata to filter by)
Use the second pull-down menu to select “contains” (filter Operator)
Enter “back” into Value
Click Add in the Metadata Criteria pane
- Use the first pull-down menu to select “or”
Use the second pull-down menu to select “Anatomy” (Metadata to filter by)
Use the third pull-down menu to select “contains” (filter Operator)
Enter “umbilicus” into Value
To apply filter, click Select in upper right corner of window
Click Save Filter on far right side of window to keep the filter
Click Done in upper right corner of window
- Pop-up window will disappear
On the current workspace window, Selected Libraries is now selected, and the name of the Current Filter is displayed in the upper left corner of the window. The workspace window now only displays libraries from the 20 back and umbilicus samples.
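The filter built above amounts to a simple boolean rule: keep a library if its Anatomy value contains "back" or contains "umbilicus". A minimal Python sketch of that rule follows. The library names and metadata values are invented, the helper names are hypothetical, and the case-sensitive substring match is an assumption about how the "contains" operator behaves.

```python
# Hypothetical metadata, invented for illustration.
library_metadata = {
    "Lib_A": {"Anatomy": "back"},
    "Lib_B": {"Anatomy": "umbilicus"},
    "Lib_C": {"Anatomy": "volar forearm"},
    "Lib_D": {"Anatomy": "back"},
}


def contains(field, value):
    """Criterion: the metadata field contains value (case-sensitive)."""
    return lambda meta: value in meta.get(field, "")


def any_of(*criteria):
    """Combine criteria with 'or'."""
    return lambda meta: any(criterion(meta) for criterion in criteria)


def select_libraries(metadata, criterion):
    """Return the sorted names of libraries that satisfy the criterion."""
    return sorted(name for name, meta in metadata.items() if criterion(meta))


back_or_umbilicus = any_of(
    contains("Anatomy", "back"),
    contains("Anatomy", "umbilicus"),
)
selected = select_libraries(library_metadata, back_or_umbilicus)
print(selected)
```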
X. Beta Diversity (Morisita-Horn)
- By viewing our libraries in a Morisita-Horn heatmap, we can estimate the similarity of the microbial communities present in the samples at these two anatomical positions. Morisita-Horn is a commonly used metric that gives insight into how similar or how different sets of samples are by looking at the patterns of all of the different OTUs at the same time.
A. Create a Morisita-Horn Heatmap
Tools → Analyze → Beta Diversity → Morisita-Horn
- A new window will appear with a table of the sequence variant counts
Click Plot
- A new window will appear containing the heatmap of Morisita-Horn sequence variant counts
- Note: In our workspace, we have a filter applied, so the heatmap will only display results from our libraries of interest (only those libraries sampled from the back or umbilicus).
Anatomical positions with Morisita-Horn values near 1 (implying the samples’ constituent taxonomy patterns are very similar) appear black. Anatomical positions with Morisita-Horn values near 0 (implying the samples’ constituent taxonomy patterns are very different) appear red. Based on these data, the back is more similar across subjects than the umbilicus. Plot attributes allow control of plot characteristics and color usage as described earlier.
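For readers who want to see what lies behind each cell of the heatmap, the Python sketch below computes the standard Morisita-Horn similarity between two libraries: twice the sum of the products of the paired counts, divided by the product of the two library totals times the sum of the two Simpson-type terms. This is the textbook formula; whether Explicet applies any extra preprocessing is not covered here. The example counts and the function name are invented.

```python
def morisita_horn(x, y):
    """Morisita-Horn similarity between two libraries given as {otu: count}.

    Returns 1 for identical relative abundances and 0 when no OTU is shared.
    """
    total_x = sum(x.values())
    total_y = sum(y.values())
    if total_x == 0 or total_y == 0:
        raise ValueError("both libraries need at least one sequence")
    shared = set(x) & set(y)
    cross = sum(x[otu] * y[otu] for otu in shared)
    simpson_x = sum(count * count for count in x.values()) / total_x ** 2
    simpson_y = sum(count * count for count in y.values()) / total_y ** 2
    return 2.0 * cross / ((simpson_x + simpson_y) * total_x * total_y)


# Hypothetical libraries (invented counts).
back_1 = {"OTU_A": 2, "OTU_B": 2}
back_2 = {"OTU_A": 10, "OTU_B": 10}   # same pattern, deeper sequencing
umbilicus_1 = {"OTU_C": 7}            # shares no OTUs with the back

print(morisita_horn(back_1, back_2))
print(morisita_horn(back_1, umbilicus_1))
```

Note that the index depends on relative abundance patterns, so the two back libraries score 1 despite their different sequencing depths.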
You may choose to save the Morisita-Horn heatmap as a figure. To do so, continue as shown earlier in the stacked bar chart example; close the graphics window, and select Save As Figure in the OTU Heatmap window.
XI. Alpha Diversity
- The alpha diversity statistics computed by Explicet are generally shown in one of two ways: either as a single value calculated at the size of the smallest library (known as the rarefaction point) or as multiple values plotted as collector’s curves for each library. Collector’s curves are the classic way to evaluate the impact of increasing sample size (i.e., more sequencing) on the information content of the dataset. All collector’s curves in Explicet are computed with rarefaction, meaning all libraries are resampled to allow fair comparison between libraries of greatly different size. The higher the resolution of the calculations (more bootstrap iterations, more steps), the slower the computations will proceed. It is recommended that users start with the defaults and then increase them as needed to get the curves to smooth out. A very large number of bootstrap iterations combined with a large number of steps may result in a run of multiple days, so start small and work up.
- The alpha diversity metrics are often quick, reliable ways to determine if samples in a dataset are sequenced adequately. Since we have a workspace set up to run mini-experiments on a subset of our data, we should make sure that the data are representative. We need to make sure that enough sequences were generated from the back and umbilicus samples to be considered representative of the anatomical position for a subject. We can test this by running an alpha diversity test called Good’s Coverage.
A. Run a Good's Coverage Test
Tools → Analyze → Alpha Diversity
- New pop-up window appears
To create curves, deselect Single statistic at Rarefaction point only
Click Bootstrap
When Bootstrap is finished running, click Plot
- A new pop-up window appears which lists the various alpha diversity tests
Select Goods
Click OK
- A new pop-up window appears showing the Good’s Coverage plot
- Since the curves on the plot generally reach asymptotes, we conclude that both sites were sampled reasonably well to be considered representative of the anatomical positions.
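The idea behind the Good’s Coverage curve can be sketched in Python. Good’s coverage is one minus the fraction of sequences that belong to OTUs seen exactly once; a collector’s curve repeats that calculation on random subsamples of increasing size and averages over several iterations. The sketch below assumes subsampling without replacement and evenly spaced steps, which may differ from Explicet’s own resampling scheme; the library, parameter names, and function names are hypothetical.

```python
import random
from collections import Counter


def goods_coverage(counts):
    """Good's coverage: 1 - (number of singleton OTUs / total sequences)."""
    counts = list(counts)
    total = sum(counts)
    if total == 0:
        raise ValueError("library has no sequences")
    singletons = sum(1 for count in counts if count == 1)
    return 1.0 - singletons / total


def rarefy(counts_by_otu, depth, rng):
    """Draw depth sequences without replacement; return subsampled counts."""
    pool = [otu for otu, n in counts_by_otu.items() for _ in range(n)]
    return Counter(rng.sample(pool, depth))


def coverage_curve(counts_by_otu, steps=5, iterations=20, seed=0):
    """Return [(depth, mean Good's coverage)] at evenly spaced depths."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    total = sum(counts_by_otu.values())
    curve = []
    for step in range(1, steps + 1):
        depth = max(1, total * step // steps)
        values = [
            goods_coverage(rarefy(counts_by_otu, depth, rng).values())
            for _ in range(iterations)
        ]
        curve.append((depth, sum(values) / len(values)))
    return curve


# Hypothetical library (invented counts).
library = {"OTU_A": 60, "OTU_B": 25, "OTU_C": 10, "OTU_D": 4, "OTU_E": 1}
for depth, coverage in coverage_curve(library):
    print(depth, round(coverage, 3))
```

A curve that flattens out near 1 as depth increases suggests that further sequencing would reveal few new OTUs.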
You may choose to save your Good’s Coverage plot as a figure. To do so, continue as shown earlier in the stacked bar chart example; close the graphics window, and select Save As Figure in the Alpha Diversity window.
XII. Two-Part Test
- Now that we know our data are representative, we will continue with another statistical test. A Two-Part statistical test can identify taxa that differ between two groups. We will use the Two-Part test to compare sequence counts between the back and umbilicus. The Two-Part Test is a combined statistic that examines both the proportion of the samples that contain a given OTU and the median relative abundance of the OTU across two categories. Because microbiome data often are non-normally distributed, parametric tests such as the familiar t-test may not be appropriate. Consequently, we use a non-parametric Wilcoxon test to examine percent abundance data. For more information on the Two-Part Test, please see:
- Wagner BD, Robertson CE, Harris JK (2011) Application of Two-Part Statistics for Comparison of Sequence Variant Counts. PLoS ONE 6(5): e20296.
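To make the structure of a two-part statistic concrete, here is a Python sketch of one common formulation: a squared two-proportion z statistic for presence/absence is added to a squared, normal-approximation Wilcoxon rank-sum statistic on the non-zero abundances, and the sum is referred to a chi-square distribution with one degree of freedom per usable part. This is a simplified sketch, not Explicet’s implementation: it omits continuity and tie corrections, and the handling of undefined parts is an assumption. The function names are hypothetical; consult the paper cited above for the authors’ exact method.

```python
import math


def average_ranks(values):
    """Rank values from 1, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def two_part_test(group1, group2):
    """Return (statistic, p_value) for two lists of relative abundances."""
    n1, n2 = len(group1), len(group2)
    if n1 == 0 or n2 == 0:
        raise ValueError("both groups need at least one library")
    pos1 = [v for v in group1 if v > 0]
    pos2 = [v for v in group2 if v > 0]
    m1, m2 = len(pos1), len(pos2)
    statistic, parts = 0.0, 0

    # Part 1: do the proportions of libraries containing the OTU differ?
    pooled = (m1 + m2) / (n1 + n2)
    if 0.0 < pooled < 1.0:
        se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
        statistic += ((m1 / n1 - m2 / n2) / se) ** 2
        parts += 1

    # Part 2: do the non-zero abundances differ (Wilcoxon rank-sum)?
    if m1 > 0 and m2 > 0:
        ranks = average_ranks(pos1 + pos2)
        rank_sum1 = sum(ranks[:m1])
        expected = m1 * (m1 + m2 + 1) / 2.0
        variance = m1 * m2 * (m1 + m2 + 1) / 12.0  # no tie correction
        statistic += (rank_sum1 - expected) ** 2 / variance
        parts += 1

    # Chi-square upper tail probability for 2, 1, or 0 usable parts.
    if parts == 2:
        p_value = math.exp(-statistic / 2.0)
    elif parts == 1:
        p_value = math.erfc(math.sqrt(statistic / 2.0))
    else:
        p_value = 1.0
    return statistic, p_value


# Hypothetical relative abundances (%) of one OTU in two categories.
print(two_part_test([0.0, 0.0, 0.0, 0.0], [1.0, 2.0, 3.0, 4.0]))
```

In the invented example, the OTU is absent from every library in the first category and present in every library in the second, so only the presence/absence part contributes.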
A. Run a Two-Part Test
Tools → Analyze → Two-Part
- A new pop-up window appears
- In order to compare the back data against the umbilicus data, we need to set up individual filters for each anatomical position. To do so, we will proceed as described earlier in section IX, “Apply a Filter”.
Click Setup Filters
- New pop-up window appears for creation of filters
Click New on far right side of window
- Enter desired filter name in the pop-up window
Click OK
- The filter name will appear in upper left corner of window
- Now that we have created a new filter, we need to set up the parameters to filter by. We will select for all libraries which were sampled from the “back”.
B. Set Up Filter Parameters
Click Add in the Metadata Criteria pane
Use the first pull-down menu to select “Anatomy” (Metadata to filter by)
Use the second pull-down menu to select “contains” (filter Operator)
Enter “back” into Value
To apply filter, click Select in upper right corner of window
Click Save Filter on far right side of window to keep the filter
- Now we will create a separate filter for the umbilicus.
Click New on far right side of window
- Enter desired filter name in the pop-up window
Click OK
- The filter name will appear in upper left corner of window
- Now that we have created a new filter, we need to set up the parameters to filter by. We will select for all libraries which were sampled from the “umbilicus”.
Click Add in the Metadata Criteria pane
Use the first pull-down menu to select “Anatomy” (Metadata to filter by)
Use the second pull-down menu to select “contains” (filter Operator)
Enter “umbilicus” into Value
To apply filter, click Select in upper right corner of window
Click Save Filter on far right side of window to keep the filter
Click Done to return to the Two-Part test setup window
Select “Back” for the Category 1 Filter
Select “Umbilicus” for the Category 2 Filter
Click Calculate
Click Plot
A pop-up window with the Two-Part results displayed as a Manhattan Plot appears
- The Manhattan Plot displays logarithmically transformed p-values, with higher peaks representing lower (more significant) p-values. The horizontal lines represent p-values of 0.10, 0.05, and 0.01. Inclusion of the p=0.10 line is intended to highlight taxa that are approaching significance in an analysis. The x-axis represents the alphabetical position, by number, of each OTU name in the Two-Part setup dialog above.
- In the Manhattan Plot, the first significant peak (position 6) corresponds to Corynebacterium, which has a higher proportion and relative abundance in the umbilicus samples. The second peak (position 21) represents Propionibacterium, which is present at a higher proportion and relative abundance in the back samples. The third peak, which approaches significance (position 49), represents Anaerococcus. This taxon is not seen in many of the libraries generated from back samples, and thus is present at higher proportion and relative abundance in the umbilicus samples.
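The transformation behind a Manhattan Plot is simple enough to sketch in Python. The example below assumes the usual negative base-10 logarithm (the tutorial says only that the p-values are logarithmically transformed) and numbers the OTUs alphabetically from 1, as described above. The OTU names, p-values, and function names are invented for illustration.

```python
import math

THRESHOLDS = (0.10, 0.05, 0.01)  # the three horizontal lines


def manhattan_height(p_value):
    """Peak height for a p-value, assuming a -log10 transform."""
    if not 0.0 < p_value <= 1.0:
        raise ValueError("p-value must be in (0, 1]")
    return -math.log10(p_value)


def manhattan_points(p_values_by_otu):
    """Return [(position, otu_name, height)] with OTUs in alphabetical order."""
    return [
        (position, name, manhattan_height(p_values_by_otu[name]))
        for position, name in enumerate(sorted(p_values_by_otu), start=1)
    ]


# Hypothetical p-values (invented).
p_values = {"OTU_B": 0.5, "OTU_A": 0.001, "OTU_C": 0.08}

lines = {p: manhattan_height(p) for p in THRESHOLDS}
for position, name, height in manhattan_points(p_values):
    crossed = [p for p in THRESHOLDS if height >= lines[p]]
    print(position, name, round(height, 2), "crosses:", crossed)
```

Under this transform, the p = 0.01 line sits at height 2, so a lower p-value always produces a taller peak.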
Data can be exported from the Plot Results window as tab delimited text using the export button (available in all graphics windows). The data incorporated for each taxon in the Two-Part statistic are summarized for each category. The number of samples with sequences belonging to an OTU within each category is designated “m”, proportion of positive libraries in a category “p”, and median relative abundance “med”.
You may choose to save the Two-Part test as a figure. To do so, continue as shown earlier in the stacked bar chart example; close the graphics window, and select Save As Figure in the Two-Part window.
This tutorial has provided a quick overview of how to use Explicet. For more complete information on Explicet capabilities, please see the Explicet Handbook. We will now save our changes and close the project.
XIII. Close the Project
Click the Close Project button in upper right corner of the window
- A pop-up window will open
Click Save
- The Explicet window will close, and all of the OTU data, metadata, and figures are now saved within the project file.
Thus ends a basic overview of some functions contained in Explicet. Please do not hesitate to ask questions or make suggestions via our Explicet Forum.