SPADE V2.0 - Spanning-tree Progression Analysis of Density-normalized Events
SPADE is a new analytical tool for single-cell cytometry data analysis. It views single-cell data as a high-dimensional point cloud and extracts the shape of the cloud, details described in:
SPADE was originally developed and implemented in MATLAB. The version that generated all figures in the above paper can be found at SPADE V1.0.
In the following, we provide an updated version, SPADE V2.0, with a better graphical user interface and faster running speed. The improvement is achieved by re-implementing the heavy-lifting calculations in C. If you find any bugs or have any questions about the user's manual below, please contact Peng Qiu at peng.qiu@bme.gatech.edu, or qiupeng81@gmail.com.
Updates and Mailing List:
SPADE is still under active development. The software is updated every other month. If you would like to be informed of updates, please join our Google group for SPADE updates, either by following the link or by emailing me. I promise there won't be many emails.
License conditions:
The SPADE software is freely available for academic use. A patent for SPADE has been applied for on behalf of Stanford University. For license conditions, please contact the Office of Technology Licensing at Stanford (Kirsten Leute, kirsten.leute@stanford.edu).
Installation of source code version that requires Matlab:
(1) Requirement: install Matlab 7 or higher.
(2) Install Microsoft Visual C++ 2010 Express: vc_web.exe (a free C++ compiler, url), open Matlab, type "mex -setup", and select the compiler. If you cannot or do not wish to use the suggested compiler, refer to the note below.
(3) Download latest version 2.0 of SPADE: SPADE2_2013_06_27.zip
(4) Unzip the downloaded file, and add the directory of the unzipped files to matlab path.
Note: the bulk of SPADE2.0 is written in Matlab. As long as Matlab is properly installed, that part will run just fine. A few heavy-lifting calculations are written in C for speed. SPADE2 is about 15-20 times faster than the original prototype provided with the above NBT paper. For the C code to run properly, your computer must have a C/C++ compiler that Matlab recognizes. When SPADE is opened, it checks whether the C code is properly compiled, and compiles it automatically if not. As long as your computer has a C/C++ compiler installed (usually yes for PCs, and almost always yes for Mac and Linux), the SPADE software will work fine.
Installation of pre-compiled standalone version in a PC without Matlab:
(1) Install MATLAB Compiler Runtime (MCR) version 7.14. MCRInstaller.exe
(2) Install Microsoft Visual C++ 2010 Express: vc_web.exe (a free c++ compiler, url).
(3) Download pre-compiled version of SPADE2.0: SPADE2_2013_06_27.exe
(4) Double click the downloaded exe to start the software.
Note: unfortunately, I don't have a precompiled standalone version for Mac without Matlab.
Previous Versions:
User Manual:
(1) Create a new folder, and copy into it all the FCS files you want
to analyze together. The data in the FCS files can already be compensated; alternatively, SPADE2 can now apply compensation itself (see the compensation option in step (5)). A sample FCS file can be downloaded [here].
(2) If you use the version that requires Matlab, open Matlab and change directory to
the folder that contains the FCS files to be analyzed. If you use the pre-compiled version, skip this step.
(3) If you use the source code version, type "SPADE", and press enter. The main control window of the SPADE
software will show up. If you use the pre-compiled version, double click the exe file to start the software.
(4) Use the top panel of the GUI to select the directory that contains
the FCS files to be analyzed. Another window will pop-up, which lists all FCS
files in the selected folder. The second column is editable, in case the user
wants to define a short name for each FCS file. Click the "close" button when
you have finished editing the second column.
(5) Use the second panel of the main control window to set up the algorithm
parameters. The parameter settings are stored in a file named "SPADE_paremeter.mat",
in the same directory.
Notes:
- Although users are able to tune the values of the two parameters in the local density calculation (bottom left panel), we advise against changing those parameters. For all the flow and mass cytometry datasets we have analyzed so far, we have used the above default values.
- Compensation option: for fcs files from CyTOF, choose "ignore compensation", because no compensation is needed and no compensation matrix is stored in the file header. For fcs files from flow cytometry, if you choose "apply compensation", SPADE2 derives the compensated data from the compensation matrix and the raw data, so that the software operates on the compensated data. To check whether SPADE2 reads the data correctly, please refer to Note 3 under step (1).
- If arcsinh transformation is needed: for CyTOF data, the cofactor should be 5; for flow cytometry data, the cofactor can be 100, 150, or 200. I typically use 150.
- Which markers to use depends on which cell types/phenotypes we want to see in the SPADE tree. If we want the SPADE tree to represent cell types A, B, C, we need to include the protein markers that define those cell types in the SPADE tree construction; typically, these are the cell surface markers. Be sure to move markers to the list used for the SPADE tree. In the default setting, when the parameter setting window is first opened, no marker is in the list for building the SPADE tree, and this setting will cause an error in the subsequent steps.
- The outlier and target density parameters control how each fcs file is downsampled. The above setting means: we throw away the 1% of cells with the lowest local density, and then choose the downsampling target density such that 20000 cells survive the downsampling process. If the total number of cells (after the 1% is thrown away) is smaller than 20000, no downsampling is performed.
- If multiple fcs files are processed together and each results in 20000 cells after downsampling, the pooled downsampled data will contain a large number of cells, and the subsequent clustering step could be very slow. To guard against that, we have a parameter "Max allowable cells in pooled downsampled data" whose default is 50000. If the pooled downsampled data has more cells than this parameter, we perform uniform downsampling to reduce the number of cells to this parameter.
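The downsampling settings described in the notes above can be illustrated with a small Python sketch. It is not SPADE's own code (which is Matlab and C): the function names are hypothetical, local densities are assumed to be positive numbers computed elsewhere, and the rule that a cell with density d survives with probability min(1, TD/d) is an assumption about how the target density TD is applied.

```python
import random


def choose_target_density(densities, target_cells):
    """Bisect for a target density TD so that the expected number of
    survivors, sum(min(1, TD / d)), is about target_cells."""
    if len(densities) <= target_cells:
        return max(densities)  # every cell is kept
    lo, hi = 0.0, max(densities)
    for _ in range(60):
        mid = (lo + hi) / 2.0
        expected = sum(min(1.0, mid / d) for d in densities)
        if expected < target_cells:
            lo = mid
        else:
            hi = mid
    return hi


def downsample(densities, outlier_fraction=0.01, target_cells=20000, rng=None):
    """Return the indices of cells that survive downsampling of one file."""
    if rng is None:
        rng = random.Random(0)
    # Discard the lowest-density cells (1% by default).
    order = sorted(range(len(densities)), key=lambda i: densities[i])
    kept = order[int(len(densities) * outlier_fraction):]
    # Too few cells left: no downsampling is performed.
    if len(kept) <= target_cells:
        return sorted(kept)
    td = choose_target_density([densities[i] for i in kept], target_cells)
    # Assumed rule: keep a cell with probability min(1, TD / density).
    return sorted(i for i in kept if rng.random() < min(1.0, td / densities[i]))


def cap_pooled(cells, max_cells=50000, rng=None):
    """Uniformly downsample the pooled cells if they exceed max_cells."""
    if rng is None:
        rng = random.Random(0)
    cells = list(cells)
    if len(cells) <= max_cells:
        return cells
    return rng.sample(cells, max_cells)
```

Because survival is random, the number of surviving cells is only approximately equal to the target; the dense regions are thinned the most, which is what makes rare populations more visible after downsampling.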
(6) After closing the parameter setting window, run SPADE by clicking the buttons in the third panel of the main
control window. You can click the three small buttons sequentially (please wait
until one button finishes before clicking the next one). Or, you can click the
bigger button, which is equivalent to clicking the three smaller buttons
sequentially.
Note: SPADE will produce a number of .mat files to store intermediate results.
One .mat file for each FCS file, plus two other .mat files.
SPADE2_2013_03_29.zip
SPADE2_2013_03_29.exe
SPADE2_2013_02_22.zip
SPADE2_2013_02_22.exe
SPADE2_2012_11_15.zip
SPADE2_2012_11_15.exe
SPADE2_2012_06_16.zip
SPADE2_2012_06_16.exe
SPADE2_2012_04_02.zip
SPADE2_2012_04_02.exe
Note 1: For raw fcs data files generated by CyTOF, SPADE2 can handle them directly. If you would like to use FlowJo to perform some initial gating and export the gated data to SPADE, please use the latest version of FlowJo V10, and export in FCS3.0 format. Otherwise, the exported fcs file won't be opened correctly by SPADE2.
Note 2: For fcs files generated by flow cytometry, SPADE2 is now able to read the compensation matrix embedded in the file header and apply it to obtain compensated data.
Note 3: Since different flow cytometers and software packages may save fcs files in slightly different formats, it is difficult to make sure that SPADE2 works well with all fcs files. Therefore, we now provide a way for users to check whether SPADE2 "sees" the correct (compensated) data. During the computation of local densities in step (6), SPADE2 creates a folder named "check_loaded_data". The folder contains one tab-delimited txt file for each fcs file. Each txt file contains the data for the first 10 cells/events, and can be opened in Excel to show the data that SPADE2 sees. If you load an fcs file into FlowJo and export it in CSV format, you can use Excel to see the uncompensated and compensated data that FlowJo sees. Comparing the txt from SPADE2 with the csv from FlowJo will confirm whether SPADE2 loads the data correctly. If yes, we are good to go. If not, send me a message and I might be able to help.
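The comparison described in Note 3 can also be done with a short script instead of Excel. The Python sketch below is only an illustration with hypothetical function names. It assumes that both files start with a single header row, list the channels in the same column order, and contain only numeric values afterwards; the actual layout of the files written by SPADE2 and FlowJo may differ and should be checked first.

```python
import csv
import io


def read_rows(text, delimiter, max_rows=10):
    """Parse delimited text; return (header, first max_rows numeric rows)."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader)  # assumption: exactly one header row
    rows = []
    for row in reader:
        if len(rows) == max_rows:
            break
        if row:  # skip blank lines
            rows.append([float(value) for value in row])
    return header, rows


def max_abs_difference(spade_txt, flowjo_csv, max_rows=10):
    """Largest absolute difference between the first max_rows events."""
    _, spade_rows = read_rows(spade_txt, "\t", max_rows)
    _, flowjo_rows = read_rows(flowjo_csv, ",", max_rows)
    same_shape = len(spade_rows) == len(flowjo_rows) and all(
        len(a) == len(b) for a, b in zip(spade_rows, flowjo_rows)
    )
    if not same_shape:
        raise ValueError("the two files do not have the same rows/columns")
    return max(
        (abs(u - v) for a, b in zip(spade_rows, flowjo_rows) for u, v in zip(a, b)),
        default=0.0,
    )
```

To use it, read the txt file from the "check_loaded_data" folder and the csv exported from FlowJo into strings (for example with open(path).read()) and pass them to max_abs_difference. A value near zero suggests that SPADE2 and FlowJo see the same data.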
- For the particular sample FCS file provided here (2-marker simulated CyTOF data, FCS3.0 format, no compensation), the appropriate setting is shown in the above figure:
  - use both markers to construct the tree;
  - use cells in all files (only one here) to construct the tree;
  - use arcsinh and cofactor=5 to transform the data;
  - choose the outlier and target densities as indicated above;
  - choose the number of desired clusters to be 100.
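For readers unfamiliar with the arcsinh transformation mentioned in these settings, here is a minimal Python sketch. It assumes the usual cytometry convention in which the raw intensity is divided by the cofactor before taking the inverse hyperbolic sine; the function name is hypothetical and this is not SPADE's own code.

```python
import math


def arcsinh_transform(values, cofactor=5.0):
    """Assumed convention: transformed = asinh(raw / cofactor)."""
    return [math.asinh(value / cofactor) for value in values]


# Cofactor 5 is the value suggested above for CyTOF data.
raw = [0.0, 5.0, 50.0, 500.0]
for value, transformed in zip(raw, arcsinh_transform(raw, cofactor=5.0)):
    print(value, round(transformed, 3))
```

The transformation is roughly linear for intensities well below the cofactor and log-like for intensities well above it, which is why a larger cofactor (100 to 200) is suggested for flow cytometry data.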
(7) Finally, to visualize the SPADE result, click the bottom button in the main control window. The following window will show up.
Meaning of the plots:
- The upper left panel shows the SPADE tree.
- The bottom panels are the dot plot and contour plot for the pooled downsampled
cells. The axes are the values after the compensation and transformation defined in the parameter setting window. Since these panels are based on downsampled data, in which the rare populations are enhanced, the contour plot here may look different from the contour plots drawn by FlowJo or other software.
Editing the tree:
- The tree nodes can be selected by mouse. The software also supports the use of
"Ctrl" and "Shift" keys together with the mouse. The numerical indices of the selected
nodes are listed in the upper right editbox (next to the button named "add to
annotation"). The selection can be changed by
editing the content of that editbox.
- Selected nodes can be dragged around by mouse.
- Using the buttons in the edit SPADE tree panel, we can change the position of the selected nodes: rotate, expand and shrink.
- We can globally change the node size of all the tree nodes.
- By clicking the "Update plots" button in between the two bottom plots, cells in
the selected nodes will be overlaid on the biaxial plots.
- By clicking the "Export selected cells to fcs" button, a series of fcs
files will be generated, each containing the cells in the selected nodes for one fcs file. This
function can be slow if the number of fcs files in this folder is large.
Annotate the tree:
- If a set of nodes is selected and the user wants to draw a bubble around
it to annotate this part of the tree, use the "add to annotation" button to save the selection.
- The saved annotations are displayed in the listbox in the second panel of the
right half of this window.
- The user can choose whether to display the annotations or not, using the three
radiobuttons below the listbox for annotations.
Color the tree and interpretation:
- Select "expr" as color definition + select a marker used for tree construction + POOLED file: the tree shows how the selected protein marker behaves across the entire tree. If the numerical range of the colorbar is comparable to the range of the axes in the bottom panels (less than about a 5-fold difference), the cluster average of the marker is not drastically different from its expression in individual cells, and the color variation is interpretable. Otherwise, if the numerical range of the colorbar is too small (e.g., 20 times smaller than the range in the biaxial plot), the color variation is not interpretable: the selected marker has little correlation with the other markers used to build the tree, and contributes little to the tree structure.
- Select "expr" as color definition + select a marker used for tree construction + individual file: this selection does not make much sense, because the resulting colored tree will be extremely similar to the one obtained when the POOLED file is selected.
- Select "expr" as color definition + select a marker not used for tree construction + POOLED file: the tree again shows how the selected protein marker behaves across the entire tree. If the selected marker is highly correlated with the ones used to build the tree (as if it could be included in defining the phenotypes), the numerical range of the colorbar will be comparable to the range of the axes in the bottom panels (less than about a 5-fold difference), and the colored tree has the same interpretation as above. This typically works for protein markers whose behavior does not change across the fcs files. For proteins that behave differently across different fcs files (i.e., experimental conditions), we should not use the POOLED file; it is better to use an individual file, as in the next bullet.
- Select "expr" as color definition + select a marker not used for tree construction + individual file: this selection shows the behavior of one protein marker in a specific fcs file (experimental condition). It is helpful for comparing the marker's behavior across conditions.
- Select "ratio" as color definition + select a marker not used for tree construction + individual file + another individual file for reference: this selection takes the behaviors of the selected marker in the two individual fcs files, and shows their difference. It is a subtraction in the transformed data. Since the arcsinh transformation is log-like, subtraction in the transformed domain can be viewed as a ratio in the original intensity scale.
- Select "ratio" as color definition + select a marker used for tree construction: this does not make sense, similar to the second point above.
- Selecting "cell freq" as color definition + an individual file is equivalent to selecting "expr" + "CellFreq" marker + individual file. This option colors the tree by the percentage of cells in each node for a selected file. This shows which parts of the tree are occupied by cells in the selected file.
- Selecting "ratio" as color definition + "CellFreq" marker + individual file + another individual file as reference colors the tree by the difference in the percentage of cells in each node between the two selected files. It is equivalent to taking the two colored trees from the above bullet and displaying their difference in percentages.
- For the options "expr" and "ratio", if the color is defined based on an individual file and no cells in this file
belong to a certain tree node, the node will be colored as white.
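The statement that a subtraction in the arcsinh-transformed domain behaves like a ratio on the original intensity scale can be checked numerically. The Python sketch below is only an illustration: the function names and the example intensities are made up, the transform is assumed to be asinh(intensity / cofactor), and marking a node as uncolored when either file has no cells in it is an assumption that extends the white-node rule above to the reference file.

```python
import math


def arcsinh(x, cofactor=150.0):
    # Assumed convention: asinh(intensity / cofactor).
    return math.asinh(x / cofactor)


def ratio_color(node_expr, node_expr_ref):
    """Per-node difference of transformed expression between two files.

    Each argument maps a node index to the node's average transformed
    expression in one file. A node missing from either file gets None,
    standing in for a node drawn in white.
    """
    colors = {}
    for node in sorted(set(node_expr) | set(node_expr_ref)):
        a = node_expr.get(node)
        b = node_expr_ref.get(node)
        colors[node] = None if a is None or b is None else a - b
    return colors


# Bright marker: the difference is very close to log(30000 / 10000) = log(3).
bright = arcsinh(30000.0) - arcsinh(10000.0)
# Dim marker (intensities below the cofactor): the transform is nearly linear
# here, so the difference is far from log(3) although the raw ratio is also 3.
dim = arcsinh(30.0) - arcsinh(10.0)
print(bright, dim, math.log(3.0))

colors = ratio_color(
    {1: arcsinh(30000.0), 2: arcsinh(500.0)},
    {1: arcsinh(10000.0)},
)
print(colors)
```

The log-ratio reading of the "ratio" coloring therefore holds best for nodes where the marker is well above the cofactor in both files; near zero the difference is closer to a plain subtraction of intensities.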
Export results:
- Export tree figures colored by expression of each marker based on the pooled
data, into one single file "SPADE_tree_colored_by_markers.ps".
- Export tree figures colored by cell frequency computed based on each individual file,
into one single file "SPADE_tree_colored_by_CellFreq.ps".
- Export the topology and layout of the tree into GML and txt format, "SPADE_tree.gml",
"SPADE_tree_adjacency.txt" and "SPADE_tree_node_positions.txt".
- Export the average expression for each marker and each node/annotation ("SPADE_tree_node_annot_marker_expr.txt"),
counts of cells that belong to each node/annotation for each fcs file ("SPADE_tree_node_annot_cell_freq.txt"),
and definitions of the annotations ("SPADE_tree_annotation_definition.txt").
- Export the clustering results. For each fcs file, the software exports another
file, which contains an additional column that describes which node of the SPADE
tree each cell belongs to. This function might be slow if there are a large
number of fcs files in this analysis.
Note: upon closing this window, edits of the tree layout and the annotations will be saved, so that the same figures will show up the next time this window is opened.
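Once the clustering results are exported, the node assignment of each cell can be turned into the per-node cell percentages used by the "cell freq" coloring. The Python sketch below is a hedged illustration with hypothetical function names. It assumes the node column has already been read from the exported file into a list of integers (reading fcs files is not shown) and that nodes are numbered from 1.

```python
from collections import Counter


def cell_frequency(node_of_cell, n_nodes):
    """Percentage of a file's cells that fall in each node (nodes 1..n_nodes)."""
    total = len(node_of_cell)
    if total == 0:
        return [0.0] * n_nodes
    counts = Counter(node_of_cell)
    return [100.0 * counts.get(node, 0) / total for node in range(1, n_nodes + 1)]


def frequency_difference(node_of_cell, node_of_cell_ref, n_nodes):
    """Per-node difference in percentages between a file and a reference file."""
    freq = cell_frequency(node_of_cell, n_nodes)
    freq_ref = cell_frequency(node_of_cell_ref, n_nodes)
    return [a - b for a, b in zip(freq, freq_ref)]


# Hypothetical assignments for two small files and a 4-node tree.
file_a = [1, 1, 2, 3]
file_b = [1, 2, 2, 2]
print(cell_frequency(file_a, 4))
print(frequency_difference(file_a, file_b, 4))
```

Working in percentages rather than raw counts keeps files with different numbers of cells comparable.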
(8) When selecting the working directory from the main control window, if the selected directory is already analyzed, the user can directly view the results, by clicking the bottom button in the main window.
(9) If you have matlab, there is a command line version that can replace the parameter-setting and run-SPADE buttons in the main control window. Copy "SPADE_script_for_the_SimulatedRawData.m" to the same folder that contains the above example fcs file, and run this script. After the script is finished, invoke the GUI to visualize the results.