Single-Cell Analysis ==================== Go to the bottom of the page for a list of useful links to packages / algorithms used in this module of PalmettoBUG. Loading a Single-Cell Analysis from Image Processing ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ To pick where we left of in the image processing documentation, we had just completed region measurements, and were going to load an analysis: |image1| When the “Go to Analysis!” button is clicked, PalmettoBUG will attempt to load the analysis folder and should launch a new window containing the Analysis_panel and metadata tables: |image2| In these windows, our mission – should we choose to accept it – will be to categorize the antigens in the dataset and assign key metadata categories, particularly those pertaining to the groups being compared in the experiment (treatment vs. control). Specifically, in the first table (Analysis_panel) there are two editable columns, ‘antigen’, and ‘marker_class’: 1). ‘antigen’ column: this is here mainly for when loading an analysis from FCS files directly, as when loading from image processing, these names are usually already correct and don’t need editing. This column will contain the names of the antigens as displayed in plots, and in the GUI options. You want these names to be biologically relevant, and not confusing or biologically irrelevant labels like “Nd142Di”. .. 2). ‘marker_class’ column: This column can have three values – ‘type’, ‘state’, or ‘none’ – these are categories of antigen which are important for the single-cell analysis. Antigen’s that you don’t want included in most plots, such as segmentation markers like DNA1/2, should be set to ‘none’. Antigens that you want to use as lineage marker for clustering the cells should be set to ‘type’. Finally, antigens that you wish to use as indications of the state of a given cell type, or if you want to do statistics on any antigen’s expression after clustering, you set to ‘state’. This convention is inherited / borrowed from :ref:`CATALYST `. For the second (metadata) table, there are again two columns that can be edited, ‘patient_id’ and ‘condition’, but instead of assigning values to the antigens in the dataset, we are assigning values to the images / regions of interest / individual FCS files in the dataset: 1). ‘patient_id’ column: this column can be used to apply an additional grouping to the images / FCS files that is not the treatment condition of interest. For example, if a set of images all came from the same patient (hence the column’s name), then that could be noted here. However, it does not have to be only patient id’s in this column – any valid grouping could be considered, including things like data collected / batch numbers. .. 2). ‘condition’ column: This column should contain the treatment vs. control, healthy vs disease state, or other comparison of interest in the experiment. It is used to group the data when performing certain comparisons and statistics. When you press the “Accept Choices” button, both of these tables are written as CSV files to the analysis folder (as Analysis_panel.csv and metadata.csv inside the */main* sub-folder) and you should be immediately moved into the program’s Analysis tab: |image3| .. note:: If you re-load a previously created Analysis, then the *Analysis_panel* and *metadata* files will be read from the analysis directory, so you do not have to recreate these files every time an analysis is reloaded. However, you will still be offered the opportunity to examine & edit the files. Loading a Single-Cell Analysis from the starting screen: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Loading from the starting screen requires that you select the folder containing the analysis. As covered in the Getting Started / Loading data documentation page, the key requirement is that within the folder you select there is a */main/Analysis_fcs* folder containing the FCS files of your data (and only those FCS files). If you are doing a solution-mode experiment, then you will need to manually step up these folders, and then load the directory. For imaging projects in PalmettoBUG, every sub-folder of */Analyses* (i.e., every individual analysis made for the imaging project) will be a valid folder for loading in this way. .. note:: This means that you can load the single-cell analysis portion of an imaging project in 2 ways – either from the starting page, loading from FCS & selecting the analysis sub-folder, or by loading entire imaging project then entering the analysis from the image processing tab as described above. |image4| Once you have selected the correct folder, the loading process is very similar as above – you will provide the information for the two tables (*Analysis_panel* and *metadata*) in an identical pop-up window. Again, if you are reloading the analysis, then the prior *Analysis_panel* and *metadata* files will be used so you do not have to re-do or edit them (unless you want to). Beginning a Single Cell Analysis: initial plots ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The analysis tab of PalmettoBUG has its buttons arranged in a number of major regions: |image5| Here I’ll start by showing some preliminary plots you might make – countplots, MDS plots, plus a heatmap & smoothed histogram of marker expression in the ROIs: |image6| These plots are all mainly concerned with broad patterns in the data, such as whether your treatment and control groups have identifiably different expression patterns, as well as the identification of outliers that may need to be dropped before proceeding with the analysis. For example, you might consider dropping a sample / ROI (as in, a unique sample_id) from the analysis if it has too few cells in it, is distant on the MDS plot from all the other sample_id’s, and/or has a highly divergent marker expression pattern on the heatmap/smoothed histogram plots. These plots can also give you a indication of whether batch correction might be needed, such if the samples in your patient_id clusters tightly together, indicating that whatever grouping you placed in patient_id (whether batch or patient numbers, or any other confounding variable) may be influencing the data more than your variable of interest in the condition column. Do note, however, that these plots are capturing coarse, global differences between the samples, and don’t yet take into account cell type abundances and other finer details – so even if your conditions don’t show any clear separation in their expression profiles at this step, there might still be differences once you get into cell clustering and annotation. There is one more preliminary plot to look into: the NRS plot. Like many of the function in PalmettoBUG, it was borrowed from the CATALYST analysis package (https://github.com/HelenaLC/CATALYST). The NRS (non-redundancy score) plot seeks to identify the antigens in the data set that contain the most information or are the least redundant with other antigens in the experiment. It can give an indication if a marker will be useful clustering the cells or not. |image7| .. attention:: The preliminary plots that deal with marker expression (all except the countplot) are sensitive to how the data is scaled / batch corrected! The expression patterns / NRS scores could change if you intend on scaling the data. .. attention:: For all plotting windows in PalmettoBUG (and also for windows that create data tables), you will typically be prompted for a filename. **A default filename is always present as an example, but might not match the plot you are actually intending on making!** Be sure to rename these as needed, and to avoid overwriting prior plots. Scaling and Cleaning Data ~~~~~~~~~~~~~~~~~~~~~~~~~ Now that we’ve looked at some of the preliminary plots that can be generated in PalmettoBUG, we can turn to functions and buttons for cleaning the data. |image8| There are four major actions in this bank of buttons: 1). Reloading the experiment. This will reset the experiment by re-reading from the files on the disk, both the FCS files and the metadata/Analysis_panel csv’s. This **undoes any non-permanent data transformations**, **including** all three of the other functions below (**batch correction, scaling, filtering**) **as well as clearing any cell groupings and dimensionality reductions**, while also taking in any changes made to the metadata and Analysis_panel csv’s. For example, if you realize you made a mistake in labeling a the sample’s condition, or if you wanted to change an antigen’s marker_class you would want to edit the appropriate csv file, then reload the experiment – but then remember to re-do any scaling / data filtering, etc. that you needed for the experiment. .. 2). Filtering the data (or dropping data). This windows allows the selection of a grouping in the data – whether that’s sample_id, patient_id, condition, or even a cell clustering/annotation – and drop one of the unique groups within that grouping out of the data. This is commonly done to remove outlier samples or to drop cell types that are not of interest (erythrocytes in a solution-mode immune dataset for example). .. important:: Data dropped from an analysis in this way can only be restored by reloading the analysis! 3). Batch correction. ComBat batch correction in PalmettoBUG is performed by a thin wrapper on the scanpy function that performs this style of batch correction, you can read their documentation for more information (https://scanpy.readthedocs.io/en/stable/api/generated/scanpy.pp.combat.html). .. important:: If a batch correction step is performed BEFORE any scaling of the data, then it can only be un-done by reloading the experiment. However, if a batch correction step is performed after scaling the data, then a second scaling of the data (including unscaling) will discard that batch correction. .. 4). Scaling. I address scaling last because there is the most to say about it. PalmettoBUG lets you perform a few different methods of scaling which I will address below. Unlike filtering & batch correction, scaling can always be un-done without reloading the experiment, as the data before scaling is preserved, allowing easy switching between different scaling options. However, still note that when an experiment is reloaded the data will automatically be in its unscaled state again (just as it will also be un-batch corrected and all dropped data will be restored). **Scaling details** The scaling option in PalmettoBUG scales data within in channel / antigen of the data. There are several different ways to scale the data inside PalmettoBUG: .. Important:: The data is always **first** transformed using the data = arcsinh(data / 5) transformation commonly used in mass cytometry – **the scaling options are all done on the arcsinh transformed data**. .. 1). Min-max scaling. This simply scales each channel such that its maximum value is 1, and its minimum value is 0, and all the intermediate values are between 0 and 1. This method preserves the distribution of the original and does not attempt to deal with outliers within a channel. For example, if a channel has a few extremely bright cells, but mostly cells with dim expression, then those bright cells will be close to 1 in value, while all the other cells in the data may have values compressed around 0. **The main advantage** of this method is that is places all the channels “on the same playing field”. As in, every channel will now have value in the same range (0-1), unlike the raw data where an antigen could end being much brighter than another. For certain methods (such as FlowSOM clustering) that use the “distance” between cells in expression-space, having some channels with much higher values would mean that those channels could dominate the clustering, while dimmer channels are ignored. 2). %quantile scaling. This form of scaling is very similar to min-max, with one difference: the channels are not only scaled to values between 0-1, but extremely bright cells (those above a provided quantile threshold) do not influence the distribution of data as much. Specifically, any cells with higher values than the value at the provided quantile (often 99.9%, but can be chosen by the user) will all be set == 1. This can help deal with extremely bright outlier cells in the data, while still provided the benefit of min-max scaling by placing all the channels in the range 0-1. 3 & 4). Standard and Robust scaling. These methods seek to center the data at 0, with variance of +/- 1. Standard scaling assumes a normal distribution (using mean + standard deviation to scale), while the robust scaled uses quantile values and is therefore less influenced by outliers. However, unlike the previous two scaling methods, these do not guarantee data values in a particular range – outlier cells are not removed with these methods and can have different maximum values in different channels. For more information about these scaling options as well as min-max scaling see this helpful page in scikit-learn’s documentation: `https://scikit-learn.org/1.5/auto_examples/preprocessing/plot_all_scaling.html#sphx-glr-auto-examples-preprocessing-plot-all-scaling-py `__ (note that quantile transformer in that link is NOT qnorm!). 5). Quantile Normalization (qnorm). This type of scaling uses a rank-based procedure (see wikipedia https://en.wikipedia.org/wiki/Quantile_normalization and/or the python package being used https://github.com/Maarten-vd-Sande/qnorm) to aggressively reshape the data from every channel into both a similar distribution and the same range of values at the same time. **While scaling was covered in this second section of the documentation, it is frequently performed first, before the preliminary plots are made.** This is in part because it can affect the outputs of these preliminary plots, so it is best to do the scaling first. However, the preliminary plots may still be useful to create with unscaled data as that could help indicate whether scaling is needed or not. *Examples of the effects of scaling on the data (using violin plots – more on those in the clustering plots):* |image9| Dimensionality Reduction (PCA, UMAP) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ One frequently important step in analyzing high-dimensional single-cell data Is visualizing the data using a dimensionality reduction algorithm. In PalmettoBUG, the two dimensionality reduction algorithms available are PCA or UMAP. UMAP tends to produce more visually distinct and interpretable groups and shapes once plotted, while PCA is faster to run and may be easier to interpret the individual X & Y value, as these are the top two principal components in the data. Dimensionality reductions are typically run on a down-sampled subset of the data, where a set number of cells is sampled from each FCS / ROI in the dataset (or the entire FCS is used if the down-sample number is greater than the number of cells in that FCS). This is for a few reasons, 1). UMAP can be computationally expensive so fewer cells makes it run faster, 2). This helps prevent excessively imbalanced representation of the samples in downstream plots and 3). the final dot plots created after dimensionality reduction can easily get over-crowded and hard to interpret, so limiting the number of cells can help prevent that. This down-sampling is also applied when running PCA, even though PCA is a much faster algorithm. *If you don’t want this down-sampling, set the down-sampling number to be greater than the number of cells in the largest FCS / ROI.* Once the dimensionality reduction is created, it can be plotted either for the whole dataset or as a set of facetted subsets of the dataset. For example, you can plot a separate UMAP for each condition in the dataset, to see if the cells in the same condition form groups. Dimensionality reductions are also very helpful after clustering (see next section), as they can help visualize the cell clustering and indicate how distinct the discovered cell clusters are. *Example of Dimensionality Reduction plots:* |image10| Clustering and Annotation ~~~~~~~~~~~~~~~~~~~~~~~~~ Once you’ve had a first look at the data and did any needed data cleaning, it is time to try to cluster the cells into groups. In PalmettoBUG this is mainly done in one of two different unsupervised clustering algorithms, both being non-deterministic and requiring a random seed: 1). FlowSOM clustering. Like the CATALYST workflow that PalmettoBUG was originally modeled on, FlowSOM clustering is a major and fast way to group cells into clusters. This method work by initializing a self-organizing map (SOM) as a grid of points within the data, that will then be fit to the shape of the data (see an example of what that looks like: https://upload.wikimedia.org/wikipedia/commons/6/61/2D_data_training_SOM.gif ). The grid points of the SOM are then grouped together into a preset, user-specified number of “metaclusters” by an automatic merging process to produce the final clustering groups. .. 2). UMAP + Leiden clustering. In this method, a UMAP embedding is performed using all the cells in the dataset (this can be a slow process), followed by Leiden clustering on that UMAP embedding. In this style of clustering, the final number of groups is determined by the algorithm itself, and not preset by the user to a specific number of final clusters. This means that getting the hyperparameters of this style of clustering can be important to prevent an excessive (or too few) set of clusters being generated that is difficult to annotate. There is one major alternative way to group cells, based on pixel classification, but that is mainly discussed in the documentation about pixel classifiers (if you are interested in that method see here: :doc:`UsingPixelClassifier` ). Just note that if a pixel classifier is created in an image-based project, and cell masks are classified into groups from its output, those cell classes can be loaded into a single-cell analysis created from the same cell segmentation masks, and used instead of the primary pipeline of FlowSOM / Leiden + annotation. *FlowSOM and Leiden windows in the GUI:* |image11| **Plotting the Clusters** Once a clustering has been created, the next step is to visualize what that clustering looks like. There are variety of useful plots for this: two types of heatmap, violin plots, dimensionality reductions, and smoothed histograms. The main purpose of these plots are to see if the clusters generated are reasonable and align with the biological expectations (identifiable cell lineages) as well as to assist in the annotation and merging of the clusters in those final biological labels. *My favorites for annotation are the violin plots, dimensionality reduction plots, and heatmaps* (Note how the violin plots are HUGE, and typically require some zooming in to see the details, but that excellent level of details is why they are so useful): |image12| |image13| In addition to the plots shown above, you can look at statistical comparisons of marker expressions between the groups as an additional way to gauge what each cluster is. This is done by comparing the cell expressions of antigens in each cluster and comparing those with the expressions in the rest of the dataset (using an ANOVA test to compare the two for each cluster one-by-one). The statistics from this can then either be exported in a table on a cluster-by-cluster basis, or used to create a heatmap of the statistical significance of the difference in expression between groups. ] |image14| .. note:: Because the clustering algorithms used are non-deterministic, it is valuable to test multiple different random seeds to see if the clustering achieve is consistent across those seeds, or if it is unstable and shifts dramatically between the seeds. Annotation ~~~~~~~~~~ Once you have examined your cell clustering and are satisfied with their quality and stability, the next step is to use the plots you have generated to annotate the clusters into identifiable, biologically relevant lineages. The output of the unsupervised clustering algorithms in the prior section is just to assign every cell to a number which by itself carries no explicit biological meaning. Our job now is to identify what cell type each clustering number is most likely to be. Usually, this is accomplished by identifying the cells high in a particular marker (say CD4) and assigning them to the expected cell type that expresses that marker (such as CD4+ T cells). To do this in PalmettoBUG, we use a merging table, where the first column contains every cluster’s number and the second column contains a biological name that matches the cluster. If the same biological name is used for more than one cluster number, then those two clusters are merged into a single group with the assigned biological label. In fact, it is preferred when FlowSOM clustering produces somewhat MORE clusters than there are relevant cell types (so called “overclustering”), as having more clusters increases the likelihood of identifying rare cell types, while the merging process during annotation can easily fix the excess / redundant of clusters. This advantage of overclustering is in tension with not wanting to over-complicate the annotation process for the scientist, so often a slight / moderate over-clustering is aimed for. *Example merging table:* |image15| Note that in the merging table window, there is both an option for choosing to merge either a metaclustering (from FlowSOM) or a leiden clustering, since both can be present at once in an experiment. Additionally, if you have performed any merging earlier in the same analysis, there is a drop down at the top of the merging window that allows you to select your prior merging and load it into the window (the values from the that merging with populate the widgets of the table). This allows you another way to easily reload a cluster merging, without needing to re-do everything. Once an annotation of the cells has been made, it is possible to make all the same plots as above (for cell clusterings), just now using the merged & annotated labels\ : |image16| Additionally, the abundance plot is available – while this plot type could be used for a metaclustering / leiden, because they are dedicated to visualizing the differing quantities of cell groupings across the conditions of the experiment, they are usually saved for the final merging: |image17| **Saving and reloading annotations** To help save time, and allow you not have to redo clustering every time you re-enter or reload an experiment, PalmettoBUG allows you to save a cell clustering or annotation. |image18| When saving, if a cell grouping is available to save, then it will appear in the drop down as an option, and similarly when loading, if a saved clustering is available it will show up in the drop down menu (this works by looking in the expected */main/clusterings* folder in the analysis directory for any saves). When a grouping is saved, it is saved in a csv file using the type of grouping (metaclustering, leiden, merging, classification) + the identifier in the text field below the drop-down menu in order to name the file. This means that if you want to save two different metaclusterings, for example, you will need to save with two different identifiers (often, this would be the random seeds of the FlowSOMs). When reloading, note that the data ideally should be the same state (the same batch correction, scaling, etc.) as when the grouping being loaded was originally saved. If not, you will likely get an error or warning message, depending on if the difference was enough to block the reload. For example, if you dropped a sample, ran the FlowSOM clustering, and then saved that (meta)clustering, if you reload the analysis you MUST drop the sample again to reload the clustering – because the number of clustering labels and the number of cells would then not align and the program will refuse to load it. The same window that handles clustering save / load also handles the loading of cell type annotations derived from a pixel classifier – if you are interested in that capacity, go to the pixel classification documentation pages! I’ll only mention here that if a pixel-derived classification does exist, it should be visible in the drop down, and that if you do use this then it will be referred to as “classification” by the Analysis tab's widgets, and can be saved, used to make plots, statistics, etc. the same as any other cell grouping. Statistics ~~~~~~~~~~ In PalmettoBUG, there are two major types of statistical test available: comparison of cell type abundance between conditions, and comparison of antigen expression within cell types between conditions. These two major types of test are inherited from the CATALYST / diffcyt workflow on which PalmettoBUG was modeled / derived. However, due to the differences in statistical tests easily available in Python vs R, PalmettoBUG implementation tends to rely on simpler tests (for example, PalmettoBUG only offers ANOVA or Kruskal-Wallis tests for expression comparisons, and does not offer GLMM models for either comparison type). For abundance tests, PalmettoBUG can do 1). ANOVA or a Gaussian GLM on the portion of cell types in each sample. As in, cell type abundances are converted to fractions of the total in each sample, and those fractions are used to perform the test. This is a very simple type of test, but is likely not as good as option 2: .. 2). Poisson or Negative Binomial Generalized Linear models (GLMs) on the count data from each sample. Since the abundance of cell types in each sample / image can be thought of as counting process, using these tests should better model the data. For testing the expression of antigens (usually ‘state’ markers) within cell types, PalmettoBUG can only do ANOVA or Kruskal-Wallis tests, using either the mean or median expression value for each cell type within each sample / image. Notice that this test is not performed using the cells as the sample population, but instead the aggregate statistic (mean / median) of each sample / image in the dataset. **Selecting the Experimental 'N'** PalmettoBUG lets you select an experimental 'N' for how statistical tests are performed. By default, this is 'sample_id' -- essentially treating each ROI / image as an experimental unit when plotting and doing hypothesis tests. However, it is not uncommon to have multiple images / ROIs from one animal where the animal is the real biological / experimental unit of interest and not each individual image. In this case, if you are intending on doing stats inside PalmettoBUG (although generally it is better if stats are done outside PalmettoBUG - after exporting the data - by an expert using a more statistically-focused software) then you can use an alternate column in the data as your experimental N. In thge GUI, this is usually patient_id. The way this works is that an aggregate statistic (the mean) is taken for the data within each category of the selected N column, and then those means are used for the statistical tesst / plot. Normally, this means aggregating the cell data within each ROI, if applicable. The functions affected by this include the statistical tests, cell-type abundance plots where a distribution is shown, and the state expression plots (in scripting, these are the methods of a palmettobug.Analysis object which have an 'N_column' parameter). It also affects certain spatial functions, including SpaceANOVA & the statistics run on EDT transforms. **Critically for the Experimental N:** whatever column you use must contain categories that are unique to each condition - you absolutely cannot have categories that are shared between conditions. In practice, this means that if one or more patient_id's is shared across the conditions, then you must edit it to make sure that the patient_id's become unique for each condition (say, by appending the condition name to the patient_id, even if it 'splits' a patient_id category). .. warning:: Variability or batch effects between different runs of the imaging mass cytometer or in the stability of FFPE tissue can make the comparison of expression unreliable, unless you have a plan for controlling these effects! .. note:: For both types of statistical test, PalmettoBUG does all possible comparisons at once, and automatically calculates a False Discovery Rate (FDR)-corrected p-value to account for the multi-comparison using the Benjamini-Hochberg test (as implemented in scipy). Both the unadjusted and adjusted p-values are displayed in the final exported stats tables: |image19| These statistics tables (like most other tables generated by this portion of the program) are saved in a *main/Data_tables* folder of the analysis directory. **Be careful with the filenames and the default values in the GUI – change them to fit your needs!** Export / Reload of Data ~~~~~~~~~~~~~~~~~~~~~~~ The majority of the data in a PalmettoBUG analysis is just a large excel table at heart, with a couple smaller tables of metadata (specifically, the data is *mostly* stored within an anndata object, see: https://anndata.readthedocs.io/en/stable/ if you want to know more of the inner workings of the python code). The data & critical metadata can be exported as a single large CSV file by PalmettoBUG, allowing easier access by another software (such as excel, Prism, or other code-based pipeline), in case you want to do some of your data analysis outside PalmettoBUG. |image20| Additionally, a CSV exported in this way can be reloaded back into PalmettoBUG, as an easy way to transfer an analysis between two different installations of the program, although there are a few important things to note about this export if you intend on reloading it into PalmettoBUG: 1). While *most* of the information can be transferred into the CSV file, currently **no spatial information** is. This means that if you re-load the CSV of an imaging experiment back into PalmettoBUG, you will be able to do the single-cell analysis steps & plotting as show in this document, but you will not be able to do spatial analysis. .. 2). The data will be saved exactly as it is – meaning that if you scaled or batch corrected the data, etc. those changes will be propagated into the exported file. **If you want the raw, unaltered data** (except arcsinh transformation, which is always done), **load / reload the analysis immediately before exporting!** 3). If your intention is to ONLY reload back into PalmettoBUG, then you will want to transfer the marker_class information ('type' / 'state' / 'none') as well. There is a check box for this purpose in the exportation window which writes the marker_class information to the final row of the exported CSV However, you will NOT want to do this if you intend on loading the CSV into any other program (it will probably cause errors unless you manually remove that final row of marker_class information). You can also choose to only export a subset of the data or choose to export aggregate statistics from data. For example, you could choose to only export the counts of celltypes in samples 1-5. These abilities are provided so that you can easily drop unwanted data when exporting, and allow you to prepare the data so that it is more straightforward to analyze by another software. Of course, if you do export an aggregate statistic instead of the raw data, do not try to load the CSV back into PalmettoBUG! Finally, you also can export the (X, Y) coordinates from the dimensionality reductions (UMAP and PCA embeddings), so that it is possible to plot the UMAP in another software. Because of the down-sampling for UMAP / PCA, the export will also include the cell number from the original data for each point – allowing the dimensionality reduction to be merged with the original data before plotting. Miscellaneous: Loading Regionprops & more ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ **Load Regionprops** One of the final PalmettoBUG capacities worth mentioning is the ability to load the region properties of the cell masks into the analysis as if they were antigens. This button depends on the analysis having been derived from an imaging experiment. It will load the shape-based properties of the cell mask like **area, perimeter, etc.** into the analysis as if these properties were antigens / metal channels. This allows you to set these properties to a marker_class and then plot or cluster the data using them (+ any other antigens of the same marker_class), just as you would with any other antigen. Region properties tend to have very different distributions / scales than normal antigens and the types of regionprops offered by the PalmettoBUG GUI are limited, so this is not anticipated to be an option that is commonly used. |image21| **Scatterplots** PalmettoBUG has a rudimentary scatterplot option to help visualize the distribution of one channel's expression per cell vs. another channel's expression. Note that no gating is possible in PalmettoBUG for these plots, they are only available to aid in visualizing the data. **Clustering-to-ClassyMasks** If you want to directly visualize the spatial characteristics of a clustering (FlowSOM metaclustering, Leiden, or merging) that you made in the Analysis tab of the program -- especially in Napari -- you can convert the clustering labels to "classy masks". Classy masks are created by assigning each unique clustering value to a unique integer value, then creating a copy of the original cell segmentation masks where each cell mask takes on its unique clustering value. These converted masks are then written to the /classy_masks folder in the higher-level project directory (outside the analysis), along with a table of each cell's clustering label & integer value. This is analogous to the classy masks derived from pixel classifiers (see :doc:`UsingPixelClassifier`). Classy masks are different than regular segmentation masks because each cell in an image not longer has a unique value -- so cells are no longer uniquely identified -- but instead can share a value if they belong to the same cluster. These can be very useful to open in Napari, and overlay with the original image -- allowing you to scroll through the channels of the original image with the classy masks on top to see if cell clustering labels visually make sense with the actual expression of the channels. Naturally, this option only works for imaging projects -- not solution-mode / FCS-only projects. .. _Links: Links ~~~~~ These are links to some packages / software / manuscripts that can be helpful to understand this page of documentation, as either code or the techniques / ideas from these are used in PalmettoBUG's single-cell analysis module. `CATALYST `_ `CATALYST workflow `_ `anndata `_ `scanpy `_ `FlowSOM `_ .. |image1| image:: media/SingleCellAnalysis/ImageProcessing12.png :width: 5.72982in :height: 3.08774in .. |image2| image:: media/SingleCellAnalysis/SCAnalysis1.png :width: 4.93942in :height: 3.44809in .. |image4| image:: media/SingleCellAnalysis/SCAnalysis2.png :width: 5.8578in :height: 3.42096in .. |image3| image:: media/SingleCellAnalysis/SCAnalysis3.png :width: 6.5in :height: 3.74097in .. |image5| image:: media/SingleCellAnalysis/SCAnalysis4.png :width: 6.17833in :height: 4.10766in .. |image6| image:: media/SingleCellAnalysis/SCAnalysis5.png :width: 6.46445in :height: 3.52506in .. |image7| image:: media/SingleCellAnalysis/SCAnalysis6.png :width: 5.16711in :height: 4.02535in .. |image8| image:: media/SingleCellAnalysis/SCAnalysis7.png :width: 6.5in :height: 2.88819in .. |image9| image:: media/SingleCellAnalysis/SCAnalysis8.png :width: 6.5in :height: 3.00556in .. |image10| image:: media/SingleCellAnalysis/SCAnalysis9.png :width: 6.5in :height: 3.63889in .. |image11| image:: media/SingleCellAnalysis/SCAnalysis10.png :width: 5.91061in :height: 3.19463in .. |image12| image:: media/SingleCellAnalysis/SCAnalysis11.png :width: 6.60215in :height: 4.81759in .. |image13| image:: media/SingleCellAnalysis/SCAnalysis12.png :width: 6.43071in :height: 3.98653in .. |image14| image:: media/SingleCellAnalysis/SCAnalysis13.png :width: 6.5in :height: 3.83958in .. |image15| image:: media/SingleCellAnalysis/SCAnalysis14.png :width: 6.5in :height: 3.4in .. |image16| image:: media/SingleCellAnalysis/SCAnalysis15.png :width: 6.28597in :height: 3.54272in .. |image17| image:: media/SingleCellAnalysis/SCAnalysis16.png :width: 6.5in :height: 3.63056in .. |image18| image:: media/SingleCellAnalysis/SCAnalysis17.png :width: 6.5in :height: 2.6in .. |image19| image:: media/SingleCellAnalysis/SCAnalysis18.png :width: 6.5in :height: 4.45903in .. |image20| image:: media/SingleCellAnalysis/SCAnalysis19.png :width: 6.5in :height: 4.41806in .. |image21| image:: media/SingleCellAnalysis/SCAnalysis20.png :width: 7.2in :height: 4.0in