Advanced PalmettoBUG and its Directory Structure ================================================ This page of the documentation is meant to help with understanding the directory structure and data table files of PalmettoBUG and particularly with troubleshooting the program or using the program with other software. .. note:: The “HackingPalmettoBUG” slide deck inside the GitHub / PalmettoBUG package is an even more thorough, **animated** documentation resource for understanding the various directories of the program. The Imaging Project Directory ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The directory is a key structure in a PalmettoBUG project - it is where data is expected to be read from, and where data is saved in expected locations for later use. This well-defined structure is how PalmettoBUG can often easily re-enter an experiment at the same step where you left off last time. Knowing how PalmettoBUG reads & writes to this directory -- and where data is expected to be found and in what format -- can be very helpful for understanding errors and troubleshooting issues with your project, as well as for mix-and-matching PalmettoBUG inputs / outputs with other data analysis pipelines. |image1| The upper level of an imaging project includes a number of folders, as well as a single file (panel.csv). However, to begin a project, only the */raw* subfolder, containing the starting data files (TIFF or MCD) is necessary, with the rest of the directory being populated as the analysis is performed. On creating a project, other folders in this directory are created including */images, /masks, /Pixel_Classification, /classy_masks,* and /*Analyses*. 1). The */images* folder contains subfolders of images as .tiff files. Within it, the */images/img* is a special subfolder containing the images directly converted from the raw folder. Other folders of images can be made by the user with custom names - for example after denoising or other image modifications. .. 2). The /*masks* folder contains subfolders of cell masks as .tiff files. */masks/cellpose_masks* and */masks/deepcell_masks* are special sub-folders containing the cell masks generated by cellpose / deepcell algorithms, respectively. Other subfolders can be generated by the user by modifying those masks or by using pixel classifiers. 3). *Pixel_Classification* contains subfolders, each dedicated to a particular pixel classifier. Each classifier folder contains not only the primary outputs of the classifier - folder(s) of .tiff files with the predicted classes for each input image - but also a variety of other folders and files that provide information about the classifier (such as .json files containing the parameters of the classifier, or a folder of user-annotated images for training supervised classifiers). .. 4). C\ *lassy_masks* contains subfolders for every time you classify cell masks into cell types / groups using a pixel classifier. This means a folder of “classy masks”, which are cell masks (.tiff files) that have been grouped by the pixel classifier into a few categories. There is also a table of these cell classifications, one for entry for every cell masks, that can be imported into an analysis. This serves as an alternative way to cluster / label cell types. 5). *Analyses* contains ‘analysis’ sub-folders, each with data measured from the images for clustering, plotting, etc. Each analysis folder has the same structure as an “FCS directory” project and once created can be entered directly from the starting screen. Images and Masks sub-folder are fairly simple and need little explanation. They only contain images as .tiff files, all with the same filename for each ROI across all folders. In general, PalmettoBUG automatically uses the filename of the source image / ROI when writing a new .tiff file - this convention is critical because it is how images / masks / pixel classes, etc. are matched together - so if you make edits or add data to a PalmettoBUG project, **be sure that this convention is maintained.** The */Analyses*, */Pixel_classification*, and */classy_masks* sub-folder has more complex internal structures, and will be discussed in greater detail later in this document. The */panel.csv* file is also critical, and improper values in it can be a source of common errors. It is composed of 4 columns, the first two dedicated to the names of the channels in the */raw* dataset, and the second two dedicated to how they will be used. |image2| The first column are the labels for the channels that can be derived */raw* data, usually the metal channels themselves if derived from MCD files. The second column is meant to be the biologically useful labels for the channels and sometimes needs editing, if the labels in the */raw* dataset are missing or unhelpful. The third column, “keep” is where things get more complicated and more important. It defines what channels in */raw* will be carried over into the rest of the PalmettoBUG project. Critically, PalmettoBUG uses the order of the channels in the images in order to match channels to their labels, so **the** **order & number of channels in the images themselves is incredibly important.** This order is changed by what channel are kept, and those changes are tracked by the panel file. This is why editing the “keep” column after converting the data from */raw*, is very risky - the order of the channels in the panel file is changed, while the data itself is unchanged, creating a mismatch in expectation that can create errors or inaccurate analysis! Only edit the “keep” column if you intend to re-do all the steps of the pipeline, in order, immediately afterwards. The last column of the panel.csv file (“segmentation”) is only used by the Cellpose / DeepCell segmentation options, in order to identify the nuclear and cytoplasmic regions of cells in the images. This always must be manually set by the user! Analysis / FCS Directories ~~~~~~~~~~~~~~~~~~~~~~~~~~ The directory for each analysis within an imaging project (in the */Analyses* folder) is roughly the same as the directory for a solution-mode / FCS project. They are so similar, in fact, that the “Choose FCS directory” button on the starting page can be used to load an individual analysis of an imaging project. Let’s get into the details of how these directories work. The critical component for launching an analysis directory, is the presence of FCS files in the */main/Analysis_fcs* subfolder of the directory. However, in imaging experiments, there are additional, spatial information files required, and the FCS files themselves are derived from .csv files read from the images & segmentation masks. These additional data .csv files needed by imaging experiments are why analysis directories have folders parallel with */main* (specifically, */intensities*, */regionprops*, and sometimes */spatial_edts*). However, the meat of the analysis directory is contained inside the */main* sub-folder - this is where plots, cell clusterings, and statistics tables are written to as you perform your analysis. |image3| Sub-folder of */main*: 1). */Analysis_fcs* is the starting folder with raw FCS files. .. 2) */Clusterings* contains .csv files of cell clusterings / groupings. 3). */Mergings* contains .csv files of the manual merging of FlowSOM clusters by the user. .. 4). */Data\_tables* contains .csv files with data or statistics exported by the user for use outside PalmettoBUG. 5). */Plots* contains the plots generated by PalmettoBUG. .. 6). */Spatial_plots* contains the outputs of the spatial tab of the program - including plots AND statistics tables. *Key CSV’s - metadata and Analysis_panel & regionprops_panel* As described in the single-cell analysis page, when an analysis is run two CSVs are necessary - an Analysis_panel file, which helps with the handling of antigen names / classes, and a metadata file which handles the experimental details of the samples in the dataset. If you choose to load region properties into the analysis, then a regionprops_panel.csv will also be created - this contains the name / marker_class of the regionproperties being added to the data. |image4| If you are familiar with the anndata structure in Python (https://anndata.readthedocs.io/en/stable/) — which is the primary data structure inside the PalmettoBUG analysis modules — you may recognize the information in the metadata file as being part of *anndata.obs*, and the information in the panel file(s) as part of *anndata.var*. Pixel Classifier and Classy Mask folders ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Inside the */Pixel_classification* folder, each pixel classifier you make will receives its own folder. For both types of classifier folder, the initial output of the classifier will be written to a */classification_maps* sub-folder, and if any merging on biological labels is performed (the details of that merging being in a *biological_labels.csv* file), the result of that merging will be written to */merged_classification_maps*. Further, both types of classifier have a {classifier_name}\ *\_details.json* file that contains information about the setup and parameters of the classifier. However, only supervised classifiers will have a second .json file - containing the neural network information (including any training weights, which allows prediction after reload) - and only these will have a */training_labels* sub-folder. Unsupervised classifiers, on the other hand, contain a parameter file (*flowsom_panel.csv*) absent in supervised classifiers, as well as a *cluster_heatmap.png* containing a useful heatmap plot for examining & annotating the expression of markers in the primary, pre-annotation clusters. |image5| For either type of classifier, you may see a */Whole_class_analysis* sub-folder. This contains a very similar structure to a normal PalmettoBUG analysis directory, although it is also much simpler and does not use FCS files at all (just intensity .csv’s are used). This folder will only show up inside a classifier directory if needed. *Classy Mask folders:* Each time you classify a set of masks with a classifier, a new sub-folder is created in */classy_masks* to hold the output. These subfolders contain, at minimum a */primary_masks* folder of classified cell masks (.tiff files), and a .csv file with the same name as the */classy_masks* subfolder, which contains the class assignments for each cell mask. These are the only outputs for a cell mask classifier using the “mode” method. |image6| However, classy masks made using a secondary FlowSOM have a number of extra folders because of the need to annotate & merge the initial output of the FlowSOM. This includes a */secondary_masks* and a *secondary_cell_classification.csv* which contain the annotated cell classes and supersede the primary outputs, as well as a heatmap file (PNG) that is used to assist in the annotation of the FlowSOM metaclusters. For either classy masks or pixel classifiers, there can also be a sub-folder of PNG files generated from the classification maps / classy masks (these can be optionally created as an alternate way to visualize the .tiff files). Examples of How to Use this Information ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Knowing the directory structure can help in a couple main ways: 1). Troubleshooting. If you encounter an error while running the program, it can be helpful to know where the program is looking for information. For example, if you encounter an error in the analysis portion of the program with a message related to a “patient_id” column - it may be useful to check the metadata.csv file inside *{*\ analysis name\ *}/main*, and see if the metadata file contains the expected information. Or, if you are performing a step requiring a folder of pixel classifier outputs, you could check that folder to be sure that there is a pixel classification for every image, and you did not accidently only predict for one or a few of the images. 2). Combining PalmettoBUG with other software. One of the benefits of PalmettoBUG is that many of intermediate outputs of the program (like segmentation masks, pixel classifications, etc.) are automatically exported in common file formats like .tiff, allowing relatively easy integration of data between analysis pipelines, at certain steps of the program. But to be able to do this, you need to understand the PalmettoBUG directory to both find the data you want to take out of PalmettoBUG or to know where/how to try and insert the data you want to add to a PalmettoBUG project. For example, you may want to predict segmentation masks using a custom-trained Cellpose model, instead of the generalist models available in PalmettoBUg (a custom-trained model is likely to perform better than the base generalist model, if you want to invest the time to do the training, since it will have gained “experience” or your specific data!). This doesn’t mean you can’t use PalmettoBUG – all you have to do is take the predicted cell masks from your custom Cellpose model, and place them all as .tiff files (with the same filename / format as you would get from the Cellpose predictions in PalmettoBUG) into a subfolder of */masks*. Or as an example in the opposite direction, if you just wanted the pixel classifier outputs from PalmettoBUG, and wanted to do most of your analysis in another software, you can know to go to the appropriate folder inside */Pixel_classification* to find those classifier predictions, and then copy those files to wherever you needed to for the other software. .. |image1| image:: media/Advanced1.png :width: 6.5in :height: 4.9in .. |image2| image:: media/Advanced2.png :width: 3.94669in :height: 4.41262in .. |image3| image:: media/Advanced3.png :width: 6.5in :height: 5.1525in .. |image5| image:: media/Advanced4.png :width: 6.5in :height: 3.06806in .. |image4| image:: media/Advanced5.png :width: 5.56551in :height: 3.71807in .. |image6| image:: media/Advanced6.png :width: 6.5in :height: 3.25in