palmettobug
===========

.. py:module:: palmettobug

.. autoapi-nested-parse::

   Welcome to PalmettoBUG! 

   PalmettoBUG's analysis functions are available either through the GUI or through scripting.
   To launch the GUI, run the command 'palmettobug' in a CLI environment where PalmettoBUG is installed, or call the function
   palmettobug.run_GUI() in a script.


   License: GPL3
   Author: Ben Caiello
   Institution: Flow Cytometry and Cell Sorting Shared Resource of the Hollings Cancer Center at the Medical University of South Carolina


   This script contains the code for exporting the various functions / classes of PalmettoBUG's non-GUI API


Submodules
----------

.. toctree::
   :maxdepth: 1

   /autoapi/palmettobug/Analysis_functions/index
   /autoapi/palmettobug/Entrypoint/index
   /autoapi/palmettobug/Executable/index
   /autoapi/palmettobug/ImageProcessing/index
   /autoapi/palmettobug/Pixel_Classification/index
   /autoapi/palmettobug/Utils/index


Classes
-------

.. autoapisummary::

   palmettobug.ImageAnalysis
   palmettobug.SupervisedClassifier
   palmettobug.UnsupervisedClassifier
   palmettobug.WholeClassAnalysis
   palmettobug.Analysis
   palmettobug.SpatialAnalysis
   palmettobug.TableLaunch


Functions
---------

.. autoapisummary::

   palmettobug.run_GUI
   palmettobug.CyTOF_bead_normalize
   palmettobug.mask_expand
   palmettobug.imc_entrypoint
   palmettobug.read_txt_file
   palmettobug.txt_folder_to_tiff_folder
   palmettobug.setup_for_FCS
   palmettobug.plot_pixel_heatmap
   palmettobug.plot_class_centers
   palmettobug.segment_class_map_folder
   palmettobug.plot_classes
   palmettobug.merge_classes
   palmettobug.merge_folder
   palmettobug.slice_folder
   palmettobug.mode_classify_folder
   palmettobug.secondary_flowsom
   palmettobug.classify_from_secondary_flowsom
   palmettobug.extend_masks_folder
   palmettobug.run_napari
   palmettobug.print_license
   palmettobug.print_3rd_party_license_info


Package Contents
----------------

.. py:function:: run_GUI() -> None

   Launches the PalmettoBUG GUI 


.. py:function:: CyTOF_bead_normalize(bead_fcs_folder: str, to_normalize_fcs_folder: str, output_folder: str, bead_channels: list, channels_to_normalize: Union[list, None] = None, include_figures: bool = True) -> None

   This function performs Premessa-style CyTOF bead normalization (the per-file procedure of normalize_pipeline_one_fcs) on all the FCS files
   in a pair of beads / non-beads folders.

   Args:
       bead_fcs_folder (str | Pathlike): 
           file path to a directory containing fcs files of only beads

       to_normalize_fcs_folder (str | Pathlike): 
           file path to a directory containing fcs files that you want to be normalized. Usually 
           the non-beads, live, singlets (already gated in a program like FlowJo)

       output_folder (str | Pathlike): 
           file path to a directory where the normalized fcs files and figures will be written. Will 
           attempt to create the directory if it does not exist

       bead_channels (list): 
           a list of the columns (i.e., metal channel / antigen) in the beads that will be used to calculate the slopes for normalization

       channels_to_normalize (list | None): 
           a list of the columns (i.e., metal channel / antigen) that will be normalized for analysis. 
           If None is passed in, will automatically try to detect all metal channels by looking for the 
           following regex: 

                   r".+Di"

           It is best to explicitly define the channels to normalize, as this regex may not capture the channels in your data successfully! 

       include_figures (bool): 
           whether to export figures of the normalization (True) or not (False)

   Returns:
       None

   Inputs/Outputs:
       Inputs: 
           reads in two folders, specified by bead_fcs_folder & to_normalize_fcs_folder, containing data from a CyTOF experiment,
           with only beads remaining in bead_fcs_folder (non-beads gated out) and with the events to normalize in to_normalize_fcs_folder.
           Normally, the to_normalize_fcs_folder .fcs files would be gated to exclude beads / dead cells, which is more computationally
           efficient, but technically that isn't necessary as long as those are gated out afterwards.
           Each folder should contain .fcs files and ONLY .fcs files.

           NOTE: It is assumed that the order of the files in each folder matches (as in, the first file in os.listdir(bead_fcs_folder) holds
           the matching beads for the first file in os.listdir(to_normalize_fcs_folder))

       Outputs: 
           Outputs normalized beads .fcs files to output_folder/normalized_beads folder and normalized non-beads .fcs files to
           output_folder/normalized folder. The files in the output folders should match the same filenames as the two input folders.
           Additionally, if include_figures is True -- will export plots of the normalization to the output_folder/normalization_figures
           folder (as PNG files).  
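
As a rough illustration of two points in the ``CyTOF_bead_normalize`` description above (the ``r".+Di"`` auto-detection of channels and the assumption that both folders list their files in matching order), the sketch below uses only the standard library. The channel names, the file names, and the helpers ``detect_metal_channels`` and ``pair_by_sorted_name`` are hypothetical and not part of PalmettoBUG. Whether the library applies the pattern with ``re.match`` is also an assumption.

```python
import re

# Assumed: auto-detection tests this pattern against each column name.
METAL_PATTERN = re.compile(r".+Di")


def detect_metal_channels(columns):
    """Return the column names that match the default metal-channel regex."""
    return [name for name in columns if METAL_PATTERN.match(name)]


def pair_by_sorted_name(bead_files, sample_files):
    """Pair bead and sample files positionally after sorting both lists.

    Sorting is a precaution added for this sketch; the documented behaviour
    relies on the order returned by os.listdir, which Python does not guarantee.
    """
    if len(bead_files) != len(sample_files):
        raise ValueError("both folders must contain the same number of .fcs files")
    return list(zip(sorted(bead_files), sorted(sample_files)))


columns = ["Time", "Event_length", "Ce140Di", "Ir191Di", "CD4"]
metal_channels = detect_metal_channels(columns)
pairs = pair_by_sorted_name(["b_02.fcs", "b_01.fcs"], ["s_01.fcs", "s_02.fcs"])
print(metal_channels)
print(pairs)
```

Previewing the detected channels this way makes it easy to see when the regex misses columns in your data, in which case passing ``channels_to_normalize`` explicitly is the safer choice.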


.. py:class:: ImageAnalysis(directory: Union[pathlib.Path, str, None], resolutions: list[float, float] = [1.0, 1.0], from_mcds: bool = True)

   This handles Image Processing steps of PalmettoBUG, such as conversion from mcd's and segmentation measurements

   Args:
       directory (str, Path, or None):
           The directory to set up the image analysis / PalmettoBUG project inside of. The directory is expected to already exist and to
           contain .mcd or .tif / .tiff files in a /raw subfolder.
           If None, then initiates an ImageAnalysis object without needing to set up the directory. This can be useful if you don't need the 
           /raw --> /images/img conversion step and you intend to manually set the input / output folder paths at each step.

       resolutions (list of float): 
           Default = [1.0, 1.0]. Represents the width of the pixels in [X, Y] directions in micrometers.

       from_mcds (bool): 
           If True, assumes that there will be MCD files in /raw. If False, presumes there are .tif/.tiff files in /raw. 

   Key Attributes:
       directory (str): 
           the path to a folder containing a /raw/ subfolder where the MCD or TIFF files are

       directory_object (DirSetup): 
           this attribute is an instance of a class from the Utils/SharedClasses.py module. It coordinates the directories of a typical
           PalmettoBUG project --

           *Key Subattributes*

           - (self.directory_object).main == self.directory >>> The highest-level directory of the project

           - raw_dir == {directory}/raw >>> the folder containing the MCD or TIFF files of raw data

           - img_dir == {directory}/images >>> the folder containing sub-folders of images (such as the img_dir/img sub-folder, which contains
               the initial images directly converted from the raw_dir)

           - masks_dir == {directory}/masks >>> the folder containing sub-folders of cell masks (such as the masks_dir/deepcell_masks sub-folder, 
               which contains the cell masks created by deepcell, before any modifications / expansions are performed on those masks.)

           - classy_masks_dir == {directory}/classy_masks >>> the folder containing sub-folders of classy cell masks. Each subfolder is named
               by convention using the pixel classifier + cell mask pair that the classy masks were derived from.

           - px_classifiers_dir == {directory}/Pixel_Classification >>> the folder containing sub-folders of pixel classifiers

           - Analyses_dir == {directory}/Analyses >>> the folder containing sub-folders of Analysis directories. 
           
           - logs == {directory}/Logs >>> the folder containing the .log files generated by the GUI

       from_mcds (boolean): 
           whether the files in directory/raw are MCD files (True), or are TIFF files (False)

       resolutions (list[float, float]): 
           the X and Y resolutions of the images, in micrometers / pixel

       panel (pandas dataframe): 
               This is a pandas dataframe read-in from & written to directory/panel.csv
               This is a steinbock-style panel (with changes), with four columns = "name","antigen","keep","segmentation"

       metadata (pandas dataframe): 
               This is the dataframe containing a CATALYST-style metadata file. It is really intended for the 
               Analysis portion of the program and is not used in image processing, however this class produces a preliminary
               version of this dataframe as a part of the transition from image processing --> analysis

               It has four columns -- 

                   "file_name" == the filenames of the .fcs files in the analysis

                   "sample_id" == numbers to identify each filename quickly (zero-indexed)

                   "patient_id" == a secondary grouping / covariate / batch

                   "condition" == the independent variable / treatment vs. control grouping

       Analysis_panel (pandas dataframe): 
               This is the dataframe containing a CATALYST-style panel file. It is really intended for the 
               Analysis portion of the program and is not used in image processing, however this class produces a preliminary
               version of this dataframe as a part of the transition from image processing --> analysis

               It has three columns:

                   "fcs_colname" == the name of the marker in the .fcs files. when coming straight from solution-mode fcs files, 
                   these names can be non-straightforward or confusing (often metal names, ex: Ce140Di). When produced from an 
                   imaging experiment the fcs names are usually identical to the "antigen" names.

                   "antigen" == the name of the marker to use in analysis plots, usually its straightforward, biological name (ex: CD4)

                   "marker_class" == 'type','state', or 'none' -- used as in CATALYST-style workflow to determine how markers are used.


   .. py:attribute:: from_mcds
      :value: True


   .. py:attribute:: resolutions
      :value: [1.0, 1.0]


   .. py:attribute:: metadata


   .. py:attribute:: Analysis_panel


   .. py:method:: _panel_setup() -> None

      This will either read in the panel.csv file in the top-level project folder, or failing that will attempt to generate a panel file automatically.

      Note:
          -- This method depends on the self.from_mcds attribute to know whether mcds or tiffs are in the /raw folder

          -- Any automatically generated panel file will have an entirely blank 'segmentation' column. This requires editing 
          before segmentation can be performed using deepcell or cellpose

          -- additionally, the automatically generated 'keep' column may not accurately reflect what you want, although it often does. 


   .. py:method:: panel_write() -> None

      This method writes the self.panel dataframe to the disk at the expected location for future re-read 
      (panel.csv in the top-level folder of the project) 


   .. py:method:: raw_to_img(hpf: int = 50, input_directory: Union[str, pathlib.Path, None] = None, output_directory: Union[str, pathlib.Path, None] = None) -> None

      This method converts/moves files from the /raw folder --> /images/img folder. It always exports in .ome.tiff format
      with two transformations of the images in the raw files:

          1. channels with 'keep' == 0 in the panel file will be dropped from the exported .ome.tiffs

          2. hot pixel filtering will be performed before exporting to the /images/img folder if hpf > 0

      It depends on self.from_mcds to know whether to expect MCD or TIFF files in the /raw folder.

      Args:
          hpf (int >= 0): 
              an integer denoting the threshold used for steinbock-style hot pixel filtering. This means that pixels
              that are brighter than each of their surrounding neighbor pixels by more than the inputted threshold 
              will have their values reduced to match the value of their brightest neighbor pixel. So lower 
              thresholds will more aggressively filter "hot" / bright pixels, while higher thresholds will filter less. 
              The default (50) matches the default value in steinbock. ** If hpf == 0, no hot pixel filtering will occur. **

          input_directory (str, Path, or None): 
              a path to the folder containing the .mcd or .tiff files to be converted and 
              hot-pixel filtered. Assumes that the folder chosen ONLY contains files (no sub-folders) and 
              that all of those files are of the appropriate format.

          output_directory (str, Path, or None): 
              A path to a folder to write the .ome.tiff files to. If None, then defaults 
              to the self.directory.main + "/images/img/" subfolder

      Returns:
          None: (its output is in writing to the disk, not returning a value)
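
   The hot pixel filtering rule described for ``hpf`` can be sketched in NumPy for a single 2D channel as follows. This is an illustrative sketch of the rule as worded above, not the code PalmettoBUG or steinbock runs; the helper name ``hot_pixel_filter``, the toy image, and the reflective edge padding are assumptions.

   ```python
   import numpy as np


   def hot_pixel_filter(channel, threshold):
       """Clip pixels that exceed their brightest 8-neighbour by more than threshold."""
       if threshold <= 0:
           return channel.copy()  # documented behaviour: hpf == 0 disables filtering
       rows, cols = channel.shape
       padded = np.pad(channel, 1, mode="reflect")  # edge handling is an assumption
       # Collect the eight shifted views (centre excluded), then take the per-pixel maximum.
       neighbours = [
           padded[1 + dr : 1 + dr + rows, 1 + dc : 1 + dc + cols]
           for dr in (-1, 0, 1)
           for dc in (-1, 0, 1)
           if (dr, dc) != (0, 0)
       ]
       max_neighbour = np.max(neighbours, axis=0)
       # Only pixels brighter than every neighbour by more than the threshold are changed.
       return np.where(channel - max_neighbour > threshold, max_neighbour, channel)


   image = np.array([[1.0, 2.0, 1.0],
                     [2.0, 100.0, 3.0],
                     [1.0, 2.0, 1.0]])
   filtered = hot_pixel_filter(image, threshold=50)
   print(filtered)
   ```

   In this toy image only the centre pixel differs from its brightest neighbour by more than 50, so it is the only value that changes. A multi-channel image would be filtered one channel at a time.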


   .. py:method:: instanseg_segmentation(re_do: bool = False, input_img_folder: Union[pathlib.Path, str, None] = None, single_image: Union[pathlib.Path, str, None] = None, output_mask_folder: Union[pathlib.Path, str, None] = None, channel_slice: Union[None, numpy.array] = None, merge_channels: bool = False, pixel_size: Union[float, None] = None, target: str = 'cells', mean_threshold: float = 0.0, model: str = 'fluorescence_nuclei_and_cells') -> None

      Instanseg is an open-source (no non-commercial license restrictions) deep-learning segmentation algorithm: https://github.com/instanseg/instanseg/tree/main

      Channels are scaled by a min_max transformation before being used (and before being merged together, if that is chosen).

      Args:
          input_img_folder (str, Path, or None):
              The path to a folder containing the images to segment. If None, defaults to f"{self.directory_object.img_dir}/img"
          
          output_mask_folder (str, Path, or None):
              The path to a folder where you want the segmentation masks to be written to. If None, defaults to f"{self.directory_object.masks_dir}/instanseg_masks"

          single_image (str, Path, or None):
              If not None (and not the empty string ""), this parameter provides the name or path of a single image in the input_img_folder to segment.

          channel_slice (integer numpy array or None):
              If provided, will be used to slice each image array to subset the channels provided to the instanseg model. The length of this array
              must be the same as the number of channels in the images.
              Specifically, the channels in the image that will be used will be:  image[channel_slice > 0]
              If None, all channels in all images are used as independent channels.

          merge_channels (boolean):
              If channel_slice is provided, this determines whether the selected channels are merged into two (cytoplasmic / nuclear -- True) or left
              as separate channels (False, default).

          pixel_size (float or None):
              resolution of the pixels in the images. If None, defaults to using self.resolutions (self.resolutions[0] == self.resolutions[1] must be true)
              Provided to the pixel_size argument of the instanseg model 

          target (str):
              "cells", "nuclei", or "all_outputs". Whether to try to segment whole cells, only the nuclei, or both. 
              Provided to the target argument of the instanseg model 

          mean_threshold (float):
              Higher values decrease the number of cells (higher threshold for identifying a cell) while lower values should increase the number of 
              detected cells

          model (str):
              what pre-trained instanseg model to use. Currently instanseg only offers two models (fluorescence-based, the default, and an H&E-based model).
              More options will hopefully open up as this segmentation model is developed. Theoretically, there should be a way to allow custom-trained
              models to be loaded as well, which could be quite nice.

      Example test script (results so far: it runs, but the results are very poor compared to deepcell;
      maybe instanseg needs a dedicated IMC model, or a larger training set, like TissueNet, although that particular dataset would create
      license issues)::

          import palmettobug
          proj_dir = f"{my_computer_path}/Example_IMC"    # my_computer_path is a placeholder for your own path
          image_object = palmettobug.ImageAnalysis(proj_dir, resolutions = [1.0,1.0])
          panel_keep_only = image_object.panel[image_object.panel['keep'] == 1]
          nuclei_slice = panel_keep_only['segmentation'] == "Nuclei"
          image_object.instanseg_segmentation(channel_slice = nuclei_slice, target = "nuclei", mean_threshold = -1.0)
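
   The ``channel_slice`` selection (``image[channel_slice > 0]``) and the min-max scaling mentioned for ``instanseg_segmentation`` can be pictured with the NumPy sketch below. The helper name ``select_and_scale``, the toy image, and the choice to scale each channel independently are assumptions made for illustration; the merging step and the model call are not shown.

   ```python
   import numpy as np


   def select_and_scale(image, channel_slice):
       """Keep channels where channel_slice > 0, then min-max scale each kept channel."""
       if len(channel_slice) != image.shape[0]:
           raise ValueError("channel_slice must have one entry per image channel")
       selected = image[channel_slice > 0].astype(float)
       low = selected.min(axis=(1, 2), keepdims=True)
       high = selected.max(axis=(1, 2), keepdims=True)
       # Guard against flat channels so the division never hits zero.
       span = np.where(high > low, high - low, 1.0)
       return (selected - low) / span


   # Toy image with 3 channels of 2 x 2 pixels, laid out as (channel, row, column).
   image = np.arange(12, dtype=float).reshape(3, 2, 2)
   channel_slice = np.array([1, 0, 2])  # keep the first and third channels
   scaled = select_and_scale(image, channel_slice)
   print(scaled.shape)
   ```

   The length check mirrors the documented requirement that ``channel_slice`` has the same length as the number of channels in each image.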


   .. py:method:: mask_intersection_difference(masks_folder1: Union[str, pathlib.Path], masks_folder2: Union[str, pathlib.Path], kind: str = 'intersection1', object_threshold: int = 1, pixel_threshold: int = 1, re_order: bool = True, output_folder: Union[None, str, pathlib.Path] = None)

      Provide two folders of masks, and derive a third folder of masks from them transformed in some way. Masks are dropped as a whole (not pixel-wise),
      and there are a limited set of possible transformations:

           intersection1 (one-way) -- This keeps the masks from the first folder of masks, but only the masks that overlap with sufficient masks from folder2
                                      No masks from folder2 carry over to the output folder

           intersection2 (two-way) -- This keeps masks from both folders, as long as they overlap sufficiently. HOWEVER, masks from folder1 take precedence
                                      As in, where overlap exists only mask1 values will be carried over first and mask2 values will only end up in the output
                                      after that where the output has values of 0 (which is to say, only in regions outside the remaining masks from mask1)
                                      Additionally, masks from the second folder are given a value = mask2 + max(mask1) in the output so that they will remain distinct
                                      from folder1-derived masks. 

           difference1 (one-way) -- This keeps masks from folder1 only if a sufficient number of masks from folder2 do NOT overlap with them

           difference2 (two-way) -- This keeps masks from both folders, but only if they do not overlap with sufficient masks from the opposite folder. 
                                  HOWEVER: Masks from the first folder take precedence over mask from the second! 
                                  As in, the masks from folder1 which are kept in the transformation are carried over into the output first, and after that the 
                                  masks from folder2, but only into pixels with value 0 in the output. 
                                  This precedence should only matter if the thresholds are increased above the defaults of 1, as otherwise there should be no
                                  overlap at all between the saved masks from the two folders.
                                  Additionally, masks from the second folder are given a value = mask2 + max(mask1) in the output so that they will remain distinct
                                  from folder1-derived masks. 


      An overlapping mask is determined by the pixel_threshold value -- a mask from folder 2 is considered to overlap with a mask from folder 1 if the number of overlapping
      pixels between the two is greater than or equal to the pixel_threshold value. 

      'Sufficient masks' is determined by the object threshold (default = 1, as in, just 1 overlapping mask from folder2 within a mask from folder 1 
      triggers the transformation) 

      Together, this should allow this function to be used to do things like only keeping cell masks within a particular region of the tissue or only keeping tissue 
      regions with sufficient number of cell masks inside them, etc. Or, by chaining this operation together, only keeping cells within particular region of tissue, where 
      those regions of tissue have sufficient numbers of cells (of a particular cell type, even, if using classy masks to further sophisticate things).
      The (possible) addition of this function was inspired by analyses performed in the following paper using pancreatic islets: 
          Damond, Nicolas et al. “A Map of Human Type 1 Diabetes Progression by Imaging Mass Cytometry.” Cell metabolism vol. 29,3 (2019): 755-768.e5. 
          doi:10.1016/j.cmet.2018.11.014
      The publicly-available data from this paper is also planned to be analyzed in PalmettoBUG, in order to compare the effectiveness of PalmettoBUG at 
      replicating prior work.

      Args:
          masks_folder1 / 2 (string, Path): 
              paths to two folders of masks -- as in, each folder is expected to contain single-channel, integer-valued tiff files where each integer represents
              a unique cell (or other object). There must be files in each folder with matching file names -- only these can be processed!
              Note that the order of the folders (as in, which is masks_folder1 vs masks_folder2) is very important for some transformations!

          kind (string):
              One of ['intersection1', 'intersection2', 'difference1', or 'difference2']. Determines how the masks are transformed. See description above for details.
              Note that when kind = 'difference2', the object/pixel threshold comparisons are also utilized in the reverse (from mask folder 2 --> 1)

          object_threshold (integer):
              when determining whether a mask from folder1 overlaps with masks from folder2, this determines how many 'overlapping' objects inside it are sufficient
              to trigger keeping / discarding the mask. The default is 1, meaning that even a single overlapping mask is sufficient to trigger the transformation. 
              

          pixel_threshold (integer): 
              when determining whether a mask from folder2 overlaps with a mask from folder1, this determines how many pixels of mask2 are sufficient to consider
              it overlapping. When == 1 (default), this means that even a single pixel of overlap will count mask2 as an object inside mask1. The total count
              of such overlapping mask2 objects inside mask1 are then compared to the object_threshold to determine whether to keep / discard mask1 from the output.

          re_order (boolean):
              Whether to re-index the masks, starting from 1 and continuously increasing in increments of 1, so that there are no gaps / discontinuities in the values.
              Default = True, which re-indexes to start from 1 etc. However, if you want to preserve the original mask values (of mask1 only), so that they can be 
              matched to the original masks, set this parameter == False. Because of how two-way methods work, the original values of mask folder2 are not preserved
              regardless of this parameter.

          output_folder (string, Path, or None):
              The path to a file folder where the output, transformed masks can be written. If None (default), then the file folder name is automatically derived
              from the names of folder1 and folder2 (specifically: {self.directory_object.masks_dir}/{folder1}_{folder2} ). This folder is automatically placed inside the masks
              directory of the PalmettoBUG project folder. 
              If provided, should be a FULL path to a create-able folder where the masks will be written. 

      Returns:
          None     (does, however, read & write .tiff files)
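
   The one-way transformations ('intersection1' and 'difference1') can be sketched in NumPy for a single pair of label arrays, following the thresholds as described above. The function name ``one_way_filter`` and the toy arrays are hypothetical. This is not PalmettoBUG's implementation, and the two-way variants and the file reading / writing are left out.

   ```python
   import numpy as np


   def one_way_filter(mask1, mask2, kind="intersection1",
                      object_threshold=1, pixel_threshold=1, re_order=True):
       """Keep or drop whole objects of mask1 based on their overlap with mask2 objects."""
       if kind not in ("intersection1", "difference1"):
           raise ValueError("this sketch only covers the one-way transformations")
       output = np.zeros_like(mask1)
       for label in np.unique(mask1):
           if label == 0:
               continue  # 0 is background
           region = mask1 == label
           # Count how many pixels of each mask2 object fall inside this mask1 object.
           labels2, counts = np.unique(mask2[region], return_counts=True)
           overlapping = int(np.sum((labels2 != 0) & (counts >= pixel_threshold)))
           sufficient = overlapping >= object_threshold
           keep = sufficient if kind == "intersection1" else not sufficient
           if keep:
               output[region] = label
       if re_order:
           # Relabel the surviving objects as 1, 2, 3, ... with no gaps.
           kept = np.unique(output[output > 0])
           lookup = np.zeros(int(output.max()) + 1, dtype=output.dtype)
           lookup[kept] = np.arange(1, kept.size + 1)
           output = lookup[output]
       return output


   mask1 = np.array([[1, 1, 0, 2, 2],
                     [1, 1, 0, 2, 2],
                     [0, 0, 0, 0, 0],
                     [3, 3, 0, 0, 0]])
   mask2 = np.array([[0, 0, 0, 5, 5],
                     [0, 0, 0, 0, 0],
                     [0, 0, 0, 0, 0],
                     [7, 0, 0, 0, 0]])
   kept_inside = one_way_filter(mask1, mask2)
   print(kept_inside)
   ```

   With the defaults, object 1 of ``mask1`` is dropped because nothing from ``mask2`` touches it, while objects 2 and 3 are kept and relabelled. Raising ``pixel_threshold`` to 2 would also drop object 3, since only one pixel of a ``mask2`` object lies inside it.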


   .. py:method:: _mask_bool(mask1: numpy.ndarray[int], mask2: numpy.ndarray[int], kind: str = 'intersection1', object_threshold: int = 1, pixel_threshold: int = 1, re_order: bool = True) -> numpy.ndarray[int]

      helper for self.mask_intersection_difference, executing the operation on a single pair of masks


   .. py:method:: make_segmentation_measurements(input_img_folder: Union[pathlib.Path, str], input_mask_folder: Union[pathlib.Path, str], output_intensities_folder: Union[pathlib.Path, str, None] = None, output_regions_folder: Union[pathlib.Path, str, None] = None, statistic: str = 'mean', re_do: bool = False, advanced_regionprops: bool = False) -> None

      This method measures statistics and regionproperties from cell masks + images, and writes these as intensity 
      and regionprops csv files in an output folder. This output folder is structured such that a PalmettoBUG-style 
      analysis can easily be launched from it. 

      It is derived from and relies upon steinbock region_measurements functions (and through them, skimage). 

      Args:
          input_img_folder (Path, str): 
              the file path to the folder containing the images to be used for measuring 
              statistics / intensities

          input_mask_folder (Path, string): 
              the file path to the folder containing the masks to be used for measuring both 
              stats / intensities AND region properties (the region properties depend only on 
              the shape of the masks, not of the channels of the matching images)

              NOTE! -- the input_img_folder and input_mask_folder are presumed to have the same number of files & these files share 
              filenames / order. This is the default way that PalmettoBUG / isoSegDenoise exports these files, but be 
              careful to make sure to change this to be the case if your data does not match this pattern!

          output_intensities_folder (Path, string, None): 
              the file path to the folder to export the intensity csv files (these 
              csv files are effectively like fcs files with events ['Object'] each with intensity measurements / statistics for each channel). 
              If None, then the self.directory_object.intensities_dir will be used -- this depends on having set up an analysis using 
              the self.directory_object.make_analysis_dirs(analysis_name) method beforehand

          output_regions_folder (Path, string, None): 
              the file path to the folder where the regionprops csv files for each mask will be written. 
              Like the intensities csv's these are structured with measurements for each object/event/cell, but are measurements like area, perimeter, etc.
              If None, then will use the self.directory_object.regionprops_dir -- this depends on calling the self.directory_object.make_analysis_dirs(analysis_name) method beforehand

          statistic (string): 
              The statistic to report for each cell/object for each channel in the intensity csv files.
              One of ["mean","median","min","max","sum","std","var"]. Default is "mean", and you will rarely need any of the other options. 

          re_do (boolean): 
              Determines whether to check if each intensities/regionprops csv has already been generated (by matching the file name) 
              and ONLY export NEW csv files (if = True).
              Alternatively, export every csv file regardless of whether or not the export folder already contains identically named files (if = False). 

              Consider re_do = True if you are adding new files to a large existing project and don't want the old files to be redone or are 
              generating advanced regionprops (saves time), 

              otherwise re_do = False is frequently better as writing each file without advanced regionprops is not a very long process & ensures 
              consistency at this step. 

          advanced_regionprops (boolean): 
              If True, will calculate a few 'advanced' region properties, like --> [number of branch points, tortuosity, etc...]
              of the masks. This is a slow process so the default advanced_regionprops = False is preferred unless these additional region properties are greatly desired. 

              WARNING!! -- This option is currently broken / unreliable (algorithms & packages used to derive these regionproperties appear to make errors
              in determining branching.)

      Returns:
          None -- (its output is in writing to the disk, not returning a value)
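
   To make the two kinds of output concrete, the sketch below computes a per-object statistic for every channel (the "intensities") and a simple shape property (area, one of the "regionprops") from one image / mask pair. It is a NumPy-only illustration with hypothetical names (``measure_objects``, the toy arrays). The real method relies on steinbock / skimage and writes csv files, and details such as the exact definition of "std" and "var" may differ.

   ```python
   import numpy as np

   # Statistic names listed in the Args above, mapped to NumPy reducers (an assumption).
   STATISTICS = {"mean": np.mean, "median": np.median, "min": np.min, "max": np.max,
                 "sum": np.sum, "std": np.std, "var": np.var}


   def measure_objects(image, mask, statistic="mean"):
       """Return ({label: [value per channel]}, {label: area in pixels})."""
       reducer = STATISTICS[statistic]
       labels = [int(value) for value in np.unique(mask) if value != 0]
       intensities = {
           label: [float(reducer(channel[mask == label])) for channel in image]
           for label in labels
       }
       # The area depends only on the mask, not on the image channels.
       areas = {label: int(np.sum(mask == label)) for label in labels}
       return intensities, areas


   # Toy data: 2 channels of 2 x 3 pixels and a mask holding two objects.
   image = np.array([[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
                     [[10.0, 10.0, 0.0], [0.0, 20.0, 20.0]]])
   mask = np.array([[1, 1, 0],
                    [0, 2, 2]])
   intensities, areas = measure_objects(image, mask)
   print(intensities, areas)
   ```

   Each key corresponds to one row ('Object') of the intensity and regionprops csv files described above.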


   .. py:method:: _advanced_regionprops(input_mask_folder: Union[pathlib.Path, str], output_regions_folder: Union[pathlib.Path, str]) -> None

      Helper function for self.make_segmentation_measurements(). Calculates 'advanced' region properties: [n_slab, n_branch, tortuosity, cycles]

      WARNING!! -- Currently not accurate / functional (inaccurate, additional branches in simple tests). Error may propagate from NAVis library


   .. py:method:: to_analysis(Analysis_tab=None, metadata_from_save: bool = False, gui_switch=None) -> None

      This function prepares / sets up an Analysis folder, by converting intensity csv files to fcs files and generating preliminary / semi-empty metadata / panel 
      pandas dataframes, which require editing before writing to the disk at the newly prepared self.Analysis_panel_dir and self.metadata_dir filepaths.

      Depends on self.directory_object.make_analysis_dirs(analysis_name) being called first to set up the directory structure and direct the discovery
      of the intensity files which will be used to generate FCS files & the initial panel/metadata dataframes.

      DOES NOT export the panel / metadata files to the disk, but returns them, along with the file paths where they are expected to
      be written to, in the following order: (panel dataframe, metadata dataframe, panel path, metadata path)

      Args:
          Analysis_tab / metadata_from_save (Only for use inside GUI): 
              IGNORED outside of GUI. in the GUI they assist in coordinating the widgets & choosing to load the Analysis_panel/metadata files 
              from the directory_object.Analyses_dir (if someone makes a second analysis in one project, instead of requiring a fresh panel/metadata 
              set up each time a new analysis is made). 
          gui_switch (Boolean or None):
              Only needed if an error is making palmettobug think it is in the GUI. Added to work around a testing error.


   .. py:method:: _intense_to_fcs(input_intensity_directory: Union[pathlib.Path, str, None] = None, ouput_fcs_folder: Union[pathlib.Path, str, None] = None) -> None

      Helper method for self.to_analysis --> writes .fcs files from .csv files in the intensities folder.

      When called with default arguments, depends on self.directory_object.make_analysis_dirs(analysis_name), however the input and output folders
      can be specified separately to allow its use in any context.

      Args:
          input_intensity_directory (Path, string, None): 
              the path to a folder where the intensity csv files are (exported by self.make_segmentation_measurements)
              If None, then a default path is presumed which requires a prior execution of the self.directory_object.make_analysis_dirs(analysis_name) method

          ouput_fcs_folder (Path, string, None): 
              the path to a folder where the FCS files will be written to, with the same behaviour as (input_intensity_directory) - if None, a 
              default path self.directory_object.fcs_dir is presumed, etc.


   .. py:method:: _initial_Analysis_panel() -> pandas.DataFrame

      Helper method for self.to_analysis --> generates an initial Analysis panel (marker_class column blank)


   .. py:method:: _initial_metadata_file() -> pandas.DataFrame

      Helper method for self.to_analysis --> generates the initial metadata file (patient and condition columns blank)


.. py:function:: mask_expand(distance: int, image_source: Union[pathlib.Path, str], output_directory: Union[pathlib.Path, str, None]) -> None

   Function that expands the size of cell masks:

   Args:
       distance (integer):  
           the number of pixels to expand the masks by

       image_source (string or Path): 
           the file path to a folder containing the cell masks (as tiff files) to expand

       output_directory (string or Path): 
           the file path to a folder where the expanded cell masks (as tiff files) will be written

   Inputs / Outputs:
       Inputs: 
           reads every file in image_source folder (expecting .tiff format for all)

       Outputs: 
           writes a (.ome).tiff into the output_directory folder for each file read-in from image_source
           The filenames in the image_source folder are preserved in the output_directory, so if image_source == output_directory
           then the original masks will be overwritten. 
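
   The effect of expanding labelled masks by a pixel distance can be pictured with the minimal, pure-Python sketch below. 
   It is illustrative only and is not PalmettoBUG's implementation: the helper name ``expand_labels_sketch``, the use of 
   Chebyshev (square-neighbourhood) distance, and the tie-breaking rule are all assumptions made for this example.

   .. code-block:: python

      def expand_labels_sketch(mask, distance):
          """Grow each non-zero label into background pixels within `distance`.

          `mask` is a list of rows of integer labels (0 = background). Ties are
          resolved in favour of the nearest label, then the smallest label value.
          """
          height, width = len(mask), len(mask[0])
          labelled = [(r, c, mask[r][c])
                      for r in range(height) for c in range(width) if mask[r][c] != 0]
          expanded = [row[:] for row in mask]
          for r in range(height):
              for c in range(width):
                  if mask[r][c] != 0:
                      continue  # existing labels are never overwritten
                  best = None
                  for lr, lc, label in labelled:
                      d = max(abs(lr - r), abs(lc - c))  # Chebyshev distance (assumption)
                      if d <= distance and (best is None or (d, label) < best):
                          best = (d, label)
                  if best is not None:
                      expanded[r][c] = best[1]
          return expanded


      mask = [
          [0, 0, 0, 0, 0],
          [0, 1, 0, 0, 2],
          [0, 0, 0, 0, 0],
      ]
      expanded = expand_labels_sketch(mask, 1)

   With a distance of 0 the sketch returns the mask unchanged, and existing labels are never replaced by a neighbouring label.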


.. py:function:: imc_entrypoint(directory: Union[pathlib.Path, str], resolutions: list[float] = [1.0, 1.0], from_mcds: bool = True) -> ImageAnalysis

   This function is the entrypoint for a project using MCDs or TIFFs. It initializes and returns an ImageAnalysis object using the 
   arguments passed in by the user.

   Args:
       directory (Path or string): 
           This is the path to a folder containing a subfolder /raw with either .tiff or .mcd files inside it

       resolutions (iterable of length two: float, float): 
           These are the [X, Y] resolutions of the images in micrometers / pixel. 
           The default is 1.0 micrometers / pixel for both dimensions, as has been usual for IMC. 

       from_mcds (boolean): 
           whether the /raw subfolder contains .mcd files (= True) or .tiff files (= False)

   returns:
       a palmettobug.ImageAnalysis object
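
   Before calling this function it can help to confirm that the project folder has the expected layout. The sketch below is a 
   hypothetical pre-flight check (``raw_files_for_project`` is not part of PalmettoBUG), and accepting the ``.tif`` spelling 
   alongside ``.tiff`` is an assumption.

   .. code-block:: python

      from pathlib import Path
      import tempfile


      def raw_files_for_project(directory, from_mcds=True):
          """Return the files in <directory>/raw that match the expected type."""
          raw = Path(directory) / "raw"
          if not raw.is_dir():
              raise FileNotFoundError(f"expected a 'raw' subfolder inside {directory}")
          # Assumption: both .tif and .tiff spellings count as TIFF files.
          suffixes = {".mcd"} if from_mcds else {".tif", ".tiff"}
          return sorted(p for p in raw.iterdir() if p.suffix.lower() in suffixes)


      # Build a throwaway project folder to demonstrate the check.
      with tempfile.TemporaryDirectory() as project:
          (Path(project) / "raw").mkdir()
          (Path(project) / "raw" / "sample_1.mcd").touch()
          found = [p.name for p in raw_files_for_project(project, from_mcds=True)]
          tiffs = raw_files_for_project(project, from_mcds=False)

   If the check finds the expected files, the project would then be created with a call such as 
   ``palmettobug.imc_entrypoint(directory, resolutions=[1.0, 1.0], from_mcds=True)``.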


.. py:function:: read_txt_file(path: Union[str, pathlib.Path], num_meta_data_columns: int = 6)

   Reads a single txt file (these serve as backups in case .mcd files become corrupted).

   Assumes that the txt file itself is not corrupted & has six [Arg: num_meta_data_columns] metadata columns before the channels.
   Also assumes a complete file with X / Y columns, whose values accurately correspond to the X / Y of the image to be generated.

   Args:
       path (str or Path): 
           the full path to the single .txt file to be read-in

       num_meta_data_columns (integer): 
           The number of metadata columns in the file. Default is 6, which worked for the files I tested this function on, 
           but I don't know what is standard / what kind of variability there is in the metadata for these types of files.

   returns:
       numpy array, which can be saved as an (.ome).tiff
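
   The layout assumed above (metadata columns first, then one column per channel, with X / Y giving each row's pixel position) 
   can be sketched in pure Python as follows. This is not the function's actual code: the tab delimiter, the header row, and the 
   example column names are assumptions, and ``txt_to_channel_images`` is a hypothetical helper that returns nested lists rather 
   than a numpy array.

   .. code-block:: python

      import csv
      import io


      def txt_to_channel_images(text, num_meta_data_columns=6):
          """Parse tab-delimited rows into {channel_name: 2D list indexed [y][x]}."""
          reader = csv.reader(io.StringIO(text), delimiter="\t")
          header = next(reader)
          x_col, y_col = header.index("X"), header.index("Y")
          channels = header[num_meta_data_columns:]
          rows = [row for row in reader if row]
          width = max(int(row[x_col]) for row in rows) + 1
          height = max(int(row[y_col]) for row in rows) + 1
          images = {name: [[0.0] * width for _ in range(height)] for name in channels}
          for row in rows:
              x, y = int(row[x_col]), int(row[y_col])
              for name, value in zip(channels, row[num_meta_data_columns:]):
                  images[name][y][x] = float(value)
          return images


      # Hypothetical 2 x 2 acquisition with six metadata columns and two channels.
      example = "\n".join([
          "\t".join(["Start_push", "End_push", "Pushes_duration", "X", "Y", "Z", "ChA", "ChB"]),
          "\t".join(["0", "1", "1", "0", "0", "0", "1.5", "10"]),
          "\t".join(["1", "2", "1", "1", "0", "0", "2.5", "20"]),
          "\t".join(["2", "3", "1", "0", "1", "0", "3.5", "30"]),
          "\t".join(["3", "4", "1", "1", "1", "0", "4.5", "40"]),
      ])
      images = txt_to_channel_images(example)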


.. py:function:: txt_folder_to_tiff_folder(txt_folder: Union[pathlib.Path, str], tiff_folder: Union[pathlib.Path, str], panel: pandas.DataFrame, hpf: Union[float, int] = 50, ome_tiff_metadata: bool = False, resolutions: list[float] = [1.0, 1.0], num_meta_data_columns: int = 6) -> None

   Iteratively runs read_txt_file() on each file in a directory, and outputs .ome.tiffs (at least by file extension) to an output directory.

   To actually generate ometiff metadata to be included with the images, two things must occur:

       1. ome_tiff_metadata == True

       2. The resolutions argument must be provided (default is 1 micrometer/pixel in both X and Y)

   Args:
       txt_folder (Path or str): 
           the path to a folder where the .txt files to be converted are. Assumes only .txt files are inside this folder

       tiff_folder (Path or str): 
           the path to the folder where the output .ome.tiff files will be written

       panel (pandas DataFrame): 
           the panel file. Used to filter out undesirable channels & construct ometiff metadata

       hpf (integer >= 0): 
           whether to run hot pixel filtering (if hpf != 0) and if hpf is run what threshold to use. 

       ome_tiff_metadata (boolean): 
           whether to generate metadata for the ome.tiff -- this does not change the output extension name, which will always be .ome.tiff, but 
           does determine whether the function will attempt to add metadata to the files. If True, then the resolutions argument must be provided, or an error will occur. 

       resolutions (list of two floats > 0): 
           the resolution of the images in X and Y dimensions, in micrometers per pixel. Only used if ome_tiff_metadata == True. 

       num_meta_data_columns (integer):
           The number of metadata columns in the file. Default is 6, which worked for the files I tested this function on, 
           but I don't know what is standard / what kind of variability there is in the metadata for these types of files. 

   Inputs / Outputs:
       Inputs: 
           reads each file inside txt_folder (expecting all to be .txt files of the format exported during IMC as backups of .mcd files. )

       Outputs: 
           writes an .ome.tiff file in tiff_folder for each .txt file read in from txt_folder
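
   The role of the ``hpf`` threshold can be pictured with the sketch below, which shows one common style of hot pixel filtering: 
   a pixel that exceeds all of its neighbours by more than the threshold is clipped to the largest neighbour. Whether this matches 
   PalmettoBUG's internal filter exactly is an assumption, and ``hot_pixel_filter_sketch`` is a hypothetical helper written for 
   this example. Skipping the filter when hpf == 0 would be handled by the caller.

   .. code-block:: python

      def hot_pixel_filter_sketch(image, threshold):
          """Clip pixels that exceed every neighbour by more than `threshold`.

          `image` is a 2D list of numbers. A pixel is treated as "hot" when it is
          larger than the maximum of its (up to 8) neighbours by more than
          `threshold`; it is then set to that neighbour maximum.
          """
          height, width = len(image), len(image[0])
          filtered = [row[:] for row in image]
          for r in range(height):
              for c in range(width):
                  neighbours = [
                      image[rr][cc]
                      for rr in range(max(r - 1, 0), min(r + 2, height))
                      for cc in range(max(c - 1, 0), min(c + 2, width))
                      if (rr, cc) != (r, c)
                  ]
                  if not neighbours:
                      continue  # a 1 x 1 image has nothing to compare against
                  neighbour_max = max(neighbours)
                  if image[r][c] - neighbour_max > threshold:
                      filtered[r][c] = neighbour_max
          return filtered


      image = [
          [1, 2, 1],
          [2, 500, 3],
          [1, 2, 1],
      ]
      filtered = hot_pixel_filter_sketch(image, threshold=50)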


.. py:function:: setup_for_FCS(directory)

   This sets up a folder for single-cell analysis using the CATALYST-derived module of PalmettoBUG. 

   Can be used for a solution-mode experiment (direct from FCS files) or as part of the set up when transitioning from image processing to single-cell analysis.

   Args:
       directory (str):
           The directory to set up for single-cell analysis

   Returns:
       Analysis_panel (a pandas dataframe of the initially generated Analysis_panel file, needs the marker_class column to be filled in by the user)

       Analysis_panel_dir (a string, the path to where the Analysis_panel should be saved on the disk once it has been completed by the user)

       metadata (a pandas dataframe of the initially generated metadata file, needs the patient_id and condition columns to be filled in by the user)

       metadata_dir (a string, the path to where the metadata should be saved on the disk once it has been completed by the user)


.. py:class:: SupervisedClassifier(homedir: Union[pathlib.Path, str])

   This class handles the supervised pixel classifier creation, training, and prediction. It is mainly set up by the 
   setup_classifier method, not by the __init__ call.

   Args:
       homedir (str or Path):
           the path to the directory where the Pixel Classification folder and subfolder will be placed 
           (this is the main directory for PalmettoBUG)

   Key Attributes:
       classifier_path (str): 
           the full file path to the .json file containing the trained opencv ANN_MLP classifier

       classifier_dir (str): 
           the path to the folder where the classifier will be setup (== {self._homedir}/Pixel_Classification/{self.classifier_name}/ )

       classifier_training_labels (str): 
           The path to the folder where the classifier training labels are (by default) expected to be written to / read from

       output_directory (str): 
           the path to the folder where the classifier predictions will be exported by default.

       classifier_name (str): 
           The name of the classifier, used for the folder name where the classifier is set up, and to help name the .json files containing
           the trained classifier & its details. Derived from the main, opencv2 .json file name & includes its .json file extension. 

       algorithm (cv2.ml.ANN_MLP): 
           the opencv2 ANN_MLP classifier instance

       details_dict (dictionary): 
           the dictionary containing details of the classifier not available inside the opencv2 .json file 
           This dictionary is saved to a .json file parallel to the opencv2 formatted .json, with "_details" appended to its filename
           This dictionary holds information such as the channels, sigmas, & features selected. 

   Formerly PxQuPy (Pixel QuPath Python) -- residuals of that naming will likely remain in the class-internal / GUI-internal namespace


   .. py:attribute:: _homedir
      :value: ''


   .. py:attribute:: classifier_path
      :value: None


   .. py:attribute:: algorithm
      :value: None


   .. py:attribute:: details_dict


   .. py:attribute:: _image_name
      :value: ''


   .. py:method:: _setup_directory() -> None

      helper for __init__, checks and sets up pixel classification folder that contains individual classifier subfolders


   .. py:method:: _setup_classifier_directory(classifier_name: Union[str, None] = None, classifier_path: Union[str, None] = None) -> None

      Sets up the classifier folder (subfolder of /Pixel_Classification) for an individual supervised classifier.

      If classifier_path is provided, then this attempts to load the .json at the provided classifier_path and derives the classifier name 
      from the path. When classifier_path is provided, the loaded classifier should then be usable immediately to
      predict. THIS TAKES PRECEDENCE over classifier_name.

      Alternatively, the classifier_name is provided and classifier_path is None --> This creates a new, empty classifier folder.

      In practice, this is mostly a helper method for the setup_classifier method


   .. py:method:: setup_classifier(classifier_name: str, number_of_classes: int, sigma_list: list[float], features_list: list[str], channel_dictionary: dict[str:int], classes_dictionary: dict[int:str] = {}, image_directory: str = '', categorical: bool = True, internal_architecture: list[int] = [], epsilon: float = 0.01, iterations: int = 1000) -> tuple[cv2.ml.ANN_MLP, dict]

      This method takes in a variety of user inputs, and creates the initial pixel classifier directory and .json files, ready for training.

      Args:
          classifier_name (str):  
              the name of the classifier

          number_of_classes (int): 
              the number of classes being predicted by the classifier 

          sigma_list (list of numeric): 
              list of the numeric values of the sigmas to be used in the creation of features for the classifier. 
                  
              Example: [1.0, 2.0, 4.0]

          features_list (list of strings): 
              list of the features to be generated from the image and to be fed into the classifier. 
              Possible features = ["GAUSSIAN", "LAPLACIAN", "WEIGHTED_STD_DEV", "GRADIENT_MAGNITUDE", 
              "STRUCTURE_TENSOR_EIGENVALUE_MAX", "STRUCTURE_TENSOR_EIGENVALUE_MIN", "STRUCTURE_TENSOR_COHERENCE", 
              "HESSIAN_DETERMINANT", "HESSIAN_EIGENVALUE_MAX",  "HESSIAN_EIGENVALUE_MIN"]

          channel_dictionary (dict): 
              a dictionary with keys of the channels' common names (str) and values of the channels' location in the image (int). 

                  Example: {'channel_1_name':1, 'channel_10_name':10, 'channel_3_name':3, ...}

              Use this to specify the channels you want used, and to record in the classifier's .json file what antigen each 
              channel represents
          
          classes_dictionary (dict): 
              a dictionary with keys (int) that correspond to the integer labels in the label / prediction files. 
              As in, these are the label numbers used in napari for a given class. The dictionary values (str) correspond to the 
              description the user wants for each class. 

                      Example == {1:"Astrocyte",2:"Neuron",3:"Background", ...}

              These are used to retrieve the biologically important information after the classification is complete. 
              It is not needed for the classification steps themselves. Currently has a default value of {}, but that may change 
              in the future to enforce a choice of labels on the part of the user.

          image_directory (str, optional): 
              The path to the folder containing the images you plan to train / predict pixel classes with.

              NOTE: this is optional / for the user's benefit & for reproducibility, as it is saved in the .json file for 
              retrieval of what images were used, but it is NOT ENFORCED. 
              As in, you can train / predict on totally different images (although that is likely a bad idea unless you are keeping thorough records!)

          categorical (bool): 
              if True, then the classifier is set to return only the category output. If False, then it will 
              return probabilities for each category instead of the final decision

          internal_architecture (list of integers or list[None]): 
              the sizes of any internal neuron layers you wish to add to the ANN_MLP.

          epsilon (float): 
              a learning rate parameter of the ANN_MLP training

          iterations (integer): 
              the number of iterations during ANN_MLP training. 

      Returns:
          cv2.ml.ANN_MLP object: 
              an opencv ANN_MLP instance that is ready to train on data in the shape described by sigma/features/channels/classes information

          dictionary: 
              the information that is saved in classifier_X_details.json, holding the key details for properly deriving the image features, etc. for training and prediction

      Inputs / Outputs: 
          Outputs: 
              saves the classifier as classifier_X.json that can be easily imported by an opencv ANN_MLP object
              (this save is done within the _initialize_classifier_dict_and_ANN_MLP() function)

              saves classifier_X_details.json with the following information: channel_dictionary, sigma list, features list.
              These details are saved separately, as they can mess with the simple import of the classifier into the opencv ANN_MLP
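
      The argument shapes described above can be checked before the classifier is set up. The sketch below is a hypothetical 
      validation helper (``check_classifier_arguments`` is not part of PalmettoBUG); the feature names are copied from the list above, 
      while the rules that sigmas must be positive and that a non-empty classes_dictionary must have number_of_classes entries are 
      assumptions made for illustration. The channel and class names are made up.

      .. code-block:: python

         ALLOWED_FEATURES = [  # copied from the features_list description
             "GAUSSIAN", "LAPLACIAN", "WEIGHTED_STD_DEV", "GRADIENT_MAGNITUDE",
             "STRUCTURE_TENSOR_EIGENVALUE_MAX", "STRUCTURE_TENSOR_EIGENVALUE_MIN",
             "STRUCTURE_TENSOR_COHERENCE", "HESSIAN_DETERMINANT",
             "HESSIAN_EIGENVALUE_MAX", "HESSIAN_EIGENVALUE_MIN",
         ]


         def check_classifier_arguments(number_of_classes, sigma_list, features_list,
                                        channel_dictionary, classes_dictionary):
             """Return a list of human-readable problems (empty if none are found)."""
             problems = []
             unknown = [f for f in features_list if f not in ALLOWED_FEATURES]
             if unknown:
                 problems.append(f"unknown features: {unknown}")
             if any(sigma <= 0 for sigma in sigma_list):  # assumption: sigmas > 0
                 problems.append("sigmas should be positive")
             if not all(isinstance(v, int) for v in channel_dictionary.values()):
                 problems.append("channel_dictionary values should be integers")
             # Assumption: an empty classes_dictionary (the default) is allowed.
             if classes_dictionary and len(classes_dictionary) != number_of_classes:
                 problems.append("classes_dictionary does not match number_of_classes")
             return problems


         arguments = dict(
             number_of_classes=3,
             sigma_list=[1.0, 2.0, 4.0],
             features_list=["GAUSSIAN", "LAPLACIAN"],
             channel_dictionary={"channel_1_name": 1, "channel_3_name": 3},
             classes_dictionary={1: "Astrocyte", 2: "Neuron", 3: "Background"},
         )
         problems = check_classifier_arguments(**arguments)
         bad = check_classifier_arguments(**{**arguments, "features_list": ["GAUSSIAN", "SOBEL"]})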


   .. py:method:: _write_biolables_csv() -> None

      This helper method writes the biolabel.csv file using the details dictionary.


   .. py:method:: _initialize_classifier_dict_and_ANN_MLP(number_of_channels: int, classifier_name: str = 'classifier_test.json', number_of_classes: int = 2, internal_architecture: list[int] = [], iterations: int = 1000, epsilon: float = 0.01) -> cv2.ml.ANN_MLP

      This helper method writes the dictionary of the main .json file containing information like the classifier neuron weights. 
      See self.setup_classifier method for more details on arguments


   .. py:method:: load_saved_classifier(classifier_json_path: Union[pathlib.Path, str]) -> None

      This is an alternate way to use this class -- instead of creating a new classifier, load an old one.

      It loads a saved pixel classifier using its main classifier.json and classifier_details.json files 
      The classifier_json_path is the full path, including filename + file extension, to the classifier.json file. 
      The classifier_details.json is expected to be found in the SAME FOLDER with it. 
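
      Given the naming convention described for this class ("_details" appended to the classifier's file name, in the same folder), 
      the expected location of the details file can be derived from the classifier path. ``details_path_sketch`` and the example path 
      are hypothetical and only illustrate the convention; they are not the method's own code.

      .. code-block:: python

         from pathlib import Path


         def details_path_sketch(classifier_json_path):
             """Return the sibling '<name>_details.json' path for a classifier .json."""
             path = Path(classifier_json_path)
             if path.suffix.lower() != ".json":
                 raise ValueError(f"expected a .json file, got {path.name}")
             return path.with_name(f"{path.stem}_details.json")


         # Hypothetical location of a saved classifier.
         details = details_path_sketch("some_folder/classifier_X.json")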


   .. py:method:: launch_Napari_px(image_path: Union[pathlib.Path, str], display_all_channels: bool = False) -> None

      This launches napari for generating training labels, receiving a path (image_path) to the image file you want to make labels for


   .. py:method:: write_from_Napari(output_folder: Union[str, None] = None) -> None

      This saves the training labels to the training labels folder. Will only run if labels have previously been made 
      & this method has not been run already (as this method clears the labels after saving them to the disk)

      Args:
          output_folder (str, Path, or None): What folder to write the training labels to (must exist). If None, will use the default location
          for a Pixel Classifier, same as used in the GUI. 


   .. py:method:: train_folder(image_folder: Union[pathlib.Path, str], labels_dir: Union[str, pathlib.Path, None] = None) -> cv2.ml.ANN_MLP

      This function trains the ANN_MLP classifier using the training labels in the classifier's directory & the corresponding images 
      in the provided image_folder.

      Note: the images in image_folder must have matching names with the training label files. It is fine if the training labels do not 
      cover every image in image_folder, but the reverse is not acceptable (training labels without a corresponding image in image_folder).

      Training parameters are previously determined when the classifier was set up with self.setup_classifier.

      Features are generated for each image one-by-one, and their pixels inside label layers are collected -- Then the training is performed 
      on those collected pixels together. 

      NOTE: If run on an already-trained classifier, then it is training with initial weights equal to the weights from the prior training (but 
      that probably does not make much of a difference)

      Args:
          image_folder (str / Pathlike): 
              The path to the folder where the .tiff image files to train on reside. 

          labels_dir (str / Pathlike, None): 
              The path to the folder where the .tiff files containing the training label information reside. If None (default) will use self.classifier_training_labels. 
              There must be ONLY .tiff files in this folder & the names of the files in this folder MUST match the names of .tiff files in the image_folder. 

      Inputs / Outputs:
          Inputs: 
              reads .tiff files from image_folder / labels_dir (self.classifier_training_labels if labels_dir is None)

          Outputs: 
              writes self.details_dict to the {name}_details.json file
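
      The name-matching requirement above (every training label file needs a same-named image, but not vice versa) can be checked 
      ahead of training. The helper below is a hypothetical sketch, not part of PalmettoBUG.

      .. code-block:: python

         from pathlib import Path
         import tempfile


         def labels_without_images(image_folder, labels_dir):
             """Return label file names that have no same-named file in image_folder."""
             image_names = {p.name for p in Path(image_folder).iterdir() if p.is_file()}
             label_names = {p.name for p in Path(labels_dir).iterdir() if p.is_file()}
             return sorted(label_names - image_names)


         # Throwaway folders: roi_9.tiff has labels but no matching image.
         with tempfile.TemporaryDirectory() as tmp:
             images, labels = Path(tmp) / "images", Path(tmp) / "labels"
             images.mkdir()
             labels.mkdir()
             for name in ("roi_1.tiff", "roi_2.tiff"):
                 (images / name).touch()
             for name in ("roi_1.tiff", "roi_9.tiff"):
                 (labels / name).touch()
             missing = labels_without_images(images, labels)

      An empty result means every label file has a corresponding image; images without labels (``roi_2.tiff`` here) are not reported.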


   .. py:method:: predict(image: numpy.ndarray, image_name: str, output_folder: Union[str, None] = None) -> numpy.ndarray[int]

      This runs the provided QuPath classifier on an image (as a numpy array). Currently only limited QuPath classifiers are supported 
      (ANN_MLP only, not local normalization, etc.).

      Args:
          image (numpy array): 
              a numpy array representing the image to be analyzed

          image_name (str): 
              the file name of the image being analyzed, important for properly naming the output mask. Should include the file extension (usually .tiff). 

          output_folder (str, or default = None): 
              if not None, should be a valid directory (as a str) to write the pixel classification file into. If None, will instead write the file into 
              self.output_directory folder

      Returns:
          numpy array: 
              the pixel classification or probability predictions from the classifier. Dimensions match the spatial dimensions of the image.

      Inputs / Outputs:
          Outputs: 
              a pixel classification predict map exported as a .tiff file to self.output_directory/{image_name} or to output_folder/{image_name}, if output_folder is not None.


   .. py:method:: predict_folder(img_folder: Union[pathlib.Path, str], output_folder: Union[str, None] = None) -> None

      This runs the provided PxQuPy classifier on the images in the provided directory, exporting images (calls self.predict for each image in the img_folder).

      Args:
          img_folder (str / Pathlike): 
              the path to a folder containing the images to generate pixel classifications from

          output_folder (str, or default = None): 
              if not None, should be a valid directory (as a str) to write the pixel classifications into. If None, will instead write the files into 
              the self.output_directory folder

      Inputs / Outputs:
          Outputs: 
              pixel classification predict maps exported as .tiff files to self.output_directory/ or to output_folder/, if output_folder is not None.


.. py:class:: UnsupervisedClassifier(homedir: Union[str, pathlib.Path], classifier_name: str, panel: Union[None, pandas.DataFrame] = None, classifier_dictionary: dict = {})

   This class coordinates the creation, training, and prediction from an unsupervised pixel classifier.

   Args:
       homedir (str or Path):
           The PalmettoBUG project directory

       classifier_name (str):
           The name of the pixel classifier being made -- will determine a number of file and folder names.

       panel (pandas DataFrame, or None):
           This data frame is unique to unsupervised classifiers, and can be added later by setting this class's panel attribute. See the description
           in Key Attributes

       classifier_dictionary (dictionary, or None):
           Training details of the classifier -- not needed if constructing the classifier using the setup_and_train method.
           However, can be used to semi-reload an unsupervised classifier, by reading the _details.json file and supplying the resulting dictionary
           to this argument. 

   Key Attributes:
       panel (pandas dataframe): 
           This panel is unique to unsupervised classifier. It has an 'antigen' column listing all the kept (keep == 1 from the main panel.csv) 
           antigens in the dataset, and then its own 'keep' column to indicate which channels to be used in the classifier training (1 = use, 0 = don't use).
           After these two columns, there are a series of columns whose names correspond to various transformations of the data, such as "HESSIAN_MIN", etc. 

               possible_additional_transformations = ['GRAD_MAG', 
                                                      'HESSIAN_DET', 
                                                      'HESSIAN_MAX', 
                                                      'HESSIAN_MIN', 
                                                      'LAPLACIAN', 
                                                      'STRUCT_CO', 
                                                      'STRUCT_MAX', 
                                                      'STRUCT_MIN', 
                                                      'WGT_STDV']  

           In these columns, the values can either be 0 (meaning don't use this transformation for this channel) or 1, which indicates which transformations
           to use of which channels in the training of the classifier. This allows you to use transformations for certain channels and not for others.

       training_dictionary (dict): 
           contains information on the training of the classifier for reproducibility. Gets exported to the disk as a _details.json for future reference.

       flowsom_dictionary (dict): 
           contains information on the trained FlowSOM classifier used in the process of prediction, including the 
           flowsom.FlowSOM instance itself. It is not saved to the disk as .json (I don't know for certain, but I expect a flowsom.FlowSOM 
           object may not be able to be written to the disk so easily. It is fundamentally a neural network, so there should be some way to do it.). 

       classifier_dir (str):
           The directory to the folder where the pixel classifier folder is to be setup

       output_dir (str): 
           The directory to the folder where the pixel classifier will output predictions by default. It is a sub-folder of classifier_dir. 

   The concept for this type of classifier was inspired by Pixie / Ark-analysis::

           https://github.com/angelolab/ark-analysis?tab=MIT-1-ov-file

   Pixie is licensed under the MIT license

   However, this implementation is essentially fresh, although preserving the key steps 
           (0.999 quantile channel normalization --> FlowSOM --> normalization within pixels) 

   but not the rest of the Pixie code.
   The only part that I'm aware of that borrows more directly from the original Pixie is how the 0.999 quantile channel normalization numbers are 
   aggregated & averaged for all images, instead of on an image-by-image basis, as I had originally done. I'm not sure which is better, either.

   This implementation also includes new capacities compared to Pixie, such as the ability to generate QuPath-like features (hessians, laplacians, etc.) as input 
   channels for the FlowSOM, although it is uncertain how useful these additional features are.
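
   The layout of the ``panel`` attribute described under Key Attributes can be sketched with plain dictionaries, one per row. 
   In a real session this table would be a pandas DataFrame; dictionaries are used here only so the example runs without 
   dependencies. ``panel_row`` is a hypothetical helper and the antigen names are made up.

   .. code-block:: python

      TRANSFORMATIONS = ["GRAD_MAG", "HESSIAN_DET", "HESSIAN_MAX", "HESSIAN_MIN",
                         "LAPLACIAN", "STRUCT_CO", "STRUCT_MAX", "STRUCT_MIN", "WGT_STDV"]


      def panel_row(antigen, keep=1, transformations=()):
          """Build one row: antigen, keep flag, then a 0/1 flag per transformation."""
          unknown = set(transformations) - set(TRANSFORMATIONS)
          if unknown:
              raise ValueError(f"unknown transformations: {sorted(unknown)}")
          row = {"antigen": antigen, "keep": keep}
          row.update({name: int(name in transformations) for name in TRANSFORMATIONS})
          return row


      # Hypothetical antigen names, for illustration only.
      panel_rows = [
          panel_row("CD45", keep=1, transformations=["LAPLACIAN"]),
          panel_row("Vimentin", keep=1),
          panel_row("DNA1", keep=0),
      ]
      # (antigen, transformation) pairs that are switched on for kept channels.
      selected = [
          (row["antigen"], name)
          for row in panel_rows if row["keep"] == 1
          for name in TRANSFORMATIONS if row[name] == 1
      ]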


   .. py:attribute:: _homedir
      :value: ''


   .. py:attribute:: classifier_name


   .. py:attribute:: panel
      :value: None


   .. py:attribute:: flowsom_dictionary


   .. py:method:: _setup_classifier_directory(classifier_name: Union[None, str] = None) -> None

      Sets up individual classifier's folder 


   .. py:method:: setup_and_train(img_directory: Union[pathlib.Path, str], sigma: float = 1.0, size: int = 500000, seed: int = 1234, n_clusters: int = 20, xdim: int = 15, ydim: int = 15, rlen: int = 50, smoothing: int = 0, suppress_zero_division_warnings=False, quantile: float = 0.999) -> tuple[dict, dict]

      This function performs all the steps required to train an initialized unsupervised classifier.

      Args:
          img_directory (string or Path): 
              The path to a folder containing the .tiff images to train (and presumably predict) from

          sigma (float): 
              sets the extent of Gaussian blurring used to generate features for training 

          size (integer):  
              The number of pixels to sample from the images to form the training dataset

          seed (integer): 
              seed for the non-deterministic FlowSOM algorithm

          n_clusters (integer): 
              The number of metaclusters for the FlowSOM algorithm to return. 

          xdim / ydim (integer / integer):  
              The X / Y dimensions of the FlowSOM self-organizing map. xdim * ydim is how many initial 
              points are in the SOM (and so, how many clusters are predicted before merging down to n_clusters)

          rlen (integer): 
              The number of training iterations for the SOM

          additional_features (boolean): 
              whether there are additional features beyond only the gaussian blurred channels. If False, can run faster by skipping unneeded steps.

          smoothing (integer >= 0): 
              Whether & how much to smooth the pixel classification made by the FlowSOM. If smoothing = 0, no 
              smoothing is applied. Otherwise, smoothing argument is used as the threshold for the smooth_isolated_pixels() 
              function which removes isolated pixel classifications from a pixel classification map. 
              Saved in the training dictionary, but not applied during training -- only applied later after prediction.

      Returns:
          dictionary: contains the trained flowSOM instance itself, useful for classification of images

          dictionary:  contains the training parameters, useful for reproducibility / providing a record of how the classifier was trained. 

      Inputs / Outputs: 
          Outputs: 
              in the process of setting up the dictionaries, this method writes 2 .json files to the self.classifier_dir folder
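
      Two of the ideas behind these arguments can be sketched in pure Python: the ``quantile`` value for a channel is computed per 
      image and then averaged across images, and the SOM starts from xdim * ydim nodes before merging down to n_clusters. 
      The helpers below are hypothetical and are not the method's own code. In particular, the linear-interpolation quantile and 
      the reading of "normalization within pixels" as dividing a pixel's channel values by their sum are assumptions.

      .. code-block:: python

         def quantile(values, q):
             """Linear-interpolation quantile of a non-empty sequence (0 <= q <= 1)."""
             ordered = sorted(values)
             position = q * (len(ordered) - 1)
             lower = int(position)
             upper = min(lower + 1, len(ordered) - 1)
             fraction = position - lower
             return ordered[lower] * (1 - fraction) + ordered[upper] * fraction


         def averaged_channel_quantile(images, q=0.999):
             """Average the per-image quantile of one channel across several images."""
             per_image = [quantile(pixels, q) for pixels in images]
             return sum(per_image) / len(per_image)


         def normalize_within_pixel(pixel):
             """Scale one pixel's channel values so that they sum to 1 (assumption)."""
             total = sum(pixel)
             return [v / total for v in pixel] if total else list(pixel)


         # Two tiny "images" of a single channel, flattened to lists of pixel values.
         channel_images = [[0.0, 1.0, 2.0, 3.0, 4.0], [0.0, 2.0, 4.0, 6.0, 8.0]]
         scale = averaged_channel_quantile(channel_images, q=0.5)  # median, for readability
         som_nodes = 15 * 15  # xdim * ydim with the default arguments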


   .. py:method:: predict(image_name: str, img_directory: Union[pathlib.Path, str], flowsom_dictionary: Union[None, dict] = None, output_folder: Union[pathlib.Path, str, None] = None) -> None

      Predicts the pixel classes of a single image

      Args:
          image_name (string):
              A string with the name of the image in the img_directory to make the prediction for.
              You can easily find a list of the possible options for this argument using os.listdir(img_directory)

          img_directory (Path or string):
              the folder containing the image to predict classes for

          flowsom_dictionary (dictionary or None):
              The dictionary containing the flowsom.FlowSOM instance, as well as the training details of the classifier, which allow it to predict.
              If None, will try to use self.flowsom_dictionary

          output_folder (Path, str or None):
              the folder where the pixel classification predictions will be written. Must already exist or be create-able by os.mkdir()   
              If None, will attempt to write to self.output_dir (default is 'classification_maps' inside the pixel classifier's directory)

      I / O:
          Inputs:
              reads a file from f'{img_directory}/{image_name}'. This file should be a .tiff file with the same number of channels (in the same order) as the 
              .tiff files that the Unsupervised classifier was trained on. Usually, it is the same folder & images for training and prediction.

          Outputs:
              writes a single 2 dimensional, single-channel .tiff file to f'{output_folder}/{image_name}' containing the pixel class predictions.
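
      As noted for image_name, os.listdir(img_directory) gives the candidate names. A small sketch of that lookup follows; 
      ``candidate_image_names`` is a hypothetical helper, and filtering to the ``.tiff`` / ``.tif`` extensions is an assumption 
      added for convenience.

      .. code-block:: python

         import os
         import tempfile


         def candidate_image_names(img_directory):
             """List .tiff / .tif file names in img_directory, sorted alphabetically."""
             return sorted(
                 name for name in os.listdir(img_directory)
                 if name.lower().endswith((".tiff", ".tif"))
             )


         # Throwaway folder holding two images and one unrelated file.
         with tempfile.TemporaryDirectory() as img_directory:
             for name in ("roi_2.ome.tiff", "roi_1.ome.tiff", "notes.txt"):
                 open(os.path.join(img_directory, name), "w").close()
             names = candidate_image_names(img_directory)

      Any entry of ``names`` could then be passed as image_name together with the same img_directory.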


   .. py:method:: predict_folder(img_directory: Union[pathlib.Path, str], flowsom_dictionary: Union[None, dict] = None, output_folder: Union[pathlib.Path, str, None] = None) -> None

      Applies self.predict method to every image in a supplied folder 

      Args:
          img_directory (Path or string):
              the folder of images to predict classes for. Every .tiff in this folder will have a prediction written for it.

          flowsom_dictionary (dictionary or None):
              The dictionary containing the flowsom.FlowSOM instance, as well as the training details of the classifier, which allow it to predict.
              If None, will try to use self.flowsom_dictionary

          output_folder (Path, string, or None):
              the folder where the pixel classification predictions will be written. Must already exist or be create-able by os.mkdir()

      I / O:
          Inputs:
              reads all the files from f'{img_directory}/'. Each file should be a .tiff file with the same number of channels (in the same order) as the 
              .tiff files that the Unsupervised classifier was trained on. Usually, it is the same folder & images for training and prediction.

          Outputs:
              writes 2 dimensional, single-channel .tiff files to f'{output_folder}/' containing the pixel class predictions.


.. py:function:: plot_pixel_heatmap(pixel_folder: Union[str, pathlib.Path], image_folder: Union[str, pathlib.Path], channels: list[str], panel: pandas.DataFrame, silence_division_warnings=False) -> tuple[matplotlib.pyplot.figure, pandas.DataFrame]

   This plots a heatmap derived from the actual data of the pixel class regions predicted by a classifier (unlike plot_class_centers, which uses the training centroids).
   Specifically, it shows the mean of 1%-99% quantile scaled data for each channel in each pixel class.

   Args:
       pixel_folder (str, Path):
           The folder of predictions from a pixel classifier

       image_folder (str, Path):
           The folder of images that the channels intensities will be read from to construct the heatmap. Only files present in BOTH pixel_folder & image_folder
           will be used.

       channels (iterable of strings):
           The names of the antigens to use in the panel. Will be matched against the antigens in panel, and then used to slice the images to only the channels of interest.
           These antigen names are also what will be displayed on the heatmap axes.

       panel (pd.DataFrame):
           The panel file (panel.csv) of the PalmettoBUG project in question. Specifically, panel['keep'] == 0 channels are removed, and then the antigen names in channels
           are matched against the antigen names in panel['name'] to slice the images to only the channels of interest. 

       silence_division_warnings (bool):
           One of the steps of this function involves a lot of division where zero-division / related errors can occur. 
           Will silence these warnings if this parameter == True

   Returns:
       a matplotlib figure and a pandas dataframe containing the values displayed in the plot
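
   The quantity shown in the heatmap, the mean of 1%-99% quantile scaled data per pixel class, can be sketched for a single channel 
   as below. This is not the function's actual code: the helper names are hypothetical, and the choice of linear-interpolation 
   quantiles with clipping of the scaled values to [0, 1] is an assumption.

   .. code-block:: python

      def quantile(values, q):
          """Linear-interpolation quantile of a non-empty sequence (0 <= q <= 1)."""
          ordered = sorted(values)
          position = q * (len(ordered) - 1)
          lower = int(position)
          upper = min(lower + 1, len(ordered) - 1)
          fraction = position - lower
          return ordered[lower] * (1 - fraction) + ordered[upper] * fraction


      def scaled_class_means(intensities, classes, low=0.01, high=0.99):
          """Mean of quantile-scaled intensities for each pixel class label."""
          lo, hi = quantile(intensities, low), quantile(intensities, high)
          span = hi - lo
          sums, counts = {}, {}
          for value, label in zip(intensities, classes):
              # Scale to the [low, high] quantile range and clip to [0, 1] (assumption).
              scaled = 0.0 if span == 0 else min(max((value - lo) / span, 0.0), 1.0)
              sums[label] = sums.get(label, 0.0) + scaled
              counts[label] = counts.get(label, 0) + 1
          return {label: sums[label] / counts[label] for label in sums}


      # One channel of a flattened image and the matching pixel class map.
      intensities = [0.0, 10.0, 0.0, 10.0]
      classes = [1, 2, 1, 2]
      means = scaled_class_means(intensities, classes)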


.. py:function:: plot_class_centers(flowsom: flowsom.FlowSOM, **kwargs) -> tuple[matplotlib.pyplot.figure, pandas.DataFrame]

   This plots the heatmap of the centroids of the metaclusters of a flowsom. It is useful for identifying what each 
   metacluster represents biologically. For pixel class work, this means the flowsom generated by an Unsupervised classifier

   Note!: 
       This function plots the centroids of the clusters determined during training, without respect to the predictions.
       For a heatmap that uses the actual data from the pixel classifier post-prediction use plot_pixel_heatmap below.

   Args:
       flowsom (flowsom.FlowSOM):
           Contains the information to be plotted.

   Returns:
       a matplotlib figure and a pandas dataframe 


.. py:function:: segment_class_map_folder(pixel_classifier_directory: Union[pathlib.Path, str], output_folder: Union[pathlib.Path, str], distance_between_centroids: int = 10, threshold: int = 5, to_segment_on: list[int] = [2], background: int = 1) -> None

   Takes pixel classification maps and uses a Euclidean distance transform (edt) + watershedding to segment them into objects

   Args:
       pixel_classifier_directory (string or Path):  
           The path to the folder of pixel classification maps to derive segmentations from 

       output_folder (string or Path):  
           the path to a folder where the segmentation masks are to be written. 

       distance_between_centroids(integer):
           the minimum distance between centroids for the watershedding. Higher numbers reduce the number of centroids and force them to be farther apart, 
           leading to fewer, larger cell segmentations, whereas lower numbers allow very close centroids, leading to smaller, more numerous segmentations. 

       threshold (integer): 
           objects smaller than this threshold (in pixels) will be removed before edt / watershedding. Objects this small can still appear in the output, but only 
           if the watershedding splits a larger region that is being watershed from multiple points.

       to_segment_on (list of integers): 
           The classes to segment on. They will be merged before running, and usually it is recommended that a dedicated supervised pixel classifier that only 
           finds the objects of interest be used (so usually only 1 class to segment on) 

       background (integer): 
           The background class, which will be set to zero

   Returns:
       None 
       
   Inputs / Outputs:
       Inputs: 
           reads in all the files in pixel_classifier_directory as .tiff files (MUST NOT have other file types / subfolders)

       Outputs: 
           for each file read-in exports a .tiff file to output_folder
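
The preprocessing implied by to_segment_on and threshold (merge the chosen classes into one foreground, then discard objects smaller than the threshold before the distance transform and watershed) can be sketched like this. It is an illustrative, pure-Python sketch only: the helper names are hypothetical, 4-connectivity is an assumption, and the edt / watershed steps themselves are not reproduced here.

```python
from collections import deque


def foreground_mask(class_map, to_segment_on):
    """Merge the classes of interest into a single binary foreground."""
    keep = set(to_segment_on)
    return [[1 if px in keep else 0 for px in row] for row in class_map]


def remove_small_objects(mask, threshold):
    """Zero out 4-connected foreground objects with fewer than `threshold` pixels."""
    height, width = len(mask), len(mask[0])
    seen = [[False] * width for _ in range(height)]
    cleaned = [row[:] for row in mask]
    for r in range(height):
        for c in range(width):
            if mask[r][c] == 0 or seen[r][c]:
                continue
            # Breadth-first search collects one connected object.
            component, queue = [], deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                component.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < height and 0 <= nx < width
                    if inside and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(component) < threshold:
                for y, x in component:
                    cleaned[y][x] = 0
    return cleaned


# Hypothetical class map: class 1 is background, class 2 is the object class.
class_map = [
    [1, 1, 1, 1, 1],
    [1, 2, 2, 1, 2],
    [1, 2, 2, 1, 1],
    [1, 1, 1, 1, 1],
]
cleaned = remove_small_objects(foreground_mask(class_map, [2]), threshold=3)
```

In this example the 2x2 object survives, while the isolated single pixel is removed before any seeds would be placed.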


.. py:function:: plot_classes(class_map_folder, output_folder, **kwargs)

   Allows classy masks and pixel classification outputs to be written as .png files

   Args:
       class_map_folder (string or Path):
           The folder from which .tiff files are read for conversion into .png files. 

       output_folder (string or Path):
           The folder where the PNG files will be written. Should exist or be make-able by os.mkdir()

       **kwargs:
           are passed to matplotlib.pyplot.imshow()


.. py:function:: merge_classes(classifier_mask: numpy.ndarray[int], merging_table: pandas.DataFrame) -> numpy.ndarray[int]

   This function takes in a classifier output (numpy array, dtype = int) and a merging table (pandas DataFrame with a particular format) 
   and outputs a new numpy array where all classes in the original array have been converted to the corresponding value in the merging 
   column of the merging-table dataframe. 

   Args:
       classifier_mask (np.ndarray of integers):
           A pixel class prediction. 

       merging_table (pandas DataFrame):
           The table that details how the original classes of classifier_mask will be merged, and what the final numbers will be
           Has a column 'class' for the current integer class labels of classifier_mask, and column 'merging' denoting what new integer labels
           should be for each of the original classes. 
           
           By convention, a class labeled 'background' should have its merging value set to 0, and NO MERGING CLASS should == 1.
           1 is a special number when merging supervised classifiers, and when classifying cell masks using the 'mode' method.

           Usually also has a column dedicated to the biologically relevant (non integer) labels that each new merging class is intended to
           represent.

   Returns:
       A numpy ndarray (integers), with the same shape as classifier_mask, but with the new merged class labels replacing the original class labels.
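
The relabelling that merge_classes describes amounts to a lookup from each 'class' value to its 'merging' value. The sketch below illustrates that idea with plain Python structures; the function name merge_classes_sketch is hypothetical, a list of dicts stands in for the pandas merging table, nested lists stand in for the numpy array, and requiring every class to appear in the table is an assumption of this example.

```python
def merge_classes_sketch(classifier_mask, merging_rows):
    """Replace every original class label with its 'merging' label."""
    lookup = {row["class"]: row["merging"] for row in merging_rows}
    # A KeyError here means a class in the mask is missing from the table.
    return [[lookup[px] for px in row] for row in classifier_mask]


# Hypothetical merging table: background goes to 0, and no merged label is 1.
merging_rows = [
    {"class": 1, "merging": 0, "label": "background"},
    {"class": 2, "merging": 2, "label": "epithelium"},
    {"class": 3, "merging": 2, "label": "epithelium"},
    {"class": 4, "merging": 3, "label": "stroma"},
]
classifier_mask = [
    [1, 2],
    [3, 4],
]
merged = merge_classes_sketch(classifier_mask, merging_rows)
```

The output keeps the shape of the input, and classes 2 and 3 collapse into the single merged label 2.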


.. py:function:: merge_folder(folder_to_merge: Union[pathlib.Path, str], merging_table: pandas.DataFrame, output_folder: Union[pathlib.Path, str] = None) -> None

   This function performs merge_classes() [see function above] on all the images in a provided folder, exporting the merged class maps to a 
   specified output folder. If output_folder is None, the output is placed in a "merged_classification_maps" folder in the same parent directory as 
   the input folder. 

   Args:
       folder_to_merge (Path or string): 
           the path to the folder containing the classification maps to merge

       merging_table (pandas dataframe): 
           A pandas dataframe containing a 'class' column denoting a class in the input class maps, and a 
           'merging' column denoting the new values of that class in the merged output class maps. Usually there is also a 'label' column, 
           which denotes the biological label, as a string, that corresponds to each class merging.

           NOTE:
               DO NOT: use the number 1 as one of your merging labels if you intend on doing mode-based cell classification 
               with the merged pixel classifier predictions. 
               DO: use the number 0 as the merging label of 'background' classes -- this will effectively drop them from the merged predictions

       output_folder (Path, string, or None): 
           the path to the folder where the merged classification maps are to be exported, with the same 
           filenames as the original folder. If None, then the output folder will be a folder parallel to the input folder (as in, both 
           folders will be in the same parent directory), with the name "/merged_classification_maps". 

   Inputs / Outputs:
       Inputs: 
           reads every file inside folder_to_merge one by one (assumes each is a .tiff file, and there are no subfolders!)

       Outputs: 
           writes a new .tiff file into output_folder for every file in folder_to_merge (preserving the same filenames)
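
The default output location described for output_folder = None can be pictured with pathlib. This is only a sketch of the documented behavior under the "parallel folder" reading; the helper name default_output_folder and the example paths are hypothetical.

```python
from pathlib import Path


def default_output_folder(folder_to_merge, output_folder=None):
    """Return the explicit output folder, or a sibling of the input folder."""
    if output_folder is not None:
        return Path(output_folder)
    return Path(folder_to_merge).parent / "merged_classification_maps"


# Hypothetical project layout.
resolved = default_output_folder("project/classifier/classification_maps")
explicit = default_output_folder("project/classifier/classification_maps", "elsewhere/out")
```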


.. py:function:: slice_folder(class_to_keep: Union[int, list[int]], class_map_folder: Union[pathlib.Path, str], image_folder: Union[pathlib.Path, str], output_folder: Union[pathlib.Path, str], padding: int = 5, zero_out: bool = False) -> None

   This function performs slice_image_by_region() [a non-public function, see code file] on every image in a folder. 
   This means that each image in the folder will be reduced to the minimal bounding box that contains all pixels of the specified class_to_keep.

   For example: you could use this function, after classifying villi regions of an intestinal tissue section, to reduce the images
   to the minimal rectangle that contains all of the villi class, reducing or removing the unwanted regions of the image.

   Args:
       class_to_keep (integer or a list of integers): 
           The class(es) to subset the images on

       class_map_folder (Path or string): 
           the path to a folder containing the classification maps (as tiffs) that will determine where the images are sliced / subsetted

       image_folder (Path or string): 
           the path to a folder containing the images to be sliced / subsetted

       output_folder (Path or string): 
           the path to a folder where the sliced / subsetted images will be exported as tiffs

       padding (integer >= 0): 
           how many pixels to pad the minimal bounding box of the class_to_keep in each image. Set to 0 to not pad at all

       zero_out (boolean): 
           If True, all pixels not in class_to_keep will have their channel values set to zero, leaving only the classes of 
           interest contributing information to the image. Default is False, which retains the values of pixels not in class_to_keep 
           as long as they fall within the minimal bounding box of the classes of interest. 

   Returns:
       None

   Inputs / Outputs:
       Inputs: 
           reads every file in the image_folder (as .tiff files), and every file in the class_map_folder (also as .tiff)
           These provided folders MUST NOT have files besides .tiff nor have any subfolders.

       Outputs: 
           outputs a .tiff file to output_folder for every file read-in from image_folder/class_map_folder
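
The interplay of class_to_keep, padding, and zero_out can be illustrated on a single channel. The sketch below is an assumption-laden illustration rather than the non-public slice_image_by_region() itself: the helper names are hypothetical, nested lists stand in for image arrays, and clamping the padded box to the image edges is assumed.

```python
def _as_set(class_to_keep):
    """Accept a single integer or an iterable of integers."""
    return {class_to_keep} if isinstance(class_to_keep, int) else set(class_to_keep)


def bounding_box(class_map, class_to_keep, padding=5):
    """Return (top, bottom, left, right) around the kept classes, or None if absent."""
    keep = _as_set(class_to_keep)
    rows = [r for r, row in enumerate(class_map) if any(px in keep for px in row)]
    if not rows:
        return None
    width = len(class_map[0])
    cols = [c for c in range(width) if any(row[c] in keep for row in class_map)]
    top = max(min(rows) - padding, 0)
    bottom = min(max(rows) + padding + 1, len(class_map))
    left = max(min(cols) - padding, 0)
    right = min(max(cols) + padding + 1, width)
    return top, bottom, left, right


def slice_channel(channel, class_map, class_to_keep, padding=5, zero_out=False):
    """Crop one channel to the padded bounding box, optionally zeroing other classes."""
    keep = _as_set(class_to_keep)
    box = bounding_box(class_map, class_to_keep, padding)
    if box is None:
        return []
    top, bottom, left, right = box
    return [
        [
            channel[r][c] if (not zero_out or class_map[r][c] in keep) else 0
            for c in range(left, right)
        ]
        for r in range(top, bottom)
    ]


# Hypothetical 6x6 image: class 2 occupies three pixels near the centre.
class_map = [[1] * 6 for _ in range(6)]
class_map[2][2] = class_map[2][3] = class_map[3][3] = 2
channel = [[10 * r + c for c in range(6)] for r in range(6)]

cropped = slice_channel(channel, class_map, 2, padding=1)
zeroed = slice_channel(channel, class_map, 2, padding=1, zero_out=True)
```

With zero_out left at False the crop keeps every pixel value inside the box; with zero_out set to True only the pixels of class 2 retain their values.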


.. py:function:: mode_classify_folder(mask_folder: Union[pathlib.Path, str], classifier_map_folder: Union[pathlib.Path, str], output_folder: Union[pathlib.Path, str], merging_table: Union[pandas.DataFrame, None] = None) -> pandas.DataFrame

   This function classifies cells using a pixel classifier and also creates "classy mask" .tiff files which can be useful for merging / expanding
   cell masks. It uses a simplistic method where the mode of the class values inside a cell mask is the class assigned to that mask.

   Args:
       mask_folder (Path or string): 
           the path to the folder containing cell masks (such as those produced by deepcell or cellpose) to be classified

       classifier_map_folder (Path or string): 
           the path to the folder containing the classifier maps that will be used to classify the masks

       output_folder (Path or string):  
           the path to the folder where the 'classy mask' tiff files will be exported

   Returns:
       pandas dataframe: 
           A dataframe with a single column, that denotes the calculated classification for every cell in the dataset. Can be added
           later to an Analysis as an alternative to FlowSOM-based classification of cells.

   Inputs / Outputs:
       Inputs: 
           reads all .tiff files that are in both mask_folder and classifier_map_folder (as in, a file with the same name is in both)

       Outputs: 
           for every read-in file, exports a .tiff into output_folder
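
The mode-based assignment can be sketched for a single image as below. This is a simplified illustration, not the library's code: the helper names are hypothetical, nested lists stand in for the mask and class-map arrays, and how ties are broken in PalmettoBUG is not documented here (this sketch keeps the class seen first).

```python
from collections import Counter, defaultdict


def mode_classify(mask, class_map):
    """Assign each cell the most common pixel class found inside its mask."""
    votes = defaultdict(Counter)
    for mask_row, class_row in zip(mask, class_map):
        for cell_id, pixel_class in zip(mask_row, class_row):
            if cell_id != 0:  # 0 is background in the cell mask
                votes[cell_id][pixel_class] += 1
    return {cell_id: counts.most_common(1)[0][0] for cell_id, counts in sorted(votes.items())}


def make_classy_mask(mask, cell_classes):
    """Paint every cell's pixels with the class assigned to that cell."""
    return [[cell_classes.get(cell_id, 0) for cell_id in row] for row in mask]


# Hypothetical image with two cells (labels 1 and 2).
mask = [
    [1, 1, 0, 2],
    [1, 1, 0, 2],
    [0, 0, 0, 2],
]
class_map = [
    [2, 2, 1, 3],
    [2, 3, 1, 3],
    [1, 1, 1, 2],
]
cell_classes = mode_classify(mask, class_map)
classy = make_classy_mask(mask, cell_classes)
```

The dictionary of per-cell classes corresponds to the single-column table the function returns, and the painted array corresponds to one exported "classy mask".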


.. py:function:: secondary_flowsom(mask_folder: Union[pathlib.Path, str], classifier_map_folder: Union[pathlib.Path, str], number_of_classes: Union[int, None] = None, XY_dim: int = 10, n_clusters: int = 10, rlen: int = 50, seed: int = 42) -> tuple[flowsom.FlowSOM, pandas.DataFrame]

   This function performs a FlowSOM clustering on all the cell regions of a dataset, using the fraction of each pixel class in each cell as 
   its inputs. It is intended as a secondary step of the unsupervised, Pixie-like cell classification pipeline available in PalmettoBUG. 
   Modeled intentionally after the steps of Pixie / Ark-Analysis by the Angelo lab:

           (https://github.com/angelolab/ark-analysis). 

   It is intended to be part of an alternate way to classify cells using pixel classifiers instead of a direct CATALYST-style FlowSOM on the
   cell regions themselves.

   Note that for FlowSOM integer parameters (XY_dim, n_clusters, seed) some reasonable defaults are provided, but these defaults -- especially 
   n_clusters -- may not be ideal for your data.  

   Args:
       mask_folder (Path or string): 
           the path to a folder containing the cell masks to cluster with FlowSOM

       classifier_map_folder (Path or string): 
           The path to a folder containing the pixel classification maps to be used to classify the cells' masks. 

               NOTE! >>> The files in mask_folder & classifier_map_folder should have the same filenames! 

       number_of_classes (integer or None): 
           the number of classes in the pixel classifier that generated the maps in classifier_map_folder. 
           If None, this will be empirically determined by reading every classification map in the folder and updating 

       XY_dim (integer): 
           the X and Y dimensions of the original FlowSOM grid. (XY_dim * XY_dim) is the number of clusters generated by the 
           FlowSOM algorithm before merging to metaclusters. 

       n_clusters (integer):  
           The number of final metaclusters for the FlowSOM algorithm to output.

       rlen (integer): 
           The number of training iterations of the Self-Organizing Map

       seed (integer): 
           the random state seed to run the FlowSOM algorithm with. For reproducibility of results. 

   Returns:
       tuple(FlowSOM, pandas dataframe):

           1. FlowSOM ('fs') --> a FlowSOM object, trained & predicting from the provided cell information. fs.get_cell_data() or 
           fs.get_cluster_data() supply anndata objects with information
           see: https://flowsom.readthedocs.io/en/latest/generated/flowsom.FlowSOM.html for information about this class

           2. pandas dataframe ('anndata_fs') -->  a pandas dataframe with a single integer column with length equal to the number of cell 
           regions in the masks of the mask_folder, with values reflecting the metacluster prediction of the FlowSOM algorithm for 
           each cell region. Once these values are merged into biologically relevant labels they can be inserted as a column of 
           data.obs in a PalmettoBUG.Analysis created from the same masks 
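
The input features for this secondary clustering, the fraction of each pixel class inside each cell, can be sketched as follows. The helper name class_fractions is hypothetical, nested lists stand in for image arrays, and pixel classes are assumed to be numbered 1..number_of_classes; the FlowSOM training itself is not shown.

```python
def class_fractions(mask, class_map, number_of_classes):
    """Per cell, the fraction of its pixels that fall in each pixel class."""
    counts = {}
    for mask_row, class_row in zip(mask, class_map):
        for cell_id, pixel_class in zip(mask_row, class_row):
            if cell_id == 0:  # skip background of the cell mask
                continue
            row = counts.setdefault(cell_id, [0] * number_of_classes)
            row[pixel_class - 1] += 1  # assumes classes are 1-indexed
    return {
        cell_id: [n / sum(row) for n in row]
        for cell_id, row in sorted(counts.items())
    }


# Hypothetical image with two cells and three pixel classes.
mask = [
    [1, 1, 0, 2],
    [1, 1, 0, 2],
    [0, 0, 0, 2],
]
class_map = [
    [2, 2, 1, 3],
    [2, 3, 1, 3],
    [1, 1, 1, 2],
]
features = class_fractions(mask, class_map, number_of_classes=3)
```

Each row of features sums to 1, so cells of different sizes become comparable before clustering.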


.. py:function:: classify_from_secondary_flowsom(mask_folder: Union[pathlib.Path, str], output_folder: Union[pathlib.Path, str], flowsom_data: flowsom.FlowSOM) -> pandas.DataFrame

   This function takes the classifications from a secondary FlowSOM and a folder of matching cell masks, and creates 'classy' masks from that. 
   Additionally, returns a single-column dataframe with all the classifications from the FlowSOM (this can be more directly accessed with 
   (flowsom_data.get_cell_data().obs['metaclustering'] + 1)).

       NOTE! >>> The classy masks are 1-indexed because 0 is a special number (background) in images, while the FlowSOM classes are 0-indexed 
               like the majority of python. This is why (flowsom_data.get_cell_data().obs['metaclustering'] + 1) describes the classes 
               accurately in the classy masks, and not just flowsom_data.get_cell_data().obs['metaclustering']. 

   Usually, the classifications here are an intermediate step, with overclustering / excessive clustering being performed as is usual for 
   FlowSOM clustering, and manual merging being a necessary step afterwards to derive biologically useful labels for the cells. 

   Args:
       mask_folder (str or Path): 
           the directory path to a folder containing the cell mask .tiffs that are to be classified with the secondary 
           FlowSOM output. 

           NOTE! >>> the FlowSOM must have been trained / predicted from the same cell masks in the same file order, or the 
                   classification will be invalid. 

       output_folder (str or Path): 
           the path to a folder where the "classy masks" will be exported.

       flowsom_data (FlowSOM): 
           The trained/predicted FlowSOM object from which the predictions will be derived. 

   Returns:
       pandas dataframe: 
           a single-column pandas dataframe of integers containing the cell classification assignments from the FlowSOM. 
           It should represent (flowsom_data.get_cell_data().obs['metaclustering'] + 1), where 'flowsom_data' is the input argument 
           to the function.

   Inputs / Outputs:
       Inputs: 
           reads every file in the mask_folder as .tiff file (MUST NOT have other files / subfolders)

       Outputs: 
           for every file read-in, writes a .tiff file inside output_folder
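
The +1 offset between the 0-indexed FlowSOM metaclusters and the 1-indexed classy masks can be illustrated on one image. This sketch makes assumptions beyond the documentation: the helper name is hypothetical, cell labels are taken to run consecutively from 1, and the metacluster list is taken to be in cell-label order.

```python
def classy_mask_from_metaclusters(mask, metaclustering):
    """Paint each cell with (metacluster + 1); background pixels stay 0."""
    cell_classes = [cluster + 1 for cluster in metaclustering]  # 0-indexed -> 1-indexed
    return [
        [0 if cell_id == 0 else cell_classes[cell_id - 1] for cell_id in row]
        for row in mask
    ]


# Hypothetical image with three cells and their 0-indexed metaclusters.
mask = [
    [1, 1, 0],
    [0, 2, 2],
    [3, 0, 0],
]
metaclustering = [0, 4, 0]
classy = classy_mask_from_metaclusters(mask, metaclustering)
```

Without the offset, cells in metacluster 0 would be indistinguishable from background in the exported image.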


.. py:function:: extend_masks_folder(classifier_map_folder: Union[pathlib.Path, str], mask_folder: Union[pathlib.Path, str], classy_mask_folder: Union[pathlib.Path, str], output_directory_folder: Union[pathlib.Path, str], merge_list: Union[list[int], None] = None, connectivity: int = 1) -> None

   Expands cell masks into a matching region of pixel classification. Can be used, for example, to segment
   irregularly shaped cell types into non-circular masks. Operates on a whole folder of images.

   Args:
       classifier_map_folder (str or Path): 
           the path to a folder of a pixel classifier's output (as .tiff files)

       mask_folder (str or Path): 
           the path to a folder of cell masks (segmentation output as .tiff files) to extend

       classy_mask_folder (str or Path): 
           The path to a folder of "classy masks" as .tiff files
           NOTE! >>> The files in classifier_map_folder, mask_folder, classy_mask_folder should all align with each other, as in:

               --> same file names in the same order

               --> the classy masks should be derived from the masks
               
               --> the numbers of the classy masks should match the numbers of the pixel classifications in the classifier_map_folder 
                   (as in, class 1 should mean the same biological thing in both: for example if class 1 is astrocyte in the class 
                   maps, class 1 must mean astrocyte in the classy masks too in order to have a valid merging/expansion on class 1)

       output_directory_folder (str or Path): 
           the path to a folder where you want to save the expanded cell masks

       merge_list (list of integers, or None): 
           a list of the classes to merge / extend the masks on. if None, then all classes are used -- if 
           there are background classes in the pixel classifier's output, then leaving merge_list = None
           is HIGHLY discouraged, as you are likely to end up with wildly large cell masks.

       connectivity (integer): 
           values = 1 or 2. This determines whether, when performing the final scikit-image watershedding step of the 
           merge / expansion, pixels are considered connected when touching diagonally (2) or not (1). This means 
           connectivity = 2 will (slightly) more aggressively extend the cell masks than connectivity = 1. 
           See: https://scikit-image.org/docs/stable/api/skimage.segmentation.html#skimage.segmentation.watershed 
           for details of the internal function in which the connectivity parameter is used. 

   Returns:
       None

   Inputs / Outputs:
       Inputs: 
           for every .tiff file shared between all three (classifier_map_folder, mask_folder, classy_mask_folder) input folders. As in,
           every filename present in all three (assumed to be from the same image), this function reads in those files. 

       Outputs: 
           For every shared .tiff file read in (really set of three from all input folders), will output one .tiff 
           file in the output_directory_folder
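
The effect of merge_list and connectivity can be pictured with a simplified region-growing sketch. Note that this deliberately swaps in plain breadth-first growth for the scikit-image watershed that the function is documented to use, so it only illustrates the idea: the helper name extend_masks is hypothetical, and nested lists stand in for the three aligned images.

```python
from collections import deque


def extend_masks(class_map, mask, classy_mask, merge_list, connectivity=1):
    """Grow each cell into touching pixels of its own class (simple region growing)."""
    steps = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    if connectivity == 2:
        steps += [(-1, -1), (-1, 1), (1, -1), (1, 1)]  # also grow diagonally
    height, width = len(mask), len(mask[0])
    allowed = set(merge_list)
    extended = [row[:] for row in mask]
    cell_class = {}
    queue = deque()
    # Seed the growth from every pixel of a cell whose class is in merge_list.
    for r in range(height):
        for c in range(width):
            if mask[r][c] != 0 and classy_mask[r][c] in allowed:
                cell_class[mask[r][c]] = classy_mask[r][c]
                queue.append((r, c))
    while queue:
        r, c = queue.popleft()
        cell_id = extended[r][c]
        for dr, dc in steps:
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < height and 0 <= nc < width
            if inside and extended[nr][nc] == 0 and class_map[nr][nc] == cell_class[cell_id]:
                extended[nr][nc] = cell_id
                queue.append((nr, nc))
    return extended


# Hypothetical image: one cell (label 1) of class 2 sitting in a class-2 region.
class_map = [
    [2, 2, 1, 1],
    [1, 1, 2, 1],
    [1, 1, 1, 1],
]
mask = [
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
classy_mask = [
    [2, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
grown_1 = extend_masks(class_map, mask, classy_mask, merge_list=[2], connectivity=1)
grown_2 = extend_masks(class_map, mask, classy_mask, merge_list=[2], connectivity=2)
```

The class-2 pixel that touches the cell only diagonally is claimed with connectivity = 2 but not with connectivity = 1, which mirrors the "slightly more aggressive" behavior described above.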


.. py:class:: WholeClassAnalysis(directory: Union[pathlib.Path, str], classifier_df: pandas.DataFrame, metadata: pandas.DataFrame, Analysis_panel: pandas.DataFrame, csv: Union[pandas.DataFrame, None] = None)

   This class handles the whole-class Analysis, where pixel regions are treated as if they are cell segmentation masks

   It has limited options compared to the standard experiment class that handles true single cell data. This class only has a few plot options 
   and a single statistics option, and no batch correction, dropping of samples, or scaling

   Args:
       directory (string or Path): the path to a folder containing /intensities and /regionprops subfolders,
               which would have been produced by running region measurements on the pixel classification maps
               generated by a pixel classifier. 

       classifier_df (pandas dataframe): the biological_labels.csv exported from the pixel classifier whose
               output is being used. Contains "labels", "class", and/or "merge" columns,
               which help associate region numbers in the images / regionprops & intensity csvs
               with the biological labels in the classifier. 

       metadata (pandas dataframe): analogous to the metadata csv file in a standard, single-cell analysis
               Contains the same, file_name, sample_id, patient_id, condition columns

       Analysis_panel (pandas dataframe): analogous to the Analysis_panel csv file in a standard, single-cell analysis.
               For example, contains columns for antigen and marker_class.

       class_type (string): one of -- "premerge", "merged" -- whether the outputs of the classifier are before or after
               merging (relevant for what column in classifier_df is used as the class, "merging" or "class" )

   Key Attributes:
       data (anndata.AnnData): the data, with data.X being a numpy array containing the channel information per "event" (per class per image)
               data.obs being derived from the inputted metadata and data.var being derived from the Analysis_panel

       class_labels (pandas DataFrame): this is the inputted classifier_df, which associates the class numbers with biological labels

       directory (str): The path to the folder where the analysis is to be initialized. Used to set up directories 
               (such as save_dir, data_table_dir), to export some files (input_tables_to_csv) and to find the expected
               intensities / regionprops csv files when loading the data. 

       save_dir (str): The path to where plots are saved by this class (when filename is provided in plotting functions)

       data_table_dir (str): The path to where data tables are saved by this class (when filename is provided to methods that produce
               dataframes such as statistics / exports) 


   .. py:attribute:: directory
      :value: ''


   .. py:attribute:: class_labels


   .. py:attribute:: _metadata


   .. py:attribute:: _panel


   .. py:attribute:: save_dir
      :value: '/python_plots'


   .. py:attribute:: data_table_dir
      :value: '/Data_tables'


   .. py:attribute:: percent_areas
      :value: None


   .. py:method:: _load(csv: Union[pandas.DataFrame, None] = None, arcsinh_cofactor: int = 5) -> None

      Helper to the __init__ method: performs the loading and shaping of data during the initial load.


   .. py:method:: input_tables_to_csv() -> None

      Allows the saving of the primary csv files within this class to the disk inside the self.directory folder


   .. py:method:: plot_percent_areas(filename: Union[str, None] = None, N_column: str = 'sample_id', calculate_only: bool = False) -> matplotlib.pyplot.figure

      Plots boxplots of the percent area of each class in each image, showing and comparing the distributions between conditions

      Returns the plot as a matplotlib figure

      Args:
          filename (str or None):
              If filename is specified, this will export the plot as a PNG file to self.save_dir/{filename}.png

          N_column (str):
              The aggregating group for the data. As in, the individual dots of the distribution in the boxplot will be determined
              by the unique groups in this column.

          calculate_only (bool):
              Default == False. If True (& self.percent_areas == None), this method will not return anything, 
              but instead will perform the calculation of %pixel class in each ROI. This calculation will be 
              saved to self.percent_areas, where it can easily be plotted by this function later. 
              This is implemented to save time, so that the calculations only have to be done once

      Returns:
          matplotlib.pyplot figure or None (returns None only if calculate_only == True, and no 
          prior calculation of the % areas has been done)


   .. py:method:: plot_distribution_exprs(unique_class: Union[str, int], plot_type: str, N_column: str = 'sample_id', marker_class: str = 'All', filename: Union[str, None] = None) -> seaborn.FacetGrid

      Plots a Bar or Violin plot from the distribution of marker expression / %class in each sample_id, comparing conditions

      Args: 
          unique_class (string or integer):
              Indicates which pixel class to plot antigen expressions for

          N_column (str):
              Indicates which column in the data will serve as the aggregating column for creating the distribution in the final plot

          plot_type (string):
              'Violin' or 'Bar' -- determines what kind of plot is created

          marker_class (string):
              'All', 'type', 'state', or 'none' (or any other marker_class in self.data.var['marker_class']). Determines which antigens are used in the plot
              By default, every antigen, regardless of marker_class is used ('All'). 

          filename:
              If specified, this function will additionally export the plot as a PNG file to self.save_dir/{filename}.png

      Returns:
          the plot as a seaborn FacetGrid (FacetGrid.figure --> a matplotlib figure)


   .. py:method:: whole_marker_exprs_ANOVA(marker_class: str = 'All', groupby_column: str = 'class', N_column: str = 'sample_id', variable: str = 'condition', statistic: str = 'mean', area: bool = True) -> pandas.DataFrame

      Calculates statistics comparing the conditions in the experiment using ANOVA on the expression of [marker_class] markers 
      and %area of each class

      Args:
          marker_class (string): which markers / antigens to test by ANOVA. one of -- "All", "type","state", "none". 

          groupby_column (string): which column the data will be grouped by for the purposes of running separate ANOVAs for each group 
                  (as this is whole-class analysis, should always be "class")

          N_column (string): The column in the data that defines the groups of the statistical test (i.e., the 'N' groups 
                  that contribute to the degrees of freedom in the test)

          variable (string): which column in self.data.obs will be treated as the column containing condition / group information

          statistic (string): one of --"ANOVA", "Kruskal" -- which statistical test (ANOVA, kruskal-wallis), and what aggregate statistic 
                  (mean/std or median/IQR, respectively) is calculated & displayed in the final dataframe

          area (bool): whether to also calculate an ANOVA comparing the %area of each class between the conditions (default is True)

      Returns:
          (pandas dataframe): the pandas dataframe containing the statistical outputs of this test.


   .. py:method:: plot_heatmap(type_of_stat: str = 'F statistic', filename: Union[str, None] = None) -> matplotlib.pyplot.figure

      Plots a statistics heatmap. If the statistic is a p value instead of an F statistic, the negative log of the p value is plotted.


   .. py:method:: export_data(subset_columns: Union[list[str], None] = None, subset_types: Union[list[list[str]], None] = None, groupby_columns: Union[list[str], None] = None, statistic: str = 'mean', groupby_nan_handling: str = 'zero', include_marker_class_row: bool = False, untransformed: bool = False, filename: Union[str, None] = None) -> pandas.DataFrame

      Exports currently loaded data from the Analysis, from self.data. 

      Preserves any previously performed scaling, dropped categories, & batch correction. The exported data is arcsinh(data / 5) transformed unless untransformed = True. Can
      export the entirety of relevant self.data information, or export subsets of self.data, and/or export aggregate summary statistics for 
      groups within the data. 

      Args:
          subset_columns (list[str] or None): 
              a list of strings denoting the columns to subset the dataframe's rows on (here and in other arguments, the function attempts to cast 
              non-string input to strings, as well as the corresponding column of the data). If this or subset_types is None, no subsetting occurs. 

          subset_types (list[list[str]] or None): 
              a list containing sub-lists of strings. The length of the upper list must be the length of
              the subset_columns list, as each sub-list contains strings corresponding to the rows to keep. 

                  As in: if subset_columns = ['column1', 'column2'] and subset_types = [['type2', 'type6'],['typeB', 'typeZ']],
                  then rows of type2 / type6 in column1 will be kept, and similarly rows of typeB / typeZ in column2.

              When > 1 columns / conditions are subsetted on, as in the above example, the rows that are kept are the union of 
              all the subsetting conditions WITHIN a given column, but the intersection BETWEEN what is kept from each column. 
              So in the above example, all rows of column1 == type2/6 that also have column2 == typeB/Z are the rows that are maintained.
                                                      
          groupby_columns (list[str] or None): 
              A list of strings indicating what columns of the data to groupby. If None, then grouping is not performed.
              Used like this:    self.data.obs.groupby(groupby_columns)              but on a dataframe containing the data.X values as well

          statistic (str): 
              Possible values: 'mean','median','sum','std','count'. Denotes the pandas groupby method to be used after grouping (ignored if groupby_columns is None).
              Numeric methods (mean, median, sum, std) are only applied to numeric columns, so only those columns + the groupby columns 
              will be in the final dataframe / csv
          
          groupby_nan_handling(str):
              'zero' or 'drop' -- when grouping the data, whether to drop NaNs (which usually represent non-existent category combinations) or to 
              convert NaNs to zeros. Any other value of this parameter will cause NaNs to be left as-is in the data export.
              Note that the default (and only option available in the GUI) is 'zero', which converts ALL NaN values to 0, while the 'drop' option only drops
              rows where EVERY numerical value is NaN.
              By default, all possible groupby_columns combinations are included in the export (even if they are not present in the data, such as cell types 
              not present in every ROI). This is the source of most NaN values. Notably, columns in the metadata (not data.obs!) of the Analysis are given special 
              treatment to try to prevent non-existent experimental categories from having data exported (for example, each ROI / sample_id should be associated 
              with a single condition, not every possible condition in the dataset). 

          include_marker_class_row (bool): 
              Whether to include the marker_class information as a row at the bottom of the table --> True to 
              include this row -- useful for reimport into PalmettoBUG.
              False to not include this row -- this is probably better for import into non-PalmettoBUG software for analysis,
              or at the least the user will need to remember to remove this row before analyzing!
              When the marker_class row is included, it is encoded as integers (to prevent mixed dtype issues/warnings on reload)
              
                  >>> 0 = 'none', 1 = 'type', 2 = 'state'

              metadata columns (which have no marker_class) have this row filled with 'na'. 
              NOT USED IN COMBINATION WITH GROUPING!

          untransformed (bool):
              if True, will export the untransformed (pre-arcsinh, pre-scaling, etc., etc.) data, from self.data.uns['count'].
              Provided so that the raw data is not difficult to recover, although not expected to be used frequently. Default == False. 

          filename: (str, or None): 
              the name of the csv file to save the exported dataframe inside the self.data_table_dir folder. If None, no export occurs, and the data table is only returned. 

      Returns:
          (pandas DataFrame) -- the pandas dataframe representing the exported data. 

      Inputs/Outputs:
          Outputs: 
              If filename is provided (is not None), then exports the data table to self.data_table_dir/filename.csv
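
      The difference between the 'zero' and 'drop' NaN-handling options described above can be illustrated with a small pandas sketch.
      This is not PalmettoBUG's implementation; the table, its column names, and the variable names are hypothetical and chosen only 
      to show the two behaviors.

      .. code-block:: python

          import numpy as np
          import pandas as pd

          # Hypothetical export table: one metadata column and two numerical columns
          table = pd.DataFrame({
              "sample_id": ["1", "1", "2"],
              "CD3": [0.5, np.nan, np.nan],
              "CD20": [np.nan, np.nan, 1.2],
          })
          numeric_cols = ["CD3", "CD20"]

          # 'zero'-like behavior: every NaN in the numerical columns becomes 0
          zeroed = table.copy()
          zeroed[numeric_cols] = zeroed[numeric_cols].fillna(0)

          # 'drop'-like behavior: only rows where EVERY numerical value is NaN are removed
          dropped = table.dropna(subset=numeric_cols, how="all")

      In this sketch the 'drop'-like result still contains NaN values in rows that are only partially missing.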


.. py:class:: Analysis(in_gui=False)

   This class is essentially a python port of CATALYST, but with certain differences, including slightly different calculations / normalizations, 
   additional functions, and missing functions.

   There are a few broad types of methods: "load_", "do_", and "plot_". Methods starting with "do_" tend to execute a transformation or 
   a calculation on the data (such as statistics, UMAP / PCA, or scaling). Those starting with "plot_" always generate a plot, usually returning
   a matplotlib figure.

   Args:
       in_gui (bool):
           Whether this class is inside the GUI (True) or not (False). Used primarily 
           for determining whether to have tkinter pop-up warnings (True) or print-to-console warnings (False)

           Most of the critical steps in setting up an Analysis occur in the data loading methods, not in the initialization of the class.

   Input / Output:
       If a method contains a "filename" keyword argument (with default = None), then supplying that argument will trigger the export of 
       the method's return data to the directory. For a plotting method, supplying a filename means that it will not only return a 
       matplotlib figure as usual, but will ALSO export the figure to the directory as a PNG file, at::

           self.save_dir/{filename}.png

       Methods that return data tables are similar, but export to self.data_table_dir (not self.save_dir)

   Key Attributes:         
       data (anndata.AnnData): This is an anndata object containing the numerical values of the channels in data.X, the event 
           annotation in data.obs and the antigen annotations in data.var. Pre-arcsinh transformed data lives in data.uns, 
           but is not used for any function in this pipeline.
           data.obs starts out with the same information as the metadata, except each unique entry in metadata
           (representing a unique sample_id) is replicated across all the sample_id events (there is usually more than one cell per image!). 
           At the same time, data.var starts out the same as panel (truly identical). As clusterings are performed, new columns can be
           added to data.obs that did not initially exist in the metadata.

       metadata & panel (pandas dataframes): these are the metadata and panel pandas dataframes that get loaded into data.obs and 
           data.var & represent the metadata.csv and Analysis_panel.csv files in the directory of the Analysis. 

       UMAP_embedding & PCA_embedding (anndata.AnnData): usually downsampled from data, these are anndata objects with UMAP or 
           PCA values for plotting in 2 dimensions

       directory (str): the path to the folder where the Analysis is initialized / performed. Used to find the input data and
           set up the directories for outputs. 

       save_dir (string): the path to the folder where plots generated by this class are saved
       
       data_table_dir (string): the path to the folder where datatables (such as exports or statistics) are saved

       clusterings_dir (string): folder where clustering .csv files are saved and expected to be for reload


   .. py:attribute:: _in_gui
      :value: False


   .. py:attribute:: directory
      :value: None


   .. py:attribute:: data
      :value: None


   .. py:attribute:: back_up_data
      :value: None


   .. py:attribute:: back_up_regions
      :value: None


   .. py:attribute:: logger
      :value: None


   .. py:attribute:: clusterings_dir
      :value: None


   .. py:attribute:: _scaling
      :value: 'unscale'


   .. py:attribute:: unscaled_data
      :value: None


   .. py:attribute:: _quantile_choice
      :value: None


   .. py:attribute:: input_mask_folder
      :value: None


   .. py:attribute:: _distance_edt_data
      :value: None


   .. py:attribute:: is_batched
      :value: 0


   .. py:method:: load_data(directory: Union[pathlib.Path, str], arcsinh_cofactor: Union[int, float] = 5, save_dir: str = 'Plots', data_table_dir: str = 'Data_tables', csv: Union[str, pathlib.Path, None] = None, csv_additional_columns: list = [], load_regionprops=True) -> None

      Load the data for an analysis       

      Args:
          directory (string or Path): the path to the directory where the Analysis is to be performed. If csv is None, then the expectation 
                  is that there should be .fcs files inside a subfolder of this directory (specifically inside a /Analysis_fcs subfolder)

          arcsinh_cofactor (integer): Default is 5. If > 0, will transform data according to the following equation
                  >>> data = arcsinh(data / arcsinh_cofactor)

          save_dir & data_table_dir (str): these allow you to specify what self.save_dir and self.data_table_dir will be WITHIN the main
              directory. By default save_dir == "Plots" and data_table_dir == "Data_tables". If you want to export outside the main directory,
              set these attributes later using a full file path string.

          csv (string/Path or None): the path to a csv file containing data ready to import into PalmettoBUG (the format for this kind of data 
                  matches what PalmettoBUG exports in an Analysis). If None (default) then presumes .fcs files are available in the appropriate 
                  folder (directory/Analysis_fcs) and will load from those files.

          csv_additional_columns (list): ONLY used if loading from csv -- this is a list of non-standard column names in csv that are to be treated 
                  as metadata (will end up in self.data.obs) and not as numerical data (destined for self.data.X). The "standard" metadata column
                  names are those commonly encountered in PalmettoBUG operation, such as "sample_id" or "leiden".
                  This is mainly intended to increase flexibility in cases where PalmettoBUG is being used outside the GUI & a novel metadata 
                  category is created.

          load_regionprops (boolean): whether to load the regionprops as well. This is important if you plan on doing any spatial analysis
                  as this loads the centroids, etc. It does not APPEND the regionprops to the anndata object (self.data), and you must call
                  append_regionprops in order to do that.

      Input/Output:
          Input: expects either a .csv file at the path defined by the [csv] argument, or expects a folder of only .fcs files located at [directory]/Analysis_fcs. 
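
      The arcsinh transformation controlled by arcsinh_cofactor can be sketched with numpy as below. This is only an illustration of
      the equation given above, not the library's loading code; the array values and variable names are hypothetical.

      .. code-block:: python

          import numpy as np

          # Hypothetical raw intensities: rows are cells, columns are channels
          raw = np.array([[0.0, 5.0],
                          [50.0, 500.0]])
          cofactor = 5

          # Apply arcsinh(data / cofactor) only when the cofactor is > 0;
          # otherwise leave the data untransformed
          transformed = np.arcsinh(raw / cofactor) if cofactor > 0 else raw

      Larger cofactors compress low-intensity values more strongly toward a linear scale before the logarithm-like behavior takes over.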


   .. py:method:: _load_fcs(arcsinh_cofactor: Union[int, float] = 5) -> None

      Loads and processes .fcs files from the 'Analysis_fcs' directory, aligns them with metadata,
      applies arcsinh transformation, and stores the result in an AnnData object for downstream analysis.

      Args:
          arcsinh_cofactor (int | float): The cofactor used for arcsinh transformation of intensity values.
                                       If set to 0 or less, no transformation is applied.


   .. py:method:: _load_csv(csv_path: Union[pathlib.Path, str], additional_columns: list = [], arcsinh_cofactor: Union[int, float] = 5) -> None

      Helper for load_data that handles the loading of a csv file (this csv is usually exported from PalmettoBUG as well, and expects a
      particular format that PalmettoBUG can export)

      Args:
          csv_path (str or Path): Full path to the CSV file containing single-cell data (can be from outside the Analysis directory).

          additional_columns (list): List of custom metadata columns to treat as metadata
              (i.e., to include in `obs` rather than `X`). These must not conflict with antigen names. 
              NOTE: Ignored if the csv contains information about type / state/ etc. in the final row (additional metadata columns are automatically identified)

          arcsinh_cofactor (int or float): If > 0, applies arcsinh transformation to expression data
              using: arcsinh(data / cofactor). If 0 or less, no transformation is applied.


   .. py:method:: load_regionprops(regionprops_directory: Union[pathlib.Path, str, None] = None, auto_panel: bool = True) -> pandas.DataFrame

      This method handles the loading of regionprops data (only from FCS directories --  directories from exported CSVs 
      depend on regionprops data already in the CSV, if present).

      Args:
          regionprops_directory (Path, string, None): 
              The path to a folder containing the regionprops .csv files exported during region
              measurements. If None, then assumes this regionprops folder exists in the usual location of an analysis -- i.e., in a 
              /regionprops folder one folder above this class's self.directory

          auto_panel (bool): 
              If True, uses the automatic type / state / none assignments for each region property and proceeds immediately 
              into appending the regionproperties to the dataset. If False, then you can edit the returned dataframe to reflect your 
              desired marker_class assignments, and feed that into self.append_regionprops

      Returns:
          (pandas dataframe): 
              an automatic Regionprops_panel.csv file (mimics an Analysis_panel.csv file, treating each region property like 
              an antigen). Centroid-0 / centroid-1 are set to marker_class 'none', while all other regionprops are left as 'type' markers

      Input/Output:
          Input: 
              reads from the provided regionprops_directory. Expects only .csv files representing regionproperties -- all with the
              same columns of data to allow concatenation -- inside this folder.

          Output: 
              writes a file to  --  self.directory/Regionprops_panel.csv -- which is the same format as the Analysis_panel.csv, having 
              a row for each "marker", with 3 columns for its name(s) and marker_class (type/state/none). But in this case, 
              the "markers" are not antigens, but regionproperties like eccentricity, area, etc. 


   .. py:method:: append_regionprops(regionprops_panel: Union[pandas.DataFrame, str, pathlib.Path, None] = None) -> None

      Continuation of load_regionprops. Useful if you don't like the automatic type / state / none assignments for each region property.

      This adds the regionprops data & panel to the main anndata object in this class 

      NOTE: don't call more than once!! You can duplicate data columns that way.

      If regionprops_panel is left as None, will read in the Regionprops_panel from self.directory/Regionprops_panel.csv


   .. py:method:: filter_data(to_drop: str, column: str = 'sample_id') -> None

      This function drops all rows matching to_drop in the provided column from self.data. 

      Args:
          to_drop (str):
              The unique value in [column] to drop all cells with that value

          column (str):
              The column in self.data.obs to use in dropping data from the analysis.


   .. py:method:: do_COMBAT(batch_column: str, covariates=None) -> None

      Performs scanpy's combat implementation on self.data. See their documentation for more details

      batch_column specifies a column in self.data.obs to use as the batch grouping for the correction (usually 'patient_id')


   .. py:method:: do_scaling(scaling_algorithm: str = '%quantile', upper_quantile: float = 99.9, split_by_column: str = '') -> None

      This method allows the easy scaling / unscaling of the numerical data in self.data.X. The scaling is always performed down / within 
      columns such that different antigens end up on the same / more similar scale. 

      Args:
          scaling_algorithm (string): 
              one of ["%quantile", "min_max", "standard", "robust", "qnorm", and "unscale"]. If "unscale", will undo any 
              previous scaling -- unscaled data is saved before any other scaling method is performed, allowing easy reversion and 
              switching between scaling methods.
              If a scaling is ever applied after another scaling, the unscaled data is used in the calculations (it is as if the first 
              scaling never happened). Comparison of scaling methods:

                  >> %quantile: This is perhaps the most common method for this kind of data. In it, each column is divided by the value of 
                  its quantile % provided in the upper_quantile argument (this would be the same as dividing by the maximum of each 
                  column if upper_quantile == 100). Then all values > 1 are reduced to 1 so that the scale of the data is constrained.
                  This process is somewhat reminiscent of thresholding the brightness of an image by choosing a maximum threshold. 

                  >> min_max: This scales each channel / antigen between 0 and 1 by this equation: (values - min) / (max - min). It is 
                  performed by scikit-learn's preprocessing min_max function. 

                  >> standard: This performs standard scaling (scaling as if the data is normally distributed with a mean of 0 and a variance 
                  of 1). It is performed by scikit-learn's preprocessing scale function.

                  >> robust: This performs robust scaling using scikit-learn's preprocessing robust_scale function. It is more resistant to 
                  outliers & does not try to scale to normality, unlike standard scaling. 

                  >> qnorm: This method is known for its use in large genomics studies, and uses a particular quantile-based scaling method.
                      implemented by: https://github.com/Maarten-vd-Sande/qnorm. 

          upper_quantile (float): 
              ONLY USED with scaling_algorithm == "%quantile". Determines the upper quantile percentage used in that 
              scaling method
          
          split_by_column (string):
              If not == "", then will attempt to find a column in self.data.obs matching the provided value, then will 
              split the dataset by unique groups in that column and will perform the selected scaling WITHIN those groups individually, 
              rather than on the entire dataset at once.
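
      The "%quantile" scaling described above can be sketched with numpy as follows. This is a minimal illustration under stated 
      assumptions, not PalmettoBUG's code: the function name is hypothetical, and the guard against a zero-valued quantile is an 
      added assumption to avoid division by zero.

      .. code-block:: python

          import numpy as np

          def percent_quantile_scale(X, upper_quantile=99.9):
              """Divide each column by its upper percentile, then cap values at 1."""
              X = np.asarray(X, dtype=float)
              # Per-column percentile (upper_quantile == 100 is the column maximum)
              q = np.percentile(X, upper_quantile, axis=0)
              # Assumption: columns whose percentile is 0 are left undivided
              q = np.where(q == 0, 1.0, q)
              # Values above the percentile end up > 1 and are reduced to 1
              return np.minimum(X / q, 1.0)

          example = np.array([[1.0, 10.0],
                              [2.0, 20.0],
                              [4.0, 40.0]])
          scaled_max = percent_quantile_scale(example, upper_quantile=100)
          scaled_median = percent_quantile_scale(example, upper_quantile=50)

      With upper_quantile == 100 the sketch reduces to dividing by each column's maximum, matching the description above.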


   .. py:method:: do_leiden_clustering(seed: int = 1234, marker_class: str = 'type', min_dist: float = 0.1, n_neighbors: int = 15, resolution: int = 1, flavor: str = 'leidenalg', try_from_umap_embedding: bool = False) -> None

      Creates a UMAP from all the cells in the dataset and then performs leiden clustering. 
      An alternative to FlowSOM for clustering cells.

      Args:
          seed (int): 
              The random seed for all non-deterministic steps in the clustering pipeline.

          marker_class (string): 
              what channels/antigens to use in the clustering ("type", "state", "none", or all)

          min_dist (float): 
              used in constructing the umap on which the leiden clustering will be performed.

          n_neighbors (integer): 
              used in constructing the neighbors graph on which the umap is constructed.

          resolution (integer): 
              used in the leiden clustering itself. Higher numbers favor the finding of more clusters.

          try_from_umap_embedding (boolean): 
              if a UMAP of the entire dataset has been previously performed, set this to True
              to skip the time-consuming steps required for UMAP, and simply use the previously calculated dimensionality 
              reduction. Will not filter for marker_class (assumes that was already done in the creation of the UMAP)

      Returns:
          True or False, depending on whether the marker_class chosen exists in the panel


   .. py:method:: do_flowsom(marker_class: str = 'type', n_clusters: int = 20, XY_dim: int = 10, rlen: int = 15, scale_within_cells: bool = True, seed: int = 1234) -> flowsom.FlowSOM

      Executes FlowSOM clustering on the data.

      Args:
          marker_class (string): 
              what antigens / channels to use in clustering ("type", "state", "none", or "All").

          n_clusters (integer): 
              The final number of metaclusters that cells will be classified into in the "metaclustering" column. This is 
              achieved by merging the over-clustering produced by the SOM (the values in the "clustering" column) down to this number.

          XY_dim (integer): 
              This determines the dimensions / points in the initial grid of the self-organizing map, and thereby the initial number 
              of clusters before merging into metaclusters. Specifically, XY_dim*XY_dim will equal the number of initial points in the 
              grid (X & Y dimensions are often allowed to be specified separately; that ability may be restored, but there do not seem to be 
              any circumstances where having different X / Y dimensions would be desirable)

          rlen (integer): 
              The number of training iterations. Higher numbers tend to fit the FlowSOM closer to the data / create a more stable
              FlowSOM output (less variation by random seed). However more training iterations takes more time to run. 

          seed (integer):
              the random seed for reproducibility of FlowSOM (which is a non-deterministic algorithm)

      Returns:
          (FlowSOM) The trained FlowSOM object, useful for accessing the various techniques & visualizations available in the FlowSOM package such as minimum spanning trees, etc.


   .. py:method:: _plot_stars_CNs(fs: flowsom.FlowSOM, filename: Union[str, None] = None) -> matplotlib.pyplot.figure

      Plots the minimum spanning tree / star plot from the FlowSOM package

      Args:
          fs (flowsom.FlowSOM):
              Returned by the self.do_flowsom method.

          filename (str or None):
              if not None, then the filename to save the plot under (as a png) in the self.save_dir folder

      Returns:
          a matplotlib.pyplot figure


   .. py:method:: do_regions(region_folder: Union[pathlib.Path, str]) -> pandas.DataFrame

      (Modified from mode_classify_folder) function to classify cells by the region and sample_id they are in.
      As in, for every matching image in the mask and region folders, looks at the cells in the mask image -- for every cell
      it will check if that cell lies within a region of the region image (the mode of its pixels lies within a region with value > 0).
      Then will assign a label to that cell: 0 if outside a region, or {region#}_{image#} if it does lie within a region. 
      These labels are accumulated into a list which is appended to the Analysis Object
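
      The mode-based region assignment described above can be sketched with numpy as below. This is an illustration only, not the 
      method's actual code: the function name and the example arrays are hypothetical, a mask value of 0 is assumed to be background 
      (no cell), and ties in the mode are resolved toward the smallest value.

      .. code-block:: python

          import numpy as np

          def assign_regions_sketch(mask, region_map, image_number):
              """Label each cell '0' (outside) or '{region#}_{image#}' by the mode of its pixels."""
              labels = {}
              for cell_id in np.unique(mask):
                  if cell_id == 0:  # assumed background, not a cell
                      continue
                  # Region values under this cell's pixels
                  region_values = region_map[mask == cell_id]
                  values, counts = np.unique(region_values, return_counts=True)
                  mode = values[np.argmax(counts)]
                  labels[int(cell_id)] = "0" if mode == 0 else f"{mode}_{image_number}"
              return labels

          mask = np.array([[1, 1, 0, 0],
                           [1, 2, 2, 0],
                           [0, 2, 2, 3],
                           [0, 0, 3, 3]])
          region_map = np.array([[1, 1, 0, 0],
                                 [0, 2, 2, 0],
                                 [0, 2, 0, 0],
                                 [0, 0, 0, 0]])
          labels = assign_regions_sketch(mask, region_map, image_number=3)

      Here cells 1 and 2 mostly overlap regions 1 and 2 of image 3, while cell 3 lies entirely on region background.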


   .. py:method:: _assign_regions(mask: numpy.ndarray[Union[float, int]], region_map: numpy.ndarray[int], image_number: Union[str, int]) -> tuple[numpy.ndarray[float], numpy.ndarray[int], pandas.DataFrame]

      This function iterates through two matching-sized numpy arrays (one representing cell masks & one representing 
      region of the image [these regions are also masks, with background pixels of value == 0]), and returns a list of assigned regions 
      to each of the cells ('0' if not within a region, and {region#}_{image#} if within a region, such as '2_3' for region 2 of the third image).
      The image number must be passed into the function.  


   .. py:method:: _do_spatial_leiden(n_neighbors: int = 15, resolution: int = 1, random_state: int = 42) -> None

      This function takes the centroid information from regionprops (centroid-0 and centroid-1) and calculates a neighborhood graph / leiden 
      clustering for that. 
      This is similar to the use of leiden on UMAPs, just in this case the input to the UMAP is only the physical X / Y coordinates of the 
      centroids.

      Appends the resulting spatial clustering -- which is calculated per image -- to self.data.obs in the format 
      f"{image number}_{cluster number}"  

      Uncertain how useful this is, but it is available      


   .. py:method:: plot_cell_counts(group_by: str = 'sample_id', color_by: str = 'condition', filename: Union[str, None] = None, **kwargs) -> matplotlib.pyplot.figure

      Plots cell counts as a bar plot

      Args:
          group_by (str):
              The column in self.data.obs to use to group / divide the bars of the plot. 

          color_by (str): 
              The column in self.data.obs used to color the bars of the plot

          filename (str or None):
              if not None, then the filename to save the plot under (as a png) in the self.save_dir folder

      Returns:
          a matplotlib.pyplot figure


   .. py:method:: plot_MDS(marker_class: str = 'type', color_by: str = 'condition', print_stat: bool = False, seed: int = 42, filename: Union[str, None] = None, **kwargs) -> tuple[matplotlib.pyplot.figure, pandas.DataFrame]

      Plots an MDS embedding of the sample_ids in the dataset as a scatterplot, using only the antigens whose marker_class in the panel 
      matches [marker_class], and colored by [color_by] 

      Args:
          marker_class (str):
              Either "type", "state", "none", or "All". Which antigens (see self.data.var) to use to calculate & create the MDS plot

          color_by (str):
              which column in self.data.obs to use to color the samples.

          print_stat (bool):
              whether to export the MDS embedding (True) to self.data_table_dir or not (False, default)

          filename (str or None):
              if not None, then the filename to save the plot under (as a png) in the self.save_dir folder

          kwargs:
              are passed to seaborn.scatterplot()

      Returns:
          a matplotlib.pyplot figure and a pandas dataframe


   .. py:method:: plot_NRS(marker_class: str = 'type', filename: Union[str, None] = None, **kwargs) -> matplotlib.pyplot.figure

      Plots the non-redundancy scores of each antigen in the category specified in [marker_class] as boxplots, with the distribution 
      deriving from the NRS scores from each sample_id

      Args:
          marker_class (str):
              Either "type", "state", "none", or "All". Which antigens (see self.data.var) to use to calculate the NRS and plot

          filename (str or None):
              if not None, then the filename to save the plot under (as a png) in the self.save_dir folder

          kwargs:
              are passed to seaborn.boxplot()

      Returns:
          a matplotlib.pyplot figure            


   .. py:method:: plot_ROI_histograms(color_by: str = 'condition', marker_class: str = 'All', suptitle: bool = True, filename: Union[str, None] = None, **kwargs) -> matplotlib.pyplot.figure

      Plot kde-smoothed histograms of each antigen / channel's expression, with separate lines for separate ROIs, colored by [color_by] 

      Args:
          color_by (str):
              which column in self.data.obs to color the histogram tracings by

          marker_class (str):
              Either "type", "state", "none", or "All". Which antigens (see self.data.var) to display in the plot

          suptitle (bool):
              whether to attempt to add a title to the plot automatically. 

          filename (str or None):
              if not None, then the filename to save the plot under (as a png) in the self.save_dir folder

          kwargs:
              are passed to matplotlib.pyplot.axis.plot()

      Returns:
          a matplotlib.pyplot figure  


   .. py:method:: do_UMAP(marker_class: str = 'type', cell_number: int = 1000, seed: int = 0, n_neighbors: int = 15, min_dist: float = 0.1, **kwargs) -> None

      Perform the calculations for a UMAP embedding.

      Args:
          marker_class (string): 
              none, type, state, or ALL >> what markers/antigens to use in the UMAP algorithm

          cell_number (integer): 
              The downsampling number. No more than this number of cells will be randomly taken from each sample_id in the process of downsampling

          seed (integer): 
              The random seed used for reproducibility in downsampling and running the UMAP

          kwargs: 
              passed as kwargs into scanpy.tl.umap() call


   .. py:method:: do_PCA(marker_class: str = 'type', cell_number: int = 1000, seed: int = 0) -> None

      Perform the calculations for a PCA embedding.

      Args:
          marker_class (string): 
              none, type, state, or ALL >> what markers/antigens to use in the PCA algorithm

          cell_number (integer): 
              The downsampling number. No more than this number of cells will be randomly taken from each sample_id in the process of downsampling

          seed (integer): 
              The random seed used for reproducibility in downsampling and running the PCA


   .. py:method:: _downsample_for_UMAP(anndata_in: anndata.AnnData, max_number: int = 1000, seed: int = 42) -> anndata.AnnData

      Helper for do_UMAP and do_PCA  methods, performs the downsampling of the data, where no more than the supplied max_number of cells
      will be randomly sampled from each sample_id of the anndata_in object. Returns the downsampled data as an anndata object.
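
      The per-sample downsampling described above can be sketched with pandas as below. This is a hedged illustration rather than the 
      helper's real code: it operates on a plain DataFrame standing in for anndata_in.obs, and the function name is hypothetical.

      .. code-block:: python

          import numpy as np
          import pandas as pd

          def downsample_per_sample(obs, max_number=1000, seed=42):
              """Keep at most max_number randomly chosen rows from each sample_id."""
              rng = np.random.default_rng(seed)
              keep = []
              for _, group in obs.groupby("sample_id", sort=False):
                  n = min(len(group), max_number)
                  # Sample row labels without replacement within this sample_id
                  keep.extend(rng.choice(group.index.to_numpy(), size=n, replace=False))
              return obs.loc[sorted(keep)]

          obs = pd.DataFrame({"sample_id": ["a"] * 5 + ["b"] * 2})
          small = downsample_per_sample(obs, max_number=3, seed=0)
          counts = small["sample_id"].value_counts().to_dict()

      Samples with fewer cells than max_number are kept whole, so small samples are not oversampled.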


   .. py:method:: plot_scatter(antigen1: str, antigen2: str, hue: Union[str, None] = None, filename: Union[str, None] = None, size: Union[int, float] = 1, alpha: Union[int, float] = 0.5, **kwargs) -> matplotlib.pyplot.figure

      Makes a scatterplot of [antigen1] vs. [antigen2], colored by [hue]. Will write a png file from the plot to 
      self.save_dir if filename is not None. 

      Args:
          antigen1 (str):
              The antigen (in self.data.var['antigen']) to plot along the x-axis of the plot

          antigen2 (str):
              The antigen (in self.data.var['antigen']) to plot along the y-axis of the plot

          hue (str):
              If not None, either in self.data.var['antigen'], self.data.obs.columns, or "Density". 
              If None, then no color applied to points in the scatter. If in self.data.var['antigen'], points will be colored 
              by the expression of the provided antigen. If in self.data.obs.columns, points will be colored by category in 
              that column. If 'Density', will attempt to color the plot based on the density of points at that location on the plot

          filename (str or None):
              if not None, then the filename to save the plot under (as a png) in the self.save_dir folder

          size (numeric):
              the size of the points in the plot

          alpha (numeric between 0-1):
              the transparency of points in the plot. Numbers closer to 1 mean less transparent points, and vice versa

          kwargs:
              are passed to seaborn.scatterplot()

      Returns:
          a matplotlib.pyplot figure            


   .. py:method:: plot_UMAP(color_by: Union[None, str] = 'metaclustering', palette=None, filename: Union[str, None] = None, **kwargs) -> matplotlib.pyplot.figure

      Plots a UMAP embedding as a scatterplot, colored by [color_by]. Primarily a wrapper on scanpy.pl.umap() method
      See that method's information for more details: https://scanpy.readthedocs.io/en/stable/api/generated/scanpy.pl.umap.html

      Args:
          color_by (str or None):
              Either: 1). what column in self.data.obs to color the UMAP cells by 2). what antigen in self.data.var['antigen'] to color
              the UMAP by, or 3). None to have no coloring of points
          
          palette:
              how to color the points. See matplotlib colormaps, or the scanpy link above for more details.
              Example: 'tab20' is a colormap that can be good for plots using a categorical variable (one of self.data.obs columns)
              to color the cells, while 'viridis' or 'coolwarm' can be good for continuous variable (one of self.data.var['antigen'].unique())

          filename (str or None):
              if not None, then the filename to save the plot under (as a png) in the self.save_dir folder

          kwargs:
              are passed to scanpy.pl.umap()

      Returns:
          a matplotlib.pyplot figure 


   .. py:method:: plot_PCA(color_by: str = 'metaclustering', palette=None, filename: Union[str, None] = None, **kwargs) -> matplotlib.pyplot.figure

      Plots a PCA embedding as a scatterplot, colored by [color_by]. Primarily a wrapper on scanpy.pl.umap() method.
      Even though PCA does not use a scanpy function, self.PCA_embedding is set up in such a way that scanpy.pl.umap() can be used to
      plot it. 
      See that method's information for more details: https://scanpy.readthedocs.io/en/stable/api/generated/scanpy.pl.umap.html

      Args:
          color_by (str or None):
              Either: 1). what column in self.data.obs to color the PCA cells by 2). what antigen in self.data.var['antigen'] to color
              the PCA by, or 3). None to have no coloring of points
          
          palette:
              how to color the points. See matplotlib colormaps, or the scanpy link above for more details.
              Example: 'tab20' is a colormap that can be good for plots using a categorical variable (one of self.data.obs columns)
              to color the cells, while 'viridis' or 'coolwarm' can be good for continuous variable (one of self.data.var['antigen'].unique())

          filename (str or None):
              if not None, then the filename to save the plot under (as a png) in the self.save_dir folder

          kwargs:
              are passed to scanpy.pl.umap()

      Returns:
          a matplotlib.pyplot figure 


   .. py:method:: plot_facetted_DR_by_antigen(marker_class: list = ['type', 'state'], kind: str = 'UMAP', suptitle: bool = True, number_of_columns: int = 3, filename: Union[str, None] = None, **kwargs) -> matplotlib.pyplot.figure

      Like the plot_facetted_DR method below, but specific to when you want to facet by the antigens & color each facet by the respective antigen.
      Notably, this method does not take a color / hue parameter, nor does it need a facetting column, as the assumption of this function being called is that
      the antigens are being used for both.

      Args:
          marker_class (list of str):
             A list of valid marker_class values in the analysis (those in self.data.var['marker_class'], or "All", "none", "type", "state", "spatial_edt", ...).
             For each of the marker_classes listed, the antigens for that class will be included in the final UMAP. This is inclusive, so if "All" is in the list
             it doesn't matter what other classes are listed -- every antigen will be used. Default is ['type', 'state'] so that all except 'none' antigens will be 
             displayed in most cases

          kind (str):
              "umap" or "pca" -- which type of dimensionality reduction is to be used.

          suptitle (bool):
              whether to attempt to automatically place a title on the whole plot (instead of only each facet getting a title). Note that this suptitle
              can frequently be oddly placed since the number of facets in the plot changes where the title would most comfortably sit. 

          number_of_columns (integer):
              How many columns to have in the grid of the plot. The number of rows is automatically determined from this number 
              and the number of facets required to plot every antigen.

          filename (str or None):
              if not None, then the filename to save the plot under (as a png) in the self.save_dir folder

          kwargs:
              are passed to matplotlib.pyplot.axis.scatter()

      Returns:
          a matplotlib.pyplot figure. Note that, unlike the subsequent facetted DR method, this plot will contain EVERY cell in the embedding in EVERY facet,
          just the color applied to the points in each plot facet will be different. 
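
      The relationship between number_of_columns and the number of rows can be sketched as a ceiling division. This is only an illustration of the grid arithmetic described above; the helper name ``grid_rows`` is hypothetical and is not part of PalmettoBUG.

      ```python
      import math


      def grid_rows(n_facets: int, number_of_columns: int = 3) -> int:
          """Rows needed to fit n_facets panels into a grid with number_of_columns columns."""
          return math.ceil(n_facets / number_of_columns)


      # Hypothetical panel: 8 antigens in a 3-column grid need 3 rows (the last row holds 2 panels)
      n_rows = grid_rows(8, 3)
      ```

      With a very large antigen panel, raising number_of_columns keeps the figure from becoming too tall.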


   .. py:method:: plot_facetted_DR(color_by: str, subsetting_column: str, kind: str = 'UMAP', suptitle: bool = True, number_of_columns: int = 3, color_bank: Union[list[str], None] = None, filename: Union[str, None] = None, **kwargs) -> matplotlib.pyplot.figure

      Plots a dimensionality reduction embedding (kind = 'PCA' or 'UMAP'), facetted by the supplied [subsetting_column], with each facet colored 
      by the supplied [color_by]. 

      Args:
          color_by (str): 
              what column in self.data.obs, or which antigen name (values in self.data.var['antigen'], used to select the expression data in 
              matching column) to color the scatter plots in each facet with. 

          subsetting_column (str): 
              the column in self.data.obs on which to facet the plot. For every unique value in this column, a separate plot will be created,
              displaying only the cells with that value. Additionally, the first plot in the facet grid will always be a dimensionality
              reduction plot containing all the cells. 
              NOTE: the plots will all have the same DR embedding, as dimensionality reduction IS NOT RUN for each subset of cells. Instead, the last DR (on the whole / 
              downsampled data) of the proper kind will be the embedding used for every cell.

          kind (str): 
              "UMAP" or "PCA" -- which kind of dimensionality reduction to attempt to plot

          suptitle (bool): 
              whether or not to include an automatically generated title. Default = True, but may want to set to False if the 
              suptitle is being placed in the wrong location on the plot (as happens when there are a very large number of subplots)

          number_of_columns (int): 
              how many columns to use in the figure's grid. The number of rows will be automatically determined from this. 
              If the number of total panels or rows is too low, this number may be reduced automatically. 

          color_bank (list of strings or None): 
              a list of strings representing colors that can be recognized by matplotlib.Patch, used to 
              determine the colors on the plot for each group in the color_by column, 
              ONLY if color_by is in self.data.obs.columns and NOT if color_by is in self.data.var['antigen']

          filename (str or None): 
              If not None, then will attempt to write the figure produced to self.save_dir/{filename}
              INCLUDE the file extension in this string! (usually .png)

          kwargs: 
              keyword arguments passed on to each matplotlib.axes.Axes.scatter() call. 

      Returns:
          a matplotlib figure, the facetted plot of UMAPs. The first UMAP is always the un-facetted (all data together) UMAP

      Inputs / Outputs:
          Outputs: 
              if filename is provided, then the matplotlib figure will also be written as a file to -- self.save_dir/{filename}


   .. py:method:: plot_medians_heatmap(marker_class: str = 'type', groupby: str = 'metaclustering', scale_axis: Union[None, int] = 0, subset_df: pandas.DataFrame = None, subset_obs: pandas.DataFrame = None, colormap='coolwarm', figsize: tuple[Union[int, float], Union[int, float]] = (10, 10), filename: Union[str, None] = None, **kwargs) -> matplotlib.pyplot.figure

      Plots a heatmap in a manner similar to CATALYST by first taking the median of each channel in each category of the [groupby] column, then
      quantile-normalizing the medians (1%-99%) across the antigens.

      Args:
          filename (string or None): 
              the filename for the exported heatmap

          marker_class (string): 
              "none", "type", "state", or "All" -- which markers / antigens to use in the heatmap

          groupby (string): 
              a name of a column in self.data.obs to group the data by (usually 'metaclustering','clustering','merging',
              'classification','leiden'). 
              groupby can be:

                      "metaclustering" --> cluster heatmap
                      "sample_id"   --> heatmap by ROI
                      "merging" / etc. --> heatmap by arbitrary column in self.data.obs

          scale_axis (integer or None):
              Either None, 0 or 1 -> Which axis of the final median array to scale along before plotting. Default is 0, to scale within antigens.
              (0 --> scale within antigen, 1 --> scale within groupby categories, None --> scale medians across the entire array)

          subset_df (pandas DataFrame or None): 
              a dataframe equivalent to self.data.X, with column names = self.data.var.index. This allows 
              custom / transformed / subsetted data to be introduced into this plotting method without needing to edit / transform
              self.data directly. If None, then self.data will be used to create the plot and the subset_obs argument will be ignored.
              Requires a paired subset_obs dataframe.

          subset_obs (pandas dataframe or None): 
              an equivalent to self.data.obs, paired with subset_df argument

          figsize (tuple of numerics): 
              X / Y dimension sizes of the plot

          kwargs: 
              passed in seaborn.clustermap() call

      Returns:
          a matplotlib.pyplot figure
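
      The three scale_axis options can be illustrated on a small table of medians. The sketch below is not PalmettoBUG's implementation: it assumes rows are groupby categories and columns are antigens, and it substitutes plain min-max scaling for the 1%-99% quantile scaling so that it stays short. The helper names ``min_max`` and ``scale_medians`` are hypothetical.

      ```python
      def min_max(values):
          """Scale a list of numbers to the range 0-1 (constant lists map to 0)."""
          lo, hi = min(values), max(values)
          return [0.0 if hi == lo else (v - lo) / (hi - lo) for v in values]


      def scale_medians(table, scale_axis=0):
          """table: list of rows; rows = groupby categories, columns = antigens (assumed layout)."""
          n_rows, n_cols = len(table), len(table[0])
          if scale_axis == 0:  # scale within each antigen (down each column)
              cols = [min_max([row[j] for row in table]) for j in range(n_cols)]
              return [[cols[j][i] for j in range(n_cols)] for i in range(n_rows)]
          if scale_axis == 1:  # scale within each groupby category (along each row)
              return [min_max(row) for row in table]
          flat = min_max([v for row in table for v in row])  # None: scale across the whole array
          return [flat[i * n_cols:(i + 1) * n_cols] for i in range(n_rows)]


      # Hypothetical medians: 3 clusters x 2 antigens
      medians = [[1.0, 10.0], [3.0, 20.0], [5.0, 40.0]]
      by_antigen = scale_medians(medians, 0)
      by_group = scale_medians(medians, 1)
      overall = scale_medians(medians, None)
      ```

      Scaling within antigens (the default) makes each antigen span the full color range, so a weakly expressed marker is not washed out by a strongly expressed one.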


   .. py:method:: _plot_facetted_heatmap(filename: str, subsetting_column: str, groupby_column: str = 'metaclustering', marker_class: str = 'type', number_of_columns: int = 3, suptitle: bool = True, **kwargs) -> str

      Calls plot_medians_heatmap iteratively to plot a facetted heatmap, facetted on the unique categories in [subsetting_column].

      Unique in that this function only exports an .SVG file to the disk and returns only the path to that file (it does not return the plot 
      like the other functions).

      This function is old, and not well-tested / supported so it may have errors! Also this depends on svg_stack, which is no longer a 
      mandatory dependency of PalmettoBUG


   .. py:method:: do_cluster_merging(file_path: Union[str, pathlib.Path], groupby_column: str = 'metaclustering', output_column: str = 'merging') -> None

      Creates a new column (by default "merging") inside self.data.obs by merging & annotating an existing column in self.data.obs [groupby_column]

      Args:
          file_path (str):
              The full file path to a .csv file. This csv file will be read-in as a pandas dataframe. This dataframe is expected to 
              have at least two columns:
                  -- "original_cluster" 

                  -- "new_cluster" 
          
          groupby_column (str):
              the name of the column in self.data.obs whose values are being merged / annotated. The unique values in this column should correspond
              to the values in the 'original_cluster' column of the read-in dataframe described above. Usually, this is either "metaclustering" or "leiden"
              but it does not have to be

          output_column (str):  
              the name of a new column that will be inserted into self.data.obs. This column will contain the annotated / assigned values from the 
              read-in dataframe. As in, the "original_cluster" values in groupby_column will be replaced with their corresponding "new_cluster" values
              and the new column added as self.data.obs[output_column]
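
      The replacement step described above amounts to a lookup from "original_cluster" to "new_cluster". The following is a minimal sketch using only the standard library; the table contents and the ``metaclustering`` list are made-up stand-ins for a real merging .csv file and for self.data.obs[groupby_column].

      ```python
      import csv
      import io

      # Hypothetical merging table; in practice this would be a .csv file on disk
      merging_csv = io.StringIO(
          "original_cluster,new_cluster\n"
          "1,T cell\n"
          "2,T cell\n"
          "3,B cell\n"
      )
      mapping = {row["original_cluster"]: row["new_cluster"] for row in csv.DictReader(merging_csv)}

      # Stand-in for self.data.obs[groupby_column]: one cluster label per cell
      metaclustering = ["1", "3", "2", "1"]

      # Each original label is replaced by its annotation, which also merges clusters 1 and 2
      merging = [mapping[cluster] for cluster in metaclustering]
      ```

      Several original clusters may share one "new_cluster" value, which is how clusters are merged into a single annotated cell type.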


   .. py:method:: plot_cluster_distributions(groupby_column: str = 'metaclustering', marker_class: str = 'type', plot_type: str = 'violin', comp_type: str = 'raw', filename: Union[str, None] = None, **kwargs) -> matplotlib.pyplot.figure

      Plot the distribution of marker expression within groups of cells. Violin or bar plots.

      Args:
          groupby_column (string): 
              The column in self.data.obs to group the cells by (usually a way of identifying cell types, 
              like metaclustering or merging, but can be a different grouping like sample_id)

          marker_class (string, "All","type","state", or "none"): 
              what type of antigen to include in the plot

          plot_type (string, "violin" or "bar"): 
              whether to plot a violin or bar plot

          comp_type (string, "vs" or "raw"): 
              whether to display the raw values of marker expression or to display the difference between each 
              cluster and the rest of the dataset. As in, if == "vs", then the data for each cluster will have the mean expression of the rest 
              of the clusters subtracted from it before plotting. 

          filename (string, or None): 
              If not None, the name of the .png file to be saved in experiment.save_dir

          kwargs: 
              passed into seaborn.catplot() call

      Returns:
          a matplotlib figure
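
      One possible reading of the "vs" comparison is sketched below for a single antigen: the mean over all cells outside a cluster is subtracted from every cell inside it. Whether PalmettoBUG averages over cells or over cluster means is not stated here, so treat this as an assumption; ``versus_rest`` is a hypothetical helper.

      ```python
      from statistics import mean


      def versus_rest(values, labels, cluster):
          """Subtract the mean of all cells outside `cluster` from each cell inside it."""
          inside = [v for v, lab in zip(values, labels) if lab == cluster]
          rest = [v for v, lab in zip(values, labels) if lab != cluster]
          rest_mean = mean(rest)
          return [v - rest_mean for v in inside]


      # Hypothetical expression of one antigen in five cells from three clusters
      expression = [2.0, 4.0, 1.0, 1.0, 4.0]
      clusters = ["a", "a", "b", "b", "c"]
      shifted = versus_rest(expression, clusters, "a")
      ```

      Values above zero then indicate expression higher than in the rest of the dataset, which is easier to read than raw values when clusters differ only slightly.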


   .. py:method:: plot_cluster_histograms(antigen: str, groupby_column: str = 'metaclustering', filename: Union[str, None] = None, **kwargs) -> matplotlib.pyplot.figure

      Plots a KDE-smoothed histogram of a particular marker / antigen's expression across all the clusters in the supplied [groupby_column] column

      Args:
          antigen (str):
              one of the values in self.data.var['antigen']. Determines which antigen in the dataset to plot

          groupby_column (string): 
              The column in self.data.obs to group the cells by (usually a way of identifying cell types, 
              like metaclustering or merging, but can be a different grouping like sample_id). Creates facets 
              of the plot

          filename (string, or None): 
              If not None, the name of the .png file to be saved in experiment.save_dir

          kwargs: 
              passed into matplotlib.axes.Axes.plot() for each facet of the plot

      Returns:
          a matplotlib figure


   .. py:method:: plot_cluster_abundance_1(groupby_column: str = 'metaclustering', bars_by: str = 'sample_id', number_of_columns: int = 3, filename: Union[str, None] = None, **kwargs) -> matplotlib.pyplot.figure

      Plots a stacked barplot (where the stacks all add up to 1) of the ratios of each cell type from the supplied [groupby_column]
      column in each sample_id, facetted by condition.

      Args:
          groupby_column (str):
              The name of a column in self.data.obs to divide the stacks of the barplot by
              NOTE: the bars of the barplot are ALWAYS separated by self.data.obs['sample_id']
                  and the plot is ALWAYS facetted into multiple panels on self.data.obs['condition']
          
          number_of_columns (integer):
              How many columns in the plot / when to wrap the facets of the plot. For example, if your dataset has
              5 conditions, and you supply a value == 3 here, then the first three conditions will be plotted in the first row
              and the remaining two conditions will be plotted in the second row.

          filename (string, or None): 
              If not None, the name of the .png file to be saved in experiment.save_dir

          kwargs: 
              passed into seaborn.objects.Plot()

      Returns:
          a matplotlib figure
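
      The quantity shown in each stacked bar can be sketched as per-sample fractions that sum to 1. This is a standard-library illustration only; ``abundance_fractions`` and the example labels are hypothetical.

      ```python
      from collections import Counter, defaultdict


      def abundance_fractions(sample_ids, cell_types):
          """Fraction of each cell type within each sample_id (fractions sum to 1 per sample)."""
          counts = defaultdict(Counter)
          for sample, cell_type in zip(sample_ids, cell_types):
              counts[sample][cell_type] += 1
          return {
              sample: {ct: n / sum(c.values()) for ct, n in c.items()}
              for sample, c in counts.items()
          }


      # Hypothetical per-cell labels: stand-ins for obs['sample_id'] and obs[groupby_column]
      sample_ids = ["roi1", "roi1", "roi1", "roi1", "roi2", "roi2"]
      cell_types = ["T", "T", "B", "T", "B", "B"]
      fractions = abundance_fractions(sample_ids, cell_types)
      ```

      Because every bar is normalized to 1, samples with very different cell counts remain directly comparable.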


   .. py:method:: plot_cluster_abundance_2(groupby_column: str = 'metaclustering', N_column: str = 'sample_id', hue: str = 'condition', plot_type: str = 'barplot', filename: Union[str, None] = None, **kwargs) -> matplotlib.pyplot.figure

      Plots the abundance of each cell type (from the supplied [groupby_column] column in self.data.obs) in each sample_id as a bar plot, 
      box plot, or strip plot (with plot_type == "barplot", "boxplot", or "stripplot"). 

      Separate bars / boxes / strips are made for each condition in the supplied [hue] column to allow comparisons.

      Args:
          groupby_column (str):
              The name of a column in self.data.obs to facet the bar / box / strip plot into multiple panels

          N_column (str):
              The name of the column in self.data.obs that determines what individual units compose the distribution of the boxplot.
              This function does not do a statistical test, but the groups of this column would correspond to the N used to determine 
              variance / degrees of freedom in a t-test.
              NOTE: a key assumption is that the categories in this column are NEVER shared between hue categories. This holds for the defaults 
              (each unique ROI / sample_id can only have one condition assigned to it) but must also be true for any alternate column used. 

          hue (str):
              The name of a column in self.data.obs to separate & color columns of the plots by

          plot_type (str):
              either "barplot", "boxplot", or "stripplot". Determines which type of plot to use on each sub-panel.

          filename (string, or None): 
              If not None, the name of the .png file to be saved in experiment.save_dir

          kwargs: 
              passed into seaborn.{bar/box/strip}plot()

      Returns:
          a matplotlib figure


   .. py:method:: do_cluster_stats(groupby_column: str = 'metaclustering', N_column: str = 'sample_id', marker_class: str = 'type') -> dict[Union[str, int], pandas.DataFrame]

      Calculates statistics by pairwise ANOVAs (effectively a t-test) between each cluster's marker expression and the marker expression of 
      the rest of the dataset. Instead of using all the cells individually, an average is taken within each sample_id.

      Args:
          groupby_column (string): 
              The column in the self.data.obs dataframe to group the cells by, so that comparisons can be made between the unique values in 
              this column (usually a celltype column, like "metaclustering", but could be something else, like condition or sample_id)

          N_column (string):
              The column in self.data.obs that determines the "N" for the statistical test (data is aggregated by this before the test and it
              helps determine what the degrees of freedom are in the test.)
              NOTE: unlike other instances of N_column in palmettobug functions, it is possible for groups within this column to be shared between the conditions,
              as the comparison of interest is usually on the cell type level, not between conditions.

          marker_class (string == "All", "type", "state", or "none" ): 
              what markers to include in the comparison. Usually "type", should typically match the markers used to generate the cell clustering / groupby being compared.

      Returns:
          a dictionary with keys = unique values of the groupby_column, and values = pandas dataframes containing the statistics for that 
          groupby_column value. This dictionary is also saved as self.df_out_dict, from which it is accessed by the self.plot_cluster_stats method


   .. py:method:: plot_cluster_stats(statistic: str = 'FDR_corrected', filename: Union[str, None] = None, **kwargs) -> matplotlib.pyplot.figure

      Plots a heatmap of the cluster statistics calculated with the method self.do_cluster_stats. 

      Args:
          statistic (str):
              which column of the output of self.do_cluster_stats() (aka, self.df_out_dict) to plot. Can be "F_statistic", "p_values", or "FDR_corrected".
              p-value statistics are transformed by -log(stat) before plotting, so that higher values correspond to greater significance

          filename (str, or None):
              if not None, will determine the filename of the plot saved to self.save_dir

      Returns:
          a matplotlib.pyplot figure


   .. py:method:: do_abundance_ANOVAs(groupby_column: str = 'merging', variable: str = 'condition', N_column: str = 'sample_id', conditions: list[str] = [], filename: Union[str, None] = None) -> pandas.DataFrame

      Performs pairwise ANOVA tests (or effectively, a t-test) between two provided conditions in the self.data.obs['condition'] column, looking at 
      the abundance (as % in each sample_id) of the cell types specified in the groupby_column. 

      Args:
          groupby_column (str): 
              The column in self.data.obs where the cell type information is contained

          variable (str): 
              The column in self.data.obs where the independent variable information is found (default = 'condition')

          N_column (str):
              The column in self.data.obs that determines the aggregation (and downstream from this, the degrees of freedom) for the statistical test.
              NOTE: a key assumption is that the categories in this column are NEVER shared between conditions -- aggregation on this column
              is done BEFORE comparison of conditions. This holds for the defaults (each unique ROI / sample_id can only have one condition assigned to it)
              but must also be true for any alternate column used. 

          conditions (list of strings or empty list): 
              list of unique values in self.data.obs[variable] to be compared by ANOVA. If empty (the default), an ANOVA test will be performed on all the conditions in the dataset. 

          filename (str or None):
              if not None, determines the filename that the output dataframe will be saved to inside the self.data_table_dir folder.

      Returns:
          (pandas dataframe) representing the statistics calculated by this function
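
      The N_column assumption noted above (no category shared between conditions) can be checked before running the test. The sketch below is a plain-Python illustration, not a PalmettoBUG function; ``shared_units`` and the example columns are hypothetical.

      ```python
      from collections import defaultdict


      def shared_units(n_values, condition_values):
          """Return the N_column categories that appear under more than one condition."""
          seen = defaultdict(set)
          for unit, condition in zip(n_values, condition_values):
              seen[unit].add(condition)
          return sorted(unit for unit, conds in seen.items() if len(conds) > 1)


      # Hypothetical per-cell columns: stand-ins for columns of self.data.obs
      sample_id = ["roi1", "roi1", "roi2", "roi3"]
      patient_id = ["p1", "p1", "p1", "p2"]
      condition = ["ctrl", "ctrl", "treated", "treated"]

      ok_for_sample_id = shared_units(sample_id, condition)    # no sample_id spans two conditions
      bad_for_patient_id = shared_units(patient_id, condition)  # patient p1 spans two conditions
      ```

      An empty result means the column is safe to use as N_column; any listed category would be aggregated across conditions and violate the assumption.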


   .. py:method:: do_count_GLM(conditions: list[str], variable: str = 'condition', groupby_column: str = 'merging', N_column: str = 'sample_id', family: str = 'Poisson', filename: Union[str, None] = None) -> pandas.DataFrame

      Performs a statistical test on cell abundance / cell count using generalized linear models. 

      Cell counts are taken for each sample_id in each condition and then those aggregated per-sample_id numbers are used in the GLM.

      Args:
          conditions (list of strings): 
              conditions to compare. In GUI, either pairwise or all possible conditions at once are compared.

          variable (string): 
              the column in self.data.obs that will be treated as the independent variable for the test. Almost always 'condition'

          groupby_column (string): 
              the column in self.data.obs that contains the cell type information from which counts / abundance will be calculated

          N_column (string):
              the column in self.data.obs that contains the replication N grouping (data is aggregated by this grouping
              before the statistical test, and relates to the number of degrees of freedom in the test). Usually only sample_id or patient_id. 
              NOTE: a key assumption is that the categories in this column are NEVER shared between conditions -- aggregation on this column
              is done BEFORE comparison of conditions. This holds for the defaults (each unique ROI / sample_id can only have one condition assigned to it)
              but must also be true for any alternate column used. 

          family (string -- "Poisson", "NegativeBinomial"): 
              The distribution to use in the GLM. Can be "Poisson" or "NegativeBinomial". Other distributions, such as "Gaussian" and "Binomial" are 
              not recommended or not currently configured properly. 

          filename (string or None):  
              the filename for the csv exported into self.data_table_dir. If None, no such file is exported

      Returns:
          pandas dataframe: Summary statistics from the results of the model

      Inputs/Outputs:
          Outputs: 
              If filename is provided (is not None), then exports the summary statistic table to self.data_table_dir/filename.csv
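
      The aggregation step that precedes the GLM can be sketched as a table of cell counts per sample_id and cell type. This illustrates the counting only, not the model fit; keeping zero counts for absent cell types is an assumption of this sketch, and ``count_table`` is a hypothetical helper.

      ```python
      from collections import Counter


      def count_table(sample_ids, cell_types):
          """Cell counts for every (sample_id, cell type) pair, including zero counts."""
          counts = Counter(zip(sample_ids, cell_types))
          samples = sorted(set(sample_ids))
          types = sorted(set(cell_types))
          return {(s, t): counts[(s, t)] for s in samples for t in types}


      # Hypothetical per-cell labels: stand-ins for obs[N_column] and obs[groupby_column]
      sample_ids = ["roi1", "roi1", "roi1", "roi1", "roi2", "roi2"]
      cell_types = ["T", "T", "B", "T", "B", "B"]
      counts = count_table(sample_ids, cell_types)
      ```

      Each sample_id then contributes one count per cell type to the model, rather than one observation per cell.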


   .. py:method:: plot_state_distributions(marker_class: str = 'state', subset_column: str = 'merging', colorby: str = 'condition', N_column: str = 'sample_id', grouping_stat: str = 'median', wrap_col: int = 3, suptitle: bool = False, figsize: tuple[Union[int, float], Union[int, float]] = None, filename: Union[None, str] = None) -> matplotlib.pyplot.figure

      Plots a facetted boxplot of the expression of a specified marker_class (usually 'state'), split into various cell groupings 
      (subset_column, usually 'merging') per panel, comparing on colorby (usually 'condition'). 
      Aggregates within each sub-group first by N_column (usually 'sample_id') using the aggregation statistic specified in grouping_stat, 
      so that the boxplots aren't overwhelmed trying to plot thousands of individual cells. 

      Args:
          marker_class (string):
              What marker_class of antigens to use in the plot. Either 'type', 'state' (default), 'none', or 'All'. 

          subset_column (string):
              The name of a categorical column in self.data.obs to group the cells by. These groupings will constitute the panels of the final
              plot. 

          colorby (string):
              The name of a categorical column in self.data.obs to group the cells by, typically 'condition'. These groups will define how the 
              boxplots in each panel are colored. 

          N_column (string):
               The name of a categorical column in self.data.obs to group the cells by, typically 'sample_id'. It is recommended to not change this
               as errors / strange looking plots are likely with any other value. It specifies how the data is aggregated before plotting,
               as plotting every cell for a large dataset is likely to make the boxplot too confusing, as there can be far too many outlier
               points on the plot.
               NOTE: a key assumption is that the categories in this column are NEVER shared between conditions -- aggregation on this column
               is done BEFORE comparison of conditions. This holds for the defaults (each unique ROI / sample_id can only have one condition assigned to it)
               but must also be true for any alternate column used. 

          grouping_stat (string):
              How to aggregate the data using the N_column parameter -- as in, take the 'mean' of the sample_id's or the 'median' before plotting?

          wrap_col (integer):
              how many panels to place in each row of the facetted plot before wrapping and starting a new row of boxplots

          suptitle (boolean):
              whether to include an automatically generated title at the top of the boxplot or not

          figsize (tuple of two numerics):
              The size, in inches, of the final plot's dimensions. Used in the matplotlib.pyplot.subplots() function

          filename (None, or string):
              If not None, then this method will write the plot as a .png file to the folder specified by self.save_dir using the provided
              filename. This filename should not include the file extension (the extension is always .png, and is automatically supplied by this
              method). If None, then the figure is not written to the hard drive.

      Returns:
          matplotlib.pyplot figure

      Inputs/Outputs:
          Outputs: 
              If filename is provided (is not None), then exports the figure as a .png file


   .. py:method:: plot_state_p_value_heatmap(stats_df: Union[None, pandas.DataFrame] = None, top_n: int = 50, heatmap_x: list[str] = ['condition', 'sample_id'], ANOVA_kwargs: dict = {}, include_p: bool = True, figsize: tuple[Union[int, float], Union[int, float]] = (10, 10), filename=None) -> matplotlib.pyplot.figure

      Plots a heatmap of the most significant differences found with the self.do_state_exprs_ANOVAs() method

      Presumes the supplied stats_df matches the format exported by self.do_state_exprs_ANOVAs() method! 
      Including structure, columns, rank ordering by F-statistic top-to-bottom, etc.

      Args:
          stats_df (None, or a pandas dataframe):
              A pandas dataframe with marker expression statistics -- the returned output of the self.do_state_exprs_ANOVAs() method.
              If None, then stats_df = self.do_state_exprs_ANOVAs(**ANOVA_kwargs) will be run to generate the statistics dataframe.

          top_n (integer):
              How many of the top (order by F-statistic) antigen expression changes to plot on the heatmap. Default = 50

          ANOVA_kwargs (dictionary):
              Only used if stats_df is None. Provides the parameters of self.do_state_exprs_ANOVAs method, as in
              stats_df = self.do_state_exprs_ANOVAs(**ANOVA_kwargs) will be run first before the heatmap is generated from the
              stats_df.

          heatmap_x (list of strings):
              A list of column names in self.data.obs that determine how the data will be grouped for the x-axis of the heatmap.
              The values of the heatmap tiles are the median expression of the antigen of interest in these groups.
              NOTE: The y-axis of the heatmap is already determined by antigen/cellgrouping pairs in stats_df, and if the cell grouping
              used to calculate statistics was not the entire dataset, then it will also be used in grouping the data to calculate medians
              for the heatmap, along with the columns specified by this parameter. 
                  As in, let's say state marker statistics were calculated between cell types defined in a 'merging' column, while heatmap_x
                  was set to be ['condition','sample_id'] (the default) to group the data by each ROI, along with its treatment label -->
                  Then, on the heatmap, the data will be grouped by sample_id, condition, and merging -- then the median taken of those groups
                  and plotted on the heatmap.

          include_p (boolean):
              whether to include an additional column of the heatmap for the p-values associated with each row of the statistics calculated
              in the stats_df. This column's values do not come from the groupings explained above, but directly from the adjusted p-values
              of the statistics table, transformed as follows:

                  heatmap_value = -log10(adj_p_value)

              NOTE that the negative log of 0.05 is ~1.3.

          figsize (tuple of 2 numerics):
              The dimensions of the final plot, in inches. Used in the matplotlib.pyplot.subplots call

          filename (None, or string):
              If not None, then this method will write the plot as a .png file to the folder specified by self.save_dir using the provided
              filename. This filename should not include the file extension (the extension is always .png, and is automatically supplied by this
              method). If None, then the figure is not written to the hard drive.

      Returns:
          matplotlib.pyplot figure

      Inputs/Outputs:
          Outputs: 
              If filename is provided (is not None), then exports the figure as a .png file
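
      The p-value transform used for the optional heatmap column can be checked numerically. The value of ~1.3 quoted for 0.05 corresponds to a base-10 logarithm, which is what this sketch assumes; ``neg_log10`` is a hypothetical helper.

      ```python
      import math


      def neg_log10(adj_p_value: float) -> float:
          """Negative base-10 log of an adjusted p-value; larger means more significant."""
          return -math.log10(adj_p_value)


      threshold = neg_log10(0.05)  # roughly 1.3, the usual significance cutoff on this scale
      strong = neg_log10(0.001)    # roughly 3
      ```

      On this scale, any tile above roughly 1.3 corresponds to an adjusted p-value below 0.05.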


   .. py:method:: do_state_exprs_ANOVAs(marker_class: str = 'state', groupby_column: str = 'merging', variable: str = 'condition', N_column: str = 'sample_id', statistic: str = 'mean', test: str = 'anova', conditions: list[str] = [], filename: Union[str, None] = None) -> pandas.DataFrame

      Performs statistical comparison of marker expression within cell types between conditions using ANOVA

      Aggregates marker expression using mean or median within each unique [sample_id + groupby] column combination and then compares across 
      conditions

      Args:
          marker_class (str): 
              one of -- "All", "type", "state", "none" -- determines what markers are compared. 
              See: self.data.var['marker_class'] or the Analysis_panel file

          groupby_column (str): 
              the column title of the cell type column in self.data.obs. Usually a string, but theoretically could be any allowed in a pandas dataframe column title. 

          variable (str): 
              the column of self.data.obs containing the independent variable / condition. Default = 'condition'

          N_column (str):
              the column of self.data.obs that carries the experimental unit, i.e., the data will be aggregated based on this column to construct the 
              distributions of the final statistical comparison and the number of degrees of freedom in the test could be described as:
                  degrees_of_freedom = len(self.data.obs[N_column].unique()) - len(self.data.obs[variable].unique()) 
              As in, N - the number of comparisons.
              NOTE: a key assumption is that the categories in this column are NEVER shared between conditions -- aggregation on this column
              is done BEFORE comparison of conditions. This holds for the defaults (each unique ROI / sample_id can only have one condition assigned to it)
              but must also be true for any alternate column used. 

          statistic (str): 
              one of -- "mean", "median" -- which aggregation statistic to use

          test (str):
              'anova' or 'kruskal' -- The statistical test to perform (ANOVA or Kruskal-Wallis)

          conditions(list of str): 
              if empty (default), will use all the unique conditions in self.data.obs[variable]. Otherwise, will only compare the conditions in this 
              list -- values in this list should be values in self.data.obs[variable], values not in this will be ignored. 

          filename (str or None): 
              If not None, the name for saving the output datatable as a csv file in self.data_table_dir

      Returns:
          (pandas dataframe) the summary statistics of the ANOVA tests

      Inputs/Outputs:
          Outputs: 
              If filename is provided (is not None), then exports the summary statistic table to self.data_table_dir/filename.csv
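
      The degrees of freedom described for N_column can be seen in a hand-computed one-way ANOVA on per-sample aggregates. This is a textbook F-statistic written with the standard library, not PalmettoBUG's code; ``one_way_anova_f`` and the example values are hypothetical.

      ```python
      from statistics import mean


      def one_way_anova_f(groups):
          """groups: dict mapping condition -> list of per-sample aggregated values."""
          all_values = [v for values in groups.values() for v in values]
          grand_mean = mean(all_values)
          k, n = len(groups), len(all_values)
          # Variation of the condition means around the grand mean
          ss_between = sum(len(vals) * (mean(vals) - grand_mean) ** 2 for vals in groups.values())
          # Variation of each sample around its own condition mean
          ss_within = sum((v - mean(vals)) ** 2 for vals in groups.values() for v in vals)
          df_between, df_within = k - 1, n - k  # n - k matches N minus the number of conditions
          f_stat = (ss_between / df_between) / (ss_within / df_within)
          return f_stat, df_between, df_within


      # Hypothetical mean expression of one marker in one cell type, one value per sample_id
      per_sample_means = {"ctrl": [1.0, 2.0, 3.0], "treated": [4.0, 5.0, 6.0]}
      f_stat, df_between, df_within = one_way_anova_f(per_sample_means)
      ```

      With six samples and two conditions the within-group degrees of freedom are 6 - 2 = 4, regardless of how many cells each sample contains.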


   .. py:method:: export_data(subset_columns: Union[list[str], None] = None, subset_types: Union[list[list[str]], None] = None, groupby_columns: Union[list[str], None] = None, statistic: str = 'mean', groupby_nan_handling: str = 'zero', include_marker_class_row: bool = False, untransformed: bool = False, filename: Union[str, None] = None) -> pandas.DataFrame

      Exports currently loaded data from the Analysis, from self.data. 

      Preserves any previously performed scaling, dropped categories, & batch correction. Always of arcsinh(data / 5) transformed data. Can
      export the entirety of relevant self.data information, or export subsets of self.data, and/or export aggregate summary statistics for 
      groups within the data. 

      Args:
          subset_columns (list[str] or None): 
              a list of strings denoting the columns to subset the dataframe's rows on (here and in other arguments, the function attempts 
              to cast non-string input to strings, as well as the corresponding column of the data). If this or subset_types is None, no subsetting occurs. 

          subset_types (list[list[str]] or None): 
              a list containing sub-lists of strings. The length of the outer list must equal the length of
              the subset_columns list, as each sub-list contains strings corresponding to the rows to keep. 

                  As in: if subset_columns = ['column1', 'column2'] and subset_types = [['type2', 'type6'],['typeB', 'typeZ']],
                  then rows of type2 / type6 in column1 will be kept, and similarly rows of typeB / typeZ in column2.

              When > 1 columns / conditions are subsetted on, as in the above example, the rows that are kept are the union of 
              all the subsetting conditions WITHIN a given column, but the intersection BETWEEN what is kept from each column. 
              So in the above example, all rows of column1 == type2/6 that also have column2 == typeB/Z are the rows that are maintained.
                                                      
          groupby_columns (list[str] or None): 
              A list of strings indicating what columns of the data to groupby. If None, then grouping is not performed.
              Used like this:    self.data.obs.groupby(groupby_columns)              but on a dataframe containing the data.X values as well

          statistic (str): 
              Possible values: 'mean','median','sum','std','count'. Denotes the pandas groupby method to be used after grouping (ignored if groupby_columns is None).
              Numeric methods (mean, median, sum, std) are only applied to numeric columns, so only those columns + the groupby columns 
              will be in the final dataframe / csv
          
          groupby_nan_handling (str):
              'zero' or 'drop' -- when grouping the data, whether to drop NaNs (which usually represent non-existent category combinations) or to 
              convert NaNs to zeros. Any other value of this parameter will cause NaNs to be left as-is in the data export.
              Note that the default (and the only option available in the GUI) is 'zero', which converts ALL NaN values to 0, while the 'drop' option only drops
              rows where EVERY numerical value is NaN.
              By default, all possible groupby_columns combinations are included in the export (even if they are not present in the data, such as cell types 
              not present in every ROI). This is the source of most NaN values. Notably, columns in the metadata (not data.obs!) of the Analysis are given special 
              treatment to try to prevent non-existent experimental categories from having data exported (for example, each ROI / sample_id should be 
              associated with a single condition, not every possible condition in the dataset). 

          include_marker_class_row (bool): 
              Whether to include the marker_class information as a row at the bottom of the table --> True to 
              include this row -- useful for reimport into PalmettoBUG.
              False to not include this row -- this is probably better for import into non-PalmettoBUG software for analysis,
              or at the least the user will need to remember to remove this row before analyzing!
              When the marker_class row is included, it is encoded as integers (to prevent mixed dtype issues/warnings on reload)
              
                  >>> 0 = 'none', 1 = 'type', 2 = 'state'

              metadata columns (which have no marker_class) have this row filled with 'na'. 
              NOT USED IN COMBINATION WITH GROUPING!

          untransformed (bool):
              if True, will export the untransformed (pre-arcsinh, pre-scaling, etc., etc.) data, from self.data.uns['count'].
              Provided so that the raw data is not difficult to recover, although not expected to be used frequently. Default == False. 

          filename (str or None): 
              the name of the csv file to save the exported dataframe inside the self.data_table_dir folder. If None, no export occurs, and the data table is only returned. 

      Returns:
          (pandas DataFrame) -- the pandas dataframe representing the exported data. 

      Inputs/Outputs:
          Outputs: 
              If filename is provided (is not None), then exports the data table to self.data_table_dir/filename.csv
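
      The subsetting rule described for subset_columns / subset_types (union within a column, intersection between columns) can be sketched 
      in plain Python as follows. This is only an illustration of the logic under the assumptions stated in the argument descriptions, not 
      PalmettoBUG's implementation; ``subset_rows`` and the example rows are hypothetical.

      ```python
      def subset_rows(rows, subset_columns, subset_types):
          """Keep rows matching ANY allowed value within a column (union)
          and ALL of the per-column conditions (intersection)."""
          kept = []
          for row in rows:
              # Values are compared as strings, mirroring the string casting described above
              if all(str(row[col]) in {str(t) for t in allowed}
                     for col, allowed in zip(subset_columns, subset_types)):
                  kept.append(row)
          return kept

      # Hypothetical per-cell annotations (a stand-in for the columns being subset on)
      rows = [
          {"column1": "type2", "column2": "typeB"},   # kept
          {"column1": "type6", "column2": "typeZ"},   # kept
          {"column1": "type2", "column2": "typeA"},   # dropped: column2 does not match
          {"column1": "type1", "column2": "typeB"},   # dropped: column1 does not match
      ]
      kept = subset_rows(rows, ["column1", "column2"],
                         [["type2", "type6"], ["typeB", "typeZ"]])
      ```

      Only the first two rows satisfy both column conditions, so only those two are retained.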


   .. py:method:: export_DR(kind: str = 'umap', filename: Union[str, None] = None) -> tuple[pandas.DataFrame, str]

      Exports a dimensionality reduction embedding (PCA or UMAP)

      Args:
          kind (str): 
              one of -- "umap", "pca" -- the type of embedding to export

          filename (str or None): 
              the filename of the csv file to export to self.data_table_dir. 
              if None, no export occurs, only the dataframe is returned

      Returns:
          (pandas dataframe) this contains three columns, the dim1/2 of the embedding + the cell number as in self.data (needed as downsampling is typically used for DR)

      Inputs/Outputs:
          Outputs: 
              If filename is provided (is not None), then exports the UMAP/PCA table to self.data_table_dir/filename.csv


   .. py:method:: export_clustering(groupby_column: str = 'metaclustering', identifier: str = '') -> tuple[pandas.DataFrame, str]

      Saves a clustering to self.clusterings_dir as a csv. ALWAYS exports to the disk

      These saved clustering files are used for both reloading a clustering later and for loading info into a Spatial analysis.
      The filename of a clustering is its (groupby_column + identifier).csv

      Args:
          groupby_column (str): 
              the title of the column in the self.data.obs that represents a particular cell type clustering to save.
              this string forms the first part of the filename of the csv saved to self.clusterings_dir
              For the sake of reloading, expected to be one of: 
                     
                  -- "classification", "leiden", "merging", "metaclustering", "clustering" --

              If groupby_column == "", then the method attempts to add all of the expected groupby columns listed above
              to the exported file.

          identifier (str): 
              a string that forms the second part of the filename of the csv saved to self.clusterings_dir

      Returns:
          (pandas dataframe): the pandas dataframe that is written to the csv file at the export path

          (str): The path to the exported csv file

      Inputs/Outputs:
          Inputs: 
              Attempts to read from directory above self.directory / regionprops --> looking for regionproperty data to export with the 
              clustering, which can be used in spatial analysis. 

          Outputs: 
              Writes the clustering data table to self.clusterings_dir/{groupby_column}{identifier}.csv
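
      The naming rule above (groupby_column + identifier + ".csv" inside self.clusterings_dir) can be previewed with a small sketch. The helper 
      ``clustering_csv_path`` and the folder name "clusterings" are hypothetical and only illustrate how the two strings are joined.

      ```python
      from pathlib import Path

      def clustering_csv_path(clusterings_dir, groupby_column="metaclustering", identifier=""):
          """Build the expected csv path: {clusterings_dir}/{groupby_column}{identifier}.csv"""
          return Path(clusterings_dir) / f"{groupby_column}{identifier}.csv"

      # Hypothetical directory and identifier
      path = clustering_csv_path("clusterings", "merging", "_run1")
      default_path = clustering_csv_path("clusterings")
      ```

      Note that no separator is inserted between the two parts, so an identifier such as "_run1" should carry its own leading underscore if one is wanted.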


   .. py:method:: export_clustering_classy_masks(clustering='merging', identifier='')

      Writes a "classy mask" folder from an annotation.

      NOTE: 
          This method depends on the original mask folder and the analysis being linked properly.
          If data has been dropped from the analysis, then those dropped cells will either be ignored (if
          an entire sample_id was dropped), or they will be assigned to a 'none' label

      Uses: visualization, mainly. Perhaps could be used in extending masks

      Args:
          clustering (str):
              A column in self.data.obs to categorize the cells by. Each unique value in this column will receive a unique integer number
              to classify its cell mask by.

          identifier (str):
              if not the empty string '', will be appended to the name of the saved classy mask folder / CSV. This name will follow the convention
              f'{name of the original cell masks folder}_{identifier}'. Use this to make sure that the resultant classy masks have a memorable / 
              distinct name. 

      Returns:
          a pandas dataframe containing the clustering assignments of every cell in the style of a classy_mask, including, critically
          the integer assigned to each cluster in the classy masks

      Inputs / Outputs:
          Inputs:
              expects to find cell masks at self.input_mask_folder, whose masks correspond to the cells in the sample_id's of the analysis
          
          Outputs:
              writes to a new classy mask folder at f'{project}/classy_masks/{name of the original cell masks folder}_{identifier}',
              including the classified masks themselves in a sub-folder, and dataframes containing information about the classes of 
              the classy masks & their corresponding labels.


   .. py:method:: load_clustering(path: Union[str, pathlib.Path]) -> None

      Loads a previously exported clustering from the csv file at path (typically a file inside self.clusterings_dir)

      Expects to find a csv file with the same format as exported by self.export_clustering. Attempts to confirm that the data is unchanged
      from when the clustering was originally exported. This includes dropped data, batch correction & scaling -- so be sure
      that these are the same when loading a clustering. 

      Args:
          path (str or Path): 
              the path to the csv file to be loaded, typically a file in the self.clusterings_dir folder

      Returns:
          None (modifies self)

      Inputs/Outputs:
          Inputs: 
              Reads from path, presuming path is the full filename of a .csv file created by self.export_clustering()


   .. py:method:: load_classification(cell_classifications: Union[pandas.DataFrame, pathlib.Path, str], column: str = 'labels') -> None

      Load a cell classification from the output data table of classy mask generation using a pixel classifier.

      Args:
          cell_classifications (pandas dataframe, or string / Path): 
              either a pandas dataframe, or the path to a csv file from which a pandas dataframe will be read. The dataframe must contain at least one of 
              two columns, "labels" and/or "classification", containing cell type labels, and must be equal in length to self.data.obs

          column (str): 
              "classification" to attempt to load pixel classification numbers, or "labels" to try to load biological labels first


      Returns:
          None  (modifies self)

      Inputs/Outputs:
          Inputs: 
              if cell_classifications is not a pandas dataframe, attempts to read a .csv from the path specified by cell_classifications


.. py:class:: SpatialAnalysis

   This class serves as a coordinating class for the three spatial analysis sub-classes. In the GUI, these subclasses are currently called directly (for historical reasons
   and because there is no real reason to update that). However, for use of PalmettoBUG in scripting outside the GUI, it is convenient to have a unified class where all the 
   spatial methods can be accessed.

   The methods of this class are wrappers on methods of the 3 subclasses. Because of this, it can be divided into a number of groupings:

       >>> add data, cell maps, neighbors, neighborhoods, spaceANOVA, and edt    (in order)

   Args:
       (None) -- the key set-up steps in this class are called in the methods of this class, not when it is initialized

   Key Attributes:
       exp:
           This is the connected Analysis object, containing the anndata (exp.data) which holds most of the data used for calculations

       SpaceANOVA, neighbors, edt:
           These are subclasses that contain the actual methods used by this higher-level class. The methods of this class
           are all wrappers on methods contained in one of these sub-classes. 


   .. py:attribute:: edt


   .. py:attribute:: SpaceANOVA


   .. py:attribute:: neighbors


   .. py:method:: add_Analysis(Analysis) -> None

      Connects the Spatial methods here to a palmettobug.Analysis object. Edits to the Analysis object will affect the Spatial methods here


   .. py:method:: plot_cell_maps(plot_type: str, id: Union[str, int, None] = None, clustering: str = 'merging') -> matplotlib.pyplot.figure

      Plots cell maps either as "points" or as "masks".

      Args:
          plot_type (string): 
              either "points" or "masks". If "points" then the cells are represented as dots of various sizes on a white background. If "masks" then 
              the cells are represented as their mask shapes on a black background ("masks" uses a squidpy plotting function). 

          id (string, integer, or None): 
              If None, will make cell maps for every sample_id / image in the dataset. If not None, will only make a plot for the specified image.
              id's should either be the sample_id (for which a string or integer form will work) or the file_name of the desired image.

          clustering (string): 
              the name of the column in self.exp.data.obs to use for coloring the cells. Usually one of the standard clustering column names 
              ("merging", "metaclustering", etc.)

      Returns:
          matplotlib.figure or None


   .. py:method:: do_neighbors(radius_or_neighbors: str, number: int) -> None

      Creates the neighbor-graph between cells in the dataset (using their centroids). This step is necessary before performing any of the other neighbor-based methods.
      It uses the squidpy.gr.spatial_neighbors function to generate the neighborhood graph.

      Args:
          radius_or_neighbors (string):
              whether to create the neighbor graph using a fixed radius ("Radius") or to create the neighbor graph using 
              a fixed number of nearest neighbors ("Neighbor")

          number (integer): 
              either the length of the search radius in pixels, or the number of nearest neighbors per cell (depending on radius_or_neighbors parameter)

      Returns:
          (None)
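
      To make the difference between the two modes concrete, here is a minimal standard-library sketch that builds a neighbor graph from centroids 
      either by a fixed radius or by a fixed number of nearest neighbors. It is a conceptual illustration only: the real graph is built by 
      squidpy.gr.spatial_neighbors, whose details may differ, and the function ``neighbor_graph`` and the centroid values are hypothetical.

      ```python
      import math

      def neighbor_graph(centroids, radius_or_neighbors, number):
          """Return {cell index: set of neighbor indices} from (x, y) centroids."""
          graph = {}
          for i, p in enumerate(centroids):
              # Distance from cell i to every other cell, nearest first
              dists = sorted((math.dist(p, q), j)
                             for j, q in enumerate(centroids) if j != i)
              if radius_or_neighbors == "Radius":
                  graph[i] = {j for d, j in dists if d <= number}
              elif radius_or_neighbors == "Neighbor":
                  graph[i] = {j for _, j in dists[:number]}
              else:
                  raise ValueError("expected 'Radius' or 'Neighbor'")
          return graph

      centroids = [(0, 0), (3, 0), (0, 4), (30, 30)]    # made-up centroids, in pixels
      by_radius = neighbor_graph(centroids, "Radius", 5)
      by_knn = neighbor_graph(centroids, "Neighbor", 2)
      ```

      In this toy example the isolated cell at (30, 30) has no neighbors within a radius of 5 pixels, but still receives two neighbors in the 
      nearest-neighbor mode, which is the main practical difference between the two options.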


   .. py:method:: plot_neighbor_interactions(clustering: str = 'merging', facet_by: str = 'None', col_num: int = 1, filename: Union[str, None] = None) -> matplotlib.pyplot.figure

      This method wraps squidpy's gr/pl.interaction_matrix functions. It plots a heatmap representing the number of interactions between
      cell types in the dataset. Note that this is an absolute number, and is affected by the abundance of celltypes (more abundant celltypes will 
      have more interactions). 

      Args:
          clustering (string): 
              the name of the column in self.exp.data.obs to use for grouping cells into cell types. Usually one of the standard clustering column names 
              ("merging", "metaclustering", etc.). There should be >1 unique cluster, as the heatmap's dimensions with only one cluster would be 1x1

          facet_by (string): 
              a name of a column in self.exp.data.obs or "None". If not "None", used to plot interaction matrices for subsets of the data, for example to compare
              interaction matrices between conditions. The first panel is always the interaction matrix for the entire dataset (not subsetted).

          col_num (integer): 
              If facetting, how many columns to have in the figure.

          filename (string or None): 
              if not None, will write the plot as /{filename}.png to the spatial save folder of the palmettobug analysis directory. 

      Returns:
          matplotlib.figure
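
      The quantity shown in the heatmap can be pictured with the following sketch, which counts neighbor-graph edges between each ordered pair of 
      cell types. This is a simplified stand-in for what squidpy's interaction_matrix computes, not its implementation; the graph, the labels, and 
      the helper ``interaction_counts`` are hypothetical.

      ```python
      from collections import Counter

      def interaction_counts(graph, labels):
          """Count neighbor-graph edges between each ordered pair of cell types."""
          counts = Counter()
          for cell, neighbors in graph.items():
              for other in neighbors:
                  counts[(labels[cell], labels[other])] += 1
          return counts

      # Made-up neighbor graph (cell -> neighbors) and cell type labels
      graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}, 3: set()}
      labels = ["T cell", "T cell", "B cell", "B cell"]
      counts = interaction_counts(graph, labels)
      ```

      Because these are raw counts, a cell type with more cells will tend to accumulate more edges, which is why the values are sensitive to abundance.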


   .. py:method:: plot_neighbor_enrichment(clustering: str = 'merging', facet_by: str = 'None', col_num: int = 1, seed: int = 42, n_perms: int = 1000, filename: Union[None, str] = None) -> matplotlib.pyplot.figure

      This method wraps squidpy's gr/pl.neighborhood_enrichment functions. It plots a heatmap representing the enrichment of interactions over the 
      random expectation between cell types in the dataset. This is calculated by permutation test, and the values of the heatmap are z-scores
      between the interactions found in the permutation test and the empirical number of interactions.

      Args:
          clustering (string): 
              the name of the column in self.exp.data.obs to use for grouping cells into cell types. Usually one of the standard clustering column names 
              ("merging", "metaclustering", etc.). There should be >1 unique cluster, as the heatmap's dimensions with only one cluster would be 1x1

          facet_by (string): 
              a name of a column in self.exp.data.obs or "None". If not "None", used to plot neighborhood enrichment for subsets of the data, for example to compare
              enrichment between conditions. The first panel is always the neighborhood enrichment for the entire dataset (not subsetted).

          col_num (integer): 
              If facetting, how many columns to have in the figure.

          seed (integer): 
              random seed for the permutation test

          n_perms (integer): 
              how many permutations to perform in the permutation test

          filename (string or None): 
              if not None, will write the plot as /{filename}.png to the spatial save folder of the palmettobug analysis directory. 

      Returns:
          matplotlib.figure 
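
      The idea behind the permutation z-score can be sketched as below: the observed number of interactions for a cell type pair is compared with 
      the distribution obtained after shuffling the cell type labels. This is a simplified illustration under that assumption, not squidpy's 
      implementation; ``count_pair``, ``enrichment_zscore``, and the toy graph are hypothetical.

      ```python
      import random
      import statistics

      def count_pair(graph, labels, type_a, type_b):
          """Number of directed edges from a type_a cell to a type_b cell."""
          return sum(1 for cell, nbrs in graph.items() for other in nbrs
                     if labels[cell] == type_a and labels[other] == type_b)

      def enrichment_zscore(graph, labels, type_a, type_b, n_perms=1000, seed=42):
          rng = random.Random(seed)
          observed = count_pair(graph, labels, type_a, type_b)
          shuffled = list(labels)
          null_counts = []
          for _ in range(n_perms):
              rng.shuffle(shuffled)          # randomize which cell carries which label
              null_counts.append(count_pair(graph, shuffled, type_a, type_b))
          mean = statistics.fmean(null_counts)
          sd = statistics.pstdev(null_counts)
          return (observed - mean) / sd if sd > 0 else float("nan")

      # Two made-up, fully separated triangles of cells: one all "A", one all "B"
      graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}, 3: {4, 5}, 4: {3, 5}, 5: {3, 4}}
      labels = ["A", "A", "A", "B", "B", "B"]
      z = enrichment_zscore(graph, labels, "A", "A")
      ```

      Here the "A" cells only ever neighbor other "A" cells, so the observed count exceeds what shuffled labels typically produce and the z-score is positive.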


   .. py:method:: plot_neighbor_centrality(clustering: str = 'merging', score: str = 'closeness_centrality', filename: Union[str, None] = None) -> matplotlib.pyplot.figure

      Wraps squidpy's gr/pl.centrality_scores functions. clustering corresponds to "cluster_key" in squidpy's API, and score corresponds to "score". 

      Args:
          clustering (string): 
              The cell type grouping ('merging', 'metaclustering', etc.) to plot centrality scores for

          score (string):
              The type of centrality score to plot: ['degree_centrality','closeness_centrality','average_clustering']

          filename (string or None):
              If not None, specifies the filename to save the plot under (as a PNG) in the standard /Spatial_plots folder of the analysis folder.


   .. py:method:: do_neighborhood_CNs(clustering: str = 'merging', leiden_or_flowsom: str = 'FlowSOM', seed: int = 42, resolution: float = 1.0, min_dist: float = 0.1, n_neighbors: int = 15, **kwargs) -> matplotlib.pyplot.figure

      This method uses a previously constructed neighbor graph and a cell clustering to identify the proportions of each cell type among the neighbors of 
      every cell, then runs an unsupervised clustering algorithm (FlowSOM or Leiden) to group the cells in "cellular neighborhoods" (CNs). This neighborhood 
      grouping is appended to self.exp.data.obs as a "CN" column, and can be used in all the same ways as any other annotation / clustering to generate plots, etc.
      Additionally, a figure is returned that is unique to the type of clustering performed -- if FlowSOM, a minimum spanning tree is returned while if Leiden then
      a UMAP is returned.

      Args:
          clustering (string): 
              the name of the column in self.exp.data.obs to use for grouping cells into cell types. Usually one of the standard clustering column names 
              ("merging", "metaclustering", etc.).

          leiden_or_flowsom (string): 
              "Leiden" or "FlowSOM" -- determines which of the unsupervised clustering algorithms will be used to group the cells.

          seed (integer): 
              The random seed for the clustering algorithm

          resolution (float): 
              ONLY for Leiden clustering -- corresponds to the same parameter (resolution) in scanpy's tl.leiden function

          min_dist (float): 
              ONLY for Leiden clustering -- corresponds to the same parameter (min_dist) in scanpy's tl.umap function, which is 
              necessary for leiden clustering

          n_neighbors (integer): 
              ONLY for Leiden clustering -- corresponds to the same parameter in scanpy's pp.neighbors function, which is necessary 
              for leiden clustering

          **kwargs: 
              ONLY for FlowSOM -- these are passed into the FlowSOM class (copied from the saeyslab FlowSOM_Python repository). This allows key
              parameters like the number of training cycles (rlen), number of output clusters (n_clusters), and x/y dimensions (xdim and ydim)
              to be passed to the FlowSOM instance. 

      Returns:
          matplotlib.figure
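
      The first step described above, computing the proportion of each cell type among every cell's neighbors, can be sketched as follows. The 
      clustering step (FlowSOM or Leiden) that would then group these proportion vectors into CNs is omitted. The helper ``neighbor_composition`` 
      and the toy graph are hypothetical, and how PalmettoBUG treats cells with no neighbors is not specified here (this sketch assigns zeros).

      ```python
      def neighbor_composition(graph, labels):
          """For each cell, the fraction of its neighbors belonging to each cell type."""
          cell_types = sorted(set(labels))
          composition = {}
          for cell, neighbors in graph.items():
              n = len(neighbors)
              composition[cell] = [
                  sum(labels[j] == t for j in neighbors) / n if n else 0.0
                  for t in cell_types
              ]
          return cell_types, composition

      # Made-up neighbor graph (cell -> neighbors) and cell type labels
      graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}, 3: {2}}
      labels = ["T cell", "T cell", "B cell", "B cell"]
      cell_types, composition = neighbor_composition(graph, labels)
      ```

      Each cell ends up with one vector of proportions (one entry per cell type), and cells with similar vectors are the ones an unsupervised 
      clustering would place in the same cellular neighborhood.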


   .. py:method:: plot_CN_graph(filename: Union[str, None] = None) -> matplotlib.pyplot.figure

      Plots the UMAP (Leiden) or minimum spanning tree star-plot (FlowSOM) of the cellular neighborhoods -- note that this figure is already returned by do_neighborhood_CNs


   .. py:method:: plot_CN_heatmap(clustering: str = 'merging', **kwargs) -> matplotlib.pyplot.figure

      Plots a heatmap of the proportions of the cell types in each of the CN clusters


   .. py:method:: plot_CN_abundance(clustering: str, cols: int = 3) -> matplotlib.pyplot.figure

      Plots a facetted barplot of the proportion of each cell type in each of the CN clusters


   .. py:method:: estimate_SpaceANOVA_min_radii(with_empty_space: bool = True) -> int

      Estimates a minimum radius for the SpaceANOVA analysis using information about the cell masks & images (such as perimeter, area, cell occupied bounding-box areas, etc.).

      If with_empty_space is True, will further adjust the estimated minimum radius upward using the proportion of empty space in the cell-occupied regions of the images


   .. py:method:: do_SpaceANOVA_ripleys_stats(clustering: str, max: int = 100, min: int = 10, step: int = 1, condition1: Union[str, None] = None, condition2: Union[str, None] = None, threshold: int = 10, permutations: int = 0, seed: int = 42, center_on_zero: bool = False, silence_zero_warnings: bool = True, suppress_threshold_warnings: bool = False) -> None

      Calculates Ripley's spatial statistics for every celltype-celltype pair in clustering in every image. The necessary first step in the SpaceANOVA analysis
      pipeline.

      Args:
          clustering (string): 
              the name of the column in self.exp.data.obs to use for grouping cells into cell types. Usually one of the standard clustering column names 
              ("merging", "metaclustering", etc.).

          max / min / step (integers): 
              these are the integers that determine at which radii statistics will be calculated. An easy way to see what those radii will be is

                  >>> list(range(min, max, step))

              The default covers radii 10-99 with a step size of 1 (as with range, max itself is excluded). The min is set to be > 0 because the first few radii 
              should essentially always have zero cell interactions in them. This is because we calculate using cell centroids, so even if two cells directly touch, 
              the first few radii will still not have an interaction. For example, if a perfectly circular cell has a diameter of 10, then only radii > 5 will even 
              have a chance of encountering another cell, since radii < 5 will only search space INSIDE the cell. And even once the search radius is outside the cell, 
              it does not count as an interaction when it touches another cell -- it is only counted when it touches the centroid of the other cell. 
              Because of this, usually radii < 10 or so have almost no interaction and therefore very unusual behaviour, and might be best dropped from the calculations.
          
          condition1 / condition2 (string or None): 
              If both are None (default), then every condition in the dataset is used (and the fANOVAs are multi-comparison). Else if both
              condition1/2 are specified, they should be unique values in the 'condition' column of self.exp.data.obs, and the SpaceANOVA analysis will
              only look at those conditions, using pairwise comparisons / ANOVAs.

          threshold (integer): 
              default = 10. If at least one of the celltypes in a given celltype-celltype pair has fewer cells than this threshold in a given image, 
              that image will be skipped & no Ripley statistics will be calculated from that image for that celltype-celltype pair. Note that if a given 
              celltype-celltype pair never passes this threshold for any of the images for a given condition, then it will be ignored for that condition. Further, 
              if only one condition (or no conditions) has images that pass the threshold, then an ANOVA for that celltype-celltype pair is impossible and will be 
              skipped. However, even then, the Ripley's statistics that were successfully calculated for that single condition can still be plotted.

          permutations (integer): 
              If greater than zero, then a permutation correction will be applied to the data. This is done by randomizing the celltype labels in an image
              and calculating the average Ripley's K for those randomizations. The average random K for the celltype is then subtracted from the calculated Ripley's K 
              for that celltype in that image. This corrected K is then used to calculate Ripley's L / g as normal.
              Permutation correction can slow the calculation substantially, but is almost always recommended as it uses the actual structure of the cells in the 
              images to correct the values of the Ripley's statistics. This is a powerful and simple way to correct for holes / inhomogeneities in the tissue.

          seed (integer): 
              This is the random seed used for the SpaceANOVA methods. This includes the random permutations for the permutation correction, but
              also the seeds used for plotting error regions and fANOVA. The seeds for plotting & fANOVA can be set separately when calling those functions
              but by default whatever you use here for the seed will be used for those steps as well. 

          center_on_zero (boolean): 
              ONLY with permutation correction (permutations > 0). This determines whether to 'center' Ripley's g on 0 or on 1.
              Ripley's g is unique among the Ripley's statistics in that it is particularly easy to interpret, as its theoretical value in a 
              random point pattern is equal to a straight line at 1, with values above 1 indicating more association between points than expected,
              and values less than 1 indicating fewer interactions than expected. However, the permutation correction shifts this centerpoint to 0 when 
              the permutation K is subtracted from the calculated K. Additionally, when this subtraction is done the shape of K / L will deviate strongly
              from the theoretical shape of those statistics' curves.
              So: 
              If this parameter is True, then this shift is allowed to occur, and g will need to be interpreted as centered on 0.

                  >>> Permutation correction is: K = K_data - K_permutation

              If this parameter is False, then after subtracting the permutation K, the theoretical K { pi*(r^2) } is added back, which
              shifts the center of g to 1 without changing its shape. This change also restores the shape of the K / L statistics to better match  
              their more usual, monotonically increasing shape

                  >>> Permutation correction is: K = K_data + (K_theoretical - K_permutation)

          silence_zero_warnings (boolean):
              this method generates a large number of zero division errors, even in a normal run. By default these are silenced. 

          suppress_threshold_warnings (boolean):
              If True, will not print warnings about images failing to meet cell number thresholds
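
      The radii selection and the two forms of the permutation correction described above can be checked with a short sketch. It follows the 
      formulas given in the max / min / step and center_on_zero descriptions; the helper ``corrected_K`` and the example K values are hypothetical 
      and this is not the library's implementation.

      ```python
      import math

      def corrected_K(k_data, k_permutation, r, center_on_zero=False):
          """Permutation-corrected Ripley's K at radius r, per the formulas above."""
          if center_on_zero:
              return k_data - k_permutation
          k_theoretical = math.pi * r ** 2       # theoretical K of a random point pattern
          return k_data + (k_theoretical - k_permutation)

      # Radii evaluated with the default min=10, max=100, step=1 (max is excluded)
      radius_values = list(range(10, 100, 1))

      # Made-up K values at r = 20 where the data exactly match the permutation average
      k_zero = corrected_K(1300.0, 1300.0, r=20, center_on_zero=True)
      k_one = corrected_K(1300.0, 1300.0, r=20, center_on_zero=False)
      ```

      When the observed and permuted K agree, the zero-centered correction gives 0, while the default correction gives the theoretical value 
      pi*(r^2), which is what keeps the derived g statistic centered on 1.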


   .. py:method:: plot_spaceANOVA_function(stat: str, comparison: str = None, seed: Union[int, None] = None, f_stat: Union[str, None] = None, hline: Union[int, None] = None, output_directory: Union[str, None] = None)

      This function plots a selected Ripley's statistic for a celltype-celltype pair, and optionally also the single-radius f-values from ANOVA tests
      conducted at each point along the Ripley stat graph.

      Args:
          stat (string): 
              either "K", "L", or "g". Determines which Ripley's statistic to plot

          comparison (string or None): 
              a string with the form {celltype1}___{celltype2}. The triple underscore in the middle is how this string is split
              into the two cell types of interest (don't have a triple underscore inside your cell type labels!). A full list of the available comparisons of 
              this form can also be easily accessed with the self.SpaceANOVA._all_comparison_list attribute

          seed (integer or None): 
              the random seed for the fANOVA function. If None, then the seed previously selected in self.do_SpaceANOVA_ripleys_stats will be used.

          f_stat (string or None): 
              if not None, should be "f" (typical), "padj", or "p". If not None, adds a panel to the final plot showing the results of
              (standard, not functional) ANOVA tests comparing conditions at each individual radius. This is useful for visualizing at what distance the 
              difference between conditions is most significant. 

          hline (int or None): 
              if not None, draws a horizontal line on the Ripley's statistics plot at the value (usually ONLY when plotting the 'g' statistic, and 
              set to 0 or 1, depending on where the graph is centered)

          output_directory (string or None): 
              If None, the plots are exported to the automatic / standard directory in the PalmettoBUG project. If not None, should be the path to a folder where
              the plots can be exported. (ONLY used if comparison is None)
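
      The comparison string format can be illustrated with a small sketch that splits on the triple underscore. The helper ``split_comparison`` 
      and the cell type names are hypothetical, and raising an error on malformed input is an assumption of this sketch rather than documented 
      PalmettoBUG behaviour.

      ```python
      def split_comparison(comparison):
          """Split '{celltype1}___{celltype2}' into its two cell type labels."""
          parts = comparison.split("___")
          if len(parts) != 2:
              # A triple underscore inside a cell type label makes the split ambiguous
              raise ValueError("expected exactly one triple underscore in the comparison string")
          return parts[0], parts[1]

      pair = split_comparison("CD8 T cell___Tumor")
      ```

      This also shows why cell type labels containing a triple underscore are a problem: the string can no longer be split into exactly two labels.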


   .. py:method:: run_SpaceANOVA_statistics(stat: str = 'g', seed: Union[int, None] = None)

      This runs the functional ANOVA on the available Ripley's statistics, returning 3 data tables: the p-values, the adjusted p-values, and the fANOVA statistic


   .. py:method:: plot_spaceANOVA_heatmap(stat: str, filename: Union[None, str] = None) -> matplotlib.pyplot.figure

      Plots a heatmap from one of the dataframes returned / created by self.run_SpaceANOVA_statistics. If plotting a (adjusted) p-value, as is typical, the 
      statistic is transformed by the negative log first so that higher numbers indicate higher significance.

      stat = 'p', 'padj', or 'f'
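
      The effect of the negative log transform on p-values can be seen in the sketch below. The base of the logarithm is not stated above, so base 10 
      is an assumption here, and both the helper ``neg_log10`` and its ``floor`` guard against log(0) are hypothetical.

      ```python
      import math

      def neg_log10(p_values, floor=1e-300):
          """Return -log10(p) for each p-value, guarding against p == 0."""
          return [-math.log10(max(p, floor)) for p in p_values]

      scores = neg_log10([0.05, 0.001, 1.0])   # made-up p-values
      ```

      Smaller p-values map to larger scores, so the most significant celltype-celltype pairs receive the highest values on the heatmap.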


   .. py:method:: do_edt(pixel_classifier_folder: str, masks_folder: str, maps: str = '/classification_maps', smoothing: int = 10, stat: str = 'mean', normalized: bool = True, background: bool = False, marker_class: Union[str, None] = 'spatial_edt', auto_panel: bool = True, output_edt_folder: Union[None, str, pathlib.Path] = None, save_path: Union[None, str, pathlib.Path] = None)

      Calculates the euclidean distance between cell masks (provided in masks_folder) and matching pixel classifications (pixel_classifier_folder).
      This appends the calculated edt for each cell to self.exp.data, as a new 'channel' / 'antigen', where it can then be used for plotting & calculations.

      Args:
          pixel_classifier_folder (str): 
              the path to a PalmettoBUG-generated pixel classifier folder. This folder needs to contain a subfolder of .tiffs
              containing the pixel class predictions (see maps argument), and contain a biological_labels.csv indicating what the biological
              names of each class in the classifier are.
              When a pixel classifier with > 1 predicted pixel class is used, an edt statistic will be calculated separately for each and added to
              self.exp.data

          masks_folder (str): 
              the path to a folder containing a set of .tiff files of cell segmentation masks. These .tiffs should match those in 
              f'{pixel_classifier_folder}/{maps}'.

          maps (str): 
              should be either "/classification_maps" or "/merged_classification_maps". This determines which subfolder of pixel_classifier_folder
              contains the .tiff files of the pixel classification maps to use for calculating the distances. The filenames of these .tiffs
              should match those in masks_folder

          smoothing (int): 
              If == 0, no smoothing is performed. Otherwise this indicates the size of isolated pixel class regions to smooth out before
              calculating distances. As in, if smoothing == 10 (default) regions of a pixel class smaller than 10 pixels will be dropped and "smoothed"
              into the surrounding pixel classes using mode-based fill-in (the mode of the remaining, closest neighbor pixels will be used to assign the replacement 
              value for dropped pixels). 
              Why smooth? When calculating the distance from a pixel class using the Euclidean Distance Transform (EDT), very small pixel regions can have an outsized
              impact on the final EDT map, so removing spurious / small regions can help clean up the final calculation.

          stat (str): 
              One of "mean", "median", or "minimum". This determines what statistic is read off of each segmentation region when calculating the edt value
              for each cell. The default, "mean", is the most common use, and represents the average distance from the pixel class across the whole cell's spatial
              footprint. "min" means that the calculated value for each cell will just be its minimum distance to the class of interest (its closest point).
              Of note, when "min" is the selected statistic, normalization cannot be performed (see the following argument).

          normalized (bool): 
              Whether to normalize (True) or not (False). In this case, normalization means dividing each cell's determined edt value by the average of the 
              image that cell is in. As in, if "median" statistic is selected with normalized = True, each cell's edt value will be:

                  cell_stat = median(cell_edt) / median(image_edt)

              instead of the non-normalized value:

                  cell_stat = median(cell_edt)

              When "mean" is used as a statistic, then the normalization factor is the mean of the image's edt values. Normalization cannot be calculated this way
              when the statistic is "min", and so is ignored (there is no normalization for "min").

              Normalization is useful as it is a way to help correct for the abundance of the pixel class. As in, if one image is 70% within a pixel class (let's say
              the pixel class is for fibrotic regions) its edt distances will be much lower across the image than an image where only 30% of the image is the fibrotic class. 
              This would make all cells - regardless of cell type - in image 1 have much lower edt values than the cells in image 2, which would not be an inaccurate 
              conclusion (the cells in image 1 are genuinely closer to fibrotic regions), but introduces a confounding factor: have the cells moved towards the fibrosis, or 
              have the fibrotic regions expanded?
              Normalization helps address this, in part, as it takes into account the total quantity & positioning of the pixel class in each image when calculating
              the edt value for each cell.

          background (bool): 
              whether to include the 'background' class in the edt calculations (True) or not (False, default). Usually, the background class is ignored,
              because it is not biologically relevant, but in some situations there is effectively no background class / even the background is biologically meaningful.
              For example, if a classifier is trained to identify broad tissue regions in a sample (such as intestinal crypt lumen / epithelia / lamina propria in a 
              colon section), it might be the case that every part of the image falls into one of the pixel classes and there is no 'background'. 
              Because of how supervised classifiers are trained in PalmettoBUG, a background class is always created, so this option is useful if that 'background'
              is actually a relevant grouping.

          marker_class (str or None): 
              what marker_class (self.exp.data.var['antigen']) to assign the edt columns when they are added to self.exp.data.X. By default,
              this is "spatial_edt", as it helps the subsequent plotting functions easily find the edt channels while ignoring the other marker_classes.
              However, setting this to "type" / "state" / "none" is also allowed, if you want to perform plots / calculations that combine both the spatial edt
              data and other channels. 

          auto_panel (bool): 
              Whether to automatically add the channels to self.exp.data (True, default), or only to return a panel dataframe (for manually editing
              marker_class, say if you want to assign different edt statistics from a single classifier to different marker_class-es). If False, you will 
              need to manually add the edt data to self.exp.data with self.edt.append_distance_transform(distances_panel = {your edited marker_class panel}).

          output_edt_folder (None, string, or Path): 
              Default is None (no edt map export). If not None, then should be the path where folders of .tiff files can be
              exported. Specifically, the folders used will be the f'{output_edt_folder}_{class_biological_label}' for each class in the classifier.
              The saved .tiff files will be the intermediate Euclidean distances transforms for that class, from which the stats for each cell mask were calculated.

          save_path (None, string, or Path): 
              Default is None (edt values are not saved). If not None, then should be a file path where a csv file can be written.
              This csv will contain the information for all the edt's calculated for the provided pixel classifier / masks pairing. 
              Note that marker_class information will not be saved (that will need to be set again on re-load of the saved edt values)

      Returns:
          a pandas dataframe (panel) which can be used to see or set the marker_class information

          a pandas dataframe (self.edt.results) which contains the edt calculations for every cell and pixel class
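
      The stat and normalized options described above can be sketched with the standard library alone. This is only an
      illustration, not PalmettoBUG's implementation: distance_map and cell_stat are hypothetical helper names, the distance map is
      computed by brute force rather than with an optimized transform, and the sketch assumes that pixels inside the class have a
      distance of 0.

      .. code-block:: python

         import math
         import statistics

         def distance_map(class_map, target):
             """Brute-force Euclidean distance from every pixel to the nearest pixel of class `target`."""
             targets = [(r, c) for r, row in enumerate(class_map)
                        for c, value in enumerate(row) if value == target]
             return [[min(math.hypot(r - tr, c - tc) for tr, tc in targets)
                      for c in range(len(row))]
                     for r, row in enumerate(class_map)]

         def cell_stat(edt, mask, cell_id, stat="mean", normalized=False):
             """Summarize the distance map over one cell's mask, optionally divided by the image-wide statistic."""
             reducer = {"mean": statistics.mean, "median": statistics.median, "min": min}[stat]
             cell_values = [edt[r][c] for r, row in enumerate(mask)
                            for c, label in enumerate(row) if label == cell_id]
             value = reducer(cell_values)
             # Normalization is skipped for "min", mirroring the behaviour described for the normalized argument.
             if normalized and stat != "min":
                 value /= reducer([v for row in edt for v in row])
             return value

         # Toy data: class 1 occupies the left column; cells 1 and 2 are two-pixel masks.
         class_map = [[1, 0, 0, 0],
                      [1, 0, 0, 0]]
         mask = [[0, 1, 0, 2],
                 [0, 1, 0, 2]]
         edt = distance_map(class_map, target=1)
         raw = cell_stat(edt, mask, cell_id=2)
         scaled = cell_stat(edt, mask, cell_id=2, normalized=True)

      In this toy image, cell 2 has a mean distance of 3.0 pixels, which becomes 2.0 after dividing by the image-wide mean of 1.5.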


   .. py:method:: do_reload_edt(dataframe: pandas.DataFrame, marker_class: str) -> None

      Loads a column of data into the anndata of the experiment (meant for saved edt information, but could be used for any type of channel)

      Args:
          dataframe (pandas DataFrame, or Path/string):
              If not a pandas DataFrame, will attempt to pandas.read_csv(dataframe) first. This dataframe should have one row per cell in the data
              and have columns representing spatial_edt data for each cell. Its format should match the format of the table exported by do_edt.


   .. py:method:: plot_edt_heatmap(groupby_col: str, marker_class: str = 'spatial_edt', filename: Union[None, str] = None) -> matplotlib.pyplot.figure

      Plots a heatmap for spatial edt -- default marker_class is spatial_edt, and export folder (if filename is provided) is in /Spatial_plots

      groupby_col specifies a clustering (such as 'merging', 'metaclustering', etc.) to group the heatmap by


   .. py:method:: plot_edt_boxplot(var_column: str, groupby_col: str = 'merging', facet_col: str = 'condition', col_num: int = 3, filename: str = '') -> matplotlib.pyplot.figure

      Plots a channel on a horizontal boxplot. Could be used for non-spatial_edt data, but export folder (if a filename is provided)
      is in /Spatial_plots.

      Args:
          var_column (str): 
              the channel to use for the plots. Usually a spatial edt channel (like 'distance to Vimentin'), but could be
              any of the channels in self.exp.data.var['antigen']

          groupby_col (str): 
              the column in self.exp.data.obs that will be used to group the box plot (one box per unique value in this column, per facet)

          facet_col (str): 
              the column in self.exp.data.obs that will be used to split the data into separate boxplots (facets). Usually (and by default) faceted on 
              the condition column, for comparison of treatment vs. control

          col_num (int): 
              the number of columns of facets before they begin to wrap. As in, with the default of col_num = 3, then the fourth facet
              will be on the second row of the facet grid.

          filename (str): 
              the name under which to save a .png file of the plot in /Spatial_plots. If not provided (default value), the plot will not be 
              written to the disk.

      Returns:
          matplotlib.figure (the boxplot)
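
      The wrapping rule described for col_num can be written as a one-line calculation. This is a sketch of the usual facet-grid
      arithmetic, not code taken from PalmettoBUG; facet_position is a hypothetical helper and uses zero-based indices.

      .. code-block:: python

         def facet_position(facet_index, col_num=3):
             """Return the zero-based (row, column) of a facet in a grid that wraps after col_num columns."""
             return divmod(facet_index, col_num)

         # With the default col_num = 3, the fourth facet (index 3) starts the second row (row index 1).
         fourth = facet_position(3)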


   .. py:method:: run_edt_statistics(groupby_column: str, marker_class: str = 'spatial_edt', N_column: str = 'sample_id', statistic: str = 'mean', test: str = 'anova', filename: Union[str, None] = None) -> pandas.DataFrame

      A wrapper on do_state_exprs from the palmettobug.Analysis class, but with the default marker_class of 'spatial_edt', and an output folder
      in /Spatial_plots.

      Args:
          groupby_column (string): 
              a clustering of the data (such as 'leiden', 'merging', etc.)

          marker_class (string):
              'spatial_edt' (default), 'type','state','none','All' -- if specifying any marker_class except 'spatial_edt', then there is little reason to use
              this function, as you could just use palmettobug.Analysis.do_state_exprs

          statistic (string):
              'mean' or 'median' -- which aggregation method to use when calculating the average value in each ROI / sample_id

          test (string):
              'anova' or 'kruskal' -- whether to use an ANOVA or a Kruskal-Wallis test to do the stats

          filename (string or None): 
              If not None, specifies the filename to save the statistics table under (as a CSV) in the /Spatial_plots folder.

      Returns:
          a pandas dataframe, containing the statistics
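
      The aggregation step controlled by the statistic argument can be sketched as follows. This is an assumption-laden illustration
      rather than the library's code: aggregate_per_sample is a hypothetical helper, the rows are invented toy values, and the
      statistical test itself (ANOVA or Kruskal-Wallis) is not shown.

      .. code-block:: python

         import statistics
         from collections import defaultdict

         def aggregate_per_sample(rows, statistic="mean"):
             """Collapse per-cell values to one value per (sample_id, cluster) pair."""
             reducer = {"mean": statistics.mean, "median": statistics.median}[statistic]
             grouped = defaultdict(list)
             for sample_id, cluster, value in rows:
                 grouped[(sample_id, cluster)].append(value)
             return {key: reducer(values) for key, values in grouped.items()}

         # Toy per-cell rows: (sample_id, cluster, edt value)
         rows = [
             ("roi_1", "fibroblast", 2.0),
             ("roi_1", "fibroblast", 4.0),
             ("roi_1", "T cell", 10.0),
             ("roi_2", "fibroblast", 6.0),
             ("roi_2", "fibroblast", 6.0),
             ("roi_2", "fibroblast", 12.0),
         ]
         per_sample_mean = aggregate_per_sample(rows)
         per_sample_median = aggregate_per_sample(rows, statistic="median")

      Aggregating first means that each ROI / sample_id contributes one value per cluster to the test, instead of one value per cell.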


.. py:class:: TableLaunch(dataframe: Union[str, pathlib.Path, pandas.DataFrame], export_path: Union[str, pathlib.Path], table_type: str = 'other', labels_editable: bool = True, width: int = 1, height: int = 1)

   Bases: :py:obj:`customtkinter.CTk`


   This class launches a customtkinter GUI window, in order to display the contents of a table. 
   It can make setting up the panel/metadata/Analysis_panel files easier / faster, instead of manually setting values in python or 
   requiring the user to go to a second program (like excel) to enter values.

   Args:
           dataframe (Path, string, or pandas dataframe): 
               either the dataframe, or the path to a .csv file, which will be displayed in the window

           export_path (Path or string): 
               the path to where you want to write the dataframe when the "Accept and Return" Button is pressed & the window closes. Remember to include the .csv file extension!

           table_type (str): 
               The type of the table being passed in. Helps determine the format of the display, one of ['panel','metadata','Analysis_panel','Regionprops_panel','other']. 
               'other' is the default, and will display all entries in the dataframe as either uneditable labels or as entry fields that 
               allow any kind of text editing depending on whether labels_editable is False or True. 

               The specific table types, like 'metadata', etc. expect a particular format for the dataframe. For example, selecting a 
               table_type of 'panel' means 4 data columns will be generated: the first two are label/entry columns, the third is a column of 
               drop-down option widgets with 0 / 1 as the choices, and the last is a column of drop-downs with the options 
               "Nuclei (1)" and "Cytoplasmic / Membrane (2)". This is to replicate the expectation of the panel file, but theoretically you could 
               use 'other' for all table types and simply use the entry fields to have full control of the final table. 
               Note that if you use 'other' instead of 'panel' in the example above, you should enter numbers for nuclei (1) and cytoplasmic (2) 
               segmentation channels, NOT the strings "Nuclei (1)" or "Cytoplasmic / Membrane (2)", as the table_type == "panel" also auto-converts
               between the human-friendly strings and the computer friendly integers (1 or 2). 

           labels_editable (bool): 
               whether fields are displayed as CTklabels (uneditable, if False) or CTkEntry (editable, if True) when those fields are otherwise specified by the table_type. 

           width / height (both ints): 
               determine the shape & size of the window/table when launched. This operates by an estimation of the needed table height/width to accommodate the widgets
               multiplied by height/width. (Default = 1 for both means the initial estimate is also the final shape)

       Input / Output:
           Input: 
               if dataframe is a file path, not a pandas dataframe, will attempt to read a csv at that filepath
           Output: 
               when the accept button is pressed, will attempt to write the dataframe set up in the GUI to export_path as a csv file.
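
   The conversion between the human-friendly drop-down strings and the integers stored in a panel file, mentioned under table_type,
   amounts to a two-way lookup. The sketch below is a guess at the idea, not PalmettoBUG's code: the dictionaries and the
   to_code / to_label helpers are hypothetical, and only the two labels quoted above are covered.

   .. code-block:: python

      # Hypothetical lookup tables; the strings are the drop-down labels quoted in the table_type description.
      LABEL_TO_CODE = {"Nuclei (1)": 1, "Cytoplasmic / Membrane (2)": 2}
      CODE_TO_LABEL = {code: label for label, code in LABEL_TO_CODE.items()}

      def to_code(label):
          """Convert a drop-down label to the integer written to the panel file."""
          return LABEL_TO_CODE[label]

      def to_label(code):
          """Convert a stored integer (or numeric string) back to its drop-down label."""
          return CODE_TO_LABEL[int(code)]

   With table_type = 'other' no such conversion happens, which is why the integers must be typed directly in that case.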


   .. py:attribute:: tablewidget


   .. py:attribute:: column
      :value: 1


   .. py:attribute:: table_list


   .. py:attribute:: accept_button


   .. py:method:: accept_and_return() -> None


.. py:function:: run_napari(image: numpy.ndarray[Union[int, float]], masks: Union[None, numpy.ndarray[int]] = None, channel_axis: Union[None, int] = None) -> None

   This function launches napari with the provided image & mask. The mask is shown as a layer over the image.

   Args:
       image (numpy array): 
           the numpy array to display in napari (3D -- X, Y, and channels / Z)

       masks (numpy array (of integers) or None): 
           if None, only the image is displayed. If not None, then this is added as a label layer in Napari.
           Should have the same X,Y dimensions as the image, but only one channel layer

       channel_axis (None, or integer): 
           passed to the napari.imshow() function, determines which dimension of the image numpy array will be treated as the channels.
           By default, Napari treats the channel axis as a spatial dimension (it expects 3D imaging), which is fine as I think this allows easier scrolling through the 
           channels. However, if you prefer Napari to open separate layers for each channel, then specify the channel axis with this parameter and Napari should
           do that instead of treating the channels like an extra spatial dimension. 
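
   The effect of channel_axis on what Napari displays can be illustrated with shape arithmetic alone. This sketch does not call
   napari; layers_after_split is a hypothetical helper, and the (40, 512, 512) channels-first shape is an invented example.

   .. code-block:: python

      def layers_after_split(shape, channel_axis=None):
          """Return (number of image layers, shape of each layer) for a given channel_axis."""
          if channel_axis is None:
              # No split: one layer, with the channel dimension kept as a scrollable axis.
              return 1, tuple(shape)
          axis = channel_axis % len(shape)  # allow negative axes such as -1
          layer_shape = tuple(size for i, size in enumerate(shape) if i != axis)
          return shape[axis], layer_shape

      stack_shape = (40, 512, 512)  # hypothetical: 40 channels of 512 x 512 pixels
      single = layers_after_split(stack_shape)
      split = layers_after_split(stack_shape, channel_axis=0)

   With channel_axis = None the viewer holds a single three-dimensional layer, while channel_axis = 0 yields 40 separate
   two-dimensional layers.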


.. py:function:: print_license()

.. py:function:: print_3rd_party_license_info()