palmettobug.Pixel_Classification.Classifiers
============================================

.. py:module:: palmettobug.Pixel_Classification.Classifiers

.. autoapi-nested-parse::

   This module handles the back-end for pixel classifier creation, training, and prediction (as well as segmentation from a pixel classifier).

   The functions/classes here are also part of the public (non-GUI) API of PalmettoBUG.
   The SupervisedClassifier and UnsupervisedClassifier classes handle most of the creation, training, and prediction of pixel classifiers.

       >> SupervisedClassifier mimics QuPath's ANN_MLP pixel classifier, & requires training from user-generated labels in Napari

       >> UnsupervisedClassifier is for an ark-analysis/Pixie-like unsupervised classifier based on FlowSOM clustering.


Classes
-------

.. autoapisummary::

   palmettobug.Pixel_Classification.Classifiers.SupervisedClassifier
   palmettobug.Pixel_Classification.Classifiers.UnsupervisedClassifier


Functions
---------

.. autoapisummary::

   palmettobug.Pixel_Classification.Classifiers.plot_class_centers
   palmettobug.Pixel_Classification.Classifiers.plot_pixel_heatmap
   palmettobug.Pixel_Classification.Classifiers.segment_class_map_folder


Module Contents
---------------

.. py:class:: SupervisedClassifier(homedir: Union[pathlib.Path, str])

   This class handles the supervised pixel classifier creation, training, and prediction. It is mainly set up by the 
   setup_classifier method, not by the __init__ call.

   Args:
       homedir (str or Path):
           the path to the directory where the Pixel_Classification folder and its subfolders will be placed 
           (this is the main directory for PalmettoBUG)

   Key Attributes:
       classifier_path (str): 
           the full file path to the .json file containing the OpenCV ANN_MLP classifier

       classifier_dir (str): 
           the path to the folder where the classifier will be setup (== {self._homedir}/Pixel_Classification/{self.classifier_name}/ )

       classifier_training_labels (str): 
           The path to the folder where the classifier training labels are (by default) expected to be written to / read from

       output_directory (str): 
           the path to the folder where the classifier predictions will be exported by default.

       classifier_name (str): 
           The name of the classifier, used for the folder name where the classifier is set up, and to help name the .json files containing
           the trained classifier & its details. Derived from the main, opencv2 .json file name & includes its .json file extension. 

       algorithm (cv2.ml.ANN_MLP): 
           the opencv2 ANN_MLP classifier instance

       details_dict (dictionary): 
           the dictionary containing details of the classifier that are not available inside the opencv2 .json file.
           This dictionary is saved to a .json file parallel to the opencv2 formatted .json, with "_details" appended to its filename.
           It records information such as the channels, sigmas, & features selected. 

   Formerly PxQuPy (Pixel QuPath Python); residuals of that naming will likely remain in class-internal / GUI-internal namespaces.


   .. py:attribute:: _homedir
      :value: ''


   .. py:attribute:: classifier_path
      :value: None


   .. py:attribute:: algorithm
      :value: None


   .. py:attribute:: details_dict


   .. py:attribute:: _image_name
      :value: ''


   .. py:method:: _setup_directory() -> None

      Helper for __init__: checks for and sets up the Pixel_Classification folder that contains the individual classifier subfolders.


   .. py:method:: _setup_classifier_directory(classifier_name: Union[str, None] = None, classifier_path: Union[str, None] = None) -> None

      Sets up the classifier folder (subfolder of /Pixel_Classification) for an individual supervised classifier.

      If classifier_path is provided, this method attempts to load the .json file at classifier_path and derives the classifier name 
      from the path. The loaded classifier should then be usable for prediction immediately. classifier_path TAKES PRECEDENCE over classifier_name.

      Alternatively, if classifier_name is provided and classifier_path is None, this creates a new, empty classifier folder.

      In practice, this is mostly a helper for the setup_classifier method.


   .. py:method:: setup_classifier(classifier_name: str, number_of_classes: int, sigma_list: list[float], features_list: list[str], channel_dictionary: dict[str:int], classes_dictionary: dict[int:str] = {}, image_directory: str = '', categorical: bool = True, internal_architecture: list[int] = [], epsilon: float = 0.01, iterations: int = 1000) -> tuple[cv2.ml.ANN_MLP, dict]

      This method takes in a variety of user inputs, and creates the initial pixel classifier directory and .json files, ready for training.

      Args:
          classifier_name (str):  
              the name of the classifier

          number_of_classes (int): 
              the number of classes being predicted by the classifier 

          sigma_list (list of numeric): 
              list of the numeric values of the sigmas to be used in the creation of features for the classifier. 
                  
              Example: [1.0, 2.0, 4.0]

          features_list (list of strings): 
              list of the features to be generated from the image and to be fed into the classifier. 
              Possible features = ["GAUSSIAN", "LAPLACIAN", "WEIGHTED_STD_DEV", "GRADIENT_MAGNITUDE", 
              "STRUCTURE_TENSOR_EIGENVALUE_MAX", "STRUCTURE_TENSOR_EIGENVALUE_MIN", "STRUCTURE_TENSOR_COHERENCE", 
              "HESSIAN_DETERMINANT", "HESSIAN_EIGENVALUE_MAX",  "HESSIAN_EIGENVALUE_MIN"]

          channel_dictionary (dict): 
              a dictionary with keys of the channels' common names (str) and values of the channels' location in the image (int). 

              Example: {'channel_1_name':1, 'channel_10_name':10, 'channel_3_name':3, ...}

              Use this to specify the channels you want used, and to record in the classifier's .json file what antigen each 
              channel represents
          
          classes_dictionary (dict): 
              a dictionary with keys (int) that correspond to the integer labels in the label / prediction files
              (i.e., the label numbers used in napari for a given class). The dictionary values (str) are the 
              descriptions the user wants for each class. 

              Example: {1:"Astrocyte", 2:"Neuron", 3:"Background", ...}

              These are used to retrieve the biologically important information after the classification is complete. 
              It is not needed for the classification steps themselves. Currently has a default value of {}, but that may change 
              in the future to enforce a choice of labels on the part of the user.

          image_directory (str, optional): 
              The path to the folder containing the images you plan to train / predict pixel classes with.

              NOTE: this is optional / for the user's benefit & for reproducibility, as it is saved in the .json file for 
              retrieval of what images were used, but it is NOT ENFORCED. 
              As in, you can train / predict on totally different images (although that is likely a bad idea unless you are keeping thorough records!)

          categorical (bool): 
              if True, then the classifier is set to return only the category output. If False, then it will 
              return probabilities for each category instead of the final decision

          internal_architecture (list of integers, or an empty list): 
              the sizes of any internal (hidden) neuron layers you wish to add to the ANN_MLP. An empty list adds none.

          epsilon (float): 
              a learning rate parameter of the ANN_MLP training

          iterations (integer): 
              the number of iterations during ANN_MLP training. 
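
      As a rough illustration of how the arguments above could fit together, the sketch below derives ANN_MLP layer sizes from them.
      It assumes one input value per (channel, sigma, feature) combination and one output neuron per class; both are assumptions,
      not confirmed PalmettoBUG behavior, and ``ann_layer_sizes`` / ``VALID_FEATURES`` are hypothetical names. In OpenCV, a list of
      this kind corresponds to the layer description passed to ``setLayerSizes`` on a ``cv2.ml.ANN_MLP``.

      .. code-block:: python

         # Feature names as listed in the features_list description above.
         VALID_FEATURES = {
             "GAUSSIAN", "LAPLACIAN", "WEIGHTED_STD_DEV", "GRADIENT_MAGNITUDE",
             "STRUCTURE_TENSOR_EIGENVALUE_MAX", "STRUCTURE_TENSOR_EIGENVALUE_MIN",
             "STRUCTURE_TENSOR_COHERENCE", "HESSIAN_DETERMINANT",
             "HESSIAN_EIGENVALUE_MAX", "HESSIAN_EIGENVALUE_MIN",
         }


         def ann_layer_sizes(channel_dictionary: dict[str, int],
                             sigma_list: list[float],
                             features_list: list[str],
                             number_of_classes: int,
                             internal_architecture: list[int]) -> list[int]:
             """Hypothetical helper: [inputs, *hidden layers, outputs] for an ANN_MLP."""
             unknown = set(features_list) - VALID_FEATURES
             if unknown:
                 raise ValueError(f"Unsupported features: {sorted(unknown)}")
             # Assumption: every feature is computed at every sigma for every channel.
             n_inputs = len(channel_dictionary) * len(sigma_list) * len(features_list)
             return [n_inputs, *internal_architecture, number_of_classes]


         sizes = ann_layer_sizes({"CD45": 1, "DAPI": 10},
                                 [1.0, 2.0, 4.0],
                                 ["GAUSSIAN", "LAPLACIAN"],
                                 number_of_classes=3,
                                 internal_architecture=[16])
         print(sizes)  # [12, 16, 3]

      Here 2 channels * 3 sigmas * 2 features gives 12 inputs, followed by one hidden layer of 16 neurons and 3 outputs.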

      Returns:
          cv2.ml.ANN_MLP object: 
              an OpenCV ANN_MLP instance that is ready to train on data in the shape described by the sigma/features/channels/classes information

          dictionary: 
              the information that is saved in classifier_X_details.json, holding the key details for properly deriving the image features, etc. for training and prediction

      Inputs / Outputs: 
          Outputs: 
              saves the classifier as classifier_X.json, which can be easily imported by an OpenCV ANN_MLP object
              (this save is done within the _initialize_classifier_dict_and_ANN_MLP() method)

              saves classifier_X_details.json with the following information: channel_dictionary, sigma list, features list.
              These details are saved separately, as they would interfere with directly importing the classifier into an OpenCV ANN_MLP.
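
      The classes_dictionary is only used after classification, to attach biological meaning to the integer labels. A minimal
      sketch of that lookup follows, assuming the prediction is a NumPy integer class map; ``count_pixels_per_class`` is a
      hypothetical helper, not part of PalmettoBUG.

      .. code-block:: python

         import numpy as np


         def count_pixels_per_class(class_map: np.ndarray,
                                    classes_dictionary: dict[int, str]) -> dict[str, int]:
             """Hypothetical helper: count pixels per class, keyed by the class description."""
             labels, counts = np.unique(class_map, return_counts=True)
             result = {}
             for label, count in zip(labels.tolist(), counts.tolist()):
                 # Labels missing from the dictionary are kept under a generic name.
                 name = classes_dictionary.get(label, f"unlabeled_{label}")
                 result[name] = count
             return result


         class_map = np.array([[1, 1, 3],
                               [2, 3, 3]])
         result = count_pixels_per_class(class_map, {1: "Astrocyte", 2: "Neuron", 3: "Background"})
         print(result)  # {'Astrocyte': 2, 'Neuron': 1, 'Background': 3}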


   .. py:method:: _write_biolables_csv() -> None

      This helper method writes the biolabel.csv file using the details dictionary.


   .. py:method:: _initialize_classifier_dict_and_ANN_MLP(number_of_channels: int, classifier_name: str = 'classifier_test.json', number_of_classes: int = 2, internal_architecture: list[int] = [], iterations: int = 1000, epsilon: float = 0.01) -> cv2.ml.ANN_MLP

      This helper method writes the dictionary of the main .json file, which contains information like the classifier neuron weights. 
      See the setup_classifier method for more details on the arguments.


   .. py:method:: load_saved_classifier(classifier_json_path: Union[pathlib.Path, str]) -> None

      This is an alternate way to use this class: instead of creating a new classifier, load a previously saved one.

      It loads a saved pixel classifier using its main classifier.json and classifier_details.json files.
      The classifier_json_path is the full path, including filename + file extension, to the classifier.json file. 
      The classifier_details.json is expected to be found in the SAME FOLDER as it. 
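
      The expected file pairing can be sketched with the standard library. The hypothetical ``load_classifier_pair`` below only
      reads the two .json files; it assumes the details file is named by appending "_details" to the classifier file's stem
      (as in classifier_X.json / classifier_X_details.json) and does not build an OpenCV model.

      .. code-block:: python

         import json
         import tempfile
         from pathlib import Path


         def load_classifier_pair(classifier_json_path) -> tuple[dict, dict]:
             """Hypothetical helper: read classifier_X.json and its sibling classifier_X_details.json."""
             main_path = Path(classifier_json_path)
             # The details file must sit in the SAME folder, with "_details" appended to the stem.
             details_path = main_path.with_name(f"{main_path.stem}_details.json")
             if not details_path.exists():
                 raise FileNotFoundError(f"No details file found next to the classifier: {details_path}")
             with open(main_path) as f:
                 main = json.load(f)
             with open(details_path) as f:
                 details = json.load(f)
             return main, details


         # Usage with stand-in files (real files hold the OpenCV model and the classifier details).
         with tempfile.TemporaryDirectory() as tmp:
             folder = Path(tmp)
             (folder / "classifier_X.json").write_text(json.dumps({"stand_in": True}))
             (folder / "classifier_X_details.json").write_text(json.dumps({"sigma_list": [1.0, 2.0]}))
             main, details = load_classifier_pair(folder / "classifier_X.json")

         print(details["sigma_list"])  # [1.0, 2.0]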


   .. py:method:: launch_Napari_px(image_path: Union[pathlib.Path, str], display_all_channels: bool = False) -> None

      This launches napari for generating training labels, receiving a path (image_path) to the image file you want to make labels for.


   .. py:method:: write_from_Napari(output_folder: Union[str, None] = None) -> None

      This saves the training labels to the training labels folder. Will only run if labels have previously been made 
      & this method has not been run already (as this method clears the labels after saving them to the disk)

      Args:
          output_folder (str, Path, or None): What folder to write the training labels to (must exist). If None, will use the default location
          for a Pixel Classifier, same as used in the GUI. 


   .. py:method:: train_folder(image_folder: Union[pathlib.Path, str], labels_dir: Union[str, pathlib.Path, None] = None) -> cv2.ml.ANN_MLP

      This method trains the ANN_MLP classifier using the training labels in the classifier's directory & the corresponding images 
      in the provided image_folder.

      Note: the images in image_folder must have matching names with the training label files. It is fine if the training labels do not 
      cover every image in image_folder, but the reverse is not acceptable (training labels without a corresponding image in image_folder).
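
      The naming rule above can be checked before training with a few lines of Python. ``check_label_pairing`` is a hypothetical
      helper that compares file names only, assuming label and image files share identical names, including the extension.

      .. code-block:: python

         def check_label_pairing(image_names, label_names) -> list[str]:
             """Hypothetical helper: validate label/image pairing and return the usable label files.

             Images without labels are fine (they are simply not used for training);
             label files without a matching image raise an error.
             """
             images = set(image_names)
             orphans = sorted(name for name in label_names if name not in images)
             if orphans:
                 raise ValueError(f"Training labels without a matching image: {orphans}")
             return sorted(label_names)


         # In practice, the names could come from os.listdir(image_folder) and os.listdir(labels_dir).
         print(check_label_pairing(["a.tiff", "b.tiff", "c.tiff"], ["b.tiff", "a.tiff"]))  # ['a.tiff', 'b.tiff']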

      Training parameters are previously determined when the classifier was set up with self.setup_classifier.

      Features are generated for each image one-by-one, and the pixels inside the label layers are collected; the training is then performed 
      on all the collected pixels together. 
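
      The collect-then-train pattern can be sketched in NumPy. The sketch assumes a per-image feature stack shaped
      (height, width, n_features) and a label image in which 0 means unlabeled (as in napari Labels layers) and positive integers
      are classes; ``collect_labeled_pixels`` is hypothetical, and the float32 conversion reflects what OpenCV's ml module expects
      for training samples.

      .. code-block:: python

         import numpy as np


         def collect_labeled_pixels(features: np.ndarray, labels: np.ndarray):
             """Hypothetical helper: return (samples, responses) for the labeled pixels only."""
             mask = labels > 0                                # 0 = unlabeled pixel
             samples = features[mask].astype(np.float32)      # (n_labeled, n_features)
             responses = labels[mask]                         # (n_labeled,)
             return samples, responses


         rng = np.random.default_rng(0)
         all_samples, all_responses = [], []
         for _ in range(2):                                   # e.g. two images, one at a time
             feats = rng.random((4, 5, 3))                    # stand-in feature stack
             labs = np.zeros((4, 5), dtype=int)
             labs[0, :2] = 1                                  # two pixels painted as class 1
             labs[3, 4] = 2                                   # one pixel painted as class 2
             s, r = collect_labeled_pixels(feats, labs)
             all_samples.append(s)
             all_responses.append(r)

         # Training then runs once, on the pooled pixels from every image.
         X = np.concatenate(all_samples)
         y = np.concatenate(all_responses)
         print(X.shape, y.tolist())  # (6, 3) [1, 1, 2, 1, 1, 2]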

      NOTE: If run on an already-trained classifier, training starts from the weights of the prior training (although 
      that probably does not make much of a difference).

      Args:
          image_folder (str / Pathlike): 
              The path to the folder where the .tiff images corresponding to the training labels reside. 

          labels_dir (str / Pathlike, None): 
              The path to the folder where the .tiff files containing the training label information reside. If None (default), will use self.classifier_training_labels. 
              There must be ONLY .tiff files in this folder & the names of the files in this folder MUST match the names of .tiff files in the image_folder. 

      Inputs / Outputs:
          Inputs: 
              reads .tiff files from image_folder / labels_dir (self.classifier_training_labels if labels_dir is None)

          Outputs: 
              writes self.details_dict to the {name}_details.json file


   .. py:method:: predict(image: numpy.ndarray, image_name: str, output_folder: Union[str, None] = None) -> numpy.ndarray[int]

      This runs the classifier (a QuPath-style pixel classifier) on an image (as a numpy array). Currently only a limited subset of QuPath classifiers 
      is supported (ANN_MLP only, no local normalization, etc.).

      Args:
          image (numpy array): 
              a numpy array representing the image to be analyzed

          image_name (str): 
              the file name of the image being analyzed, important for properly naming the output mask. Should include the file extension (usually .tiff). 

          output_folder (str, or default = None): 
              if not None, should be a valid directory (as a str) to write the pixel classification file into. If None, will instead write the file into 
              self.output_directory folder

      Returns:
          numpy array: 
              the pixel classification or probability predictions from the classifier. Dimensions match the spatial dimensions of the image.

      Inputs / Outputs:
          Outputs: 
              a pixel classification prediction map exported as a .tiff file to self.output_directory/{image_name}, or to output_folder/{image_name} if output_folder is not None.
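
      When the classifier was set up with categorical=False, the output holds one score per class instead of a decision. A minimal
      sketch of reducing such scores to a class map follows, assuming scores shaped (height, width, n_classes) and labels that
      start at 1 as in napari (both assumptions); ``scores_to_class_map`` is hypothetical.

      .. code-block:: python

         import numpy as np


         def scores_to_class_map(scores: np.ndarray, first_label: int = 1) -> np.ndarray:
             """Hypothetical helper: pick the highest-scoring class for every pixel."""
             # argmax over the last (class) axis gives a 0-based class index per pixel
             return np.argmax(scores, axis=-1) + first_label


         scores = np.array([[[0.1, 0.7, 0.2], [0.6, 0.3, 0.1]],
                            [[0.2, 0.2, 0.6], [0.3, 0.4, 0.3]]])
         class_map = scores_to_class_map(scores)
         print(class_map.tolist())  # [[2, 1], [3, 2]]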


   .. py:method:: predict_folder(img_folder: Union[pathlib.Path, str], output_folder: Union[str, None] = None) -> None

      This runs the PxQuPy classifier on every image in the provided directory and exports the predictions (it calls self.predict for each image in img_folder).

      Args:
          img_folder (str / Pathlike): 
              the path to a folder containing the images to generate pixel classifications from

          output_folder (str, or default = None): 
              if not None, should be a valid directory (as a str) to write the pixel classifications into. If None, will instead write the files into 
              the self.output_directory folder

      Inputs / Outputs:
          Outputs: 
              pixel classification prediction maps exported as .tiff files to self.output_directory/, or to output_folder/ if output_folder is not None.


.. py:class:: UnsupervisedClassifier(homedir: Union[str, pathlib.Path], classifier_name: str, panel: Union[None, pandas.DataFrame] = None, classifier_dictionary: dict = {})

   This class coordinates the creation, training, and prediction from an unsupervised pixel classifier.

   Args:
       homedir (str or Path):
           The PalmettoBUG project directory

       classifier_name (str):
           The name of the pixel classifier being made -- will determine a number of file and folder names.

       panel (pandas DataFrame, or None):
           This data frame is unique to unsupervised classifiers, and can be added later by setting this class's panel attribute. See the description
           in Key Attributes.

       classifier_dictionary (dictionary, or None):
           Training details of the classifier -- not needed if constructing the classifier using the setup_and_train method.
           However, can be used to semi-reload an unsupervised classifier, by reading the _details.json file and supplying the resulting dictionary
           to this argument. 

   Key Attributes:
       panel (pandas dataframe): 
           This panel is unique to unsupervised classifiers. It has an 'antigen' column listing all the kept (keep == 1 in the main panel.csv) 
           antigens in the dataset, and then its own 'keep' column to indicate which channels are to be used in the classifier training (1 = use, 0 = don't use).
           After these two columns, there is a series of columns whose names correspond to various transformations of the data, such as "HESSIAN_MIN".
           The possible transformation columns are::

               possible_additional_transformations = ['GRAD_MAG', 
                                                      'HESSIAN_DET', 
                                                      'HESSIAN_MAX', 
                                                      'HESSIAN_MIN', 
                                                      'LAPLACIAN', 
                                                      'STRUCT_CO', 
                                                      'STRUCT_MAX', 
                                                      'STRUCT_MIN', 
                                                      'WGT_STDV']  

           In these columns, a value of 1 means the transformation of that channel is used in training the classifier, and 0 means it is not.
           This allows you to use transformations for certain channels and not for others.

       training_dictionary (dict): 
           contains information on the training of the classifier for reproducibility. Gets exported to the disk as a _details.json for future reference.

       flowsom_dictionary (dict): 
           contains information on the trained FlowSOM classifier used in the process of prediction, including the 
           flowsom.FlowSOM instance itself. It is not saved to disk as .json: I expect a flowsom.FlowSOM object is not easily written 
           to disk this way, although as it is fundamentally a neural network, there should be some way to do it. 

       classifier_dir (str):
           The path to the folder where the pixel classifier folder is to be set up.

       output_dir (str): 
           The path to the folder where the pixel classifier will output predictions by default. It is a sub-folder of classifier_dir. 
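
   A panel of this shape can be built with pandas, as sketched below. The antigen names are made up, only three transformation
   columns are shown, and ``selected_inputs`` is a hypothetical helper that lists which (antigen, input) pairs the panel switches
   on; treating rows with keep == 0 as contributing nothing is an assumption.

   .. code-block:: python

      import pandas as pd

      panel = pd.DataFrame({
          "antigen": ["DAPI", "CD45", "GFAP"],
          "keep": [1, 1, 0],            # 1 = use this channel in training
          "LAPLACIAN": [0, 1, 0],       # 1 = also use this transformation of the channel
          "GRAD_MAG": [1, 0, 0],
          "HESSIAN_MIN": [0, 0, 1],
      })


      def selected_inputs(panel: pd.DataFrame) -> list[tuple[str, str]]:
          """Hypothetical helper: list the (antigen, input) pairs switched on by the panel."""
          transform_columns = [c for c in panel.columns if c not in ("antigen", "keep")]
          pairs = []
          # Assumption: only channels with keep == 1 contribute any inputs.
          for _, row in panel[panel["keep"] == 1].iterrows():
              pairs.append((row["antigen"], "channel"))
              pairs.extend((row["antigen"], col) for col in transform_columns if row[col] == 1)
          return pairs


      print(selected_inputs(panel))
      # [('DAPI', 'channel'), ('DAPI', 'GRAD_MAG'), ('CD45', 'channel'), ('CD45', 'LAPLACIAN')]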

   The concept for this type of classifier was inspired by Pixie / ark-analysis::

           https://github.com/angelolab/ark-analysis?tab=MIT-1-ov-file

   Pixie is licensed under the MIT license.

   However, this implementation is essentially written from scratch: it preserves the key steps 
   (0.999 quantile channel normalization --> FlowSOM --> normalization within pixels), 
   but not the rest of the Pixie code.
   The only part that I'm aware of that borrows more directly from the original Pixie is how the 0.999 quantile channel normalization values are 
   aggregated & averaged across all images, instead of on an image-by-image basis, as I had originally done. I'm not sure which approach is better.
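
   The aggregated normalization described above can be sketched in NumPy: the 0.999 quantile of each channel is computed per
   image, those values are averaged across images, and every image is divided by the shared averages. This is a simplified
   illustration assuming channels-last arrays, not the PalmettoBUG or Pixie code; ``aggregated_quantile_normalize`` is hypothetical.

   .. code-block:: python

      import numpy as np


      def aggregated_quantile_normalize(images: list[np.ndarray], quantile: float = 0.999) -> list[np.ndarray]:
          """Hypothetical helper: divide every image by dataset-wide channel quantiles.

          images: arrays shaped (height, width, n_channels), all with the same channels.
          """
          # Quantile of every channel in every image: one row per image, one column per channel.
          per_image = np.stack([np.quantile(img.reshape(-1, img.shape[-1]), quantile, axis=0)
                                for img in images])
          shared = per_image.mean(axis=0)               # average across images
          shared = np.where(shared == 0, 1.0, shared)   # leave all-zero channels unscaled
          return [img / shared for img in images]


      img_a = np.zeros((10, 10, 2))
      img_a[..., 0] = 2.0
      img_a[..., 1] = 4.0
      img_b = np.zeros((10, 10, 2))
      img_b[..., 0] = 6.0
      out_a, out_b = aggregated_quantile_normalize([img_a, img_b])
      print(out_a[0, 0].tolist(), out_b[0, 0].tolist())  # [0.5, 2.0] [1.5, 0.0]

   With per-image normalization, channel 0 would become 1.0 in both images, hiding that img_b is three times brighter than
   img_a; aggregating keeps that difference.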

   This implementation also includes new capacities compared to Pixie, such as the ability to generate QuPath-like features (hessians, laplacians, etc.) as input 
   channels for the FlowSOM, although it is uncertain how useful these additional features are.


   .. py:attribute:: _homedir
      :value: ''


   .. py:attribute:: classifier_name


   .. py:attribute:: panel
      :value: None


   .. py:attribute:: flowsom_dictionary


   .. py:method:: _setup_classifier_directory(classifier_name: Union[None, str] = None) -> None

      Sets up the individual classifier's folder.


   .. py:method:: setup_and_train(img_directory: Union[pathlib.Path, str], sigma: float = 1.0, size: int = 500000, seed: int = 1234, n_clusters: int = 20, xdim: int = 15, ydim: int = 15, rlen: int = 50, smoothing: int = 0, suppress_zero_division_warnings=False, quantile: float = 0.999) -> tuple[dict, dict]

      This method performs all the steps required to train an initialized unsupervised classifier.

      Args:
          img_directory (string or Path): 
              The path to a folder containing the .tiff images to train (and presumably predict) from

          sigma (float): 
              sets the extent of Gaussian blurring used to generate features for training 

          size (integer):  
              The number of pixels to sample from the images to form the training dataset

          seed (integer): 
              seed for the non-deterministic FlowSOM algorithm

          n_clusters (integer): 
              The number of metaclusters for the FlowSOM algorithm to return. 

          xdim / ydim (integer / integer):  
              The X / Y dimensions of the FlowSOM self-organizing map. xdim * ydim is how many initial 
              points are in the SOM (and so, how many clusters are predicted before merging down to n_clusters)

          rlen (integer): 
              The number of training iterations for the SOM

          additional_features (boolean): 
              whether there are additional features beyond only the gaussian blurred channels. If False, can run faster by skipping unneeded steps.

          smoothing (integer >= 0): 
              Whether & how much to smooth the pixel classification made by the FlowSOM. If smoothing = 0, no 
              smoothing is applied. Otherwise, smoothing argument is used as the threshold for the smooth_isolated_pixels() 
              function which removes isolated pixel classifications from a pixel classification map. 
              Saved in the training dictionary, but not applied during training -- only applied later after prediction.
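
      PalmettoBUG's own smooth_isolated_pixels() is not reproduced here. The NumPy sketch below only illustrates one generic way a
      neighbor-count threshold can remove isolated classifications; the rule (reassign a pixel with fewer than ``threshold``
      same-class neighbors to its most common neighboring class) and the name ``smooth_isolated_pixels_sketch`` are assumptions.

      .. code-block:: python

         import numpy as np


         def smooth_isolated_pixels_sketch(class_map: np.ndarray, threshold: int) -> np.ndarray:
             """Hypothetical neighbor-count filter for a 2D integer class map.

             A pixel with fewer than `threshold` same-class neighbors (out of the 8 in
             its 3x3 window) is reassigned to the most common neighboring class.
             """
             if threshold <= 0:
                 return class_map.copy()                    # smoothing = 0: no smoothing
             height, width = class_map.shape
             padded = np.pad(class_map, 1, mode="edge")     # border pixels are repeated outward
             smoothed = class_map.copy()                    # decisions use the original map
             for y in range(height):
                 for x in range(width):
                     window = padded[y:y + 3, x:x + 3].ravel()
                     neighbors = np.delete(window, 4)       # drop the center pixel itself
                     if np.count_nonzero(neighbors == class_map[y, x]) < threshold:
                         values, counts = np.unique(neighbors, return_counts=True)
                         smoothed[y, x] = values[np.argmax(counts)]
             return smoothed


         class_map = np.array([[1, 1, 1, 1],
                               [1, 2, 1, 1],
                               [1, 1, 1, 1],
                               [3, 3, 3, 3]])
         print(smooth_isolated_pixels_sketch(class_map, threshold=2).tolist())
         # [[1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1], [3, 3, 3, 3]]

      The pure-Python loop keeps the logic readable but would be slow on full-size images.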

      Returns:
          dictionary: contains the trained FlowSOM instance itself, used for classifying images

          dictionary:  contains the training parameters, useful for reproducibility / providing a record of how the classifier was trained. 

      Inputs / Outputs: 
          Outputs: 
              in the process of setting up the dictionaries, this method writes 2 .json files to the self.classifier_dir folder
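
      The ``size`` and ``seed`` arguments describe drawing a fixed, reproducible number of pixels from the images to form the
      FlowSOM training set. The sketch below assumes uniform sampling without replacement across the pooled pixels of all images,
      which may not match how PalmettoBUG spreads the sample over images; ``sample_training_pixels`` is hypothetical. For scale,
      the default xdim = ydim = 15 gives 15 * 15 = 225 SOM nodes, which are then merged into the default n_clusters = 20 metaclusters.

      .. code-block:: python

         import numpy as np


         def sample_training_pixels(images: list[np.ndarray], size: int, seed: int = 1234) -> np.ndarray:
             """Hypothetical helper: draw `size` pixels (rows of channel values) from channels-last images."""
             # Pool every pixel of every image into one (n_pixels, n_channels) table.
             pixels = np.concatenate([img.reshape(-1, img.shape[-1]) for img in images])
             rng = np.random.default_rng(seed)
             n = min(size, pixels.shape[0])          # cannot draw more pixels than exist
             chosen = rng.choice(pixels.shape[0], size=n, replace=False)
             return pixels[chosen]


         images = [np.ones((50, 40, 3)), np.zeros((30, 30, 3))]
         sample = sample_training_pixels(images, size=500)
         print(sample.shape)  # (500, 3)

      Pooling every pixel keeps the sketch short, but for large datasets sampling per image would use far less memory.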


   .. py:method:: predict(image_name: str, img_directory: Union[pathlib.Path, str], flowsom_dictionary: Union[None, dict] = None, output_folder: Union[pathlib.Path, str, None] = None) -> None

      Predicts the pixel classes of a single image

      Args:
          image_name (string):
              A string with the name of the image in the img_directory to make the prediction for.
              You can easily find a list of the possible options for this argument using os.listdir(img_directory)

          img_directory (Path or string):
              the folder containing the image to predict classes for

          flowsom_dictionary (dictionary or None):
              The dictionary containing the flowsom.FlowSOM instance, as well as the training details of the classifier, which allow it to predict.
              If None, will try to use self.flowsom_dictionary

          output_folder (Path, str or None):
              the folder where the pixel classification predictions will be written. Must already exist or be create-able by os.mkdir()   
              If None, will attempt to write to self.output_dir (default is 'classification_maps' inside the pixel classifier's directory)

      I / O:
          Inputs:
              reads a file from f'{img_directory}/{image_name}'. This file should be a .tiff file with the same number of channels (in the same order) as the 
              .tiff files that the Unsupervised classifier was trained on. Usually, it is the same folder & images for training and prediction.

          Outputs:
              writes a single two-dimensional, single-channel .tiff file to f'{output_folder}/{image_name}' containing the pixel class predictions.


   .. py:method:: predict_folder(img_directory: Union[pathlib.Path, str], flowsom_dictionary: Union[None, dict] = None, output_folder: Union[pathlib.Path, str, None] = None) -> None

      Applies the self.predict method to every image in a supplied folder.

      Args:
          img_directory (Path or string):
              the folder of images to predict classes for. Every .tiff in this folder will have a prediction written for it.

          flowsom_dictionary (dictionary or None):
              The dictionary containing the flowsom.FlowSOM instance, as well as the training details of the classifier, which allow it to predict.
              If None, will try to use self.flowsom_dictionary

          output_folder (Path, string, or None):
              the folder where the pixel classification predictions will be written. Must already exist or be create-able by os.mkdir()

      I / O:
          Inputs:
              reads all the files from f'{img_directory}/'. These should be .tiff files with the same number of channels (in the same order) as the 
              .tiff files that the Unsupervised classifier was trained on. Usually, it is the same folder & images for training and prediction.

          Outputs:
              writes two-dimensional, single-channel .tiff files to f'{output_folder}/' containing the pixel class predictions.
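
      How "every .tiff in this folder" is matched is not spelled out above. The sketch below assumes a case-insensitive match on
      the .tif and .tiff extensions that skips subfolders; ``list_tiff_files`` is a hypothetical helper.

      .. code-block:: python

         import tempfile
         from pathlib import Path


         def list_tiff_files(img_directory) -> list[str]:
             """Hypothetical helper: sorted names of .tif/.tiff files directly inside a folder."""
             folder = Path(img_directory)
             return sorted(p.name for p in folder.iterdir()
                           if p.is_file() and p.suffix.lower() in {".tif", ".tiff"})


         with tempfile.TemporaryDirectory() as tmp:
             for name in ["b.tiff", "a.TIF", "notes.txt"]:
                 (Path(tmp) / name).touch()
             (Path(tmp) / "subfolder.tiff").mkdir()   # a folder, not an image
             names = list_tiff_files(tmp)

         print(names)  # ['a.TIF', 'b.tiff']

      Each returned name could then be passed as the image_name argument of predict, together with the same img_directory.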


.. py:function:: plot_class_centers(flowsom: flowsom.FlowSOM, **kwargs) -> tuple[matplotlib.pyplot.figure, pandas.DataFrame]

   This plots the heatmap of the centroids of the metaclusters of a flowsom. It is useful for identifying what each 
   metacluster represents biologically. For pixel classification work, this means the flowsom generated by an UnsupervisedClassifier.

   Note!: 
       This function plots the centroids of the clusters determined during training, without respect to the predictions.
       For a heatmap that uses the actual data from the pixel classifier post-prediction use plot_pixel_heatmap below.

   Args:
       flowsom (flowsom.FlowSOM):
           Contains the information to be plotted.

   Returns:
       a matplotlib figure and a pandas dataframe 


.. py:function:: plot_pixel_heatmap(pixel_folder: Union[str, pathlib.Path], image_folder: Union[str, pathlib.Path], channels: list[str], panel: pandas.DataFrame, silence_division_warnings=False) -> tuple[matplotlib.pyplot.figure, pandas.DataFrame]

   This plots a heatmap derived from the actual data of the pixel class regions predicted by a classifier (unlike plot_class_centers, which uses the training centroids).
   Specifically, it shows the mean of 1%-99% quantile scaled data for each channel in each pixel class.

   Args:
       pixel_folder (str, Path):
           The folder of predictions from a pixel classifier

       image_folder (str, Path):
           The folder of images that the channel intensities will be read from to construct the heatmap. Only files present in BOTH pixel_folder & image_folder
           will be used.

       channels (iterable of strings):
           The names of the antigens to use in the panel. Will be matched against the antigens in panel, and then used to slice the images to only the channels of interest.
           These antigen names are also what will be displayed on the heatmap axes.

       panel (pd.DataFrame):
           The panel file (panel.csv) of the PalmettoBUG project in question. Specifically, panel['keep'] == 0 channels are removed, and then the antigen names in channels
           are matched against the antigen names in panel['name'] to slice the images to only the channels of interest. 

       silence_division_warnings (bool):
           One of the steps of this function involves a lot of division where zero-division / related errors can occur. 
           Will silence these warnings if this parameter == True

   Returns:
       a matplotlib figure and a pandas dataframe containing the values displayed in the plot
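
   The values behind the heatmap can be sketched in NumPy for a single image: each channel is scaled using its 1st and 99th
   percentiles, clipped to the 0 to 1 range (the clipping is an assumption), and the scaled values are averaged over the pixels
   of each class. ``class_channel_means`` is hypothetical; the function documented above pools every image present in both
   folders rather than working on one image.

   .. code-block:: python

      import numpy as np


      def class_channel_means(image: np.ndarray, class_map: np.ndarray) -> dict[int, np.ndarray]:
          """Hypothetical helper: mean 1%-99% quantile-scaled intensity per channel per class.

          image: (height, width, n_channels); class_map: (height, width) integer labels.
          """
          pixels = image.reshape(-1, image.shape[-1]).astype(float)
          low = np.quantile(pixels, 0.01, axis=0)
          high = np.quantile(pixels, 0.99, axis=0)
          span = np.where(high > low, high - low, 1.0)      # guard against zero division
          scaled = np.clip((pixels - low) / span, 0.0, 1.0)
          classes = class_map.ravel()
          return {int(c): scaled[classes == c].mean(axis=0) for c in np.unique(classes)}


      image = np.zeros((10, 10, 2))
      image[:5, :, 0] = 10.0        # channel 0 is bright in the top half
      image[..., 1] = 5.0           # channel 1 is flat everywhere
      class_map = np.ones((10, 10), dtype=int)
      class_map[:5, :] = 2
      means = class_channel_means(image, class_map)
      print({k: v.tolist() for k, v in means.items()})  # {1: [0.0, 0.0], 2: [1.0, 0.0]}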


.. py:function:: segment_class_map_folder(pixel_classifier_directory: Union[pathlib.Path, str], output_folder: Union[pathlib.Path, str], distance_between_centroids: int = 10, threshold: int = 5, to_segment_on: list[int] = [2], background: int = 1) -> None

   Takes pixel classification maps and uses a Euclidean distance transform (edt) + watershedding to segment them into objects.

   Args:
       pixel_classifier_directory (string or Path):  
           The path to the folder of pixel classification maps to derive segmentations from 

       output_folder (string or Path):  
           the path to a folder where the segmentation masks are to be written. 

       distance_between_centroids (integer):
           the minimum distance between centroids for the watershedding. Higher numbers reduce the number of centroids and force them to be farther apart, 
           leading to fewer, larger cell segmentations, whereas lower numbers allow very close centroids, leading to smaller, more numerous segmentations. 

       threshold (integer): 
           objects smaller than this threshold (in pixels) will be removed before edt / watershedding. Objects this small can still appear in the output, 
           but only if watershedding a larger region from multiple points splits off a piece that small.

       to_segment_on (list of integers): 
           The classes to segment on. They will be merged before running, and usually it is recommended that a dedicated supervised pixel classifier that only 
           finds the objects of interest be used (so usually only 1 class to segment on) 

       background (integer): 
           The background class, which will be set to zero

   Returns:
       None 
       
   Inputs / Outputs:
       Inputs: 
           reads in all the files in pixel_classifier_directory as .tiff files (MUST NOT have other file types / subfolders)

       Outputs: 
           for each file read in, exports a .tiff segmentation mask to output_folder
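
   The class-merging and background steps described above can be sketched in NumPy. ``foreground_mask`` is a hypothetical
   helper, not a PalmettoBUG function, and it only prepares the binary mask. The subsequent steps would typically use tools such
   as ``scipy.ndimage.distance_transform_edt`` and ``skimage.segmentation.watershed``, though which libraries
   segment_class_map_folder actually relies on is not stated here.

   .. code-block:: python

      import numpy as np


      def foreground_mask(class_map: np.ndarray, to_segment_on=(2,), background: int = 1) -> np.ndarray:
          """Hypothetical helper: boolean mask of the classes to segment on."""
          cleaned = np.where(class_map == background, 0, class_map)   # background class -> 0
          return np.isin(cleaned, list(to_segment_on))                 # merge the chosen classes


      class_map = np.array([[1, 2, 2, 3],
                            [1, 2, 4, 3],
                            [1, 1, 4, 4]])
      mask = foreground_mask(class_map, to_segment_on=[2, 4])
      print(mask.astype(int).tolist())
      # [[0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 1, 1]]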