palmettobug.Pixel_Classification.use_classifiers

This module contains the back-end functions for using pixel classifiers, such as mask extension by pixel classifiers and cell classification by pixel classifiers.

It does not, however, contain the WholeClassAnalysis class, which coordinates analysis of classes as a whole.

Many of the functions in this module are available through the public (non-GUI) API of PalmettoBUG.

Functions

`plot_classes`(class_map_folder, output_folder, **kwargs)	Allows classy masks and pixel classification outputs to be written as .png files
`merge_classes`(→ numpy.ndarray[int])	This function takes in a classifier output (numpy array, dtype = int) and a merging table (pandas DataFrame with a particular format) and outputs a new array with each class converted to its merging value.
`merge_folder`(→ None)	This function performs merge_classes() [see function above] on all the images in a provided folder, exporting the merged class maps to a specified output folder.
`slice_folder`(→ None)	This function performs slice_image_by_region() [a non-public function, see code file] on every image in a folder.
`mode_classify_folder`(→ pandas.DataFrame)	This function classifies cells using a pixel classifier and also creates "classy mask" .tiff files which can be useful for merging / expanding cell masks.
`secondary_flowsom`(→ tuple[flowsom.FlowSOM, ...)	This function performs a FlowSOM clustering on all the cell regions of a dataset, using the fraction of each pixel class in each cell as its inputs.
`classify_from_secondary_flowsom`(→ pandas.DataFrame)	This function takes the classifications from a secondary FlowSOM and a folder of matching cell masks, and creates 'classy' masks from them.
`extend_masks_folder`(→ None)	Expands cell masks into a matching region of pixel classification. Can be used, for example, to segment irregularly shaped cell types into non-circular masks.

Module Contents

palmettobug.Pixel_Classification.use_classifiers.plot_classes(class_map_folder, output_folder, **kwargs)

Allows classy masks and pixel classification outputs to be written as .png files

Args:

class_map_folder (string or Path):: The folder from which .tiff files are read for conversion into .png files.
output_folder (string or Path):: The folder where the PNG files will be written. Should exist or be make-able by os.mkdir()
**kwargs:: are passed to matplotlib.pyplot.imshow()

palmettobug.Pixel_Classification.use_classifiers.merge_classes(classifier_mask: numpy.ndarray[int], merging_table: pandas.DataFrame) → numpy.ndarray[int]

This function takes in a classifier output (numpy array, dtype = int) and a merging table (pandas DataFrame with a particular format) and outputs a new numpy array where all classes in the original array have been converted to the corresponding value in the merging column of the merging-table dataframe.

Args:

classifier_mask (np.ndarray of integers):

A pixel class prediction.

merging_table (pandas DataFrame):

The table that details how the original classes of classifier_mask will be merged, and what the final numbers will be. It has a column ‘class’ for the current integer class labels of classifier_mask, and a column ‘merging’ denoting what the new integer label should be for each of the original classes.

By convention, a class labeled ‘background’ should have its merging value set to 0, and NO MERGING CLASS should == 1. The value 1 is a special number when merging supervised classifiers, and when classifying cell masks using the ‘mode’ method.

Usually also has a column dedicated to the biologically relevant (non integer) labels that each new merging class is intended to represent.

Returns:

A numpy ndarray (integers), with the same shape as classifier_mask, but with the new merged class labels replacing the original class labels.
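
The merging described above can be pictured as a lookup-table replacement. The following is a minimal sketch of that idea, not PalmettoBUG's actual implementation: the function name merge_classes_sketch is hypothetical, and sending classes that are absent from the table to 0 is an assumption of this example.

```python
import numpy as np
import pandas as pd

def merge_classes_sketch(classifier_mask: np.ndarray, merging_table: pd.DataFrame) -> np.ndarray:
    """Illustrative only: replace each class label by its 'merging' value."""
    # The lookup table is indexed by the original class label.
    # Classes missing from the table map to 0 (an assumption of this sketch).
    size = int(max(classifier_mask.max(), merging_table["class"].max())) + 1
    lookup = np.zeros(size, dtype=int)
    lookup[merging_table["class"].to_numpy()] = merging_table["merging"].to_numpy()
    # Fancy indexing applies the lookup to every pixel and keeps the input shape.
    return lookup[classifier_mask]

classifier_mask = np.array([[1, 2],
                            [3, 4]])
merging_table = pd.DataFrame({
    "class":   [1, 2, 3, 4],
    "merging": [0, 2, 2, 3],   # background -> 0, and no merging value equals 1
    "label":   ["background", "tumor", "tumor", "stroma"],
})
merged = merge_classes_sketch(classifier_mask, merging_table)
print(merged)   # [[0 2]
                #  [2 3]]
```

The labels "tumor" and "stroma" are made up for the example; only the ‘class’ and ‘merging’ columns are used by the sketch.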

palmettobug.Pixel_Classification.use_classifiers.merge_folder(folder_to_merge: pathlib.Path | str, merging_table: pandas.DataFrame, output_folder: pathlib.Path | str = None) → None

This function performs merge_classes() [see function above] on all the images in a provided folder, exporting the merged class maps to a specified output folder. If output_folder is None, the output is placed in a folder named “/merged_classification_maps” in the same parent directory as the input folder.

Args:

folder_to_merge (Path or string):

the path to the folder containing the classification maps to merge

merging_table (pandas dataframe):

A pandas dataframe containing a ‘class’ column denoting a class in the input class maps, and a ‘merging’ column denoting the new value of that class in the merged output class maps. Usually there is also a ‘label’ column, which denotes the biological label, as a string, that corresponds to each class merging.

NOTE:: DO NOT use the number 1 as one of your merging labels if you intend on doing mode-based cell classification with the merged pixel classifier predictions. DO use the number 0 as the merging label of ‘background’ classes; this will effectively drop them from the merged predictions.

output_folder (Path, string, or None):

the path to the folder where the merged classification maps are to be exported, with the same filenames as in the original folder. If None, then the output folder will be a folder parallel to the input folder (as in, both folders will be in the same parent directory), with the name “/merged_classification_maps”.

Inputs / Outputs:

Inputs:: reads every file inside folder_to_merge one by one (assumes each is a .tiff file, and there are no subfolders!)
Outputs:: writes a new .tiff file into output_folder for every file in folder_to_merge (preserving the same filenames)
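
Before merging a whole folder, it can help to check a merging table against the conventions in the NOTE above. The sketch below is one way to do that with pandas alone; check_merging_table is a hypothetical helper, not part of PalmettoBUG, and matching background rows by the literal string "background" in the ‘label’ column is an assumption.

```python
import pandas as pd

def check_merging_table(table: pd.DataFrame) -> list[str]:
    """Hypothetical helper: list the convention violations found (empty if none)."""
    problems = []
    missing = {"class", "merging"} - set(table.columns)
    if missing:
        return [f"missing columns: {sorted(missing)}"]
    # 1 is reserved, so no class should be merged into it.
    if (table["merging"] == 1).any():
        problems.append("merging label 1 is reserved")
    # Background classes should be merged to 0 so that they are dropped.
    if "label" in table.columns:
        background = table[table["label"] == "background"]
        if (background["merging"] != 0).any():
            problems.append("background classes should merge to 0")
    return problems

good_table = pd.DataFrame({
    "class":   [1, 2, 3],
    "merging": [0, 2, 3],
    "label":   ["background", "epithelium", "immune"],
})
bad_table = good_table.assign(merging=[1, 2, 3])

print(check_merging_table(good_table))   # []
print(check_merging_table(bad_table))
```

The second call reports both problems, because the background class is merged to the reserved value 1 rather than to 0.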

palmettobug.Pixel_Classification.use_classifiers.slice_folder(class_to_keep: int | list[int], class_map_folder: pathlib.Path | str, image_folder: pathlib.Path | str, output_folder: pathlib.Path | str, padding: int = 5, zero_out: bool = False) → None

This function performs slice_image_by_region() [a non-public function, see code file] on every image in a folder. This means that each image in the folder will be reduced to the minimal bounding box (plus padding) that contains every pixel of the class(es) specified in class_to_keep.

For example: you could use this function, after classifying villi regions of an intestinal tissue section, to reduce the images to the minimal rectangle that contains all of the villi class, reducing or removing the unwanted regions of the image.

Args:

class_to_keep (integer or a list of integers):: The class(es) to subset the images on
class_map_folder (Path or string):: the path to a folder containing the classification maps (as tiffs) that will determine where the images are sliced / subsetted
image_folder (Path or string):: the path to a folder containing the images to be sliced / subsetted
output_folder (Path or string):: the path to a folder where the sliced / subsetted images will be exported as tiffs
padding (integer >= 0):: how many pixels to pad the minimal bounding box of class_to_keep in each image. Set to 0 to not pad at all
zero_out (boolean):: If True, all pixels not in class_to_keep will have their channel values set to zero, leaving only the classes of interest contributing information to the image. Default is False, which retains the values of pixels not in class_to_keep as long as they fall within the minimal bounding box of the classes of interest.

Returns:

None

Inputs / Outputs:

Inputs:: reads every file in the image_folder (as .tiff files), and every file in the class_map_folder (also as .tiff). These provided folders MUST NOT contain files other than .tiff files, nor any subfolders.
Outputs:: outputs a .tiff file to output_folder for every file read-in from image_folder/class_map_folder
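
The slicing behaviour described above (minimal bounding box, padding, optional zeroing) can be sketched for a single image with NumPy. This is an illustration under stated assumptions, not the code of slice_image_by_region(): the function name, the (channels, Y, X) image layout, and returning None when the class is absent are all assumptions of this example.

```python
import numpy as np

def slice_by_region_sketch(image, class_map, class_to_keep, padding=5, zero_out=False):
    """Illustrative only. image: (channels, Y, X); class_map: (Y, X)."""
    keep = np.isin(class_map, np.atleast_1d(class_to_keep))
    if not keep.any():
        return None  # assumption: nothing to slice on when the class is absent
    rows = np.flatnonzero(keep.any(axis=1))
    cols = np.flatnonzero(keep.any(axis=0))
    # Pad the minimal bounding box, clipping at the image edges.
    y0 = max(rows[0] - padding, 0)
    y1 = min(rows[-1] + padding + 1, class_map.shape[0])
    x0 = max(cols[0] - padding, 0)
    x1 = min(cols[-1] + padding + 1, class_map.shape[1])
    if zero_out:
        # The (Y, X) boolean map broadcasts over the channel axis.
        image = image * keep
    return image[:, y0:y1, x0:x1]

image = np.arange(2 * 6 * 6).reshape(2, 6, 6)
class_map = np.zeros((6, 6), dtype=int)
class_map[2:4, 3:5] = 7   # a 2 x 2 region of class 7

cropped = slice_by_region_sketch(image, class_map, 7, padding=1)
zeroed = slice_by_region_sketch(image, class_map, 7, padding=1, zero_out=True)
print(cropped.shape)              # (2, 4, 4)
print(np.count_nonzero(zeroed))   # 8
```

With zero_out=True only the 2 x 2 region of class 7 keeps its values in each of the two channels, which gives the 8 nonzero entries.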

palmettobug.Pixel_Classification.use_classifiers.mode_classify_folder(mask_folder: pathlib.Path | str, classifier_map_folder: pathlib.Path | str, output_folder: pathlib.Path | str, merging_table: pandas.DataFrame | None = None) → pandas.DataFrame

This function classifies cells using a pixel classifier and also creates “classy mask” .tiff files which can be useful for merging / expanding cell masks. It uses a simplistic method where the mode of the class values inside a cell mask is the class assigned to that mask.

Args:

mask_folder (Path or string):: the path to the folder containing cell masks (such as those produced by deepcell or cellpose) to be classified
classifier_map_folder (Path or string):: the path to the folder containing the classifier maps that will be used to classify the masks
output_folder (Path or string):: the path to the folder where the ‘classy mask’ tiff files will be exported

Returns:

pandas dataframe:: A dataframe with a single column that denotes the calculated classification for every cell in the dataset. Can be added later to an Analysis as an alternative to FlowSOM-based classification of cells.

Inputs / Outputs:

Inputs:: reads all .tiff files that are in both mask_folder and classifier_map_folder (as in, a file with the same name is in both)
Outputs:: for every read-in file, exports a .tiff into output_folder
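
The ‘mode’ method described above can be sketched for one mask / class map pair as follows. This is a hedged illustration rather than the library's code: mode_classify_sketch is a hypothetical name, ties are broken toward the lowest class number here, and the merging_table argument of the real function is not modelled.

```python
import numpy as np

def mode_classify_sketch(mask, class_map):
    """Illustrative only: return ({cell_id: class}, classy_mask)."""
    classy_mask = np.zeros_like(mask)
    assignments = {}
    for cell_id in np.unique(mask):
        if cell_id == 0:          # 0 is background in the cell mask
            continue
        inside = mask == cell_id
        # Count the pixel classes inside this cell; the most frequent one wins.
        counts = np.bincount(class_map[inside])
        cell_class = int(counts.argmax())
        assignments[int(cell_id)] = cell_class
        classy_mask[inside] = cell_class
    return assignments, classy_mask

mask = np.array([[1, 1, 1, 0],
                 [2, 2, 2, 2]])
class_map = np.array([[3, 3, 2, 2],
                      [2, 4, 4, 4]])
assignments, classy_mask = mode_classify_sketch(mask, class_map)
print(assignments)   # {1: 3, 2: 4}
print(classy_mask)   # [[3 3 3 0]
                     #  [4 4 4 4]]
```

Each cell in the classy mask carries its assigned class instead of its cell number, which is what makes the classy mask comparable with a pixel classification map.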

palmettobug.Pixel_Classification.use_classifiers.secondary_flowsom(mask_folder: pathlib.Path | str, classifier_map_folder: pathlib.Path | str, number_of_classes: int | None = None, XY_dim: int = 10, n_clusters: int = 10, rlen: int = 50, seed: int = 42) → tuple[flowsom.FlowSOM, pandas.DataFrame]

This function performs a FlowSOM clustering on all the cell regions of a dataset, using the fraction of each pixel class in each cell as its inputs. It is intended as a secondary step of the unsupervised, Pixie-like cell classification pipeline available in PalmettoBUG. Modeled intentionally after the steps of Pixie / Ark-Analysis by the Angelo lab:

(https://github.com/angelolab/ark-analysis).

It is intended to be part of an alternate way to classify cells using pixel classifiers instead of a direct CATALYST-style FlowSOM on the cell regions themselves.

Note that for the FlowSOM integer parameters (XY_dim, n_clusters, seed) some reasonable defaults are provided, but these defaults, especially n_clusters, may not be ideal for your data.

Args:

mask_folder (Path or string):: the path to a folder containing the cell masks to cluster with FlowSOM
classifier_map_folder (Path or string):: The path to a folder containing the pixel classification maps to be used to classify the cells’ masks.

NOTE! >>> The files in mask_folder & classifier_map_folder should have the same filenames!
number_of_classes (integer or None):: the number of classes in the pixel classifier that generated the maps in classifier_map_folder. If None, this will be empirically determined by reading every classification map in the folder and updating
XY_dim (integer):: the X and Y dimensions of the original FlowSOM grid. (XY_dim * XY_dim) is the number of clusters generated by the FlowSOM algorithm before merging to metaclusters.
n_clusters (integer):: The number of final metaclusters for the FlowSOM algorithm to output.
rlen (integer):: The number of training iterations of the Self-Organizing Map
seed (integer):: the random state seed to run the FlowSOM algorithm with. For reproducibility of results.

Returns:

tuple(FlowSOM, pandas dataframe):

1. FlowSOM (‘fs’) –> a FlowSOM object, trained on and predicting from the provided cell information. fs.get_cell_data() or fs.get_cluster_data() supply anndata objects with information. See https://flowsom.readthedocs.io/en/latest/generated/flowsom.FlowSOM.html for information about this class.

2. pandas dataframe (‘anndata_fs’) –> a pandas dataframe with a single integer column with length equal to the number of cell regions in the masks of the mask_folder, with values reflecting the metacluster prediction of the FlowSOM algorithm for each cell region. Once these values are merged into biologically relevant labels, they can be inserted as a column of data.obs in a PalmettoBUG.Analysis created from the same masks.
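
The clustering inputs described for this function, the fraction of each pixel class in each cell, can be sketched with NumPy for a single image. This is an illustration only: class_fractions_sketch is a hypothetical name, and keeping a column 0 for unclassified pixels is an assumption. The FlowSOM step itself is not shown.

```python
import numpy as np

def class_fractions_sketch(mask, class_map, number_of_classes=None):
    """Illustrative only: one row per cell, one column per pixel class (0..N)."""
    if number_of_classes is None:
        # Here the class count is inferred from this one map only.
        number_of_classes = int(class_map.max())
    cell_ids = np.unique(mask[mask > 0])
    fractions = np.zeros((len(cell_ids), number_of_classes + 1))
    for row, cell_id in enumerate(cell_ids):
        # minlength keeps every row the same width; a class label larger than
        # number_of_classes would make this assignment fail.
        counts = np.bincount(class_map[mask == cell_id], minlength=number_of_classes + 1)
        fractions[row] = counts / counts.sum()
    return cell_ids, fractions

mask = np.array([[1, 1, 1, 0],
                 [2, 2, 2, 2]])
class_map = np.array([[3, 3, 2, 2],
                      [2, 4, 4, 4]])
cell_ids, fractions = class_fractions_sketch(mask, class_map, number_of_classes=4)
print(fractions[1])   # [0.   0.   0.25 0.   0.75]
```

Each row sums to 1, so cells of different sizes are compared by composition rather than by pixel count.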

palmettobug.Pixel_Classification.use_classifiers.classify_from_secondary_flowsom(mask_folder: pathlib.Path | str, output_folder: pathlib.Path | str, flowsom_data: flowsom.FlowSOM) → pandas.DataFrame

This function takes the classifications from a secondary FlowSOM and a folder of matching cell masks, and creates ‘classy’ masks from them. Additionally, it returns a single-column dataframe with all the classifications from the FlowSOM (this can be more directly accessed with (flowsom_data.get_cell_data().obs[‘metaclustering’] + 1)).

NOTE! >>> The classy masks are 1-indexed because 0 is a special number (background) in images, while the FlowSOM classes are 0-indexed like the majority of python. This is why (flowsom_data.get_cell_data().obs[‘metaclustering’] + 1) describes the classes accurately in the classy masks, and not just flowsom_data.get_cell_data().obs[‘metaclustering’].

Usually, the classifications here are an intermediate step, with overclustering / excessive clustering being performed as is usual for FlowSOM clustering, and manual merging being a necessary step afterwards to derive biologically useful labels for the cells.

Args:

mask_folder (str or Path):

the directory path to a folder containing the cell mask .tiffs that are to be classified with the secondary FlowSOM output.

NOTE! >>> the FlowSOM must have been trained / predicted from the same cell masks in the same file order, or the classification will be invalid.

output_folder (str or Path):

the path to a folder where the “classy masks” will be exported.

flowsom_data (FlowSOM):

The trained/predicted FlowSOM object from which the predictions will be derived.

Returns:

pandas dataframe:: a single-column pandas dataframe of integers containing the cell classification assignments from the FlowSOM. It should represent (flowsom_data.get_cell_data().obs[‘metaclustering’] + 1), where ‘flowsom_data’ is the input argument to the function.

Inputs / Outputs:

Inputs:: reads every file in the mask_folder as .tiff file (MUST NOT have other files / subfolders)
Outputs:: for every file read-in, writes a .tiff file inside output_folder
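
The 0-indexed versus 1-indexed distinction in the NOTE above is easiest to see on a small array. The sketch below writes metacluster labels into a cell mask for one image. It is illustrative only: classy_mask_from_metaclusters is a hypothetical name, and the metacluster values are assumed to be ordered by ascending cell number within the image.

```python
import numpy as np

def classy_mask_from_metaclusters(mask, metaclusters):
    """Illustrative only. mask: cell labels, 0 = background;
    metaclusters: 0-indexed FlowSOM labels, one per cell in ascending cell order."""
    cell_ids = np.unique(mask[mask > 0])
    lookup = np.zeros(int(mask.max()) + 1, dtype=int)
    # Shift by one so that metacluster 0 does not collide with background (0).
    lookup[cell_ids] = np.asarray(metaclusters) + 1
    return lookup[mask]

mask = np.array([[0, 1, 1],
                 [2, 2, 0]])
metaclusters = [0, 3]   # cell 1 is in metacluster 0, cell 2 is in metacluster 3
classy_mask = classy_mask_from_metaclusters(mask, metaclusters)
print(classy_mask)      # [[0 1 1]
                        #  [4 4 0]]
```

Without the shift, cell 1 would be written as 0 and would become indistinguishable from background in the classy mask.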

palmettobug.Pixel_Classification.use_classifiers.extend_masks_folder(classifier_map_folder: pathlib.Path | str, mask_folder: pathlib.Path | str, classy_mask_folder: pathlib.Path | str, output_directory_folder: pathlib.Path | str, merge_list: list[int] | None = None, connectivity: int = 1) → None

Expands cell masks into a matching region of pixel classification. Can be used, for example, to segment irregularly shaped cell types into non-circular masks. Operates on a whole folder of images.

Args:

classifier_map_folder (str or Path):

the path to a folder of a pixel classifier’s output (as .tiff files)

mask_folder (str or Path):

the path to a folder of cell masks (segmentation output as .tiff files) to extend

classy_mask_folder (str or Path):

The path to a folder of “classy masks” as .tiff files.

NOTE! >>> The files in classifier_map_folder, mask_folder, and classy_mask_folder should all align with each other, as in:

–> same file names in the same order

–> the classy masks should be derived from the masks

–> the numbers of the classy masks should match the numbers of the pixel classifications in the classifier_map_folder (as in, class 1 should mean the same biological thing in both: for example, if class 1 is astrocyte in the class maps, class 1 must mean astrocyte in the classy masks too in order to have a valid merging/expansion on class 1)

output_directory_folder (str or Path):

the path to a folder where you want to save the expanded cell masks

merge_list (list of integers, or None):

a list of the classes to merge / extend the masks on. If None, then all classes are used. If there are background classes in the pixel classifier’s output, then leaving merge_list = None is HIGHLY discouraged, as you are likely to end up with wildly large cell masks.

connectivity (integer):

values = 1 or 2. This determines whether, when performing the final scikit-image watershedding step of the merge / expansion, pixels are considered connected when touching diagonally (2) or not (1). This means connectivity = 2 will extend the cell masks (slightly) more aggressively than connectivity = 1. See: https://scikit-image.org/docs/stable/api/skimage.segmentation.html#skimage.segmentation.watershed for details of the internal function in which the connectivity parameter is used.

Returns:

None

Inputs / Outputs:

Inputs:: reads every .tiff file shared between all three input folders (classifier_map_folder, mask_folder, classy_mask_folder), as in, every filename present in all three (assumed to be from the same image).
Outputs:: for every shared .tiff file read in (really a set of three, one from each input folder), writes one .tiff file into the output_directory_folder
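
To make the effect of merge_list and connectivity concrete, the sketch below grows each cell into touching pixels of its own class on a single image. It deliberately swaps in a plain breadth-first flood fill for the scikit-image watershed step named above, so it illustrates the idea only and will not reproduce the library's output; extend_masks_sketch is a hypothetical name.

```python
from collections import deque
import numpy as np

def extend_masks_sketch(class_map, mask, classy_mask, merge_list=None, connectivity=1):
    """Illustrative only: flood-fill stand-in for the watershed-based expansion."""
    if merge_list is None:
        merge_list = [int(c) for c in np.unique(classy_mask) if c != 0]
    steps = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    if connectivity == 2:
        steps += [(-1, -1), (-1, 1), (1, -1), (1, 1)]   # diagonal neighbours
    extended = mask.copy()
    # Seed the queue with every pixel of every cell whose class is in merge_list.
    queue = deque(
        (int(r), int(c), int(mask[r, c]), int(classy_mask[r, c]))
        for r, c in zip(*np.nonzero(mask))
        if int(classy_mask[r, c]) in merge_list
    )
    height, width = mask.shape
    while queue:
        r, c, cell_id, cell_class = queue.popleft()
        for dr, dc in steps:
            nr, nc = r + dr, c + dc
            if not (0 <= nr < height and 0 <= nc < width):
                continue
            # Claim unlabelled pixels whose pixel class matches the cell's class.
            if extended[nr, nc] == 0 and class_map[nr, nc] == cell_class:
                extended[nr, nc] = cell_id
                queue.append((nr, nc, cell_id, cell_class))
    return extended

class_map = np.array([[0, 0, 0, 0],
                      [0, 2, 2, 0],
                      [0, 0, 0, 2],
                      [0, 0, 0, 0]])
mask = np.zeros((4, 4), dtype=int)
mask[1, 1] = 1                      # one single-pixel cell, numbered 1
classy_mask = np.zeros((4, 4), dtype=int)
classy_mask[1, 1] = 2               # that cell was classified as class 2

print(np.count_nonzero(extend_masks_sketch(class_map, mask, classy_mask, connectivity=1)))  # 2
print(np.count_nonzero(extend_masks_sketch(class_map, mask, classy_mask, connectivity=2)))  # 3
```

The class-2 pixel at row 2, column 3 only touches the growing cell diagonally, so it is reached with connectivity = 2 but not with connectivity = 1.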