palmettobug.Analysis_functions.WholeClassAnalysis
=================================================

.. py:module:: palmettobug.Analysis_functions.WholeClassAnalysis

.. autoapi-nested-parse::

   This module contains the back-end class (WholeClassAnalysis) that handles the analysis of pixel classes as-a-whole.
   It is accessed in the GUI through the second half of the third tab of the program (use pixel classifier).
   It is also available through the public (non-GUI) API of PalmettoBUG.


Classes
-------

.. autoapisummary::

   palmettobug.Analysis_functions.WholeClassAnalysis.WholeClassAnalysis


Module Contents
---------------

.. py:class:: WholeClassAnalysis(directory: Union[pathlib.Path, str], classifier_df: pandas.DataFrame, metadata: pandas.DataFrame, Analysis_panel: pandas.DataFrame, csv: Union[pandas.DataFrame, None] = None)

   This class handles the whole-class Analysis, where pixel regions are treated as if they are cell segmentation masks.
   It has limited options compared to the standard experiment class that handles true single cell data.
   This class only has a few plot options and a single statistics option, and no batch correction, dropping of samples, or scaling.

   Args:
       directory (string or Path):
           the path to a folder containing /intensities and /regionprops subfolders, which would have been produced by
           running region measurements on the pixel classification maps generated by a pixel classifier.

       classifier_df (pandas dataframe):
           the biological_labels.csv exported from the pixel classifier whose output is being used. Contains "labels", "class",
           and/or "merge" columns, which help associate region numbers in the images / regionprops & intensity csvs with the
           biological labels in the classifier.

       metadata (pandas dataframe):
           analogous to the metadata csv file in a standard, single-cell analysis.
           Contains the same file_name, sample_id, patient_id, and condition columns.

       Analysis_panel (pandas dataframe):
           analogous to the Analysis_panel csv file in a standard, single-cell analysis.
           For example, contains columns for antigen and marker_class.

       class_type (string):
           one of -- "premerge", "merged" -- whether the outputs of the classifier are before or after merging
           (relevant for what column in classifier_df is used as the class, "merging" or "class").

   Key Attributes:
       data (anndata.AnnData):
           the data, with data.X being a numpy array containing the channel information per "event" (per class per image),
           data.obs being derived from the inputted metadata, and data.var being derived from the Analysis_panel.

       class_labels (pandas DataFrame):
           this is the inputted classifier_df, which associates the class numbers with biological labels.

       directory (str):
           The path to the folder where the analysis is to be initialized. Used to set up directories (such as save_dir,
           data_table_dir), to export some files (input_tables_to_csv), and to find the expected intensities / regionprops
           csv files when loading the data.

       save_dir (str):
           The path to where plots are saved by this class (when filename is provided in plotting functions).

       data_table_dir (str):
           The path to where data tables are saved by this class (when filename is provided to methods that produce
           dataframes, such as statistics / exports).


   .. py:attribute:: directory
      :value: ''


   .. py:attribute:: class_labels


   .. py:attribute:: _metadata


   .. py:attribute:: _panel


   .. py:attribute:: save_dir
      :value: '/python_plots'


   .. py:attribute:: data_table_dir
      :value: '/Data_tables'


   .. py:attribute:: percent_areas
      :value: None


   .. py:method:: _load(csv: Union[pandas.DataFrame, None] = None, arcsinh_cofactor: int = 5) -> None

      Helper to the __init__ method: performs the loading and shaping of data during the initial load.


   .. py:method:: input_tables_to_csv() -> None

      Saves the primary csv files of this class to disk, inside the self.directory folder.


   .. py:method:: plot_percent_areas(filename: Union[str, None] = None, N_column: str = 'sample_id', calculate_only: bool = False) -> matplotlib.pyplot.figure

      Plots boxplots of the percent of each class in each image, showing and comparing the distributions between conditions.
      Returns the plot as a matplotlib figure.

      Args:
          filename (str or None):
              If filename is specified, this will export the plot as a PNG file to self.save_dir/{filename}.png

          N_column (str):
              The aggregating group for the data. The individual dots of the distribution in the boxplot
              are determined by the unique groups in this column.

          calculate_only (bool):
              Default == False. If True (& self.percent_areas is None), this method will not return anything, but will
              instead calculate the % of each pixel class in each ROI. The result is saved to self.percent_areas,
              where it can be plotted by this function later. This saves time, since the calculation only has to be done once.

      Returns:
          matplotlib.pyplot figure or None (returns None only if calculate_only == True, and no prior calculation of the
          % areas has been done)


   .. py:method:: plot_distribution_exprs(unique_class: Union[str, int], plot_type: str, N_column: str = 'sample_id', marker_class: str = 'All', filename: Union[str, None] = None) -> seaborn.FacetGrid

      Plots a Bar or Violin plot from the distribution of marker expression / %class in each sample_id, comparing conditions.

      Args:
          unique_class (string or integer):
              Indicates which pixel class to plot antigen expressions for.

          N_column (str):
              Indicates which column in the data will serve as the aggregating column for creating the distribution
              in the final plot.

          plot_type (string):
              'Violin' or 'Bar' -- determines what kind of plot is created.

          marker_class (string):
              'All', 'type', 'state', or 'none' (or any other marker_class in self.data.var['marker_class']).
              Determines which antigens are used in the plot.
              By default, every antigen, regardless of marker_class, is used ('All').

          filename:
              If specified, this function will additionally export the plot as a PNG file to self.save_dir/{filename}.png

      Returns:
          the plot as a seaborn FacetGrid (FacetGrid.figure --> a matplotlib figure)


   .. py:method:: whole_marker_exprs_ANOVA(marker_class: str = 'All', groupby_column: str = 'class', N_column: str = 'sample_id', variable: str = 'condition', statistic: str = 'mean', area: bool = True) -> pandas.DataFrame

      Calculates statistics comparing the conditions in the experiment using ANOVA on the expression of [marker_class] markers
      and the %area of each class.

      Args:
          marker_class (string):
              which markers / antigens to test by ANOVA. One of -- "All", "type", "state", "none".

          groupby_column (string):
              which column the data will be grouped by for the purpose of running separate ANOVAs for each group
              (as this is whole-class analysis, should always be "class").

          N_column (string):
              The column in the data that defines the groups of the statistical test (i.e., the 'N' groups that
              contribute to the degrees of freedom in the test).

          variable (string):
              which column in self.data.obs will be treated as the column containing condition / group information.

          statistic (string):
              one of -- "ANOVA", "Kruskal" -- which statistical test (ANOVA, Kruskal-Wallis), and what aggregate statistic
              (mean/std or median/IQR, respectively) is calculated & displayed in the final dataframe.

          area (bool):
              whether to also calculate an ANOVA comparing the %area of each class between the conditions (default is True).

      Returns:
          (pandas dataframe): the pandas dataframe containing the statistical outputs of this test.


   .. py:method:: plot_heatmap(type_of_stat: str = 'F statistic', filename: Union[str, None] = None) -> matplotlib.pyplot.figure

      Plots a statistics heatmap. If the statistic is a p value instead of an F statistic, the negative log of the statistic is plotted.


   .. py:method:: export_data(subset_columns: Union[list[str], None] = None, subset_types: Union[list[list[str]], None] = None, groupby_columns: Union[list[str], None] = None, statistic: str = 'mean', groupby_nan_handling: str = 'zero', include_marker_class_row: bool = False, untransformed: bool = False, filename: Union[str, None] = None) -> pandas.DataFrame

      Exports currently loaded data from the Analysis, from self.data. Preserves any previously performed scaling,
      dropped categories, & batch correction. Unless untransformed is True, the exported data is always
      arcsinh(data / 5) transformed.

      Can export the entirety of relevant self.data information, or export subsets of self.data, and/or export
      aggregate summary statistics for groups within the data.

      Args:
          subset_columns (list[str] or None):
              a list of strings denoting the columns to subset the dataframe's rows on (here and in other arguments,
              non-string input is attempted to be cast to strings inside the function, as well as the corresponding
              column of the data). If this or subset_types is None, no subsetting occurs.

          subset_types (list[list[str]] or None):
              a list containing sub-lists of strings. The length of the upper list must be the length of the
              subset_columns list, as each sub-list contains strings corresponding to the rows to keep.
              As in: if subset_columns = ['column1', 'column2'] and subset_types = [['type2', 'type6'], ['typeB', 'typeZ']],
              then rows of type2 / type6 in column1 will be kept, and similarly rows of typeB / typeZ in column2.
              When > 1 columns / conditions are subsetted on, as in the above example, the rows that are kept are the
              union of all the subsetting conditions WITHIN a given column, but the intersection BETWEEN what is kept
              from each column. So in the above example, all rows of column1 == type2/6 that also have
              column2 == typeB/Z are the rows that are maintained.

          groupby_columns (list[str] or None):
              A list of strings indicating what columns of the data to groupby. If None, then grouping is not performed.
              Used like this: self.data.obs.groupby(groupby_columns), but on a dataframe containing the data.X values as well.

          statistic (str):
              Possible values: 'mean', 'median', 'sum', 'std', 'count'. Denotes the pandas groupby method to be used after
              grouping (ignored if groupby_columns is None). Numeric methods (mean, median, sum, std) are only applied
              to numeric columns, so only those columns + the groupby columns will be in the final dataframe / csv.

          groupby_nan_handling (str):
              'zero' or 'drop' -- when grouping the data, whether to drop NaNs, which usually represent non-existent
              category combinations, or to convert NaNs to zeros. Any other value of this parameter will cause NaNs
              to be left as-is in the data export.
              Note that the default (and only option available in the GUI) is 'zero', which converts ALL NaN values to 0,
              while the 'drop' option only drops rows where EVERY numerical value is NaN.
              By default, all possible groupby_columns combinations are included in the export (even if they are not
              present in the data, such as cell types not present in every ROI). This is the source of most NaN values.
              Notably, columns in the metadata (not data.obs!) of the Analysis are given special treatment to try
              to prevent non-existent experimental categories from having data exported (for example, each
              ROI / sample_id should be associated with a single condition, not every possible condition in the dataset).

          include_marker_class_row (bool):
              Whether to include the marker_class information as a row at the bottom of the table.
              True to include this row -- useful for reimport into PalmettoBUG.
              False to not include this row -- this is probably better for import into non-PalmettoBUG software
              for analysis, or at the least the user will need to remember to remove this row before analyzing!
              When the marker_class row is included, it is encoded as integers (to prevent mixed dtype
              issues/warnings on reload): 0 = 'none', 1 = 'type', 2 = 'state'.
              Metadata columns (which have no marker_class) have this row filled with 'na'.
              NOT USED IN COMBINATION WITH GROUPING!

          untransformed (bool):
              if True, will export the untransformed (pre-arcsinh, pre-scaling, etc.) data, from self.data.uns['count'].
              Provided so that the raw data is not difficult to recover, although not expected to be used frequently.
              Default == False.

          filename (str or None):
              the name of the csv file to save the exported dataframe inside the self.data_table_dir folder.
              If None, no export occurs, and the data table is only returned.

      Returns:
          (pandas DataFrame) -- the pandas dataframe representing the exported data.

      Inputs/Outputs:
          Outputs:
              If filename is provided (is not None), then exports the data table to self.data_table_dir/filename.csv
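
The subsetting and NaN-handling rules described for ``export_data`` can be illustrated with plain pandas. The sketch below is not PalmettoBUG's implementation: the helper names ``subset_rows`` and ``group_and_summarise``, the toy table, and its column values are all hypothetical, and it assumes grouping columns behave like categoricals so that every combination appears in the output. It only mirrors the documented behaviour: union within a column, intersection between columns, then 'zero' / 'drop' handling of NaNs from non-existent group combinations.

```python
import pandas as pd


def subset_rows(df, subset_columns, subset_types):
    """Hypothetical helper: union WITHIN a column, intersection BETWEEN columns."""
    keep = pd.Series(True, index=df.index)
    for column, allowed in zip(subset_columns, subset_types):
        # Cast both the column and the requested values to strings before matching
        keep &= df[column].astype(str).isin([str(value) for value in allowed])
    return df[keep]


def group_and_summarise(df, groupby_columns, statistic="mean", nan_handling="zero"):
    """Hypothetical helper: aggregate numeric columns over every group combination."""
    work = df.copy()
    for column in groupby_columns:
        # Categorical grouping keys make pandas emit unobserved combinations too
        work[column] = work[column].astype("category")
    numeric_columns = list(work.select_dtypes("number").columns)
    out = work.groupby(groupby_columns, observed=False)[numeric_columns].agg(statistic)
    if nan_handling == "zero":
        out = out.fillna(0)          # convert ALL NaN values to 0
    elif nan_handling == "drop":
        out = out.dropna(how="all")  # drop only rows where EVERY numeric value is NaN
    # any other value leaves NaNs as-is
    return out.reset_index()


# Toy table (assumed values): the 'immune' class never occurs in 'treated'
toy = pd.DataFrame({
    "class": ["tumor", "tumor", "stroma", "stroma", "immune"],
    "condition": ["ctrl", "treated", "ctrl", "treated", "ctrl"],
    "CD45": [0.1, 0.3, 0.2, 0.4, 2.0],
})

subset = subset_rows(toy, ["class", "condition"], [["tumor", "stroma"], ["ctrl"]])
zeroed = group_and_summarise(toy, ["class", "condition"], "mean", "zero")
dropped = group_and_summarise(toy, ["class", "condition"], "mean", "drop")
as_is = group_and_summarise(toy, ["class", "condition"], "mean", "keep")

print(subset)
print(zeroed)
print(dropped)
```

In this toy table the only missing combination is immune / treated, so 'zero' reports it with a value of 0, 'drop' omits that row, and any other setting leaves it as NaN.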