Plot

Functions that generate plots

clarite.plot.histogram(data, column: str, figsize: Tuple[int, int] = (12, 5), title: Optional[str] = None, figure: Optional[matplotlib.pyplot.figure] = None, **kwargs)
Plot a histogram of the values in the given column.

Parameters

data: pd.DataFrame
The DataFrame containing data to be plotted

column: string
The name of the column that will be plotted

figsize: tuple(int, int), default (12, 5)
The figure size of the resulting plot in inches

title: string or None, default None
The title used for the plot

figure: matplotlib Figure or None, default None
Pass in an existing figure to plot to that instead of creating a new one (ignoring figsize)

**kwargs:
Other keyword arguments to pass to the histplot or catplot function of Seaborn

Returns

None

Examples
>>> import clarite
>>> from scipy import stats
>>> title = f"Discovery: Skew of BMXBMI = {stats.skew(nhanes_discovery_cont['BMXBMI']):.6}"
>>> clarite.plot.histogram(nhanes_discovery_cont, column="BMXBMI", title=title, bins=100)
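
The title in the example above embeds the skewness of the plotted column. As a rough illustration of what that number measures, the sketch below computes the biased Fisher-Pearson sample skewness (the third central moment divided by the second central moment raised to the power 1.5) using only the standard library. This is understood to match the default behaviour of `scipy.stats.skew`, but treat that as an assumption. The helper name `sample_skewness` and the toy values are invented for illustration and are not part of CLARITE.

```python
import math


def sample_skewness(values):
    """Biased Fisher-Pearson skewness: m3 / m2**1.5 (hypothetical helper)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    if m2 == 0:
        return float("nan")  # skewness is undefined for constant data
    return m3 / m2 ** 1.5


symmetric = [1.0, 2.0, 3.0, 4.0, 5.0]
right_skewed = [1.0, 1.0, 1.0, 2.0, 10.0]
print(sample_skewness(symmetric))     # 0.0 for symmetric data
print(sample_skewness(right_skewed))  # positive: long right tail
```

A value near zero suggests a roughly symmetric histogram, while a large positive or negative value suggests a long right or left tail.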

clarite.plot.distributions(data, filename: str, continuous_kind: str = 'count', nrows: int = 4, ncols: int = 3, quality: str = 'medium', variables: Optional[List[str]] = None, sort: bool = True)
Create a pdf containing histograms for each binary or categorical variable, and one of several types of plots for each continuous variable.

Parameters

data: pd.DataFrame
The DataFrame containing data to be plotted

filename: string or pathlib.Path
Name of the saved pdf file. The extension will be added automatically if it was not included.

continuous_kind: string (default=‘count’)
What kind of plots to use for continuous data. Binary and Categorical variables will always be shown with histograms. One of {‘count’, ‘box’, ‘violin’, ‘qq’}

nrows: int (default=4)
Number of rows per page

ncols: int (default=3)
Number of columns per page

quality: ‘low’, ‘medium’, or ‘high’ (default=‘medium’)
Adjusts the DPI of the plots (150, 300, or 1200, respectively)

variables: List[str] or None
Which variables to plot. If None, all variables are plotted.

sort: Boolean (default=True)
Whether or not to sort variable names

Returns

None

Examples
>>> import clarite
>>> clarite.plot.distributions(df[['female', 'occupation', 'LBX074']], filename="test")
>>> clarite.plot.distributions(df[['female', 'occupation', 'LBX074']], filename="test", continuous_kind='box')
>>> clarite.plot.distributions(df[['female', 'occupation', 'LBX074']], filename="test", continuous_kind='violin')
>>> clarite.plot.distributions(df[['female', 'occupation', 'LBX074']], filename="test", continuous_kind='qq')
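
The `filename`, `nrows`, `ncols`, and `quality` parameters together determine the output file name, how many pages it has, and its resolution. The sketch below shows one way that bookkeeping could work. The DPI values come from the parameter description above; the helper `plan_pdf`, the way the `.pdf` extension is appended, and the page-count arithmetic are assumptions made for illustration, not CLARITE's actual implementation.

```python
import math
from pathlib import Path

# DPI values as listed for the `quality` parameter above
QUALITY_DPI = {"low": 150, "medium": 300, "high": 1200}


def plan_pdf(filename, n_variables, nrows=4, ncols=3, quality="medium"):
    """Hypothetical helper: return (output path, dpi, page count)."""
    if quality not in QUALITY_DPI:
        raise ValueError(f"quality must be one of {sorted(QUALITY_DPI)}")
    path = Path(filename)
    if path.suffix.lower() != ".pdf":
        # Append rather than replace, so "results.v2" becomes "results.v2.pdf"
        path = path.with_name(path.name + ".pdf")
    plots_per_page = nrows * ncols
    n_pages = math.ceil(n_variables / plots_per_page)
    return path, QUALITY_DPI[quality], n_pages


print(plan_pdf("test", n_variables=3))
print(plan_pdf("report.pdf", n_variables=25, quality="high"))
```

With the default 4 x 3 grid, 25 variables would need three pages under this assumption, since each page holds at most 12 plots.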

clarite.plot.manhattan(dfs: Dict[str, pandas.core.frame.DataFrame], categories: Optional[Dict[str, str]] = None, bonferroni: Optional[float] = 0.05, fdr: Optional[float] = None, num_labeled: int = 3, label_vars: Optional[List[str]] = None, figsize: Tuple[int, int] = (12, 6), dpi: int = 300, title: Optional[str] = None, figure: Optional[matplotlib.pyplot.figure] = None, colors: List[str] = ['#53868B', '#4D4D4D'], background_colors: List[str] = ['#EBEBEB', '#FFFFFF'], filename: Optional[str] = None, return_figure: bool = False)
Create a Manhattan-like plot for one or more sets of EWAS results

Parameters

dfs: dictionary (string: DataFrame)
Dictionary of dataset names to pandas dataframes of ewas results (requires certain columns)

categories: dictionary (string: string) or None
A dictionary mapping each variable name to a category name for optional grouping

bonferroni: float or None (default 0.05)
Show a cutoff line at the raw p-value corresponding to the given Bonferroni-corrected p-value (None for no line)

fdr: float or None (default None)
Show a cutoff line at the raw p-value corresponding to the given FDR (None for no line)

num_labeled: int, default 3
Label the top <num_labeled> results with the variable name

label_vars: list of strings, default None
Label the named variables (or pass None to skip labeling this way)

figsize: tuple(int, int), default (12, 6)
The figure size of the resulting plot in inches

dpi: int, default 300
The figure dots-per-inch

title: string or None, default None
The title used for the plot

figure: matplotlib Figure or None, default None
Pass in an existing figure to plot to that instead of creating a new one (ignoring figsize and dpi)

colors: List(string, string), default [“#53868B”, “#4D4D4D”]
A list of colors to use for alternating categories (must be same length as ‘background_colors’)

background_colors: List(string, string), default [“#EBEBEB”, “#FFFFFF”]
A list of background colors to use for alternating categories (must be same length as ‘colors’)

filename: string or None, default None
If provided, a copy of the plot will be saved to the specified file instead of being shown

return_figure: boolean, default False
If True, return figure instead of showing or saving the plot. Useful to customize the plot

Returns

figure: matplotlib Figure or None
If return_figure, returns a matplotlib Figure object. Else returns None

Examples
>>> clarite.plot.manhattan({'discovery':disc_df, 'replication':repl_df}, categories=data_categories, title="EWAS Results")
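
The `bonferroni` and `fdr` parameters each place a cutoff line at a raw p-value derived from a corrected significance level. The sketch below shows the standard arithmetic behind both: the Bonferroni threshold is the level divided by the number of tests, and the Benjamini-Hochberg threshold is the largest sorted p-value p(k) satisfying p(k) <= (k / m) * q. It assumes the plot uses a -log10(p) axis, as Manhattan plots typically do. The helper names and toy p-values are hypothetical, and CLARITE may count tests or define the FDR line differently.

```python
import math


def bonferroni_threshold(pvalues, alpha=0.05):
    """Raw p-value cutoff for a Bonferroni-corrected level alpha."""
    return alpha / len(pvalues)


def bh_threshold(pvalues, q=0.05):
    """Largest raw p-value rejected by Benjamini-Hochberg at FDR q, or None."""
    m = len(pvalues)
    threshold = None
    for rank, p in enumerate(sorted(pvalues), start=1):
        if p <= rank / m * q:
            threshold = p  # keep the largest p-value that passes
    return threshold


# Toy p-values, made up for illustration
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
bonf = bonferroni_threshold(pvals)
fdr = bh_threshold(pvals)
print(bonf, -math.log10(bonf))  # cutoff and its height on a -log10 axis
print(fdr)                      # None would mean nothing passes at this FDR
```

With these ten toy values, only the two smallest p-values pass the Benjamini-Hochberg condition, so the FDR line would sit at the second of them.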

clarite.plot.manhattan_fdr(dfs: Dict[str, pandas.core.frame.DataFrame], categories: Optional[Dict[str, str]] = None, cutoff: Optional[float] = 0.05, num_labeled: int = 3, label_vars: Optional[List[str]] = None, figsize: Tuple[int, int] = (12, 6), dpi: int = 300, title: Optional[str] = None, figure: Optional[matplotlib.pyplot.figure] = None, colors: List[str] = ['#53868B', '#4D4D4D'], background_colors: List[str] = ['#EBEBEB', '#FFFFFF'], filename: Optional[str] = None, return_figure: bool = False)
Create a Manhattan-like plot for one or more sets of EWAS results using FDR significance

Parameters

dfs: dictionary (string: DataFrame)
Dictionary of dataset names to pandas dataframes of ewas results (requires certain columns)

categories: dictionary (string: string) or None
A dictionary mapping each variable name to a category name for optional grouping

cutoff: float or None (default 0.05)
The pvalue to draw the FDR significance line at (None for no line)

num_labeled: int, default 3
Label the top <num_labeled> results with the variable name

label_vars: list of strings, default None
Label the named variables (or pass None to skip labeling this way)

figsize: tuple(int, int), default (12, 6)
The figure size of the resulting plot in inches

dpi: int, default 300
The figure dots-per-inch

title: string or None, default None
The title used for the plot

figure: matplotlib Figure or None, default None
Pass in an existing figure to plot to that instead of creating a new one (ignoring figsize and dpi)

colors: List(string, string), default [“#53868B”, “#4D4D4D”]
A list of colors to use for alternating categories (must be same length as ‘background_colors’)

background_colors: List(string, string), default [“#EBEBEB”, “#FFFFFF”]
A list of background colors to use for alternating categories (must be same length as ‘colors’)

filename: string or None, default None
If provided, a copy of the plot will be saved to the specified file instead of being shown

return_figure: boolean, default False
If True, return figure instead of showing or saving the plot. Useful to customize the plot

Returns

figure: matplotlib Figure or None
If return_figure, returns a matplotlib Figure object. Else returns None

Examples
>>> clarite.plot.manhattan_fdr({'discovery': disc_df, 'replication': repl_df},
...                            categories=data_categories, title="EWAS Results")
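
For this function the `cutoff` is compared against FDR-adjusted p-values rather than raw ones. As a minimal sketch of what such adjusted values look like, the code below applies the Benjamini-Hochberg step-up adjustment in pure Python. It assumes the EWAS results carry an adjusted column such as the `pvalue_fdr` name mentioned for `top_results`, and that the adjustment is Benjamini-Hochberg; neither is confirmed here. The helper `bh_adjust` and the toy values are hypothetical.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.5]  # toy values, made up for illustration
adj = bh_adjust(raw)
cutoff = 0.05
significant = [p <= cutoff for p in adj]
print(adj)
print(significant)  # [True, False, False, False]
```

Only the first toy result stays at or below the 0.05 cutoff after adjustment, so it is the only one that would fall on the significant side of the line under these assumptions.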

clarite.plot.manhattan_bonferroni(dfs: Dict[str, pandas.core.frame.DataFrame], categories: Optional[Dict[str, str]] = None, cutoff: Optional[float] = 0.05, num_labeled: int = 3, label_vars: Optional[List[str]] = None, figsize: Tuple[int, int] = (12, 6), dpi: int = 300, title: Optional[str] = None, figure: Optional[matplotlib.pyplot.figure] = None, colors: List[str] = ['#53868B', '#4D4D4D'], background_colors: List[str] = ['#EBEBEB', '#FFFFFF'], filename: Optional[str] = None, return_figure: bool = False)
Create a Manhattan-like plot for one or more sets of EWAS results using Bonferroni significance

Parameters

dfs: dictionary (string: DataFrame)
Dictionary of dataset names to pandas dataframes of ewas results (requires certain columns)

categories: dictionary (string: string) or None
A dictionary mapping each variable name to a category name for optional grouping

cutoff: float or None (default 0.05)
The pvalue to draw the Bonferroni significance line at (None for no line)

num_labeled: int, default 3
Label the top <num_labeled> results with the variable name

label_vars: list of strings, default None
Label the named variables (or pass None to skip labeling this way)

figsize: tuple(int, int), default (12, 6)
The figure size of the resulting plot in inches

dpi: int, default 300
The figure dots-per-inch

title: string or None, default None
The title used for the plot

figure: matplotlib Figure or None, default None
Pass in an existing figure to plot to that instead of creating a new one (ignoring figsize and dpi)

colors: List(string, string), default [“#53868B”, “#4D4D4D”]
A list of colors to use for alternating categories (must be same length as ‘background_colors’)

background_colors: List(string, string), default [“#EBEBEB”, “#FFFFFF”]
A list of background colors to use for alternating categories (must be same length as ‘colors’)

filename: string or None, default None
If provided, a copy of the plot will be saved to the specified file instead of being shown

return_figure: boolean, default False
If True, return figure instead of showing or saving the plot. Useful to customize the plot

Returns

figure: matplotlib Figure or None
If return_figure, returns a matplotlib Figure object. Else returns None

Examples
>>> clarite.plot.manhattan_bonferroni({'discovery': disc_df, 'replication': repl_df},
...                                   categories=data_categories, title="EWAS Results")
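
Here the `cutoff` applies to Bonferroni-adjusted p-values. The sketch below shows the usual adjustment (multiply by the number of tests and cap at 1) and where the points and the cutoff line would land on a -log10 axis. The -log10 axis and the idea that the function plots a pre-adjusted column such as `pvalue_bonferroni` are assumptions; `bonferroni_adjust` and the toy values are hypothetical.

```python
import math


def bonferroni_adjust(pvalues):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


raw = [0.001, 0.02, 0.4]  # toy values, made up for illustration
adj = bonferroni_adjust(raw)
cutoff = 0.05
line_height = -math.log10(cutoff)        # where the cutoff line would sit
heights = [-math.log10(p) for p in adj]  # point heights on the same axis
above_line = [h >= line_height for h in heights]
print(adj)
print(above_line)  # [True, False, False]
```

A raw p-value of 0.02 looks significant on its own, but with three tests its adjusted value of about 0.06 falls below the line.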

clarite.plot.top_results(ewas_result: pandas.core.frame.DataFrame, pvalue_name: str = 'pvalue', cutoff: Optional[float] = 0.05, num_rows: int = 20, figsize: Optional[Tuple[int, int]] = None, dpi: int = 300, title: Optional[str] = None, figure: Optional[matplotlib.pyplot.figure] = None, filename: Optional[str] = None)
Create a dotplot for EWAS Results showing pvalues and beta coefficients

Parameters

ewas_result: DataFrame
EWAS Result to plot

pvalue_name: str
‘pvalue’, ‘pvalue_fdr’, or ‘pvalue_bonferroni’

cutoff: float or None (default 0.05)
A vertical line is drawn in the pvalue column to show a significance cutoff

num_rows: int (default 20)
How many rows to show in the plot

figsize: tuple(int, int) or None, default None
The figure size of the resulting plot in inches

dpi: int, default 300
The figure dots-per-inch

title: string or None, default None
The title used for the plot

figure: matplotlib Figure or None, default None
Pass in an existing figure to plot to that instead of creating a new one (ignoring figsize and dpi)

filename: string or None, default None
If provided, a copy of the plot will be saved to the specified file instead of being shown

Returns

None

Examples
>>> clarite.plot.top_results(ewas_result)
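
The `pvalue_name` and `num_rows` parameters decide which results appear in the dotplot. The sketch below mimics that selection on plain dictionaries: sort by the chosen p-value column and keep the first `num_rows` entries. The real function takes a DataFrame, so this is only an analogy; the helper `select_top_rows`, the `beta` and `variable` keys, and the toy records are all hypothetical.

```python
def select_top_rows(rows, pvalue_name="pvalue", num_rows=20):
    """Hypothetical helper: keep the num_rows rows with the smallest p-values."""
    allowed = {"pvalue", "pvalue_fdr", "pvalue_bonferroni"}
    if pvalue_name not in allowed:
        raise ValueError(f"pvalue_name must be one of {sorted(allowed)}")
    return sorted(rows, key=lambda row: row[pvalue_name])[:num_rows]


# Toy records; the variable names and values are made up
results = [
    {"variable": "A", "beta": 0.30, "pvalue": 0.020, "pvalue_fdr": 0.040},
    {"variable": "B", "beta": -0.10, "pvalue": 0.001, "pvalue_fdr": 0.004},
    {"variable": "C", "beta": 0.05, "pvalue": 0.600, "pvalue_fdr": 0.600},
    {"variable": "D", "beta": 0.22, "pvalue": 0.010, "pvalue_fdr": 0.030},
]
top = select_top_rows(results, pvalue_name="pvalue_fdr", num_rows=2)
print([row["variable"] for row in top])  # ['B', 'D']
```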