wli51
left a comment
Great work! Very informative figures.
One thing that stood out to me is whether the modified apply_pca still works with the analyses from early on. If not, you might want to adjust the function so that it is backwards compatible. Please see my comments in utils/preprocess.py.
@@ -1,6 +1,10 @@
#!/usr/bin/env python

# In[15]:
run id 15 in line 3 is crazy
     random_state=0,
     **kwargs,
-) -> pl.DataFrame:
+) -> tuple[pl.DataFrame, pl.DataFrame]:
For the sake of backwards compatibility with all the previous analyses you already have, I think it is best to add a boolean switch that controls whether the explained-variance dataframe is returned. The switch should default to False, so that by default the function returns only the PC dataframe and keeps the old return signature. Newer analyses that require the explained variance will need to set the switch to True explicitly.
Something like this:
def apply_pca(
    ...,
    return_var_explained: bool = False,
) -> pl.DataFrame | tuple[pl.DataFrame, pl.DataFrame]:
    ...
    pc_df = pl.concat(
        [
            profiles.select(meta_features),  # metadata df
            pl.DataFrame(
                principal_components, schema=pca_colnames
            ),  # PCA components df
        ],
        how="horizontal",
    )
    if return_var_explained:
        return pc_df, explained_variance_df
    else:
        return pc_df
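If the union return type turns out to be awkward for type checkers, typing.overload can tie the return type to the value of the flag. The following is only a minimal, dependency-free sketch of that pattern: apply_pca_stub, Table and VarExplained are hypothetical stand-ins (not the real apply_pca or pl.DataFrame), and the body only centers the columns as a placeholder instead of running PCA.

```python
from typing import Literal, overload

# Hypothetical stand-ins for the two pl.DataFrame results, so the sketch
# runs without polars or scikit-learn.
Table = list[dict[str, float]]
VarExplained = dict[str, float]


@overload
def apply_pca_stub(
    rows: Table, return_var_explained: Literal[False] = False
) -> Table: ...
@overload
def apply_pca_stub(
    rows: Table, return_var_explained: Literal[True]
) -> tuple[Table, VarExplained]: ...
def apply_pca_stub(rows, return_var_explained=False):
    """Placeholder transform: centers each column (this is NOT real PCA)."""
    cols = list(rows[0].keys())
    means = {c: sum(r[c] for r in rows) / len(rows) for c in cols}
    centered = [{c: r[c] - means[c] for c in cols} for r in rows]

    # Default path: same single return value as the old signature.
    if not return_var_explained:
        return centered

    # Opt-in path: also return each column's share of the total variance,
    # standing in for the explained-variance dataframe.
    variances = {
        c: sum(r[c] ** 2 for r in centered) / (len(rows) - 1) for c in cols
    }
    total = sum(variances.values())
    shares = {c: (v / total if total else 0.0) for c, v in variances.items()}
    return centered, shares


rows = [{"a": 1.0, "b": 0.0}, {"a": 3.0, "b": 0.0}]
old_style = apply_pca_stub(rows)  # existing call sites stay unchanged
new_style, var_share = apply_pca_stub(rows, return_var_explained=True)
```

With the overloads in place, a type checker should see the default call as returning a single table and only the explicit True call as returning a tuple, so older notebooks would not need to unpack anything.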
Is there a way to make the two healthy frames more distinct from the failing frames? That should make the GIF even clearer.
It also appears that the off-target features are much less distinguishable between healthy/failing and treated/DMSO. Maybe this is evidence that the on-target scores should be viewed as much more important?
Thanks for your review! Now merging.
This PR updates the UMAP and PCA plots to separate by treatment and cell state. It also updates the notebooks that perform dimensionality reduction as well as the plotting notebooks.