pr_pseudotime result triming #103

x1han · 2023-04-29T10:36:24Z

Dear Palantir developers, @ManuSetty
Palantir is an excellent tool for trajectory analysis, but I have a problem now.
My input is a published dataset containing Oligodendrocytes lineage cells(Cell lines with precise differentiation characteristics, GSE75330), but the results obtained were not ideal.
These are codes I used:

sc.pp.normalize_total(marques_ref_adata_sub_test, target_sum=1e4)
sc.pp.log1p(marques_ref_adata_sub_test)
sc.pp.highly_variable_genes(marques_ref_adata_sub_test)
sc.pp.pca(marques_ref_adata_sub_test)
# Run diffusion maps
pca_projections = pd.DataFrame(marques_ref_adata_sub_test.obsm['X_pca'], index=marques_ref_adata_sub_test.obs_names)
dm_res = palantir.utils.run_diffusion_maps(pca_projections, n_components=5)
ms_data = palantir.utils.determine_multiscale_space(dm_res)
sc.pp.neighbors(marques_ref_adata_sub_test)
sc.tl.umap(marques_ref_adata_sub_test)

start_cell = marques_ref_adata_sub_test[marques_ref_adata_sub_test.obs['cell_type'].isin(['OPC']), :].obs.index.tolist()[0]
terminal_states = pd.Series(['MOL2', 'MOL3'], index = [marques_ref_adata_sub_test[marques_ref_adata_sub_test.obs['cell_type'].isin(['MOL2']), :].obs.index.tolist()[0], marques_ref_adata_sub_test[marques_ref_adata_sub_test.obs['cell_type'].isin(['MOL3']), :].obs.index.tolist()[0]])

pr_res = palantir.core.run_palantir(ms_data, early_cell = start_cell, num_waypoints=500, terminal_states = terminal_states.index, use_early_cell_as_start = True)
pr_res.branch_probs.columns = terminal_states[pr_res.branch_probs.columns]
palantir.plot.plot_palantir_results(pr_res, umap)
marques_ref_adata_sub_test.obs['pr_pseudotime'] = pd.DataFrame(pr_res.pseudotime).reindex(marques_ref_adata_sub_test.obs_names)
scv.pl.scatter(marques_ref_adata_sub_test, basis='umap', color = ['pr_pseudotime', 'cell_type'], color_map='gnuplot2', legend_loc='on data')

and the results:

Sampling and flocking waypoints...
Time for determining waypoints: 0.004442989826202393 minutes
Determining pseudotime...
Shortest path distances using 30-nearest neighbor graph...
findfont: Font family ['Raleway'] not found. Falling back to DejaVu Sans.
findfont: Font family ['Raleway'] not found. Falling back to DejaVu Sans.
findfont: Font family ['Raleway'] not found. Falling back to DejaVu Sans.
findfont: Font family ['Raleway'] not found. Falling back to DejaVu Sans.
findfont: Font family ['Raleway'] not found. Falling back to DejaVu Sans.
findfont: Font family ['Raleway'] not found. Falling back to DejaVu Sans.
findfont: Font family ['Raleway'] not found. Falling back to DejaVu Sans.
findfont: Font family ['Raleway'] not found. Falling back to DejaVu Sans.
findfont: Font family ['Raleway'] not found. Falling back to DejaVu Sans.
findfont: Font family ['Raleway'] not found. Falling back to DejaVu Sans.
findfont: Font family ['Raleway'] not found. Falling back to DejaVu Sans.
Time for shortest paths: 0.8781302213668823 minutes
Iteratively refining the pseudotime...
Correlation at iteration 1: 0.9999
Correlation at iteration 2: 1.0000
Entropy and branch probabilities...
Markov chain construction...
Computing fundamental matrix and absorption probabilities...
Project results to all cells...

It seems that Palantir's results appear to have misidentified the NFOL2 cluster as a terminally differentiated state, although this was not actually the case, meanwhile, the result of scanpy.tl.dpt may be the right one.

I have to say that Palantir is quite an excellent tool and I am quite fond of it. Therefore, in this context, I have two concerns.

How can I remove the warning 'findfont: Font family ['Raleway'] not found. Falling back to DejaVu Sans.', (This occurred an excessive number of times. )
Is there any parameters that can adjust the pseudotime result to the right one: OPC-COP-NFOL-MFOL-MOL. (Given that the dpt results appear to align with the actual outcomes. )

Hoping for your reply!

x1han · 2023-04-29T10:41:56Z

Furthermore, upon examining the distribution of these cell types on the diffusion map, I found that they appeared to cluster in the anticipated pattern. However, the final results were not consistent with this observation.

palantir.plot.highlight_cells_on_tsne(ms_data.iloc[:, :2], marques_ref_adata_sub_test[marques_ref_adata_sub_test.obs['cell_type'].isin(['OPC']), :].obs.index.tolist())
palantir.plot.highlight_cells_on_tsne(ms_data.iloc[:, :2], marques_ref_adata_sub_test[marques_ref_adata_sub_test.obs['cell_type'].isin(['COP']), :].obs.index.tolist())
palantir.plot.highlight_cells_on_tsne(ms_data.iloc[:, :2], marques_ref_adata_sub_test[marques_ref_adata_sub_test.obs['cell_type'].isin(['NFOL1', 'NFOL2']), :].obs.index.tolist())
palantir.plot.highlight_cells_on_tsne(ms_data.iloc[:, :2], marques_ref_adata_sub_test[marques_ref_adata_sub_test.obs['cell_type'].isin(['MFOL1', 'MFOL2']), :].obs.index.tolist())
palantir.plot.highlight_cells_on_tsne(ms_data.iloc[:, :2], marques_ref_adata_sub_test[marques_ref_adata_sub_test.obs['cell_type'].isin(['MOL1', 'MOL2', 'MOL3', 'MOL4', 'MOL5']), :].obs.index.tolist())

ManuSetty · 2023-05-01T04:58:44Z

Hello

Can you please show me a plot of umaps colored by the selected diffusion components ?

x1han · 2023-05-01T05:16:35Z

sorry i didn't understand what you mean, can you tell me in detail what plot I need to use, or you can write a simple code for me to understand, thank you.

ManuSetty · 2023-05-01T17:54:36Z

My apologies that it wasn't clear. I meant these plots

The snippet is in the tutorial notebook here: https://github.com/dpeerlab/Palantir/blob/master/notebooks/Palantir_sample_notebook.ipynb

x1han · 2023-05-03T04:24:18Z

My apologies, I did not realize this was part of the tutorial content.

ManuSetty · 2023-05-07T22:19:16Z

Sorry for the delayed response - I think the kernel computation in Palantir is running into some issues here. You could try using the scanpy default kernel. You can do this using dm_res = palantir.utils.run_diffusion_maps(ad.obsp['connectivities']). I believe this will help fix your problem.

x1han · 2023-05-08T06:03:13Z

Thanks for the advice! There are the code and outputs:

sc.pp.normalize_total(marques_ref_adata_sub_test2, target_sum=1e4)
sc.pp.log1p(marques_ref_adata_sub_test2)
sc.pp.highly_variable_genes(marques_ref_adata_sub_test2)
sc.pp.pca(marques_ref_adata_sub_test2)
sc.pp.neighbors(marques_ref_adata_sub_test2)
sc.tl.umap(marques_ref_adata_sub_test2)
umap2 = pd.DataFrame(marques_ref_adata_sub_test2.obsm['X_umap'], index=marques_ref_adata_sub_test2.obs_names)

# Run diffusion maps
connect_projections = pd.DataFrame(marques_ref_adata_sub_test2.obsp['connectivities'].todense(), index=marques_ref_adata_sub_test2.obs_names)
dm_res2 = palantir.utils.run_diffusion_maps(connect_projections)
ms_data2 = palantir.utils.determine_multiscale_space(dm_res2)

palantir.plot.plot_diffusion_components(umap2, dm_res2)

palantir.plot.highlight_cells_on_tsne(ms_data2.iloc[:, :2], marques_ref_adata_sub[marques_ref_adata_sub.obs['cell_type'].isin(['OPC']), :].obs.index.tolist())
palantir.plot.highlight_cells_on_tsne(ms_data2.iloc[:, :2], marques_ref_adata_sub[marques_ref_adata_sub.obs['cell_type'].isin(['COP']), :].obs.index.tolist())
palantir.plot.highlight_cells_on_tsne(ms_data2.iloc[:, :2], marques_ref_adata_sub[marques_ref_adata_sub.obs['cell_type'].isin(['NFOL1', 'NFOL2']), :].obs.index.tolist())
palantir.plot.highlight_cells_on_tsne(ms_data2.iloc[:, :2], marques_ref_adata_sub[marques_ref_adata_sub.obs['cell_type'].isin(['MFOL1', 'MFOL2']), :].obs.index.tolist())
palantir.plot.highlight_cells_on_tsne(ms_data2.iloc[:, :2], marques_ref_adata_sub[marques_ref_adata_sub.obs['cell_type'].isin(['MOL1', 'MOL2', 'MOL3', 'MOL4', 'MOL5']), :].obs.index.tolist())

start_cell = marques_ref_adata_sub_test2[marques_ref_adata_sub_test2.obs['cell_type'].isin(['OPC']), :].obs.index.tolist()[0]
terminal_states = pd.Series(['MOL2', 'MOL3'], index = [marques_ref_adata_sub_test2[marques_ref_adata_sub_test2.obs['cell_type'].isin(['MOL2']), :].obs.index.tolist()[0], marques_ref_adata_sub_test2[marques_ref_adata_sub_test2.obs['cell_type'].isin(['MOL3']), :].obs.index.tolist()[0]])
pr_res2 = palantir.core.run_palantir(ms_data2, early_cell = start_cell, num_waypoints=500, terminal_states = terminal_states.index, use_early_cell_as_start = True)
pr_res2.branch_probs.columns = terminal_states[pr_res2.branch_probs.columns]
palantir.plot.plot_palantir_results(pr_res2, umap2)
marques_ref_adata_sub_test2.obs['pr_pseudotime'] = pd.DataFrame(pr_res2.pseudotime).reindex(marques_ref_adata_sub_test2.obs_names)
scv.pl.scatter(marques_ref_adata_sub_test2, basis='umap', color = ['pr_pseudotime', 'cell_type'], color_map='gnuplot2', legend_loc='on data')

When I run python dm_res = palantir.utils.run_diffusion_maps(ad.obsp['connectivities']), the dm_res has no index, so I run python connect_projections = pd.DataFrame(marques_ref_adata_sub_test2.obsp['connectivities'].todense(),index=marques_ref_adata_sub_test2.obs_names); dm_res2 = palantir.utils.run_diffusion_maps(connect_projections) instead, I am not sure if the changes I made to the code will affect the final results.
I am not sure if these results meet your expectations, but I think these results are strange.

ManuSetty · 2023-05-09T01:27:06Z

Sorry I wasnt clear. You will need to run dm_res = palantir.utils.run_diffusion_maps(ad.obsp['connectivities'])

Palantir will recognize the sparse matrix and compute the diffusion maps directly on this kernel. To get around the issue of index, you can use dm_res['EigenVectors'].index = ad.obs_names This should help fix the issue.

x1han · 2023-05-09T09:24:56Z

My apologies, the results do not seem to differ much from those obtained using PCA. I have uploaded the data to the network drive. Perhaps when you have the time, you could test to determine exactly where the issue lies. If you have any findings, please inform me of the details. Thank you very much.
https://drive.google.com/file/d/1di9xWjhUYDEz0aQdSf1wCNYX5c-fJ6EU/view?usp=sharing

katosh · 2024-07-16T22:32:08Z

Hi @x1han, I am revisiting this old issue in an attempt to resolve the mystery.

When looking at your data, and recomputing the UMAP, I noticed this bridge of cells connecting the OPC and the MOL1 cluster:

This indicates, that there is a connection in knn-graph that also serves as the basis for the diffusion map and pseudotime computation. To fix this I wanted to remove all cells that connect the start and end of the differentiation trajectory. For this, I first defined the two parts based on the cell type annotation:

import numpy as np

c1 = ["OPC", "COP", "NFOL1"]
c2 = set(ad.obs["cell_type"]).difference(c1 + ["NFOL2"])

c1_mask = ad.obs["cell_type"].isin(c1).values
c2_mask = ad.obs["cell_type"].isin(c2).values
ad.obs["split"] = np.where(c1_mask, "c1", np.where(c2_mask, "c2", "neither"))

sc.pl.embedding(ad, basis="umap", color="split")

And then I selected all cells that directly connect the two parts in the knn-graph:

def is_connecting(i):
    neighbors = ad.obsp["connectivities"].getrow(i).indices
    if c1_mask[i]:
        return np.any(c2_mask[neighbors])
    elif c2_mask[i]:
        return np.any(c1_mask[neighbors])
    else:
        return False


ad.obs["connecting_cells"] = [is_connecting(i) for i in range(ad.n_obs)]

palantir.plot.highlight_cells_on_umap(ad, "connecting_cells")
plt.show()

new_ad = ad[~ad.obs["connecting_cells"], :].copy()

Then I reran Palantir with the same number of neighbors as used in sc.pp.neighbors(new_ad), using C1.1771017.031.F08 as a start cell, and C1.1772096.156.G08 and C1.1772117.034.D06 for the "MOL2" and "MOL5" terminal respectively:

palantir.utils.run_diffusion_maps(new_ad, knn=15)
palantir.utils.determine_multiscale_space(new_ad)
start_cell = palantir.utils.early_cell(new_ad, "OPC", "cell_type")
terminal_states = palantir.utils.find_terminal_states(ad, ["MOL2", "MOL5"], "cell_type")
pr_res = palantir.core.run_palantir(new_ad, start_cell, terminal_states=terminal_states)
palantir.plot.plot_palantir_results(new_ad, s=3)
plt.show()

That seems to look much more like what you seem to expect.

I hope this helps. Please let me know if you have any questions about this!

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

pr_pseudotime result triming #103

pr_pseudotime result triming #103

x1han commented Apr 29, 2023

x1han commented Apr 29, 2023

ManuSetty commented May 1, 2023

x1han commented May 1, 2023

ManuSetty commented May 1, 2023

x1han commented May 3, 2023

ManuSetty commented May 7, 2023

x1han commented May 8, 2023 •

edited

Loading

ManuSetty commented May 9, 2023

x1han commented May 9, 2023

katosh commented Jul 16, 2024 •

edited

Loading

pr_pseudotime result triming #103

pr_pseudotime result triming #103

Comments

x1han commented Apr 29, 2023

x1han commented Apr 29, 2023

ManuSetty commented May 1, 2023

x1han commented May 1, 2023

ManuSetty commented May 1, 2023

x1han commented May 3, 2023

ManuSetty commented May 7, 2023

x1han commented May 8, 2023 • edited Loading

ManuSetty commented May 9, 2023

x1han commented May 9, 2023

katosh commented Jul 16, 2024 • edited Loading

x1han commented May 8, 2023 •

edited

Loading

katosh commented Jul 16, 2024 •

edited

Loading