Aggregation of expression data ends up in matrix filled with NaN's #57

Open
zvittorio opened this issue Jan 30, 2024 · 5 comments

Comments

@zvittorio

zvittorio commented Jan 30, 2024

Dear SEACells team,

First of all, thank you for such an interesting and versatile tool.
I have recently been using it to create metacells from a scRNA-seq dataset whose cells come from different studies and, in turn, from different patients.
I wanted to repeat the workflow shown for the COVID dataset integration, but I am still at the first round of metacells.
I am running the basic pipeline shown in notebooks/SEACell_computation.ipynb iteratively across the samples, using the soft assignment to bin the cells.
Everything seems to run smoothly, except for some samples whose data show no apparent difference from the others. In those cases, the expression matrix of the metacell object (X slot) is completely filled with NaNs, even though the X slot of the starting anndata object, the anndata layer used for aggregation, and X_pca are not.
This is an example:

  • adata.X.toarray() :
    image

  • adata.layers['norm_counts'].toarray() :
    image

  • adata.obsm['X_pca'] :
    image

  • whereas this is the output of metacell.X.toarray() :
    image
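The visual checks above can also be done programmatically, which is easier to repeat across many samples. The following is a minimal sketch, not part of SEACells: `count_nonfinite` is a hypothetical helper name, and the two small matrices are toy stand-ins for slots such as `adata.X` or `adata.layers['norm_counts']`.

```python
import numpy as np
from scipy import sparse


def count_nonfinite(matrix):
    """Count NaN or infinite entries in a dense array or a SciPy sparse matrix."""
    if sparse.issparse(matrix):
        # Implicit zeros are finite, so only the stored values need checking.
        values = matrix.tocoo().data
    else:
        values = np.asarray(matrix)
    return int((~np.isfinite(values)).sum())


# Toy stand-ins for the real AnnData slots (hypothetical data).
dense_ok = np.array([[1.0, 0.0], [2.0, 3.0]])
sparse_bad = sparse.csr_matrix(np.array([[1.0, np.nan], [0.0, 4.0]]))

print("dense_ok:", count_nonfinite(dense_ok))
print("sparse_bad:", count_nonfinite(sparse_bad))
```

Running the same helper on the input layer, on X_pca, and on the metacell X slot for each sample would show at which step the non-finite values first appear.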

I have also inspected the figures produced in the workflow, but none of them looks abnormal as far as I can tell. Should I pay attention to one of them specifically in this case? If so, what should I look for?

Finally, this is the code I have used for producing the metacell object:

from math import ceil

import scanpy as sc
import SEACells

# adata_big, rerun_these, rerun_dict, build_kernel_on and n_waypoint_eigs
# are defined earlier in the session.
for sample in rerun_these:
    print("Analyzing", sample)
    ad_tmp = adata_big[adata_big.obs['Sample'] == sample].copy()

    # one metacell per 75 cells, rounded up
    n_SEACells = ceil(ad_tmp.n_obs / 75)

    # renormalize from the raw counts and keep the normalized values in a layer
    ad_tmp.X = ad_tmp.layers['counts'].copy()
    sc.pp.normalize_total(ad_tmp, target_sum=1e4)
    ad_tmp.layers['norm_counts'] = ad_tmp.X.copy()

    # log-transform and rerun PCA
    sc.pp.log1p(ad_tmp)
    sc.pp.pca(ad_tmp, n_comps=50)

    model = SEACells.core.SEACells(ad_tmp,
                                   build_kernel_on=build_kernel_on,
                                   n_SEACells=n_SEACells,
                                   n_waypoint_eigs=n_waypoint_eigs,
                                   convergence_epsilon=1e-5)

    model.construct_kernel_matrix()
    M = model.kernel_matrix

    model.initialize_archetypes()

    model.fit(min_iter=10, max_iter=1000)

    # aggregate the normalized counts using the soft assignment matrix
    SEACell_soft_ad = SEACells.core.summarize_by_soft_SEACell(
        ad_tmp, model.A_, celltype_label='celltype_col',
        summarize_layer='norm_counts', minimum_weight=0.05)

    rerun_dict[sample] = SEACell_soft_ad
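One possibility worth ruling out, stated here as an assumption and not something confirmed in this thread: if the soft weights are thresholded at `minimum_weight` and then renormalized, any row whose weights all fall below the threshold sums to zero, and the 0/0 division yields NaN. The sketch below shows only that arithmetic on a toy matrix in plain NumPy. The function names are hypothetical, this is not the SEACells implementation, and the orientation of the real `model.A_` (which axis indexes metacells) would need to be checked before applying it.

```python
import numpy as np


def rows_emptied_by_threshold(weights, minimum_weight=0.05):
    """Return indices of rows whose weights all fall below minimum_weight."""
    weights = np.asarray(weights, dtype=float)
    kept = np.where(weights >= minimum_weight, weights, 0.0)
    return np.flatnonzero(kept.sum(axis=1) == 0)


def threshold_and_normalize(weights, minimum_weight=0.05):
    """Zero out small weights, then rescale each row to sum to 1."""
    weights = np.asarray(weights, dtype=float)
    kept = np.where(weights >= minimum_weight, weights, 0.0)
    # Rows that were emptied divide 0 by 0 and become NaN.
    with np.errstate(invalid="ignore", divide="ignore"):
        return kept / kept.sum(axis=1, keepdims=True)


# Toy weight matrix (hypothetical data); the middle row has only small weights.
toy = np.array([[0.60, 0.40, 0.00],
                [0.04, 0.03, 0.02],
                [0.10, 0.00, 0.90]])

emptied = rows_emptied_by_threshold(toy)
normalized = threshold_and_normalize(toy)
print("rows emptied by the threshold:", emptied)
print(normalized)
```

If a check like this flags rows (or, after transposing, columns) in the real assignment matrix for the failing samples only, lowering `minimum_weight` would be a reasonable experiment.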

Thank you for any help or suggestions you can provide!

Vittorio

anndata     0.7.6
scanpy      1.8.1
sinfo       0.3.4
-----
PIL                 8.2.0
SEACells            NA
backcall            0.2.0
bottleneck          1.3.2
cairo               1.20.1
cffi                1.14.5
colorama            0.4.4
cycler              0.10.0
cython_runtime      NA
dateutil            2.8.1
debugpy             1.3.0
decorator           5.0.7
fcsparser           0.2.3
h5py                3.2.1
igraph              0.9.6
ipykernel           6.0.0
ipython_genutils    0.2.0
ipywidgets          7.6.3
jedi                0.18.0
joblib              1.0.1
kiwisolver          1.3.1
leidenalg           0.8.7
llvmlite            0.36.0
loompy              3.0.7
matplotlib          3.4.2
matplotlib_inline   NA
mpl_toolkits        NA
natsort             7.1.1
ncls                0.0.67
netifaces           0.10.9
networkx            2.5.1
numba               0.53.1
numexpr             2.7.3
numpy               1.20.3
numpy_groupies      0.9.14
packaging           20.9
palantir            1.2
pandas              1.2.4
parso               0.8.2
pexpect             4.8.0
phenograph          1.5.7
pickleshare         0.7.5
pkg_resources       NA
progressbar         4.2.0
prompt_toolkit      3.0.19
psutil              5.8.0
ptyprocess          0.7.0
pycparser           2.20
pyexpat             NA
pygam               0.8.0
pygments            2.9.0
pynndescent         0.5.4
pyparsing           2.4.7
pyranges            0.0.110
pyrle               0.0.33
python_utils        NA
pytoml              NA
pytz                2021.1
scipy               1.6.3
seaborn             0.11.2
setuptools_scm      NA
simplejson          3.17.2
sitecustomize       NA
six                 1.16.0
sklearn             0.24.2
sorted_nearest      0.0.32
sphinxcontrib       NA
statsmodels         0.12.2
storemagic          NA
tables              3.6.1
tabulate            0.8.9
texttable           1.6.4
tornado             6.1
tqdm                4.61.2
traitlets           5.0.5
typing_extensions   NA
umap                0.5.1
wcwidth             0.2.5
zmq                 22.1.0
-----
IPython             7.25.0
jupyter_client      6.1.12
jupyter_core        4.7.1
notebook            6.4.0
-----
Python 3.9.5 (default, Dec 21 2022, 10:33:37)
@sitarapersad
Collaborator

Can you double check for me what the output of SEACell_ad = SEACells.core.summarize_by_SEACell(ad, SEACells_label='SEACell', summarize_layer='raw') gives you? Thanks!

@Gwennerd

Gwennerd commented Jun 6, 2024

Hi, I am running into the same problem with SEACells. I want to use the soft assignment, but I also get NaN output even though the data looks normal to me.

@Gwennerd

Can you double check for me what the output of SEACell_ad = SEACells.core.summarize_by_SEACell(ad, SEACells_label='SEACell', summarize_layer='raw') gives you? Thanks!

With my code, the result of the call you requested, SEACell_ad = SEACells.core.summarize_by_SEACell(ad, SEACells_label='SEACell', summarize_layer='raw'),

looks like this:

error_Nan

Hopefully this gives you the information you need to help me out.

Kind regards,
Gwen

@GLking123

Hi, I am running into the same problem with SEACells. I want to use the soft assignment, but I also get NaN output even though the data looks normal to me.

Hello, I ran into the same problem recently. Did you manage to solve it? Is there a good workaround? Thanks in advance for any reply.

@Gwennerd

Hi, I am running into the same problem with SEACells. I want to use the soft assignment, but I also get NaN output even though the data looks normal to me.

Hello, I ran into the same problem recently. Did you manage to solve it? Is there a good workaround? Thanks in advance for any reply.

Sadly not
