Differences between KOBAS-i, the Reactome website and ReactomePA (possibly caused by duplicated background genes) #33

Open
Freya-Cui-2020 opened this issue Oct 18, 2021 · 0 comments


Hello,

I have 482 Ensembl genes (411 of which were converted to Entrez IDs with bitr) on which I want to run a Reactome pathway enrichment analysis.

I ran ReactomePA and KOBAS-i on the same gene list. With a q-value cutoff of < 0.1, KOBAS-i returned 7 pathways:

  ID             Description                                     GeneRatio  Bg    pvalue        p.adjust
1 R-HSA-3700989  Transcriptional_Regulation_by_TP53              19/482     356   5.228496e-06  0.00383667
2 R-HSA-74160    Gene_expression_(Transcription)                 43/482     1448  4.022863e-05  0.02108555
3 R-HSA-1362409  Mitochondrial_iron-sulfur_cluster_biogenesis    4/482      11    6.167911e-05  0.02828758
4 R-HSA-73857    RNA_Polymerase_II_Transcription                 38/482     1316  1.990795e-04  0.06086855
5 R-HSA-212436   Generic_Transcription_Pathway                   35/482     1193  2.684548e-04  0.07035433
6 R-HSA-5689896  Ovarian_tumor_domain_proteases                  5/482      38    4.633071e-04  0.09443742
7 R-HSA-2426168  Activation_of_gene_expression_by_SREBF_(SREBP)  5/482      40    5.736862e-04  0.09567521

ReactomePA returned 4 pathways with a q-value cutoff of < 0.5:

              ID             Description                                     GeneRatio  BgRatio    pvalue        p.adjust
R-HSA-1362409 R-HSA-1362409  Mitochondrial iron-sulfur cluster biogenesis    4/215      13/10856   9.296164e-05  0.0487148
R-HSA-3700989 R-HSA-3700989  Transcriptional Regulation by TP53              19/215     365/10856  1.153013e-04  0.0487148
R-HSA-5689896 R-HSA-5689896  Ovarian tumor domain proteases                  5/215      38/10856   8.571709e-04  0.2414365
R-HSA-2426168 R-HSA-2426168  Activation of gene expression by SREBF (SREBP)  5/215      42/10856   1.362514e-03  0.2878311
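The p-values in the ReactomePA table can be checked against its GeneRatio and BgRatio columns with a one-sided hypergeometric test. The sketch below is only a cross-check, and it assumes ReactomePA (through DOSE) computes P(X >= k) with base R's `phyper()`. In this model, k is the number of hit genes, n the number of annotated input genes, M the pathway size and N the background size. The numbers come from the R-HSA-1362409 row above. The two tools also use different GeneRatio denominators: 482 for KOBAS-i and 215 for ReactomePA. The 215 appears to count only input genes that have a Reactome annotation.

```r
# Upper-tail hypergeometric p-value P(X >= k), assuming this is the test
# behind ReactomePA's pvalue column:
#   k = hit genes in the pathway,  n = annotated input genes,
#   M = genes in the pathway,      N = genes in the background
hyper_p <- function(k, n, M, N) {
  phyper(k - 1, M, N - M, n, lower.tail = FALSE)
}

# R-HSA-1362409 from the table above: GeneRatio 4/215, BgRatio 13/10856
p <- hyper_p(k = 4, n = 215, M = 13, N = 10856)
signif(p, 7)
```

If the assumption holds, the printed value should agree with the `pvalue` column for that row. The same function can be applied to the other rows to see how much each input (n, M, N) moves the result.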

Pathways R-HSA-74160, R-HSA-73857 and R-HSA-212436 did not appear in the ReactomePA results at all. Meanwhile, the Reactome website gave the same enrichment results as KOBAS-i. To find the cause, I checked three things.

First, I checked whether these three pathways exist in reactome.db:

get("R-HSA-74160", reactomePATHID2NAME)
[1] "Homo sapiens: Gene expression (Transcription)"
get("R-HSA-212436", reactomePATHID2NAME)
[1] "Homo sapiens: Generic Transcription Pathway"
get("R-HSA-73857",reactomePATHID2NAME)
[1] "Homo sapiens: RNA Polymerase II Transcription"

Second, I ruled out the possibility that the difference was caused by the gene ID conversion from Ensembl to Entrez:

(df[2, 8] %>% strsplit("\\|"))[[1]] %in% (entrz2$ENSEMBL %>% as.vector()) %>% table()
FALSE TRUE
1 42
So 42 of the 43 enriched genes in this pathway are still present after conversion to Entrez IDs.
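The same mapping check can be written without escaping the pipe character: `strsplit()` with `fixed = TRUE` treats `"|"` literally instead of as a regular expression. Below is a minimal sketch on made-up IDs. `hit_string` and `mapped_ensembl` are hypothetical stand-ins for `df[2, 8]` and `entrz2$ENSEMBL`.

```r
# Hypothetical stand-ins: a KOBAS-style pipe-separated gene string and the
# Ensembl IDs that bitr() managed to map to Entrez IDs
hit_string     <- "gene_A|gene_B|gene_C"
mapped_ensembl <- c("gene_A", "gene_C")

# fixed = TRUE splits on a literal "|", so no regex escaping is needed
hit_genes <- strsplit(hit_string, "|", fixed = TRUE)[[1]]

# Count hit genes with (TRUE) and without (FALSE) an Entrez mapping;
# the factor levels make both counts appear even when one of them is zero
mapping_status <- table(factor(hit_genes %in% mapped_ensembl,
                               levels = c(FALSE, TRUE)))
mapping_status
```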

Third, I checked the number of background genes for R-HSA-74160 in reactome.db:

length(get("R-HSA-74160", reactomePATHID2EXTID))
[1] 1837
length(get("R-HSA-74160", reactomePATHID2EXTID)%>%unique())
[1] 1506
It seems that the background gene list for this pathway contains duplicates: 1837 entries, but only 1506 unique genes.
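Duplicated IDs like these can be detected and removed with base R before pathway sizes are compared. The sketch below uses a hypothetical toy list shaped like a pathway-to-Entrez mapping. With the real data, the list would presumably come from `as.list(reactomePATHID2EXTID)`; that call is an assumption here and is not used in the code.

```r
# Hypothetical toy mapping: pathway -> Entrez gene IDs, with repeated IDs
path2gene <- list(
  pathway_A = c("101", "102", "102", "103", "103", "103"),
  pathway_B = c("201", "202")
)

# Pathway sizes as stored vs. after removing duplicated IDs
raw_sizes        <- lengths(path2gene)
path2gene_unique <- lapply(path2gene, unique)
unique_sizes     <- lengths(path2gene_unique)

# Pathways whose stored size is inflated by duplicates
inflated <- names(which(raw_sizes > unique_sizes))

print(data.frame(raw = raw_sizes, unique = unique_sizes))
inflated
```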

My questions are:
I suspect the difference is caused by the duplicated genes in reactome.db. How can I avoid this?
I would like to draw a cnetplot of the Reactome enrichment results from KOBAS-i. If the duplication problem cannot be solved, how can I produce this plot?
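One possible first step toward the plot is to reshape the KOBAS-i table into long pathway-gene pairs. These pairs are the edges a cnet plot draws. The sketch below performs only this base R reshaping; it recomputes no statistics and does not build the object that `cnetplot()` expects. The column names `ID` and `Genes` and the `|` separator are assumptions about the KOBAS-i export. In the code above the gene strings sit in column 8.

```r
# Reshape a KOBAS-style result table (one row per pathway, hit genes in one
# pipe-separated string) into long pathway-gene pairs.
# Default column names "ID" and "Genes" are assumptions about the export.
kobas_to_long <- function(df, id_col = "ID", gene_col = "Genes", sep = "|") {
  genes <- strsplit(as.character(df[[gene_col]]), sep, fixed = TRUE)
  data.frame(
    term = rep(as.character(df[[id_col]]), lengths(genes)),
    gene = unlist(genes, use.names = FALSE),
    stringsAsFactors = FALSE
  )
}

# Hypothetical two-pathway example with made-up gene IDs
kobas_df <- data.frame(
  ID    = c("R-HSA-1362409", "R-HSA-5689896"),
  Genes = c("gene_A|gene_B", "gene_B|gene_C|gene_D"),
  stringsAsFactors = FALSE
)
edges <- kobas_to_long(kobas_df)
edges
```

Because `df[[gene_col]]` also accepts a column number, the call for the data above might look like `kobas_to_long(df, gene_col = 8)`.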

Attached is my R sessionInfo:

R version 4.1.1 (2021-08-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)

Matrix products: default

locale:
[1] LC_COLLATE=Chinese (Simplified)_China.936
[2] LC_CTYPE=Chinese (Simplified)_China.936
[3] LC_MONETARY=Chinese (Simplified)_China.936
[4] LC_NUMERIC=C
[5] LC_TIME=Chinese (Simplified)_China.936

attached base packages:
[1] parallel stats4 stats graphics
[5] grDevices utils datasets methods
[9] base

other attached packages:
[1] reactome.db_1.76.0 graphite_1.38.0
[3] org.Hs.eg.db_3.13.0 AnnotationDbi_1.54.1
[5] IRanges_2.26.0 S4Vectors_0.30.2
[7] Biobase_2.52.0 BiocGenerics_0.38.0
[9] ReactomePA_1.36.0 clusterProfiler_4.0.5
[11] ggplot2_3.3.5

loaded via a namespace (and not attached):
[1] fgsea_1.18.0
[2] colorspace_2.0-2
[3] ggtree_3.0.4
[4] ellipsis_0.3.2
[5] qvalue_2.24.0
[6] XVector_0.32.0
[7] aplot_0.1.1
[8] rstudioapi_0.13
[9] farver_2.1.0
[10] graphlayouts_0.7.1
[11] ggrepel_0.9.1
[12] bit64_4.0.5
[13] fansi_0.5.0
[14] scatterpie_0.1.7
[15] splines_4.1.1
[16] cachem_1.0.6
[17] GOSemSim_2.18.1
[18] polyclip_1.10-0
[19] jsonlite_1.7.2
[20] GO.db_3.13.0
[21] png_0.1-7
[22] graph_1.70.0
[23] ggforce_0.3.3
[24] BiocManager_1.30.16
[25] compiler_4.1.1
[26] httr_1.4.2
[27] backports_1.2.1
[28] assertthat_0.2.1
[29] Matrix_1.3-4
[30] fastmap_1.1.0
[31] lazyeval_0.2.2
[32] tweenr_1.0.2
[33] tools_4.1.1
[34] igraph_1.2.6
[35] gtable_0.3.0
[36] glue_1.4.2
[37] GenomeInfoDbData_1.2.6
[38] reshape2_1.4.4
[39] DO.db_2.9
[40] dplyr_1.0.7
[41] rappdirs_0.3.3
[42] fastmatch_1.1-3
[43] Rcpp_1.0.7
[44] enrichplot_1.12.3
[45] vctrs_0.3.8
[46] Biostrings_2.60.2
[47] ape_5.5
[48] nlme_3.1-153
[49] ggraph_2.0.5
[50] stringr_1.4.0
[51] lifecycle_1.0.1
[52] DOSE_3.18.3
[53] zlibbioc_1.38.0
[54] MASS_7.3-54
[55] scales_1.1.1
[56] tidygraph_1.2.0
[57] RColorBrewer_1.1-2
[58] curl_4.3.2
[59] memoise_2.0.0
[60] gridExtra_2.3
[61] downloader_0.4
[62] ggfun_0.0.4
[63] yulab.utils_0.0.4
[64] stringi_1.7.5
[65] RSQLite_2.2.8
[66] tidytree_0.3.5
[67] checkmate_2.0.0
[68] BiocParallel_1.26.2
[69] GenomeInfoDb_1.28.4
[70] rlang_0.4.11
[71] pkgconfig_2.0.3
[72] bitops_1.0-7
[73] lattice_0.20-45
[74] purrr_0.3.4
[75] labeling_0.4.2
[76] treeio_1.16.2
[77] patchwork_1.1.1
[78] cowplot_1.1.1
[79] shadowtext_0.0.9
[80] bit_4.0.4
[81] tidyselect_1.1.1
[82] plyr_1.8.6
[83] magrittr_2.0.1
[84] R6_2.5.1
[85] generics_0.1.0
[86] DBI_1.1.1
[87] pillar_1.6.3
[88] withr_2.4.2
[89] KEGGREST_1.32.0
[90] RCurl_1.98-1.5
[91] tibble_3.1.4
[92] crayon_1.4.1
[93] utf8_1.2.2
[94] viridis_0.6.2
[95] grid_4.1.1
[96] data.table_1.14.2
[97] blob_1.2.2
[98] digest_0.6.28
[99] tidyr_1.1.4
[100] gridGraphics_0.5-1
[101] munsell_0.5.0
[102] viridisLite_0.4.0
[103] ggplotify_0.1.0

