Teal Deer

TLDR_LDA_and_Text_Summarization.ipynb is the primary current notebook.

This is currently an exploratory, work-in-progress notebook. It scrapes text from a directory of academic research PDFs and then runs LDA (latent Dirichlet allocation) topic modeling on that text to help prioritize reading. The dataset for this run was just a handful of papers on chatbots from arXiv. The OCR portion relies on: https://github.com/euske/pdfminer/blob/master/tools/pdf2txt.py
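To make the LDA step concrete, here is a minimal, dependency-free sketch of topic modeling over already-extracted paper text, followed by ranking papers by how strongly they load on one topic. It is an illustration only: the library the notebook uses for LDA is not stated here, so this swaps in a small collapsed Gibbs sampler, and the names `tokenize`, `lda_gibbs`, `rank_by_topic`, and `STOPWORDS`, along with all parameter defaults, are hypothetical.

```python
import random
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not the notebook's list).
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "for", "is", "are", "on", "with"}


def tokenize(text):
    """Lowercase, keep alphabetic tokens, drop a few common stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def lda_gibbs(docs, num_topics=2, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Collapsed Gibbs sampling for LDA over tokenized documents.

    Returns (doc_topic, topic_word): per-document topic counts and
    per-topic word counts after the final sweep.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [Counter() for _ in range(num_topics)]
    topic_total = [0] * num_topics
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(num_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    return doc_topic, topic_word


def rank_by_topic(doc_topic, topic):
    """Order document indices by the share of their tokens assigned to `topic`."""
    def share(counts):
        total = sum(counts)
        return counts[topic] / total if total else 0.0
    return sorted(range(len(doc_topic)), key=lambda d: share(doc_topic[d]), reverse=True)
```

A possible use: tokenize each paper's text, call `lda_gibbs` on the list of token lists, inspect `topic_word[k].most_common(10)` to see what each topic is about, then read papers in the order given by `rank_by_topic` for the topic of interest. On a handful of papers the topics will be noisy, so the ranking is a rough reading aid rather than a reliable score.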

In progress:
Adding a text summarization feature that generates an abstract or short summary for a large block of text (e.g., an abstract for the body of a paper), so that papers can be summarized as well as prioritized.
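As a rough picture of what a summarization step could look like, the sketch below does simple extractive summarization: it scores each sentence by the average frequency of its non-stopword terms and keeps the top few in their original order. This is a stand-in baseline, not the approach being built in the notebook or the one covered in the video linked under planned updates; `summarize`, `STOPWORDS`, and `max_sentences` are hypothetical names.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption).
STOPWORDS = {
    "a", "an", "the", "and", "or", "of", "to", "in", "on", "for", "is", "are",
    "was", "were", "it", "this", "that", "with", "from", "as", "by", "be",
}


def content_words(text):
    """Lowercased word tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def summarize(text, max_sentences=2):
    """Return the highest-scoring sentences, kept in document order."""
    # Naive sentence split on whitespace that follows ., ! or ?
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(content_words(text))

    def score(sentence):
        tokens = content_words(sentence)
        # Average term frequency, so long sentences are not favored by length alone.
        return sum(freq[w] for w in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in keep)
```

An extractive baseline like this can only copy sentences that already exist, so it cannot write a new abstract; it is mainly useful as a quick comparison point for a learned summarizer.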

Planned updates (see the project tab as well):

Finish the OCR-from-PDF part
Complete the text summarization portion. Thanks to Siraj Raval for making the video: https://www.youtube.com/watch?v=ogrJaOIuBx4
Clean up the notebook into Python scripts with test suites
Experiment with other front-end use cases; e.g., a Slack bot is currently underway (notebook to be added later).
Add a CI framework to this repo.
Cartoon for a fun logo :-)
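
For the first planned item, one way the PDF-to-text step could be scripted is sketched below: build one `pdf2txt.py` command per PDF in a directory and run them in turn. This assumes pdfminer's `pdf2txt.py` is installed and on the PATH, and uses its `-o` option to name the output file; `build_commands`, `extract_all`, and the directory arguments are hypothetical names for illustration.

```python
import subprocess
from pathlib import Path


def build_commands(pdf_dir, out_dir, tool="pdf2txt.py"):
    """Return one command per PDF in pdf_dir (non-recursive), sorted by file name."""
    pdf_dir, out_dir = Path(pdf_dir), Path(out_dir)
    commands = []
    for pdf in sorted(pdf_dir.glob("*.pdf")):
        target = out_dir / (pdf.stem + ".txt")
        # Assumed invocation: pdf2txt.py -o <output.txt> <input.pdf>
        commands.append([tool, "-o", str(target), str(pdf)])
    return commands


def extract_all(pdf_dir, out_dir):
    """Run the extraction tool on every PDF; raises if any conversion fails."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for cmd in build_commands(pdf_dir, out_dir):
        subprocess.run(cmd, check=True)
```

Keeping command construction separate from execution makes the script easy to cover with tests, since the command list can be checked without having pdfminer installed.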

DeepLearningSky/teal_deer