This repository showcases common uses of Deep Search for document conversion as well as data and knowledge exploration.
Examples rely on having valid credentials in the file ds-auth.json (see example content in ./ds-auth.json.example). To obtain your credentials, please refer to the documentation page https://ds4sd.github.io/deepsearch-toolkit/getting_started/#authentication. The file can also be generated via
deepsearch login --output ds-auth.json
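Since every example expects the credentials file to be present, a small pre-flight check can give a clearer error than a failure deep inside an example. The following is a minimal sketch using only the Python standard library; the helper name load_credentials is hypothetical and not part of the deepsearch-toolkit, and the check only confirms that the file exists and parses as JSON, without assuming anything about its fields.

```python
import json
from pathlib import Path


def load_credentials(path="ds-auth.json"):
    """Return the parsed credentials file, or raise with a hint on how to create it.

    Hypothetical helper for illustration; not part of the deepsearch-toolkit.
    """
    cred_file = Path(path)
    if not cred_file.is_file():
        raise FileNotFoundError(
            f"{cred_file} not found; create it with "
            f"'deepsearch login --output {cred_file}'"
        )
    try:
        # Only check that the file is valid JSON; its fields are not inspected.
        return json.loads(cred_file.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        raise ValueError(f"{cred_file} is not valid JSON: {exc}") from exc
```

Calling load_credentials() at the top of a script or notebook fails early with a pointer to the login command when the file is missing.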
# | Name | Description |
---|---|---|
1. | Convert documents quick start | Full example on programmatic document conversion |
2. | Convert documents with custom settings | Full example on programmatic document conversion with custom conversion settings |
3. | Visualize bounding boxes | Visualize the bbox of the text elements |
4. | Extract figures from documents | Given a PDF file, extract the figures |
5. | Extract tables | Given a PDF file, extract the tables |
This section will showcase examples which query data processed via Deep Search.
# | Name | Description |
---|---|---|
1. | Data query quick start | Example listing data collections, searching in one or more document collections, and using source for projection |
2. | Chemistry search queries | Search the chemistry databases for known molecules |
This section will showcase examples for bringing your own documents, csv data, nlp models and more.
# | Name | Description |
---|---|---|
1. | Bring your own PDF | Upload your own PDF documents and search on them |
2. | Export to JSON | Export Deep Search index items to JSON |
3. | Bring your own DataFrame | Bring your own DataFrame from CSV, XLSX, etc. and explore the content in a knowledge graph |
This section will showcase examples for managing index item attachments and metadata.
# | Name | Description |
---|---|---|
1. | Manage attachments | Manage index item attachments |
This section will showcase examples related to the use of knowledge graphs (KGs) in Deep Search.
# | Name | Description |
---|---|---|
1. | Using Deep Search KGs with PyTorch Geometric | Download knowledge graphs from Deep Search and import them in PyTorch Geometric. |
This section will showcase examples related to the integration of Deep Search with other tools and utilities.
# | Name | Description |
---|---|---|
1. | Annotations on argilla.io | Use argilla.io for annotating the content of documents. |
The examples contained in this catalog depend on the deepsearch-toolkit as well as other modules needed by the individual examples (e.g. pandas, matplotlib, rdkit, etc.). Please refer to the poetry pyproject.toml or requirements.txt for a complete list.
Python dependencies are installed with
pip install -r requirements.txt
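To confirm that the installation succeeded before opening an example, one option is to check which modules can be imported. The sketch below uses only the standard library; the helper name missing_modules is hypothetical, and the module list is an illustrative subset taken from the packages named above (deepsearch is the import name of the deepsearch-toolkit), not the complete list from requirements.txt.

```python
from importlib.util import find_spec


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    # Illustrative subset only; see requirements.txt for the complete list.
    required = ["deepsearch", "pandas", "matplotlib", "rdkit"]
    missing = missing_modules(required)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All checked modules are importable.")
```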
Additionally, some examples rely on system packages. When this is the case, the README of the individual example details which packages are required. The auxiliary file apt.txt lists all such packages for a Debian-based OS. They can be installed with
xargs sudo apt-get install < apt.txt
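To preview what the command above would install before running it with sudo, the package list can be read from Python. This is a minimal sketch that assumes apt.txt contains plain whitespace-separated package names; the helper name apt_install_command is hypothetical, and it only builds the argument list without executing anything.

```python
from pathlib import Path


def apt_install_command(path="apt.txt"):
    """Build (but do not run) the apt-get command for the packages listed in `path`."""
    # Split on any whitespace, mirroring how xargs tokenizes simple package names.
    packages = Path(path).read_text(encoding="utf-8").split()
    return ["sudo", "apt-get", "install", *packages]


if __name__ == "__main__":
    print(" ".join(apt_install_command()))
```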
The Deep Search Toolkit codebase is under MIT license.
For individual model usage, please refer to the model licenses found in the original packages.