Open-source series by Decoding ML in collaboration with Superlinked and MongoDB.
Tutorial on a tabular semantic search system for Amazon e-commerce products that enables natural language queries.
Core Features | Tech Stack |
---|---|
• Semantic search for tabular data • Natural language query processing • Multi-attribute vector indexing • RESTful API endpoints • Tabular semantic search vs. text-to-SQL • Interactive web interface | • OpenAI LLMs • MongoDB Atlas Vector Search • Superlinked • FastAPI • LlamaIndex • Streamlit |
Perfect for developers building search functionality in e-commerce or structured data applications.
Category | Requirements |
---|---|
Skills | Basic knowledge of Python. |
Hardware | Any modern laptop or workstation will do (no GPU or high-end hardware required). |
Level | Beginner |
All tools used throughout the course stay within their free tiers, except OpenAI's API, which will cost you less than $1 to run all our examples.
Our recommendation for each article:
- Read the article.
- Run the Notebook and the code using the INSTALL_AND_USAGE docs.
- Go deeper into the code.
No. | Article | Description | Notebooks | Python code |
---|---|---|---|---|
1 | Forget text-to-SQL: Use this natural query instead | Learn to build a tabular semantic search RESTful API server that enables natural language queries. | • 1_eda.ipynb • 2_tabular_semantic_search_superlinked.ipynb | superlinked_app |
2 | Tabular semantic search vs. text-to-SQL (WIP) | Deep dive into how tabular semantic search works and what it offers in addition to text-to-SQL strategies. | • 3_tabular_semantic_search_text_to_sql.ipynb | superlinked_app |
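To make the idea behind the articles above more concrete, here is a minimal, standard-library-only sketch of multi-attribute ranking. It is not the Superlinked API and not the code from `superlinked_app`. The catalog `PRODUCTS`, the `search` function, and the `weights` dictionary are hypothetical names introduced for illustration. A bag-of-words cosine similarity stands in for a learned text embedding, and the weights are assumed to be what a natural language query such as "cheap python book" might be translated into.

```python
import math
from collections import Counter

# Hypothetical mini-catalog; the fields are illustrative, not the ESCI-S schema.
PRODUCTS = [
    {"title": "python book", "price": 90.0, "rating": 4.0},
    {"title": "python programming book", "price": 30.0, "rating": 4.5},
    {"title": "garden hose", "price": 25.0, "rating": 4.9},
]


def text_vector(text):
    """Bag-of-words counts, standing in for a learned text embedding."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def min_max(value, lo, hi):
    """Scale a number into [0, 1] relative to the catalog's range."""
    return (value - lo) / (hi - lo) if hi > lo else 0.0


def search(query_text, weights, products=PRODUCTS):
    """Rank products by a weighted sum of per-attribute similarities."""
    prices = [p["price"] for p in products]
    ratings = [p["rating"] for p in products]
    query = text_vector(query_text)
    scored = []
    for p in products:
        text_score = cosine(query, text_vector(p["title"]))
        cheapness = 1.0 - min_max(p["price"], min(prices), max(prices))
        quality = min_max(p["rating"], min(ratings), max(ratings))
        score = (
            weights["text"] * text_score
            + weights["price"] * cheapness
            + weights["rating"] * quality
        )
        scored.append((score, p["title"]))
    # Highest combined score first.
    return [title for _, title in sorted(scored, reverse=True)]


# Text similarity only: the exact title match ranks first.
print(search("python book", {"text": 1.0, "price": 0.0, "rating": 0.0}))
# Adding a preference for low prices promotes the cheaper book.
print(search("python book", {"text": 1.0, "price": 0.5, "rating": 0.0}))
```

With text similarity alone, the order is "python book", "python programming book", "garden hose". Once the price weight is set to 0.5, the cheaper "python programming book" moves to the top. The point of the sketch is that each column contributes its own similarity signal and the query only changes the weights, whereas a text-to-SQL approach would typically turn the same request into hard filters.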
.
├── data/ # Directory where dataset files and processed data will be downloaded
├── superlinked_app/ # Main application source code
├── tools/ # Utility scripts and helper tools
├── .env # Environment variables for local development
├── .env.example # Template for environment variables
├── 1_eda.ipynb # Notebook for Exploratory Data Analysis of the Amazon dataset
├── 2_tabular_semantic_search_superlinked.ipynb # Demo notebook for Superlinked tabular semantic search
├── 3_tabular_semantic_search_text_to_sql.ipynb # Examples of text-to-SQL queries
├── Makefile # Shortcuts for running commands
├── pyproject.toml # Python project dependencies and metadata
└── uv.lock # Lock file for uv package manager
We will use the ESCI-S: extended metadata for Amazon ESCI dataset, released under the Apache-2.0 license.
It is an e-commerce dataset of Amazon products.
The full dataset references ~1.8M unique products. We will work with a sample of 4,400 products to keep everything lightweight, but the code is compatible with the whole dataset.
Read more on the ESCI-S dataset
Explore it in our Dataset Exploration Notebook.
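How the 4,400-product sample is drawn is not described here. As one possible approach, the sketch below uses reservoir sampling to take a fixed-size, reproducible sample from a stream of records without loading all ~1.8M products into memory. The function name `reservoir_sample`, the seed, and the stand-in records are assumptions made for illustration.

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def reservoir_sample(records: Iterable[T], k: int, seed: int = 42) -> List[T]:
    """Return k records chosen uniformly at random from a stream (Algorithm R)."""
    rng = random.Random(seed)  # Fixed seed so the sample is reproducible.
    sample: List[T] = []
    for i, record in enumerate(records):
        if i < k:
            # Fill the reservoir with the first k records.
            sample.append(record)
        else:
            # Keep the new record with probability k / (i + 1).
            j = rng.randint(0, i)  # Both bounds are inclusive.
            if j < k:
                sample[j] = record
    return sample


# Stand-in for streaming rows from the dataset files (hypothetical IDs).
product_ids = (f"product-{n}" for n in range(100_000))
subset = reservoir_sample(product_ids, k=4_400)
print(len(subset))  # 4400
```

Because the function only iterates once over its input, the same call works unchanged on the sample or on the whole dataset.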
For detailed installation and usage instructions, see our INSTALL_AND_USAGE guide.
Recommendation: While you can follow the installation guide directly, we strongly recommend reading the accompanying articles to gain a complete understanding of the series.
Have questions or running into issues? We're here to help!
Open a GitHub issue for:
- Questions about the series material
- Technical troubleshooting
- Clarification on concepts
This course is an open-source project released under the MIT license. Thus, as long as you distribute our LICENSE and acknowledge that your project is based on our work, you can safely clone or fork this project and use it as a source of inspiration for your educational projects (e.g., university or college coursework, personal projects).