📚 Tutorial on a tabular semantic search system for Amazon e-commerce products that enables natural language queries.

decodingml/information-retrieval-tutorials

Hands-on Amazon Tabular Semantic Search

Open-source series by Decoding ML in collaboration with Superlinked and MongoDB.

[Image: architecture_1]

🎯 What You'll Build

A tabular semantic search system for Amazon e-commerce products that answers natural language queries.

πŸ” Core Features πŸ› οΈ Tech Stack
β€’ Semantic search for tabular data
β€’ Natural language query processing
β€’ Multi-attribute vector indexing
β€’ RESTful API endpoints
β€’ Tabular semantic search vs. text-to-SQL
β€’ Interactive web interface
β€’ OpenAI LLMs
β€’ MongoDB Atlas Vector Search
β€’ Superlinked
β€’ FastAPI
β€’ LlamaIndex
β€’ Streamlit

Perfect for developers building search functionality in e-commerce or structured data applications.
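To make "multi-attribute vector indexing" more concrete, here is a minimal pure-Python sketch of the general idea. It is not the Superlinked API and not the code from this repository. It assumes hypothetical products with a `title`, `price`, and `rating`, uses a toy bag-of-words vector in place of a learned text embedding, and concatenates the weighted text, price, and rating parts into one vector that is compared with cosine similarity.

```python
import math

# Hypothetical toy vocabulary and products; real systems use learned text embeddings.
VOCAB = ["book", "python", "cheap", "headphones", "wireless"]
PRODUCTS = [
    {"title": "python programming book", "price": 40.0, "rating": 4.8},
    {"title": "wireless headphones", "price": 120.0, "rating": 4.2},
    {"title": "cheap python book", "price": 10.0, "rating": 3.9},
]
MAX_PRICE = 200.0   # assumed upper bound used to scale prices into [0, 1]
MAX_RATING = 5.0


def normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def embed_text(text):
    """Toy text embedding: normalized word counts over VOCAB."""
    words = text.lower().split()
    return normalize([float(words.count(term)) for term in VOCAB])


def embed(text, price, rating, weights):
    """Concatenate weighted text, price, and rating parts into one vector."""
    text_part = [weights["text"] * x for x in embed_text(text)]
    # Lower prices map to values closer to 1, higher prices closer to 0.
    price_part = [weights["price"] * (1.0 - min(price, MAX_PRICE) / MAX_PRICE)]
    rating_part = [weights["rating"] * rating / MAX_RATING]
    return text_part + price_part + rating_part


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query_text, weights, top_k=2):
    """Rank products against a query that prefers low price and high rating."""
    query_vec = embed(query_text, price=0.0, rating=MAX_RATING, weights=weights)
    scored = [
        (cosine(query_vec, embed(p["title"], p["price"], p["rating"], weights)), p["title"])
        for p in PRODUCTS
    ]
    return [title for _, title in sorted(scored, reverse=True)[:top_k]]


if __name__ == "__main__":
    print(search("python book", {"text": 1.0, "price": 1.0, "rating": 1.0}))
```

Changing the values in `weights` shifts how much the text, price, and rating parts contribute to the ranking, which is the core appeal of indexing several attributes in a single vector space.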

🎓 Prerequisites

| Category | Requirements |
|----------|--------------|
| Skills | Basic knowledge of Python. |
| Hardware | Any modern laptop or workstation is enough (no GPU or high-end hardware required). |
| Level | Beginner |

💰 Cost Structure

All tools used throughout the course stay within their free tiers, except OpenAI's API, which costs less than $1 to run all our examples.

📚 Articles

Our recommendation for each article:

  • Read the article.
  • Run the Notebook and the code using the INSTALL_AND_USAGE docs.
  • Go deeper into the code.

| No. | Article | Description | Notebooks | Python code |
|-----|---------|-------------|-----------|-------------|
| 1 | Forget text-to-SQL: Use this natural query instead | Learn to build a tabular semantic search RESTful API server that enables natural language queries. | 1_eda.ipynb, 2_tabular_semantic_search_superlinked.ipynb | superlinked_app |
| 2 | Tabular semantic search vs. text-to-SQL (WIP) | Deep dive into how tabular semantic search works and what it offers in addition to text-to-SQL strategies. | 3_tabular_semantic_search_text_to_sql.ipynb | superlinked_app |

πŸ—οΈ Project Structure

.
β”œβ”€β”€ data/                                          # Directory where dataset files and processed data will be downloaded.
β”œβ”€β”€ superlinked_app/                               # Main application source code
β”œβ”€β”€ tools/                                         # Utility scripts and helper tools
β”œβ”€β”€ .env                                           # Environment variables for local development
β”œβ”€β”€ .env.example                                   # Template for environment variables
β”œβ”€β”€ 1_eda.ipynb                                    # Notebook for Exploratory Data Analysis for the Amazon dataset
β”œβ”€β”€ 2_tabular_semantic_search_superlinked.ipynb    # Demo notebook for Superlinked tabular semantic search
β”œβ”€β”€ 3_tabular_semantic_search_text_to_sql.ipynb    # Examples of text-to-SQL queries
β”œβ”€β”€ Makefile                                       # Running commands shortcuts
β”œβ”€β”€ pyproject.toml                                 # Python project dependencies and metadata
└── uv.lock                                        # Lock file for uv package manager

💾 Dataset

We will use ESCI-S (extended metadata for the Amazon ESCI dataset), released under the Apache-2.0 license.

It is an e-commerce dataset of Amazon products.

The full dataset references ~1.8M unique products. We will work with a sample of 4,400 products to keep everything lightweight, but the code is compatible with the whole dataset.
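If you want to draw your own fixed-size sample from a large product file without loading it all into memory, reservoir sampling is one option. The sketch below is an illustration only, not the sampling code used in this repository; the function name `reservoir_sample` and the default seed are assumptions.

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def reservoir_sample(items: Iterable[T], k: int, seed: int = 42) -> List[T]:
    """Return up to k items chosen uniformly at random from a stream.

    Uses a fixed seed so that repeated runs give the same sample.
    """
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, item in enumerate(items):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Replace a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


if __name__ == "__main__":
    # Stand-in for a stream of product IDs; a real run would iterate over dataset rows.
    sample = reservoir_sample(range(100_000), k=4400)
    print(len(sample))
```

Because the function only iterates once over its input, the same code works for a small file or for the full dataset.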

📚 Read more on the ESCI-S dataset

💻 Explore it in our Dataset Exploration Notebook.

🚀 Getting Started

For detailed installation and usage instructions, see our INSTALL_AND_USAGE guide.

Recommendation: While you can follow the installation guide directly, we strongly recommend reading the accompanying articles to gain a complete understanding of the series.

[Image: streamlit_app_example]

💡 Questions and Troubleshooting

Have questions or running into issues? We're here to help!

Open a GitHub issue for:

  • Questions about the series material
  • Technical troubleshooting
  • Clarification on concepts

Sponsors

  • Superlinked
  • MongoDB

License

This course is an open-source project released under the MIT license. As long as you distribute our LICENSE and acknowledge that your project is based on our work, you can safely clone or fork this project and use it as a source of inspiration for your educational projects (e.g., university, college degree, or personal projects).
