Data Exploration and Q&A with LLM

This project implements an interactive data exploration and question-answering system using Large Language Models (LLMs). It provides two main functionalities:

Exploratory Data Analysis (EDA) with automated visualization generation
Natural language question-answering over SQL databases

Features

EDA Pipeline
- Automated SQL query generation from natural language
- Data visualization using LIDA
- Interactive chart editing capabilities
- Visual data exploration
Q&A Pipeline
- Natural language to SQL conversion
- Automated answer generation
- Result visualization
- Data presentation in tabular format

Prerequisites

Python 3.8+
PostgreSQL database
Google Cloud Platform account (for Vertex AI)
Required API keys:
- Groq API
- Google Cloud credentials

Installation

Clone the repository:

git clone <repository-url>
cd <repository-name>

Create and activate a virtual environment:

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

Install required packages:

pip install -r requirements.txt

Set up environment variables by creating a .env file with the following:

GROQ_API_KEY=your_groq_api_key
GOOGLE_APPLICATION_CREDENTIALS=path_to_your_credentials.json
DATABASE_URI=your_database_uri
POSTGRESQL_DATABASE_URI=your_postgresql_uri
DB_DATABASE=your_database_name
DB_USER=your_database_user
DB_HOST=your_database_host
DB_PASSWORD=your_database_password
DB_PORT=your_database_port

Project Structure

src/
├── pipeline/
│   ├── QNA_pipeline.py    # Question-answering pipeline implementation
│   └── eda_pipeline.py    # EDA pipeline implementation
└── utils/
    ├── constants.py       # Configuration and constant values
    └── helpers.py         # Utility functions and helpers

Usage

Start the Streamlit application:

streamlit run app.py

Access the application through your web browser at http://localhost:8501
Choose your analysis type:
- Perform EDA: Enter a natural language query about what you want to explore in your data
- Ask Questions: Enter a specific question about your data
View the results:
- For EDA: Interactive visualizations with editing capabilities
- For Q&A: Text answers with supporting data tables

Features Details

EDA Pipeline

Converts natural language queries to SQL
Automatically generates relevant visualizations
Supports interactive chart editing
Provides data summaries and insights

Q&A Pipeline

Natural language question processing
SQL query generation and execution
Answer generation with context
Result visualization and presentation

Error Handling

The application includes comprehensive error handling and logging:

All operations are logged with appropriate detail levels
User-friendly error messages
Graceful failure handling
Debug information in logs

Acknowledgments

LIDA for visualization generation
Langchain for LLM integration
Streamlit for the web interface
Google Vertex AI for language model capabilities

Name		Name	Last commit message	Last commit date
Latest commit History 7 Commits
src		src
.gitignore		.gitignore
README.md		README.md
app.py		app.py
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Data Exploration and Q&A with LLM

Features

Prerequisites

Installation

Project Structure

Usage

Features Details

EDA Pipeline

Q&A Pipeline

Error Handling

Acknowledgments

About

Releases

Packages

Languages

NashTech-Labs/EDA-and-QNA-with-LLM

Folders and files

Latest commit

History

Repository files navigation

Data Exploration and Q&A with LLM

Features

Prerequisites

Installation

Project Structure

Usage

Features Details

EDA Pipeline

Q&A Pipeline

Error Handling

Acknowledgments

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages