I uploaded two files programmatically to OpenAI's Vector Store, one in .pdf and one in .md format, both covering policy documents for a fictitious company. OpenAI takes care of creating chunks and embeddings in an optimised way, so there is no need to handle these ourselves.
I'm using the Assistants API to search these files and answer user questions, with the uploaded files serving as embedded documents. The Assistant automatically decides which document to use and answers the related questions correctly.
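As a rough illustration of the setup described above, the sketch below builds the parameters for an Assistant that searches a single vector store through the file_search tool. The helper name `build_assistant_params`, the assistant name, the instructions text, the model name and the vector store ID are all assumptions made for this example, not values taken from the project. With the OpenAI Python SDK (v1.x), a dictionary like this could typically be passed to `client.beta.assistants.create(**params)`, though the exact call may differ between SDK versions.

```python
from typing import Any, Dict


def build_assistant_params(vector_store_id: str, model: str = "gpt-4o") -> Dict[str, Any]:
    """Return keyword arguments for an Assistant that searches one vector store.

    The name, instructions and default model are placeholders (assumptions).
    """
    return {
        "name": "Policy Assistant",  # hypothetical name
        "instructions": "Answer questions using the attached policy documents.",
        "model": model,  # assumed model name
        # file_search lets the Assistant retrieve chunks from the vector store.
        "tools": [{"type": "file_search"}],
        # Attach the vector store that holds the uploaded files.
        "tool_resources": {"file_search": {"vector_store_ids": [vector_store_id]}},
    }


# "vs_example123" is a made-up ID; a real one comes back when the store is created.
params = build_assistant_params("vs_example123")
print(params["tools"])
```

Keeping the parameters in a plain dictionary makes the configuration easy to inspect in a notebook cell before any API call is made.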
I uploaded Microsoft Corporation's weekly stock price data for the last two years as a PDF file into OpenAI's Vector Store.
I enabled both the 'file_search' and the 'code_interpreter' tools for the Assistant and asked it to visualise the data:
Prompt: Visualize the Microsoft Corporations stock prices for the last 2 years
I repeated this exercise with a .json file instead of the PDF, and OpenAI produced the same result. At the time of writing, other file formats, such as .csv or .xlsx, were not supported for embeddings.
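Since unsupported formats only surface as errors at upload time, a small pre-check can save a round trip. The sketch below sorts file names using only the formats mentioned in this write-up; the two sets reflect what is reported here at the time of writing, not OpenAI's full list of supported types, and the helper name `check_upload_format` and the example file names are hypothetical.

```python
from pathlib import Path

# Formats this write-up reports as working with the Vector Store.
WORKED = {".pdf", ".md", ".json"}
# Formats this write-up reports as not supported for embeddings.
NOT_SUPPORTED = {".csv", ".xlsx"}


def check_upload_format(filename: str) -> str:
    """Classify a file by extension as 'ok', 'unsupported' or 'unknown'."""
    suffix = Path(filename).suffix.lower()
    if suffix in WORKED:
        return "ok"
    if suffix in NOT_SUPPORTED:
        return "unsupported"
    # Anything not mentioned in the write-up is left undecided.
    return "unknown"


for name in ["msft_weekly.pdf", "msft_weekly.json", "msft_weekly.csv"]:
    print(name, "->", check_upload_format(name))
```

Files flagged as unsupported could be converted to .json or PDF first, which is the workaround described above.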
- Prerequisites:
  - Make sure Python 3 is installed.
  - If you don't have an OpenAI account, create one at https://openai.com/, then create a project API key under Dashboard / API keys.
- Clone the project.
- Create a virtual environment inside the project folder:
  python -m venv venv
- Activate the virtual environment:
  Mac:
  source venv/bin/activate
  Windows:
  venv\Scripts\activate
- Select interpreter in VSCode:
  (on Mac) - Cmd + Shift + P ---> Select Interpreter ---> Select the created venv environment
  (on Windows) -
- Install the Python dependencies:
  pip install -r requirements.txt
- Create a .env file in the root folder and add your project's API key:
  OPENAI_API_KEY=your-unique-openai-project-key
- Run Jupyter Notebook:
  jupyter notebook
- Run the code snippets in the given/desired order.
- OpenAI: https://openai.com
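
For the .env step in the setup above, the following is a minimal sketch of how the key could be read into the environment using only the standard library. The project itself may well rely on a package such as python-dotenv for this (that is not confirmed here), and the helper name `load_env_file` is hypothetical.

```python
import os
from pathlib import Path
from typing import Dict


def load_env_file(path: str = ".env") -> Dict[str, str]:
    """Parse KEY=VALUE lines from a .env file and export them to os.environ."""
    loaded: Dict[str, str] = {}
    env_path = Path(path)
    if not env_path.is_file():
        return loaded  # nothing to load
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        value = value.strip().strip('"').strip("'")
        # Variables already set in the shell are not overwritten.
        os.environ.setdefault(key, value)
        loaded[key] = value
    return loaded


load_env_file()
if "OPENAI_API_KEY" not in os.environ:
    print("OPENAI_API_KEY is not set; add it to .env first.")
```

Reading the key from the environment keeps it out of the notebook itself, which matters if the notebook is ever committed or shared.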