Due to a recent change in Pinecone's free account offerings, we can now only create indexes of type serverless. Previously, it was also possible to create indexes of type pod. As a result, code for uploading and retrieving vectors written against LangChain's previous version is now failing.
#23571 · Unanswered · Shaileshv1331 asked this question in Q&A · Replies: 1 comment
Hey there, @Shaileshv1331! I'm here to help you with any bugs, questions, or contributions you might need. Let's work together to solve this issue! To update your code to use Pinecone's serverless indexes for uploading and retrieving vectors with LangChain, you can follow the example provided below. This approach aligns with the recent changes in Pinecone's free account offerings.

Updated Approach (Serverless Index)

Uploading Vectors to a Pinecone Serverless Index

```python
from langchain_community.document_loaders import DirectoryLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_pinecone import PineconeVectorStore
from langchain_openai import OpenAIEmbeddings
import os

# Set environment variables (openai_api_key and pinecone_api_key hold your own keys)
os.environ['OPENAI_API_KEY'] = openai_api_key
os.environ['PINECONE_API_KEY'] = pinecone_api_key

# Initialize embeddings
embeddings = OpenAIEmbeddings()

# Load and split documents
def load_docs(directory):
    loader = DirectoryLoader(directory, glob='**/*.pdf')
    documents = loader.load()
    print(documents)
    return documents

def split_docs(documents, chunk_size=500, chunk_overlap=200):
    text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
        chunk_size=chunk_size, chunk_overlap=chunk_overlap
    )
    docs = text_splitter.split_documents(documents)
    return docs

documents = load_docs(data_path)
docs = split_docs(documents)

# Create a Pinecone vector store from the documents
vectorstore_from_docs = PineconeVectorStore.from_documents(
    docs,
    index_name=index_name,
    embedding=embeddings
)
```

Retrieving Similar Documents

```python
from langchain_pinecone import PineconeVectorStore
from langchain_openai import ChatOpenAI
from langchain.chains import ConversationalRetrievalChain

# Initialize the LLM and the Pinecone vector store
llm = ChatOpenAI(temperature=0.0, model_name=model_name, openai_api_key=openai_api_key)
index = PineconeVectorStore(index_name=index_name, embedding=embeddings)

# Create a conversational retrieval chain; note that as_retriever takes the
# number of results via search_kwargs, not a top_k keyword
qa = ConversationalRetrievalChain.from_llm(
    llm=llm,
    retriever=index.as_retriever(search_kwargs={"k": 10})
)
```

This updated approach ensures compatibility with Pinecone's serverless indexes and uses the latest versions of LangChain, specifically `langchain_pinecone.PineconeVectorStore` and `langchain_openai.OpenAIEmbeddings`.
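The snippets above assume the index already exists. If it does not, a serverless index can be created up front. A minimal sketch, assuming the `pinecone` Python client v3 or later; the index name, cloud, and region below are placeholders, and 1536 is the output dimension of OpenAI's text-embedding-ada-002 model:

```python
import os
from pinecone import Pinecone, ServerlessSpec

pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])

index_name = "my-serverless-index"  # placeholder name
if index_name not in pc.list_indexes().names():
    pc.create_index(
        name=index_name,
        dimension=1536,           # matches OpenAIEmbeddings' default model
        metric="cosine",
        spec=ServerlessSpec(cloud="aws", region="us-east-1"),  # placeholder cloud/region
    )
```

Once the index exists, `PineconeVectorStore.from_documents` can target it by name as shown above.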
Example Code
Previous Approach (POD-based Index)
Updated Approach (Serverless Index)
Description
Around 8-9 months ago, you might have used the Previous Approach code to upload vectors to a Pinecone index and retrieve similar documents. This approach worked well with the Pinecone free account at that time. (FYI, I was using LangChain version 0.0.237 back then.)
Issue Encountered
If you use that code now, you might encounter an error due to changes in Pinecone's free account, which now only supports serverless indexes. The error stack trace might look like this:
This error occurred due to changes in the types of indexes supported by the free account. Unfortunately, the POD-based index, which was previously used, is no longer supported under the current free account plan.
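For contrast, the pod-based pattern from that era looked roughly like the following. This is a from-memory sketch against langchain 0.0.237 and pinecone-client 2.x, not the exact original code; the API key and environment values are placeholders:

```python
# Legacy pod-based flow: pinecone.init and langchain.vectorstores.Pinecone
# are the old APIs that now fail on serverless-only free accounts.
import pinecone
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import Pinecone

pinecone.init(api_key="YOUR_PINECONE_KEY", environment="us-west1-gcp")  # placeholders
embeddings = OpenAIEmbeddings()
vectorstore = Pinecone.from_documents(docs, embeddings, index_name="my-index")
```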
To align with the new serverless index type, it's necessary to revise your code using the latest versions of LangChain, specifically utilizing `langchain_pinecone.PineconeVectorStore` for uploading vectors and `langchain_openai.OpenAIEmbeddings` for retrieval. The updated approach I mentioned above, tested with LangChain version 0.1.9, ensures successful vector uploading and retrieval operations.
Key Changes
- Use `langchain_pinecone.PineconeVectorStore` and `langchain_openai.OpenAIEmbeddings` to handle the serverless index.
- Upload vectors with `PineconeVectorStore.from_documents`.
- Retrieve similar documents with `ConversationalRetrievalChain` and `PineconeVectorStore`.
Conclusion
By following this updated approach, you can smoothly transition to using Pinecone’s serverless indexes with LangChain for both uploading vectors and retrieving similar documents. This ensures that your workflow remains efficient and compatible with Pinecone's latest offerings. This guide helps the LangChain and NLP community adapt to changes in Pinecone’s index types, leveraging the latest features in LangChain for a robust solution to managing and querying document embeddings.
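As a side note, the "similar documents" retrieval that the vector store performs is, at its core, nearest-neighbour search by cosine similarity over embedding vectors. Here is a toy, self-contained illustration of that operation (not Pinecone's actual implementation; the vectors and document ids are made up):

```python
import math

def cosine(a, b):
    # Cosine similarity: dot product divided by the product of the norms
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def top_k(query_vec, index, k=2):
    # index: list of (doc_id, vector) pairs; return ids of the k most similar
    scored = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [doc_id for doc_id, _ in scored[:k]]

index = [
    ("doc-a", [1.0, 0.0, 0.0]),
    ("doc-b", [0.9, 0.1, 0.0]),
    ("doc-c", [0.0, 1.0, 0.0]),
]
print(top_k([1.0, 0.05, 0.0], index, k=2))  # → ['doc-a', 'doc-b']
```

A serverless Pinecone index does the same conceptual job at scale, with an approximate-nearest-neighbour data structure instead of a linear scan.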
Feel free to implement these changes and share your experiences with the community!
System Info
Ubuntu 22.04.4 LTS
python==3.10.12