How to use RefineDocumentsChain ? #18571

dnnane · 2024-03-05T10:30:38Z

Checked other resources

I added a very descriptive title to this question.
I searched the LangChain documentation with the integrated search.
I used the GitHub search to find a similar question and didn't find it.

Commit to Help

I commit to help with one of those options 👆

Example Code

from langchain.chains import RefineDocumentsChain, LLMChain
from langchain.prompts import PromptTemplate
from langchain.llms import OpenAI

document_prompt = PromptTemplate(
    input_variables=["page_content"],
    template="{page_content}"
)
document_variable_name = "context"
llm = OpenAI()
prompt = PromptTemplate.from_template(
    "Summarize this content: {context}"
)
initial_llm_chain = LLMChain(llm=llm, prompt=prompt)
initial_response_name = "prev_response"
prompt_refine = PromptTemplate.from_template(
    "Here's your first summary: {prev_response}. "
    "Now add to it based on the following context: {context}"
)
refine_llm_chain = LLMChain(llm=llm, prompt=prompt_refine)
chain = RefineDocumentsChain(
    initial_llm_chain=initial_llm_chain,
    refine_llm_chain=refine_llm_chain,
    document_prompt=document_prompt,
    document_variable_name=document_variable_name,
    initial_response_name=initial_response_name,
    return_intermediate_steps=True
)

Description

I'd like to use RefineDocumentsChain. Here is the code from the documentation: https://api.python.langchain.com/en/latest/chains/langchain.chains.combine_documents.refine.RefineDocumentsChain.html

It seems incomplete to me: I am not sure how to go from an input to an output. For example, should I chunk the files beforehand or not? How do I get the result from invoke?

Can someone please give a complete example from input to output?

System Info

Not relevant

malaqel · 2024-03-19T22:03:55Z

I have the same issue as well

maximeperrindev · 2024-03-20T09:08:24Z

@dnnane,

You can chain RefineDocumentsChain with a document retriever and an output parser. For example:

from langchain.chains import RefineDocumentsChain, LLMChain
from langchain.prompts import PromptTemplate
from langchain.llms import OpenAI
import bs4
from langchain import hub
from langchain_community.document_loaders import WebBaseLoader
from langchain_community.vectorstores import Chroma
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Load, chunk and index the contents of the blog.
bs_strainer = bs4.SoupStrainer(class_=("post-content", "post-title", "post-header"))
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs={"parse_only": bs_strainer},
)
docs = loader.load()

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
splits = text_splitter.split_documents(docs)
vectorstore = Chroma.from_documents(documents=splits, embedding=OpenAIEmbeddings())

# Retrieve and generate using the relevant snippets of the blog.
retriever = vectorstore.as_retriever()

llm = OpenAI()

document_prompt = PromptTemplate(
    input_variables=["page_content"],
    template="{page_content}"
)
document_variable_name = "context"

prompt = PromptTemplate.from_template(
    "Summarize this content: {context}"
)

initial_llm_chain = LLMChain(llm=llm, prompt=prompt)
initial_response_name = "prev_response"
prompt_refine = PromptTemplate.from_template(
    "Here's your first summary: {prev_response}. "
    "Now add to it based on the following context: {context}"
)
refine_llm_chain = LLMChain(llm=llm, prompt=prompt_refine)

chain = RefineDocumentsChain(
    initial_llm_chain=initial_llm_chain,
    refine_llm_chain=refine_llm_chain,
    document_prompt=document_prompt,
    document_variable_name=document_variable_name,
    initial_response_name=initial_response_name,
    return_intermediate_steps=True
)

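rag_chain = (
    # RefineDocumentsChain reads its documents from the "input_documents" key,
    # so the retriever output must be passed under that name.
    {"input_documents": retriever}
    | chain
    # The chain returns a dict; the final summary is under "output_text".
    | (lambda result: result["output_text"])
    | StrOutputParser()
)

print(rag_chain.invoke("What is Task Decomposition?"))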

This prints the refined summary of the retrieved documents. You can then chain it to another prompt to answer the question or to add more steps.

Haribiddacdw · 2024-03-27T10:52:19Z

Error: missing some input keys: {'input_documents'}

s99100532 Jul 5, 2024

RefineDocumentsChain is a chain class that accepts {'input_documents': docs} as input; the dict key is configurable. Please check out the source for more details.
E.g.:

docs = TextLoader(XXX).load()  # List of Document
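
To make the input and output shapes concrete, here is a minimal pure-Python sketch of the refine pattern. It does not use LangChain and makes no API calls: `refine_documents` and `make_fake_llm` are hypothetical names invented for this illustration, and the fake LLM just returns a numbered placeholder. The assumption is that the real chain behaves the same way in outline: the first document goes through the initial prompt, every later document goes through the refine prompt together with the previous response, and the result comes back as a dict.

```python
from typing import Callable, Dict, List, Tuple

# Same prompt wording as in the thread above.
INITIAL_TEMPLATE = "Summarize this content: {context}"
REFINE_TEMPLATE = (
    "Here's your first summary: {prev_response}. "
    "Now add to it based on the following context: {context}"
)


def refine_documents(
    inputs: Dict[str, List[str]],
    llm: Callable[[str], str],
) -> Dict[str, object]:
    """Hypothetical stand-in for the refine loop (not LangChain's code)."""
    docs = inputs["input_documents"]  # same key the real chain asks for
    if not docs:
        raise ValueError("input_documents must contain at least one document")

    # Step 1: the first chunk is summarized with the initial prompt.
    response = llm(INITIAL_TEMPLATE.format(context=docs[0]))
    steps = [response]

    # Step 2: each remaining chunk refines the previous response.
    for doc in docs[1:]:
        response = llm(
            REFINE_TEMPLATE.format(prev_response=response, context=doc)
        )
        steps.append(response)

    return {"output_text": response, "intermediate_steps": steps}


def make_fake_llm() -> Tuple[Callable[[str], str], List[str]]:
    """Build a fake LLM that records prompts and returns 'summary vN'."""
    prompts: List[str] = []

    def fake_llm(prompt: str) -> str:
        prompts.append(prompt)
        return f"summary v{len(prompts)}"

    return fake_llm, prompts


fake_llm, prompts = make_fake_llm()
result = refine_documents(
    {"input_documents": ["chunk A", "chunk B", "chunk C"]}, fake_llm
)
print(result["output_text"])         # summary v3
print(result["intermediate_steps"])  # ['summary v1', 'summary v2', 'summary v3']
```

This also speaks to the chunking question: the loop works on one document per call, so a long file should be split into chunks first, and the final answer is read from the "output_text" key of the returned dict.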
