Shaping Haystack 2.0 #5568
Replies: 22 comments 8 replies
-
A PyPI package is now available that ships the code in
This is already used by the (work-in-progress) Chroma Document Store; see its pyproject.toml.
-
Sentence Transformers Embedders for Haystack 2.x
In our recent Embedders proposal (https://github.com/deepset-ai/haystack/blob/c38943721fbd702367a8934cb660a72b3143eb86/proposals/text/5390-embedders.md) we defined how embedding text should work in Haystack 2.x. Today, thanks to @anakin87, we added the first building blocks of this new architecture:
On their own, Embedders simply take some textual data, be it raw strings or whole Documents, and create an embedding for it. Soon you will also be able to use these Embedders to perform dense retrieval, so stay tuned! For more information, check out the relevant issue: #5567
-
LLM Support in Haystack 2.0 - Proposal
The proposal for LLM support in Haystack 2.0 has finally been merged 🎉 Here is the proposal's text: https://github.com/deepset-ai/haystack/blob/main/proposals/text/5540-llm-support-2.0.md and here you can find an earlier thread on the same topic: https://discordapp.com/channels/993534733298450452/1141684516709212160/1141684516709212160
The key takeaway from this proposal is that we're breaking down
This separation should give you more control over the LLM and how it is queried, and add visibility into what it produces. The tradeoff is a more verbose pipeline definition, but we're collecting user feedback to better understand the impact and potential solutions. The implementation of these components has already started, so you can soon expect more announcements on this topic and working examples 🚀
-
Hello, bilgeyucel.
-
AnswerBuilder component now available!
We just merged a PR that introduces a new component for Haystack 2.0: the AnswerBuilder.
-
PyPDFToDocument component is now available
The PyPDFToDocument component converts PDF files into a list of Document objects, which can then be used in Haystack 2.0 pipelines. For more details, see this PR. Here is how you can use this component:
from haystack.preview.components.file_converters.pypdf import PyPDFToDocument
# preview_samples_path is assumed to be a pathlib.Path pointing at a folder of sample files
paths = [preview_samples_path / "pdf" / "react_paper.pdf"]
converter = PyPDFToDocument()
output = converter.run(paths=paths)
docs = output["documents"]
assert len(docs) == 1
assert "ReAct" in docs[0].text
-
LinkContentFetcher component released
LinkContentFetcher fetches content from a given URL and converts it into a Document object, which can then be used in your Haystack 2.0 pipeline. For more details, see #5724. Here is how you can use this component:
from haystack.preview import Document
from haystack.preview.components.fetchers import LinkContentFetcher
lcf = LinkContentFetcher()
doc: Document = lcf.fetch(url="example.com")
print(doc)
-
Extractive QA now supported
With the release of
from haystack.preview import Pipeline, Document
from haystack.preview.document_stores import MemoryDocumentStore
from haystack.preview.components.retrievers import MemoryBM25Retriever
from haystack.preview.components.readers import ExtractiveReader
document_store = MemoryDocumentStore()
documents = [
Document(text="My name is Jean and I live in Paris."),
Document(text="My name is Mark and I live in Berlin."),
Document(text="My name is Giorgio and I live in Rome."),
]
document_store.write_documents(documents)
qa_pipeline = Pipeline()
qa_pipeline.add_component(instance=MemoryBM25Retriever(document_store=document_store), name="retriever")
qa_pipeline.add_component(instance=ExtractiveReader(model_name_or_path="deepset/tinyroberta-squad2"), name="reader")
qa_pipeline.connect("retriever", "reader")
question = "Who lives in Paris?"
result = qa_pipeline.run({"retriever": {"query": question}, "reader": {"query": question}})
# "answers" is a list of answers, so read the text of the top-ranked one
print(result["reader"]["answers"][0].data)
-
Does Haystack 2.0 include HyDE (Hypothetical Document Embeddings) as a retrieval method?
-
Hi, is there an update on this discussion? It has not been updated for a while, so I wonder what the current status is and when we should expect v2 to be ready.
-
@sandangel the updates stopped when we merged the Haystack 2.x preview into main. Simply follow the development on the main branch. The old main branch is now the 1.x branch. We have recently released beta3 of Haystack 2.x and expect these beta releases to continue. There is no definite cutoff date for the final 2.0 release, but it should come soon-ish.
-
Status Update 🚀
-
@TuanaCelik @vblagoje Thank you so much for the update. I will follow the development on the main branch.
-
Hello everyone, we have just published a new discussion entry: Haystack 2.0-Beta. The new discussion will serve as your ultimate guide until the stable release of Haystack 2.0.
-
Closing this discussion in favor of the Haystack 2.0-Beta discussion following the beta release.
-
Since Haystack v1.15, we’ve been slowly introducing new components and features to Haystack in the background in preparation for Haystack 2.0 (or v2). After the work we’ve put into the new design of the Haystack API over the last few months, we’re at a point where we would love to start involving the Haystack community in our thought process and gradually gather your input and feedback. In this discussion, we would like to highlight where we are with the design of the new Haystack API for 2.0, what we want to achieve with the new design, and what our current considerations are.
❓ What does the new 2.0 version mean?
Haystack 2.0 will be a major update to the design of Haystack nodes and pipelines. We believe that the pipeline concept is a fundamental requirement and an optimal fit for building applications with LLMs. Therefore, Pipelines and Nodes will continue to be the foundation of Haystack 2.0. However, the general pipeline structure, Nodes API, and the connection between DocumentStore and Retrievers will change. So, this will be a breaking change for Haystack users.
🏆 Motivation behind Haystack 2.0
At deepset, we put a lot of thought and care into maintaining Haystack as a robust, user-friendly, and production-ready LLM framework. As we have collected feedback from the Haystack community over the years and observed the advancements in the NLP field, such as LLMs and Agents, we see the need to update the pipeline structure with Haystack 2.0 to better align with our users’ needs and state-of-the-art NLP approaches.
When ready, Haystack 2.0 will introduce many improvements and more flexibility and, most importantly, will allow Haystack users to implement customizations and extensions to Haystack much more easily. The new pipeline structure will allow for more flexible, robust, and powerful pipelines. As we change the pipeline structure, we’ll be adapting all components to the new structure, which means rewriting many of them. This update gives us the opportunity to enhance the pipeline structure to make better use of LLMs, improve our Agent and Memory implementations, better define the connection between the DocumentStore and Retriever, and so on.
📍 Current status of Haystack 2.0
Haystack 2.0 is still a work in progress. We are defining the requirements for a more powerful and robust LLM framework with continuous feedback from the community, and we’re implementing the new Haystack API so that it’s aligned with the advances in NLP.
Although still in beta, you can find what’s been implemented so far in the preview package of the Haystack repository. To learn how and when components will be migrated, have a look at the Migrate Components to Pipeline v2 roadmap item, where we keep track of issues and PRs about Haystack 2.0. For a detailed overview of the current state of 2.0, check out Sara’s presentation about Haystack 2.0.
Additionally, here is the complete list of proposals so far shaping the design of Haystack 2.0:
🧱 Implemented 2.0 Components and DocumentStores
Using implemented components and document stores, you can already start to:
Full List of Components
Full List of Document Stores
⭐ Highlights of Haystack 2.0
Pipeline Nodes will now be called Components.
The new pipeline structure will provide better support for LLMs. The flexible connection between components will introduce new mechanisms, such as parallel branching and looping, that extend the capabilities of pipelines. Components will control the input and output of the pipeline. Thus, components with dynamic input parameters, such as those that use prompts with variables, will easily integrate into the pipeline. Overall, these refinements will not only improve the linear workflows but also ensure that pipelines seamlessly align with the nature of LLMs.
Here is what a RAG pipeline might look like in Haystack 2.0.👇🏼
Representation of a RAG pipeline in Haystack 2.0
The Components API will change. Components will define the name and the type of all of their inputs and outputs. The new API will reduce complexity and make it easier to create custom components such as Haystack integrations for third-party APIs and databases. The connections between components will be validated before running the pipeline, and Haystack will generate better error messages with instructions on fixing the errors.
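To make the idea of declared inputs and outputs more concrete, here is a minimal, self-contained sketch of validating a connection before anything runs. It does not use the Haystack API: `ToyComponent`, `ToyPipeline`, `InvalidConnection`, and the `"component.socket"` naming are all hypothetical and only illustrate the principle described above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


class InvalidConnection(Exception):
    """Raised when two sockets cannot be connected (hypothetical name)."""


@dataclass
class ToyComponent:
    """Toy component that declares the name and type of every input and output."""
    name: str
    inputs: Dict[str, type] = field(default_factory=dict)
    outputs: Dict[str, type] = field(default_factory=dict)


class ToyPipeline:
    def __init__(self) -> None:
        self.components: Dict[str, ToyComponent] = {}
        self.connections: List[Tuple[str, str]] = []

    def add_component(self, component: ToyComponent) -> None:
        self.components[component.name] = component

    def connect(self, sender: str, receiver: str) -> None:
        """Connect 'component.output' to 'component.input' after checking types."""
        send_name, out_socket = sender.split(".")
        recv_name, in_socket = receiver.split(".")
        outputs = self.components[send_name].outputs
        inputs = self.components[recv_name].inputs
        if out_socket not in outputs:
            raise InvalidConnection(
                f"'{send_name}' has no output '{out_socket}'; available: {sorted(outputs)}"
            )
        if in_socket not in inputs:
            raise InvalidConnection(
                f"'{recv_name}' has no input '{in_socket}'; available: {sorted(inputs)}"
            )
        if outputs[out_socket] is not inputs[in_socket]:
            raise InvalidConnection(
                f"Cannot connect '{sender}' ({outputs[out_socket].__name__}) "
                f"to '{receiver}' ({inputs[in_socket].__name__}): types differ."
            )
        self.connections.append((sender, receiver))


pipeline = ToyPipeline()
pipeline.add_component(
    ToyComponent("retriever", inputs={"query": str}, outputs={"documents": list})
)
pipeline.add_component(
    ToyComponent("reader", inputs={"query": str, "documents": list}, outputs={"answers": list})
)
# Accepted: a list output is wired to a list input.
pipeline.connect("retriever.documents", "reader.documents")
```

Connecting `"retriever.documents"` to `"reader.query"` in this sketch raises `InvalidConnection` with a message naming both sockets and their types, which is the kind of early, actionable error the paragraph above refers to.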
Retrievers will be customized for DocumentStore, not for retrieval methods. Each DocumentStore will have its own Retriever, highly specialized for that specific DocumentStore, handling all its requirements without being bound to a generic interface. Integrating a new DocumentStore will be easier, and the specialized Retriever will be able to adapt more quickly to the new features of the DocumentStore.
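As a rough illustration of a Retriever written for one specific DocumentStore, the sketch below pairs a toy in-memory store with a retriever that simply calls the store's own search method. `ToyMemoryStore`, `ToyMemoryRetriever`, and `keyword_search` are hypothetical names, and the word-overlap scoring is a stand-in for whatever search a real store offers.

```python
import re
from typing import Dict, List


class ToyMemoryStore:
    """Hypothetical in-memory store exposing its own native keyword search."""

    def __init__(self) -> None:
        self._docs: List[str] = []

    def write_documents(self, docs: List[str]) -> None:
        self._docs.extend(docs)

    def keyword_search(self, query: str, top_k: int) -> List[str]:
        # Score each document by the number of distinct words it shares with the query.
        query_words = set(re.findall(r"\w+", query.lower()))
        scored = [
            (len(query_words & set(re.findall(r"\w+", doc.lower()))), doc)
            for doc in self._docs
        ]
        matches = [pair for pair in scored if pair[0] > 0]
        ranked = sorted(matches, key=lambda pair: pair[0], reverse=True)
        return [doc for _, doc in ranked[:top_k]]


class ToyMemoryRetriever:
    """Retriever tied to ToyMemoryStore: it relies on that store's search method."""

    def __init__(self, document_store: ToyMemoryStore, top_k: int = 3) -> None:
        self.document_store = document_store
        self.top_k = top_k

    def run(self, query: str) -> Dict[str, List[str]]:
        return {"documents": self.document_store.keyword_search(query, self.top_k)}


store = ToyMemoryStore()
store.write_documents(
    [
        "My name is Jean and I live in Paris.",
        "My name is Mark and I live in Berlin.",
        "My name is Giorgio and I live in Rome.",
    ]
)
retriever = ToyMemoryRetriever(document_store=store, top_k=1)
result = retriever.run(query="Who lives in Paris?")
```

Because the retriever is not bound to a generic interface, a store with different capabilities could ship a retriever with a completely different `run` signature.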
The Embedder will be a separate component instead of being a part of a Retriever. Retrievers won’t be responsible for creating embeddings; the new Embedder component will handle that. The Retriever class will be simplified, and adding support for new embedding providers and approaches will be more straightforward.
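The separation can be sketched as follows: one component turns text into a vector, and the retriever only compares vectors. This is a toy under stated assumptions, not the Haystack implementation: `ToyTextEmbedder` and `ToyEmbeddingRetriever` are hypothetical, and the letter-frequency "embedding" is a deliberately simple stand-in for a real model.

```python
import math
from typing import Dict, List


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either one is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class ToyTextEmbedder:
    """Hypothetical embedder: maps a string to a 26-dimensional letter-count vector."""

    def run(self, text: str) -> Dict[str, List[float]]:
        counts = [0.0] * 26
        for ch in text.lower():
            if "a" <= ch <= "z":
                counts[ord(ch) - ord("a")] += 1.0
        return {"embedding": counts}


class ToyEmbeddingRetriever:
    """Retriever that never creates embeddings; it only ranks stored vectors."""

    def __init__(self, embedded_docs: Dict[str, List[float]]) -> None:
        self.embedded_docs = embedded_docs  # maps document text to its embedding

    def run(self, query_embedding: List[float], top_k: int = 1) -> Dict[str, List[str]]:
        ranked = sorted(
            self.embedded_docs,
            key=lambda text: cosine(query_embedding, self.embedded_docs[text]),
            reverse=True,
        )
        return {"documents": ranked[:top_k]}


embedder = ToyTextEmbedder()
texts = ["Paris is in France", "Berlin is in Germany"]
retriever = ToyEmbeddingRetriever({t: embedder.run(t)["embedding"] for t in texts})

query_embedding = embedder.run("Paris")["embedding"]
result = retriever.run(query_embedding=query_embedding, top_k=1)
```

Swapping in a different embedding provider would only require replacing `ToyTextEmbedder`; the retriever stays untouched, which is the simplification the paragraph above describes.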
Pipeline serialization will be more flexible and optimized for humans. JSON, TOML, and HCL will be used as serialization formats. It will be possible to serialize and deserialize pipelines that share the same component instance.
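One way to picture serialization of pipelines that share a component instance is to store each component once and let pipelines refer to it by name. The sketch below uses only the standard `json` module; the file layout, `ToyStore`, `dump`, and `load` are assumptions made for illustration and do not reflect the actual Haystack 2.0 format.

```python
import json
from typing import Dict, List


class ToyStore:
    """Stand-in for a component instance that several pipelines share."""

    def __init__(self, index: str) -> None:
        self.index = index


def dump(components: Dict[str, ToyStore], pipelines: Dict[str, List[str]]) -> str:
    """Write each component once; pipelines list component names, not copies."""
    payload = {
        "components": {name: {"index": comp.index} for name, comp in components.items()},
        "pipelines": pipelines,
    }
    return json.dumps(payload, indent=2)


def load(text: str) -> Dict[str, List[ToyStore]]:
    """Rebuild every component once, then hand the same object to each pipeline."""
    data = json.loads(text)
    instances = {name: ToyStore(**params) for name, params in data["components"].items()}
    return {
        pipeline_name: [instances[name] for name in component_names]
        for pipeline_name, component_names in data["pipelines"].items()
    }


store = ToyStore(index="docs")
text = dump({"store": store}, {"indexing": ["store"], "query": ["store"]})
restored = load(text)
```

After loading, both toy pipelines hold the very same `ToyStore` object rather than two copies, so documents written through one would be visible to the other.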
➡️ What’s next?
As we iterate on Haystack 2.0, we’ll update this discussion regularly to reflect the latest changes. We’ll share the design proposals with you in the comments below, update the list above as well and start a conversation about topics where we need your input. As we share more information about Haystack 2.0, please feel free to share your feedback or concerns. If you’d like to get notified when there is an update about Haystack 2.0, subscribe to this entry. You can always contact us using the comments section or the Haystack Discord server to ask questions.