GitHub - SeeMirra/Wingman: Custom AI Generator -- Pretrain your LLM Models with this Automated Embedding Generator and model Q&A Interface. Uses Retrieval Augmented Generation (RAG) to reduce hallucinations and ground the LLM on a source of truth

Pretrain your Machine Learning Models (LLM) with your own data
Comes complete with an Automated Embedding Generator and model Q&A Interface
View Demo · Report Bug · Request Feature

Table of Contents

About The Project
- Demo Animation
- Built With
Getting Started
- Prerequisites
- Installation
Usage
Roadmap
Contributing
License
Contact
Acknowledgments

About The Project

Automatically Pretrain your Machine Learning Models (LLM) with:

PDF Files
GitHub Repositories
Scraped HTML files in a local folder

This tooling accepts a GitHub repo URL, a PDF file, or a folder of local HTML files as input and performs the following actions:

generate_embedding_github.py or generate_embedding_pdf.py
• Break apart your input data into manageable chunks
• Send chunked data to Ray Serve Cluster
• Use Ray Cluster to create an embedding from our input chunks
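
The chunking step above can be sketched in a few lines of plain Python. This is a minimal illustration, not the code in the generate_embedding scripts: the function name chunk_text, the character-based splitting, and the chunk_size and overlap values are all assumptions made for the example.

```python
from __future__ import annotations


def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character chunks that overlap slightly.

    Hypothetical helper: the real scripts may split on tokens, sentences,
    or document structure instead of raw characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in the range [0, chunk_size)")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    sample = "abcdefghij" * 3  # 30 characters of toy input
    for chunk in chunk_text(sample, chunk_size=12, overlap=2):
        print(len(chunk), chunk)
```

The overlap keeps a sentence that straddles a chunk boundary visible in both neighbouring chunks, which tends to help retrieval later on.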

serve run serve:deployment
• Use the Ray cluster to download a foundation model
• Load the foundation model with our embedding on top
• Start a web server and make the model available via an API
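
To show what putting the embedding "on top" of the model means at query time, here is a toy sketch of the retrieval step in Retrieval Augmented Generation. It is an assumption-laden stand-in: the hand-written two-dimensional vectors replace a real embedding model and Faiss index, and the helper names and prompt wording are hypothetical, not taken from serve.py.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_chunks(query_vec, index, k=2):
    """Return the text of the k chunks most similar to the query.

    index is a list of (chunk_text, vector) pairs; a real deployment
    would use a vector store instead of this linear scan.
    """
    ranked = sorted(
        index,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


def build_prompt(question, context_chunks):
    """Assemble a grounded prompt (hypothetical wording)."""
    context = "\n".join(context_chunks)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


if __name__ == "__main__":
    toy_index = [
        ("Agents are listed at /agents/list.", [1.0, 0.0]),
        ("Billing is handled monthly.", [0.0, 1.0]),
    ]
    chunks = top_k_chunks([0.9, 0.1], toy_index, k=1)
    print(build_prompt("How do I list agents?", chunks))
```

Only the retrieved chunks are placed in the prompt, so the model answers from the ingested source rather than from memory alone.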

query.py "what is the api endpoint to get a list of agents"
• Allow you to interface with the model through the API
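
For readers who want to call the API from their own code, a client along the lines of query.py might look like the sketch below. The details are assumptions: http://localhost:8000/ is Ray Serve's default HTTP address and may differ in your deployment, and the "query" parameter name is hypothetical, so check query.py and serve.py for the actual request shape.

```python
import sys
import urllib.parse
import urllib.request

# Assumption: Ray Serve's default HTTP address; adjust for your deployment.
DEFAULT_BASE_URL = "http://localhost:8000/"


def build_query_url(question: str, base_url: str = DEFAULT_BASE_URL) -> str:
    """Build the request URL for a question.

    "query" is a hypothetical parameter name used for illustration.
    """
    return base_url + "?" + urllib.parse.urlencode({"query": question})


def ask(question: str, base_url: str = DEFAULT_BASE_URL, timeout: float = 60.0) -> str:
    """Send the question to the running deployment and return the raw response text."""
    url = build_query_url(question, base_url)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Only contacts the server when a question is passed on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    print(ask(sys.argv[1]))
```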

This tooling is for anyone who wants to train an LLM on a specific source of knowledge in a simple way, with all the heavy lifting abstracted behind the scenes.

Wingman is built on top of a Ray cluster, so it can run distributed across many machines or on a single machine.

Demo Animation

(Demo animation: see wingman.gif in the repository.)

Built With

This project (and many others) would not be possible without the following:

Name	Developer	Description
Faiss	Facebook Research	A library for efficient similarity search and clustering of dense vectors.
LangChain	LangChain	LangChain is a framework for developing applications powered by language models.
Ray	Ray Project	Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
Python	Python Software Foundation	Python is a high-level, general-purpose programming language. Its design philosophy emphasizes code readability with the use of significant indentation.
PyTorch	The Linux Foundation	Tensors and Dynamic neural networks in Python with strong GPU acceleration.
Beautiful Soup	Leonard Richardson	Beautiful Soup is a Python library for pulling data out of HTML and XML files. It works with your favorite parser to provide idiomatic ways of navigating, searching, and modifying the parse tree.
Typing_Inspect	Ivan Levkivskyi	The typing_inspect module defines an experimental API for runtime inspection of types defined in the Python standard typing module.


Getting Started

To get a local copy up and running, follow these simple steps.

Prerequisites

16+ GB of VRAM (24 GB recommended)
Linux or WSL
A Python virtual environment of your choice (we like Miniconda)

Installation

Once you are in the project folder and have your venv/conda environment loaded, run the following:

1. pip install -r requirements.txt
2. python generate_embedding_pdf.py ./PathTo/local.pdf
3. (Optional) Modify the prompt in serve.py on line 30 to suit your use case
4. serve run serve:deployment
5. python query.py "what is the api endpoint to disable data collection for a specified agent"

An interim launcher and a UI are both on the current roadmap for this open source edition.


Usage

python query.py "what is the api endpoint to disable data collection for a specified agent"
/api/sn_agent/agents/{agent_id}/data/off.

python query.py "what is the api endpoint for the ActivitySubscriptions API"
The API endpoint for the ActivitySubscriptions API is /now/actsub/activities.

python query.py "what is the api endpoint to get a list of agents"
The API endpoint to get a list of agents is "/api/sn_agent/agents/list.

python query.py "what is the api endpoint of the Agent Client Collector API"
The API endpoint of the Agent Client Collector API is "https://<sn_agent-host>:<sn_agent-port>/api/agent-client-collector/admin".


Roadmap

For additional features such as the following, and more, please contact us about our Enterprise Software Suite:

• Page number citations
• Additional programming language compatibility
• Multi-LLM pipelines (summarize relevant passages for better context utilization)
• Multimodal model support (train your knowledge embedding on data in images)
• Increased accuracy via 3D vector database (Vector Cloud) support
• Agent support (complete tasks based on facts in the ingested knowledge source)
• Docker and Kubernetes support

See the open issues for a full list of proposed features (and known issues).


Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement". Don't forget to give the project a star! Thanks again!

1. Fork the Project
2. Create your Feature Branch (git checkout -b feature/AmazingFeature)
3. Commit your Changes (git commit -m 'Add some AmazingFeature')
4. Push to the Branch (git push origin feature/AmazingFeature)
5. Open a Pull Request


License

Commercial use prohibited.

Contact us for a commercial license for our Enterprise Version.


Contact

Christian Mirra - LinkedIn

Project Link: https://github.com/SeeMirra/Wingman/


Acknowledgments

Todo -- This list is currently incomplete


