
Commit

Merge pull request #654 from chhoumann/feature/local-llms
chhoumann authored Mar 3, 2024
2 parents ac95d2f + 0fc88ae commit d8d7c64
Showing 18 changed files with 638 additions and 117 deletions.
95 changes: 71 additions & 24 deletions docs/docs/AIAssistant.md
@@ -3,14 +3,16 @@ title: AI Assistant
---

# AI Assistant
The AI Assistant in QuickAdd leverages the power of OpenAI's GPT-3 and GPT-4 models to act as your personal AI assistant within Obsidian. It can streamline your workflows by automating routine tasks and providing intellectual support. To use this feature, you need the QuickAdd plugin and an OpenAI API key.

The AI Assistant in QuickAdd leverages the power of Large Language Models (LLMs) to act as your personal AI assistant within Obsidian. It can streamline your workflows by automating routine tasks and providing intellectual support. To use this feature, you need the QuickAdd plugin and a provider you'd like to use.

## How to Setup the AI Assistant

To set up the AI Assistant, follow these steps:

1. In Obsidian, create a new folder dedicated to AI prompt templates, e.g. `bins/ai_prompts`.
2. Navigate to QuickAdd settings and locate the "AI Assistant" section. Specify the path to the folder you created in step 1.
3. In the same section, paste your OpenAI API key into the "OpenAI API Key" field.
3. In the same section, add a provider to get started. If you are using OpenAI, as of v1.8.x you need to enter your API key in the [provider](#providers) settings. The video below is from an older version, but the process is similar.

![AI Assistant Setup](./Images/AI_Assistant_Setup.gif)

@@ -31,46 +33,91 @@ Here's an example of how you can set up a prompt template:

You can also use AI Assistant features from within the [API](./QuickAddAPI.md).

## Providers

QuickAdd supports multiple providers for LLMs.
The only requirement is that they are OpenAI-compatible, which means their API should be similar to OpenAI's.

Here are a few providers that are known to work with QuickAdd:

- [OpenAI](https://openai.com)
- [TogetherAI](https://www.together.ai)
- [Groq](https://groq.com)
- [Ollama (local)](https://ollama.com)

Paid providers expose their own API, which you can use with QuickAdd. Free providers, such as Ollama, are also supported.

By default, QuickAdd will add the OpenAI provider. You can add more providers by clicking the "Add Provider" button in the AI Assistant settings.
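
Concretely, "OpenAI-compatible" means the provider accepts the same `/chat/completions` request that QuickAdd itself sends (see `src/ai/OpenAIRequest.ts` in this pull request). Below is a minimal sketch of that request shape; QuickAdd uses Obsidian's `requestUrl` internally, and the endpoint, key, and model here are placeholders for whatever provider you configure.

```ts
// Sketch of the chat completions request an OpenAI-compatible provider must accept.
// The endpoint and API key are placeholders; swap in your provider's values.
const endpoint = "https://api.openai.com/v1"; // e.g. "http://localhost:11434/v1" for Ollama
const apiKey = "sk-..."; // may be empty for local providers such as Ollama

const response = await fetch(`${endpoint}/chat/completions`, {
	method: "POST",
	headers: {
		"Content-Type": "application/json",
		Authorization: `Bearer ${apiKey}`,
	},
	body: JSON.stringify({
		model: "gpt-3.5-turbo",
		messages: [
			{ role: "system", content: "You are a helpful assistant." },
			{ role: "user", content: "Summarize this note." },
		],
	}),
});

const data = await response.json();
console.log(data.choices[0].message.content);
```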

Here's a video showcasing adding Groq as a provider:

<video controls style={{width: "100%"}}>

<source src="https://github.com/chhoumann/quickadd/assets/29108628/493b556a-a8cd-4445-aa39-054d379c7bb9" type="video/mp4"/>
</video>
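
For reference, a Groq provider entry would look roughly like the following. The URL is Groq's OpenAI-compatible base endpoint at the time of writing; double-check it against Groq's own documentation before relying on it.

```
Name: Groq
URL: https://api.groq.com/openai/v1
API Key: <your Groq API key>
```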

### Local LLMs

You can use your own machine to run LLMs. This is useful if you want to keep your data private, or if you want to use a specific model that isn't available on the cloud.
To use a local LLM, you need to set up a server that can run the model.
You can then add the server as a provider in QuickAdd.

One such server is [Ollama](https://ollama.com). Ollama is a free, open-source, and self-hosted LLM server. You can set up Ollama on your own machine, and then use it as a provider in QuickAdd.
You can find the [quick start documentation here](https://github.com/ollama/ollama/blob/main/README.md#quickstart).
Ollama binds to port `11434` ([src](https://github.com/ollama/ollama/blob/main/docs/faq.md#how-can-i-expose-ollama-on-my-network)), so your provider settings would be as follows:

```
Name: Ollama
URL: http://localhost:11434/v1
API Key: (empty)
```

And that's it! You can now use Ollama as a provider in QuickAdd.
Make sure you add the model you want to use; [mistral](https://ollama.com/library/mistral) is a great choice.
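
Under the hood, QuickAdd stores each provider and its models as a plain object (see `src/ai/Provider.ts` in this pull request). A hypothetical entry for the Ollama setup above could look like this; the `maxTokens` value is an assumption and should match the context window of the model you actually run.

```ts
import type { AIProvider } from "src/ai/Provider";

// Illustrative only: an Ollama provider entry mirroring the settings shown above.
const OllamaProvider: AIProvider = {
	name: "Ollama",
	endpoint: "http://localhost:11434/v1",
	apiKey: "", // Ollama does not require an API key
	models: [
		// Assumed context window; adjust to the model you pulled with `ollama pull`.
		{ name: "mistral", maxTokens: 8192 },
	],
};
```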

## AI Assistant Settings

Within the main AI Assistant settings accessible via QuickAdd settings, you can configure the following options:

- OpenAI API Key: The key to interact with OpenAI's models.
- Prompt Templates Folder: The location where all your prompt templates reside.
- Default model: The default OpenAI model to be used.
- Show Assistant: Toggle for status messages.
- Default System Prompt Template: Sets the behavior of the model.

For each individual AI Assistant command in your macros, you can set these options:

- Prompt Template: Determines the prompt template to use.
- Model: Specifies the OpenAI model to use, overriding the default model.
- Output Name Variable: Sets the variable name for the AI Assistant’s output.
- System Prompt Template: Determines the model's behavior, overriding the default system prompt template.

You can also tweak model parameters in advanced settings:
- **temperature:** Allows you to adjust the sampling temperature between 0 and 2. Higher values result in more random outputs, while lower values make the output more focused and deterministic.
- **top_p:** This parameter relates to nucleus sampling. The model considers only the tokens comprising the top 'p' probability mass. For example, 0.1 means only tokens from the top 10% probability mass are considered.
- **frequency_penalty:** A parameter ranging between -2.0 and 2.0. Positive values penalize new tokens based on their frequency in the existing text, reducing the model's tendency to repeat the same lines.
- **presence_penalty:** Also ranging between -2.0 and 2.0, positive values penalize new tokens based on their presence in the existing text, encouraging the model to introduce new topics.

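These parameters are passed straight through to the provider: QuickAdd spreads them into the chat completions request body (see the `...modelParams` spread in `src/ai/OpenAIRequest.ts`). A small sketch, assuming the `OpenAIModelParameters` fields are named after the corresponding API parameters:

```ts
import type { OpenAIModelParameters } from "src/ai/OpenAIModelParameters";

// Field names are assumed to mirror the OpenAI API parameters.
const modelParams: OpenAIModelParameters = {
	temperature: 0.2, // low temperature, more deterministic output
	top_p: 1,
	frequency_penalty: 0.5, // discourage repeating the same lines
	presence_penalty: 0, // no extra push toward new topics
};

// The parameters end up alongside the model and messages in the request body:
const body = { model: "gpt-4", ...modelParams, messages: [] };
```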

## AI-Powered Workflows

You can create powerful workflows utilizing the AI Assistant. Some examples are:

- **Generating Writing Prompts:** Using links to related notes to generate writing prompts.
- **Summarizer:** Create summaries of selected text.
- **Transform Selected:** Transform selected text based on provided instructions.
- **Flashcard Creator:** Generate flashcards based on selected text.
- **Get Me Started Writing About…:** Generate points to kickstart your writing on a given topic.
- **Manual Prompt:** Provide a manual prompt to the AI assistant.
- **Alternative Viewpoints:** Obtain alternative perspectives and improvements on your draft.
- **Prompt Chaining:** Chain multiple prompts together, with each prompt using the output of the previous one.

All of these examples, and more, can be found in [Christian's blog post about the AI Assistant](https://bagerbach.com/blog/obsidian-ai).

Please note, using the AI Assistant will incur costs depending on the API usage. Set spending limits on your OpenAI account to avoid unexpected expenses. Play around with different models to find the one that best suits your needs.

### Example: Summarizer

Here's a simple prompt where you select some text and then use the assistant with that prompt.
It'll then produce an AI-generated summary:

@@ -79,4 +126,4 @@ Please summarize the following text. Use only the text itself as material for su
{{value}}
```

You can use the getting-started demonstration shown earlier to set this up.
24 changes: 17 additions & 7 deletions src/ai/AIAssistant.ts
@@ -1,22 +1,29 @@
import GenericSuggester from "src/gui/GenericSuggester/genericSuggester";
import type { Model } from "./models";
import { TFile } from "obsidian";
import { getMarkdownFilesInFolder } from "src/utilityObsidian";
import invariant from "src/utils/invariant";
import type { OpenAIModelParameters } from "./OpenAIModelParameters";
import { settingsStore } from "src/settingsStore";
import { encodingForModel } from "js-tiktoken";
import type { TiktokenModel } from "js-tiktoken";
import { encodingForModel, getEncoding } from "js-tiktoken";
import { OpenAIRequest } from "./OpenAIRequest";
import { makeNoticeHandler } from "./makeNoticeHandler";
import { getModelMaxTokens } from "./getModelMaxTokens";
import type { Model } from "./Provider";
import { getModelMaxTokens } from "./aiHelpers";

export const getTokenCount = (text: string, model: Model) => {
// gpt-3.5-turbo-16k is a special case - it isn't in the library list yet. Same with gpt-4-1106-preview and gpt-3.5-turbo-1106.
let m = model === "gpt-3.5-turbo-16k" ? "gpt-3.5-turbo" : model;
let m = model.name === "gpt-3.5-turbo-16k" ? "gpt-3.5-turbo" : model.name;
m = m === "gpt-4-1106-preview" ? "gpt-4" : m;
m = m === "gpt-3.5-turbo-1106" ? "gpt-3.5-turbo" : m;

return encodingForModel(m).encode(text).length;
// kind of hacky, but we'll be using this general heuristic to support non-openai models
try {
return encodingForModel(m as TiktokenModel).encode(text).length;
} catch {
const enc = getEncoding("cl100k_base");
return enc.encode(text).length;
}
};

async function repeatUntilResolved(
@@ -379,7 +386,7 @@ export async function ChunkedPrompt(
);

const maxChunkTokenSize =
getModelMaxTokens(model) / 2 - systemPromptLength; // temp, need to impl. config
getModelMaxTokens(model.name) / 2 - systemPromptLength; // temp, need to impl. config

// Whether we should strictly enforce the chunking rules or we should merge chunks that are too small
const shouldMerge = settings.shouldMerge ?? true; // temp, need to impl. config
@@ -398,7 +405,10 @@

if (strSize > maxCombinedChunkSize) {
throw new Error(
`The chunk "${chunk.slice(0, 25)}..." is too large to fit in a single prompt.`
`The chunk "${chunk.slice(
0,
25
)}..." is too large to fit in a single prompt.`
);
}

20 changes: 13 additions & 7 deletions src/ai/OpenAIRequest.ts
@@ -1,10 +1,10 @@
import type { Model } from "./models";
import { requestUrl } from "obsidian";
import type { OpenAIModelParameters } from "./OpenAIModelParameters";
import { settingsStore } from "src/settingsStore";
import { getTokenCount } from "./AIAssistant";
import { getModelMaxTokens } from "./getModelMaxTokens";
import { preventCursorChange } from "./preventCursorChange";
import type { Model } from "./Provider";
import { getModelProvider } from "./aiHelpers";

type ReqResponse = {
id: string;
@@ -38,25 +38,31 @@ export function OpenAIRequest(

const tokenCount =
getTokenCount(prompt, model) + getTokenCount(systemPrompt, model);
const maxTokens = getModelMaxTokens(model);
const { maxTokens } = model;

if (tokenCount > maxTokens) {
throw new Error(
`The ${model} API has a token limit of ${maxTokens}. Your prompt has ${tokenCount} tokens.`
`The ${model.name} API has a token limit of ${maxTokens}. Your prompt has ${tokenCount} tokens.`
);
}

const modelProvider = getModelProvider(model.name);

if (!modelProvider) {
throw new Error(`Model ${model.name} not found with any provider.`);
}

try {
const restoreCursor = preventCursorChange();
const _response = requestUrl({
url: `https://api.openai.com/v1/chat/completions`,
url: `${modelProvider.endpoint}/chat/completions`,
method: "POST",
headers: {
"Content-Type": "application/json",
Authorization: `Bearer ${apiKey}`,
},
body: JSON.stringify({
model,
model: model.name,
...modelParams,
messages: [
{ role: "system", content: systemPrompt },
@@ -72,7 +78,7 @@ } catch (error) {
} catch (error) {
console.log(error);
throw new Error(
`Error while making request to OpenAI API: ${
`Error while making request to ${modelProvider.name}: ${
(error as { message: string }).message
}`
);
52 changes: 52 additions & 0 deletions src/ai/Provider.ts
@@ -0,0 +1,52 @@
export interface AIProvider {
	name: string;
	endpoint: string;
	apiKey: string;
	models: Model[];
}

export interface Model {
	name: string;
	maxTokens: number;
}

const OpenAIProvider: AIProvider = {
	name: "OpenAI",
	endpoint: "https://api.openai.com/v1",
	apiKey: "",
	models: [
		{
			name: "gpt-3.5-turbo",
			maxTokens: 4096,
		},
		{
			name: "gpt-3.5-turbo-16k",
			maxTokens: 16384,
		},
		{
			name: "gpt-3.5-turbo-1106",
			maxTokens: 16385,
		},
		{
			name: "gpt-4",
			maxTokens: 8192,
		},
		{
			name: "gpt-4-32k",
			maxTokens: 32768,
		},
		{
			name: "gpt-4-1106-preview",
			maxTokens: 128000,
		},
		{
			name: "text-davinci-003",
			maxTokens: 4096,
		},
	],
};

export const DefaultProviders: AIProvider[] = [
	OpenAIProvider,
];
39 changes: 39 additions & 0 deletions src/ai/aiHelpers.ts
@@ -0,0 +1,39 @@
import { settingsStore } from "src/settingsStore";

export function getModelNames() {
	const aiSettings = settingsStore.getState().ai;

	return aiSettings.providers
		.flatMap((provider) => provider.models)
		.map((model) => model.name);
}

export function getModelByName(model: string) {
	const aiSettings = settingsStore.getState().ai;

	return aiSettings.providers
		.flatMap((provider) => provider.models)
		.find((m) => m.name === model);
}

export function getModelMaxTokens(model: string) {
	const aiSettings = settingsStore.getState().ai;

	const modelData = aiSettings.providers
		.flatMap((provider) => provider.models)
		.find((m) => m.name === model);

	if (modelData) {
		return modelData.maxTokens;
	}

	throw new Error(`Model ${model} not found with any provider.`);
}

export function getModelProvider(modelName: string) {
	const aiSettings = settingsStore.getState().ai;

	return aiSettings.providers.find((provider) =>
		provider.models.some((m) => m.name === modelName)
	);
}
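
As a quick illustration of how these helpers fit together, a sketch assuming the default OpenAI provider from `Provider.ts` is configured:

```ts
import { getModelMaxTokens, getModelProvider } from "src/ai/aiHelpers";

// Looks up the provider whose model list contains "gpt-4" (the default OpenAI provider).
const provider = getModelProvider("gpt-4");

// Returns 8192 for "gpt-4", or throws if no configured provider offers the model.
const limit = getModelMaxTokens("gpt-4");
```
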
21 changes: 0 additions & 21 deletions src/ai/getModelMaxTokens.ts

This file was deleted.

4 changes: 0 additions & 4 deletions src/ai/models.ts

This file was deleted.

