OpenAI: Migrate to async client and enhance API support #219
base: main
Conversation
Major changes:
- Migrate to async OpenAI client to support query cancellation and timeouts (a sketch of the pattern follows below)
- Add client caching using the global dictionary (GD) to improve performance
- Migrate to using raw responses to minimize type conversions and improve performance
- Add comprehensive support for all OpenAI API parameters
- Add support for client create/destroy methods

Implementation details:
- Replace sync OpenAI client with AsyncOpenAI for better control flow
- Implement client caching in GD to reuse connections
- Add query cancellation support using asyncio
- Remove list_models and embed function implementations from openai.py to consolidate API handling
- Move functionality directly into the SQL functions for consistency
- Return raw API responses to minimize conversions
- Add complete OpenAI API parameter support across all functions
- Standardize parameter naming with a leading underscore
- Update OpenAI and tiktoken package versions

Package updates:
- openai: 1.44.0 -> 1.51.2
- tiktoken: 0.7.0 -> 0.8.0

Breaking changes:
- Functions now return raw JSON responses instead of parsed objects
- Functions marked as parallel unsafe due to HTTP API constraints
- Parameter names now prefixed with underscore for consistency
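For context, the async-client and raw-response combination roughly follows this pattern. This is a minimal sketch; the function names, the module-level event loop, and the exact `with_raw_response` usage are illustrative assumptions, not the PR's literal code:

```python
import asyncio
import openai

# One long-lived event loop per session so a cached client's connections
# remain usable across calls (an assumption about the caching design).
_loop = asyncio.new_event_loop()

async def _chat(client: openai.AsyncOpenAI, **kwargs) -> str:
    # with_raw_response exposes the unparsed HTTP response; returning its
    # body avoids converting the payload into client-side model objects.
    raw = await client.chat.completions.with_raw_response.create(**kwargs)
    return raw.http_response.text

def chat_completion(client: openai.AsyncOpenAI, model: str, messages: list) -> str:
    # PL/Python itself is synchronous, so the async call is driven to
    # completion here; cancellation can hook into the pending task.
    return _loop.run_until_complete(_chat(client, model=model, messages=messages))
```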
Thank you so much for this PR! I did a preliminary check and have a bunch of questions; I just want to understand the motivation/reasoning behind some decisions. We'll also have to decide on the JSON vs. vector return types for some of these functions. I think you're right that we'll need both sets of functions. Let me ask some of my colleagues about the naming conventions we want to use here.
projects/extension/ai/openai.py (outdated)

```python
    return openai.AsyncOpenAI(**client_kwargs)


def get_or_create_client(plpy, GD: Dict[str, Any], api_key: str = None, api_key_name: str = None, base_url: str = None) -> Any:
```
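A rough sketch of how a GD-backed cache like this could work; the cache-key scheme and the kwargs handling below are assumptions for illustration, not the PR's exact implementation:

```python
from typing import Any, Dict, Optional

import openai

def get_or_create_client(
    plpy,
    GD: Dict[str, Any],
    api_key: Optional[str] = None,
    api_key_name: Optional[str] = None,
    base_url: Optional[str] = None,
) -> Any:
    # Assumption: one client per (key name, base URL) pair, keyed in GD.
    cache_key = f"openai_client:{api_key_name or 'default'}:{base_url or 'default'}"
    client = GD.get(cache_key)
    if client is None:
        client_kwargs: Dict[str, Any] = {"api_key": api_key}
        if base_url is not None:
            client_kwargs["base_url"] = base_url
        client = openai.AsyncOpenAI(**client_kwargs)
        GD[cache_key] = client  # reused across calls in the same session
    return client
```

Because GD persists for the lifetime of the backend session, subsequent calls skip client construction entirely, which is where the CPU savings discussed below come from.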
Do you have any numbers showing that creating the client is expensive (and thus worth storing in GD)? Does this allow connection reuse or something? And if it's the latter, how and when do connections get closed? Is there a keepalive timeout?
Storing the client in GD adds a good amount of complexity, and I'd like to find out what we're gaining/losing for it.
Yup, benchmarks here: #116 (comment)
Also note that there is a known issue: the 2nd (and 3rd, etc.) call to the client for the API has an extra ~40ms delay that doesn't happen when I run this code outside of a PL/Python environment (noted in the thread above). I really should have mentioned that directly in the PR; I'll edit it to note that this still needs to be identified. Once that is fixed, the benchmark numbers should look much better.
Even with the above issue, this is still much faster and uses less CPU than the original implementation, where we recreate the client on every call.
Note specifically the CPU reduction. Recreating the client is heavy on CPU; I know this from past projects, but the benchmarks also bear this out.
I believe the connection is closed after the request completes, and the client becomes ready for the next call. If the request is cancelled early, we attempt to shut things down gracefully.
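The graceful-cancellation behavior described here could look roughly like the following. This is a sketch under assumptions: in particular, that statement interruption surfaces in the Python code as KeyboardInterrupt; the PR's actual cancellation hook may differ:

```python
import asyncio
import contextlib

def run_cancellable(loop: asyncio.AbstractEventLoop, coro):
    """Run a coroutine to completion, cancelling the in-flight request
    if the statement is interrupted. Interruption arriving as
    KeyboardInterrupt is an assumption about the PL/Python environment."""
    task = loop.create_task(coro)
    try:
        return loop.run_until_complete(task)
    except KeyboardInterrupt:
        task.cancel()
        # Let the task unwind so the HTTP connection is closed cleanly.
        with contextlib.suppress(asyncio.CancelledError):
            loop.run_until_complete(task)
        raise
```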
@cevian Alright, so when removing the _underscore prefix we will need to make changes to the … Let me know what to go with.
@Tostino I believe we used …
Fixed naming conflicts with Postgres reserved words. Reverted parallel safety changes (functions went from unsafe -> safe). Reverted function volatility changes.
@cevian Alright, all changes made. I also noticed that when I rebased, I had accidentally committed changes to the ai--0.4.0.sql file, so I reverted that. This still has the performance problem we need to dig into before it's merged, but at least the other issues can be discussed and fixed in the meantime.
@Tostino I am still not convinced we need …
I'm good with that solution; it seems to solve the problem I was originally trying to solve. Will get it done tonight.
… client_extra_args parameter to all relevant functions that interact with the client (other than the `_simple` function, which I think needs rethinking).
Well... some kid went and ripped out my neighborhood's internet interconnection wiring last night, so I was slightly delayed. I tested to make sure the client_extra_args were being passed through properly, and they seem to be, based on my initial "kick the tires" tests.
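Passing extra client arguments through could look like the following. This is a hypothetical sketch: the `make_client` name and the assumption that `client_extra_args` arrives from SQL as a JSON text payload are illustrative, not confirmed from the PR:

```python
import json
from typing import Any, Dict, Optional

import openai

def make_client(
    api_key: str,
    base_url: Optional[str] = None,
    client_extra_args: Optional[str] = None,
) -> openai.AsyncOpenAI:
    kwargs: Dict[str, Any] = {"api_key": api_key}
    if base_url is not None:
        kwargs["base_url"] = base_url
    if client_extra_args is not None:
        # Assumed to arrive from SQL as a jsonb/text payload, e.g. '{"timeout": 30}'
        kwargs.update(json.loads(client_extra_args))
    return openai.AsyncOpenAI(**kwargs)
```

This keeps the SQL-facing functions generic: any constructor option the OpenAI client accepts (timeouts, retries, etc.) can be forwarded without a dedicated parameter for each.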
I should have a little time to try to figure out that 2nd-call issue this week (or at least attempt it; I'm not a Python dev, so I'm not used to the profiling tools in this space). @cevian Is there anything else you see that needs attention at this point?
Hey @Tostino, is:

> the 2nd (and 3rd, etc.) call to the client for the API has an extra ~40ms delay

a problem with the current state as well, or was it introduced by your PR? If it's a current issue, could you open a GitHub issue for it and we can tackle it in a separate PR.
@alejandrodnm No, the current state just has a much slower overall call time on every call (I believe roughly 25-30ms/call) and much higher CPU usage. The delay is not a current issue; it was introduced by the PR. Sorry, the holidays had me a bit busy. I'll get back to this as soon as I can.
@Tostino don't worry. Just wanted to see if we could support you better. You've put a lot of effort into this, and we really appreciate it. |