
Observed latency in chain.invoke #28750

Open
5 tasks done
KalakondaSainath opened this issue Dec 16, 2024 · 3 comments
Labels
Ɑ: core Related to langchain-core

Comments

@KalakondaSainath

Checked other resources

  • I added a very descriptive title to this issue.
  • I searched the LangChain documentation with the integrated search.
  • I used the GitHub search to find a similar question and didn't find it.
  • I am sure that this is a bug in LangChain rather than my code.
  • The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).

Example Code

import os

from azure.identity import ClientSecretCredential, get_bearer_token_provider
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import AzureChatOpenAI


def llm_connection(model="gpt-4o",
                   temperature=0.2,
                   top_p=0.1,
                   max_tokens=2000,
                   max_retries=1):
    credentials = {
        "tenant_id": os.getenv("AZURE_TENANT_ID"),
        "client_id": os.getenv("AZURE_CLIENT_ID"),
        "client_secret": os.getenv("AZURE_CLIENT_SECRET"),
        "openai_endpoint": os.getenv("API_BASE"),
        "azure_api_version": os.getenv("AZURE_API_VERSION", "2024-04-01-preview"),
        "subscription_key": os.getenv("SUBSCRIPTION_KEY"),
    }

    llm = instantiate_llm(credentials,
                          azure_deployment=model,
                          temperature=temperature,
                          top_p=top_p,
                          max_tokens=max_tokens,
                          max_retries=max_retries)
    return llm


def instantiate_llm(
    credentials: dict,
    azure_deployment: str = "gpt-4o",
    temperature: float = 0.2,
    top_p: float = 0.1,
    max_tokens: int = 1000,
    max_retries: int = 1,
):
    """Instantiate the LLM model."""
    csc = ClientSecretCredential(
        tenant_id=credentials["tenant_id"],
        client_id=credentials["client_id"],
        client_secret=credentials["client_secret"],
    )

    llm = AzureChatOpenAI(
        azure_endpoint=credentials["openai_endpoint"],
        api_version=credentials["azure_api_version"],
        azure_deployment=azure_deployment,
        azure_ad_token_provider=get_bearer_token_provider(
            csc, "https://cognitiveservices.azure.com/.default"
        ),
        default_headers={"Ocp-Apim-Subscription-Key": credentials["subscription_key"]},
        temperature=temperature,
        top_p=top_p,
        max_tokens=max_tokens,
        max_retries=max_retries,
    )

    return llm


llm = llm_connection()
prompt_template = ChatPromptTemplate.from_messages(
    [("system", EMAIL_WRITER_SYSTEM)]  # EMAIL_WRITER_SYSTEM is defined elsewhere
)
chain = prompt_template | llm | StrOutputParser()
response = chain.invoke(<request_payload>)

Error Message and Stack Trace (if applicable)

Retrying request to /chat/completions in 0.376881 seconds

Description

We have observed intermittent latency in chain.invoke.
At times it takes a couple of minutes before the OpenAI HTTP POST request is made, and there is no logging to show which operation is taking the time.
With the same payload, the whole chain sometimes completes in 10 seconds, whereas at other times we observe the latency, with no logging other than the error message "Retrying request to /chat/completions in ...".

We would like to understand why this latency is observed in the LangChain invoke.

Note: both requests use the same payload.

Latency observed during this sequence of steps:
(screenshot of logs omitted)

No latency during this call:
(screenshot of logs omitted)

System Info

$ pip freeze
aiohappyeyeballs==2.4.3
aiohttp==3.10.10
aiosignal==1.3.1
annotated-types==0.7.0
anyio==4.6.2.post1
async-timeout==4.0.3
attrs==24.2.0
azure-core==1.32.0
azure-functions==1.21.3
azure-identity==1.19.0
beautifulsoup4==4.12.3
certifi==2024.8.30
cffi==1.17.1
charset-normalizer==3.4.0
click==8.1.7
colorama==0.4.6
coverage==7.6.4
cryptography==43.0.3
databricks-sql-connector==3.4.0
dataclasses-json==0.6.7
distro==1.9.0
et_xmlfile==2.0.0
exceptiongroup==1.2.2
fastapi==0.115.4
frozenlist==1.5.0
greenlet==3.1.1
h11==0.14.0
httpcore==1.0.6
httpx==0.27.2
httpx-sse==0.4.0
idna==3.10
iniconfig==2.0.0
isodate==0.7.2
jiter==0.7.0
jsonpatch==1.33
jsonpointer==3.0.0
langchain==0.3.7
langchain-community==0.3.5
langchain-core==0.3.15
langchain-openai==0.2.6
langchain-text-splitters==0.3.2
langsmith==0.1.140
lxml==5.3.0
lz4==4.3.3
marshmallow==3.23.1
msal==1.31.0
msal-extensions==1.2.0
msrest==0.7.1
multidict==6.1.0
mypy-extensions==1.0.0
numpy==1.26.4
oauthlib==3.2.2
openai==1.54.3
openpyxl==3.1.5
orjson==3.10.11
packaging==24.1
pandas==2.2.0
pluggy==1.5.0
portalocker==2.10.1
propcache==0.2.0
py4j==0.10.9.5
pyarrow==16.1.0
pycparser==2.22
pydantic==2.9.2
pydantic-settings==2.6.1
pydantic_core==2.23.4
PyJWT==2.9.0
pyspark==3.2.2
pytest==8.3.3
python-dateutil==2.9.0.post0
python-dotenv==1.0.1
pytz==2024.2
pywin32==308
PyYAML==6.0.2
regex==2024.11.6
requests==2.32.3
requests-oauthlib==2.0.0
requests-toolbelt==1.0.0
six==1.16.0
sniffio==1.3.1
soupsieve==2.6
SQLAlchemy==2.0.35
starlette==0.41.2
tenacity==9.0.0
thrift==0.20.0
tiktoken==0.8.0
tomli==2.0.2
tqdm==4.67.0
typing-inspect==0.9.0
typing_extensions==4.12.2
tzdata==2024.2
urllib3==2.2.3
uvicorn==0.32.0
yarl==1.17.1

@dosubot dosubot bot added the Ɑ: core Related to langchain-core label Dec 16, 2024
@keenborder786
Contributor

@KalakondaSainath I would suggest using LangSmith to trace your chain; that way you will be better able to break down the latency coming from each runnable in the chain.
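
A minimal sketch of turning tracing on, assuming a LangSmith account and API key (the environment variable names below are LangSmith's documented ones, not from this thread):

import os

# Enable LangSmith tracing for every LangChain runnable in this process.
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = "<your-langsmith-api-key>"
os.environ["LANGCHAIN_PROJECT"] = "latency-debugging"  # optional project name

# Any chain invoked after this point is traced, with per-runnable timings
# visible in the LangSmith UI.
response = chain.invoke(<request_payload>)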

@rayamoh

rayamoh commented Dec 17, 2024

@keenborder786 - Bringing LangSmith into our org is a big task: we would have to go through approvals and buy the product. Can we get input on why there is a gap of 2.5 minutes in the run below, between the ClientSecret token acquisition and the retry request? Are there any additional logs we can enable in LangChain to understand how the "wait_exponential" retry logic is being used in create_base_retry_decorator?

(screenshot of logs omitted)
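
A way to get more visibility without LangSmith is to raise the log level of the underlying clients; a minimal sketch, assuming the openai v1 SDK, httpx, and azure-identity versions shown in pip freeze above (the logger names are those libraries' defaults):

import logging

# Timestamp every request, retry, and token acquisition so that gaps before
# the HTTP POST become visible in the logs.
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(name)s %(levelname)s %(message)s",
)
logging.getLogger("openai").setLevel(logging.DEBUG)
logging.getLogger("httpx").setLevel(logging.DEBUG)
logging.getLogger("azure.identity").setLevel(logging.DEBUG)

The openai SDK also honours an OPENAI_LOG=debug environment variable to the same effect.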

@keenborder786
Contributor

@rayamoh I think the best solution in your case would be to write a custom callback handler that keeps track of timing, and then use it in the chain:

from langchain_core.callbacks import StdOutCallbackHandler, StreamingStdOutCallbackHandler
from langchain_core.prompts import PromptTemplate
from langchain_core.runnables import RunnableConfig
from langchain_openai import ChatOpenAI

prompt = PromptTemplate.from_template("Answer the following question: {question}")
chain = prompt | ChatOpenAI()
chain.invoke(input={'question': 'What is LangChain?'},
             config=RunnableConfig(callbacks=[StdOutCallbackHandler(), StreamingStdOutCallbackHandler()]))
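
A minimal sketch of such a timing handler, assuming langchain_core's BaseCallbackHandler interface (the class name and print format are illustrative, not from this thread):

import time

from langchain_core.callbacks import BaseCallbackHandler


class TimingCallbackHandler(BaseCallbackHandler):
    """Print wall-clock time from chain start to the first LLM request, and for the request itself."""

    def __init__(self):
        self.chain_start = None
        self.llm_start = None

    def on_chain_start(self, serialized, inputs, **kwargs):
        # Record only the outermost chain start.
        if self.chain_start is None:
            self.chain_start = time.monotonic()

    def on_chat_model_start(self, serialized, messages, **kwargs):
        self.llm_start = time.monotonic()
        if self.chain_start is not None:
            print(f"[timing] chain start -> LLM request: {self.llm_start - self.chain_start:.2f}s")

    def on_llm_end(self, response, **kwargs):
        if self.llm_start is not None:
            print(f"[timing] LLM request -> response: {time.monotonic() - self.llm_start:.2f}s")

Passing it via config=RunnableConfig(callbacks=[TimingCallbackHandler()]) should show whether the delay happens before the request is issued (for example, during token acquisition) or inside the HTTP call itself.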
