AzureAIDocumentIntelligenceLoader is calling the wrong endpoint and we can't change anything #28666

Open
5 tasks done
ambodiam opened this issue Dec 11, 2024 · 0 comments
Labels
🤖:bug Related to a bug, vulnerability, unexpected error with an existing feature

ambodiam commented Dec 11, 2024

Checked other resources

  • I added a very descriptive title to this issue.
  • I searched the LangChain documentation with the integrated search.
  • I used the GitHub search to find a similar question and didn't find it.
  • I am sure that this is a bug in LangChain rather than my code.
  • The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).

Example Code

from langchain_community.document_loaders import AzureAIDocumentIntelligenceLoader

# Step 1: Configure the Azure resource endpoint, API key, and the document to analyze
di_endpoint = 'https://<endpoint>.cognitiveservices.azure.com/'
key = 'api_key'
file_url = "https://raw.githubusercontent.com/Azure-Samples/cognitive-services-REST-api-samples/master/curl/form-recognizer/sample-layout.pdf"

# Step 2: Load the file with AzureAIDocumentIntelligenceLoader
loader = AzureAIDocumentIntelligenceLoader(
    api_endpoint=di_endpoint,
    url_path=file_url,
    api_key=key,
    api_version='2023-07-31',
    api_model='prebuilt-layout'
)

documents = loader.load()

Error Message and Stack Trace (if applicable)

2024-12-11 14:44:00,968 [MainThread  ] [INFO ]  Request URL: 'https://amirrahnamadisweden.cognitiveservices.azure.com/documentintelligence/documentModels/prebuilt-layout:analyze?api-version=REDACTED&outputContentFormat=REDACTED'
Request method: 'POST'
Request headers:
    'content-type': 'application/json'
    'Content-Length': '146'
    'Accept': 'application/json'
    'x-ms-client-request-id': 'fa6d3882-b7c5-11ef-9cb0-76d7b7a30648'
    'x-ms-useragent': 'REDACTED'
    'User-Agent': 'azsdk-python-ai-documentintelligence/1.0.0b4 Python/3.11.11 (macOS-14.7-arm64-arm-64bit)'
    'Ocp-Apim-Subscription-Key': 'REDACTED'
A body is sent with the request
2024-12-11 14:44:00,977 [MainThread  ] [DEBUG]  Starting new HTTPS connection (1): amirrahnamadisweden.cognitiveservices.azure.com:443
2024-12-11 14:44:01,298 [MainThread  ] [DEBUG]  https://amirrahnamadisweden.cognitiveservices.azure.com:443 "POST /documentintelligence/documentModels/prebuilt-layout:analyze?api-version=2023-07-31&outputContentFormat=markdown HTTP/11" 404 56
2024-12-11 14:44:01,300 [MainThread  ] [INFO ]  Response status: 404
Response headers:
    'Content-Length': '56'
    'Content-Type': 'application/json'
    'apim-request-id': 'REDACTED'
    'Strict-Transport-Security': 'REDACTED'
    'x-content-type-options': 'REDACTED'
    'Date': 'Wed, 11 Dec 2024 13:44:00 GMT'
---------------------------------------------------------------------------
ResourceNotFoundError                     Traceback (most recent call last)
Cell In[30], line 18
      9 # Step 2: Load the file with AzureAIDocumentIntelligenceLoader
     10 loader = AzureAIDocumentIntelligenceLoader(
     11     api_endpoint=di_endpoint,
     12     url_path = file_url,
   (...)
     15     api_model='prebuilt-layout'
     16 )
---> 18 documents = loader.load()

File ~/code/rag_azure/venv/lib/python3.11/site-packages/langchain_core/document_loaders/base.py:31, in BaseLoader.load(self)
     29 def load(self) -> list[Document]:
     30     """Load data into Document objects."""
---> 31     return list(self.lazy_load())

File ~/code/rag_azure/venv/lib/python3.11/site-packages/langchain_community/document_loaders/doc_intelligence.py:103, in AzureAIDocumentIntelligenceLoader.lazy_load(self)
    101     yield from self.parser.parse(blob)
    102 elif self.url_path is not None:
--> 103     yield from self.parser.parse_url(self.url_path)  # type: ignore[arg-type]
    104 elif self.bytes_source is not None:
    105     yield from self.parser.parse_bytes(self.bytes_source)

File ~/code/rag_azure/venv/lib/python3.11/site-packages/langchain_community/document_loaders/parsers/doc_intelligence.py:98, in AzureAIDocumentIntelligenceParser.parse_url(self, url)
     95 def parse_url(self, url: str) -> Iterator[Document]:
     96     from azure.ai.documentintelligence.models import AnalyzeDocumentRequest
---> 98     poller = self.client.begin_analyze_document(
     99         self.api_model,
    100         AnalyzeDocumentRequest(url_source=url),
    101         # content_type="application/octet-stream",
    102         output_content_format="markdown" if self.mode == "markdown" else "text",
    103     )
    104     result = poller.result()
    106     if self.mode in ["single", "markdown"]:

File ~/code/rag_azure/venv/lib/python3.11/site-packages/azure/core/tracing/decorator.py:105, in distributed_trace.<locals>.decorator.<locals>.wrapper_use_tracer(*args, **kwargs)
    103 span_impl_type = settings.tracing_implementation()
    104 if span_impl_type is None:
--> 105     return func(*args, **kwargs)
    107 # Merge span is parameter is set, but only if no explicit parent are passed
    108 if merge_span and not passed_in_parent:

File ~/code/rag_azure/venv/lib/python3.11/site-packages/azure/ai/documentintelligence/_operations/_patch.py:537, in DocumentIntelligenceClientOperationsMixin.begin_analyze_document(self, model_id, analyze_request, pages, locale, string_index_type, features, query_fields, output_content_format, output, **kwargs)
    535 cont_token: Optional[str] = kwargs.pop("continuation_token", None)
    536 if cont_token is None:
--> 537     raw_result = self._analyze_document_initial(
    538         model_id=model_id,
    539         analyze_request=analyze_request,
    540         pages=pages,
    541         locale=locale,
    542         string_index_type=string_index_type,
    543         features=features,
    544         query_fields=query_fields,
    545         output_content_format=output_content_format,
    546         output=output,
    547         content_type=content_type,
    548         cls=lambda x, y, z: x,
    549         headers=_headers,
    550         params=_params,
    551         **kwargs,
    552     )
    553     raw_result.http_response.read()  # type: ignore
    554 kwargs.pop("error_map", None)

File ~/code/rag_azure/venv/lib/python3.11/site-packages/azure/ai/documentintelligence/_operations/_operations.py:713, in DocumentIntelligenceClientOperationsMixin._analyze_document_initial(self, model_id, analyze_request, pages, locale, string_index_type, features, query_fields, output_content_format, output, **kwargs)
    711 except (StreamConsumedError, StreamClosedError):
    712     pass
--> 713 map_error(status_code=response.status_code, response=response, error_map=error_map)
    714 error = _deserialize(_models.ErrorResponse, response.json())
    715 raise HttpResponseError(response=response, model=error)

File ~/code/rag_azure/venv/lib/python3.11/site-packages/azure/core/exceptions.py:163, in map_error(status_code, response, error_map)
    161     return
    162 error = error_type(response=response)
--> 163 raise error

ResourceNotFoundError: (404) Resource not found
Code: 404
Message: Resource not found
...

Description

My Document Intelligence resource is of kind FormRecognizer. I have double-checked in AI Studio that the API key and endpoint work fine with this resource (see the screenshot below).

[Image: Screenshot 2024-12-11 at 14 31 24]

However, LangChain always sends the request to the /documentintelligence/ route on that endpoint, even though the endpoint URL is passed explicitly, and the service answers with 404:

https://amirrahnamadisweden.cognitiveservices.azure.com:443 "POST /documentintelligence/documentModels/prebuilt-layout:analyze?api-version=2023-07-31&outputContentFormat=markdown
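
To make the mismatch easier to see, here is a minimal sketch that takes the request URL from the log above and splits it into host, service route prefix, model, and API version. It only uses the Python standard library. The helper names (describe_request, with_prefix) are hypothetical. The "formrecognizer" prefix is an assumption based on how the Form Recognizer v3.x REST API (which includes api-version 2023-07-31) is commonly documented. It is not something confirmed in this issue, and the rewritten URL is for comparison only, not a verified fix.

```python
from urllib.parse import parse_qs, urlsplit, urlunsplit

# Request URL reassembled from the DEBUG log line in this issue.
LOGGED_URL = (
    "https://amirrahnamadisweden.cognitiveservices.azure.com"
    "/documentintelligence/documentModels/prebuilt-layout:analyze"
    "?api-version=2023-07-31&outputContentFormat=markdown"
)

# ASSUMPTION: route prefix of the Form Recognizer v3.x REST API.
# Not confirmed anywhere in this issue.
FORM_RECOGNIZER_PREFIX = "formrecognizer"


def describe_request(url: str) -> dict:
    """Break an analyze request URL into the parts relevant to this bug."""
    parts = urlsplit(url)
    segments = parts.path.strip("/").split("/")
    query = parse_qs(parts.query)
    return {
        "host": parts.netloc,
        "service_prefix": segments[0],             # e.g. "documentintelligence"
        "model": segments[-1].split(":")[0],       # e.g. "prebuilt-layout"
        "api_version": query["api-version"][0],
    }


def with_prefix(url: str, prefix: str) -> str:
    """Return the same URL with the first path segment replaced by `prefix`."""
    parts = urlsplit(url)
    segments = parts.path.strip("/").split("/")
    segments[0] = prefix
    return urlunsplit(parts._replace(path="/" + "/".join(segments)))


info = describe_request(LOGGED_URL)
print(info)
print(with_prefix(LOGGED_URL, FORM_RECOGNIZER_PREFIX))
```

If the assumption holds, the 404 would come from combining the /documentintelligence/ route (built by the azure-ai-documentintelligence 1.0.0b4 client shown in the User-Agent header) with an API version that the service only exposes under the other prefix. The loader's api_version parameter changes the query string but not the route.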

System Info

System Information
------------------
> OS:  Darwin
> OS Version:  Darwin Kernel Version 23.6.0: Wed Jul 31 20:48:04 PDT 2024; root:xnu-10063.141.1.700.5~1/RELEASE_ARM64_T6030
> Python Version:  3.11.11 (main, Dec  3 2024, 17:20:40) [Clang 16.0.0 (clang-1600.0.26.4)]

Package Information
-------------------
> langchain_core: 0.3.21
> langchain: 0.3.9
> langchain_community: 0.3.9
> langsmith: 0.1.147
> langchain_openai: 0.2.11
> langchain_text_splitters: 0.3.2

Optional packages not installed
-------------------------------
> langserve

Other Dependencies
------------------
> aiohttp: 3.11.10
> async-timeout: Installed. No version info available.
> dataclasses-json: 0.6.7
> httpx: 0.28.0
> httpx-sse: 0.4.0
> jsonpatch: 1.33
> langsmith-pyo3: Installed. No version info available.
> numpy: 1.26.4
> openai: 1.57.0
> orjson: 3.10.12
> packaging: 24.2
> pydantic: 2.9.2
> pydantic-settings: 2.6.1
> PyYAML: 6.0.2
> requests: 2.32.3
> requests-toolbelt: 1.0.0
> SQLAlchemy: 2.0.36
> tenacity: 9.0.0
> tiktoken: 0.8.0
> typing-extensions: 4.12.2
dosubot (bot) added the 🤖:bug label Dec 11, 2024