unstructured-ingest crashes from version 0.2.2 and above #317

Open
outcastofmusic opened this issue Dec 20, 2024 · 0 comments


When I try to run unstructured-ingest, I get the following error:

2024-12-20 08:07:36,002 MainProcess ERROR    Exception raised while running partition
Traceback (most recent call last):
  File "/home/agis/.cache/pypoetry/virtualenvs/unstructured-procesing-SCs5zkBJ-py3.11/lib/python3.11/site-packages/unstructured_client/utils/retries.py", line 204, in retry_with_backoff_async
    return await func()
           ^^^^^^^^^^^^
  File "/home/agis/.cache/pypoetry/virtualenvs/unstructured-procesing-SCs5zkBJ-py3.11/lib/python3.11/site-packages/unstructured_client/utils/retries.py", line 149, in do_request
    raise PermanentError(exception) from exception
unstructured_client.utils.retries.PermanentError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/agis/.cache/pypoetry/virtualenvs/unstructured-procesing-SCs5zkBJ-py3.11/lib/python3.11/site-packages/unstructured_ingest/v2/pipeline/interfaces.py", line 171, in run_async
    return await self._run_async(fn=fn, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/agis/.cache/pypoetry/virtualenvs/unstructured-procesing-SCs5zkBJ-py3.11/lib/python3.11/site-packages/unstructured_ingest/v2/pipeline/steps/partition.py", line 67, in _run_async
    partitioned_content = await fn(**fn_kwargs)
                          ^^^^^^^^^^^^^^^^^^^^^
  File "/home/agis/.cache/pypoetry/virtualenvs/unstructured-procesing-SCs5zkBJ-py3.11/lib/python3.11/site-packages/unstructured_ingest/v2/processes/partitioner.py", line 188, in run_async
    return await self.partition_via_api(filename, metadata=metadata, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/agis/.cache/pypoetry/virtualenvs/unstructured-procesing-SCs5zkBJ-py3.11/lib/python3.11/site-packages/unstructured_ingest/utils/dep_check.py", line 62, in wrapper_async
    return await func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/agis/.cache/pypoetry/virtualenvs/unstructured-procesing-SCs5zkBJ-py3.11/lib/python3.11/site-packages/unstructured_ingest/v2/processes/partitioner.py", line 170, in partition_via_api
    elements = await call_api_async(
               ^^^^^^^^^^^^^^^^^^^^^
  File "/home/agis/.cache/pypoetry/virtualenvs/unstructured-procesing-SCs5zkBJ-py3.11/lib/python3.11/site-packages/unstructured_ingest/v2/unstructured_api.py", line 74, in call_api_async
    res = await client.general.partition_async(request=partition_request)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/agis/.cache/pypoetry/virtualenvs/unstructured-procesing-SCs5zkBJ-py3.11/lib/python3.11/site-packages/unstructured_client/general.py", line 205, in partition_async
    http_res = await self.do_request_async(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/agis/.cache/pypoetry/virtualenvs/unstructured-procesing-SCs5zkBJ-py3.11/lib/python3.11/site-packages/unstructured_client/basesdk.py", line 332, in do_request_async
    http_res = await utils.retry_async(
               ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/agis/.cache/pypoetry/virtualenvs/unstructured-procesing-SCs5zkBJ-py3.11/lib/python3.11/site-packages/unstructured_client/utils/retries.py", line 153, in retry_async
    return await retry_with_backoff_async(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/agis/.cache/pypoetry/virtualenvs/unstructured-procesing-SCs5zkBJ-py3.11/lib/python3.11/site-packages/unstructured_client/utils/retries.py", line 206, in retry_with_backoff_async
    raise exception.inner
  File "/home/agis/.cache/pypoetry/virtualenvs/unstructured-procesing-SCs5zkBJ-py3.11/lib/python3.11/site-packages/unstructured_client/utils/retries.py", line 121, in do_request
    res = await func()
          ^^^^^^^^^^^^
  File "/home/agis/.cache/pypoetry/virtualenvs/unstructured-procesing-SCs5zkBJ-py3.11/lib/python3.11/site-packages/unstructured_client/basesdk.py", line 302, in do
    raise e
  File "/home/agis/.cache/pypoetry/virtualenvs/unstructured-procesing-SCs5zkBJ-py3.11/lib/python3.11/site-packages/unstructured_client/basesdk.py", line 295, in do
    http_res = await client.send(req, stream=stream)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/agis/.cache/pypoetry/virtualenvs/unstructured-procesing-SCs5zkBJ-py3.11/lib/python3.11/site-packages/httpx/_client.py", line 1629, in send
    response = await self._send_handling_auth(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/agis/.cache/pypoetry/virtualenvs/unstructured-procesing-SCs5zkBJ-py3.11/lib/python3.11/site-packages/httpx/_client.py", line 1657, in _send_handling_auth
    response = await self._send_handling_redirects(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/agis/.cache/pypoetry/virtualenvs/unstructured-procesing-SCs5zkBJ-py3.11/lib/python3.11/site-packages/httpx/_client.py", line 1694, in _send_handling_redirects
    response = await self._send_single_request(request)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/agis/.cache/pypoetry/virtualenvs/unstructured-procesing-SCs5zkBJ-py3.11/lib/python3.11/site-packages/httpx/_client.py", line 1730, in _send_single_request
    response = await transport.handle_async_request(request)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/agis/.cache/pypoetry/virtualenvs/unstructured-procesing-SCs5zkBJ-py3.11/lib/python3.11/site-packages/httpx/_transports/default.py", line 394, in handle_async_request
    resp = await self._pool.handle_async_request(req)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/agis/.cache/pypoetry/virtualenvs/unstructured-procesing-SCs5zkBJ-py3.11/lib/python3.11/site-packages/httpcore/_async/connection_pool.py", line 255, in handle_async_request
    await self._close_connections(closing)
  File "/home/agis/.cache/pypoetry/virtualenvs/unstructured-procesing-SCs5zkBJ-py3.11/lib/python3.11/site-packages/httpcore/_async/connection_pool.py", line 343, in _close_connections
    with AsyncShieldCancellation():
  File "/home/agis/.cache/pypoetry/virtualenvs/unstructured-procesing-SCs5zkBJ-py3.11/lib/python3.11/site-packages/httpcore/_synchronization.py", line 214, in __enter__
    self._anyio_shield.__enter__()
  File "/home/agis/.cache/pypoetry/virtualenvs/unstructured-procesing-SCs5zkBJ-py3.11/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 425, in __enter__
    task_state = _task_states[host_task]
                 ~~~~~~~~~~~~^^^^^^^^^^^
  File "/home/agis/.cache/pypoetry/virtualenvs/unstructured-procesing-SCs5zkBJ-py3.11/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 686, in __getitem__
    assert isinstance(key, asyncio.Task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError

I run unstructured-ingest using the following command:

unstructured-ingest \
  local \
    --input-path $INPUT_PATH \
    --output-dir $OUTPUT_DIR \
    --recursive \
    --partition-by-api \
    --api-key $UNSTRUCTURED_API_KEY \
    --partition-endpoint $UNSTRUCTURED_API_URL \
    --chunking-endpoint $UNSTRUCTURED_API_URL \
    --chunking-strategy "by_title" \
    --strategy auto \
    --ocr-languages "eng,ell" \
    --additional-partition-args="{\"split_pdf_page\":\"true\", \"split_pdf_allow_failed\":\"true\", \"split_pdf_concurrency_level\": 15}"

The issue seems to occur with unstructured-ingest 0.2.2 and above; everything works fine with version 0.2.1.

I am using Python 3.11.10 and unstructured 0.16.11.
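To make the report easier to reproduce, the versions of the packages that appear in the traceback can be collected in one go. The following is a minimal sketch using only the standard library; the package list is my own assumption based on the paths in the traceback, and `installed_versions` is a hypothetical helper name, not part of any of these libraries.

```python
from importlib import metadata

# Distribution names taken from the traceback paths (an assumption, adjust as needed).
PACKAGES = [
    "unstructured-ingest",
    "unstructured",
    "unstructured-client",
    "anyio",
    "httpx",
    "httpcore",
]


def installed_versions(packages):
    """Return a dict mapping each distribution name to its version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in the active environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(PACKAGES).items():
        print(f"{name}=={version}" if version else f"{name}: not installed")
```

Running this inside the same Poetry virtualenv with 0.2.1 and again with 0.2.2 installed should show which transitive dependencies change between the working and the failing setup.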

Does anyone have an idea what causes this? The crash seems to happen inside anyio, which is at version 4.7.0.
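The last frame of the traceback fails on `assert isinstance(key, asyncio.Task)`, where the key is the task hosting the cancel scope. A small diagnostic along the following lines could show what kind of object `asyncio.current_task()` returns in a given environment. This is only a sketch: `describe_current_task` is a hypothetical helper, and the traceback alone does not establish why the check fails here.

```python
import asyncio


def describe_current_task():
    """Report the type of the currently running task and whether it
    passes the same isinstance check that fails in the traceback."""
    task = asyncio.current_task()
    return {
        "module": type(task).__module__,
        "type": type(task).__qualname__,
        "is_asyncio_task": isinstance(task, asyncio.Task),
    }


async def main():
    # Called from inside a coroutine driven by asyncio.run, so a task is active.
    return describe_current_task()


if __name__ == "__main__":
    print(asyncio.run(main()))
```

With a plain `asyncio.run` the check passes. If the same helper reports `is_asyncio_task` as false when called from within the event loop setup that the failing pipeline uses, that would point at whatever replaces or patches the task class in that setup.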
