Unstructured.IO: ETL for LLMs

Welcome to Unstructured.IO! We're here on a mission to make all of your documents available for LLM applications, from PDFs and Word Docs to emails and markdown. To get started, check out our open source offerings.

Tried the open source library and ready for more power? Check out our products page to learn more about our paid API and the Unstructured Platform, an ETL tool built around our core file transformation capabilities.

Learn more

Section | Description
Company Website | Unstructured.io product and company info
Documentation | Full unstructured documentation

Popular repositories

  1. unstructured Public

    Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.

    HTML · 9.5k stars · 794 forks

  2. unstructured-api Public

    Python · 595 stars · 127 forks

  3. unstructured-inference Public

    Python · 165 stars · 53 forks

  4. pipeline-sec-filings Public archive

    Preprocessing pipeline notebooks and API supporting text extraction from SEC documents

    Jupyter Notebook · 140 stars · 30 forks

  5. unstructured-python-client Public

    A Python client for the Unstructured hosted API

    Python · 85 stars · 17 forks

  6. unstructured-js-client Public

    A TypeScript client for the Unstructured hosted API

    TypeScript · 43 stars · 12 forks

Repositories

Showing 10 of 36 repositories
  • unstructured-ingest

    HTML · 23 stars · Apache-2.0 · 21 forks · 51 issues · 20 pull requests · Updated Dec 18, 2024
  • unstructured Public

    Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.

    HTML · 9,467 stars · Apache-2.0 · 794 forks · 131 issues (3 need help) · 49 pull requests · Updated Dec 18, 2024
  • unstructured-js-client Public

    A TypeScript client for the Unstructured hosted API

    TypeScript · 43 stars · MIT · 12 forks · 5 issues · 3 pull requests · Updated Dec 18, 2024
  • unstructured-python-client Public

    A Python client for the Unstructured hosted API

    Python · 85 stars · MIT · 17 forks · 9 issues · 5 pull requests · Updated Dec 18, 2024
  • docs Public

    Documentation for all Unstructured products and libraries

    MDX · 5 stars · 17 forks · 0 issues · 18 pull requests · Updated Dec 17, 2024
  • base-images Public

    Store Dockerfiles and Packer configs for images to use as a base to build upon

    Shell · 3 stars · Apache-2.0 · 2 forks · 1 issue · 2 pull requests · Updated Dec 16, 2024
  • unstructured-api

    Python · 595 stars · Apache-2.0 · 127 forks · 27 issues · 7 pull requests · Updated Dec 14, 2024
  • unstructured-platform-plugins

    Python · 3 stars · Apache-2.0 · 1 fork · 0 issues · 1 pull request · Updated Dec 2, 2024
  • azure-ai-hub-gateway-solution-accelerator Public Forked from Azure-Samples/ai-hub-gateway-solution-accelerator

    Reference architecture that provides a set of guidelines and best practices for implementing a central AI API gateway to empower various line-of-business units in an organization to leverage Azure AI services

    Bicep · 0 stars · MIT · 42 forks · 0 issues · 0 pull requests · Updated Nov 22, 2024
  • unstructured-inference

    Python · 165 stars · Apache-2.0 · 53 forks · 19 issues · 12 pull requests · Updated Oct 25, 2024
