🔥 Open-source no-code web data extraction platform. Turn websites to APIs and spreadsheets with no-code robots in minutes! [In Beta]
Updated Dec 18, 2024 - TypeScript
Extract keywords from sentences or replace keywords in sentences.
Converts a PDF file into a text file while keeping the layout of the original PDF. Useful, for instance, for extracting the content of a table in a PDF file. This is a subclass of the PDFTextStripper class (from the Apache PDFBox library).
🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
Lightweight library for scraping websites with LLMs
📰 Let ChatGPT Summarize Hacker News for You
🚜 Parse text and tables from PDF files.
A powerful Python library for getting rich data from the Vietnam Stock Market using just a few lines of code
Pure Python, lightweight, Pillow-based solver for Amazon's text captcha.
Benchmarking PDF libraries
Wikipedia information extraction library
A tool for scraping emails, social media accounts, and much more information from websites using Google Search Results.
A Python client for the Sypht API
This repository provides usage examples for the Python module Newspaper3k.
Accurate, private and configurable document retrieval LLM
A Python utility to digitize plots.
Data processing and modelling framework for automating tasks (incl. Python & SQL transformations).
Structured HTML table data extraction from URLs in Go that has almost no external dependencies
Superpipe - optimized LLM pipelines for structured data
Scraping assistant tool. Editing and maintaining CSS/XPath selectors across webpages.