One API to scrape everything you need from URLs for your AI tool and vector database.
🚧 Work in Progress 🚧
Our API provides a comprehensive suite of data extraction and processing capabilities:
- 🧼 Clean HTML (JavaScript and CSS removed)
- 📝 LLM-friendly Markdown conversion
- 🚫 Ad-free, cookie banner-free, and dialog-free content
- 📸 Website screenshots (auto-saved to AWS S3 or Cloudflare R2)
- 🤖 LLM-generated SEO-friendly content
- 🔍 LLM-extracted key information (summary, features, FAQs, etc.)
- 🧠 Ready-to-use embeddings for vector database integration (auto-saved to db)
pnpm i
cd apps/api && pnpm exec playwright install
pnpm dev
open http://localhost:3000
{
  "clean_html": "...",
  "LLM_friendly_markdown": "...",
  "clean_text": "...",
  "screenshot_url": "...",
  "llm_extracts_key_info": {
    "what": "...",
    "summary": "...",
    "features": ["...", "..."],
    "faqs": [{"q": "...", "a": "..."}]
  },
  "llm_summarized_detail": "...",
  "embeddings": [...]
}
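
For TypeScript clients, the response shape above could be captured in a type and checked at runtime before use. The following is a minimal sketch, not part of the project: the names `ScrapeResult`, `KeyInfo`, and `parseScrapeResult` are hypothetical, the field names are copied from the sample response, and `embeddings` is assumed to be a flat array of numbers (the sample only shows `[...]`, so the real layout may differ).

```typescript
// Hypothetical type names; field names mirror the sample response.
interface KeyInfo {
  what: string;
  summary: string;
  features: string[];
  faqs: { q: string; a: string }[];
}

interface ScrapeResult {
  clean_html: string;
  LLM_friendly_markdown: string;
  clean_text: string;
  screenshot_url: string;
  llm_extracts_key_info: KeyInfo;
  llm_summarized_detail: string;
  embeddings: number[]; // assumption: a single flat vector
}

function isKeyInfo(v: unknown): v is KeyInfo {
  if (typeof v !== "object" || v === null) return false;
  const o = v as Record<string, unknown>;
  return (
    typeof o.what === "string" &&
    typeof o.summary === "string" &&
    Array.isArray(o.features) &&
    o.features.every((x) => typeof x === "string") &&
    Array.isArray(o.faqs) &&
    o.faqs.every(
      (f) =>
        typeof f === "object" &&
        f !== null &&
        typeof (f as Record<string, unknown>).q === "string" &&
        typeof (f as Record<string, unknown>).a === "string"
    )
  );
}

// Parses a JSON string and throws if it does not match the assumed shape.
function parseScrapeResult(json: string): ScrapeResult {
  const data: unknown = JSON.parse(json);
  if (typeof data !== "object" || data === null) {
    throw new Error("response is not a JSON object");
  }
  const o = data as Record<string, unknown>;
  const stringFields = [
    "clean_html",
    "LLM_friendly_markdown",
    "clean_text",
    "screenshot_url",
    "llm_summarized_detail",
  ] as const;
  for (const key of stringFields) {
    if (typeof o[key] !== "string") {
      throw new Error(`missing or non-string field: ${key}`);
    }
  }
  if (!isKeyInfo(o.llm_extracts_key_info)) {
    throw new Error("invalid llm_extracts_key_info");
  }
  if (
    !Array.isArray(o.embeddings) ||
    !o.embeddings.every((x) => typeof x === "number")
  ) {
    throw new Error("embeddings must be an array of numbers");
  }
  return data as ScrapeResult;
}
```

Validating at the boundary keeps the rest of the client code typed, and a failed check points at the exact field that changed, which is useful while the API is still a work in progress.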
TODO
TODO