DiscovAI Crawl API 🕷️🔍

A single API that scrapes everything your AI tool or vector database needs from a URL.

🚧 Work in Progress 🚧

🌟 Features

For each URL, the API extracts and processes the page into the following outputs:

🧼 Clean HTML (JavaScript and CSS removed)
📝 LLM-friendly Markdown conversion
🚫 Ad-free, cookie banner-free, and dialog-free content
📸 Website screenshots (auto-saved to AWS S3 or Cloudflare R2)
🤖 LLM-generated SEO-friendly content
🔑 LLM-extracted key information (summary, features, FAQs, etc.)
🧠 Ready-to-use embeddings for vector database integration (auto-saved to db)

🔧 Installation

pnpm i
cd apps/api && pnpm exec playwright install

🚀 Usage

pnpm dev
open http://localhost:3000

📦 API Response Structure

{
  "clean_html": "...",
  "LLM_friendly_markdown": "...",
  "clean_text": "...",
  "screenshot_url": "...",
  "llm_extracts_key_info": {
    "what": "...",
    "summary": "...",
    "features": ["...", "..."],
    "faqs": [{"q": "...", "a": "..."}]
  },
  "llm_summarized_detail": "...",
  "embeddings": [...]
}
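
The response shape above can be described with TypeScript types, which may help when consuming the API from a typed client. The following is only a sketch: the field names come from the sample response, but the type names (`CrawlResponse`, `KeyInfo`, `FaqEntry`) and the `isCrawlResponse` helper are hypothetical, and `embeddings` is assumed to be a flat array of numbers since the sample elides its contents.

```typescript
// Hypothetical type names; only the field names come from the sample response.
interface FaqEntry {
  q: string;
  a: string;
}

interface KeyInfo {
  what: string;
  summary: string;
  features: string[];
  faqs: FaqEntry[];
}

interface CrawlResponse {
  clean_html: string;
  LLM_friendly_markdown: string;
  clean_text: string;
  screenshot_url: string;
  llm_extracts_key_info: KeyInfo;
  llm_summarized_detail: string;
  embeddings: number[]; // assumption: a flat numeric vector
}

const STRING_FIELDS = [
  "clean_html",
  "LLM_friendly_markdown",
  "clean_text",
  "screenshot_url",
  "llm_summarized_detail",
] as const;

function isStringArray(value: unknown): value is string[] {
  return Array.isArray(value) && value.every((item) => typeof item === "string");
}

function isFaqEntry(value: unknown): value is FaqEntry {
  if (typeof value !== "object" || value === null) return false;
  const entry = value as Record<string, unknown>;
  return typeof entry.q === "string" && typeof entry.a === "string";
}

// Runtime check that an unknown JSON value matches the sample response shape.
function isCrawlResponse(value: unknown): value is CrawlResponse {
  if (typeof value !== "object" || value === null) return false;
  const body = value as Record<string, unknown>;

  for (const field of STRING_FIELDS) {
    if (typeof body[field] !== "string") return false;
  }

  const info = body.llm_extracts_key_info;
  if (typeof info !== "object" || info === null) return false;
  const keyInfo = info as Record<string, unknown>;
  if (typeof keyInfo.what !== "string") return false;
  if (typeof keyInfo.summary !== "string") return false;
  if (!isStringArray(keyInfo.features)) return false;

  const faqs = keyInfo.faqs;
  if (!Array.isArray(faqs) || !faqs.every(isFaqEntry)) return false;

  const embeddings = body.embeddings;
  return Array.isArray(embeddings) && embeddings.every((n) => typeof n === "number");
}
```

A client could pass the parsed JSON body through `isCrawlResponse` before using it, so that a change in the response format fails early instead of surfacing as an undefined field later.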

📚 Documentation

TODO

🤝 Contributing

TODO
