Firecrawl v1 is here! With it, we are introducing a more reliable and developer-friendly API.
August 29th, 2024
Here is what’s new:
- Output formats for /scrape. Choose which formats you want your output in.
- New /map endpoint for getting most of the URLs of a webpage.
- Developer-friendly API for /crawl/{id} status.
- 2x Rate Limits for all plans.
- Go SDK and Rust SDK
- Teams support
- API Key Management in the dashboard.
- onlyMainContent now defaults to true.
- /crawl webhooks and websocket support.
Learn more about it here
Start using v1 right away at https://firecrawl.dev
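As a rough illustration of the output formats option for /scrape, the sketch below builds (without sending) a request that asks for two formats. The base URL `https://api.firecrawl.dev/v1`, the `formats` field name, the format values `markdown` and `html`, the Bearer-token header, and the helper `build_scrape_request` are all assumptions made for this example, not details confirmed by these notes. Check the official API reference before relying on them.

```python
import json
import urllib.request

# Assumed base URL for the v1 API (not stated in these release notes).
API_BASE = "https://api.firecrawl.dev/v1"


def build_scrape_request(url, formats, api_key):
    """Build, but do not send, a POST request for /scrape.

    Hypothetical helper: the "formats" field and Bearer auth are assumptions.
    """
    payload = {"url": url, "formats": list(formats)}
    return urllib.request.Request(
        f"{API_BASE}/scrape",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder API key; replace with a real key before sending.
request = build_scrape_request(
    "https://example.com", ["markdown", "html"], "fc-YOUR-API-KEY"
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Passing the resulting object to `urllib.request.urlopen(request)` would actually send it. The sketch stops short of that so it can be inspected without a network connection or a valid key.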
What's Changed (including v0 + v1)
- Delete .DS_Store by @szepeviktor in #8
- [Bugfix] added normalized apikey to crawl/status route by @rafaelsideguide in #12
- [Feat] improving relative paths by @rafaelsideguide in #4
- Fix typos by @szepeviktor in #9
- [Feat] Added html to markdown table parser by @rafaelsideguide in #11
- Option to extract only the main content, excluding headers, navs, footers etc. by @nickscamara in #14
- [Feat] Adding pdf parser by @rafaelsideguide in #17
- adding ci-cd workflow by @rafaelsideguide in #20
- adding workflow by @rafaelsideguide in #21
- adding env secrets by @rafaelsideguide in #22
- [Feat] Added TSDocs and types for js-sdk by @rafaelsideguide in #28
- Added option to replace all relative paths with absolute paths by @rafaelsideguide in #25
- [Bugfix] Fixed scrape preview test by @rafaelsideguide in #30
- Caleb: fixing some documentation and rebuilding the server by @calebpeffer in #32
- Rate limit fixes for crawl status by @nickscamara in #36
- Better logging by @nickscamara in #35
- [Feat] Added type declarations by @rafaelsideguide in #31
- Refactor api routes by @nickscamara in #37
- Logging by @nickscamara in #38
- Cjp/making db auth optional <> Running project locally by @calebpeffer in #40
- chore: add context.close by @mattzcarey in #46
- Fixes table parsing for websites such as news.ycombinator.com (HN) by @nickscamara in #52
- [Feat] Server health check + slack message by @rafaelsideguide in #53
- [Feat] Added blocklist for social media urls by @rafaelsideguide in #55
- [Feat:mvp] Search Endpoint => serp api + firecrawl => 🔥 🔍 by @nickscamara in #56
- [Feat] Added anthropic vision api by @rafaelsideguide in #5
- [Bugfix] Trim and Lowercase all urls by @rafaelsideguide in #13
- Implements the ability for the crawler to output all the links it found, without scraping by @nickscamara in #34
- Serper params by @nickscamara in #62
- Support for tbs, filter, lang, country and location with Serper search. by @rogerserper in #61
- [Feat] Added allowed urls by @rafaelsideguide in #64
- /search support in node sdk by @nickscamara in #72
- Free credits increase by @nickscamara in #75
- [Bugfix] JS-SDK: Remove dotenv and add tests by @mdp in #68
- [Feat] Coupon system by @rafaelsideguide in #66
- Specific website params support by @nickscamara in #83
- Greenpay fixes by @nickscamara in #84
- [Feat] Implemented retry attempts to handle 502 errors by @rafaelsideguide in #67
- feat: LLM Extraction (mvp) by @nickscamara in #90
- Update README.md by @bllchmbrs in #110
- Add Posthog Logging by @ericciarla in #109
- Refactor of main web scraper + Partial data streaming by @nickscamara in #120
- [Feat] Added includeHTML option by @rafaelsideguide in #126
- Cancel Job Route by @nickscamara in #129
- [Feat] Added max depth option by @rafaelsideguide in #130
- Add keyAuth endpoint by @ericciarla in #131
- [Test] Added integration tests suite by @rafaelsideguide in #118
- Adds Zod Integration for LLM Extraction in the Firecrawl JS SDK by @nickscamara in #135
- [Docs] Updated examples by @rafaelsideguide in #137
- Switching to AGPL - We Need Your Consent! by @calebpeffer in #134
- Nsc/refactor scraping order by @nickscamara in #139
- Update models.ts by @ericciarla in #144
- Timeout on /scrape by @nickscamara in #145
- [Doc] Added default value for crawlOptions.limit by @rafaelsideguide in #142
- feat: 4x-5x faster crawler (fast mode) by @nickscamara in #149
- Add Docker Compose for easy self hosting by @chand1012 in #119
- refactor: fix typo in WebScraper/index.ts by @eltociear in #27
- [Tests] Added crawl test suite -> crawl improvements by @rafaelsideguide in #153
- feat: Docx Support by @nickscamara in #158
- Fixes pdfs not found if .pdf is not present by @nickscamara in #29
- Update README.md: Typo fix by @elimisteve in #160
- [Feat] Added rate limits by @rafaelsideguide in #151
- Allow override of API URL by @mattjoyce in #166
- feat: HyperDX Integration by @nickscamara in #167
- beta: Fire-Engine fallback by @nickscamara in #174
- Add additional file extensions to crawler.ts by @tractorjuice in #77
- [Bug] Fixing /crawl limit by @rafaelsideguide in #143
- Update issue templates by @rafaelsideguide in #180
- [Feat] Added proxy and media blocking support for Playwright by @JakobStadlhuber in #181
- update: wait until body attached in playwright-service by @qyou in #170
- feat: Allow privacy/legal/ other pages in social media websites by @nickscamara in #168
- [Bug] Added data check for python SDK by @rafaelsideguide in #176
- Fix FIRECRAWL_API_URL bug, also various PyLint fixes by @mattjoyce in #178
- [Feat] Added idempotency key to crawl route by @rafaelsideguide in #132
- Feat: Provide more details for 429 error msg by @simonha9 in #190
- Limit on /search is not deterministic by @Keredu in #186
- Various PyPi Metadata by @mattjoyce in #191
- [Test] Added sdk e2e tests by @rafaelsideguide in #183
- Allow users to manually set the waitFor param on /scrape by @nickscamara in #200
- [Feat] Added custom scraping conditions for readme docs by @rafaelsideguide in #204
- Feat/screenshot support by @ericciarla in #207
- feat: New pricing/limits changes by @nickscamara in #216
- [sdk] Fixes waiting status not being present on check status by @nickscamara in #218
- Fixed fire-engine content bug by @rafaelsideguide in #228
- Use @ instead of # for default BULL_AUTH_KEY. Hash mark is reserved for URI fragments. by @rombru in #229
- feat: Ability to forward headers to reliable providers for auth etc... by @nickscamara in #221
- Improvements to the blocklist regex by @nickscamara in #185
- Playwright service bugs #222 #179 #197 by @mattjoyce in #224
- [Fix] Changed timeout parameter name on js sdk by @rafaelsideguide in #211
- [Bug] Improved js response and test for getting partial_data by @rafaelsideguide in #212
- Better fallbacks for initial crawl start by @nickscamara in #148
- [Feat] Added custom scraping for google-drive pdf usecase by @rafaelsideguide in #235
- Script to check local vs published versions of sdk packages by @mattjoyce in #213
- Update README.md by @calebpeffer in #140
- Refactor custom scraping and added security center vanta by @nickscamara in #237
- [Feat] Added scroll xpaths on fire-engine for handling readme docs by @rafaelsideguide in #239
- Add Kubernetes configuration for Firecrawl deployment by @JakobStadlhuber in #236
- [Feat] Improved the scrape for gdrive pdfs by @rafaelsideguide in #238
- Add support for Self-Hosted Webhook URL Usage and added project_id into the webhook payload by @JakobStadlhuber in #234
- Partial Data Sliding window of 50 by @nickscamara in #242
- feat: Transactional emails for rate limits and credit limits by @nickscamara in #244
- Python-SDK transitional build setup for pyproject.toml by @mattjoyce in #196
- [Feat] CI/CD for publishing js and python SDKs by @rafaelsideguide in #246
- 194 sdk ci pipeline for publishing pythonnode sdk by @rafaelsideguide in #248
- fixing minor problems on workflow by @rafaelsideguide in #250
- Fix 208 py sdk interval poll name by @mattjoyce in #252
- Moving from fetch to axios and preventing deadlocks by setting timeouts on fallbacks by @nickscamara in #260
- ignoreSitemap feature, pageOptions now respected in the initial crawl as well by @nickscamara in #263
- Fixed bugs associated with absolute path replacements by @nickscamara in #268
- Only fetch webhook from db if self host webhook not set and using db auth by @nickscamara in #262
- [Feat] Added route to clean completed jobs and a github action cron that triggers every 24h by @rafaelsideguide in #265
- [Feat] Added allowBackwardCrawling option by @rafaelsideguide in #269
- [Feat] Added jobId to webhook data by @rafaelsideguide in #272
- Clusters support to scale our API to # of CPUs running by @nickscamara in #274
- Added pageOptions.removeTags by @rafaelsideguide in #275
- Added logging to python sdk FIRECRAWL_LOGGING_LEVEL by @mattjoyce in #256
- Python sdk improve response handling by @mattjoyce in #253
- [Feat] Added parsePDF option to pageOptions by @rafaelsideguide in #271
- [Feat] Added metadata.pageStatusCode and metadata.pageError properties by @rafaelsideguide in #276
- Make maxDepth relative to the entered url by @ericciarla in #277
- Test/load testing by @rafaelsideguide in #175
- Fixes crawler getting confused with base paths that contain www. by @nickscamara in #283
- Feat/max depth relative by @ericciarla in #285
- [Feat] Improvements on response document types by @rafaelsideguide in #298
- test: Rate Limit Unit Tests by @nickscamara in #282
- Added local host support for the javascript SDK by @NeevJewalkar in #296
- [Test] Transcribed from e2e to unit tests for many cases by @rafaelsideguide in #294
- [Feat] Added support for RegEx in removeTags by @AndyMik90 in #297
- [Bug] Fixed includeHTML to use cleanedHtml as response by @rafaelsideguide in #301
- Cjp/email to posthog logging by @calebpeffer in #302
- Fix Broken Link by @Lakr233 in #303
- docs: Fix pydanti to pydantic by @100gle in #308
- [Bug] Fixed axios bug that was making jobs stuck on active queue by @rafaelsideguide in #312
- fix multi-word search term issue: /search (w/o Serp) by @snippet in #313
- [Bug] Fixed clean jobs by @rafaelsideguide in #317
- [Bug] Fixed the regex test for google drive pdf files by @rafaelsideguide in #318
- add some types by @snippet in #316
- [Bug] Added default values and fixed pdf bug by @rafaelsideguide in #321
- [Test] Added E2E tests for checking metadata values by @rafaelsideguide in #322
- pageOptions.onlyIncludeTags param by @nickscamara in #328
- Update CONTRIBUTING.md by @george-zakharov in #329
- Fix headers | pageOptions not being passed correctly to fire engine by @nickscamara in #331
- [Tests] Added crawl limit unit test by @rafaelsideguide in #323
- Adds pageOptions.includeRawHtml and new extraction mode "llm-extraction-from-raw-html" by @ericciarla in #332
- [Proposal] new feature allowExternalContentLinks by @snippet in #336
- [PROPOSAL] (deps): making sure all deps are always up to date by @Sanix-Darker in #338
- [PROPOSAL] (docker-compose) regroup envs vars between services by @Sanix-Darker in #337
- new playwright service by @snippet in #325
- apps/playwright-service(deps): bump the prod-deps group in /apps/playwright-service with 3 updates by @dependabot in #346
- Logging for all scraper methods by @nickscamara in #363
- setting up docker to ts playwright service by @snippet in #361
- (Docs) Self Host added new ts playwright service instructions by @snippet in #362
- apps/test-suite(deps): bump the prod-deps group in /apps/test-suite with 6 updates by @dependabot in #347
- apps/test-suite(deps-dev): bump typescript from 5.4.5 to 5.5.3 in /apps/test-suite in the dev-deps group by @dependabot in #348
- apps/api(deps): bump the prod-deps group in /apps/api with 28 updates by @dependabot in #349
- [Feat] Added veeva to special case params by @rafaelsideguide in #371
- Only check Supabase if configured to when cancelling job by @StefanTerdell in #374
- dependabot for security checks, fixed crawl test by @rafaelsideguide in #370
- [Feat] Added implementation for saving docs on supabase by @rafaelsideguide in #326
- Self-hosting quality of life fixes by @StefanTerdell in #375
- [Feat] Added fire-engine fallback for getting sitemaps by @rafaelsideguide in #386
- fix(bull): requeue jobs after restart by @mogery in #393
- Fix USE_DB_AUTHENTICATION checks for self-host by @kun432 in #395
- [BUG] Fixed docker self hosting issue by @rafaelsideguide in #396
- Slack Alerts for when queue is over capacity by @nickscamara in #397
- CI for alerts instead by @nickscamara in #398
- Log Fire-engine page errors by @nickscamara in #394
- Sitemap fallback fixes w/ fire-engine by @nickscamara in #399
- Separate Rate Limiter from Main Redis by @nickscamara in #417
- [Feat] Pass along current, total, current_step, and current_url in js sdk by @jhoseph88 in #391
- Redis Health Checks by @nickscamara in #424
- Small fix for empty pageOptions by @rafaelsideguide in #420
- Caleb: Return a list of links on a page by default by @calebpeffer in #423
- [Docs] Updating docs by @rafaelsideguide in #427
- Fix queue stuck bug via lock settings changes by @mogery in #429
- fix(js-sdk): transform tests with ts-jest and configure node by @mogery in #433
- fix(WebScraper): infinite regex leading to fly.io instance hangs by @mogery in #436
- Support chrome-cdp and restructure sitemap fire-engine support. by @tomkosm in #410
- /scrape should now be 600ms-900ms faster by @nickscamara in #447
- Notion Website Fixes by @nickscamara in #453
- Firecrawl UI template by @ericciarla in #458
- Added regex for links in sitemap by @rafaelsideguide in #449
- Added logger by @rafaelsideguide in #450
- Add scrape monitoring by @mogery in #455
- Readiness liveness probes by @JakobStadlhuber in #457
- Client side error handling by @nickscamara in #461
- Admin router + Improve redis notifications by @nickscamara in #460
- support custom models by @NiuBlibing in #414
- DNS Cacheable Lookup to avoid DNS blocking ops by @nickscamara in #478
- Fire-engine to be the default for our cloud-service by @nickscamara in #479
- [Bug] Rate limiter bug on switch for some cases by @rafaelsideguide in #482
- [Feat] Better self hosting guide by @rafaelsideguide in #483
- Fix parallelization issues with caching by @nickscamara in #484
- [Bug] Issue with crawl going beyond Limit by @rafaelsideguide in #485
- [Bug] pdfs and logging pdf events, also added trycatchs for docx by @rafaelsideguide in #475
- [Bug] Nested sitemaps by @rafaelsideguide in #491
- Fix AMZN | Removal of redis alerts by @nickscamara in #492
- Custom engine params fix by @nickscamara in #494
- [Bug] Fixed the empty excludes.filter is undefined bug by @rafaelsideguide in #503
- [Feat] Added fullpagescreenshot capabilities by @rafaelsideguide in #504
- fix(js-sdk): build both CommonJS and ESM versions by @mogery in #432
- fix(js-sdk): build both CJS and ESM versions by @mogery in #508
- Improve logs by @tak-s in #496
- Digikey support by @nickscamara in #512
- [Feat] Add Go SDK implementation by @KentHsu in #497
- Redlock cache in auth by @nickscamara in #529
- Update redis urls in example .env by @wahpiangle in #515
- Self-host fix: Moving comments of .env.example values from end-of-line to above-line. by @kevinswiber in #531
- fix: go-sdk module name by @KentHsu in #534
- Fixed e2e tests by @rafaelsideguide in #536
- Removed obsoleted declaration by @matsubo in #532
- [Feat] Added go-sdk as submodule by @rafaelsideguide in #537
- feat: Move scraper to queue by @mogery in #459
- Reduce metrics ingestion w/ HyperDX v0.8.1 by @nickscamara in #480
- Check team credits based on the crawl limit by @nickscamara in #554
- added variables to beta customers by @rafaelsideguide in #555
- Set the crawl limit to the remaining credits by @nickscamara in #559
- [Feat] Added check job and cancel to fire-engine requests by @rafaelsideguide in #560
- [Bug] Added a way for dealing with DNS without IP resolution by @rafaelsideguide in #561
- Added Sentry Monitoring by @nickscamara in #562
- Internal Concurrency Limits <> Job Priorities by @nickscamara in #566
- [V1] Release by @rafaelsideguide in #527
- [Feat]: Add RUST SDK client for firecrawl API by @Sanix-Darker in #373
- Update README.md by @ericciarla in #590
- [Bug] Moved delete rawHtml to end of controller by @rafaelsideguide in #582
- Fixed RPC cloudflare issues with job ids by @nickscamara in #593
- [v1] Websockets SDKs by @rafaelsideguide in #591
- [v1] LLM Extract by @nickscamara in #586
- [v1] Webhooks by @rafaelsideguide in #594
- fix(v1): maxDepth by @rafaelsideguide in #607
- Ensuring USE_DB_AUTHENTICATION is true in single URL scraper. by @kevinswiber in #516
- Bill team async by @nickscamara in #619
- Optimize check credits func w/ caching by @nickscamara in #624
- Feat: Added go html to md parser by @rafaelsideguide in #608
- Feat: parser singleton by @rafaelsideguide in #629
New Contributors
- @szepeviktor made their first contribution in #8
- @rafaelsideguide made their first contribution in #12
- @nickscamara made their first contribution in #14
- @calebpeffer made their first contribution in #32
- @mattzcarey made their first contribution in #46
- @rogerserper made their first contribution in #61
- @mdp made their first contribution in #68
- @bllchmbrs made their first contribution in #110
- @ericciarla made their first contribution in #109
- @chand1012 made their first contribution in #119
- @eltociear made their first contribution in #27
- @elimisteve made their first contribution in #160
- @mattjoyce made their first contribution in #166
- @tractorjuice made their first contribution in #77
- @JakobStadlhuber made their first contribution in #181
- @qyou made their first contribution in #170
- @simonha9 made their first contribution in #190
- @Keredu made their first contribution in #186
- @rombru made their first contribution in #229
- @NeevJewalkar made their first contribution in #296
- @AndyMik90 made their first contribution in #297
- @Lakr233 made their first contribution in #303
- @100gle made their first contribution in #308
- @snippet made their first contribution in #313
- @george-zakharov made their first contribution in #329
- @Sanix-Darker made their first contribution in #338
- @dependabot made their first contribution in #346
- @StefanTerdell made their first contribution in #374
- @mogery made their first contribution in #393
- @kun432 made their first contribution in #395
- @jhoseph88 made their first contribution in #391
- @tomkosm made their first contribution in #410
- @NiuBlibing made their first contribution in #414
- @tak-s made their first contribution in #496
- @KentHsu made their first contribution in #497
- @wahpiangle made their first contribution in #515
- @kevinswiber made their first contribution in #531
- @matsubo made their first contribution in #532
Full Changelog: https://github.com/mendableai/firecrawl/commits/v1.0.0