We would like to thank the following people for their contributions to Crawl4AI:
- Unclecode - Project Creator and Main Developer
- Nasrin - Project Manager and Developer
- Aravind Karnam - Head of Community and Product
- aadityakanjolia4 - Fix for `CustomHTML2Text` is not defined.
- FractalMind - Created the first official Docker Hub image and fixed Dockerfile errors
- ketonkss4 - Identified Selenium's new capabilities, helping reduce dependencies
- jonymusky - JavaScript execution documentation and wait_for
- datehoer - Add browser proxy support
- dvschuyl - AsyncPlaywrightCrawlerStrategy page-evaluate context destroyed by navigation #304
- nelzomal - Enhance development installation instructions #286
- HamzaFarhan - Handled the cases where markdown_with_citations, references_markdown, and filtered_html might not be defined #293
- NanmiCoder - fix: crawler strategy exception handling and fixes #271
- paulokuong - fix: CRAWL4_AI_BASE_DIRECTORY should be Path object instead of string #298
- TheRedRad - feat: add force viewport screenshot option #1694
- ChiragBellara - fix: avoid Common Crawl calls for sitemap-only URL seeding #1746
- YuriNachos - fix: replace tf-playwright-stealth with playwright-stealth #1714, fix: respect `<base>` tag for relative link resolution #1721, fix: include GoogleSearchCrawler script.js in package #1719, fix: allow local embeddings by removing OpenAI fallback #1717, docs: add missing CacheMode import #1715, docs: fix return types to RunManyReturn #1716
- christian-oudard - fix: deep-crawl CLI outputting only the first page #1667
- vladmandic - fix: VersionManager ignoring CRAWL4_AI_BASE_DIRECTORY env var #1296
- nnxiong - fix: script tag removal losing adjacent text in cleaned_html #1364
- RoyLeviLangware - fix: bs4 deprecation warning (text -> string) #1077
- garyluky - fix: proxy auth ERR_INVALID_AUTH_CREDENTIALS #1281
- Martichou - investigation: browser context memory leak under continuous load #1640, #943
- danyQe - identified: temperature typo in async_configs.py #973
- saipavanmeruga7797 - identified: local HTML file crawling bug with capture_console_messages #1073
- stevenaldinger - identified: duplicate PROMPT_EXTRACT_BLOCKS dead code in prompts.py #931
- chrizzly2309 - identified: JWT auth bypass when no credentials provided #1133
- complete-dope - identified: console logging error attribute issue #729
- TristanDonze - feat: add configurable device_scale_factor for screenshot quality #1463
- charlaie - feat: add redirected_status_code to CrawlResult #1435
- mzyfree - investigation: Docker concurrency performance and pool resource management #1689
- nightcityblade - fix: prevent AdaptiveCrawler from crawling external domains #1805
- Otman404 - fix: return in finally block silently suppressing exceptions in dispatcher #1763
- SohamKukreti - fix: from_serializable_dict ignoring plain data dicts with "type" key #1803, fix: deep-crawl streaming mirrors Python library behavior #1798
- Br1an67 - fix: handle nested brackets and parentheses in LINK_PATTERN regex #1790, identified: strip markdown fences in LLM JSON responses #1787, fix: preserve class/id in cleaned_html #1782, fix: guard against None LLM content #1788, fix: strip port from domain in is_external_url #1783, fix: UTF-8 encoding for CLI output #1789, fix: configurable link_preview_timeout #1793, fix: wait_for_images on screenshot endpoint #1792, fix: cross-platform terminal input in CrawlerMonitor #1794, fix: UnicodeEncodeError in URL seeder #1784, fix: wire mean_delay/max_range into dispatcher #1786, fix: DOMParser in process_iframes #1796, fix: require api_token for /token endpoint #1795
- nightcityblade - feat: add score_threshold to BestFirstCrawlingStrategy #1804
- phamngocquy - identified: raw HTML URL token leak #1179
- AkosLukacs - docs: fix docstring param name crawler_config -> config #1494
- dominicx - docs: fix css_selector type from list to string #1308
- hoi - fix: add TTL expiry for Redis task data #1730
- maksimzayats - docs: modernize deprecated API usage across 25 files #1770
- jtanningbed - fix: add newline before opening code fence in html2text #462
- Ahmed-Tawfik94 - identified: redirect target verification in URL seeder #1622
- hafezparast - identified: PDFContentScrapingStrategy deserialization fix #1815; fix: screenshot distortion, deep crawl timeout/arun_many, CLI encoding #1829
- pgoslatara - chore: update GitHub Actions to latest versions #1734
- 130347665 - feat: type-list pipeline in JsonCssExtractionStrategy #1290
- microHoffman - feat: add --json-ensure-ascii CLI flag for Unicode handling #1668
- sufianuddin - fix: Documentation for JsonCssExtractionStrategy
- tautikAg - fix: Markdown output has incorrect spacing
- cardit1 - fix: 'AsyncPlaywrightCrawlerStrategy' object has no attribute 'downloads_path'
- dmurat - fix: Incorrect rendering of inline code inside of links
- Sparshsing - fix: Relative Urls in the webpage not extracted properly
We also want to thank all the users who have reported bugs, suggested features, or helped in any other way to make Crawl4AI better.
If you've contributed to Crawl4AI and your name isn't on this list, please open a pull request with your name, link, and contribution, and we'll review it promptly.
Thank you all for your contributions!