Feature: Add Chrome support, rate-limit retry logic, and update docs by theFuribundi · Pull Request #40 · timf34/Substack2Markdown

theFuribundi · 2026-02-03T05:32:21Z

This PR refactors substack_scraper.py to add cross-browser support and improve stability when scraping large archives.

Dual Browser Support: Added support for Google Chrome via webdriver_manager. The script now accepts a --browser argument (chrome, edge, or auto). With auto, it tries to detect an available browser and falls back to the other one if the first fails to start.
Rate Limit Handling: Implemented retry logic with exponential backoff. If a 429 Too Many Requests response or an empty page template is detected, the script now waits and retries instead of crashing or skipping the post.
Smart Skipping: Moved the sleep timer inside the existence check. The script now checks whether a local file already exists before pausing, so it can "fast-forward" through previously downloaded posts without unnecessary delays.
Human-like Behavior: Replaced fixed sleep timers with a randomized "jitter" delay (10-20 s) and added waits for JavaScript rendering, to make bot detection less likely and to ensure content has fully loaded before it is scraped.
Improved Logging: Replaced print calls with tqdm.write so log messages no longer interfere with the progress bar.
Documentation: Updated README.md with the correct clone URL and ensured requirements.txt includes webdriver_manager.
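The Dual Browser Support item could be structured roughly like the minimal sketch below. It assumes Selenium 4 and webdriver_manager are installed; the helper names (launch_chrome, launch_edge, start_driver, parse_args), the Chrome-then-Edge try order for auto, auto being the default, and the choice to fall back only in auto mode are all illustrative assumptions, not the PR's actual code.

```python
import argparse
from typing import Any, Callable, Dict, Optional, Tuple


def launch_chrome() -> Any:
    # Local imports so this module loads even without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    from webdriver_manager.chrome import ChromeDriverManager

    # webdriver_manager downloads a matching chromedriver and returns its path.
    return webdriver.Chrome(service=Service(ChromeDriverManager().install()))


def launch_edge() -> Any:
    from selenium import webdriver
    from selenium.webdriver.edge.service import Service
    from webdriver_manager.microsoft import EdgeChromiumDriverManager

    return webdriver.Edge(service=Service(EdgeChromiumDriverManager().install()))


# Assumed try order for --browser auto: Chrome first, then Edge.
LAUNCHERS: Dict[str, Callable[[], Any]] = {"chrome": launch_chrome, "edge": launch_edge}


def parse_args(argv=None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Substack scraper (sketch)")
    parser.add_argument("--browser", choices=["chrome", "edge", "auto"], default="auto")
    return parser.parse_args(argv)


def start_driver(browser: str = "auto",
                 launchers: Optional[Dict[str, Callable[[], Any]]] = None) -> Tuple[str, Any]:
    """Return (browser_name, driver); in auto mode, try each launcher in order."""
    launchers = LAUNCHERS if launchers is None else launchers
    candidates = list(launchers) if browser == "auto" else [browser]
    errors = []
    for name in candidates:
        try:
            return name, launchers[name]()
        except Exception as exc:  # browser or driver missing, unknown name, etc.
            errors.append(f"{name}: {exc}")
    raise RuntimeError("Could not start a browser:\n" + "\n".join(errors))
```

Passing the launchers in as a dictionary keeps the fallback logic testable without a real browser; the Selenium imports only run when a launcher is actually called.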
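For the Rate Limit Handling item, retry with exponential backoff can be sketched as follows. This assumes a hypothetical fetch callable that returns (status_code, html); the 429-or-empty-body heuristic, max_retries=5 and base_delay=30.0 are illustrative values, not taken from the PR. Selenium's WebDriver does not expose HTTP status codes directly, so a Selenium-based scraper would likely rely on page-content checks (such as the empty template mentioned above) instead.

```python
import time
from typing import Callable, Tuple


def looks_rate_limited(status: int, html: str) -> bool:
    # Hypothetical heuristic: HTTP 429, or a page that came back empty.
    return status == 429 or not html.strip()


def fetch_with_retry(fetch: Callable[[str], Tuple[int, str]], url: str,
                     max_retries: int = 5, base_delay: float = 30.0,
                     sleep: Callable[[float], None] = time.sleep) -> str:
    """Call fetch(url), backing off exponentially while it looks rate-limited."""
    for attempt in range(max_retries + 1):
        status, html = fetch(url)
        if not looks_rate_limited(status, html):
            return html
        if attempt < max_retries:
            # Delays grow as base_delay * 1, 2, 4, 8, ...
            sleep(base_delay * (2 ** attempt))
    raise RuntimeError(f"Still rate-limited after {max_retries} retries: {url}")
```

Injecting sleep lets tests record the delays instead of actually waiting.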
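The Smart Skipping change (checking for an existing file before sleeping) might look like this sketch. The scrape_all name, the one-file-per-post <slug>.md layout, and the download callable are hypothetical; only the ordering (existence check first, delay only when a request will be made) and the 10-20 s range come from the PR description.

```python
import os
import random
import time
from typing import Callable, Iterable, List


def scrape_all(slugs: Iterable[str], out_dir: str,
               download: Callable[[str], str],
               delay: Callable[[], float] = lambda: random.uniform(10, 20),
               sleep: Callable[[float], None] = time.sleep) -> List[str]:
    """Download posts that are not saved yet; return the slugs fetched."""
    fetched = []
    for slug in slugs:
        path = os.path.join(out_dir, f"{slug}.md")
        if os.path.exists(path):
            continue  # already on disk: skip immediately, no sleep
        sleep(delay())  # only pay the delay when a request will be made
        markdown = download(slug)
        with open(path, "w", encoding="utf-8") as fh:
            fh.write(markdown)
        fetched.append(slug)
    return fetched
```

On a re-run over a large archive, already-saved posts cost only a filesystem check, which is what makes the "fast-forward" behavior possible.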
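The Human-like Behavior item combines two ideas: a randomized pause and an explicit wait for JavaScript-rendered content. A minimal sketch, assuming Selenium 4; human_pause and wait_for_render are hypothetical names, and the CSS selector that signals a fully rendered post is left to the caller because the PR does not say which element it waits for.

```python
import random
import time
from typing import Any, Callable


def human_pause(low: float = 10.0, high: float = 20.0, rng: Any = random,
                sleep: Callable[[float], None] = time.sleep) -> float:
    """Sleep for a uniformly random duration in [low, high] and return it."""
    delay = rng.uniform(low, high)
    sleep(delay)
    return delay


def wait_for_render(driver: Any, css_selector: str, timeout: float = 20.0) -> Any:
    """Block until an element matching css_selector is present, or time out."""
    # Local imports so the module loads without Selenium installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    # Raises selenium.common.exceptions.TimeoutException if nothing appears in time.
    return WebDriverWait(driver, timeout).until(
        EC.presence_of_element_located((By.CSS_SELECTOR, css_selector))
    )
```

Waiting on a specific element is generally more reliable than a fixed sleep, since it returns as soon as the content appears and fails loudly if it never does.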

Tested on Ubuntu 24.04.

Commit eb45685: Add Chrome support, retry logic, and smart skipping
