After running into only paid tools or overly complicated setups for turning web pages into structured data for LLMs, I got tired of it. I wanted a free, open-source way to convert websites to Markdown (for NotebookLM or any RAG-like setup), so I built Mojo.
Mojo is extremely fast, supports proxy rotation, and is MIT licensed -> https://github.com/malvads/mojo
It should start by looking at robots.txt.
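To make that suggestion concrete, here is a minimal sketch of a robots.txt check done before crawling. It is not taken from Mojo's code; it uses Python's standard `urllib.robotparser`, and the `build_parser` and `allowed_urls` helpers, the `example-crawler` user agent, and the sample rules are all made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real crawler would download it
# from https://<host>/robots.txt before fetching any page.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a RobotFileParser (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def allowed_urls(parser: RobotFileParser, user_agent: str, urls: list[str]) -> list[str]:
    """Keep only the URLs the given user agent is allowed to fetch."""
    return [url for url in urls if parser.can_fetch(user_agent, url)]


if __name__ == "__main__":
    parser = build_parser(SAMPLE_ROBOTS_TXT)
    candidates = [
        "https://example.com/docs/intro",
        "https://example.com/private/admin",
    ]
    for url in allowed_urls(parser, "example-crawler", candidates):
        print("OK to crawl:", url)
```

Against a live site, `RobotFileParser.set_url(...)` followed by `read()` would fetch the file instead of parsing a string. Filtering the queue up front keeps disallowed paths from ever reaching the fetcher.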