
Indy Jones

Webpage to markdown to searchable knowledge.

Web scraper that turns any URL into clean markdown saved to my knowledge base.

Private, Active

Why I built it

I wanted full web pages to become usable, searchable knowledge instead of raw HTML or flat PDFs.

Problem

What made this worth building.

I find things worth keeping all the time: blog posts, articles, research threads, essays. But saving them is messy. Raw scrapers give unstructured text. PDFs are flat. Nothing is indexed or easy to search later. Capturing a full web page in a usable format, not raw HTML and not a flat PDF, is harder than it should be. And once captured, it should be searchable, not just stored.

Value

What the app gives back.

Paste a link. It reads the page and formats it as clean markdown, with no cleanup work. The output goes straight into my knowledge base. It handles the messy cases too: JavaScript-rendered content, complex layouts, and weird formatting. One URL in, usable knowledge out. Speed matters: when I find something interesting, I should be able to capture it and move on. The knowledge base does the rest.

Build

Build notes.

Build 01

Built on Cloudflare's latest crawling tech plus a custom solution.

Build 02

Outputs clean, structured markdown and handles JS-rendered content and complex layouts.

Build 03

Direct feed into the knowledge base.
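Build 02 is the core transformation, but the page doesn't describe how the custom layer works. The sketch below is only an illustration of the general idea, not Indy Jones's implementation: a minimal reducer built on Python's standard-library `html.parser` that turns already-rendered HTML into a small subset of markdown (headings, paragraphs, list items, links) and drops noise containers. The names `MarkdownReducer`, `html_to_markdown`, and the contents of `SKIP_TAGS` are hypothetical. It assumes JavaScript has already been executed upstream by a headless browser, since `html.parser` only parses markup and never runs scripts.

```python
from __future__ import annotations

from html.parser import HTMLParser

# Hypothetical choice of "noise" containers whose contents are dropped entirely.
SKIP_TAGS = {"head", "script", "style", "noscript", "nav", "footer"}
HEADING_LEVELS = {f"h{n}": n for n in range(1, 7)}
BLOCK_TAGS = {"p", "li", *HEADING_LEVELS}


class MarkdownReducer(HTMLParser):
    """Reduce already-rendered HTML to headings, paragraphs, list items, and links."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)  # decode &amp; etc. into plain text
        self._blocks: list[str] = []  # finished markdown blocks
        self._buf: list[str] = []  # raw text of the block being built
        self._prefix = ""  # "# ", "- ", ... for the current block
        self._skip = 0  # nesting depth inside SKIP_TAGS containers
        self._hrefs: list[str | None] = []  # open <a> tags (None/"" = no link)

    def _flush(self) -> None:
        # Collapse whitespace runs; keep the block only if it has real text.
        text = " ".join("".join(self._buf).split())
        if text:
            self._blocks.append(self._prefix + text)
        self._buf, self._prefix = [], ""

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip += 1
        elif self._skip:
            return  # ignore everything inside a skipped container
        elif tag in BLOCK_TAGS:
            self._flush()  # a new block ends whatever text came before it
            if tag in HEADING_LEVELS:
                self._prefix = "#" * HEADING_LEVELS[tag] + " "
            elif tag == "li":
                self._prefix = "- "
        elif tag == "a":
            href = dict(attrs).get("href")
            self._hrefs.append(href)
            if href:
                self._buf.append("[")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self._skip = max(0, self._skip - 1)
        elif self._skip:
            return
        elif tag in BLOCK_TAGS:
            self._flush()
        elif tag == "a" and self._hrefs:
            href = self._hrefs.pop()
            if href:
                self._buf.append(f"]({href})")

    def handle_data(self, data):
        if not self._skip:
            self._buf.append(data)

    def markdown(self) -> str:
        self._flush()  # emit any trailing text
        return "\n\n".join(self._blocks)


def html_to_markdown(html: str) -> str:
    parser = MarkdownReducer()
    parser.feed(html)
    parser.close()
    return parser.markdown()
```

For example, `<h1>Notes</h1><p>Read <a href="https://example.com">this</a>.</p>` becomes a `# Notes` heading followed by the paragraph `Read [this](https://example.com).`. A production pipeline would also need tables, code blocks, images, and main-content detection; this only shows the shape of the reduction from page markup to compact markdown.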

Preview

Screens, flow, and product shape.

Target capture

A crawl entry point that starts from a page and a clear extraction target.

Structured extraction

An intermediate state that reduces raw page content into smaller usable parts.

Output review

A review layer that makes the extracted content easier to trust and reuse.

Ecosystem

How it fits with everything else.

Indy Jones feeds into my knowledge base, which Zara searches. PO handles highlights, Indy Jones handles full pages, Transcriber handles audio. Everything becomes searchable through Zara.