Building with the Cloudflare Stack

Cloudflare is best known as a CDN / anti-DDoS provider, but it is also best-in-class as a domain registrar and for blob storage with R2 (like AWS S3, but cheaper).
More recently, though, they've been pumping out new offerings and seem to have maintained great documentation, UX, and pricing throughout. I'd been intending to build a blog search engine, so I decided to build it on the Cloudflare stack to test it out. Here's the plan I was working from, plus the issues I ran into.
This is roughly the plan I was following (rough sketches of each piece follow the list):
  • A Cloudflare Worker (serverless runner ~= AWS Lambda) runs on a schedule, scraping various configured blogs. It writes .md files to Cloudflare R2 and adds rows to Cloudflare D1 Beta (SQL database), using cheerio and turndown to extract the right DOM elements and turn them into markdown.
  • Another Cloudflare Worker, running every minute, checks the Cloudflare D1 database for documents that haven't yet been embedded. It then gets their text from R2, chunks it, embeds the chunks using bge-*-en-v1.5 via Cloudflare Workers AI Beta (serverless AI offering ~= Replicate), and puts the vectors in Cloudflare Vectorize Beta (vector DB ~= Pinecone).
  • A search front-end served on Cloudflare Pages via Next.js offers a search box; it calls out to Workers AI to embed the query, searches Vectorize for the nearest chunks, and then reranks the original documents using Cohere Rerank (the only non-Cloudflare offering!).
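To make the plan concrete, here is a minimal sketch of the scraping Worker. It is illustrative rather than the actual code: the binding names (BLOG_BUCKET, DB), the BLOGS config, and the documents table schema are all assumptions.

```ts
// scraper.ts - scheduled Worker that scrapes configured blogs into R2 + D1.
// Binding names, the BLOGS config, and the table schema are hypothetical.
import * as cheerio from "cheerio";
import TurndownService from "turndown"; // note: the published release needed a patch to run on the edge runtime (see the schleps below)

interface Env {
  BLOG_BUCKET: R2Bucket;
  DB: D1Database;
}

// Hypothetical config; in practice this would live in D1 or KV.
const BLOGS = [{ id: "example", url: "https://example.com/post", selector: "article" }];

export default {
  async scheduled(_event: ScheduledController, env: Env, _ctx: ExecutionContext) {
    const turndown = new TurndownService();

    for (const blog of BLOGS) {
      // Fetch the page and pull out the article body with cheerio.
      const html = await (await fetch(blog.url)).text();
      const $ = cheerio.load(html);
      const body = $(blog.selector).html() ?? "";

      // Convert the HTML fragment to markdown and store it in R2.
      const key = `${blog.id}.md`;
      await env.BLOG_BUCKET.put(key, turndown.turndown(body));

      // Record the document in D1 so the embedding Worker can pick it up later.
      await env.DB.prepare(
        "INSERT OR REPLACE INTO documents (id, url, r2_key, embedded) VALUES (?, ?, ?, 0)"
      )
        .bind(blog.id, blog.url, key)
        .run();
    }
  },
};
```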
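The embedding Worker, on a one-minute cron, might look roughly like this. Again the bindings (AI, VECTOR_INDEX) and the fixed-size chunking are assumptions, and chunks are embedded one at a time since the batched signature wasn't working for me (see the schleps below).

```ts
// embedder.ts - cron Worker that embeds not-yet-embedded documents into Vectorize.
interface Env {
  BLOG_BUCKET: R2Bucket;
  DB: D1Database;
  AI: Ai;
  VECTOR_INDEX: VectorizeIndex;
}

// Naive fixed-size chunking; a real implementation would split on headings/paragraphs.
function chunk(text: string, size = 1000): string[] {
  const out: string[] = [];
  for (let i = 0; i < text.length; i += size) out.push(text.slice(i, i + size));
  return out;
}

export default {
  async scheduled(_event: ScheduledController, env: Env, _ctx: ExecutionContext) {
    // Find documents the scraper stored that haven't been embedded yet.
    const { results } = await env.DB.prepare(
      "SELECT id, r2_key FROM documents WHERE embedded = 0 LIMIT 10"
    ).all<{ id: string; r2_key: string }>();

    for (const doc of results ?? []) {
      const object = await env.BLOG_BUCKET.get(doc.r2_key);
      if (!object) continue;
      const chunks = chunk(await object.text());

      for (let i = 0; i < chunks.length; i++) {
        // One chunk per call, since the batched { text: [...] } form wasn't working in the beta.
        const { data } = await env.AI.run("@cf/baai/bge-small-en-v1.5", { text: [chunks[i]] });

        // Upsert one 384-dim vector per chunk, keyed by document id + chunk index.
        await env.VECTOR_INDEX.upsert([
          { id: `${doc.id}:${i}`, values: data[0], metadata: { docId: doc.id, chunk: i } },
        ]);
      }

      await env.DB.prepare("UPDATE documents SET embedded = 1 WHERE id = ?").bind(doc.id).run();
    }
  },
};
```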
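And the search path, here sketched as a plain Worker handler rather than the Next.js front-end. The Cohere endpoint and model name come from Cohere's v1 rerank API, the id scheme is carried over from the sketches above, and deduplication of documents is omitted for brevity.

```ts
// search.ts - embed the query, find nearest chunks in Vectorize, rerank the docs with Cohere.
interface Env {
  BLOG_BUCKET: R2Bucket;
  AI: Ai;
  VECTOR_INDEX: VectorizeIndex;
  COHERE_API_KEY: string;
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const query = new URL(request.url).searchParams.get("q") ?? "";

    // 1. Embed the query with the same bge model used at index time.
    const { data } = await env.AI.run("@cf/baai/bge-small-en-v1.5", { text: [query] });

    // 2. Nearest-neighbour search in Vectorize. topK is capped at 20 in the beta,
    //    which is the first blocker described below.
    const { matches } = await env.VECTOR_INDEX.query(data[0], { topK: 20 });

    // 3. Load the matched documents back from R2 (vector ids are `${docId}:${chunk}`).
    const docs = await Promise.all(
      matches.map(async (m) => {
        const obj = await env.BLOG_BUCKET.get(`${m.id.split(":")[0]}.md`);
        return (await obj?.text()) ?? "";
      })
    );

    // 4. Rerank the documents against the query with Cohere Rerank.
    const rerank = await fetch("https://api.cohere.ai/v1/rerank", {
      method: "POST",
      headers: {
        Authorization: `Bearer ${env.COHERE_API_KEY}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ model: "rerank-english-v2.0", query, documents: docs, top_n: 5 }),
    });

    return Response.json(await rerank.json());
  },
};
```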
But I ended up giving up on the Cloudflare mono-stack for this task because:
  • Vectorize limits topK to 20: A cap of 20 retrieved results just isn't enough for this sort of task, where ideally the vector DB would return the top 100 (or more) chunk matches to then be reranked with Cohere.
    • This is a beta limitation that will be fixed in the future
  • Vectorize is slow: With a small test database of only 200 vectors, a single query vector, and a target of 10 results, Vectorize took 400-700 ms for a 384-dim embedding and 1000-1500 ms for a 1024-dim embedding (a rough measurement sketch follows this list).
    • By comparison, Turbopuffer promises <100ms latency for warm vectors, and still a p90 of 300ms even for cold vectors. They note that most providers in this space promise 10-200ms.
  • Workers AI is slow (though maybe the problem is cloud embedding in general): Workers AI seems to take 200-1200 ms to return vectors for a single query.
    • By comparison, the same model takes 5 ms via transformers.js on a local Node.js instance (a local reproduction sketch also follows this list).
    • Embedding is a cheap enough op that perhaps it should just run alongside your main server; regardless, Workers AI fails to leverage that cheapness well.
    • Models also often seem to be cold, taking ~1-4 s to embed on the first try, which is very surprising given that Cloudflare offers only ~10 models.
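The Vectorize numbers above are the sort of thing you can measure directly from a Worker; a rough, illustrative harness (not my actual one) looks like this. In Workers, Date.now() only advances across I/O, which is fine here since the query is a subrequest.

```ts
// Illustrative latency measurement for a Vectorize query, run from inside a Worker.
async function timeVectorizeQuery(index: VectorizeIndex, vector: number[]): Promise<number> {
  const start = Date.now();
  await index.query(vector, { topK: 10 }); // same target of 10 results as in the test above
  return Date.now() - start; // wall-clock latency of the query, in ms
}
```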
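The local baseline is easy to reproduce with transformers.js in Node.js; this sketch assumes the Xenova/bge-small-en-v1.5 ONNX port with mean pooling, and the 5 ms figure is presumably for the embedding call itself once the model is loaded.

```ts
// Local Node.js baseline: the same bge model via transformers.js.
import { pipeline } from "@xenova/transformers";

// Loading the model is the slow part and happens once per process.
const embed = await pipeline("feature-extraction", "Xenova/bge-small-en-v1.5");

const start = performance.now();
const output = await embed("a single query to embed", { pooling: "mean", normalize: true });
console.log(`${output.data.length}-dim vector in ${(performance.now() - start).toFixed(1)} ms`); // 384-dim
```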
Notably, all of the Workers AI and Vectorize measurements above were made from inside a Cloudflare Worker, so they should be optimistic estimates of the network latency involved.
Other than the major blockers, there are also some schleps:
  • Workers AI is definitely beta: The docs are incomplete, and the errors it throws are unhelpful. Also, the batched API signature doesn't seem to work at all. But it's a beta.
  • The edge runtime is still immature: Several JS packages still make assumptions of node, or node || browser. I ran into this with turndown, where the latest release didn't work out of the box; fortunately someone had an open PR with a fix.
  • Scraping needs a proxy: This isn't really a Cloudflare-specific issue, just something I didn't see coming. Datacenter IPs are blocked by default by Cloudflare (lol), and using a conservative rate limit won't help you.
As of the time of writing, I’ve stopped building this midway through to migrate stacks, so I don’t yet know how well Cloudflare Pages performs.