aleks5678 1 day ago

We used Reducto and it did struggle with long documents. We process financial documents that run 300+ pages, and Gemini 3 Flash produces high-accuracy extracts for them super fast.

  • raunakchowdhuri 1 day ago

    We've made a lot of changes in the past few months that make our standard extract much, much better, as well as Deep Extract for documents even longer than that. We'd love for you to give it a try!

willwjack 1 day ago

Any learnings from deploying agents at such massive scale?

  • raunakchowdhuri 1 day ago

    The big one is that LLMs get lazy on repetitive tasks. They'll skip rows or consolidate entries instead of grinding through every last one. So you need verify-and-re-extract loops rather than single-pass processing. Breaking work into sub-agent chunks with explicit correctness criteria defined upfront (e.g., "line items must sum to the stated total") lets the system self-verify autonomously. At scale (28M+ fields), this approach actually outperformed expert human labelers!
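    A minimal sketch of that verify-and-re-extract loop, with the "line items must sum to the stated total" criterion as the correctness check (`extract_rows` here is a hypothetical stand-in for the real LLM extraction call):

    ```python
    def extract_rows(doc, attempt):
        # Placeholder for an LLM extraction call; a "lazy" model may
        # silently drop rows on the first pass, which we simulate here.
        return doc["rows"] if attempt > 0 else doc["rows"][:-1]

    def verify(rows, stated_total):
        # Explicit correctness criterion defined upfront:
        # extracted line items must sum to the document's stated total.
        return abs(sum(r["amount"] for r in rows) - stated_total) < 0.01

    def deep_extract(doc, max_retries=3):
        # Verify-and-re-extract loop: retry until the self-check passes,
        # rather than trusting a single pass.
        for attempt in range(max_retries):
            rows = extract_rows(doc, attempt)
            if verify(rows, doc["stated_total"]):
                return rows
        raise ValueError("extraction failed verification after retries")

    doc = {"rows": [{"amount": 10.0}, {"amount": 5.0}], "stated_total": 15.0}
    rows = deep_extract(doc)  # first pass drops a row, second pass recovers it
    ```

    The same shape extends to sub-agent chunks: each chunk gets its own criterion, and only chunks that fail verification get re-run.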

skadamat 1 day ago

How does this compare to DataLab (https://www.datalab.to/)?

  • adit_a 1 day ago

    We're releasing an open dataset of challenging structured extraction tasks soon, as a starting point for people to run comparisons!

    vikp and the Datalab team have done great work in this space, but their structured extraction product is closer to our baseline /extract API, since both are single-pass extractions.

    Deep Extract is more accurate than any structured extraction product we've tried, but the approach comes with a very clear cost/latency tradeoff relative to a single-pass extraction. We have free credits if you'd like to do a side-by-side.
