Page Index
A RAG approach that replaces chunking, embeddings, and vector databases with hierarchical tree navigation over full documents, trading embedding pipelines for online reasoning.
Have you heard about this new RAG approach? It reportedly removes the need for chunking, embeddings, or vector databases while still maintaining high retrieval accuracy.
In standard retrieval-augmented generation (RAG) pipelines, a document is first segmented into chunks using a predefined strategy. These chunks are then embedded into a vector space and stored in a database. At inference time, a user query is embedded and compared via similarity search to retrieve the most relevant chunks, which are then passed into a language model for answer generation.
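To make the baseline concrete, here is a minimal sketch of the chunk, embed, and search steps described above. It is a toy under stated assumptions: the bag-of-words `embed` function stands in for a real embedding model, an in-memory list stands in for the vector database, and the names `chunk_text`, `embed`, `cosine`, and `retrieve` are illustrative, not taken from any particular library.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into fixed-size word windows that overlap by `overlap` words."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    words = text.split()
    step = size - overlap
    last_start = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + size]) for i in range(0, last_start, step)]


def embed(text: str) -> Counter:
    """Toy embedding: a bag-of-words count vector (assumption, not a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, embed(c)), reverse=True)
    return ranked[:k]


# Made-up chunks, purely for illustration.
chunks = [
    "revenue grew in the third quarter",
    "the board approved a new dividend policy",
    "headcount was flat year over year",
]
top = retrieve("dividend policy", chunks, k=1)
```

In a real pipeline the retrieved chunks would then be placed into the prompt for answer generation; that step is omitted here.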
This pipeline is effective, but it introduces structural limitations. In large documents, chunking can break continuity, removing formatting, hierarchical structure, and cross-paragraph dependencies. Even with overlap strategies, each chunk only sees a local window of text, so document-level context can still be lost across segments.
The proposed alternative, often referred to as Page Index, attempts to address this. Instead of chunking, the model first processes the entire document and constructs a hierarchical tree representation. Each node in the tree contains structured metadata such as a section title, a summary, and a page or span reference. Conceptually, this resembles a dynamically generated table of contents held in the model's context window.
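One way to picture such a tree is as a small recursive data structure. The sketch below is an assumption about the shape, not the actual Page Index schema: the `Node` class, its field names, and the `outline` helper are hypothetical, and the titles and page numbers are made up. In the approach described above the summaries would be written by the model; here they are typed in by hand.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    title: str
    summary: str
    start_page: int
    end_page: int
    children: list["Node"] = field(default_factory=list)


def outline(node: Node, depth: int = 0) -> list[str]:
    """Render the tree as an indented table of contents, one line per node."""
    line = f"{'  ' * depth}{node.title} (pp. {node.start_page}-{node.end_page}): {node.summary}"
    lines = [line]
    for child in node.children:
        lines.extend(outline(child, depth + 1))
    return lines


# Hypothetical document tree, for illustration only.
root = Node("Annual Report", "Full filing", 1, 120, [
    Node("Financial Statements", "Income statement and balance sheet", 40, 75, [
        Node("Income Statement", "Revenue and net income", 40, 52),
    ]),
    Node("Risk Factors", "Market and operational risks", 76, 95),
])

toc = outline(root)
```

The text produced by `outline` is the kind of compact view that could be placed in the model's context in place of the full document.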
At query time, retrieval is not performed via vector similarity search. Instead, the model performs hierarchical reasoning over the tree structure. It identifies the most relevant section based on semantic interpretation of the query, navigates to that node, and retrieves the associated content.
If the selected section does not contain sufficient information, the model recursively traverses adjacent or nested nodes until the relevant information is found.
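The navigate-then-fall-back loop can be sketched as a depth-first search that tries the most promising section first. This is a minimal sketch under loud assumptions: in the real approach a language model would judge relevance and sufficiency, whereas here `score` and `is_sufficient` are hypothetical keyword-overlap stubs, and the `Node` class and sample text are invented for the example.

```python
import re
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    title: str
    summary: str
    text: str = ""
    children: list["Node"] = field(default_factory=list)


def tokens(s: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", s.lower()))


def score(node: Node, query: str) -> int:
    """Stub for the model's relevance judgment: keyword overlap with title and summary."""
    return len(tokens(query) & tokens(f"{node.title} {node.summary}"))


def is_sufficient(node: Node, query: str) -> bool:
    """Stub for the model's sufficiency check: every query term appears in the section text."""
    return tokens(query) <= tokens(node.text)


def navigate(node: Node, query: str, trace: list[str]) -> Optional[Node]:
    """Descend into the best-scoring child first; fall back to siblings if a leaf falls short."""
    if not node.children:
        trace.append(node.title)  # record each leaf section that was actually read
        return node if is_sufficient(node, query) else None
    for child in sorted(node.children, key=lambda c: score(c, query), reverse=True):
        found = navigate(child, query, trace)
        if found is not None:
            return found
    return None


# Hypothetical two-section document, for illustration only.
root = Node("Report", "annual filing", children=[
    Node("Overview", "revenue highlights and policy overview",
         text="High-level highlights only."),
    Node("Notes", "recognition rules",
         text="Revenue recognition policy: revenue is recognized on delivery."),
])

trace: list[str] = []
hit = navigate(root, "revenue recognition policy", trace)
```

In this toy run the "Overview" section looks most relevant from its summary but does not contain the answer, so the search moves on to "Notes". Each stubbed judgment corresponds to a place where the real system would spend an LLM call.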
The mechanism can also support cross-referential navigation. For example, references such as "see Table 3" can trigger traversal to linked nodes within the tree, enabling structured multi-hop retrieval over the document representation.
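As a rough illustration of cross-referential navigation, the sketch below scans a node's text for phrases like "see Table 3" and follows them to other nodes, up to a hop limit. Everything here is an assumption made for the example: the regular expression, the flat `index` keyed by label, the `follow_references` name, and the sample text. A real system would presumably let the model resolve references rather than rely on a fixed pattern.

```python
import re
from collections import deque
from dataclasses import dataclass


@dataclass
class Node:
    label: str
    text: str


# Matches references such as "see Table 3" or "see Section 5" (illustrative pattern).
REF = re.compile(r"\bsee\s+((?:Table|Figure|Section)\s+\d+)", re.IGNORECASE)


def normalize(ref: str) -> str:
    """Collapse whitespace and title-case the label, e.g. 'table  3' -> 'Table 3'."""
    return " ".join(ref.split()).title()


def follow_references(start: str, index: dict[str, Node], max_hops: int = 3) -> list[Node]:
    """Breadth-first traversal over 'see X' links, visiting each node at most once."""
    visited: list[str] = []
    queue = deque([(start, 0)])
    while queue:
        label, hops = queue.popleft()
        if label in visited or label not in index:
            continue
        visited.append(label)
        if hops == max_hops:
            continue  # hop budget spent: keep the node but do not expand it
        for ref in REF.findall(index[label].text):
            queue.append((normalize(ref), hops + 1))
    return [index[label] for label in visited]


# Hypothetical nodes, for illustration only.
index = {
    "Section 2": Node("Section 2", "Segment margins are summarized separately (see Table 3)."),
    "Table 3": Node("Table 3", "Margins by segment; definitions are in the notes (see Section 5)."),
    "Section 5": Node("Section 5", "Margin is defined as operating income over revenue."),
}

path = [node.label for node in follow_references("Section 2", index)]
```

The hop limit is a design choice to keep multi-hop retrieval bounded; without it, circular references would be stopped only by the visited check.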
Reported results on financial and other structured-document benchmarks suggest strong performance, in some cases with markedly higher accuracy than standard RAG pipelines.
However, the approach introduces new constraints. First, computational cost increases because tree construction, navigation, and retrieval all require LLM calls, and the calls made at query time run one after another. The result is higher per-query latency than embedding-based retrieval.
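A back-of-the-envelope way to reason about that latency is to count the sequential calls on the query path. The model below is a simplifying assumption, not a measurement: it supposes one selection call per tree level, one extra call per additional section visited during fallback, and one final answer call, and the per-call latency is a parameter you would supply yourself.

```python
def sequential_calls(tree_depth: int, fallback_visits: int = 0) -> int:
    """Assumed call count: one per tree level, one per fallback visit, one to answer."""
    return tree_depth + fallback_visits + 1


def estimated_latency(calls: int, seconds_per_call: float) -> float:
    """Sequential calls cannot overlap, so their latencies simply add up."""
    return calls * seconds_per_call


# Illustrative inputs only: a three-level tree and a placeholder per-call latency.
direct_hit = sequential_calls(tree_depth=3)
with_fallback = sequential_calls(tree_depth=3, fallback_visits=2)
latency = estimated_latency(direct_hit, seconds_per_call=1.5)
```

Under the same assumptions, an embedding-based pipeline makes a single generation call after a fast similarity lookup, which is where the latency gap comes from.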
Second, the entire tree structure must remain within the model's context window or be rehydrated in segments, which limits scalability across large or multi-document corpora.
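One possible way to handle the context limit is to show the model only as many levels of the tree as fit a token budget, collapsing deeper subtrees until they are needed. The sketch below illustrates that idea under stated assumptions: the word count is a crude stand-in for a real tokenizer, and `Node`, `render`, and `view_within_budget` are hypothetical names, not part of any published implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    title: str
    summary: str
    children: list["Node"] = field(default_factory=list)


def approx_tokens(text: str) -> int:
    """Crude token estimate: whitespace-separated words (assumption, not a tokenizer)."""
    return len(text.split())


def render(node: Node, max_depth: int, depth: int = 0) -> list[str]:
    """Render the tree down to max_depth, marking subtrees that were left collapsed."""
    line = f"{'  ' * depth}{node.title}: {node.summary}"
    if depth == max_depth and node.children:
        return [f"{line} [+{len(node.children)} collapsed]"]
    lines = [line]
    for child in node.children:
        lines.extend(render(child, max_depth, depth + 1))
    return lines


def view_within_budget(root: Node, budget: int) -> list[str]:
    """Return the deepest rendering that fits the budget (the root line is always kept)."""
    view = render(root, 0)
    depth = 1
    while True:
        deeper = render(root, depth)
        # Stop once the tree is fully expanded or the next level would not fit.
        if deeper == view or approx_tokens("\n".join(deeper)) > budget:
            return view
        view, depth = deeper, depth + 1


# Hypothetical tree, for illustration only.
root = Node("Report", "annual filing", [
    Node("Finance", "statements", [Node("Income", "revenue detail")]),
    Node("Risk", "risk factors"),
])

compact = view_within_budget(root, budget=10)
```

A collapsed subtree would then be expanded, at the cost of another call, only when navigation actually reaches it.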
Third, the system shifts complexity from embedding infrastructure to reasoning overhead, effectively replacing vector search with hierarchical inference.
In summary, Page Index can be viewed as a structured retrieval alternative to traditional RAG, trading offline embedding pipelines for online reasoning over hierarchical document representations.
