Microsoft has released GraphRAG, a graph-based approach to retrieval-augmented generation (RAG) that enables question-answering over private or previously unseen datasets. GraphRAG is now available on GitHub.
The tool offers more structured information retrieval and more comprehensive response generation than traditional RAG approaches. The GraphRAG code repository is accompanied by a solution accelerator that provides an easy-to-use API experience hosted on Azure and can be deployed without writing code.
GraphRAG employs a large language model (LLM) to automate the extraction of a knowledge graph from any collection of text documents. This graph-based data index can report on the semantic structure of the data prior to user queries by detecting “communities” of densely connected nodes in a hierarchical fashion.
A summary of each community describes its entities and their relationships, so users can get an overview of a dataset without knowing in advance which questions to ask.
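The indexing idea described above can be illustrated with a toy sketch. It is not GraphRAG's code: the triples are hand-written placeholders for what an LLM would extract, connected components stand in for the hierarchical community detection, and the summary is a string template rather than LLM output. All function and variable names are hypothetical.

```python
from collections import defaultdict

# Hypothetical (entity, relation, entity) triples standing in for LLM output.
TRIPLES = [
    ("Alice", "works_at", "Acme"),
    ("Bob", "works_at", "Acme"),
    ("Alice", "mentors", "Bob"),
    ("Carol", "founded", "Globex"),
    ("Globex", "based_in", "Berlin"),
]


def build_graph(triples):
    """Return an undirected adjacency map over entities."""
    adjacency = defaultdict(set)
    for head, _relation, tail in triples:
        adjacency[head].add(tail)
        adjacency[tail].add(head)
    return adjacency


def find_communities(adjacency):
    """Group entities by connected component.

    Assumption: this is a crude, single-level stand-in for the
    hierarchical community detection described in the article.
    """
    seen = set()
    communities = []
    for start in sorted(adjacency):
        if start in seen:
            continue
        stack, members = [start], set()
        while stack:
            node = stack.pop()
            if node in members:
                continue
            members.add(node)
            stack.extend(adjacency[node] - members)
        seen |= members
        communities.append(sorted(members))
    return communities


def summarize(community, triples):
    """Template summary; a real system would ask an LLM to write this."""
    members = set(community)
    facts = [f"{h} {r} {t}" for h, r, t in triples
             if h in members and t in members]
    return (f"Entities: {', '.join(community)}. "
            f"Relationships: {'; '.join(facts)}.")


graph = build_graph(TRIPLES)
communities = find_communities(graph)
summaries = [summarize(c, TRIPLES) for c in communities]
for summary in summaries:
    print(summary)
```

Even in this toy form, the summaries exist before any question is asked, which is the property that lets the index describe the dataset up front.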
In recent evaluations, GraphRAG demonstrated its ability to answer “global questions” that address the entire dataset, a task where naive RAG approaches often fail.
Because its community summaries are built from all input texts, GraphRAG can provide more comprehensive and diverse answers. To answer a global question, it uses a map-reduce approach: it groups community reports up to the LLM context window size, maps the question across each group to create community answers, and reduces these into a final global answer.
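The map-reduce flow can be sketched in a few lines. This is a minimal illustration under stated assumptions, not GraphRAG's implementation: token counts are approximated by whitespace-separated words, the prompts are invented, and `fake_llm` is a hypothetical placeholder for a real model call.

```python
def group_reports(reports, max_tokens):
    """Greedily pack reports into groups that fit a token budget.

    Assumption: tokens are approximated by whitespace-separated words.
    A single report larger than the budget still gets its own group.
    """
    groups, current, used = [], [], 0
    for report in reports:
        size = len(report.split())
        if current and used + size > max_tokens:
            groups.append(current)
            current, used = [], 0
        current.append(report)
        used += size
    if current:
        groups.append(current)
    return groups


def global_answer(question, reports, llm, max_tokens):
    """Map the question over each group, then reduce the partial answers."""
    groups = group_reports(reports, max_tokens)
    # Map step: one community answer per group of reports.
    partial = [
        llm(f"Question: {question}\nReports:\n" + "\n".join(group))
        for group in groups
    ]
    # Reduce step: combine the community answers into one global answer.
    return llm(f"Question: {question}\nPartial answers:\n" + "\n".join(partial))


def fake_llm(prompt):
    """Hypothetical stand-in for a model: counts the context lines it saw."""
    context_lines = len(prompt.splitlines()) - 2  # skip the two header lines
    return f"summary of {context_lines} items"


reports = ["a b c", "d e", "f g h i", "j"]
answer = global_answer("What are the main themes?", reports, fake_llm, max_tokens=5)
print(answer)
```

Passing the model in as a callable keeps the grouping and reduction logic testable without any network access; swapping in a real client would only change `fake_llm`.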
Comparative studies using GPT-4 showed that GraphRAG outperforms naive RAG in comprehensiveness and diversity, with a 70–80% win rate. It also performed better than hierarchical source-text summarization at lower token costs. These results highlight GraphRAG’s efficiency in generating detailed and varied answers from large datasets.
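For readers unfamiliar with the metric, a win rate in a head-to-head comparison is simply the share of comparisons one system wins. The sketch below shows the arithmetic on made-up verdicts. The data is hypothetical and is not taken from the evaluation, and excluding ties is an assumption, since the article does not say how ties were handled.

```python
def win_rate(judgments, system):
    """Fraction of decided comparisons won by `system` (ties are excluded)."""
    decided = [winner for winner in judgments if winner != "tie"]
    if not decided:
        raise ValueError("no decided comparisons")
    return sum(winner == system for winner in decided) / len(decided)


# Hypothetical judge verdicts for five questions; not data from the evaluation.
verdicts = ["graph", "graph", "naive", "graph", "tie"]
rate = win_rate(verdicts, "graph")
print(f"{rate:.0%}")
```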
GraphRAG’s potential applications extend to various fields requiring deep data insights. By making both GraphRAG and its solution accelerator publicly available, Microsoft aims to make graph-based RAG accessible to users who need to understand their data at a global level.