The emergence of advanced AI models has revolutionised the field of natural language processing, enabling machines to analyse, interpret, and respond to human language with increasing accuracy and sophistication. Despite these advances, AI-powered assistants such as ChatGPT still struggle to accurately answer complex questions derived from Securities and Exchange Commission filings. Researchers at Patronus AI found that even the best-performing configuration they tested, OpenAI’s GPT-4-Turbo, answered only 79% of the questions on their new benchmark correctly.
Partnering with LangChain, Redis has produced the Redis RAG template, optimized for building factually consistent, LLM-powered chat applications. By using Redis as the vector database, the template enables rapid context retrieval and grounded prompt construction, giving developers a straightforward path to chat applications that answer quickly and accurately.
The Redis RAG template is a REST API that lets developers query public financial documents, such as Nike’s 10-K filing. The application uses FastAPI and Uvicorn to serve client requests over HTTP, UnstructuredFileLoader to parse PDF documents into raw text, RecursiveCharacterTextSplitter to split the text into smaller chunks, and the ‘all-MiniLM-L6-v2’ sentence transformer from Hugging Face to embed the chunks into vectors. Redis acts as the vector database for real-time context retrieval, and OpenAI’s ‘gpt-3.5-turbo-16k’ LLM generates answers to user queries.
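As a rough sketch of that flow (not the template’s actual code), the pieces fit together roughly as follows; the file path, Redis URL, and index name are illustrative assumptions, and an OpenAI API key is expected in the environment:

```python
# Minimal sketch of the ingest-and-answer flow described above.
from langchain.document_loaders import UnstructuredFileLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores.redis import Redis
from langchain.chat_models import ChatOpenAI
from langchain.chains import RetrievalQA

# Parse the PDF filing into raw text documents ("nike-10k.pdf" is a placeholder).
docs = UnstructuredFileLoader("nike-10k.pdf").load()

# Split the text into smaller, overlapping chunks.
chunks = RecursiveCharacterTextSplitter(
    chunk_size=1000, chunk_overlap=100
).split_documents(docs)

# Embed each chunk and index the vectors in a local Redis instance (assumed URL).
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)
vectorstore = Redis.from_documents(
    chunks,
    embeddings,
    redis_url="redis://localhost:6379",
    index_name="nike-10k",
)

# At query time, retrieve relevant chunks and ground the LLM's answer in them.
qa = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(model="gpt-3.5-turbo-16k"),
    retriever=vectorstore.as_retriever(),
)
print(qa.run("What was Nike's total revenue in its latest fiscal year?"))
```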
In a recent interaction with AIM, Redis CTO Yiftach Shoolman said, “Your data is everywhere: on your laptop, in the organization repository, on AWS S3, on Google Cloud Storage, whatever. You need a platform to bring the data to a vector database like Redis, cut into pieces based on the relevant knowledge.”
Criticising ChatGPT, he said, “ChatGPT doesn’t know anything because it was not trained on your data,” adding that users need to look for the data relevant to their request in the knowledge base they have just created.
The RAG template is offered as a deployable reference architecture that blends efficiency with adaptability, giving developers a ready-made starting point for building factually consistent, LLM-powered chat applications.
LangChain’s hub of deployable architectures also includes tool-specific chains, LLM chains, and technique-specific chains, which reduce the friction in deploying APIs. LangServe is central to deploying these templates: it uses FastAPI to transform LLM-based chains or agents into operational REST APIs, making them accessible and production-ready.
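For illustration, a minimal LangServe app in that style might look like the following; the placeholder chain, route path, and port are assumptions for the sketch rather than values taken from the template:

```python
from fastapi import FastAPI
from langchain.chat_models import ChatOpenAI
from langchain.prompts import ChatPromptTemplate
from langserve import add_routes

# A trivial chain stands in here for the RAG chain built earlier.
prompt = ChatPromptTemplate.from_template(
    "Answer the question using the retrieved filing context: {question}"
)
chain = prompt | ChatOpenAI(model="gpt-3.5-turbo-16k")

app = FastAPI(title="RAG server")
# Exposes /rag/invoke, /rag/batch, and /rag/stream endpoints for the chain.
add_routes(app, chain, path="/rag")

if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8000)
```

A client could then POST a JSON body such as {"input": {"question": "..."}} to /rag/invoke to call the chain over HTTP.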