The world is drowning in information, and it is becoming increasingly challenging to find, understand, and use the data we need. Enter Hybrid Search and Retrieval Augmented Generation (HSRAG), a powerful approach that combines traditional search techniques with cutting-edge machine learning models to help us navigate the information deluge and generate contextually relevant content.
Traditional search engines work by parsing documents into chunks and indexing those chunks. The engine then searches this index for results that match a user's query. The most common ranking function is BM25, which scores matching documents by their relevance to the query, taking into account factors like term frequency, inverse document frequency, and document length.
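To make those factors concrete, here is a minimal sketch of a single-term BM25 score in plain Python, using the conventional k1 and b parameters. This is a toy illustration over tokenized documents, not a production implementation:

```python
import math

def bm25_score(term, doc, corpus, k1=1.5, b=0.75):
    """Score one tokenized document for one query term using BM25."""
    N = len(corpus)                                   # total number of documents
    df = sum(1 for d in corpus if term in d)          # documents containing the term
    idf = math.log((N - df + 0.5) / (df + 0.5) + 1)   # smoothed inverse document frequency
    tf = doc.count(term)                              # term frequency in this document
    avgdl = sum(len(d) for d in corpus) / N           # average document length
    norm = 1 - b + b * len(doc) / avgdl               # document-length normalization
    return idf * (tf * (k1 + 1)) / (tf + k1 * norm)

corpus = [
    "hybrid search combines sparse and dense retrieval".split(),
    "dense retrieval uses embeddings".split(),
    "sparse retrieval uses term statistics".split(),
]
# Only the first document mentions "hybrid", so only it gets a positive score
scores = [bm25_score("hybrid", doc, corpus) for doc in corpus]
```

Documents that never contain the term score zero, since the term frequency in the numerator is zero.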
Retrieval Augmented Generation (RAG), by contrast, is a newer paradigm that pairs retrieval with large language models (LLMs): retrieved passages are supplied as context, and the model generates relevant content grounded in them. Given a heading, for instance, an LLM such as GPT-3.5-turbo can draft paragraphs or assist with journaling and planning.
In the hybrid model, the system uses BM25 to shortlist the most relevant chunks. Alongside this, it runs a semantic search: an embedding model converts text into numerical representations, or vectors, and the system computes the cosine similarity between the query vector and each chunk's vector to find the most contextually relevant results. The combined, ranked chunks then serve as context for the generation step.
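Cosine similarity itself is just a normalized dot product. A toy example with 3-dimensional vectors (real embedding models produce hundreds of dimensions, but the math is the same):

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 = same direction, 0.0 = orthogonal."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

query_vec = np.array([0.9, 0.1, 0.0])
doc_vecs = {
    "doc_a": np.array([0.8, 0.2, 0.1]),   # points in nearly the same direction as the query
    "doc_b": np.array([0.0, 0.1, 0.9]),   # nearly orthogonal to the query
}
sims = {name: cosine_similarity(query_vec, v) for name, v in doc_vecs.items()}
```

The vector pointing in roughly the same direction as the query scores close to 1.0; the orthogonal one scores near 0.0.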
To see how Hybrid Search and Retrieval Augmented Generation (HSRAG) operates in practice, let's walk through an example: an application that assists content creators by providing contextually relevant information based on their queries.
This application combines a traditional search method, BM25, with state-of-the-art language models. When a user submits a query, the application first uses its search algorithm to sift through its data repository (stored as a dictionary object for simplicity). This step produces a shortlist of the most relevant documents, weighing factors like term frequency and document length.
# Get hits from opensearch
os_response = query_opensearch(query, os_client, INDEX_NAME)
os_hits = parse_os_response(os_response)
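The helper functions above come from the example implementation. As a rough sketch of what parse_os_response might look like, assuming the standard OpenSearch response shape (hits nested under response["hits"]["hits"]); the "chunk" source field here is a hypothetical name for illustration:

```python
def parse_os_response(response):
    """Extract (doc_id, score, text) tuples from an OpenSearch search response."""
    return [
        (hit["_id"], hit["_score"], hit["_source"]["chunk"])
        for hit in response["hits"]["hits"]
    ]

# A trimmed-down response of the shape OpenSearch returns
sample_response = {
    "hits": {
        "hits": [
            {"_id": "doc1#3", "_score": 12.4, "_source": {"chunk": "BM25 ranks chunks by term statistics."}},
            {"_id": "doc2#1", "_score": 9.8,  "_source": {"chunk": "Semantic search compares embeddings."}},
        ]
    }
}
os_hits = parse_os_response(sample_response)
```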
Yet this is merely the first step. The application refines the results further using its language model: an embedding model transforms the text of each document into numerical vectors, and these vectors are compared for similarity to surface the most contextually pertinent results.
# Get hits from semantic index
semantic_response = query_semantic(query, tokenizer, model, doc_embeddings_array)
semantic_hits = parse_semantic_response(semantic_response, embedding_index)
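A sketch of the semantic step, assuming the query has already been embedded (in the snippet above, the tokenizer and model produce that vector) and that doc_embeddings_array holds one row per chunk. The function name and top_k parameter are illustrative, not the example implementation's exact API:

```python
import numpy as np

def query_semantic_sketch(query_embedding, doc_embeddings, top_k=2):
    """Rank chunks by cosine similarity to the query embedding, highest first."""
    doc_norms = doc_embeddings / np.linalg.norm(doc_embeddings, axis=1, keepdims=True)
    q_norm = query_embedding / np.linalg.norm(query_embedding)
    sims = doc_norms @ q_norm                  # cosine similarity per chunk
    top = np.argsort(-sims)[:top_k]            # indices of the best matches
    return [(int(i), float(sims[i])) for i in top]

# Toy 3-dimensional embeddings, one row per chunk
doc_embeddings_array = np.array([
    [0.9, 0.1, 0.0],
    [0.0, 0.9, 0.1],
    [0.7, 0.3, 0.1],
])
query_embedding = np.array([1.0, 0.1, 0.0])
semantic_hits = query_semantic_sketch(query_embedding, doc_embeddings_array)
```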
The application then merges the traditional search results (os_hits) with the semantic results (semantic_hits) into a single, comprehensive response: ranked text chunks (context) from various documents, ordered by their relevance to the original query.
# Combine os and semantic hits and rank them
context = get_chunks_from_hits(os_hits + semantic_hits)
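One common way to merge two ranked lists, which is what get_chunks_from_hits does conceptually, is reciprocal rank fusion: chunks near the top of either list accumulate the most credit. This is a sketch, and the example implementation may combine scores differently:

```python
def reciprocal_rank_fusion(ranked_lists, k=60):
    """Merge ranked lists of chunk ids; chunks ranked highly in either list win."""
    scores = {}
    for ranked in ranked_lists:
        for rank, chunk_id in enumerate(ranked):
            # Each appearance adds 1/(k + rank + 1); k damps the influence of rank 0
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

os_hits = ["chunk_a", "chunk_b", "chunk_c"]          # BM25 ranking
semantic_hits = ["chunk_b", "chunk_d", "chunk_a"]    # embedding ranking
context = reciprocal_rank_fusion([os_hits, semantic_hits])
```

chunk_b appears near the top of both lists, so it wins overall even though neither list ranks it first alone.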
This practical instance underscores the power and versatility of the HSRAG model. It exemplifies how the system can amalgamate conventional search algorithms with modern machine learning models to yield efficient, contextually relevant results.
It’s worth noting that databases are rapidly evolving to support HSRAG systems. PostgreSQL, a popular open-source relational database, now supports similarity search on vector data (pgvector). Additionally, vector databases like Qdrant offer metadata fields which can be utilized to implement hybrid search for RAG. This paves the way for increasingly efficient and sophisticated systems, offering content creators an even more dynamic tool for sourcing and creating content.
The working example provided in this blog showcases how documents from an Obsidian vault are parsed into chunks, with OpenSearch and semantic indices built on those chunks. It also demonstrates how the system retrieves and ranks documents by their relevance to a user's query. This implementation serves as a compelling use case for HSRAG in practical applications. For a more in-depth understanding, refer to the implementation as discussed in the provided link.
HSRAG has broad implications across numerous industries. For instance, it can be used to augment personal productivity tools. Imagine writing software that, given a topic, can draft initial content for you. This functionality can be a game-changer for content creators, journalists, or anyone who works with the written word.
In the corporate world, HSRAG can be used to navigate through large repositories of documents. It could be product requirements, technical design documents, internal wikis, or even code repositories. By retrieving contextually relevant information from these vast resources, HSRAG can significantly reduce the time spent searching for information.
It’s not just about internal databases either. The hybrid model can augment retrieval with web or internal search when necessary. This could be useful in fields where information gets updated rapidly, like tech news, medical research, or legal precedents. When documents and notes go stale, the system can look up the web or internal documents for more recent information.
Like any technology, HSRAG has its strengths and weaknesses. On the plus side, it’s a highly powerful tool for data discovery and content generation. It can sift through vast amounts of information and retrieve contextually relevant results. This makes it an excellent tool for anyone dealing with large repositories of documents or data.
On the downside, getting HSRAG up and running requires a good understanding of both traditional search algorithms and large language models. It also requires a careful balance between the two: how much to rely on BM25 and how much on LLMs. There's also the matter of computational resources. Running complex machine learning models on large datasets requires significant computational power, which can be a hurdle for smaller organizations.
In the age of information overload, Hybrid Search and Retrieval Augmented Generation offers a promising way to navigate the deluge. By combining traditional search algorithms with cutting-edge machine learning models, it promises to revolutionize how we search and use information. Whether you’re a content creator, a corporate employee, or just someone looking to make sense of the world, HSRAG could be an invaluable tool in your arsenal. However, it does require a certain level of expertise and resources, so it might not be for everyone. But for those who can harness it, the possibilities are truly exciting.