Knowledge Bot

One of the coolest things you can build with recent AI developments is a chatbot that has access to your own knowledge base. You query it in natural language, and it returns semantically meaningful answers with references to the source text. The knowledge base can be a book you've written, a corporate wiki such as Confluence or DokuWiki, or any collection of documents you have somewhere.

How can you do this?

The main flow is the following:

Pick a sentence-embedding model to convert the text of your knowledge base, sentence by sentence, into vectors that let a machine measure how alike two sentences are. We'll be using a pre-trained sentence transformer from Sentence-BERT (SBERT), the sentence-level variant of Bidirectional Encoder Representations from Transformers (BERT).
Create a service that periodically scans your knowledge base for changes, runs any new text through the selected sentence-embedding model, and stores the resulting embeddings in a machine-readable format.
Use a vector database to store and search the embeddings. We'll be using Redis, a very fast in-memory database that can search our embeddings quickly using various distance metrics.
Create a service that takes your question, creates an embedding for it, and searches the vector database for the top N sentences most similar to the question.
Create a service that takes those top N sentences as input, builds a special prompt from them, and passes it to a large language model, which writes the answer to the question based on the retrieved sentences. We'll be using Llama 2, recently released by Meta.
Create a UI that interacts with the previously described sentence encoder and large language model.
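
To make the flow above more concrete, here is a minimal sketch of the retrieval and prompt-building steps in plain Python. It makes several assumptions: the embed function is a toy bag-of-words stand-in for a real sentence-embedding model, an in-memory list stands in for the vector database, cosine similarity is just one possible metric, and all function names and the prompt wording are hypothetical. It never calls a large language model; it only produces the prompt text that would be sent to one.

```python
import math
import re
from collections import Counter


def embed(sentence):
    """Toy stand-in for a sentence-embedding model: word counts as a sparse vector."""
    return Counter(re.findall(r"[a-z0-9]+", sentence.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_index(sentences):
    """Stand-in for the vector database: a list of (sentence, embedding) pairs."""
    return [(sentence, embed(sentence)) for sentence in sentences]


def top_n(question, index, n=2):
    """Return the n indexed sentences most similar to the question."""
    query = embed(question)
    ranked = sorted(index, key=lambda item: cosine(query, item[1]), reverse=True)
    return [sentence for sentence, _ in ranked[:n]]


def build_prompt(question, context_sentences):
    """Assemble a prompt from the retrieved sentences (hypothetical wording)."""
    context = "\n".join("- " + s for s in context_sentences)
    return (
        "Answer the question using only the context below.\n"
        "Context:\n" + context + "\n"
        "Question: " + question + "\n"
        "Answer:"
    )


if __name__ == "__main__":
    knowledge_base = [
        "Redis keeps its data in memory.",
        "Llama 2 was released by Meta.",
        "Confluence is a corporate wiki.",
    ]
    index = build_index(knowledge_base)
    question = "Who released Llama 2?"
    print(build_prompt(question, top_n(question, index, n=1)))
```

In a real setup, embed would call the sentence transformer, build_index and top_n would be replaced by writes to and similarity queries against Redis, and the returned prompt would be handed to the large language model.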