Springing into AI - Part 10: RAG

    Welcome back! In Part 9 of the series we looked at tool-calling as a means of executing custom business logic and presenting the resulting business data to the LLM, giving it the context to respond to prompts aimed at a particular business use case. In this part we continue addressing the limitation of an LLM being pre-trained only up to a specific date, and see how we can adapt our solution further by presenting our own documents to it using Retrieval Augmented Generation (RAG), so that end users can prompt for information contained in those documents. Excited? Let's get into it.


Retrieval Augmented Generation (RAG) - Theory

    RAG can be summarized as a two-step process. In the first step, we load our content through a custom ETL process into a special kind of persistence store called a vector store, where each chunk of content is stored as an N-dimensional vector produced by a process called vector embedding. In the second step, a semantic similarity search is performed between these vectors and the user's chat prompt, so that the prompt can be augmented with the matching content before it is presented to the LLM. The figure below presents an overview of the landscape and the various components involved.


    So my dear friend, from above:

  • RAG - ETL : As mentioned above, this step revolves around pre-loading content, where you or the domain experts supply the relevant information in the form of documents. The ETL process is typically composed of a Reader, a Transformer and a Writer. Spring AI provides default implementations for each of these out of the box, as well as the flexibility to build our own custom ETL workflow.
    • Document Reader: This is the first step of the ETL process, where document(s) are read and passed on to the transformer for further processing.
    • Document Transformer: In the second step of the ETL, the transformer is responsible for performing operations such as splitting the document into smaller chunks, or modifying the content obtained depending on the use case.
    • Document Writer: This is the last step of the ETL, where the transformed documents, be they chunked or modified, are finally stored in the persistence store. For persistence solutions that support a vector store, vector embedding is the cornerstone of the entire process: it is here that text and images are stored as multi-dimensional vectors, which are later analyzed for semantic similarity against user prompts. During a semantic similarity search, the vectors are compared using one of several distance types such as Cosine, Euclidean or Negative Inner Product (a tiny sketch of cosine similarity follows this list). To appreciate the dimensionality involved: as humans our brains cope with at most 2D or 3D representations, while this semantic similarity search happens across 1536 (default) dimensions or more. How amazing is that.
            The figure below shows the various options available at each step. The ones we will be using in our playground are underlined for each component.


  • Chat Client : When a user prompts the bot, their prompt is compared against the embedded vectors as mentioned above to find semantically similar text. The result of the search, in combination with the user prompt, is then presented to the LLM, giving it enough context to produce a meaningful response. Spring AI offers this augmentation functionality through the QuestionAnswerAdvisor, which takes in our VectorStore and carries out the entire sequence of events discussed.
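
    To make the distance idea concrete, below is a toy Java sketch of a cosine similarity comparison between two embedding vectors. The class, method and values are purely illustrative (real embeddings have 1536 or more dimensions, and in our setup pgvector performs this comparison for us):

public final class CosineSimilarityDemo {

    // Cosine similarity: 1.0 means the vectors point in the same direction,
    // 0 means unrelated, -1 means opposite.
    static double cosine(float[] a, float[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        // Toy 3-dimensional vectors standing in for real embedding output
        float[] promptEmbedding = {0.12f, 0.80f, 0.35f};
        float[] chunkEmbedding  = {0.10f, 0.75f, 0.40f};
        System.out.println("similarity = " + cosine(promptEmbedding, chunkEmbedding));
    }
}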

Retrieval Augmented Generation (RAG) - Playground

    For our understanding, we will use a sample PDF (a bank statement with fake data) as the use case for our application, so that as end users we can prompt our awesome AI application for information about it, and fingers crossed it does everything we have discussed above. If not, we will use the force to sway the application to behave the way we want it to 😏. Before we dive into the code, let's get some admin out of the way with regards to our setup, and then we will do a code walkthrough.

  • Source Code: can be found here
  • Dependencies:
    • spring-ai-starter-vector-store-pgvector : Required for supporting pgvector store
    • spring-ai-advisors-vector-store : Required for using the QuestionAnswerAdvisor
    • spring-ai-tika-document-reader : Required for reading PDFs
  • Container: 
    • pgvector : Our chosen vector store implementation which is a Postgres extension
  • Embedding Model:
    • mxbai-embed-large : The embedding model, which can be pulled from Ollama. It is used under the hood by the PgVectorStore to embed our chunked documents into vectors.
  • API Endpoint:
    • http://localhost:8080/chat/rag : Test endpoint where we will prompt for information against the sample bank statement PDF.
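
    Pulling the setup above together, the application configuration could look something like the snippet below. Treat it as a hedged sketch: the property keys follow Spring AI 1.x conventions, while the datasource URL and credentials are placeholder assumptions rather than the series' actual values.

# Let Spring create the default pgvector schema on startup
spring.ai.vectorstore.pgvector.initialize-schema=true

# Datasource pointing at the pgvector container (placeholder credentials)
spring.datasource.url=jdbc:postgresql://localhost:5432/postgres
spring.datasource.username=postgres
spring.datasource.password=postgres

# Embedding model served by Ollama (pull it first with: ollama pull mxbai-embed-large)
spring.ai.ollama.embedding.options.model=mxbai-embed-large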

Code Walkthrough

    As mentioned, before we can use RAG we have to load our supporting documents via the ETL process. In our case this is a mocked bank statement that resides in src/main/resources/pdfs/bank.pdf. To ingest this document at startup we have a class called IngestionService that does the ETL for us. Let's look at that below:

 1  public class IngestionService implements CommandLineRunner {
 2
 3      @Value("classpath:/pdfs/bank.pdf")
 4      private Resource resource;
 5
 6      @Autowired
 7      private VectorStore vectorStore;
 8
 9      @Override
10      public void run(String... args) throws Exception {
11
12          log.info("Beginning to ETL custom document..");
13
14          // Read
15          final DocumentReader documentReader = new TikaDocumentReader(resource);
16          List<Document> documentList = documentReader.read();
17          log.info("Document read");
18
19          // Chunk
20          final DocumentTransformer documentTransformer = new TokenTextSplitter();
21          List<Document> chunkedDocuments = documentTransformer.apply(documentList);
22          log.info("Splitted document into {}", chunkedDocuments.size());
23
24          // Write
25          vectorStore.add(chunkedDocuments);
26          log.info("Saved documents...");
27      }
28  }

    In the above:
  • Line 1: We make use of Spring's CommandLineRunner, which runs custom code once the application has started. It is here that the run method on Line 10 is invoked, allowing us to perform the ETL on the supplied document.
  • Line 4: We supply the demo document as a Resource, which will be handed to the Document Reader.
  • Line 15-16: We make use of a TikaDocumentReader, which reads our supplied PDF document.
  • Line 20-21: Using the TokenTextSplitter as our DocumentTransformer, we split the supplied document into chunks, each represented as an individual Document.
  • Line 25: We use the Spring Boot auto-configured vector store from our dependencies, which in our case is the PgVectorStore. It acts as a DocumentWriter and stores the vectors in the pgvector datasource. pgvector is an extension to Postgres, and for our use case we run it as a Docker container (a possible command is sketched right after this list).
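
    As a point of reference, such a container can be started with something along the lines of the command below (the image tag, password and port mapping are illustrative assumptions, not necessarily the values used in the series):

docker run -d --name pgvector \
    -e POSTGRES_PASSWORD=postgres \
    -p 5432:5432 \
    pgvector/pgvector:pg16
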
    That's our RAG ETL done and dusted, how cool is that? Spring AI offers further extensibility and more options to configure custom RAG workflows should we desire. When our application starts it runs this ETL and stores the data in the pgvector persistence, which at the database level looks like the figure below. The default schema shown is pre-configured by Spring provided we supply the property spring.ai.vectorstore.pgvector.initialize-schema=true.
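
    For reference, the default table that the initialize-schema property creates looks roughly like the DDL below. This is a sketch of the typical layout only; the embedding dimension and index details depend on the embedding model and the Spring AI version in use.

CREATE EXTENSION IF NOT EXISTS vector;
CREATE EXTENSION IF NOT EXISTS "uuid-ossp";

CREATE TABLE IF NOT EXISTS vector_store (
    id        uuid DEFAULT uuid_generate_v4() PRIMARY KEY,
    content   text,           -- the chunked document text
    metadata  json,           -- source metadata such as file name and page
    embedding vector(1536)    -- dimension depends on the embedding model in use
);

-- An index on the embedding column speeds up the cosine similarity search
CREATE INDEX ON vector_store USING HNSW (embedding vector_cosine_ops);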

    pgvector Schema

    Sample ETL Run

    
    From the above we can see that, because the document was only a page or so in size, the default splitter produced just 1 chunk, represented by the single row. Take my word for it, if you have a sufficiently sized PDF it will be split into more chunks. The interesting thing to note is that if we look at the "embedding" column, we can see that our document has been vectorized, ready for semantic similarity search.
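
    If you want more control over how aggressively documents are split, the TokenTextSplitter can be configured rather than used with its defaults. A minimal sketch, assuming Spring AI 1.x (the chunk sizes below are illustrative, not values used in this series, and the builder method names may differ slightly between versions):

import org.springframework.ai.transformer.splitter.TokenTextSplitter;

// Smaller chunks so that a multi-page statement is split into more rows
TokenTextSplitter splitter = TokenTextSplitter.builder()
        .withChunkSize(400)            // target token count per chunk
        .withMinChunkSizeChars(200)    // avoid producing tiny trailing chunks
        .withKeepSeparator(true)       // keep line breaks inside a chunk
        .build();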

    From a chat client perspective, everything remains as per usual, the only difference being that we inject the Spring-configured VectorStore and register an additional QuestionAnswerAdvisor, passing it the injected vector store so that it can augment the prompt accordingly before presenting it to the LLM.
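
    To give an idea of what that wiring can look like, here is a minimal sketch of such a chat endpoint. This is not the series' actual controller: the class name, request parameter and mapping are illustrative, and the import paths follow Spring AI 1.x so they may differ slightly depending on your version.

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.client.advisor.vectorstore.QuestionAnswerAdvisor;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class RagChatController {

    private final ChatClient chatClient;

    // The auto-configured VectorStore is injected and handed to the QuestionAnswerAdvisor,
    // which retrieves similar chunks and augments the prompt before it reaches the LLM.
    public RagChatController(ChatClient.Builder builder, VectorStore vectorStore) {
        this.chatClient = builder
                .defaultAdvisors(new QuestionAnswerAdvisor(vectorStore))
                .build();
    }

    @GetMapping("/chat/rag")
    public String chat(@RequestParam("message") String message) {
        return chatClient.prompt()
                .user(message)
                .call()
                .content();
    }
}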

    Sample Run

    For our use case we have a PDF document that looks like this:


    Depending upon the model, the results may vary. Since we are using a very basic model, if we prompt it for information about our account balance, we get the following result:


    In the above, we asked the model for information about our balance as of 1 March. The response shows that the LLM was able to answer based on the document content that was augmented into the prompt via the process discussed above, letting us know that our balance was about 40,000 pounds 👦. I implore you to try out your own sample documents and play around with various prompts.


    How awesome was that? So exciting to see it all in action. I hope you have enjoyed the journey so far. In a future post we will further feed our hunger for knowledge by looking into the buzzword currently doing the rounds: MCP (Model Context Protocol). Stay tuned...

