Springing into AI - Part 6: Chat Client

     Welcome back, and I hope you're having an absolutely marvelous day. I am so excited to finally get you coding your first very basic chat client application in Java using Spring AI. Don't worry, I promise this is not as scary as some of the Conjuring or Annabelle movies. In Part 4 and Part 5 of the series we had a brief look at the core composition of a very important Java class, "ChatClient", offered by Spring AI, and at some of the options at our disposal for working with LLMs, be it locally on your own machine or through third-party providers. By the end of this post, we will connect it all together with the mission of having our own chat client that we can interact with, and we will use it as the foundation for future articles as we embark, like Bilbo Baggins, on this adventurous journey. Let's get started!

Architecture

    The architecture for our application, for this and future posts (unless advised otherwise when we deal with Model Context Protocol, RAG, etc.), flows as follows:


    As the figure above shows, the user (that is, us or a front-end application) uses the API endpoints we expose via our application. These APIs then internally make use of the ChatClient offered as part of Spring AI, configured to interact with a locally running LLM.

Development Setup

    For our adventurous journey, we need to make sure our travel bag is packed with the setup that will aid us in the AI world as we fight orcs, balrogs and cave trolls (well, not literally). The checklist of items for our trip includes the following:
  • LLM:
    • Rancher Desktop: I first tried using it with the OpenUI extension. This requires Ollama to be present on the machine; if it is absent, it offers to run Ollama as a Docker container. When I experimented with LLM interaction via our application using this setup, it produced very slow response times for every prompt.
    • Ollama: Since we will be playing a lot with LLMs, for better performance and faster responses I pivoted to installing Ollama directly and pulling the llama3.2:1b model from its library (ollama pull llama3.2:1b) for local experimentation. If we look at the model's offerings (parameters, context length, etc.), it is a good one to get our hands dirty with. Please feel free to try others.


Journey

    We start our journey at start.spring.io. Here we can configure the skeleton of our application: dependencies (libraries), build tool, Spring Boot version, etc. You are welcome to add as many dependencies as you want, and I encourage you to try out some of the other AI libraries offered there. For our journey, I configured it with the following:


    As our journey grows more adventurous, with different perils along the way, we will add more dependencies where applicable. If you look at the dependency section, the descriptions of the libraries we are using are pretty self-explanatory about what they offer. Clicking "Generate" lets us download the project as a zip file, which we can extract and open in IntelliJ.

  • Configuration: Spring loads its application configuration from either a properties or a YAML file. Depending on the flavor of your choice, pick the one you feel comfortable with; I am going with YAML. The key settings are described below, and a sketch of the full file follows the list:

    • logging advisor: This enables viewing of the logs in the console when Spring AI interacts with the model. Note that we also have to explicitly enable logging in the code, by means of an advisor, in order to see it.
    • base-url: Ollama installed locally on our machine runs on port 11434. The URI specified here lets Spring AI internally route our prompts to the underlying LLM. If you are using a third-party provider directly, this may not be required; instead you would configure the provided API key or the provider-specific settings.
    • model: This specifies the LLM we will be using. Feel free to download and play with other models on offer.
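
    Since the screenshot of the configuration is not reproduced here, below is a minimal sketch of what such an application.yml could look like, assuming the standard Spring AI Ollama starter property names and the advisor's logger package (the application name is just a placeholder):

```yaml
spring:
  application:
    name: springing-into-ai            # hypothetical application name
  ai:
    ollama:
      base-url: http://localhost:11434 # locally running Ollama instance
      chat:
        options:
          model: llama3.2:1b           # model pulled from the Ollama library

logging:
  level:
    # lets the logging advisor's request/response logs show up in the console
    org.springframework.ai.chat.client.advisor: DEBUG
```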

Play 1: Generic Chat Response

To try out our basic interaction with the LLM, we will create an application endpoint that we can invoke using Postman and view the responses from it. The code fragment below shows how we can create our ChatClient.
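
Since the original code screenshot is not reproduced here, here is a minimal sketch of what such a controller could look like, assuming the ChatClient.Builder auto-configured by the Spring AI starter and a hypothetical ChatController class; the endpoint path matches the one described below.

```java
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.client.advisor.SimpleLoggerAdvisor;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping("/chat")
public class ChatController {

    private final ChatClient chatClient;

    // The Spring AI starter auto-configures a ChatClient.Builder for the
    // model configured in application.yml (Ollama in our case); we attach
    // the logging advisor here so request/response logs show up.
    public ChatController(ChatClient.Builder builder) {
        this.chatClient = builder
                .defaultAdvisors(new SimpleLoggerAdvisor())
                .build();
    }

    // POST http://localhost:8080/chat/generic
    // Takes the raw prompt from the request body and returns the LLM's text response.
    @PostMapping("/generic")
    public String genericChat(@RequestBody String prompt) {
        return chatClient.prompt()
                .user(prompt)
                .call()      // blocking call to the underlying LLM
                .content();  // just the response text
    }
}
```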


  • Chat Client: As previously discussed in Part 5 of the series, ChatClient forms the heart of the entire operation. It is this client that empowers us to communicate with the underlying LLM for a wide variety of tasks.
  • SimpleLoggerAdvisor: Spring AI offers us the flexibility of advisors, which act as a way to inject custom code around the request before it is sent and the response after it is received. The class used here is one that Spring AI offers out of the box. As we continue on our journey, we will play with different advisors.

  • http://localhost:8080/chat/generic: This is the first endpoint we create, and we can POST our custom prompts to it. The code above simply takes our prompt and instructs Spring AI to send the request to the LLM using call( ). The content( ) function then returns the actual content obtained from the LLM.

  • Sample Run: Great, we have everything we need. How simple is that? So exciting... I know your fingers are itching to just start the application, do some dry runs and see the AI app we built shine. Below is a sample from some of the runs of our basic application. In subsequent posts, we will build on this to make it more powerful. Feel free to try your own prompts. 😊


Play 2: Token Metadata

The previous play showed a basic response from the LLM that we then presented as part of our API response. Let's try another play where we can see the token metadata, i.e. some information about the tokens we consumed as part of our invocation. Tokens are basically the currency when working with LLMs. Since we are playing with a locally running LLM we do not incur any cost, but if we were using a third-party provider, there would be costs associated with it.
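
As with the first play, the original code screenshot is not reproduced here; a minimal sketch of the additional endpoint, added to the same hypothetical ChatController from Play 1 (ChatResponse comes from org.springframework.ai.chat.model), could look like this:

```java
// POST http://localhost:8080/chat/tokens
// Returns the full ChatResponse, which carries the generated text
// plus metadata such as the token usage for this invocation.
@PostMapping("/tokens")
public ChatResponse tokenChat(@RequestBody String prompt) {
    return chatClient.prompt()
            .user(prompt)
            .call()
            .chatResponse(); // content + metadata instead of just the text
}
```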


  • http://localhost:8080/chat/tokens: This is the second endpoint we expose via our awesome application. The only difference from the first is the content of the response we return: the first returned the direct response text, while invoking "chatResponse( )" returns the content as well as metadata describing the token usage.
  • Sample Run: 

    If we look at the snippet of the response above, we are presented with more information about the token statistics, such as promptTokens and completionTokens, from our invocation. This is useful because it indicates the number of tokens we have utilized; if we were not running the LLM locally and were instead using a third-party provider, we would incur costs for that usage based on the pricing model each vendor employs.
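
        If you ever want to pull just those numbers out programmatically rather than returning the whole response, something along these lines should work in recent Spring AI versions. Treat the exact getter names as an assumption tied to your version; older milestones, for instance, exposed the completion count via a differently named accessor.

```java
import org.springframework.ai.chat.metadata.Usage;
import org.springframework.ai.chat.model.ChatResponse;

ChatResponse response = chatClient.prompt()
        .user("What is the capital of France?")
        .call()
        .chatResponse();

// Token statistics reported by the model for this single invocation.
Usage usage = response.getMetadata().getUsage();
System.out.println("prompt tokens:     " + usage.getPromptTokens());
System.out.println("completion tokens: " + usage.getCompletionTokens());
System.out.println("total tokens:      " + usage.getTotalTokens());
```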

        In the next post, we will look at observability, which is a vital component in the landscape for gaining insight into our application and, specifically, for answering the question of how many tokens are being used, since they form the currency when working with GenAI applications. Stay tuned....
