Springing into AI - Part 6: Chat Client
Problem
We have chosen the LLM we want to use, and we even have the programming language and framework we want to use to create our awesome Gen AI application. How do we interact with the LLM through the Spring AI framework?
Solution
Spring AI offers us a powerful class called ChatClient. This is the core class we, as developers, use to empower our applications to interact with the configured LLM. The heavy lifting of the underlying operations is all done by Spring AI. From an architecture perspective, at a high level, the fundamental design of our GenAI applications looks like the figure below.
From the figure above, the user (that is, us or a front-end application) would use the API endpoints we expose via our application. These APIs would then internally make use of the ChatClient offered as part of Spring AI, configured to interact with a locally running LLM.
Machine Setup
For our adventurous journey, we need to ensure our travel bag is packed with the setup that will aid us in the AI world as we fight orcs, balrogs and cave trolls (well, not literally). The checklist of items for our trip includes the following:
- LLM: The choice here may vary based upon your needs, but for experimentation and playing around, I am going with an LLM running locally on my machine using an Ollama setup and the llama3.2:1b model from its library.
- Java: Amazon Corretto 17
- Spring Boot: 3.5.4
- Spring AI: 1.0.1
- Build Tool: Maven
- Development IDE: IntelliJ Community Edition
- Postman: This tool will allow us to test our API endpoints as we interact with the application
- OS: Windows (we work with what we have)
- Source Code: Article Source Code
NOTE: The versions of the libraries and languages mentioned above may change as newer versions are released. Where applicable, the source code will be updated accordingly.
Journey
We start our journey at start.spring.io, a useful online utility where we can derive the skeleton of our project with the relevant dependencies and fast-track our development efforts.
As our journey grows more adventurous with different perils, we will add more dependencies where applicable. If you look at the dependency section, the descriptions of the libraries we are using are pretty self-explanatory about what they offer. Clicking "Generate" downloads a zip file, which we can extract and open in IntelliJ as a project.
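For reference, the two key dependencies in the generated pom.xml look roughly like the sketch below (artifact names assume Spring AI 1.0.x; verify against the pom that start.spring.io actually generates for you):

```xml
<!-- Web starter for exposing our REST endpoints -->
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-web</artifactId>
</dependency>
<!-- Spring AI starter that auto-configures the Ollama chat model -->
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-model-ollama</artifactId>
</dependency>
```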
- Configuration: Spring loads its application configuration from either a properties or a YAML file. Depending upon the flavor of your choice, pick the one you feel comfortable with; I am going with YAML. Our configuration looks like below:
- logging advisor: This enables viewing logs in the console when Spring AI interacts with the model. Note that we also have to explicitly enable this logging in our code by means of an advisor.
- base-url: Ollama installed locally on our machine runs on port 11434. The URI specified here is what Spring AI uses internally to send our prompts to the underlying LLM. If you are using a third-party provider directly, this may not be required; instead, you would configure the provided API key or that provider's specific configuration.
- model: This specifies the LLM we would be using. Feel free to download and play with other models offered.
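Putting the items above together, a sketch of the application.yml could look like the following (property names follow the Spring AI Ollama starter; adjust values to your own setup):

```yaml
spring:
  ai:
    ollama:
      base-url: http://localhost:11434   # local Ollama server
      chat:
        options:
          model: llama3.2:1b             # model pulled from the Ollama library
logging:
  level:
    # surfaces the logging advisor's DEBUG output in the console
    org.springframework.ai.chat.client.advisor: DEBUG
```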
Play 1: Generic Chat Response
As mentioned, ChatClient forms the heart of the operation, allowing us to communicate with the LLM. The code below shows a bare-metal configuration of ChatClient along with a logging advisor so we can track our usage and monitor the requests and responses made to and from the LLM.
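A minimal sketch of that setup, assuming the ChatClient.Builder auto-configured by the Ollama starter (the class and bean names here are illustrative; SimpleLoggerAdvisor ships with Spring AI):

```java
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.client.advisor.SimpleLoggerAdvisor;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class ChatClientConfig {

    // Spring AI auto-configures a ChatClient.Builder bound to the
    // Ollama chat model declared in our application.yml
    @Bean
    ChatClient chatClient(ChatClient.Builder builder) {
        return builder
                // logs each request/response at DEBUG level
                .defaultAdvisors(new SimpleLoggerAdvisor())
                .build();
    }
}
```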
We define a test endpoint that we will invoke from an HTTP client such as Postman (you can even use curl should you wish). In this endpoint, we let the user send a prompt to the LLM and have our application respond with the answer.
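A hedged sketch of such a controller (the class name and the `message` request parameter are my own choices; the path matches the endpoint discussed next):

```java
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping("/chat")
public class ChatController {

    private final ChatClient chatClient;

    public ChatController(ChatClient chatClient) {
        this.chatClient = chatClient;
    }

    // e.g. GET http://localhost:8080/chat/generic?message=Tell%20me%20a%20joke
    @GetMapping("/generic")
    public String generic(@RequestParam String message) {
        return chatClient.prompt()
                .user(message)   // the user's prompt
                .call()          // synchronous call to the LLM
                .content();      // the plain-text response content
    }
}
```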
- http://localhost:8080/chat/generic: The code above simply takes our prompt and instructs Spring AI to invoke the LLM using call( ). The content( ) function then returns the actual content obtained from the LLM. NOTE: There are many options we can provide on the chat client based on our application's needs, be it various levels of prompting (system, resources, etc.), prompt caching, chat options, and so on. For simplicity, basic method chaining is shown above.
- Sample Run: Great, we have everything we need; how simple is that? When we invoke the above endpoint, a sample response obtained from the LLM is illustrated below. Feel free to try your own prompts. 😊
Play 2: Token Metadata
We have learned the fundamental that the token is the social currency when developing a GenAI application. Surely there should be a way to see how many tokens were consumed per request? Since we are playing with a locally running LLM, we do not incur any cost, but this is important when using an external provider for our GenAI application.
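A sketch of what this second endpoint might look like, added to the same controller (accessor names follow the Spring AI 1.0 ChatResponse/Usage API; the formatting of the reply is illustrative):

```java
import org.springframework.ai.chat.metadata.Usage;
import org.springframework.ai.chat.model.ChatResponse;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestParam;

// inside the same @RestController mapped to /chat
@GetMapping("/tokens")
public String tokens(@RequestParam String message) {
    // chatResponse() returns the full response, not just the text
    ChatResponse response = chatClient.prompt()
            .user(message)
            .call()
            .chatResponse();

    // token statistics live in the response metadata
    Usage usage = response.getMetadata().getUsage();
    return "%s%n[promptTokens=%d, completionTokens=%d, totalTokens=%d]".formatted(
            response.getResult().getOutput().getText(),
            usage.getPromptTokens(),
            usage.getCompletionTokens(),
            usage.getTotalTokens());
}
```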
- http://localhost:8080/chat/tokens: This is the second endpoint we expose via our awesome application. The difference between this and the first is merely the content of the response we return. The first returned the direct response text, while invoking chatResponse( ) returns the content as well as metadata detailing the token usage.
- Sample Run:
If we look at the snippet of the sample response above, we are presented with more information about token statistics, such as promptTokens and completionTokens, from our invocation. This is useful as it indicates the number of tokens we have utilized; if we were not running the LLM locally and instead used a third-party provider, we would incur costs for that usage based on the varying pricing model each vendor employs.