Springing into AI - Part 6: Chat Client
Problem
We are building a GenAI application and want to enrich the user experience by letting users query the chosen LLM for relevant information. Beyond the basics, we also want a mechanism to tune the responses the LLM generates, the ability to switch between different LLMs and providers, and a pattern for tapping into the requests and responses that travel to and from the LLM.
Solution
Spring AI offers a powerful abstraction called ChatClient. This is the core API that we, as developers, use to let our applications interact with the configured LLM. The heavy lifting of the underlying operations is all done by Spring AI. From an architecture perspective, at a high level, the fundamental design of our GenAI applications looks like the figure below.
As the figure above shows, the user (that is, us or a front-end application) calls the API endpoints that our application exposes. These APIs then internally use the ChatClient provided by Spring AI, which is configured to interact with the chosen underlying LLM.
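To get a feel for the heavy lifting that ChatClient hides, here is a rough sketch of preparing a chat request for a locally running Ollama server by hand, using only the JDK's HttpClient. It assumes Ollama's /api/chat endpoint on localhost:11434 and the llama3.2:1b model. The class name RawOllamaCallSketch and its helper methods are made up for illustration, the JSON escaping is deliberately minimal, and this is not a description of how Spring AI works internally.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RawOllamaCallSketch {

    // Assumed local Ollama endpoint and model; adjust to your own setup.
    static final String CHAT_URL = "http://localhost:11434/api/chat";
    static final String MODEL = "llama3.2:1b";

    /** Minimal escaping for a JSON string literal (backslash, quote, newline only). */
    static String jsonEscape(String text) {
        return text.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n");
    }

    /** Builds the JSON body for a single-turn, non-streaming chat request. */
    static String chatBody(String userText) {
        return "{\"model\":\"" + MODEL + "\","
                + "\"messages\":[{\"role\":\"user\",\"content\":\"" + jsonEscape(userText) + "\"}],"
                + "\"stream\":false}";
    }

    /** Builds the HTTP POST request without sending it. */
    static HttpRequest chatRequest(String userText) {
        return HttpRequest.newBuilder(URI.create(CHAT_URL))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(chatBody(userText)))
                .build();
    }

    /** Sends the request and returns the raw JSON reply; needs a running Ollama server. */
    static String send(String userText) throws IOException, InterruptedException {
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(chatRequest(userText), HttpResponse.BodyHandlers.ofString());
        return response.body();
    }

    public static void main(String[] args) {
        // Only prints the request body, so this runs without a server.
        System.out.println(chatBody("Tell me a joke"));
    }
}
```

Everything beyond this point (parsing the reply, switching providers, adding system messages, logging) is what we would otherwise have to write ourselves, and it is the part ChatClient takes off our hands.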
Setup
- LLM: llama3.2:1b running locally (port: 11434) using Ollama setup
- Java: Amazon Corretto 17
- Spring Boot: 3.5.4
- Spring AI: 1.0.1
- Testing: Postman
- Source Code: here
Demo Screenshots
- Our GenAI application is only capable of telling jokes. We have done this on purpose to learn how to put guardrails in place, so that in a real-world application the LLM only responds within the boundary of its domain and not beyond it.
- Positive Scenario: The image below presents a sample response when we ask for a joke.
- Negative Scenario/Guardrails: The image below presents a sample response when we ask our application to do something it should not entertain at all. This is a vital part of responsible AI in production systems: it restricts the scope of the LLM's operation to the boundary of our contextual domain.
- Property Configuration: In its basic form, Spring AI provides a powerful and flexible way of configuring the underlying LLM we wish to use. This can be an LLM we run locally or one we use through a third-party provider. Spring AI has a separate dependency for each third-party provider it supports (e.g. OpenAI, AWS Bedrock), and the list continues to grow. Under the hood it does the heavy lifting for us, while making our lives simpler through configuration.
- Line 1 - 3: We enable logging so that we can inspect the requests and responses exchanged between the application and the LLM. This is vital because it helps us debug.
- Line 5 - 10: In this playground we run an LLM locally on our machine, so we configure the Ollama base URI and the chat model. Note that this configuration is vendor dependent and requires the vendor's dependency to be present in the build tool. For example, AWS Bedrock may require AWS credentials; similarly, OpenAI may require an API key.
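A configuration along these lines could look roughly like the application.properties sketch below. It is not the exact file from the source code, so its line numbers will not match the ones referenced above. The keys are the ones Spring AI documents for the Ollama starter and for advisor logging, and the values are assumed from our setup.

```properties
# Log the requests and responses that pass through the ChatClient advisors
logging.level.org.springframework.ai.chat.client.advisor=DEBUG

# Local Ollama server and the chat model to use
spring.ai.ollama.base-url=http://localhost:11434
spring.ai.ollama.chat.options.model=llama3.2:1b
```

Switching to another provider would mean swapping the starter dependency and replacing the spring.ai.ollama.* keys with that provider's own keys.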
- Chat Client Configuration: It is good practice to keep the system flexible from a configuration point of view, rather than piling all configuration into one method. This promotes loose coupling and makes the code easier to manage and maintain.
- Line 16: Configures a basic instance of ChatClient from the builder. We configure the other aspects of the chat client in their own configuration, adhering to loose coupling and ease of maintenance.
- Line 21 - 25: Certain inference parameters (Temperature, TopK, TopP, Penalty, MaxTokens, etc.) help tune the LLM's output. Chat Options is a useful mechanism for configuring them. Note that the above is shown only for visibility; default values are used for each inference parameter.
- Line 29 - 33: Advisors are a very powerful feature of the chat client. They give us the freedom and flexibility to tap into the request before it goes to the LLM, and into the response after it has been obtained from the LLM. Since we want to know what is happening with the LLM, we use a handy utility, SimpleLoggerAdvisor, which logs the request and response and helps us gain some metrics on our token usage.
- Line 37 - 46: Responsible AI is a crucial element when developing a GenAI application. In this particular case we demonstrate guardrails, where we instruct the LLM on its boundary of operation: what it is and is not allowed to do. In production, this helps prevent malicious users from using prompts to gain information they shouldn't.
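To make the advisor and guardrail ideas above more tangible, here is a small plain-Java sketch of the pattern: each advisor wraps the call to the model, so it can look at (or change) the request on the way in and look at the response on the way out. This is only an illustration of the idea and does not use Spring AI's actual Advisor API; the names AdvisorChainSketch, LoggingAdvisor, GuardrailAdvisor and the fake model are all made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public class AdvisorChainSketch {

    /** A request to the model: the system text (our guardrail) plus the user's text. */
    record Request(String system, String user) {}

    /** The model's reply. */
    record Response(String content) {}

    /** Around-style hook: sees the request, calls the rest of the chain, then sees the response. */
    interface Advisor {
        Response advise(Request request, Function<Request, Response> next);
    }

    /** Records what goes towards the model and what comes back. */
    static class LoggingAdvisor implements Advisor {
        final List<String> log = new ArrayList<>();

        @Override
        public Response advise(Request request, Function<Request, Response> next) {
            log.add("request: " + request.user());
            Response response = next.apply(request);
            log.add("response: " + response.content());
            return response;
        }
    }

    /** Sets the guardrail system text on every request before it reaches the model. */
    static class GuardrailAdvisor implements Advisor {
        final String systemText;

        GuardrailAdvisor(String systemText) {
            this.systemText = systemText;
        }

        @Override
        public Response advise(Request request, Function<Request, Response> next) {
            return next.apply(new Request(systemText, request.user()));
        }
    }

    /** Wraps the model call in the advisors, with the first advisor outermost. */
    static Function<Request, Response> chain(List<Advisor> advisors, Function<Request, Response> model) {
        Function<Request, Response> call = model;
        for (int i = advisors.size() - 1; i >= 0; i--) {
            Advisor advisor = advisors.get(i);
            Function<Request, Response> next = call;
            call = request -> advisor.advise(request, next);
        }
        return call;
    }

    public static void main(String[] args) {
        // Stand-in for the LLM: echoes what it received so the flow is visible.
        Function<Request, Response> fakeModel =
                request -> new Response("[" + request.system() + "] " + request.user());

        LoggingAdvisor logger = new LoggingAdvisor();
        Function<Request, Response> client = chain(
                List.of(logger, new GuardrailAdvisor("Only tell jokes.")), fakeModel);

        System.out.println(client.apply(new Request("", "Tell me a joke")).content());
        logger.log.forEach(System.out::println);
    }
}
```

Keeping logging and guardrails as separate advisors mirrors the loose coupling we aim for in the configuration: each concern can be added, removed or reordered without touching the others.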
- Chat Client: We created a basic dummy endpoint to which the user can send a request; we process that request and use ChatClient to prompt the LLM for a response. The call() shown below propagates the request, along with our SystemMessage (used as guardrails), to the LLM. The response obtained in the form of ChatClientResponseSpec contains the LLM's reply along with additional metadata such as promptTokens, completionTokens and totalTokens. This information is essential because tokens are the currency of GenAI applications, and they can cost you a lot if not monitored carefully.
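Since tokens are the currency, it can help to turn the token counts from the response metadata into a rough cost figure. The sketch below is plain Java with made-up names (TokenUsageSketch, Usage, estimateCostUsd), and the prices are hypothetical inputs, not real provider rates. A locally running model costs nothing per token, so this mostly matters once we switch to a hosted provider.

```java
public class TokenUsageSketch {

    /** Token counts for one exchange, mirroring the metadata fields mentioned above. */
    record Usage(int promptTokens, int completionTokens) {
        int totalTokens() {
            return promptTokens + completionTokens;
        }
    }

    /**
     * Estimates the cost of one exchange in US dollars.
     * Prices are per one million tokens and are hypothetical values supplied by the caller.
     */
    static double estimateCostUsd(Usage usage, double inputPricePerMillion, double outputPricePerMillion) {
        return usage.promptTokens() / 1_000_000.0 * inputPricePerMillion
                + usage.completionTokens() / 1_000_000.0 * outputPricePerMillion;
    }

    public static void main(String[] args) {
        Usage usage = new Usage(120, 30); // made-up counts for illustration
        System.out.println("total tokens: " + usage.totalTokens());
        System.out.printf("estimated cost: $%.6f%n", estimateCostUsd(usage, 1.00, 2.00));
    }
}
```

Prompt and completion tokens are priced separately here because many hosted providers charge different rates for input and output.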
