Springing into AI - Part 6: Chat Client
Welcome back, and I hope you're having an absolutely marvelous day. I am so excited to finally get you coding your first very basic chat client application in Java using Spring AI. Don't worry, I promise this is not as scary as some of the Conjuring or Annabelle movies. In Part 4 and Part 5 of the series we took a brief look at the core composition of a very important Java class, "ChatClient", offered by Spring AI, and at some of the options at our disposal for working with LLMs, be it locally on your own machine or through third-party providers. By the end of this one, we will have connected it all together into our own chat client that we can interact with, and we will use it as the foundation for future articles as we embark, like Bilbo Baggins, on this adventurous journey. Let's get started.
Architecture
Development Setup
- LLM:
- Rancher Desktop: I tried using it with the OpenUI extension. This requires Ollama to be present on the machine; if it is absent, it offers to run it as a Docker container. When I experimented with LLM interaction via our application using this setup, it produced very slow response times for every prompt.
- Ollama: Since we will be playing a lot with the LLM, for better performance and faster responses I pivoted to installing Ollama and then pulling the llama3.2:1b model from its library for local experimentation. Looking at what the model offers (parameters, context length, etc.), it is a good one to get our hands dirty with. Please feel free to try others.
- Java: Amazon Corretto 17
- Spring: 3.5.4
- Spring AI: 1.0.1
- Build Tool: Maven
- Development IDE: IntelliJ Community Edition
- Postman: This tool will allow us to test our API endpoints as we interact with the application
- OS: Windows (we work with what we have)
- Source Code: Article Source Code
Journey
- Configuration: Spring loads its application configuration from either a properties or a YAML file; pick whichever flavor you feel comfortable with. I am going with YAML. Our configuration covers the following (a sketch of the full file follows this list):
- logging advisor: This enables viewing of the logs in the console when Spring AI interacts with the model. Note that, on top of this logging level, we also have to explicitly register a logging advisor in code, which we will do shortly.
- base-url: Ollama installed locally on our machine runs on port 11434. The URI specified here tells Spring AI where to send our prompts to the underlying LLM. If you are using a third-party provider directly, this may not be required; instead you would configure the provider's API key or other provider-specific settings.
- model: This specifies the LLM we will be using. Feel free to download and play with other models on offer.
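For reference, here is a minimal sketch of what such a YAML configuration might look like when running against a locally installed Ollama, assuming the Spring AI Ollama starter and its standard property names. The application name and logging package shown here are illustrative; adjust them to match your own project and the article's source code.

```yaml
spring:
  application:
    name: springing-into-ai            # illustrative name, use your own
  ai:
    ollama:
      base-url: http://localhost:11434 # where the local Ollama instance listens
      chat:
        options:
          model: llama3.2:1b           # the model pulled from the Ollama library

logging:
  level:
    # surfaces the request/response logs produced by the logging advisor
    org.springframework.ai.chat.client.advisor: DEBUG
```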
Play 1: Generic Chat Response
- Chat Client: As previously discussed in Part 5 of the series, "ChatClient" forms the heart of the entire operation. It is through this client that we communicate with the underlying LLM for a wide variety of tasks.
- SimpleLoggerAdvisor: Spring AI offers us the flexibility of advisors, which give us a way to inject custom code before a request is made and after the response is obtained. The class used here is one that Spring AI offers out of the box. As we continue on our journey, we will play with different advisors.
- http://localhost:8080/chat/generic: This is the first endpoint we create, and we can POST our custom prompts to it. The code simply takes our prompt and instructs Spring AI to send our request to the LLM using call( ). The content( ) function then returns the actual content obtained from the LLM. A sketch of what such a controller could look like follows this list.
- Sample Run: Great, we have everything we need, how simple is that? So exciting... I know your fingers are itching to just start the application, do some dry runs, and see the AI app we built shine. Below is a sample from some of the runs of our basic application. In subsequent posts, we will build on this to make it more powerful. Feel free to try your own prompts. 😊
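To tie the pieces together, here is a minimal sketch of what such a controller could look like. The class, method, and variable names are illustrative and may differ from the article's source code; the ChatClient.Builder injection, the SimpleLoggerAdvisor registration, and the prompt/call/content chain follow the Spring AI fluent API described above.

```java
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.client.advisor.SimpleLoggerAdvisor;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping("/chat")
public class ChatController {

    private final ChatClient chatClient;

    // Spring Boot auto-configures a ChatClient.Builder for the configured model (Ollama here);
    // we attach the out-of-the-box SimpleLoggerAdvisor so request/response details get logged.
    public ChatController(ChatClient.Builder chatClientBuilder) {
        this.chatClient = chatClientBuilder
                .defaultAdvisors(new SimpleLoggerAdvisor())
                .build();
    }

    // POST a plain-text prompt to http://localhost:8080/chat/generic
    // and return the model's response text.
    @PostMapping("/generic")
    public String genericChat(@RequestBody String prompt) {
        return chatClient.prompt()
                .user(prompt)
                .call()
                .content();
    }
}
```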
Play 2: Token Metadata
- http://localhost:8080/chat/tokens: This is the second endpoint that we expose via our awesome application. The difference between this and the first is merely the content of the response we return: the first returned the direct response text, while invoking "chatResponse( )" returns the content as well as metadata reporting the token usage to us.
- Sample Run:
If we look at the snippet of the response above, we are presented with some more information about the token statistics from our invocation, such as promptTokens and completionTokens. This is useful because it indicates the number of tokens we have utilized; if we were not running the LLM locally and were instead using a third-party provider, we would incur costs for that usage based on the pricing model each vendor employs.
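As a sketch of how this second endpoint could be wired up, the snippet below returns the full ChatResponse and also reads the usage metadata programmatically. In the actual application this would most likely be just another method in the same controller as /chat/generic; it is shown as a separate, illustratively named class here only so the snippet is self-contained.

```java
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.metadata.Usage;
import org.springframework.ai.chat.model.ChatResponse;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping("/chat")
public class TokenChatController {

    private final ChatClient chatClient;

    public TokenChatController(ChatClient.Builder chatClientBuilder) {
        this.chatClient = chatClientBuilder.build();
    }

    // POST a plain-text prompt to http://localhost:8080/chat/tokens.
    @PostMapping("/tokens")
    public ChatResponse tokenChat(@RequestBody String prompt) {
        // chatResponse() returns the full ChatResponse: the generated content
        // plus metadata such as token usage.
        ChatResponse response = chatClient.prompt()
                .user(prompt)
                .call()
                .chatResponse();

        // The usage statistics can also be inspected in code, e.g. for logging or cost tracking.
        Usage usage = response.getMetadata().getUsage();
        System.out.printf("prompt tokens: %d, completion tokens: %d, total: %d%n",
                usage.getPromptTokens(), usage.getCompletionTokens(), usage.getTotalTokens());

        return response;
    }
}
```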