Springing into AI - Part 7: Observability
Welcome back, and I hope you are having a wonderful day. In Part 6 of the series, we looked at our first exciting chat application, where we finally got our hands dirty and interacted with the LLM running on our local machine. Before we get our hands dirtier with more topics, let us pause to understand an important concept: "Observability".
Backend applications like the one we created previously require love from us engineers. This isn't the kind of love Leo and Kate shared in Titanic, but a monitoring love: fine-grained metrics that tell us the state of our running application and give us a sense of its behavior. A typical example: in a Java application we may want to know the amount of memory used by the JVM (Java Virtual Machine), the number of threads in use, information about garbage collection, and so on. In GenAI applications, tokens are our social currency, and their usage can ramp up quickly, leaving us with a fat bill to pay if we are not careful. Luckily, we can do something to avoid that situation. Someone please picture Darth Vader going, "If only you knew the power of Observability".
Observability - Theory
- Spring Actuator: Spring Boot provides actuator endpoints that give us key metrics about the state of our backend application, e.g. health and info. In our application we also enable the "prometheus" endpoint. This is of vital importance to the whole system, as the "Prometheus" tool uses the exposed endpoint to collect various metrics for further use. Spring AI offers a variety of such metrics from an AI perspective, such as model parameters, token usage, etc. More about the various exposed metrics can be found here. The image below shows the metrics our application reported after we invoked a few prompts.
- Prometheus: As mentioned above, this tool scrapes the metrics exposed on a particular endpoint. For our AI application, we definitely want to know the number of tokens used (input, output, total) so that we can take action (limiting the context window through a model parameter, for example) before the cost grows beyond what our pocket can afford. The image below is from the Prometheus user interface, where, for illustration, we have searched for one particular metric among the many available.
- Grafana: Okay, so the Spring actuator endpoints expose the metrics to the outside world, and Prometheus scrapes them, stores them and lets us query them. Wouldn't it be nice if we could visualize these metrics in a dashboard and gain insight through different display widgets? Enter "Grafana". The image below shows a sample dashboard for our application, where we can already see some key information. As we use our application more and more, the dashboard updates in near real time, depending on the scraping interval.
Special mention here to Dan Vega, who is a Spring developer advocate. His tutorials, videos and articles have helped me learn throughout this journey. The dashboard you see above is sourced from his work, where he created a typical Prometheus-Grafana dashboard setup for us to use. For a more in-depth tutorial, you can view his YouTube video. In the dashboard above, we can see the number of invocations we have made through our endpoint to prompt the LLM, the response times and the token usage, among other things, which give us insight into how our application is being used.
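To make the token metrics a little more tangible, here is a minimal sketch in plain Java that reads Prometheus text-format output (the kind of text the actuator endpoint returns) and sums token usage per token type. The metric name `gen_ai_client_token_usage_total` and the label `gen_ai_token_type` are assumptions about how the Spring AI metrics appear once exported, and the sample values are made up for illustration, so please check them against what your own application actually exposes.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TokenUsageParser {

    // Assumed metric and label names; verify against your own /actuator/prometheus output.
    static final String METRIC = "gen_ai_client_token_usage_total";
    static final Pattern TYPE_LABEL = Pattern.compile("gen_ai_token_type=\"([^\"]+)\"");

    /** Sums the samples of METRIC per token type from Prometheus text-format output. */
    static Map<String, Double> sumByTokenType(String scrape) {
        Map<String, Double> totals = new LinkedHashMap<>();
        for (String rawLine : scrape.split("\n")) {
            String line = rawLine.trim();
            // Skip blank lines, "# HELP" / "# TYPE" comments and unrelated metrics.
            if (line.isEmpty() || line.startsWith("#") || !line.startsWith(METRIC + "{")) {
                continue;
            }
            Matcher matcher = TYPE_LABEL.matcher(line);
            if (!matcher.find()) {
                continue;
            }
            // The sample value is the first token after the closing brace of the label set.
            String afterLabels = line.substring(line.lastIndexOf('}') + 1).trim();
            double value = Double.parseDouble(afterLabels.split("\\s+")[0]);
            totals.merge(matcher.group(1), value, Double::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        // Made-up sample data, not real output from the application.
        String sample = String.join("\n",
                "# TYPE gen_ai_client_token_usage_total counter",
                "gen_ai_client_token_usage_total{gen_ai_token_type=\"input\"} 120.0",
                "gen_ai_client_token_usage_total{gen_ai_token_type=\"output\"} 300.0",
                "gen_ai_client_token_usage_total{gen_ai_token_type=\"total\"} 420.0",
                "jvm_threads_live_threads 12.0");
        System.out.println(sumByTokenType(sample));
    }
}
```

In practice Prometheus and Grafana do this aggregation for us; the sketch is only meant to show that the scraped data is plain text we can reason about.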
Observability - Playground
- Source Code: Can be accessed here
- Dependencies:
  - spring-boot-starter-actuator: Enables the actuator endpoints that "Prometheus" can scrape. Through configuration we enable prometheus as one of the endpoints.
  - spring-boot-docker-compose: Our setup uses a docker-compose file, namely "compose.yaml", to set up and run Prometheus and Grafana. As part of the container setup, we also load a pre-built Grafana dashboard template and the configured Prometheus datasource it uses to populate its data. With this dependency, the containers start automatically at application startup, so we don't have to manage them manually every time.
- Configuration:
  - Management: Using Spring properties, we enable certain metrics and set the intervals at which they are captured, so that our tools have the data they need.
- Endpoints:
  - Prometheus Spring Actuator: http://localhost:8080/actuator/prometheus
  - Prometheus Tool: http://localhost:9090/
  - Grafana: http://localhost:3000/
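Besides its user interface, Prometheus also answers queries over HTTP under `/api/v1/query`. As a small sketch, assuming the Prometheus address listed above and the same assumed Spring AI metric and label names (`gen_ai_client_token_usage_total`, `gen_ai_token_type`), this is how we could build such a query URL in Java. The class and method names are hypothetical and only there for illustration.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class PrometheusQueryUrl {

    // Address of the Prometheus container from the endpoint list above.
    static final String PROMETHEUS_BASE = "http://localhost:9090";

    /** Builds the URL of an instant query against the Prometheus HTTP API. */
    static URI instantQuery(String promQl) {
        // PromQL contains spaces and parentheses, so it has to be URL-encoded.
        String encoded = URLEncoder.encode(promQl, StandardCharsets.UTF_8);
        return URI.create(PROMETHEUS_BASE + "/api/v1/query?query=" + encoded);
    }

    public static void main(String[] args) {
        // Assumed metric and label names; adjust to what your application exposes.
        String promQl = "sum by (gen_ai_token_type) (gen_ai_client_token_usage_total)";
        System.out.println(instantQuery(promQl));
    }
}
```

Pasting the printed URL into a browser, or sending it with any HTTP client, should return a JSON response while the Prometheus container is running.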
