Springing into AI - Part 7: Observability
Problem
Tokens, as we know, are the currency of our GenAI applications. Running LLMs locally on our machine is fine for experimentation, but production-grade applications usually rely on external providers, and costs start to accrue. Wouldn't it be cool to have some visibility into how our GenAI application is being used, in terms of tokens consumed and requests made, so that we can take remedial action if we wish?
Solution
Enter Observability, a fundamental utility to have in our arsenal that acts as the eyes of our application. It cannot be stressed enough how powerful Observability can be for enterprise applications. Someone please picture Darth Vader going "If only you knew the power of Observability". The development community offers a plethora of tools for it, some commercial and some open source.
- Spring Actuator: Spring Boot provides actuator endpoints that expose key metrics about the state of our backend application, e.g. health and info. In our application we also enable the "prometheus" endpoint. This is of vital importance to the whole system, as the exposed endpoint is what the Prometheus tool uses to collect the various metrics for further use. Spring AI offers a variety of such metrics from an AI perspective, such as model parameters, token usage, etc. More about the various exposed metrics can be found here. The image below shows the metrics observed by our application after invoking a few prompts.
- Prometheus: As mentioned above, the metrics exposed on the actuator endpoint are scraped by this tool. For our AI application, we definitely want to know the number of tokens used (input, output, total) so that we can take action (for example, limiting the context window via model parameters) and keep costs from growing beyond what our pocket can afford. The image below is from the Prometheus user interface, where we have searched for one particular metric, among the many available, for illustration purposes.
- Grafana: Okay, so Spring actuators expose the metrics to the outside world, and Prometheus scrapes these metrics, stores them and lets us query them. Wouldn't it be nice if we could visualize these metrics as a dashboard that we can visit to gain insight through different display widgets? Enter Grafana. As we use our application more and more, the dashboard updates in near real time, depending on the scraping interval. The image below shows a sample dashboard from our application, where we can already see some key information:
- Total AI Requests: Number of requests made by the application to the LLM.
- Average Response Time: Average response time of the LLM.
- Success Rate: Helps us know whether we had any errors, as the success rate would drop.
- Token Usage: The tokens used per request by the LLM, in near real time.
- Response Time Distribution: Helps us spot major spikes, for example under load once we offer our application in production to a large number of users.
- Total Tokens Used: The overall number of tokens used by the application.
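To make the numbers behind panels such as Token Usage and Total Tokens Used a bit more concrete, here is a minimal Java sketch that adds up token counters from a snippet of Prometheus text output, grouped by token type. In the real setup Prometheus and Grafana do this work for us; the sketch only illustrates the idea. The metric name `gen_ai_client_token_usage_total`, the label `gen_ai_token_type`, the model name `demo-model` and all values are assumptions for illustration, so check your own /actuator/prometheus output for the names your application actually exposes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class TokenMetrics {

    // Sums every labelled sample of `metric`, grouped by the value of `label`.
    // Simplified parser: samples without labels and escaped quotes are not handled.
    static Map<String, Double> sumByLabel(String exposition, String metric, String label) {
        Map<String, Double> totals = new LinkedHashMap<>();
        for (String raw : exposition.split("\n")) {
            String line = raw.trim();
            // Skip blank lines, "# HELP" / "# TYPE" comments and other metrics.
            if (line.isEmpty() || line.startsWith("#") || !line.startsWith(metric + "{")) {
                continue;
            }
            int close = line.indexOf('}');
            if (close < 0) {
                continue;
            }
            String labels = line.substring(metric.length() + 1, close);
            // The sample value is the first field after the closing brace.
            String[] fields = line.substring(close + 1).trim().split("\\s+");
            double value = Double.parseDouble(fields[0]);
            totals.merge(labelValue(labels, label), value, Double::sum);
        }
        return totals;
    }

    // Returns the value of label `name` inside `labels`, or "unknown" if absent.
    static String labelValue(String labels, String name) {
        String marker = name + "=\"";
        int from = 0;
        while (true) {
            int idx = labels.indexOf(marker, from);
            if (idx < 0) {
                return "unknown";
            }
            // Only accept a match at the start of a label, not inside a longer label name.
            if (idx == 0 || labels.charAt(idx - 1) == ',') {
                int start = idx + marker.length();
                int end = labels.indexOf('"', start);
                return end < 0 ? "unknown" : labels.substring(start, end);
            }
            from = idx + 1;
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample; metric name, labels and values are assumptions.
        String sample = String.join("\n",
            "# TYPE gen_ai_client_token_usage_total counter",
            "gen_ai_client_token_usage_total{gen_ai_request_model=\"demo-model\",gen_ai_token_type=\"input\"} 120.0",
            "gen_ai_client_token_usage_total{gen_ai_request_model=\"demo-model\",gen_ai_token_type=\"output\"} 340.0",
            "gen_ai_client_token_usage_total{gen_ai_request_model=\"demo-model\",gen_ai_token_type=\"total\"} 460.0");
        Map<String, Double> tokens =
            sumByLabel(sample, "gen_ai_client_token_usage_total", "gen_ai_token_type");
        System.out.println("Tokens by type: " + tokens);
    }
}
```

Grouping by a label is the same idea a dashboard query applies when it splits a counter into input, output and total series.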
Playground
- Source Code: Can be accessed here
- Dependencies:
- spring-boot-starter-actuator: Enables actuator endpoints that can be scraped by Prometheus. Through configuration we enable "prometheus" as one of the exposed endpoints.
- spring-boot-docker-compose: Our setup uses a Docker Compose file, "compose.yaml", to set up and run Prometheus and Grafana. As part of the container setup, we also load a prebuilt Grafana dashboard template and a preconfigured Prometheus data source that the dashboard uses for its data. This dependency starts the containers automatically at application startup, so we don't have to manage them manually every time.
- Configuration:
- Management: Using Spring properties, we enable certain metrics and set the capture intervals that our tools rely on.
- Endpoints:
- Prometheus Spring Actuator: http://localhost:8080/actuator/prometheus
- Prometheus Tool: http://localhost:9090/
- Grafana: http://localhost:3000/
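A quick way to check that the actuator endpoint listed above is serving AI metrics, without opening a browser, is a small Java client. This is only a sketch: it uses the JDK's built-in HttpClient (Java 11+), assumes the application is running locally on port 8080 as in the list above, and assumes the Spring AI metrics are exported with a `gen_ai_` prefix, which you should confirm against your own output.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.List;
import java.util.stream.Collectors;

class ActuatorProbe {

    // Keeps only sample lines (no "# HELP" / "# TYPE" comments) whose metric
    // name starts with the given prefix.
    static List<String> samplesWithPrefix(String body, String prefix) {
        return body.lines()
                .map(String::trim)
                .filter(line -> !line.isEmpty() && !line.startsWith("#"))
                .filter(line -> line.startsWith(prefix))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(2))
                .build();
        HttpRequest request = HttpRequest
                .newBuilder(URI.create("http://localhost:8080/actuator/prometheus"))
                .timeout(Duration.ofSeconds(5))
                .GET()
                .build();
        try {
            HttpResponse<String> response =
                    client.send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println("HTTP " + response.statusCode());
            // "gen_ai_" is an assumed prefix for the Spring AI metrics.
            samplesWithPrefix(response.body(), "gen_ai_").forEach(System.out::println);
        } catch (IOException e) {
            // Typically means the application is not running or the endpoint is not exposed.
            System.out.println("Actuator endpoint not reachable: " + e.getMessage());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

If nothing is printed after the status line, send a few prompts through the application first, since the AI metrics only show up once the model has actually been called.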
