Springing into AI - Part 8: Chat Memory
Welcome back. If you grew up watching Batman, or any of the numerous series and movies made about that superhero, you will have encountered the famous line "I am Batman", reiterated to criminals (Joker excluded) every single time. Like those forgetful criminals who needed a constant reminder of his identity, our intelligent models also forget: when we interact with LLMs, they have no recollection of what was asked of them previously. This makes them completely stateless and prevents the end user from having a "conversational" chat with them. In this part of the series we solve this by learning about "Chat Memory", so let's get into it. You are welcome to skip the theory section and jump straight to the playground.
Chat Memory - Theory
- In-Memory: This mode of persistence stores the conversation context for a particular user session in a ConcurrentHashMap with a signature of type 'ConcurrentHashMap<String, List<Message>>'.
- The key 'String' holds the conversationId, isolated to each user for a session.
- The value 'List<Message>' holds the conversation belonging to a user session, including both requests to and responses from the LLM. As mentioned earlier, whenever a new user prompt is sent to the LLM, the conversational history is included as part of it, giving the LLM richer context to respond accordingly.
- By default, a maximum of 20 messages is held per conversationId for a session. This is configurable based on your requirements (see the sketch after this list).
- Caveats: For production, it may be worth considering some pitfalls of this approach:
- Distributed Systems: In a distributed environment, each instance of the application holds its own state. When traffic is routed to a particular instance, its in-memory state may differ from another instance's, leading to an erratic experience.
- State: Should a particular instance crash, the entire history would be lost.
- Memory: Since both requests and responses are stored in memory, the footprint can grow quite large depending on the volume of interactions between users and your application, and it won't scale for high demand.
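As referenced in the list above, here is a minimal sketch of wiring up the in-memory variant, assuming Spring AI 1.x APIs. InMemoryChatMemoryRepository is the map-backed implementation, and maxMessages overrides the default window of 20:

```java
import org.springframework.ai.chat.memory.ChatMemory;
import org.springframework.ai.chat.memory.InMemoryChatMemoryRepository;
import org.springframework.ai.chat.memory.MessageWindowChatMemory;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
class InMemoryChatMemoryConfig {

    // ChatMemory backed by an in-memory map keyed by conversationId,
    // keeping at most the last 20 messages per conversation.
    @Bean
    ChatMemory chatMemory() {
        return MessageWindowChatMemory.builder()
                .chatMemoryRepository(new InMemoryChatMemoryRepository())
                .maxMessages(20)
                .build();
    }
}
```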
- Persistence: This mode of database storage offers a single source of truth, which leads to far better consistency of data. In a distributed system, all instances of the same application retrieve the same information. Spring AI offers a wide variety of options to choose from, covering both relational and non-relational databases. For demonstration purposes we will use the Postgres relational database in our playground, as it is quite commonly adopted in enterprise production applications. The chat memory table holds the following columns (a schema sketch follows this list):
- conversation_id: This holds a unique conversation identifier per user and can be used to identify a user's activity in our application, especially when we support multi-user interaction. If we don't explicitly specify a value, "default" is used.
- content: As the name suggests, this holds the text of both requests to and responses from the LLM. Even though the content is "text", we should be mindful of the context window length we allow our LLM. This can be dictated by configuring the chat model options (ChatOptions), where we can specify a size for the context window.
- type: This holds the type of the record and can be one of "USER", "ASSISTANT", "SYSTEM", or "TOOL". In the examples that follow, we will see some of these in action.
- timestamp: Merely an audit of when the particular activity took place in the application.
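Spring AI's JDBC starter ships a schema script per vendor; for Postgres it looks roughly like this (a sketch based on the starter's bundled script; exact column sizes and index definitions may differ by version):

```sql
CREATE TABLE IF NOT EXISTS SPRING_AI_CHAT_MEMORY (
    conversation_id VARCHAR(36) NOT NULL,  -- per-user conversation identifier ("default" if unset)
    content         TEXT        NOT NULL,  -- the message text (request or response)
    type            VARCHAR(10) NOT NULL,  -- USER, ASSISTANT, SYSTEM or TOOL
    "timestamp"     TIMESTAMP   NOT NULL   -- when the message was stored
);

CREATE INDEX IF NOT EXISTS SPRING_AI_CHAT_MEMORY_CONVERSATION_ID_TIMESTAMP_IDX
    ON SPRING_AI_CHAT_MEMORY (conversation_id, "timestamp");
```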
(Figure: ChatMemoryRepository and its concrete implementations across relational and non-relational stores.)
- Advisor: We enrich the request with conversational history using an Advisor. Spring offers two that we can use here, namely MessageChatMemoryAdvisor and PromptChatMemoryAdvisor. The difference is that the latter appends the conversational history to the system prompt, while the former adds it to the prompt as individual messages.
- ChatMemory: By default, Spring offers a concrete implementation of this type in MessageWindowChatMemory, which has a default rolling window of 20 messages that is configurable to your needs. At any given time it contains at most that number of messages, discarding older ones from its buffer/persistence. There are caveats to consider here depending on the application use case, as you may or may not want to hold a large number of messages. Keeping a long history feeds more tokens to the LLM, since the entire conversational history is sent along with each request. This can result in unnecessary cost (vendor dependent), as tokens, once again, are the social currency of GenAI applications. Internally, this holds a contract to a ChatMemoryRepository.
- ChatMemoryRepository: As discussed above, we have two varieties of it. In the figure above, we can see the various concrete implementations. For relational databases, you can see the different vendors available to us as of this writing. The non-relational ones, such as Apache Cassandra and Neo4j, have their own unique implementations. For each supported relational database, Spring AI sets up the relevant schema out of the box (schema creation and database interaction), provided we configure the database connection correctly. A sample schema used for Postgres is shown in the "Persistence" section. A sketch wiring these pieces together follows below.
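Putting the advisor, memory, and repository together for the Postgres-backed variant, a minimal sketch might look like this. It assumes the JDBC starter's JdbcChatMemoryRepository and its Postgres dialect; the starter can also auto-configure the repository for you, so the explicit bean is illustrative:

```java
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.client.advisor.MessageChatMemoryAdvisor;
import org.springframework.ai.chat.memory.ChatMemory;
import org.springframework.ai.chat.memory.ChatMemoryRepository;
import org.springframework.ai.chat.memory.MessageWindowChatMemory;
import org.springframework.ai.chat.memory.repository.jdbc.JdbcChatMemoryRepository;
import org.springframework.ai.chat.memory.repository.jdbc.PostgresChatMemoryRepositoryDialect;
import org.springframework.ai.chat.model.ChatModel;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.jdbc.core.JdbcTemplate;

@Configuration
class ChatMemoryConfig {

    // ChatMemoryRepository backed by Postgres via the JDBC starter.
    @Bean
    ChatMemoryRepository chatMemoryRepository(JdbcTemplate jdbcTemplate) {
        return JdbcChatMemoryRepository.builder()
                .jdbcTemplate(jdbcTemplate)
                .dialect(new PostgresChatMemoryRepositoryDialect())
                .build();
    }

    // Rolling window of the last 20 messages per conversationId.
    @Bean
    ChatMemory chatMemory(ChatMemoryRepository repository) {
        return MessageWindowChatMemory.builder()
                .chatMemoryRepository(repository)
                .maxMessages(20)
                .build();
    }

    // ChatClient that enriches every request with the stored history
    // as individual messages, via MessageChatMemoryAdvisor.
    @Bean
    ChatClient chatClient(ChatModel chatModel, ChatMemory chatMemory) {
        return ChatClient.builder(chatModel)
                .defaultAdvisors(MessageChatMemoryAdvisor.builder(chatMemory).build())
                .build();
    }
}
```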
Chat Memory - Playground
- Source Code: Found here
- Added Endpoints:
- http://localhost:8080/chat/generic
- http://localhost:8080/chat/in-memory
- http://localhost:8080/chat/db-memory
- http://localhost:8080/chat/db-user-memory
- Added Dependencies:
- postgres
- spring-ai-starter-model-chat-memory-repository-jdbc
- Added container:
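The original container definition isn't reproduced here; a minimal Postgres container for this playground might look like the sketch below (image tag, credentials, and database name are assumptions and must match your configuration):

```yaml
# docker-compose.yaml (sketch) - local Postgres for the chat memory playground
services:
  postgres:
    image: postgres:16
    environment:
      POSTGRES_DB: chat_memory
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: postgres
    ports:
      - "5432:5432"
```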
- Configuration:
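Likewise, the configuration block isn't shown; assuming standard Spring Boot datasource properties plus the Spring AI JDBC memory starter, it would look roughly like this. The initialize-schema flag tells Spring AI to create the chat memory table on startup; URL and credentials are illustrative:

```yaml
# application.yaml (sketch)
spring:
  datasource:
    url: jdbc:postgresql://localhost:5432/chat_memory
    username: postgres
    password: postgres
  ai:
    chat:
      memory:
        repository:
          jdbc:
            initialize-schema: always
```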
Run 2: http://localhost:8080/chat/in-memory
Run 3: http://localhost:8080/chat/db-memory
Run 4: http://localhost:8080/chat/db-user-memory
You will almost always have a use case where multiple users use your application. Typically we can use some sort of sessionId or userId to identify these users in the backend. A similar setup is done in "ChatClient". What we do in our case is take a "user-id" header from the incoming request and then use that value to vary the conversation_id. Code for such a setup is shown below:
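The snippet below is a sketch of that setup, assuming the playground's /chat/db-user-memory endpoint and a prompt request parameter; the exact code lives in the linked repository:

```java
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.memory.ChatMemory;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestHeader;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

@RestController
class ChatMemoryController {

    private final ChatClient chatClient;

    ChatMemoryController(ChatClient chatClient) {
        this.chatClient = chatClient;
    }

    @GetMapping("/chat/db-user-memory")
    String chatWithUserMemory(@RequestHeader("user-id") String userId,
                              @RequestParam String prompt) {
        return chatClient.prompt()
                .user(prompt)
                // Scope the stored history to this user by overriding the
                // conversation id with the incoming "user-id" header value.
                .advisors(a -> a.param(ChatMemory.CONVERSATION_ID, userId))
                .call()
                .content();
    }
}
```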
The advisors parameter in the snippet above shows that we now vary the conversation_id based on the userId passed in the request header. Now, since Zack Snyder's Justice League 2 never got made and we have to settle for James Gunn's vision, our two demo users will be Superman and Batman. The images below show the outcome of each run for a different user. Pay special attention to the curl commands where I indicate the "user-id" value. (Please ignore the Cookie JSESSIONID - it has no bearing.)
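For reference, the requests would have been along these lines (the prompt text here is illustrative, not the exact one from the screenshots):

```bash
# Each user gets an isolated conversation, keyed by the user-id header
curl -H "user-id: superman" "http://localhost:8080/chat/db-user-memory?prompt=Hello"
curl -H "user-id: batman"   "http://localhost:8080/chat/db-user-memory?prompt=Hello"
```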
User: Batman
(Figure: run output for user Batman.)