Springing into AI - Part 11: Model Context Protocol (MCP) - Theory
In the previous posts, we had fun exploring some of the capabilities provided by Spring AI across a range of use cases: RAG, Tool Calling, conversational chat memory, AI observability and a basic chat client. In this post we continue the fun by learning about the Model Context Protocol (MCP), the "in" thing that has revolutionized modern enterprise AI applications and empowered the community to grow its offerings tremendously. To keep things digestible and not overwhelming, the post is divided into two parts: the theory and the playground.
Theory
MCP is a protocol specification that was designed by Anthropic and later open sourced. In its basic form, it helps us integrate external systems and enrich our client-driven AI applications with their capabilities, offering bi-directional communication (client-to-server and server-to-client) along with notifications. These capabilities range from serving static and dynamic content in the form of resources, to pre-created prompts, to exposing business functionality via tools, among many others. Think about how powerful this is: each server can be dedicated to a particular unit of work, acting as an AI agent capable of performing a particular set of tasks and performing them well. The figure below depicts the architectural overview of MCP, showcasing the various components at play.
Time to pack up, too many components, massive brain explosion. Wait!! Before we give in to the complexity and wrap it up, trust me to make this easy for you to comprehend and you will see everything will be 200 OK 😛. I humbly request that you do not succumb to the length of the content; it may look like a lot, but it is simple once understood. So let's dive into each of the components below:
- MCP Host: This is the AI application that the user interacts with, and under the hood it is responsible for coordinating and managing one or more MCP Clients. When a request comes in, it is the MCP Host that determines which MCP Client should be used, and it then manages the execution of that request via that client to obtain a response from the MCP Server. Typical examples include Claude Desktop, IDEs like Visual Studio Code and Cursor, custom apps, etc. You can find a comprehensive list of MCP clients here, along with a feature matrix.
- MCP Client: These are instantiated by the Host, and each maintains a 1:1 connection with a Server. The client is responsible for the direct communication with the Server; it differs from the Host in that it is a protocol-level component enabling server communication. In addition to this, it also provides:
- Sampling: Until now, we as users were in control of asking the Client to communicate with the LLM to provide us relevant responses. With Sampling, those roles reverse: the Server requests the Client to communicate with the LLM on its behalf, for example to help analyze a response, and in some cases also gets the end user to provide input/confirmation/action. This drives agentic behaviour while giving the end user, i.e. us, complete authority over the task and its desired output. Let's understand this a bit better with an example of hotel booking.
- You decided to go on holiday after a year of turmoil, and you should; after all, life is precious and short. You start to plan and decide to use AI to find hotel accommodation for your travels to majestic and beautiful Cape Town in South Africa. You go to the AI application and ask: "Hey buddy, how about booking me a nice hotel near a beach in Cape Town, South Africa, and ensure that it fits my budget of X to Y."
- At this point the Client invokes a Tool (we will discuss these shortly) on a Server, let's say it's called "bookHotelAccomodations", which interacts with a generic API to get hotels and prices.
- The Server now needs help from the LLM to analyze the result it obtained, containing the list of hotels, price information, nearby attractions, etc. So it asks the Client, via a sampling/createMessage request, to forward that to the LLM for help.
- Human checkpoint 1: Before the Client sends this to the LLM, it goes to the end user and presents the prompt and its context so that they may edit it if required. This act empowers us to stay in control of what is sent and how we want it to behave.
- Upon human confirmation, the request is sent to the LLM. This pattern should be familiar to us by now, as we have used it throughout the previous posts. The LLM, in turn, processes the request and provides a response.
- Human checkpoint 2: Again, before the Client sends that response back to the Server, it presents it to the end user, in this example giving the user the list of the top three hotels ranked by price and local attractions. This again puts us in control: we choose the hotel we want and confirm that choice so that the Client can receive it and send it to the Server to finally act upon.
- The Server can then finally act upon the user's choice and make the booking for us. The only thing we as humans must do now is relax, get excited, pack our bags and get ready for that holiday we owed ourselves. Enjoy :).
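The sampling exchange above boils down to a single JSON-RPC request from the Server to the Client. A sketch of what such a sampling/createMessage request could look like for our hotel example; the message text, system prompt and token limit are illustrative assumptions, not spec-mandated values:

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "sampling/createMessage",
  "params": {
    "messages": [
      {
        "role": "user",
        "content": {
          "type": "text",
          "text": "Rank these hotels in Cape Town by price and nearby attractions: ..."
        }
      }
    ],
    "systemPrompt": "You are a helpful travel assistant.",
    "maxTokens": 500
  }
}
```

The Client shows this prompt to the user (human checkpoint 1), forwards it to the LLM, and returns the model's reply to the Server, again only after user review (human checkpoint 2).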
- Roots: Your application has resources, but not everything must be open to users. Roots provide a measure of security whereby the Client communicates to the Server the intended boundaries it should operate within. These boundaries are usually expressed for resources by means of URIs. The Server can choose to limit itself to the intended boundary or act openly. Note that roots can change dynamically as the end user works with different resources, directories, projects and folders; in that case the Server receives a notification from the Client via notifications/roots/list_changed. As a best practice, when working with resources, the Server should query for the Roots it may operate upon. Typical examples of intended boundaries expressed as URIs look like the below:
- Database: db://customer/orders
- File: file://user/portfolio/investment
- SourceCode: src://java/src/main/
- Web Sites: https://api.something.com/v1
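When the Server asks which boundaries it may operate within, the Client answers a roots/list request with something along these lines. The root entry here is hypothetical, and note that the current spec focuses on file:// URIs, so treat the other schemes above as conceptual examples:

```json
{
  "jsonrpc": "2.0",
  "id": 2,
  "result": {
    "roots": [
      {
        "uri": "file://user/portfolio/investment",
        "name": "Investment portfolio"
      }
    ]
  }
}
```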
- Elicitation: This helps the Server gather additional information from the end user so it may fulfil its objective of completing the relevant task, using the Client as the mediator between the two. Prior to elicitation this was a big mess, and developers had to invent their own ways of getting that information from the end user. Let's see how this works by carrying over our hotel booking use case:
- The Server has obtained the hotel information thus far, but to make the booking it now needs some more information from the end user, say a confirmation to proceed, or whether they are interested in travel discounts too. Enter Elicitation, whereby the Server sends an elicitation/create message to the Client.
- The Client tries to obtain the information requested by the Server by presenting an elicitation UI component to the end user so that they may act on it within the session. Care must be taken when presenting the elicitation to the end user: provide clear context about why the information is being requested and how it will be used. No Personally Identifiable Information should be exchanged in this step, as that would be a massive security loophole.
- The Client receives the user's response and provides that information as additional context to the Server.
- The Server now has everything it needs and can carry on its merry way to fulfil the hotel booking for your holiday trip.
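The first step above, the elicitation/create request, could look roughly like this. The Server describes what it needs and a schema for the expected answer; the message wording and field names are illustrative assumptions:

```json
{
  "jsonrpc": "2.0",
  "id": 3,
  "method": "elicitation/create",
  "params": {
    "message": "Confirm the booking? Are you interested in travel discounts?",
    "requestedSchema": {
      "type": "object",
      "properties": {
        "confirmBooking": { "type": "boolean" },
        "wantDiscounts": { "type": "boolean" }
      },
      "required": ["confirmBooking"]
    }
  }
}
```

The schema lets the Client render a simple form-like UI and return a structured answer rather than free text.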
- MCP Server: These are programs that expose certain business functionality to the end user. We did something similar in a previous post when we learned Tool Calling. The benefit of the MCP architecture is that instead of writing custom tools across the product, we can have micro-services for various systems, each with its own tool offering within the domain boundary in which it operates. The Server provides this to the end user through certain "capabilities", namely Prompts, Tools and Resources.
- Prompts: These allow the Server author to provide pre-configured, parameterized prompt templates to the end user, so that they can select the relevant template and supply the parameters it requires.
- Tools: These are business operations exposed to the LLM through a well-defined schema interface comprising the tool's inputs, outputs and additional metadata the LLM may require to invoke the right tool for the job. The LLM learns of the tools on the Server when the Client issues a tools/list command to discover and register them. MCP validates the provided schema to ensure it adheres to the defined standard. When a user tries to achieve a task through prompts, the Client receives the request and invokes the LLM; the tool-aware LLM matches the user prompt against the tool descriptions and achieves completion by invoking the tools/call command.
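As a sketch, a tools/list response advertising our hypothetical hotel tool from earlier could look like this; the description and input schema are assumptions for illustration:

```json
{
  "jsonrpc": "2.0",
  "id": 4,
  "result": {
    "tools": [
      {
        "name": "bookHotelAccomodations",
        "description": "Search and book hotels in a given city within a budget",
        "inputSchema": {
          "type": "object",
          "properties": {
            "city": { "type": "string" },
            "minBudget": { "type": "number" },
            "maxBudget": { "type": "number" }
          },
          "required": ["city"]
        }
      }
    ]
  }
}
```

It is the name and description here that the tool-aware LLM matches against the user's prompt when deciding which tool to call.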
- Resources: These are structured pieces of information presented to the user so that they may select a resource and present it to the LLM for analysis in a certain task. If you are thinking RAG from a previous post, you are not wrong, but from the Server's point of view this is simply a means of providing relevant information. Resources come in two patterns, Direct Resources and Resource Templates. Each resource is composed of a mimeType, uriTemplate, description, name and title.
- Direct Resources: These are static pieces of information relayed to the user in the form of a fixed URI. A typical example could be file://company/about.
- Resource Templates: These are more dynamic in the sense that the URI can be parameterized so that information pertaining to a certain entity may be retrieved and presented. A typical example is file://employee/{employeeName}/about.
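Putting the fields above together, a resource template descriptor for the employee example could look roughly like this; all values are illustrative:

```json
{
  "uriTemplate": "file://employee/{employeeName}/about",
  "name": "employee-profile",
  "title": "Employee Profile",
  "description": "Profile information for a given employee",
  "mimeType": "text/markdown"
}
```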
- Transport Layer: The communication between Client and Server happens via JSON-RPC message exchange. In terms of actual network communication, we have the options of stdio, streamable HTTP (SSE) and plain HTTP.
- stdio: With stdio, the server application is usually packaged and deployed on the same machine as the MCP Host. If we are using a third-party client like Claude Desktop, we have to provide an additional config element listing the application's startup command, its location, etc. so that the Host can launch and interact with it.
- streamable HTTP (SSE): Here the Server resides on a different machine from the MCP Host, offering several enterprise architectural benefits: risk isolation, scalability, maintainability, loose coupling, etc. The Server uses Server-Sent Events to emulate real-time responses, streaming information back as it keeps processing the request, providing more asynchronous behaviour.
- http: This is simply the typical request-response pattern. Just like SSE, the Server resides on a different machine from the MCP Host, providing the same enterprise benefits of doing so.
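To make the stdio option concrete, this is the typical shape of the config element a client like Claude Desktop expects; the server name and jar path here are placeholders for your own:

```json
{
  "mcpServers": {
    "hotel-booking": {
      "command": "java",
      "args": ["-jar", "/path/to/hotel-booking-mcp-server.jar"]
    }
  }
}
```

The Host uses the command and args to spawn the server process locally and then talks to it over its stdin/stdout.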
If you are already snoring, or your eyes are semi-closed, time to wake up, as we are done with the theory of MCP. Although we didn't dive into every nook and corner of it, i.e. the various commands and their requests and responses, this should at a surface level help solidify your understanding of how MCP works and the benefits and components it offers. I hope you can appreciate its power, and how it is re-shaping the way modern applications are being developed as AI revolutionizes more and more of the world. In the next part, we flex our theory muscle and put it into practice by developing an MCP Server that offers the functionality we discussed above. Stay tuned....
