Springing into AI - Part 11: Model Context Protocol (MCP) - Theory
In the previous posts, we had fun exploring some of the capabilities provided by Spring AI across a range of use cases: RAG, Tool Calling, conversational chat memory, AI observability and a basic chat client. In this post we continue the fun by learning about the Model Context Protocol (MCP), the "in" thing that has revolutionized modern enterprise AI applications and empowered the community to grow its offerings tremendously. To keep things digestible and not overwhelming, the post is divided into two parts: the theory and the playground.
Theory
MCP is a protocol specification that was designed by Anthropic and later open sourced. In its basic form, it helps us integrate external systems and enrich our client-driven AI applications with their capabilities, offering bi-directional communication (client-to-server and server-to-client) along with notifications. These capabilities range from serving static and dynamic content in the form of resources, to pre-created prompts, to exposing business functionality via tools, among many others. Think about how powerful this is: each server can be dedicated to a particular unit of work, acting as an AI agent capable of performing a particular set of tasks and performing them well. The figure below depicts the architectural overview of MCP, showcasing the various components at play.
Time to pack up, too many components, massive brain explosion. Wait!! Before we give in to the complexity and wrap it up, trust me to make this easy for you to comprehend and you will see everything will be 200 OK 😛. I humbly request that you do not succumb to the length of the content; it may look like a lot, but it is simple once understood. So let's dive into each of the components below:
- MCP Host: This is the AI application that the user interacts with, and under the hood it is responsible for coordinating and managing one or more MCP Clients. When a request comes in, it is the MCP Host that determines which MCP Client should be used, and it then manages the execution of that request via that client to obtain a response from the MCP Server. Typical examples include Claude Desktop, IDEs like Visual Studio Code and Cursor, custom apps, etc. You can find a comprehensive list of MCP clients here, along with a feature matrix.
- MCP Client: These are instantiated by the Host, and each maintains a 1:1 connection with a Server. The client is responsible for the direct communication with the Server; it differs from the Host in that it is a protocol-level component enabling server communication. In addition to this, it also provides:
- Sampling: Until now, we as users were in control of asking the Client to communicate with the LLM to provide us relevant responses. With Sampling, those roles reverse: the Server requests the Client to communicate with the LLM on its behalf, for example to help analyze a response, and in some cases also gets the end user to provide input/confirmation/action. This drives agentic behaviour while giving the end user, i.e. us, complete authority over the task and its desired output. Let's understand this a bit better with an example of hotel booking.
- You decided to go on holiday after a year of turmoil, and you should; after all, life is precious and short. You start to plan and decide to use AI to find hotel accommodation for your travels to majestic and beautiful Cape Town in South Africa. You go to the AI application and ask: "Hey buddy, how about booking me a nice hotel near a beach in Cape Town, South Africa, and ensure that it fits my budget of X to Y."
- At this point the Client invokes a Tool (we will discuss these shortly) on a Server, let's say it's called "bookHotelAccomodations", which interacts with a generic API to get hotels and prices.
- The Server now needs help from the LLM to analyze the result it obtained, containing the list of hotels, price information, nearby attractions, etc. So it asks the Client, via a sampling/createMessage request, to forward that to the LLM for help.
- Human checkpoint 1: Before the Client sends this to the LLM, it goes to the end user and presents the prompt and its context so that they may edit it if required. This act empowers us to stay in control of what is sent and how we want it to behave.
- Upon human confirmation, the request is sent to the LLM. This pattern should be familiar to us by now, as we have used it throughout the previous posts. The LLM, in turn, processes the request and provides a response.
- Human checkpoint 2: Again, before the Client sends that response back to the Server, it presents it to the end user, in this example giving the user the list of the top three hotels ranked by price and local attractions. This again puts us in control: we choose the hotel we want and confirm that choice so that the Client can receive it and send it to the Server to finally act upon.
- The Server can then finally act upon the user's choice and make the booking for us. The only thing we as humans must do now is relax, get excited, pack our bags and get ready for that holiday we owed ourselves. Enjoy :).
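The sampling exchange above boils down to a single JSON-RPC request from the Server to the Client. A sketch of what such a sampling/createMessage request could look like for our hotel example; the message text, system prompt and token limit are illustrative assumptions, not spec-mandated values:

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "sampling/createMessage",
  "params": {
    "messages": [
      {
        "role": "user",
        "content": {
          "type": "text",
          "text": "Rank these hotels in Cape Town by price and nearby attractions: ..."
        }
      }
    ],
    "systemPrompt": "You are a helpful travel assistant.",
    "maxTokens": 500
  }
}
```

The Client shows this prompt to the user (human checkpoint 1), forwards it to the LLM, and returns the model's reply to the Server, again only after user review (human checkpoint 2).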
- Roots: Your application has resources, but not everything must be open to users. Roots provide a measure of security whereby the Client communicates to the Server the intended boundaries it should operate within. These boundaries are usually expressed for resources by means of URIs. The Server can choose to limit itself to the intended boundary or act openly. Note that roots can change dynamically as the end user works with different resources, directories, projects and folders; in that case the Server receives a notification from the Client via notifications/roots/list_changed. As a best practice, when working with resources, the Server should query for the Roots it may operate upon. Typical examples of intended boundaries expressed as URIs look like the below:
- Database: db://customer/orders
- File: file://user/portfolio/investment
- SourceCode: src://java/src/main/
- Web Sites: https://api.something.com/v1
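When the Server asks which boundaries it may operate within, the Client answers a roots/list request with something along these lines. The root entry here is hypothetical, and note that the current spec focuses on file:// URIs, so treat the other schemes above as conceptual examples:

```json
{
  "jsonrpc": "2.0",
  "id": 2,
  "result": {
    "roots": [
      {
        "uri": "file://user/portfolio/investment",
        "name": "Investment portfolio"
      }
    ]
  }
}
```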
- Elicitation: This helps the Server gather additional information from the end user so it may fulfil its objective of completing the relevant task, using the Client as the mediator between the two. Prior to elicitation this was a big mess, and developers had to invent their own ways of getting that information from the end user. Let's see how this works by carrying over our hotel booking use case:
- The Server has obtained the hotel information thus far, but to make the booking it now needs some more information from the end user, say a confirmation to proceed, or whether they are interested in travel discounts too. Enter Elicitation, whereby the Server sends an elicitation/create message to the Client.
- The Client tries to obtain the information requested by the Server by presenting an elicitation UI component to the end user so that they may act on it within the session. Care must be taken when presenting the elicitation to the end user: provide clear context about why the information is being requested and how it will be used. No Personally Identifiable Information should be exchanged in this step, as that would be a massive security loophole.
- The Client receives the user's response and provides that information as additional context to the Server.
- The Server now has everything it needs and can carry on its merry way to fulfil the hotel booking for your holiday trip.
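The first step above, the elicitation/create request, could look roughly like this. The Server describes what it needs and a schema for the expected answer; the message wording and field names are illustrative assumptions:

```json
{
  "jsonrpc": "2.0",
  "id": 3,
  "method": "elicitation/create",
  "params": {
    "message": "Confirm the booking? Are you interested in travel discounts?",
    "requestedSchema": {
      "type": "object",
      "properties": {
        "confirmBooking": { "type": "boolean" },
        "wantDiscounts": { "type": "boolean" }
      },
      "required": ["confirmBooking"]
    }
  }
}
```

The schema lets the Client render a simple form-like UI and return a structured answer rather than free text.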
- MCP Server: These are programs that expose certain business functionality to the end user. We did something similar in a previous post when we learned Tool Calling. The benefit of the MCP architecture is that instead of writing custom tools across the product, we can have micro-services for various systems, each with its own tool offering within the domain boundary in which it operates. The Server provides this to the end user through certain "capabilities", namely Prompts, Tools and Resources.
- Prompts: These allow the Server author to provide pre-configured, parameterized prompt templates to the end user, so that they can select the relevant template and supply the parameters it requires.
- Tools: These are business operations exposed to the LLM through a well-defined schema interface comprising the tool's inputs, outputs and additional metadata the LLM may require to invoke the right tool for the job. The LLM learns of the tools on the Server when the Client issues a tools/list command to discover and register them. MCP validates the provided schema to ensure it adheres to the defined standard. When a user tries to achieve a task through prompts, the Client receives the request and invokes the LLM; the tool-aware LLM matches the user prompt against the tool descriptions and achieves completion by invoking the tools/call command.
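As a sketch, a tools/list response advertising our hypothetical hotel tool from earlier could look like this; the description and input schema are assumptions for illustration:

```json
{
  "jsonrpc": "2.0",
  "id": 4,
  "result": {
    "tools": [
      {
        "name": "bookHotelAccomodations",
        "description": "Search and book hotels in a given city within a budget",
        "inputSchema": {
          "type": "object",
          "properties": {
            "city": { "type": "string" },
            "minBudget": { "type": "number" },
            "maxBudget": { "type": "number" }
          },
          "required": ["city"]
        }
      }
    ]
  }
}
```

It is the name and description here that the tool-aware LLM matches against the user's prompt when deciding which tool to call.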
- Resources: These are structured pieces of information presented to the user so that they may select a resource and present it to the LLM for analysis in a certain task. If you are thinking RAG from a previous post, you are not wrong, but from the Server's point of view this is simply a means of providing relevant information. Resources come in two patterns, Direct Resources and Resource Templates. Each resource is composed of a mimeType, uriTemplate, description, name and title.
- Direct Resources: These are static pieces of information relayed to the user in the form of a fixed URI. A typical example could be file://company/about.
- Resource Templates: These are more dynamic in the sense that the URI can be parameterized so that information pertaining to a certain entity may be retrieved and presented. A typical example is file://employee/{employeeName}/about.
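Putting the fields above together, a resource template descriptor for the employee example could look roughly like this; all values are illustrative:

```json
{
  "uriTemplate": "file://employee/{employeeName}/about",
  "name": "employee-profile",
  "title": "Employee Profile",
  "description": "Profile information for a given employee",
  "mimeType": "text/markdown"
}
```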
- Transport Layer: The communication between Client and Server happens via JSON-RPC message exchange. In terms of actual network communication, we have the options of stdio, streamable HTTP (SSE) and plain HTTP.
- stdio: With stdio, the server application is usually packaged and deployed on the same machine as the MCP Host. If we are using a third-party client like Claude Desktop, we have to provide an additional config element listing the application's startup command, its location, etc. so that the Host can launch and interact with it.
- streamable HTTP (SSE): Here the Server resides on a different machine from the MCP Host, offering several enterprise architectural benefits: risk isolation, scalability, maintainability, loose coupling, etc. The Server uses Server-Sent Events to emulate real-time responses, streaming information back as it keeps processing the request, providing more asynchronous behaviour.
- http: This is simply the typical request-response pattern. Just like SSE, the Server resides on a different machine from the MCP Host, providing the same enterprise benefits of doing so.
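To make the stdio option concrete, this is the typical shape of the config element a client like Claude Desktop expects; the server name and jar path here are placeholders for your own:

```json
{
  "mcpServers": {
    "hotel-booking": {
      "command": "java",
      "args": ["-jar", "/path/to/hotel-booking-mcp-server.jar"]
    }
  }
}
```

The Host uses the command and args to spawn the server process locally and then talks to it over its stdin/stdout.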
If you are already snoring, or your eyes are semi-closed, time to wake up, as we are done with the theory of MCP. Although we didn't dive into every nook and corner of it, i.e. the various commands and their requests and responses, this should at a surface level help solidify your understanding of how MCP works and the benefits and components it offers. I hope you can appreciate its power, and how it is re-shaping the way modern applications are being developed as AI revolutionizes more and more of the world. In the next part, we flex our theory muscle and put it into practice by developing an MCP Server that offers the functionality we discussed above. Stay tuned....
