Springing into AI - Part 2: Generative AI
Ah yes, the wonderful world of Generative AI. We now use it in our daily lives for a wide range of applications, each solving a different use case depending on how we choose to adopt it. In Part 2 of the series we will look at some of the finer details of how this all actually works, and at some factors to consider when choosing the right foundation model for the job.
At the heart of generative AI resides our best friend, the Foundation Model. These are Large Language Models (LLMs): neural networks pre-trained on information from a wide range of sources, such as the internet, publications, research materials, etc. In this ever-growing rat race, there is a wide variety of vendors, each offering their own LLMs that differ in performance, accuracy, predictability, size, number of parameters, and cost, to name a few. Some examples are Titan from AWS, LLaMA from Meta, Claude from Anthropic, and Bard from Google. The image below presents a historical summary of LLM evolution.
That's a lot, and it continues to grow faster than my body can gain or lose weight. If only (sigh) I had an LLM that could help me fine-tune my body. Bringing our focus back in check, the figure below depicts a high-level view of the LLM ecosystem along with some additional components, such as prompt engineering and configuration parameters, that can shape how diverse a response we get from it. Once again, this isn't Peter Jackson's Lord of the Rings trilogy, which comes in an extended edition containing every little detail, but it is enough to solidify our understanding.

- Tokenization: In this process, the prompt is received and the words are broken down into a series of "tokens". Head over to Tokenizer to get a feel for the tokens generated from the input you enter (a toy sketch of the full round trip appears after this list).
- Context Window: The context window is the maximum number of tokens the LLM can work with at once. One of the key factors when choosing the right model for the job is the number of tokens it supports: a larger context window allows a broader range of input, and vice versa for a smaller one. Note that tokens act as the currency when working with LLMs; you pay for usage of these models according to their respective per-token pricing.
- Neural Network: The magic box, to put it simply, is a massive number of interconnected neurons, pre-trained (as mentioned earlier) on a vast range of information, working collectively to produce an output. The dimensions in which these networks operate are something our brains can't even comprehend, since the most we can visualize is 2D or 3D. As a result, the outcomes of these networks present endless possibilities with varying probability. The "configuration" element that you see there helps control the probability and creativity of the output the network presents. We will look at these configurations in the next part of the series, along with prompt engineering.
- De-Tokenization: The output generated here is nothing but numbers to the network; to us, they represent the token IDs we saw earlier during tokenization. In this process they are reconstructed into words, and the resulting output is presented to the end user.
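
To make the flow above concrete, here is a minimal sketch of the tokenize, infer, and de-tokenize round trip. It is purely illustrative: real tokenizers use subword schemes such as byte-pair encoding rather than whole words, and the vocabulary, token IDs, and context-window size below are all made up.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy illustration of the tokenize -> infer -> de-tokenize pipeline.
public class TokenRoundTrip {

    private static final int CONTEXT_WINDOW = 8; // hypothetical token limit

    public static void main(String[] args) {
        Map<String, Integer> vocab = new HashMap<>();
        Map<Integer, String> reverseVocab = new HashMap<>();

        // Tokenization: break the prompt into tokens and look up their IDs.
        String prompt = "generative ai is fun";
        List<Integer> tokenIds = new ArrayList<>();
        for (String word : prompt.split(" ")) {
            Integer id = vocab.get(word);
            if (id == null) {
                id = vocab.size() + 1;   // assign the next free token ID
                vocab.put(word, id);
                reverseVocab.put(id, word);
            }
            tokenIds.add(id);
        }
        System.out.println("Token IDs: " + tokenIds);

        // Context window: the model can only attend to so many tokens at once.
        if (tokenIds.size() > CONTEXT_WINDOW) {
            throw new IllegalArgumentException("Prompt exceeds context window");
        }

        // The network only ever sees and emits numbers; echoing the IDs back
        // stands in for actual inference here.
        List<Integer> outputIds = tokenIds;

        // De-tokenization: map the output IDs back to words for the user.
        StringBuilder response = new StringBuilder();
        for (int id : outputIds) {
            response.append(reverseVocab.get(id)).append(' ');
        }
        System.out.println("Response: " + response.toString().trim());
    }
}
```

The key takeaway is that the network in the middle only ever sees and emits token IDs; everything human-readable happens at the edges.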
With the mechanics covered, here are some factors to consider when choosing a foundation model for the job (a small sketch after this list puts them together):

- Model capabilities, i.e. whether it is multimodal (image, video, text, etc.) or purely text-based
- Model performance, i.e. what its latency is like
- Maximum number of tokens it supports for input and output
- Licensing costs
- Operations the model can perform, for example whether it is capable of function/tool calling. This is useful when we develop our own custom MCP (Model Context Protocol) servers, which we will discuss in a later part of the series.
- Model accuracy and inference quality
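
To tie these factors together, here is a toy sketch of comparing candidate models on a few of the criteria above and estimating per-request cost from token usage. Every model name, context-window figure, and price below is invented for illustration; real numbers come from each vendor's documentation.

```java
import java.util.Comparator;
import java.util.List;

public class ModelSelection {

    // Illustrative shape of a model's spec sheet; the fields mirror the
    // selection factors listed above.
    record ModelSpec(String name, boolean multimodal, int contextWindowTokens,
                     double costPer1kTokens, boolean supportsToolCalling) {}

    public static void main(String[] args) {
        // Hypothetical candidates with made-up figures.
        List<ModelSpec> candidates = List.of(
                new ModelSpec("model-a", true, 200_000, 0.0030, true),
                new ModelSpec("model-b", false, 8_000, 0.0005, false),
                new ModelSpec("model-c", false, 128_000, 0.0010, true));

        // Example requirement: long documents and tool calling, at the
        // lowest per-token price that satisfies both.
        ModelSpec pick = candidates.stream()
                .filter(m -> m.contextWindowTokens() >= 100_000)
                .filter(ModelSpec::supportsToolCalling)
                .min(Comparator.comparingDouble(ModelSpec::costPer1kTokens))
                .orElseThrow();
        System.out.println("Chosen model: " + pick.name());

        // Tokens are the currency: estimate what one request would cost.
        int promptTokens = 1_200;
        int responseTokens = 300;
        double cost = (promptTokens + responseTokens) / 1000.0
                * pick.costPer1kTokens();
        System.out.printf("Estimated cost per request: $%.4f%n", cost);
    }
}
```

In practice you would weigh these criteria against your own workload; the point is simply that the factors above can be written down and compared, and that tokens drive the bill.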