Springing into AI - Part 2: Generative AI

     Welcome! In this post of the series we continue our journey by looking into the wonderful world of Generative AI, focusing specifically on the Large Language Models (LLMs) that form the backbone of GenAI applications. We will look, at a high level, at how they work, the factors to consider when choosing a foundation model, and some of the options available for fine tuning their behaviour.

Large Language Model (LLM)  

    These are AI models composed of billions of parameters, pre-trained up to a certain moment in time on a wide range of sources such as the internet, publications, research materials, etc. While they are very powerful, their pre-trained nature imposes a limitation: depending on the use case, they may not be capable of answering anything beyond their finite knowledge base. In subsequent posts we will cover techniques that can help mitigate this based on our use cases. The figure below presents an evolution of the models available in the market.



    

    LLMs continue to be adopted at a rapid rate, and as a result a lot of vendors now exist, each offering their own flavour of foundation model, varying in capabilities. Some examples are AWS offering Titan, Meta offering Llama, Anthropic offering Claude, Google offering Bard, etc. The figure below presents a landscape of how, under the hood, the AI application that we develop would look when interacting with these LLMs, showcasing some of their capabilities.


    
    From the figure above, an end user submits a Prompt to our AI application. These prompts are mostly textual in nature and contain the information the user is requesting from the LLM, ranging from contextually relevant questions to content generation requests. The LLM then internally carries out a process of Tokenization, Processing and De-Tokenization as its underlying operation to answer the user's request. The figure below provides an illustration of this process.




        Right then, so if we break it down:
  • Tokenization: In this process, the prompt is received and the words are broken down into a series of "tokens". Head over to Tokenizer to get a feel for the tokens produced from the input you enter.
  • Context Window: The context window is the maximum number of tokens that the LLM can work with in a single request. One of the key factors when choosing the right model for the job is to look at the number of tokens it supports: a larger context window allows a vaster range of input, and vice versa for a smaller one. It should also be noted that tokens act as a currency when working with an LLM, and you pay for the usage of these models based on their respective cost structures.
  • Neural Network: The magic box, to put it simply, is a massive amount of interconnected neurons, pre-trained as mentioned earlier on a vast range of information, working collectively to produce an output. These networks operate in far more dimensions than our brains can comprehend (the most we can picture is 2D or 3D), so their outputs present endless possibilities with varying probabilities.
  • De-Tokenization: The output generated here is nothing but numbers to the network. To us these represent the token IDs that we saw earlier in tokenization. In this process they are re-constructed into words, and the resultant output is then presented to the end user. A simplified sketch of this round trip is shown just after this list.
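
    To make that round trip a little more concrete, below is a minimal, purely illustrative sketch in Java. The tiny vocabulary and whitespace splitting are made up for the example; real LLMs use sub-word tokenizers with vocabularies learned during pre-training, and the "neural network" step here is just pretended. Still, it shows the shape of prompt → token IDs → network → token IDs → text.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ToyTokenizer {

    // Hypothetical vocabulary mapping words to token IDs. Real models use sub-word
    // vocabularies with tens of thousands of entries learned during pre-training.
    private static final Map<String, Integer> VOCAB = new LinkedHashMap<>();
    private static final Map<Integer, String> REVERSE_VOCAB = new LinkedHashMap<>();

    static {
        String[] words = {"you", "are", "feeling", "great", "happy", "today"};
        for (int i = 0; i < words.length; i++) {
            VOCAB.put(words[i], i);
            REVERSE_VOCAB.put(i, words[i]);
        }
    }

    // Tokenization: break the prompt into pieces and map each piece to its token ID.
    static List<Integer> tokenize(String prompt) {
        List<Integer> tokenIds = new ArrayList<>();
        for (String word : prompt.toLowerCase().split("\\s+")) {
            tokenIds.add(VOCAB.getOrDefault(word, -1)); // -1 stands in for an "unknown" token
        }
        return tokenIds;
    }

    // De-Tokenization: turn the network's numeric output back into words.
    static String detokenize(List<Integer> tokenIds) {
        StringBuilder text = new StringBuilder();
        for (int id : tokenIds) {
            text.append(REVERSE_VOCAB.getOrDefault(id, "<unk>")).append(" ");
        }
        return text.toString().trim();
    }

    public static void main(String[] args) {
        List<Integer> inputIds = tokenize("You are feeling");
        System.out.println("Token IDs sent to the network: " + inputIds);

        // Pretend the neural network predicted the next tokens "great today".
        List<Integer> outputIds = List.of(VOCAB.get("great"), VOCAB.get("today"));
        System.out.println("Response shown to the user: " + detokenize(outputIds));
    }
}
```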

Model Evaluation

        As discussed above, there is a plethora of vendors in the AI market today, each offering their own foundation model. Choosing the right one can come down to personal preference, a trial-and-error approach, or industry recommendations for a given use case. Regardless of your comfort level, there are certain factors that should be considered when choosing a foundation model:

  • Capabilities: Some models are text-only while others are multi-modal, in the sense that they can handle text, images, video, etc. While selecting a model, have a look at its capabilities to see if they match your use case.
  • Performance: This is a measure of how fast the model processes a request. For the end users of our application, the responsiveness they experience ultimately depends on how fast the underlying foundation model we have chosen is.
  • Context Window Size: This dictates how many input tokens can be presented to the model. Larger foundation models will usually have a larger context window, giving them more context to operate on.
  • Costs: Use of tokens incurs cost, and this varies between vendors. The usage can accrue over a period of time, so it is essential to have a look at the vendor's pricing model for the selected foundation model. In most cases this comes down to the input and output tokens flowing to and from the LLM (a rough cost sketch follows this list). The Model Tuning section next provides some options that can help us restrict the number of tokens and stop the cost from burning our pockets.
  • Operations: As we dive deeper into the series in the following posts, we will learn about more capabilities a foundation model can offer. Not all models operate in the same way: some can only act as a basic chat bot, while others have more advanced functionality like tool calling, RAG, MCP, etc.
  • Accuracy and Inference: Depending on the model, the response generated may or may not faithfully address the request as inferred by the LLM. Evaluating this factor gives us confidence that the selected model is capable of good accuracy.
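
    To put the cost factor into perspective, here is a rough back-of-the-envelope sketch. The per-1,000-token prices below are made-up placeholders purely for illustration; always check the actual pricing page of the vendor and model you select.

```java
public class TokenCostEstimator {

    // Hypothetical prices per 1,000 tokens; real values differ per vendor and per model.
    private static final double INPUT_PRICE_PER_1K = 0.003;
    private static final double OUTPUT_PRICE_PER_1K = 0.015;

    static double estimateCost(long inputTokens, long outputTokens) {
        return (inputTokens / 1000.0) * INPUT_PRICE_PER_1K
                + (outputTokens / 1000.0) * OUTPUT_PRICE_PER_1K;
    }

    public static void main(String[] args) {
        // Say our application serves 10,000 requests in a month, averaging
        // 800 input tokens and 300 output tokens per request.
        long requests = 10_000;
        double monthlyCost = estimateCost(requests * 800, requests * 300);
        System.out.printf("Estimated monthly cost: $%.2f%n", monthlyCost); // ~$69.00 with these placeholder prices
    }
}
```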

Model Tuning

Remember back in the day, before MP3 players, when we had music players that we could fine tune to our liking, be it the bass, volume, audio output type, etc.? In the context of an LLM, when we interact with it we can relive our childhood, but this time instead of fine tuning the music player we can fine tune certain parameters that control the diversity, variety and creativity of its response. The figure below presents the LLM jukebox, if you will, listing these parameters.



    From the figure above:
  • Temperature: This helps us adjust the predictability of an LLM's output. A higher value results in more creativity and less predictability, and vice versa for a lower value (the sampling sketch after this list shows how Temperature, Top-K and Top-P interact).
  • Top-P: Also termed nucleus sampling, this helps manage the randomness of the LLM output. It does so by establishing a probability threshold and keeping only the most likely tokens whose cumulative probability crosses that limit. For example, say the prompt is "You are feeling ____", with candidates great (probability 0.5), happy (probability 0.3), wonderful (probability 0.15) and excited (probability 0.05). With a Top-P threshold of 0.75, the cumulative probability of great and happy (0.8) crosses the limit, so the model samples only from those two words. A lower value results in less randomness, and vice versa.
  • Top-K: This helps us adjust the number of candidate tokens considered for evaluation, keeping only the K most likely ones at each step. A large value of Top-K gives the model a lot more options to choose from, and a small value vice versa.
  • Stop Sequences: Output generated from the foundation model can run long, eventually resulting in more tokens being generated, and as we know tokens act as a currency when working in GenAI. To stop the foundation model from generating further text, we can supply a series of stop sequences that, when encountered, instruct the model to halt further text generation.
  • Max Tokens: Tokens form a currency when working with LLM applications. This parameter sets an upper limit on the number of tokens that can be used for a request.
  • Frequency Penalty: This helps discourage repetition in the generated text by penalising tokens in proportion to how frequently they have already appeared.
  • Presence Penalty: Very similar to the frequency penalty, with the difference being that it only considers whether the token has occurred at all, ignoring how many times.
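
    To tie Temperature, Top-K and Top-P together, below is a small, self-contained Java sketch of how a model might pick the next token from a handful of candidate scores. The candidate words and scores are made up, and real inference engines work over full vocabularies far more efficiently; this is only meant to illustrate the sampling idea.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class NextTokenSampler {

    record Candidate(String token, double weight) {}

    // Pick the next token from raw model scores, applying Temperature, Top-K and Top-P.
    static String sampleNextToken(List<Candidate> rawScores, double temperature, int topK, double topP) {
        // 1. Temperature: divide the raw scores before exponentiating. A lower temperature
        //    sharpens the distribution (more predictable); a higher one flattens it (more creative).
        List<Candidate> scaled = new ArrayList<>();
        for (Candidate c : rawScores) {
            scaled.add(new Candidate(c.token(), Math.exp(c.weight() / temperature)));
        }
        scaled.sort((a, b) -> Double.compare(b.weight(), a.weight())); // most likely first
        double total = scaled.stream().mapToDouble(Candidate::weight).sum();

        // 2. Top-K: keep only the K most likely candidates.
        List<Candidate> shortlist = new ArrayList<>(scaled.subList(0, Math.min(topK, scaled.size())));

        // 3. Top-P (nucleus sampling): keep the smallest prefix whose cumulative
        //    probability crosses the threshold.
        List<Candidate> nucleus = new ArrayList<>();
        double cumulative = 0.0;
        for (Candidate c : shortlist) {
            nucleus.add(c);
            cumulative += c.weight() / total;
            if (cumulative >= topP) break;
        }

        // 4. Sample from whatever is left, proportionally to its weight.
        double nucleusTotal = nucleus.stream().mapToDouble(Candidate::weight).sum();
        double pick = new Random().nextDouble() * nucleusTotal;
        for (Candidate c : nucleus) {
            pick -= c.weight();
            if (pick <= 0) return c.token();
        }
        return nucleus.get(nucleus.size() - 1).token();
    }

    public static void main(String[] args) {
        // Made-up raw scores for completing "You are feeling ____".
        List<Candidate> candidates = List.of(
                new Candidate("great", 2.0),
                new Candidate("happy", 1.5),
                new Candidate("wonderful", 1.0),
                new Candidate("excited", 0.5));
        // With these numbers the nucleus ends up containing "great" and "happy",
        // mirroring the Top-P example above.
        System.out.println(sampleNextToken(candidates, 0.8, 3, 0.75));
    }
}
```

    In practice the vendors expose these knobs through their APIs rather than expecting you to implement sampling yourself; the exact parameter names and allowed ranges vary between foundation models.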

    Okay, this should conclude our basic understanding of LLMs and a look at their core functioning. In the next part of the series we will look at Prompt Engineering, which deals with how to communicate most effectively with the LLM. Stay tuned...
