Springing into AI - Part 3: Prompt Engineering

     Communication forms an essential part of our daily existence. When two people interact, person A provides an input on a topic of interest, person B absorbs and processes it, and then offers a response. The quality of that communication depends on how well and in what manner we articulate ourselves, and it is no different when we interact with the Foundation Model that we learned about in Part 2 of the series. In this third part, we will look at Prompt Engineering, along with some configuration parameters we can use to tweak the output the model provides to us.

Prompt Engineering

        When we interact with the foundation model, we do so in the form of prompts. A prompt is simply a user's input presented to a pre-trained model, which makes its best attempt to answer it. Prompt engineering teaches us how to best communicate what we need to the foundation model, so that it can provide us a high quality response in the structure that we require. The figure below illustrates the core elements of a prompt.

[Figure: core elements of a prompt - Context, Instruction, Example, Output Type]
    From the figure above:
  • Context: Here we steer the foundation model towards a particular direction. In a way this helps the foundation model narrow its scope from a vast range of information down to output that resonates with the topic of interest. An example: "We are trying to build an e-commerce website that can help us maximize our profits from sales of second hand clothing items, while providing the end user a world class quality service".
  • Instruction: The input we provide to the foundation model, asking it to generate output based on what we ask of it. There are different prompting techniques that can be applied here. The most commonly used are:
    • Zero shot prompting: This is as simple as asking the model directly for an output based on the input we provide. Example: "Please help me understand the fundamentals of calculus".
    • One/Few shot prompting: Here, we provide one or more examples to the foundation model along with our prompt so it knows how we expect the output of its response to look. An example could be "Please help me understand the process of ordering food at a restaurant. Example 1: Ask the waiter for the menu if it is not presented already, make your selection and present it to the waiter. Example 2: Ask the waiter for the chef's special and any recommendations they would suggest, and confirm your choice with the waiter".
    • Chain of Thought: In this prompting technique, we let the model reason through its output and present it to us in a step by step format. An example could be "I am a newbie developing a website, help me build an address form. Please present the instructions in a step by step format that is easy for a novice to follow sequentially".
    • Negative Prompting: Here we constrain the prompt so that the response is limited to certain behavior. A use case may arise, for example, if you are trying to create a chatbot for your company and do not want the underlying model offering responses that are beyond the scope of your offering. Part of the prompt could be "Only limit the bounds of the response to subjects offered at Cape Peninsula University of Technology".
  • Example: When we communicate with the model, we can provide some examples so it can better understand what we are looking for. Imagine a teacher explaining to a student the fundamentals of the laws of motion in science with practical examples. Some of this can be seen above in the prompting techniques.
  • Output Type: This lets us dictate to the foundation model the format in which we want the output presented to us. For enthusiastic developers, this could be a structured JSON format, as an example. A small sketch after this list shows how these elements can be assembled into a single prompt.
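
To make the elements above a little more concrete, below is a minimal sketch in Java that assembles context, instruction, an example, and an output type hint into a single prompt string. The PromptBuilder class and its buildPrompt method are purely illustrative names I am assuming for this post, not part of any particular library.

    // Illustrative sketch: assembling a prompt from its core elements.
    // Class and method names are hypothetical, not tied to any library.
    public class PromptBuilder {

        static String buildPrompt(String context, String instruction,
                                  String example, String outputType) {
            // Label each element so the model can clearly distinguish them.
            return "Context: " + context + "\n"
                 + "Instruction: " + instruction + "\n"
                 + "Example: " + example + "\n"
                 + "Output format: " + outputType;
        }

        public static void main(String[] args) {
            String prompt = buildPrompt(
                "We are building an e-commerce website for second hand clothing.",
                "Suggest three product categories we should launch with.",
                "Winter jackets - durable, high resale demand.",
                "Respond as a JSON array of objects with 'category' and 'reason' fields.");
            System.out.println(prompt);
            // The resulting string is what we would send to the foundation model.
        }
    }

The same structure works for any of the prompting techniques above: zero shot simply omits the example, while few shot adds more of them.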

Configuration

So far, we have looked at the way the foundation model works and some of the techniques we can adopt to get effective results for the queries we present to it. Each foundation model also has parameters that control the diversity, variety, and creativity of its responses, and how we configure them helps us control the output the model generates. Some cases may require the model to have a great degree of creative freedom in the text it produces, while in others we may want to be stricter about the range of words it chooses. Some of the common parameters we can control are:

  • Temperature: We discussed the process of tokenization and de-tokenization in part 2 of the series. Temperature adjusts how randomly the model picks the next token from its candidate outputs. For the same prompt, it might produce "You are feeling great", "You are feeling wonderful", or "You are feeling on top of the world". A lower value results in less randomness and more predictable wording, and a higher value the opposite.
  • Top-P: Also termed nucleus sampling, this adjusts how many tokens the model considers by keeping only the smallest set of candidates whose cumulative probability reaches the Top-P threshold before it picks the next token. You can picture this as a pool of candidate words, with the model choosing from that pool according to their probabilities.
  • Top-K: This limits the number of tokens considered for evaluation to the K most likely candidates. Remember, a token can be a whole word or part of one, depending on how the text was tokenized. As the model generates its response in real time, it makes these selections and predictions of tokens within the bounds of these configurations. A large Top-K value gives the model a much larger pool of options to choose from, and a small value the opposite.
  • Stop Sequences: Responses from the foundation model can become lengthy, resulting in more tokens being generated, and as we know tokens act as the currency when working with GenAI. We can supply a set of stop sequences that, when encountered, instruct the model to stop generating any further text. A small sketch after this list shows how these parameters might be bundled together when calling a model.
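
As a rough illustration of how these knobs come together, here is a small Java sketch that bundles them into one options object. The InferenceOptions record and its field names mirror settings that many model APIs expose, but the exact names and valid ranges differ per provider, so treat this as an assumption rather than a reference to any specific API.

    import java.util.List;

    // Hypothetical bundle of the configuration parameters discussed above.
    // Real providers expose similar settings, but names and ranges differ.
    public record InferenceOptions(
            double temperature,            // randomness of token selection, e.g. 0.0 - 1.0
            double topP,                   // nucleus sampling: cumulative probability cutoff
            int topK,                      // only the K most likely tokens are considered
            List<String> stopSequences) {  // generation stops when one of these is produced

        public static void main(String[] args) {
            // Stricter, more predictable output: low temperature, narrow sampling pool.
            InferenceOptions strict = new InferenceOptions(0.2, 0.8, 20, List.of("\n\n"));

            // More creative output: higher temperature, wider pool of candidate tokens.
            InferenceOptions creative = new InferenceOptions(0.9, 0.95, 200, List.of("END"));

            System.out.println("Strict config:   " + strict);
            System.out.println("Creative config: " + creative);
            // These options would accompany the prompt in the request sent to the model.
        }
    }

Lower temperature and smaller Top-K/Top-P values keep the model close to its most likely tokens, which suits factual or structured output, while higher values give it room to vary its wording.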
    Okay, there is a lot more involved, but at a bare minimum the knowledge above should solidify our understanding enough that we are empowered to start developing our own custom applications. In the next part of the series, we will look at some of the tools at our disposal as we reach the phase where we start to get our hands dirty. Can barely wait, unable to control the excitement? Me too... stay tuned....
