Springing into AI - Part 4: LLM - Machine Setup

    Welcome back, wherever you are; I hope you're having a wonderful day. So far in the series, we have taken a theoretical look at the ecosystem of AI and then focused our journey on Generative AI, where we now know, at least at a surface level, how it works and some techniques for interacting with it through effective prompting. In this episode, we will look at working with LLMs.

    There are various paths we can choose on our journey forward, since from here on in the series we will be interacting with LLMs and then creating our own custom applications that play with the APIs on offer. Before making the choice, let's have a look at some of the options we have.

  • Cloud Providers: You can leverage the offerings of various cloud providers like AWS, Azure, etc. to interact with foundation models. For example, AWS presents Bedrock, which lets you select the foundation model you want to use for your application and then have your mind blown when you finally feel that self-accomplishment that YOU did something cool using a cloud provider and a foundation model from your application, be it composed of Lambdas or a server-side hosted application. It is to be noted that these do come with costs, and tokens are the social currency when working with foundation models. AWS also offers PartyRock as a playground for you to play around and have a great time.
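    If you do go the Bedrock route, here is a minimal sketch of what interacting with it from the AWS CLI can look like (assuming the AWS CLI v2 is configured and you have been granted model access in your account and region; the model ID is just an example):

      # List the foundation models available to your account
      aws bedrock list-foundation-models --region us-east-1

      # Send a single prompt via the Converse API (example model ID)
      aws bedrock-runtime converse \
        --model-id anthropic.claude-3-haiku-20240307-v1:0 \
        --messages '[{"role":"user","content":[{"text":"Hello, Bedrock!"}]}]'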
  • Containers: So far, from what I have experimented with, I have found two options for running an LLM locally in a containerized application. The benefit of doing this is, firstly, that you don't incur costs since you are self-hosting the LLM on your machine, and secondly, that as a playground you have the freedom to, oh boy, I always wanted to phrase this how Michael Keaton said it in Batman: "You wanna get nuts, let's get nuts." Just one of those things, you know. So, coming back on track, you have the freedom to experiment as much as you want and not limit yourself monetarily. The two options here are:
    • Docker Model Runner: Using Docker Desktop, you can install the Docker Model Runner plugin and enable it. This plugin gives you the massive benefit of pulling model images from Docker Hub onto your local machine, which you can then interact with. Do note, you will need a juiced-up machine to run some of the models, and the model may also be constrained by the resource capacity of the container (a short command sketch follows after this list).
    • Rancher Desktop: If you don't want to use Docker Desktop for development purposes, you can use Rancher Desktop to install Docker, along with Kubernetes (should you wish). The software offers a wide variety of extensions; the one in particular you would be looking for is "OpenWebUI", which under the hood installs some Docker images on your machine. Coupled with Ollama installed on your machine, either separately or through the software itself, it then offers you the ability to choose whatever model you want from its library and interact with it. Instructions for installation and setup can be found at Rancher Desktop Installation and are pretty self-explanatory.
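    As a rough sketch of the Docker Model Runner workflow (command names per the Docker Model Runner plugin, which needs a recent Docker Desktop with the feature enabled; the model name is just an example from Docker Hub's ai/ namespace, so double-check against docker model --help):

      # Pull a small model image from Docker Hub
      docker model pull ai/smollm2

      # List the models available locally
      docker model list

      # Chat with the model (pass a prompt, or omit it for interactive mode)
      docker model run ai/smollm2 "Explain containers in one sentence"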
  • Software Setup: Instead of going the container or cloud provider route, you can manually install the software directly on your machine:
    • Claude for Desktop, provided by Anthropic, gives you an interface that empowers you to interact with the foundation model. It also has the capability to work with your own tool and function calling. (We will discuss this later in the series when we cross that bridge.)
    • Ollama provides a vast library of models at your disposal that you can download on your machine and then use that as a basis for interacting, developing your applications with. In Part 2 of the series we discussed model evaluation which can help you decide some criteria for selecting the model for your experimentation. Depending on the speed of your network and machine power, you may opt for a heavy model or a simple basic model for your playground. But don't worry as Gandhalf said "Even the smallest person can change the course of the future", and a small scale model is no different for you to practice and polish your skills. Once the software is installed you can run various commands, some of which are below:
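      For example (the model name here is illustrative; browse the Ollama library for the full catalogue):

        # Download a model to your machine
        ollama pull llama3.2

        # Start an interactive chat with the model (pulls it first if needed)
        ollama run llama3.2

        # See which models you have installed, and which are currently loaded
        ollama list
        ollama ps

        # Remove a model you no longer need
        ollama rm llama3.2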
  • API Based Accounts: Various vendors also provide their own API keys that you can obtain when you register with them. In the series to follow, you can then use the API key and the models from their offering. A typical example of this would be OpenAI. This can work nicely, but the only caveat is that you would have to pay after a certain amount of token usage or be bound by a daily quota limit. Based on your interest and playground, this may be something you can try if you want to avoid the options discussed above.
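    To give you a taste of this route, here is what a minimal call to OpenAI's chat completions endpoint can look like once you have a key (the model name is just an example; set OPENAI_API_KEY to the key from your account):

      curl https://api.openai.com/v1/chat/completions \
        -H "Authorization: Bearer $OPENAI_API_KEY" \
        -H "Content-Type: application/json" \
        -d '{"model": "gpt-4o-mini", "messages": [{"role": "user", "content": "Hello there!"}]}'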
For the remainder of the series, I will be working with the Rancher Desktop solution, since my machine is already set up with it as a playground. Feel free to choose the one that works best for you. Do note, if you go the cloud provider route, you may incur costs, as you would have to dish out some chilli changas from your pocket based on their pricing model. In the next part of the series, we will look at creating a basic chat application using Java Spring AI that will interact with our locally running LLM. Stay tuned....
