Ollama lets you run your own LLM locally on your device.

Installation

  • Download and install the latest version of Ollama from the official website.
  • Run the llama3:8b model
ollama run llama3:8b
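
To check the install and see which models you have pulled locally, Ollama ships two handy commands:

ollama --version
ollama list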

Customization

Change Storage Location of Models

By default Ollama stores its models on your C:\ drive (under %USERPROFILE%\.ollama\models). You can change this by setting the OLLAMA_MODELS environment variable.

  • Start → Edit the system environment variables → Environment Variables… → New… → set OLLAMA_MODELS to D:\Software\Ollama
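
If you prefer the terminal, you can set the same user-level variable from PowerShell (restart Ollama afterwards so it picks up the change); D:\Software\Ollama is just the example location from above:

[Environment]::SetEnvironmentVariable("OLLAMA_MODELS", "D:\Software\Ollama", "User")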

Use Models from Registry

Ollama has a registry from which you can pull models and to which you can push your own.

You can find the list of available models in the library (ollama.com/library). To run a model from the registry, pass its name to the run command.

ollama run rouge/daybreak-kunoichi-2dpo-7b
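
run pulls the model automatically if it is missing. If you only want to download a model without starting an interactive session, pull does exactly that:

ollama pull rouge/daybreak-kunoichi-2dpo-7b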

Create Custom Model

  • Create a modelfile based on an existing model
ollama show llama3:8b --modelfile > .\storyteller.modelfile
  • Add your custom instructions to the modelfile
SYSTEM """
You are a medieval storyteller. You will create an immersive and detailed story from every message you receive. Use a maximum of 6 sentences.
Respond to my messages and craft a visual and engaging story of them.
"""
  • Create a new model based on the modelfile
ollama create storyteller --file .\storyteller.modelfile
  • Test your new model
ollama run storyteller
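
Instead of an interactive session, ollama run also accepts a one-off prompt as an argument, which is handy for a quick smoke test of the new system message (the prompt here is just an example):

ollama run storyteller "A dragon lands on the castle walls."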

Running Models from Hugging Face

Hugging Face is a platform where people publish models. You can download these models and run them locally with Ollama.

Download the model from the website. It has to be a GGUF file: Ollama runs models through llama.cpp, and GGUF is the single-file model format llama.cpp reads, so that is what the FROM line can import directly.

Create an empty modelfile (use New-Item instead of touch if you are in plain PowerShell).

touch lexi-llama3-uncensored.modelfile

Add the following, and make sure the FROM line points to the location of your downloaded model. The {{ .System }} and {{ .Prompt }} placeholders are Go template variables that Ollama fills in with the system message and the user prompt.

# Modelfile
FROM "./Lexi-Llama-3-8B-Uncensored_F16.gguf"

PARAMETER stop "<|im_start|>"
PARAMETER stop "<|im_end|>"

TEMPLATE """
<|im_start|>system
{{ .System }}<|im_end|>
<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
"""

Create a model in Ollama from your modelfile.

ollama create lexi-llama3-uncensored -f lexi-llama3-uncensored.modelfile

Run the model.

ollama run lexi-llama3-uncensored

Reference: Ollama + HuggingFace ✅🔥. Create Custom Models From Huggingface… by Sudarshan Koirala, Medium

API

Once the Ollama application is running, you can query it via its HTTP API. The default port is 11434, but this can be changed with the OLLAMA_HOST environment variable.
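
A quick way to verify the server is reachable: the root endpoint answers with a short status string, and /api/tags lists the models available locally.

curl http://localhost:11434
curl http://localhost:11434/api/tags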

Completion

You can request a completion with the following command.

curl http://localhost:11434/api/generate -d '{
  "model": "storyteller",
  "prompt": "The Duke of Cambridge enters my court hall. He kneels in front of me."
}'

Alternatively, in PowerShell:

Invoke-RestMethod -Uri "http://localhost:11434/api/generate" -Method Post -Body '{"model": "storyteller", "prompt": "The Duke of Cambridge enters my court hall. He kneels in front of me."}' -ContentType "application/json"

Note that by default the response is streamed as a series of JSON objects. If you want a single response containing the full text, set the stream parameter to false.

Invoke-RestMethod -Uri "http://localhost:11434/api/generate" -Method Post -Body '{"model": "storyteller", "prompt": "The Duke of Cambridge enters my court hall. He kneels in front of me.", "stream": false}' -ContentType "application/json"

Optionally, you can override the system message in the request to change the instructions the model follows.

Invoke-RestMethod -Uri "http://localhost:11434/api/generate" -Method Post -Body '{"model": "storyteller", "prompt": "The Duke of Cambridge enters my court hall. He kneels in front of me.", "stream": false, "system": "You are a dog and you only bark"}' -ContentType "application/json"
The parsed response looks like this:

model                : storyteller
created_at           : 2024-06-11T22:52:39.9366121Z
response             : RUFF RUFF RUFF! *ears perked up* WOOOOO! *barks excitedly, trying to get the Duke's attention* RUFF RUFF RUFF! *wags tail* WOOOO! *tries to lick the Duke's face* RUFF RUFF RUFF!
done                 : True
done_reason          : stop
context              : {128006, 9125, 128007, 881...}
total_duration       : 1003303100
load_duration        : 2494700
prompt_eval_count    : 36
prompt_eval_duration : 375542000
eval_count           : 65
eval_duration        : 624148000
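
Since Invoke-RestMethod parses the JSON for you, extracting only the generated text is a matter of reading the response property:

$resp = Invoke-RestMethod -Uri "http://localhost:11434/api/generate" -Method Post -Body '{"model": "storyteller", "prompt": "The Duke of Cambridge enters my court hall.", "stream": false}' -ContentType "application/json"
$resp.response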

Chat

More often you will want to perform a chat operation, where you keep a list of messages as history that the model uses as context.

Invoke-RestMethod -Uri "http://localhost:11434/api/chat" -Method Post ` -Body '{"model": "storyteller", "messages": [ {"role": "user", "content": "The Duke of Cambridge enters my court hall. He kneels in front of me."}, {"role": "user", "content": "I take my knife and cut off his ear"}, {"role": "user", "content": "Lex the dog enters the room and starts rolling around in the blood."}], "stream": false}' ` -ContentType "application/json"

Open WebUI

If you would rather chat in the browser than in the terminal, Open WebUI (formerly Ollama WebUI) is a user-friendly web interface that runs on top of your local Ollama instance: GitHub - open-webui/open-webui.
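
At the time of writing, the project's README suggests starting it via Docker along these lines (check the repository for the current command); the UI is then served on port 3000:

docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main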