Ollama lets you run your own LLM locally on your device.
Installation
- Download and install the latest version of Ollama from the official website.
- Run the llama3:8b model:

```
ollama run llama3:8b
```
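If you want to check which models are available locally at any point, you can list them:

```
ollama list
```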
Customization

Change Storage Location of Models
By default, Ollama stores its models on your C:\ drive. You can change this by setting the OLLAMA_MODELS environment variable.
- Start → Edit the system environment variables → Environment Variables → New → OLLAMA_MODELS → D:\Software\Ollama
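If you prefer the terminal over the GUI dialog, the same variable can be set persistently for your user account with setx (a standard Windows command); restart Ollama afterwards so it picks up the new location:

```
# Persist the model directory for the current user; restart Ollama afterwards.
setx OLLAMA_MODELS "D:\Software\Ollama"
```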
Use Models from Registry
Ollama has a registry that you can pull models from and push models to.
You can find the list of available models in the library on the Ollama website.
To run a model from the registry, pass its full name to the run command:
```
ollama run rouge/daybreak-kunoichi-2dpo-7b
```
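If you only want to download a model from the registry without starting an interactive session right away, you can pull it first (shown with the same model name as above):

```
ollama pull rouge/daybreak-kunoichi-2dpo-7b
```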
Create Custom Model

- Create a modelfile based on an existing model:
```
ollama show llama3:8b --modelfile > .\storyteller.modelfile
```

- Add your custom instructions to the modelfile:
SYSTEM """
You are a medieval storyteller. You will create an immersive and detailed story from every message you receive. Use a maximum of 6 sentences.
Respond to my messages and craft a visual and engaging story of them.
"""- Create a new model based on the modelfile
```
ollama create storyteller --file .\storyteller.modelfile
```

- Test your new model:
```
ollama run storyteller
```
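For reference, the finished storyteller.modelfile will look roughly like the sketch below. The exact contents depend on what ollama show emitted for llama3:8b (the generated file also carries the base model's TEMPLATE and PARAMETER lines), so treat this as an illustration rather than the literal file:

```
# storyteller.modelfile (sketch -- the generated file contains additional lines)
FROM llama3:8b

SYSTEM """
You are a medieval storyteller. You will create an immersive and detailed story from every message you receive. Use a maximum of 6 sentences.
Respond to my messages and craft a visual and engaging story of them.
"""
```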
Running Models from Hugging Face

Hugging Face is a website where people can upload models. You can download these models and run them locally.
Download the model from the website. You need a file with the .gguf extension. At this moment I have no clue why.
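One way to download such a file from the command line is the huggingface-cli tool from the huggingface_hub package; the repository id and file name below are placeholders that you need to replace with the model you actually want:

```
pip install -U huggingface_hub
# <repo-id> and <file>.gguf are placeholders for the Hugging Face repository and the GGUF file inside it.
huggingface-cli download <repo-id> <file>.gguf --local-dir .
```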
Create a modelfile.
```
touch lexi-llama3-uncensored.modelfile
```

Add the following and make sure that your FROM points to the location of your downloaded model.
```
# Modelfile
FROM "./Lexi-Llama-3-8B-Uncensored_F16.gguf"
PARAMETER stop "<|im_start|>"
PARAMETER stop "<|im_end|>"
TEMPLATE """
<|im_start|>system
{{ .System }}<|im_end|>
<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
"""
```

Create a model in Ollama from your modelfile.
```
ollama create lexi-llama3-uncensored -f lexi-llama3-uncensored.modelfile
```

Run the model.
```
ollama run lexi-llama3-uncensored
```

Ollama + HuggingFace ✅🔥. Create Custom Models From Huggingface… | by Sudarshan Koirala | Medium
API
Once your Ollama application is running, you can query it via the API.
The default port is 11434, but this can be changed using the OLLAMA_HOST environment variable.
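For example, to run the server on a different address and port for the current PowerShell session (OLLAMA_HOST takes a host:port value; the port below is just an illustration):

```
# Bind the Ollama server to port 8080 for this session only.
$env:OLLAMA_HOST = "127.0.0.1:8080"
ollama serve
```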
Completion
You can request a completion with the following commands.
```
curl http://localhost:11434/api/generate -d '{
  "model": "storyteller",
  "prompt": "The Duke of Cambridge enters my court hall. He kneels in front of me."
}'
```

Alternatively, in PowerShell:
```
Invoke-RestMethod -Uri "http://localhost:11434/api/generate" -Method Post -Body '{"model": "storyteller", "prompt": "The Duke of Cambridge enters my court hall. He kneels in front of me."}' -ContentType "application/json"
```

Note that by default the response is streamed. If you want to receive a single response with the full text, set the stream parameter to false.
```
Invoke-RestMethod -Uri "http://localhost:11434/api/generate" -Method Post -Body '{"model": "storyteller", "prompt": "The Duke of Cambridge enters my court hall. He kneels in front of me.", "stream": false}' -ContentType "application/json"
```

Optionally, you can override the system message in the request to specify what instructions the model should follow.
```
Invoke-RestMethod -Uri "http://localhost:11434/api/generate" -Method Post -Body '{"model": "storyteller", "prompt": "The Duke of Cambridge enters my court hall. He kneels in front of me.", "stream": false, "system": "You are a dog and you only bark"}' -ContentType "application/json"
```
```
model                : storyteller
created_at           : 2024-06-11T22:52:39.9366121Z
response             : RUFF RUFF RUFF! *ears perked up* WOOOOO! *barks excitedly, trying to get the Duke's attention* RUFF RUFF RUFF! *wags tail* WOOOO! *tries to lick the Duke's face* RUFF RUFF RUFF!
done                 : True
done_reason          : stop
context              : {128006, 9125, 128007, 881...}
total_duration       : 1003303100
load_duration        : 2494700
prompt_eval_count    : 36
prompt_eval_duration : 375542000
eval_count           : 65
eval_duration        : 624148000
```
Chat

More often you will want to perform a chat operation, where you keep a list of messages as history that is used as context.
```
Invoke-RestMethod -Uri "http://localhost:11434/api/chat" -Method Post `
  -Body '{"model": "storyteller", "messages": [ {"role": "user", "content": "The Duke of Cambridge enters my court hall. He kneels in front of me."}, {"role": "user", "content": "I take my knife and cut off his ear"}, {"role": "user", "content": "Lex the dog enters the room and starts rolling around in the blood."}], "stream": false}' `
  -ContentType "application/json"
```
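To continue a conversation, append the assistant's reply from the previous response to the messages array before adding the next user message. The sketch below illustrates the pattern; the assistant content is a placeholder you replace with the text the model actually returned:

```
Invoke-RestMethod -Uri "http://localhost:11434/api/chat" -Method Post `
  -Body '{"model": "storyteller", "messages": [ {"role": "user", "content": "The Duke of Cambridge enters my court hall. He kneels in front of me."}, {"role": "assistant", "content": "<previous reply from the model>"}, {"role": "user", "content": "I ask him why he has come."}], "stream": false}' `
  -ContentType "application/json"
```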
Open WebUI

GitHub - open-webui/open-webui: User-friendly WebUI for LLMs (Formerly Ollama WebUI)
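If you want to try it against your local Ollama server, the project's README documents a Docker command along these lines (a sketch from memory; check the README for the current image name and flags):

```
# Starts Open WebUI on http://localhost:3000 and lets the container reach the local Ollama server.
docker run -d -p 3000:8080 `
  --add-host=host.docker.internal:host-gateway `
  -v open-webui:/app/backend/data `
  --name open-webui ghcr.io/open-webui/open-webui:main
```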