Running ollama on Pelle

Summary

- Ollama needs to be installed in the project folder.
- The location of the ollama models needs to be redirected to a folder within the project: export OLLAMA_MODELS=/proj/naiss-XXXXX/....
- ollama serve must run on an allocated GPU or CPU node.
- Interaction with your ollama serve can be done:
    - on the allocated node itself
    - on the login node
    - on your own computer
- Non-interactive tasks can be run as batch jobs.
User installation of ollama

- Log in to Pelle and select a folder under your project allocation.
- Download ollama.
- Make a folder where you will unpack it.

Your ollama binary is in /proj/naiss-XXXXX/userid/nobackup/bin/ollama
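The download and unpack commands are not shown above; here is a sketch assuming the official Linux tarball from ollama.com (the project path is an example - adjust it to your own allocation and user directory):

```shell
# Download the official Linux build and unpack it under your project folder.
# The path below is an example - use your own project and user directories.
cd /proj/naiss-XXXXX/userid/nobackup
curl -L -o ollama-linux-amd64.tgz https://ollama.com/download/ollama-linux-amd64.tgz
mkdir -p ollama
tar -C ollama -xzf ollama-linux-amd64.tgz
# The binary ends up in ollama/bin/ollama
./ollama/bin/ollama --version
```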
Running ollama on a compute node

- Start an interactive node with a GPU:

interactive -A naiss-XXXXX -p gpu -t 1:00:00 --gpus=l40s:1
# or on the old haswell nodes
interactive -A naiss-XXXXX -p haswell -c 16 -t 1:00:00 --gpus=t4:1

# example output from running the last command
salloc: Pending job allocation 4068884
salloc: job 4068884 queued and waiting for resources
salloc: job 4068884 has been allocated resources
salloc: Granted job allocation 4068884
salloc: Waiting for resource configuration
salloc: Nodes p2033 are ready for job
Take note of the JOBID (4068884) and the node address (p2033).
- Start ollama serve in the interactive session:

export OLLAMA_MODELS=/proj/naiss-XXXXXX/user/nobackup/ollama_models
mkdir -p $OLLAMA_MODELS
# start ollama serve
unset CUDA_VISIBLE_DEVICES
/proj/naiss-XXXXXX/userid/nobackup/ollama/bin/ollama serve
- Check at the end of the output that ollama discovered the GPU:
...
time=2026-03-20T07:58:02.030+01:00 level=INFO source=types.go:42 msg="inference compute" id=GPU-92ada8a8-9cda-effe-830e-0bdfcd3e0a96 filter_id="" library=CUDA compute=7.5 name=CUDA0 description="Tesla T4" libdirs=ollama,cuda_v13 driver=13.1 pci_id=0000:81:00.0 type=discrete total="15.0 GiB" available="14.6 GiB"
time=2026-03-20T07:58:02.030+01:00 level=INFO source=routes.go:1832 msg="vram-based default context" total_vram="15.0 GiB" default_num_ctx=4096
- Find the JOBID of the reservation if you lost track
# Check the JOBID
squeue --me
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
4068884 haswell interact pmitev R 0:06 1 p2034
- Get an additional shell in the same allocation. This will start a bash shell in the same job reservation.
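The command itself is not shown above; one way to do it with plain Slurm (a sketch - srun's --overlap flag lets the new step share the resources of the running job, and 4068884 is the example JOBID from earlier):

```shell
# Open an extra bash shell inside the existing job allocation.
# Replace 4068884 with your own JOBID from squeue --me.
srun --jobid=4068884 --overlap --pty bash
```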
- Test that you can communicate with the ollama service
/proj/naiss-XXXXXX/userid/nobackup/ollama/bin/ollama list
NAME ID SIZE MODIFIED
qwen2.5-coder:14b 9ec8897f747e 9.0 GB 23 hours ago
gemma3:latest a2af6cc3eb7f 3.3 GB 36 hours ago
mistral:latest 6577803aa9a0 4.4 GB 37 hours ago
llama3.2-vision:latest 6f2f9757ae97 7.8 GB 37 hours ago
llama4:latest bf31604e25c2 67 GB 39 hours ago
llama3.2:latest a80c4f17acd5 2.0 GB 39 hours ago
llama4:scout bf31604e25c2 67 GB 40 hours ago
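If the list is empty, you first need to pull a model. A sketch (the model name is just an example; downloads are stored under $OLLAMA_MODELS):

```shell
# Pull a model into $OLLAMA_MODELS and run a one-off prompt on the node.
/proj/naiss-XXXXXX/userid/nobackup/ollama/bin/ollama pull llama3.2:latest
/proj/naiss-XXXXXX/userid/nobackup/ollama/bin/ollama run llama3.2:latest "Say hello"
```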
Expose ollama serve to Pelle login node or personal computer

Warning: This approach exposes the ollama instance to everyone on Pelle - use with caution.
- Start an interactive node with GPU as before:

interactive -A naiss-XXXXX -p gpu -t 1:00:00 --gpus=l40s:1
# or on the old haswell nodes
interactive -A naiss-XXXXX -p haswell -c 16 -t 1:00:00 --gpus=t4:1
- On the node, start ollama serve with an additional export OLLAMA_HOST=0.0.0.0:
export OLLAMA_HOST=0.0.0.0
export OLLAMA_MODELS=/proj/naiss-XXXXXX/user/nobackup/ollama_models
mkdir -p $OLLAMA_MODELS
# start ollama serve
unset CUDA_VISIBLE_DEVICES
/proj/naiss-XXXXXX/userid/nobackup/ollama/bin/ollama serve
- On the login node you can check on which node you got the allocation:
squeue --me
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
4068884 haswell interact pmitev R 0:06 1 p2034
- Run ollama client on Pelle login node
export OLLAMA_HOST=p2034
# start ollama client
/proj/naiss-XXXXXX/userid/nobackup/ollama/bin/ollama
# check models
/proj/naiss-XXXXXX/userid/nobackup/ollama/bin/ollama list
- Run on your personal computer with ollama serve running on a Pelle GPU node. To expose ollama serve to your personal computer, set up an SSH port forward in a shell on your computer.
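A sketch of the forwarding command using OpenSSH local port forwarding (the login-node hostname is an assumption - use the address you normally use to reach Pelle):

```shell
# Forward local port 21434 to ollama's default port 11434 on compute node p2034,
# tunneling through the Pelle login node (hostname is an assumption - adjust to yours).
ssh -N -L 21434:p2034:11434 userid@pelle.uppmax.uu.se
```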
Note that in the example above we select port 21434 on the local computer to avoid potential conflicts with a local ollama installation running on your computer. Make sure you change p2034 to the node running your ollama serve.

On your computer, make sure you connect to localhost:21434 when running ollama, e.g. OLLAMA_HOST=localhost:21434 ollama list:
NAME ID SIZE MODIFIED
qwen2.5-coder:14b 9ec8897f747e 9.0 GB 23 hours ago
gemma3:latest a2af6cc3eb7f 3.3 GB 36 hours ago
mistral:latest 6577803aa9a0 4.4 GB 37 hours ago
llama3.2-vision:latest 6f2f9757ae97 7.8 GB 37 hours ago
llama4:latest bf31604e25c2 67 GB 39 hours ago
llama3.2:latest a80c4f17acd5 2.0 GB 39 hours ago
llama4:scout bf31604e25c2 67 GB 40 hours ago
Running non-interactive jobs

Example sbatch submit.sh template:
#!/bin/bash -l
#SBATCH -A naiss-XXXX
#SBATCH -p gpu --gpus=l40s:1
#SBATCH -t 10:00:00
# Get the first free port within a range
OPORT=$(comm -23 <(seq 21434 21444 | sort) <(ss -tan | awk '{print $4}' | cut -d':' -f2 | sort -u) | head -n 1)
export OLLAMA_HOST=localhost:$OPORT
# start the service
export OLLAMA_MODELS=/proj/naiss-XXXXXX/user/nobackup/ollama_models
ollama serve &
# Wait a bit for the service to be ready
sleep 5
# Run!
ollama list
# or your python program
source venv/activate.sh
python myprogram.py
Submit the job with sbatch submit.sh.
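For the "your python program" step, the script can talk to the running server over Ollama's REST API. A minimal standard-library sketch - the model name is an example, and the host should match the OLLAMA_HOST set in the batch script:

```python
import json
import urllib.request

def build_payload(prompt, model="llama3.2:latest"):
    # Non-streaming request body for Ollama's /api/generate endpoint.
    return {"model": model, "prompt": prompt, "stream": False}

def generate(prompt, model="llama3.2:latest", host="http://localhost:21434"):
    # POST the prompt to the ollama server and return the generated text.
    data = json.dumps(build_payload(prompt, model)).encode()
    req = urllib.request.Request(
        f"{host}/api/generate",
        data=data,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

With ollama serve running and the model pulled, generate("Why is the sky blue?") returns the answer as a string; inside the batch script above, pass host="http://" + os.environ["OLLAMA_HOST"] so the client uses the same port the server picked.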