Jan 17, 2025 - 14:55
Making vLLM work on WSL2

Running a small Llama 3 model for demonstration purposes on WSL2, using vLLM.

1. Requirements

  • WSL version 2
  • Python: 3.8–3.12
  • GPU newer than a GTX 1080 (I did not manage to make it work on a 1080: vLLM told me the hardware was too old. :-( Ollama is less picky).

2. Preflight checks

Checking NVCC

In WSL, do:

nvcc --version

▶️ Command not found?

Fixing NVCC: CUDA toolkit installation

Visit NVIDIA's official website to download and install the CUDA toolkit for WSL (nvcc ships with the toolkit; the GPU driver itself is the one installed on the Windows side). Choose Linux > x86_64 > WSL-Ubuntu > 2.0 > deb (network).

Follow the instructions provided on the page.
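For reference, the deb (network) route usually boils down to something like the sketch below; treat it as a sketch only, since the exact keyring file, repository URL, and toolkit version are whatever NVIDIA's page currently shows:

wget https://developer.download.nvidia.com/compute/cuda/repos/wsl-ubuntu/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt-get update
sudo apt-get install -y cuda-toolkit-12-6   # use the toolkit version NVIDIA's page suggests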

Add the following lines to your .bashrc:

export PATH="/usr/local/cuda-12.6/bin:$PATH"
export LD_LIBRARY_PATH="/usr/local/cuda-12.6/lib64:$LD_LIBRARY_PATH"

⚠️ ⚠️ ⚠️ Check the content of "/usr/local" to be sure you actually have a "cuda-12.6" folder. Yours might carry a different version number; adjust the two export lines accordingly.
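A quick way to check which toolkit folder is actually there:

ls /usr/local/ | grep cuda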

Reload your configuration and check that everything is working as expected:

source ~/.bashrc
nvcc --version
nvidia-smi.exe

ℹ️ "nvidia-smi" isn't available on WSL so just verify that the .exe one detects your hardware. Both commands should displayed gibberish but no apparent errors.

3. Creating the environment

python3 --version # copy the version
conda create -n myenv python=3.10 -y # Update the python version with your own

▶️ Don't have conda?

mkdir -p ~/miniconda3
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda3/miniconda.sh
bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
rm ~/miniconda3/miniconda.sh
source ~/miniconda3/bin/activate
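Optionally, if you also want conda available in every future shell (not just the current one), let it hook into your .bashrc once:

conda init bash   # then open a new terminal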

conda create -n myenv python=3.10 -y # Update the python version with your own
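Whichever path you took, remember to activate the environment before installing anything into it (myenv being the example name used above):

conda activate myenv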

4. Installing vLLM

Installing vLLM:

pip install vllm

Trying to start the inference server with a tiny LLM:

vllm serve facebook/opt-125m
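If the server comes up cleanly, it exposes an OpenAI-compatible API on localhost:8000 by default; from a second terminal you can poke it with something like this (the prompt and max_tokens are arbitrary examples):

curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "facebook/opt-125m", "prompt": "WSL2 is", "max_tokens": 20}'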

▶️ Runtime crash of vLLM?

    from torch._C import *  # noqa: F403
    ^^^^^^^^^^^^^^^^^^^^^^
ImportError: /home/xxxx/vllm_serve/lib/python3.11/site-packages/torch/lib/../../nvidia/cusparse/lib/libcusparse.so.12: undefined symbol: __nvJitLinkComplete_12_4, version libnvJitLink.so.12

Well, this is where sh*t hits the fan. I recommend trying the fix below. If it doesn't work, I can only wish you good luck: this is another one of those technologies whose error messages seem to have been written by a depressed data scientist. Scroll to the top of the stack trace and hope for the best. Google is your friend. Have faith.

Potential vLLM serve fix:

python -m pip uninstall torch torchvision torchaudio
python -m pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu121
vllm serve facebook/opt-125m # Should be working now...

Upstream issue for this error: https://github.com/pytorch/pytorch/issues/111469
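If the nightly reinstall doesn't do it, another workaround reported for this exact __nvJitLinkComplete symbol error is that an older system libnvJitLink (for example the one under the /usr/local/cuda-*/lib64 path we added to LD_LIBRARY_PATH earlier) shadows the newer copy bundled with the PyTorch wheels. It costs nothing to retry with that path out of the way:

# retry once without the system CUDA libs on the library path
LD_LIBRARY_PATH="" vllm serve facebook/opt-125m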

5. Running vLLM

Let's try it with a tiny Llama model from Meta (Facebook).

  1. Create an account on Hugging Face and then create an API key (an access token).
    Then go to the page of the model you want to try out.

  2. For us: https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct

  3. At the top of the page there is a gated-access form you have to fill in.

Accepting the Facebook agreement, for the one time they do something good for humanity.

When done, you'll get an email about 10 minutes later telling you you've been granted access.
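Once the gate opens, a minimal sketch of the final run looks like this: log in with the API key you created so vLLM can pull the gated weights, then serve the model. The --max-model-len value is just an example to keep memory usage reasonable on small GPUs; drop it if your card has room.

huggingface-cli login   # paste your Hugging Face API key when prompted
vllm serve meta-llama/Llama-3.2-1B-Instruct --max-model-len 4096

Once it's up, the same OpenAI-compatible endpoint answers chat requests:

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "meta-llama/Llama-3.2-1B-Instruct", "messages": [{"role": "user", "content": "Say hello from WSL2"}]}'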