Running Qwen3-VL with LM Studio

LM Studio provides a user-friendly interface and a CLI for running local language models, making it easy to integrate with Eclaire. This guide walks you through installing, configuring, and running models in Eclaire with LM Studio.

Requirements

This guide requires Eclaire v0.4.0 or above.

What is LM Studio?

LM Studio is a comprehensive platform that simplifies running large language models locally on your machine through three main interfaces: a desktop UI for browsing and managing models, a CLI for automation and scripting, and an OpenAI-compatible API for seamless integration with applications. It provides an intuitive interface for downloading models from Hugging Face and supports both MLX-optimized models for Apple Silicon and GGUF format models, offering flexibility in model selection and deployment.

How Eclaire Uses LM Studio and Qwen3-VL

Eclaire can leverage LM Studio as an AI provider for both backend services and workers. By running LM Studio’s local server, you can process requests entirely on your machine without sending data to external APIs. This approach ensures privacy while giving you full control over model selection, performance tuning, and resource allocation.

Qwen3-VL is a powerful multimodal model that handles multiple AI capabilities in a single package. Eclaire can leverage it for conversational assistance, photo and image analysis, OCR, and answering questions about your content.

This means you can run one model that covers all of these use cases, simplifying your setup and reducing resource requirements.

Eclaire running Qwen3-VL with LM Studio

Eclaire UI showing AI assistant using Qwen3-VL to analyse photos, perform OCR and answer questions.

Step 1: Installation

Install LM Studio using Homebrew (for macOS):

brew update
brew install --cask lm-studio

This installs both the LM Studio desktop application and the lms command-line tool.

LM Studio is available for macOS, Linux, and Windows. For installation instructions specific to your platform, check the official LM Studio documentation.
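
To confirm the lms command ended up on your PATH, you can print its help output:

lms --help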

Step 2: Configure Settings

After installation, launch the LM Studio application and configure the essential settings:

  1. Open the LM Studio app
  2. Click the Settings ⚙️ (gear) icon in the bottom-right
  3. Under Developer / Local LLM Service (headless), check “Enable Local LLM Service” if you want to run it as a server without the LM Studio app open
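
If you enable the headless service, you can verify from a terminal that LM Studio and its local server are running; lms status prints the current state (assuming the CLI from Step 1 is installed):

lms status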

Hardware View

Before downloading models, it’s important to check your available resources. In the Settings view, navigate to the Hardware tab to see your VRAM capacity and available memory. When selecting models to download, ensure you have enough memory available to run them efficiently. Models that exceed your available VRAM will run slowly or may fail to load.
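
If you prefer the terminal, here is a rough check of total memory on macOS (illustrative and macOS-only; use the Hardware tab for detailed VRAM information):

sysctl -n hw.memsize | awk '{ printf "%.0f GB\n", $1 / 1073741824 }'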

LM Studio Hardware View

Step 3: Running Models

LM Studio offers two approaches for running models: a graphical interface for browsing and selecting models, and a command-line interface for scripting and automation.

Option A: Running Models from the UI

The LM Studio interface makes it easy to discover and run models:

Browse and Search Models

Navigate to the “Discover” tab to browse and search models published by the LM Studio community on Hugging Face. The models browser shows detailed information about each model, helping you select the right one for your hardware capabilities and use case requirements.

LM Studio Models Browser

Developer View

The Developer view is a centralized interface for running and managing your local AI server, letting you oversee your local AI infrastructure without touching the command line.

LM Studio Developer View

This approach is ideal when you’re experimenting with different models or need visual feedback on resource usage.

Option B: Running Models from the CLI

The LM Studio CLI also lets you browse and download models and control the inference server.

Check Available Models

List models already downloaded to your system:

lms ls

Download New Models

Browse and download models from the lmstudio-community on Hugging Face:

lms get "https://huggingface.co/lmstudio-community/Qwen3-VL-8B-Instruct-MLX-4bit"

Model Format Options

Use lms get --mlx to filter for MLX-optimized models (recommended for Apple Silicon) or lms get --gguf to see GGUF format models compatible with llama.cpp.
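
For example, to search for Qwen3-VL builds in a particular format (the search term is illustrative, and combining it with a format flag assumes a recent lms version):

lms get --mlx qwen3-vl
lms get --gguf qwen3-vl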

Start the Server

Start the LM Studio server on a specific port:

lms server start --port 11434

Port Configuration

Port 11434 is Eclaire’s default port for backend AI providers. For workers, use port 11435. These are configured in Eclaire’s .env.* files as AI_LOCAL_PROVIDER_URL. In this example, we will have both the backend and workers use the same LM Studio server at port 11434.
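
As a rough sketch, the relevant setting in an Eclaire env file looks something like the line below. This is illustrative only; match the exact URL format (including any path suffix) to the values already in your .env.* files.

AI_LOCAL_PROVIDER_URL=http://127.0.0.1:11434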

Load a Model

Load an existing model into memory:

lms load qwen/qwen3-vl-8b

The model name should match the identifier shown in lms ls.
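
To confirm what is actually loaded into memory (as opposed to just downloaded), recent lms versions include a ps command:

lms ps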

Verify the Setup

Test that the server and model are working properly:

curl -X POST "http://127.0.0.1:11434/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "Hello, how are you?"}
    ],
    "stream": false,
    "max_tokens": 1000
}'

You should receive a JSON response with the model’s reply.
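
Since Qwen3-VL is multimodal, you can also sanity-check image input through the same OpenAI-compatible endpoint. The sketch below assumes a local file named photo.jpg and uses the standard image_url message format with a base64 data URI, which LM Studio accepts for vision models; adjust the file name and prompt to your own test.

IMG_B64=$(base64 < photo.jpg | tr -d '\n')
curl -X POST "http://127.0.0.1:11434/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "What does this image show?"},
          {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,'"$IMG_B64"'"}}
        ]
      }
    ],
    "stream": false,
    "max_tokens": 500
  }'

The response should describe the image; if it does not, double-check that the loaded model is a vision-capable Qwen3-VL build.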

Stopping the Server

When you’re done, stop the server:

lms server stop
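
You can also unload models to free memory; this assumes your lms version supports the --all flag (see lms unload --help):

lms unload --all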

Step 4: Import and Activate the Model in Eclaire

Once the LM Studio server is running with your chosen model, import it into Eclaire.

Import the model:

./tools/model-cli/run.sh import https://huggingface.co/lmstudio-community/Qwen3-VL-8B-Instruct-MLX-4bit

Activate the model:

./tools/model-cli/run.sh activate

The CLI will interactively guide you through the activation process, letting you select which model and provider to use for the backend and for the workers.

To display the active configuration:

./tools/model-cli/run.sh list

Qwen3-VL configured with LM Studio for backend and workers

This screenshot shows Qwen3-VL configured with LM Studio as the active provider for both backend and workers.

Restart Eclaire:

After activating the model, restart Eclaire (or the specific service) to pick up the new configuration.

Port Conflicts

If you’re running multiple AI providers, ensure they’re on different ports. Only one service can listen on each port at a time. If you get a “port already in use” error, check running services with lsof -i :[port] or use pm2 list if managing services with PM2.
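
For example, to see what is already listening on the backend port used in this guide:

lsof -i :11434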

What’s Next?

With LM Studio integrated, you can now leverage a wide variety of local language models with Eclaire while maintaining full privacy and control. LM Studio’s intuitive UI makes it easy to experiment with different models, while the CLI tools enable automation and seamless integration.

Try exploring different models from the lmstudio-community to find the best balance of performance, accuracy, and resource usage for your use case.
