DMR REST API

Once Model Runner is enabled, new API endpoints are available. You can use these endpoints to interact with a model programmatically. Docker Model Runner provides compatibility with OpenAI, Anthropic, and Ollama API formats.

Determine the base URL

The base URL to interact with the endpoints depends on how you run Docker and which API format you're using.

Docker Desktop

Access from             Base URL
Containers              http://model-runner.docker.internal
Host processes (TCP)    http://localhost:12434

Note

TCP host access must be enabled. See Enable Docker Model Runner.

Docker Engine

Access from       Base URL
Containers        http://172.17.0.1:12434
Host processes    http://localhost:12434

Note

The 172.17.0.1 interface may not be available by default to containers within a Compose project. In this case, add an extra_hosts directive to your Compose service YAML:

extra_hosts:
  - "model-runner.docker.internal:host-gateway"

Then you can access the Docker Model Runner APIs at http://model-runner.docker.internal:12434/
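
For example, a quick check from inside a Compose service (assuming the extra_hosts entry above is in place and Model Runner is listening on port 12434) can list the available models:

# List models through the host-gateway alias added via extra_hosts
curl http://model-runner.docker.internal:12434/engines/v1/models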

Base URLs for third-party tools

When configuring third-party tools that expect OpenAI-compatible APIs, use these base URLs:

Tool type                    Base URL format
OpenAI SDK / clients         http://localhost:12434/engines/v1
Anthropic SDK / clients      http://localhost:12434
Ollama-compatible clients    http://localhost:12434

See IDE and tool integrations for specific configuration examples.

Supported APIs

Docker Model Runner supports multiple API formats:

API              Description                                       Use case
OpenAI API       OpenAI-compatible chat completions, embeddings    Most AI frameworks and tools
Anthropic API    Anthropic-compatible messages endpoint            Tools built for Claude
Ollama API       Ollama-compatible endpoints                       Tools built for Ollama
DMR API          Native Docker Model Runner endpoints              Model management

OpenAI-compatible API

DMR implements the OpenAI API specification for maximum compatibility with existing tools and frameworks.

Endpoints

Endpoint                                  Method    Description
/engines/v1/models                        GET       List models
/engines/v1/models/{namespace}/{name}     GET       Retrieve model
/engines/v1/chat/completions              POST      Create chat completion
/engines/v1/completions                   POST      Create completion
/engines/v1/embeddings                    POST      Create embeddings
Note

You can optionally include the engine name in the path: /engines/llama.cpp/v1/chat/completions. This is useful when running multiple inference engines.
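
For example, a chat completion request pinned to the llama.cpp engine differs only in the path; the body is the same as in the examples below:

curl http://localhost:12434/engines/llama.cpp/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ai/smollm2",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ]
  }'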

Model name format

When specifying a model in API requests, use the full model identifier including the namespace:

{
  "model": "ai/smollm2",
  "messages": [...]
}

Common model name formats:

  • Docker Hub models: ai/smollm2, ai/llama3.2, ai/qwen2.5-coder
  • Tagged versions: ai/smollm2:360M-Q4_K_M
  • Custom models: myorg/mymodel
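
For example, a request that pins a specific quantization tag (assuming that tagged variant has been pulled locally) only changes the model field:

curl http://localhost:12434/engines/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ai/smollm2:360M-Q4_K_M",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ]
  }'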

Supported parameters

The following OpenAI API parameters are supported:

Parameter            Type            Description
model                string          Required. The model identifier.
messages             array           Required for chat completions. The conversation history.
prompt               string          Required for completions. The prompt text.
max_tokens           integer         Maximum tokens to generate.
temperature          float           Sampling temperature (0.0-2.0).
top_p                float           Nucleus sampling parameter (0.0-1.0).
stream               Boolean         Enable streaming responses.
stop                 string/array    Stop sequences.
presence_penalty     float           Presence penalty (-2.0 to 2.0).
frequency_penalty    float           Frequency penalty (-2.0 to 2.0).
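
For illustration, a request combining several of these parameters (the values shown are arbitrary, not recommendations) might look like:

curl http://localhost:12434/engines/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ai/smollm2",
    "max_tokens": 256,
    "temperature": 0.7,
    "top_p": 0.9,
    "stop": ["\n\n"],
    "messages": [
      {"role": "user", "content": "Write a haiku about containers."}
    ]
  }'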

Limitations and differences from OpenAI

Be aware of these differences when using DMR's OpenAI-compatible API:

Feature             DMR behavior
API key             Not required. DMR ignores the Authorization header.
Function calling    Supported with llama.cpp for compatible models.
Vision              Supported for multi-modal models (e.g., LLaVA).
JSON mode           Supported via response_format: {"type": "json_object"}.
Logprobs            Supported.
Token counting      Uses the model's native token encoder, which may differ from OpenAI's.
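
As a sketch, JSON mode is enabled through the response_format field noted above:

curl http://localhost:12434/engines/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ai/smollm2",
    "response_format": {"type": "json_object"},
    "messages": [
      {"role": "user", "content": "List three primary colors as a JSON object."}
    ]
  }'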

Anthropic-compatible API

DMR provides Anthropic Messages API compatibility for tools and frameworks built for Claude.

Endpoints

Endpoint                               Method    Description
/anthropic/v1/messages                 POST      Create a message
/anthropic/v1/messages/count_tokens    POST      Count tokens

Supported parameters

The following Anthropic API parameters are supported:

Parameter         Type       Description
model             string     Required. The model identifier.
messages          array      Required. The conversation messages.
max_tokens        integer    Maximum tokens to generate.
temperature       float      Sampling temperature (0.0-1.0).
top_p             float      Nucleus sampling parameter.
top_k             integer    Top-k sampling parameter.
stream            Boolean    Enable streaming responses.
stop_sequences    array      Custom stop sequences.
system            string     System prompt.

Example: Chat with Anthropic API

curl http://localhost:12434/v1/messages \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ai/smollm2",
    "max_tokens": 1024,
    "messages": [
      {"role": "user", "content": "Hello!"}
    ]
  }'

Example: Streaming response

curl http://localhost:12434/v1/messages \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ai/smollm2",
    "max_tokens": 1024,
    "stream": true,
    "messages": [
      {"role": "user", "content": "Count from 1 to 10"}
    ]
  }'
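
Example: Count tokens

A minimal sketch of a token-count request; depending on your setup, the /anthropic path prefix shown in the endpoints table above may be required.

curl http://localhost:12434/v1/messages/count_tokens \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ai/smollm2",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ]
  }'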

Ollama-compatible API

DMR also provides Ollama-compatible endpoints for tools and frameworks built for Ollama.

Endpoints

Endpoint           Method    Description
/api/tags          GET       List available models
/api/show          POST      Show model information
/api/chat          POST      Generate chat completion
/api/generate      POST      Generate completion
/api/embeddings    POST      Generate embeddings

Example: Chat with Ollama API

curl http://localhost:12434/api/chat \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ai/smollm2",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ]
  }'

Example: List models

curl http://localhost:12434/api/tags
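
Example: Show model information

A sketch of the /api/show endpoint; it assumes the Ollama-style request body with a "model" field (older Ollama clients send "name" instead).

curl http://localhost:12434/api/show \
  -H "Content-Type: application/json" \
  -d '{"model": "ai/smollm2"}'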

DMR native endpoints

These endpoints are specific to Docker Model Runner for model management:

Endpoint                      Method    Description
/models/create                POST      Pull/create a model
/models                       GET       List local models
/models/{namespace}/{name}    GET       Get model details
/models/{namespace}/{name}    DELETE    Delete a local model
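
As a sketch of a typical management flow, the requests below assume the /models/create body takes a "from" field naming the model to pull; verify the schema against your Model Runner version before relying on it:

#!/bin/sh

# Pull a model (the "from" field is an assumption about the create schema).
curl http://localhost:12434/models/create \
  -H "Content-Type: application/json" \
  -d '{"from": "ai/smollm2"}'

# List local models.
curl http://localhost:12434/models

# Inspect and then delete a specific model.
curl http://localhost:12434/models/ai/smollm2
curl -X DELETE http://localhost:12434/models/ai/smollm2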

REST API examples

Request from within a container

To call the chat/completions OpenAI endpoint from within another container using curl:

#!/bin/sh

curl http://model-runner.docker.internal/engines/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "ai/smollm2",
        "messages": [
            {
                "role": "system",
                "content": "You are a helpful assistant."
            },
            {
                "role": "user",
                "content": "Please write 500 words about the fall of Rome."
            }
        ]
    }'

Request from the host using TCP

To call the chat/completions OpenAI endpoint from the host via TCP:

  1. Enable the host-side TCP support from the Docker Desktop GUI, or via the Docker Desktop CLI. For example: docker desktop enable model-runner --tcp <port>.

    If you are running on Windows, also enable GPU-backed inference. See Enable Docker Model Runner.

  2. Interact with it as documented in the previous section using localhost and the correct port.

#!/bin/sh

curl http://localhost:12434/engines/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
      "model": "ai/smollm2",
      "messages": [
          {
              "role": "system",
              "content": "You are a helpful assistant."
          },
          {
              "role": "user",
              "content": "Please write 500 words about the fall of Rome."
          }
      ]
  }'

Request from the host using a Unix socket

To call the chat/completions OpenAI endpoint through the Docker socket from the host using curl:

#!/bin/sh

curl --unix-socket $HOME/.docker/run/docker.sock \
    localhost/exp/vDD4.40/engines/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "ai/smollm2",
        "messages": [
            {
                "role": "system",
                "content": "You are a helpful assistant."
            },
            {
                "role": "user",
                "content": "Please write 500 words about the fall of Rome."
            }
        ]
    }'

Streaming responses

To receive streaming responses, set stream: true:

curl http://localhost:12434/engines/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
      "model": "ai/smollm2",
      "stream": true,
      "messages": [
          {"role": "user", "content": "Count from 1 to 10"}
      ]
  }'
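
Create embeddings

The embeddings endpoint follows the same request pattern. A sketch, using ai/smollm2 as a placeholder model; substitute a dedicated embedding model if one is pulled locally:

curl http://localhost:12434/engines/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ai/smollm2",
    "input": "Docker Model Runner makes local inference easy."
  }'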

Using with OpenAI SDKs

Python

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:12434/engines/v1",
    api_key="not-needed"  # DMR doesn't require an API key
)

response = client.chat.completions.create(
    model="ai/smollm2",
    messages=[
        {"role": "user", "content": "Hello!"}
    ]
)

print(response.choices[0].message.content)

Node.js

import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'http://localhost:12434/engines/v1',
  apiKey: 'not-needed',
});

const response = await client.chat.completions.create({
  model: 'ai/smollm2',
  messages: [{ role: 'user', content: 'Hello!' }],
});

console.log(response.choices[0].message.content);

What's next