Multi-Backend Support

UniLM.jl supports multiple LLM service backends through the ServiceEndpoint type hierarchy. Switching backends requires only changing the service parameter.

Available Backends

Backend            Type                     Env Variables
OpenAI (default)   OPENAIServiceEndpoint    OPENAI_API_KEY
Azure OpenAI       AZUREServiceEndpoint     AZURE_OPENAI_BASE_URL, AZURE_OPENAI_API_KEY, AZURE_OPENAI_API_VERSION
Google Gemini      GEMINIServiceEndpoint    GEMINI_API_KEY
DeepSeek           DeepSeekEndpoint         DEEPSEEK_API_KEY
Mistral            MistralEndpoint          MISTRAL_API_KEY
Ollama (local)     OllamaEndpoint           (none)
Generic            GenericOpenAIEndpoint    (passed to constructor)
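Because the backend is just a value passed through the service parameter, switching providers is a one-line change. A minimal sketch, assuming the endpoint types from the table above are exported and the relevant API keys are set in the environment:

```julia
using UniLM

# Same Chat API, three different backends — only `service` changes.
chat_openai = Chat(model="gpt-5.2")  # default: OPENAIServiceEndpoint
chat_gemini = Chat(service=GEMINIServiceEndpoint, model="gemini-2.5-flash")
chat_ollama = Chat(service=OllamaEndpoint(), model="llama3.1")  # local, no API key

for chat in (chat_openai, chat_gemini, chat_ollama)
    println(chat.service, " → ", chat.model)
end
```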

OpenAI (Default)

using UniLM

# OpenAI is the default — no need to specify service
chat = Chat(model="gpt-5.2")
println("Service: ", chat.service)
println("Model: ", chat.model)
Service: OPENAIServiceEndpoint
Model: gpt-5.2

Azure OpenAI

# Set environment variables
ENV["AZURE_OPENAI_BASE_URL"] = "https://your-resource.openai.azure.com"
ENV["AZURE_OPENAI_API_KEY"] = "your-key"
ENV["AZURE_OPENAI_API_VERSION"] = "2024-02-01"
ENV["AZURE_OPENAI_DEPLOY_NAME_GPT_5_2"] = "your-gpt52-deployment"

# Use Azure
chat = Chat(service=AZUREServiceEndpoint, model="gpt-5.2")
push!(chat, Message(Val(:system), "Hello from Azure!"))
push!(chat, Message(Val(:user), "Hi!"))
result = chatrequest!(chat)

Custom Deployment Names

If your Azure deployment has a custom name, register the model-to-deployment mapping so UniLM can resolve the request URL:

UniLM.add_azure_deploy_name!("my-custom-model", "my-deployment-name")
println("Registered deployments: ", collect(keys(UniLM._MODEL_ENDPOINTS_AZURE_OPENAI)))
delete!(UniLM._MODEL_ENDPOINTS_AZURE_OPENAI, "my-custom-model")  # cleanup
Dict{String, String} with 1 entry:
  "gpt-5.2" => "/openai/deployments/my-gpt52-deploy"

Google Gemini

ENV["GEMINI_API_KEY"] = "your-gemini-key"

chat = Chat(service=GEMINIServiceEndpoint, model="gemini-2.5-flash")
push!(chat, Message(Val(:system), "You are a helpful assistant."))
push!(chat, Message(Val(:user), "Hello!"))
result = chatrequest!(chat)

Available Gemini models:

  • "gemini-2.5-flash"
  • "gemini-2.5-pro"

Responses API Backend

The Responses API also supports the service parameter:

r = Respond(
    service=UniLM.OPENAIServiceEndpoint,
    model="gpt-5.2",
    input="Hello!",
)
println("Service: ", r.service)
println("Model: ", r.model)
Service: OPENAIServiceEndpoint
Model: gpt-5.2

OpenAI-Compatible Providers (Generic Endpoint)

Any provider that implements the OpenAI-compatible /v1/chat/completions endpoint can be used with GenericOpenAIEndpoint. This includes Ollama, vLLM, LM Studio, Mistral, and many others.

Ollama (local)

ep = OllamaEndpoint()  # defaults to http://localhost:11434
chat = Chat(service=ep, model="llama3.1")
println("URL: ", UniLM.get_url(chat))
URL: http://localhost:11434/v1/chat/completions

Mistral

chat = Chat(service=MistralEndpoint(), model="mistral-large-latest")
result = chatrequest!(chat)

DeepSeek

chat = Chat(service=DeepSeekEndpoint(), model="deepseek-chat")       # V3.2
chat = Chat(service=DeepSeekEndpoint(), model="deepseek-reasoner")   # V3.2 thinking mode

vLLM / LM Studio

# vLLM
chat = Chat(service=GenericOpenAIEndpoint("http://localhost:8000", ""), model="meta-llama/Llama-3.1-8B")

# LM Studio
chat = Chat(service=GenericOpenAIEndpoint("http://localhost:1234", ""), model="loaded-model")

Anthropic (compatibility layer)

Anthropic provides an OpenAI-compatible endpoint for evaluation purposes. Note: Anthropic considers this "not a long-term or production-ready solution" — features like response_format and strict are ignored.

chat = Chat(
    service=GenericOpenAIEndpoint("https://api.anthropic.com/v1", ENV["ANTHROPIC_API_KEY"]),
    model="claude-sonnet-4-6"
)

Custom Provider

ep = GenericOpenAIEndpoint("https://my-llm-server.example.com", "sk-my-key")
chat = Chat(service=ep, model="my-model")
println("URL: ", UniLM.get_url(chat))
println("Has auth: ", any(p -> p.first == "Authorization", UniLM.auth_header(ep)))
URL: https://my-llm-server.example.com/v1/chat/completions
Has auth: true

Embeddings with Generic Endpoint

Embeddings also support the service parameter:

emb = Embeddings("test"; service=OllamaEndpoint(), model="nomic-embed-text")
println("URL: ", UniLM.get_url(emb))
URL: http://localhost:11434/v1/embeddings

API Compatibility Tiers

API Surface        Standard Status              Supported Providers
Chat Completions   De facto standard            OpenAI, Azure, Gemini, Mistral, DeepSeek, Ollama, vLLM, LM Studio, Anthropic*
Embeddings         Widely adopted               OpenAI, Gemini, Mistral, Ollama, vLLM
Responses API      Emerging (Open Responses)    OpenAI, Ollama, vLLM, Amazon Bedrock
FIM Completion     Provider-specific            DeepSeek (beta), Ollama, vLLM
Image Generation   Limited                      OpenAI, Gemini, Ollama

*Anthropic compat layer is not production-recommended by Anthropic.
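The tiers above can also be checked at runtime rather than consulted manually. A hedged sketch that gates a provider-specific feature (FIM) on the capability query described in the next section; the :fim symbol and has_capability are as shown there, while the branch bodies are illustrative placeholders:

```julia
using UniLM

ep = DeepSeekEndpoint()  # reads DEEPSEEK_API_KEY from the environment

if has_capability(ep, :fim)
    # Safe to build a fill-in-the-middle request against this provider.
    println("FIM supported — proceeding")
else
    println("FIM not supported by this provider — falling back to chat")
end
```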

Querying Provider Capabilities

Use has_capability to check what a provider supports before making requests:

for (name, svc) in [
    ("OpenAI", OPENAIServiceEndpoint),
    ("DeepSeek", DeepSeekEndpoint("k")),
    ("Ollama", OllamaEndpoint())
]
    caps = join(sort(collect(provider_capabilities(svc))), ", ")
    println("$name: $caps")
end
OpenAI: chat, embeddings, images, json_output, responses, tools
DeepSeek: chat, fim, json_output, prefix_completion, tools
Ollama: chat, embeddings, fim, responses, tools

# Check specific capabilities
println("DeepSeek FIM: ", has_capability(DeepSeekEndpoint("k"), :fim))
println("OpenAI FIM: ", has_capability(OPENAIServiceEndpoint, :fim))
DeepSeek FIM: true
OpenAI FIM: false

See Also