Multi-Backend Support

UniLM.jl supports multiple LLM service backends through the ServiceEndpoint type hierarchy. Switching backends requires only changing the service parameter.

Available Backends

Backend            Type                    Env Variables
OpenAI (default)   OPENAIServiceEndpoint   OPENAI_API_KEY
Azure OpenAI       AZUREServiceEndpoint    AZURE_OPENAI_BASE_URL, AZURE_OPENAI_API_KEY, AZURE_OPENAI_API_VERSION
Google Gemini      GEMINIServiceEndpoint   GEMINI_API_KEY
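
To illustrate that only the service keyword changes across providers, here is a sketch that builds the same conversation against two of the backends above. It assumes the relevant API keys are set in the environment; the helper function is illustrative, not part of UniLM.

```julia
using UniLM

# Build the same conversation against any backend; only `service`
# (and the provider's model name) differs between calls.
function make_chat(service, model)
    chat = Chat(service=service, model=model)
    push!(chat, Message(Val(:system), "You are a helpful assistant."))
    push!(chat, Message(Val(:user), "Hello!"))
    return chat
end

openai_chat = make_chat(OPENAIServiceEndpoint, "gpt-5.2")
gemini_chat = make_chat(GEMINIServiceEndpoint, "gemini-2.5-flash")

# result = chatrequest!(openai_chat)  # requires OPENAI_API_KEY
# result = chatrequest!(gemini_chat)  # requires GEMINI_API_KEY
```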

OpenAI (Default)

using UniLM
using JSON

# OpenAI is the default — no need to specify service
chat = Chat(model="gpt-5.2")
println("Service: ", chat.service)
println("Model: ", chat.model)
Service: OPENAIServiceEndpoint
Model: gpt-5.2

Azure OpenAI

# Set environment variables
ENV["AZURE_OPENAI_BASE_URL"] = "https://your-resource.openai.azure.com"
ENV["AZURE_OPENAI_API_KEY"] = "your-key"
ENV["AZURE_OPENAI_API_VERSION"] = "2024-02-01"
ENV["AZURE_OPENAI_DEPLOY_NAME_GPT_5_2"] = "your-gpt52-deployment"

# Use Azure
chat = Chat(service=AZUREServiceEndpoint, model="gpt-5.2")
push!(chat, Message(Val(:system), "Hello from Azure!"))
push!(chat, Message(Val(:user), "Hi!"))
result = chatrequest!(chat)

Custom Deployment Names

If your Azure deployment has a custom name:

UniLM.add_azure_deploy_name!("my-custom-model", "my-deployment-name")
println("Registered deployments: ", collect(keys(UniLM._MODEL_ENDPOINTS_AZURE_OPENAI)))
delete!(UniLM._MODEL_ENDPOINTS_AZURE_OPENAI, "my-custom-model")  # cleanup
Registered deployments: ["gpt-5.2", "my-custom-model"]
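
Once registered, a Chat created with the custom model name routes through that deployment. A minimal sketch, assuming the Azure environment variables above are set (the exact URL shape depends on your Azure resource):

```julia
using UniLM

# Register a custom deployment, then build a chat against it
UniLM.add_azure_deploy_name!("my-custom-model", "my-deployment-name")

chat = Chat(service=AZUREServiceEndpoint, model="my-custom-model")
println("URL: ", UniLM.get_url(chat))  # URL should route via the registered deployment

# Cleanup so the registration does not leak into later examples
delete!(UniLM._MODEL_ENDPOINTS_AZURE_OPENAI, "my-custom-model")
```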

Google Gemini

ENV["GEMINI_API_KEY"] = "your-gemini-key"

chat = Chat(service=GEMINIServiceEndpoint, model="gemini-2.5-flash")
push!(chat, Message(Val(:system), "You are a helpful assistant."))
push!(chat, Message(Val(:user), "Hello!"))
result = chatrequest!(chat)

Available Gemini models:

  • "gemini-2.5-flash"
  • "gemini-2.5-pro"

Responses API Backend

The Responses API also supports the service parameter:

r = Respond(
    service=UniLM.OPENAIServiceEndpoint,
    model="gpt-5.2",
    input="Hello!",
)
println("Service: ", r.service)
println("Model: ", r.model)
Service: OPENAIServiceEndpoint
Model: gpt-5.2

OpenAI-Compatible Providers (Generic Endpoint)

Any provider that implements the OpenAI-compatible /v1/chat/completions endpoint can be used with GenericOpenAIEndpoint. This includes Ollama, vLLM, LM Studio, Mistral, and many others.

Ollama (local)

ep = OllamaEndpoint()  # defaults to http://localhost:11434
chat = Chat(service=ep, model="llama3.1")
println("URL: ", UniLM.get_url(chat))
URL: http://localhost:11434/v1/chat/completions

Mistral

chat = Chat(service=MistralEndpoint(), model="mistral-large-latest")
result = chatrequest!(chat)

vLLM / LM Studio

# vLLM
chat = Chat(service=GenericOpenAIEndpoint("http://localhost:8000", ""), model="meta-llama/Llama-3.1-8B")

# LM Studio
chat = Chat(service=GenericOpenAIEndpoint("http://localhost:1234", ""), model="loaded-model")

Anthropic (compatibility layer)

Anthropic provides an OpenAI-compatible endpoint for evaluation purposes. Note: Anthropic considers this "not a long-term or production-ready solution" — features like response_format and strict are ignored.

chat = Chat(
    service=GenericOpenAIEndpoint("https://api.anthropic.com/v1", ENV["ANTHROPIC_API_KEY"]),
    model="claude-sonnet-4-6"
)

Custom Provider

ep = GenericOpenAIEndpoint("https://my-llm-server.example.com", "sk-my-key")
chat = Chat(service=ep, model="my-model")
println("URL: ", UniLM.get_url(chat))
println("Has auth: ", any(p -> p.first == "Authorization", UniLM.auth_header(ep)))
URL: https://my-llm-server.example.com/v1/chat/completions
Has auth: true

Embeddings with Generic Endpoint

Embeddings also support the service parameter:

emb = Embeddings("test"; service=OllamaEndpoint())
println("URL: ", UniLM.get_url(emb))
URL: http://localhost:11434/v1/embeddings

API Compatibility Tiers

API Surface        Standard Status             Supported Providers
Chat Completions   De facto standard           OpenAI, Azure, Gemini, Mistral, Ollama, vLLM, LM Studio, Anthropic*
Embeddings         Widely adopted              OpenAI, Gemini, Mistral, Ollama, vLLM
Responses API      Emerging (Open Responses)   OpenAI, Ollama, vLLM, Amazon Bedrock
Image Generation   Limited                     OpenAI, Gemini, Ollama

*Anthropic compat layer is not production-recommended by Anthropic.

See Also