UniLM.jl — LLM Reference
Single-file reference for LLM code-generation systems. UniLM.jl is a Julia package. UniLM.jl v0.9.1 · Julia ≥ 1.12 · Deps: HTTP.jl, JSON.jl, Base64
Repo: https://github.com/algunion/UniLM.jl
Installation
using Pkg
Pkg.add("UniLM")
using UniLM
Environment Variables
| Variable | Required by | Description |
|---|---|---|
| OPENAI_API_KEY | OPENAIServiceEndpoint (default) | OpenAI API key |
| AZURE_OPENAI_BASE_URL | AZUREServiceEndpoint | Azure deployment base URL |
| AZURE_OPENAI_API_KEY | AZUREServiceEndpoint | Azure API key |
| AZURE_OPENAI_API_VERSION | AZUREServiceEndpoint | Azure API version string |
| AZURE_OPENAI_DEPLOY_NAME_GPT_5_2 | AZUREServiceEndpoint | Auto-registers Azure deployment for "gpt-5.2" |
| GEMINI_API_KEY | GEMINIServiceEndpoint | Google Gemini API key |
| DEEPSEEK_API_KEY | DeepSeekEndpoint | DeepSeek API key |
| MISTRAL_API_KEY | MistralEndpoint | Mistral AI API key |
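Credentials can also be set from within the Julia session before the first request; a minimal sketch (all key values below are placeholders):

```julia
# Set credentials in-process before making requests (placeholder values).
ENV["OPENAI_API_KEY"] = "sk-..."        # used by OPENAIServiceEndpoint (default)
ENV["DEEPSEEK_API_KEY"] = "sk-..."      # used by DeepSeekEndpoint
ENV["AZURE_OPENAI_BASE_URL"] = "https://my-resource.openai.azure.com"
ENV["AZURE_OPENAI_API_KEY"] = "..."
ENV["AZURE_OPENAI_API_VERSION"] = "2024-10-21"  # placeholder version string
```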
Four APIs
UniLM.jl wraps four OpenAI API surfaces plus FIM completion:
- Chat Completions (Chat + chatrequest!) — stateful, message-based conversations with tool calling, streaming, structured output. Supports OpenAI, Azure, Gemini, DeepSeek, Ollama, Mistral, and any OpenAI-compatible provider.
- Responses API (Respond + respond) — newer, more flexible API with built-in tools (web search, file search), multi-turn chaining via previous_response_id, reasoning support for O-series models, structured output. OpenAI only for full feature set.
- Image Generation (ImageGeneration + generate_image) — text-to-image with gpt-image-1.5. OpenAI only.
- Embeddings (Embeddings + embeddingrequest!) — vector embeddings. Multi-provider via the service parameter.
- FIM Completion (FIMCompletion + fim_complete) — code infilling. DeepSeek, Ollama, vLLM.
Which API to use:
- Chat Completions — best for multi-turn conversations; broadest provider support. Use for chat, tool calling, or streaming across any supported backend.
- Responses API — simpler for single-shot or chained requests; built-in web search, file search, MCP, computer use tools. Currently OpenAI-only for full feature set.
- FIM Completion — code infilling between prefix and suffix. DeepSeek, Ollama, vLLM only.
Service Endpoints
abstract type ServiceEndpoint end
struct OPENAIServiceEndpoint <: ServiceEndpoint end # default — uses OPENAI_API_KEY
struct AZUREServiceEndpoint <: ServiceEndpoint end # uses AZURE_OPENAI_* env vars
struct GEMINIServiceEndpoint <: ServiceEndpoint end # uses GEMINI_API_KEY
struct GenericOpenAIEndpoint <: ServiceEndpoint # any OpenAI-compatible provider
base_url::String
api_key::String
end
# Convenience constructors
OllamaEndpoint(; base_url="http://localhost:11434") # Ollama local
MistralEndpoint(; api_key=ENV["MISTRAL_API_KEY"]) # Mistral AI
DeepSeekEndpoint(; api_key=ENV["DEEPSEEK_API_KEY"]) # DeepSeek
# Type alias for service fields — accepts both marker types and instances:
const ServiceEndpointSpec = Union{Type{<:ServiceEndpoint}, ServiceEndpoint}
# Built-in types: Chat(service=OPENAIServiceEndpoint) — passed as the type
# Instance types: Chat(service=DeepSeekEndpoint()) — passed as a constructed value
Provider Compatibility
| API Surface | Status | Providers |
|---|---|---|
| Chat Completions | De facto standard | OpenAI, Azure, Gemini, Mistral, Ollama, vLLM, LM Studio, Anthropic* |
| Embeddings | Widely adopted | OpenAI, Gemini, Mistral, Ollama, vLLM |
| Responses API | Emerging (Open Responses) | OpenAI, Ollama, vLLM, Amazon Bedrock |
| Image Generation | Limited | OpenAI, Gemini, Ollama |
*Anthropic's OpenAI-compatibility layer is not recommended by Anthropic for production use.
Register additional Azure deployments at runtime:
add_azure_deploy_name!(model::String, deploy_name::String)
# e.g. add_azure_deploy_name!("gpt-5.2", "my-deployment")
Pass the backend via the service keyword on Chat, Respond, or ImageGeneration:
Chat(service=AZUREServiceEndpoint, model="gpt-5.2")
Respond(service=OPENAIServiceEndpoint, input="Hello")
Chat Completions API
Chat
@kwdef struct Chat
service::ServiceEndpointSpec = OPENAIServiceEndpoint
model::String = "gpt-5.2"
messages::Vector{Message} = Message[]
history::Bool = true
tools::Union{Vector{GPTTool},Nothing} = nothing
tool_choice::Union{String,GPTToolChoice,Nothing} = nothing
parallel_tool_calls::Union{Bool,Nothing} = false
temperature::Union{Float64,Nothing} = nothing # 0.0–2.0, mutually exclusive with top_p
top_p::Union{Float64,Nothing} = nothing # 0.0–1.0, mutually exclusive with temperature
n::Union{Int64,Nothing} = nothing
stream::Union{Bool,Nothing} = nothing
stop::Union{Vector{String},String,Nothing} = nothing # max 4 sequences
max_tokens::Union{Int64,Nothing} = nothing
presence_penalty::Union{Float64,Nothing} = nothing # -2.0 to 2.0
response_format::Union{ResponseFormat,Nothing} = nothing
frequency_penalty::Union{Float64,Nothing} = nothing # -2.0 to 2.0
logit_bias::Union{AbstractDict{String,Float64},Nothing} = nothing
user::Union{String,Nothing} = nothing
seed::Union{Int64,Nothing} = nothing
end
- Model defaults: "gpt-5.2" for OpenAI, "gemini-2.5-flash" for Gemini, "deepseek-chat" for DeepSeek. For GenericOpenAIEndpoint / OllamaEndpoint, the model must be specified explicitly.
- history=true: responses are automatically appended to messages.
- temperature and top_p are mutually exclusive (constructor throws ArgumentError).
- parallel_tool_calls is auto-set to nothing when tools is nothing.
- Parameter validation: the constructor validates ranges at construction time — temperature ∈ [0.0, 2.0], top_p ∈ [0.0, 1.0], n ∈ [1, 10], presence_penalty ∈ [-2.0, 2.0], frequency_penalty ∈ [-2.0, 2.0]. Out-of-range values throw ArgumentError.
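A minimal sketch of the construction-time validation described above (the commented-out calls illustrate rejected inputs):

```julia
using UniLM

# Valid: temperature within [0.0, 2.0]
chat = Chat(model="gpt-5.2", temperature=0.7)

# Invalid combinations throw ArgumentError at construction time:
# Chat(temperature=0.5, top_p=0.9)   # temperature and top_p are mutually exclusive
# Chat(temperature=3.0)              # outside [0.0, 2.0]
# Chat(n=0)                          # outside [1, 10]
```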
Message
@kwdef struct Message
role::String # RoleSystem, RoleUser, RoleAssistant, or "tool"
content::Union{String,Nothing} = nothing
name::Union{String,Nothing} = nothing
finish_reason::Union{String,Nothing} = nothing # "stop", "tool_calls", "content_filter"
refusal_message::Union{String,Nothing} = nothing
tool_calls::Union{Nothing,Vector{GPTToolCall}} = nothing
tool_call_id::Union{String,Nothing} = nothing # required when role == "tool"
end
Validation: at least one of content, tool_calls, or refusal_message must be non-nothing. tool_call_id is required when role == "tool".
Convenience constructors:
Message(Val(:system), "You are a helpful assistant")
Message(Val(:user), "Hello!")
Role constants: RoleSystem = "system", RoleUser = "user", RoleAssistant = "assistant".
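Per the validation rules above, a tool-result message must carry a tool_call_id; a short sketch (the call id is a placeholder):

```julia
using UniLM

# Tool-result message: role "tool" requires tool_call_id (placeholder id).
msg = Message(role="tool", content="22°C, sunny", tool_call_id="call_abc123")

# Invalid — no content, tool_calls, or refusal_message — would throw:
# Message(role=RoleUser)
```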
chatrequest!
# Mutating form — sends chat.messages, appends response when history=true
chatrequest!(chat::Chat; retries::Int=0, callback=nothing) -> LLMSuccess | LLMFailure | LLMCallError | Task
# Keyword-argument convenience form — builds a Chat internally
chatrequest!(; service=OPENAIServiceEndpoint, model="gpt-5.2",
systemprompt, userprompt, messages=Message[], history=true,
tools=nothing, tool_choice=nothing, temperature=nothing, ...) -> same
- Non-streaming: returns LLMSuccess, LLMFailure, or LLMCallError.
- Streaming (stream=true): returns a Task. Pass a callback(chunk::Union{String,Message}, close::Ref{Bool}).
- Auto-retries on HTTP 429/500/503 with exponential backoff and jitter (up to 30 attempts). Respects Retry-After headers on 429 responses.
Conversation Management
push!(chat, message) # append a Message
pop!(chat) # remove last message
update!(chat, message) # append if history=true
issendvalid(chat) -> Bool # check conversation rules (≥1 message, system first if present, etc.)
length(chat) # number of messages
isempty(chat) # true if no messages
chat[i] # index into messages
Tool Calling Types
# Define a function the model can call
@kwdef struct GPTFunctionSignature
name::String
description::Union{String,Nothing} = nothing
parameters::Union{AbstractDict,Nothing} = nothing # JSON Schema dict
end
# Wrap it for the tools parameter
@kwdef struct GPTTool
type::String = "function"
func::GPTFunctionSignature
end
GPTTool(d::AbstractDict) # construct from dict with keys "name", "description", "parameters"
# Returned by model when it wants to call a function
@kwdef struct GPTToolCall
id::String
type::String = "function"
func::GPTFunction # has .name::String and .arguments::AbstractDict
end
# Your result after executing the function
struct GPTFunctionCallResult{T}
name::Union{String,Symbol}
origincall::GPTFunction
result::T
end
ResponseFormat (Structured Output)
@kwdef struct ResponseFormat
type::String = "json_object" # "json_object" or "json_schema"
json_schema::Union{JsonSchemaAPI,AbstractDict,Nothing} = nothing
end
ResponseFormat(json_schema) # shorthand, sets type="json_schema"
Chat Completions Example
using UniLM
# Build conversation
chat = Chat(model="gpt-5.2")
push!(chat, Message(Val(:system), "You are a helpful assistant."))
push!(chat, Message(Val(:user), "What is the capital of France?"))
result = chatrequest!(chat)
if result isa LLMSuccess
println(result.message.content) # "Paris..."
# chat.messages already has the response appended (history=true)
end
# One-shot via keywords
result = chatrequest!(
systemprompt="You are a translator.",
userprompt="Translate 'hello' to French.",
model="gpt-5.2"
)
Tool Calling Example (Chat)
weather_tool = GPTTool(func=GPTFunctionSignature(
name="get_weather",
description="Get current weather",
parameters=Dict(
"type" => "object",
"properties" => Dict("location" => Dict("type" => "string")),
"required" => ["location"]
)
))
chat = Chat(model="gpt-5.2", tools=[weather_tool])
push!(chat, Message(Val(:system), "You help with weather."))
push!(chat, Message(Val(:user), "Weather in Paris?"))
result = chatrequest!(chat)
if result isa LLMSuccess && result.message.finish_reason == "tool_calls"
for tc in result.message.tool_calls
# tc.func.name == "get_weather", tc.func.arguments == Dict("location" => "Paris")
answer = "22°C, sunny" # your function result
push!(chat, Message(role="tool", content=answer, tool_call_id=tc.id))
end
result2 = chatrequest!(chat)
println(result2.message.content)
end
Streaming Example (Chat)
chat = Chat(model="gpt-5.2", stream=true)
push!(chat, Message(Val(:system), "You are helpful."))
push!(chat, Message(Val(:user), "Tell me a story."))
task = chatrequest!(chat) do chunk, close_ref
if chunk isa String
print(chunk) # partial text delta
elseif chunk isa Message
println("\n[Done]") # final assembled message
# close_ref[] = true # to stop early
end
end
result = fetch(task) # LLMSuccess when complete
Responses API
Respond
@kwdef struct Respond
service::ServiceEndpointSpec = OPENAIServiceEndpoint
model::String = "gpt-5.2"
input::Union{String, Vector} # String or Vector{InputMessage}
instructions::Union{String,Nothing} = nothing
tools::Union{Vector,Nothing} = nothing # Vector of ResponseTool subtypes
tool_choice::Union{String,Nothing} = nothing # "auto", "none", "required"
parallel_tool_calls::Union{Bool,Nothing} = nothing
temperature::Union{Float64,Nothing} = nothing # 0.0–2.0, mutually exclusive with top_p
top_p::Union{Float64,Nothing} = nothing # 0.0–1.0
max_output_tokens::Union{Int64,Nothing} = nothing
stream::Union{Bool,Nothing} = nothing
text::Union{TextConfig,Nothing} = nothing # output format
reasoning::Union{Reasoning,Nothing} = nothing # O-series models
truncation::Union{String,Nothing} = nothing # "auto" or "disabled"
store::Union{Bool,Nothing} = nothing # store for later retrieval
metadata::Union{AbstractDict,Nothing} = nothing
previous_response_id::Union{String,Nothing} = nothing # multi-turn chaining
user::Union{String,Nothing} = nothing
background::Union{Bool,Nothing} = nothing
include::Union{Vector{String},Nothing} = nothing
max_tool_calls::Union{Int64,Nothing} = nothing
service_tier::Union{String,Nothing} = nothing # "auto","default","flex","priority"
top_logprobs::Union{Int64,Nothing} = nothing # 0–20
prompt::Union{AbstractDict,Nothing} = nothing
prompt_cache_key::Union{String,Nothing} = nothing
prompt_cache_retention::Union{String,Nothing} = nothing # "in-memory","24h"
safety_identifier::Union{String,Nothing} = nothing
conversation::Union{Any,Nothing} = nothing
context_management::Union{Vector,Nothing} = nothing
stream_options::Union{AbstractDict,Nothing} = nothing
end
Input Helpers
# Structured input messages
InputMessage(role="user", content="Hello")
InputMessage(role="user", content=[input_text("Describe:"), input_image("https://...")])
# Content part constructors
input_text(text::String) # → Dict(:type=>"input_text", :text=>...)
input_image(url::String; detail=nothing) # → Dict(:type=>"input_image", ...) detail: "auto","low","high"
input_file(; url=nothing, id=nothing) # → Dict(:type=>"input_file", ...) provide url or file id
Tool Types
abstract type ResponseTool end
@kwdef struct FunctionTool <: ResponseTool
name::String
description::Union{String,Nothing} = nothing
parameters::Union{AbstractDict,Nothing} = nothing
strict::Union{Bool,Nothing} = nothing
end
@kwdef struct WebSearchTool <: ResponseTool
search_context_size::String = "medium" # "low","medium","high"
user_location::Union{AbstractDict,Nothing} = nothing
end
@kwdef struct FileSearchTool <: ResponseTool
vector_store_ids::Vector{String}
max_num_results::Union{Int,Nothing} = nothing
ranking_options::Union{AbstractDict,Nothing} = nothing
filters::Union{AbstractDict,Nothing} = nothing
end
@kwdef struct MCPTool <: ResponseTool
server_label::String
server_url::String
require_approval::Union{String, AbstractDict, Nothing} = "never"
allowed_tools::Union{Vector{String}, Nothing} = nothing
headers::Union{AbstractDict, Nothing} = nothing
end
@kwdef struct ComputerUseTool <: ResponseTool
display_width::Int = 1024
display_height::Int = 768
environment::Union{String, Nothing} = nothing
end
@kwdef struct ImageGenerationTool <: ResponseTool
background::Union{String, Nothing} = nothing
output_format::Union{String, Nothing} = nothing
output_compression::Union{Int, Nothing} = nothing
quality::Union{String, Nothing} = nothing
size::Union{String, Nothing} = nothing
end
@kwdef struct CodeInterpreterTool <: ResponseTool
container::Union{AbstractDict, Nothing} = nothing
file_ids::Union{Vector{String}, Nothing} = nothing
end
Convenience constructors:
function_tool(name, description=nothing; parameters=nothing, strict=nothing)
function_tool(d::AbstractDict) # from dict with keys "name", "description", "parameters"
web_search(; context_size="medium", location=nothing)
file_search(store_ids; max_results=nothing, ranking=nothing, filters=nothing)
mcp_tool(label, url; require_approval="never", allowed_tools=nothing, headers=nothing)
computer_use(; display_width=1024, display_height=768, environment=nothing)
image_generation_tool(; kwargs...)
code_interpreter(; container=nothing, file_ids=nothing)
Text Format / Structured Output
@kwdef struct TextFormatSpec
type::String = "text" # "text","json_object","json_schema"
name::Union{String,Nothing} = nothing
description::Union{String,Nothing} = nothing
schema::Union{AbstractDict,Nothing} = nothing
strict::Union{Bool,Nothing} = nothing
end
@kwdef struct TextConfig
format::TextFormatSpec = TextFormatSpec()
end
Convenience constructors:
text_format(; kwargs...) # generic TextConfig
json_schema_format(name, description, schema; strict=nothing) # JSON Schema output
json_schema_format(d::AbstractDict) # from dict with keys "name", "description", "schema"
json_object_format() # unstructured JSON
Reasoning (O-series models)
@kwdef struct Reasoning
effort::Union{String,Nothing} = nothing # "none","low","medium","high"
generate_summary::Union{String,Nothing} = nothing # "auto","concise","detailed"
summary::Union{String,Nothing} = nothing # deprecated alias
end
Respond(input="Hard math problem", model="o3", reasoning=Reasoning(effort="high"))
respond
# Struct form
respond(r::Respond; retries=0, callback=nothing) -> ResponseSuccess | ResponseFailure | ResponseCallError | Task
# Convenience — builds Respond internally
respond(input; kwargs...) -> same
# do-block streaming — auto-sets stream=true
respond(callback::Function, input; kwargs...) -> Task
- Streaming callback signature: callback(chunk::Union{String, ResponseObject}, close::Ref{Bool})
- Auto-retries on HTTP 429/500/503 with exponential backoff and jitter (up to 30 attempts). Respects Retry-After headers.
- Parameter validation: temperature ∈ [0.0, 2.0], top_p ∈ [0.0, 1.0], max_output_tokens ≥ 1, top_logprobs ∈ [0, 20]. Out-of-range values throw ArgumentError.
Response Accessors
output_text(result::ResponseSuccess)::String # concatenated text output
output_text(result::ResponseFailure)::String # error message
output_text(result::ResponseCallError)::String # error message
function_calls(result::ResponseSuccess)::Vector{Dict{String,Any}}
# Each dict has: "id", "call_id", "name", "arguments" (JSON string), "status"
Response Management Functions
get_response(id::String; service=OPENAIServiceEndpoint) -> ResponseSuccess | ResponseFailure | ResponseCallError
delete_response(id::String; service=OPENAIServiceEndpoint) -> Dict | ResponseFailure | ResponseCallError
list_input_items(id::String; limit=20, order="desc", after=nothing, service=OPENAIServiceEndpoint) -> Dict | ...
cancel_response(id::String; service=OPENAIServiceEndpoint) -> ResponseSuccess | ...
compact_response(; model="gpt-5.2", input, service=OPENAIServiceEndpoint) -> Dict | ...
count_input_tokens(; model="gpt-5.2", input, instructions=nothing, tools=nothing, service=OPENAIServiceEndpoint) -> Dict | ...
ResponseObject
@kwdef struct ResponseObject
id::String
status::String
model::String
output::Vector{Any}
usage::Union{Dict{String,Any},Nothing} = nothing
error::Union{Any,Nothing} = nothing
metadata::Union{Dict{String,Any},Nothing} = nothing
raw::Dict{String,Any}
end
Responses API Examples
using UniLM
# Basic
result = respond("Tell me a joke")
if result isa ResponseSuccess
println(output_text(result))
end
# With instructions
result = respond("Hello", instructions="You are a pirate. Respond in pirate speak.")
# Multi-turn via chaining
r1 = respond("Tell me a joke")
r2 = respond("Tell me another", previous_response_id=r1.response.id)
# Structured output
schema = Dict(
"type" => "object",
"properties" => Dict(
"name" => Dict("type" => "string"),
"age" => Dict("type" => "integer")
),
"required" => ["name", "age"],
"additionalProperties" => false
)
result = respond("Extract: John is 30 years old",
text=json_schema_format("person", "A person", schema, strict=true))
parsed = JSON.parse(output_text(result))
# Web search
result = respond("Latest Julia language news", tools=[web_search()])
# Function calling
tool = function_tool("get_weather", "Get weather",
parameters=Dict(
"type" => "object",
"properties" => Dict("location" => Dict("type" => "string")),
"required" => ["location"]
))
result = respond("Weather in NYC?", tools=ResponseTool[tool])
for call in function_calls(result)
println(call["name"], ": ", call["arguments"])
end
# Streaming (do-block)
respond("Tell me a story") do chunk, close_ref
if chunk isa String
print(chunk)
elseif chunk isa ResponseObject
println("\nDone: ", chunk.status)
end
end
# Reasoning (O-series)
result = respond("Prove that √2 is irrational", model="o3",
reasoning=Reasoning(effort="high", generate_summary="concise"))
# Multimodal input
result = respond([
InputMessage(role="user", content=[
input_text("What's in this image?"),
input_image("https://example.com/photo.jpg")
])
])
# Count tokens without generating
tokens = count_input_tokens(model="gpt-5.2", input="Hello world")
println(tokens["input_tokens"])
Image Generation API
ImageGeneration
@kwdef struct ImageGeneration
service::ServiceEndpointSpec = OPENAIServiceEndpoint
model::String = "gpt-image-1.5"
prompt::String
n::Union{Int,Nothing} = nothing # 1–10
size::Union{String,Nothing} = nothing # "1024x1024","1536x1024","1024x1536","auto"
quality::Union{String,Nothing} = nothing # "low","medium","high","auto"
background::Union{String,Nothing} = nothing # "transparent","opaque","auto"
output_format::Union{String,Nothing} = nothing # "png","webp","jpeg"
output_compression::Union{Int,Nothing} = nothing # 0–100 (webp/jpeg only)
user::Union{String,Nothing} = nothing
end
generate_image
generate_image(ig::ImageGeneration; retries=0) -> ImageSuccess | ImageFailure | ImageCallError
generate_image(prompt::String; kwargs...) -> same # convenience
Auto-retries on 429/500/503 with exponential backoff and jitter (up to 30 attempts). Respects Retry-After headers.
Response Types
struct ImageObject
b64_json::Union{String,Nothing}
revised_prompt::Union{String,Nothing}
end
struct ImageResponse
created::Int64
data::Vector{ImageObject}
usage::Union{Dict{String,Any},Nothing}
raw::Dict{String,Any}
end
Accessors
image_data(result::ImageSuccess)::Vector{String} # base64-encoded image strings
image_data(result::ImageFailure)::Vector{String} # empty
image_data(result::ImageCallError)::Vector{String} # empty
save_image(img_b64::String, filepath::String) # decode + write to disk, returns filepath
Image Generation Example
using UniLM
result = generate_image("A watercolor painting of a Julia butterfly",
size="1024x1024", quality="high")
if result isa ImageSuccess
imgs = image_data(result)
save_image(imgs[1], "butterfly.png")
println("Saved! Revised prompt: ", result.response.data[1].revised_prompt)
end
# Multiple images with transparent background
result = generate_image("Minimalist logo",
n=3, background="transparent", output_format="png")
Embeddings API
Embeddings
struct Embeddings
service::ServiceEndpointSpec # default: OPENAIServiceEndpoint
model::String # default resolved per provider
input::Union{String,Vector{String}}
embeddings::Union{Vector{Float64},Vector{Vector{Float64}}}
user::Union{String,Nothing}
end
Embeddings(input::String; service=OPENAIServiceEndpoint, model="text-embedding-3-small")
Embeddings(input::Vector{String}; service=OPENAIServiceEndpoint, model="text-embedding-3-small")
Model defaults: "text-embedding-3-small" for OpenAI, "gemini-embedding-001" for Gemini. For generic/DeepSeek endpoints, the model must be specified explicitly.
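A multi-provider sketch following the defaults above; the Ollama model name is a placeholder, and passing the model explicitly is shown for clarity:

```julia
using UniLM

# Gemini backend (GEMINI_API_KEY must be set); default model per provider
emb_gemini = Embeddings("What is Julia?"; service=GEMINIServiceEndpoint,
                        model="gemini-embedding-001")
embeddingrequest!(emb_gemini)

# Local Ollama backend: model must be given explicitly (placeholder name)
emb_local = Embeddings("What is Julia?"; service=OllamaEndpoint(),
                       model="nomic-embed-text")
embeddingrequest!(emb_local)
```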
embeddingrequest!
embeddingrequest!(emb::Embeddings; retries=0) -> (response_dict, emb) | nothing
Fills emb.embeddings in-place. Auto-retries on 429/500/503 with exponential backoff and jitter (up to 30 attempts). Respects Retry-After headers.
Embeddings Example
using UniLM, LinearAlgebra
emb = Embeddings("What is Julia?")
embeddingrequest!(emb)
println(emb.embeddings[1:5]) # first 5 dimensions
# Batch + cosine similarity
emb = Embeddings(["cat", "dog", "airplane"])
embeddingrequest!(emb)
similarity = dot(emb.embeddings[1], emb.embeddings[2]) /
(norm(emb.embeddings[1]) * norm(emb.embeddings[2]))
Cost Tracking
TokenUsage
@kwdef struct TokenUsage
prompt_tokens::Int = 0
completion_tokens::Int = 0
total_tokens::Int = 0
end
Functions
token_usage(result::LLMSuccess)::Union{TokenUsage, Nothing} # extract TokenUsage from a Chat result
token_usage(result::ResponseSuccess)::Union{TokenUsage, Nothing} # extract from Responses API result
estimated_cost(result; model=nothing, pricing=DEFAULT_PRICING) # per-call cost estimate (Float64)
cumulative_cost(chat::Chat)::Float64 # running total for a Chat instance
DEFAULT_PRICING # Dict{String, Tuple{Float64, Float64}} — model → (input_price, output_price) per token
Cost Tracking Example
chat = Chat(model="gpt-5.2")
push!(chat, Message(Val(:system), "You are helpful."))
push!(chat, Message(Val(:user), "Hello!"))
result = chatrequest!(chat)
if result isa LLMSuccess
usage = token_usage(result)
cost = estimated_cost(result)
println("Tokens: $(usage.total_tokens), Cost: \$$(round(cost; digits=6))")
println("Cumulative: \$$(round(cumulative_cost(chat); digits=6))")
end
Conversation Forking
fork(chat::Chat)::Chat # deep-copy a Chat, resetting cumulative cost
fork(chat::Chat, n::Int)::Vector{Chat} # create n independent forks
Fork Example
chat = Chat(model="gpt-5.2")
push!(chat, Message(Val(:system), "You are a creative writer."))
push!(chat, Message(Val(:user), "Start a story about a robot."))
chatrequest!(chat)
# Fork into 3 independent continuations
forks = fork(chat, 3)
for (i, f) in enumerate(forks)
push!(f, Message(Val(:user), "Continue the story with ending $i."))
chatrequest!(f)
end
Tool Loop
Automated tool dispatch for both APIs. Wraps a tool schema with a callable function.
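An end-to-end sketch using the Tool Loop types and functions documented in this section; the weather tool and its handler are illustrative:

```julia
using UniLM

# Illustrative tool schema + handler wrapped as a CallableTool
weather = GPTTool(func=GPTFunctionSignature(
    name="get_weather",
    description="Get current weather",
    parameters=Dict(
        "type" => "object",
        "properties" => Dict("location" => Dict("type" => "string")),
        "required" => ["location"],
    ),
))
handler(name::String, args::Dict{String,Any}) = "22°C, sunny in $(args["location"])"
ct = CallableTool(weather, handler)

chat = Chat(model="gpt-5.2", tools=[weather])
push!(chat, Message(Val(:user), "Weather in Paris?"))

# Dispatches tool calls automatically until the model stops requesting them
result = tool_loop!(chat; tools=[ct])
result.completed && println(result.turns_used, " turn(s), tools called: ",
    [o.tool_name for o in result.tool_calls])
```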
CallableTool
struct CallableTool{T}
tool::T # GPTTool or FunctionTool
callable::Function # (name::String, args::Dict{String,Any}) -> String
end
to_tool
to_tool(x) # identity for GPTTool, FunctionTool, CallableTool; converts AbstractDict to GPTTool
ToolCallOutcome / ToolLoopResult
# Per-call record
struct ToolCallOutcome
tool_name::String
arguments::Dict{String,Any}
result::Union{GPTFunctionCallResult,Nothing}
success::Bool
error::Union{String,Nothing}
end
# Loop result
struct ToolLoopResult
response::LLMRequestResponse
tool_calls::Vector{ToolCallOutcome}
turns_used::Int
completed::Bool
llm_error::Union{String,Nothing}
end
tool_loop! (Chat Completions)
tool_loop!(chat, dispatcher; max_turns=10, retries=0) -> ToolLoopResult
tool_loop!(chat; tools::Vector{<:CallableTool}, kwargs...) -> ToolLoopResult
tool_loop (Responses API)
tool_loop(r::Respond, dispatcher; max_turns=10, retries=0) -> ToolLoopResult
tool_loop(r::Respond; max_turns=10, retries=0) -> ToolLoopResult # extracts callables from r.tools
tool_loop(input, dispatcher; tools, kwargs...) -> ToolLoopResult # convenience form
MCP Client
Native MCP client (JSON-RPC 2.0 over stdio or HTTP, spec 2025-11-25).
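The do-block connect form disconnects automatically on exit; a sketch assuming a filesystem MCP server is available and that it exposes a list_directory tool (the tool name is an assumption about that server, not part of UniLM):

```julia
using UniLM

# do-block form: the session is closed automatically when the block exits
mcp_connect(`npx -y @modelcontextprotocol/server-filesystem /tmp`) do session
    for t in list_tools!(session)
        println(t)                 # MCPToolInfo entries from tools/list
    end
    # "list_directory" is assumed to exist on this particular server
    println(call_tool(session, "list_directory", Dict("path" => "/tmp")))
end
```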
Types
MCPSession # live connection — manages transport + cached tools/resources/prompts
MCPToolInfo # tool definition from tools/list
MCPResourceInfo # resource definition from resources/list
MCPPromptInfo # prompt definition from prompts/list
MCPServerCapabilities # capabilities from initialize
MCPTransport # abstract (subtypes: StdioTransport, HTTPTransport)
MCPError <: Exception # JSON-RPC error (code, message, data)
Lifecycle
mcp_connect(command::Cmd; ...) -> MCPSession # stdio subprocess
mcp_connect(url::String; headers=[], ...) -> MCPSession # HTTP
mcp_connect(f::Function, args...; ...) # do-block, auto-disconnect
mcp_disconnect!(session)
Discovery
list_tools!(session) -> Vector{MCPToolInfo}
list_resources!(session) -> Vector{MCPResourceInfo}
list_prompts!(session) -> Vector{MCPPromptInfo}
Operations
call_tool(session, name, arguments) -> String
read_resource(session, uri) -> String
get_prompt(session, name, arguments) -> Vector{Dict}
ping(session)
Tool Bridge
mcp_tools(session) -> Vector{CallableTool{GPTTool}} # for tool_loop!
mcp_tools_respond(session) -> Vector{CallableTool{FunctionTool}} # for tool_loop
Client Example
session = mcp_connect(`npx -y @modelcontextprotocol/server-filesystem /tmp`)
tools = mcp_tools(session)
chat = Chat(model="gpt-5.2", tools=map(t -> t.tool, tools))
push!(chat, Message(Val(:user), "List files"))
result = tool_loop!(chat; tools)
mcp_disconnect!(session)
MCP Server
Build MCP servers that expose tools, resources, and prompts.
Types
MCPServer(name, version; description=nothing)
MCPServerPrimitive # abstract (MCPServerTool, MCPServerResource, MCPServerResourceTemplate, MCPServerPrompt)
Registration
register_tool!(server, name, description, schema, handler)
register_tool!(server, name, description, handler) # auto-schema from signature
register_tool!(server, ct::CallableTool{GPTTool}) # bridge from Chat API
register_tool!(server, ct::CallableTool{FunctionTool}) # bridge from Responses API
register_resource!(server, uri, name, handler; mime_type="text/plain", description=nothing)
register_resource_template!(server, uri_template, name, handler; ...)
register_prompt!(server, name, handler; description=nothing, arguments=[])
Macros
@mcp_tool server function name(args...) body end
@mcp_resource server uri function(args...) body end
@mcp_prompt server name function(args...) body end
Serving
serve(server; transport=:stdio) # default — stdio
serve(server; transport=:http, host="127.0.0.1", port=8080) # HTTP
Server Example
server = MCPServer("calc", "1.0.0")
@mcp_tool server function add(a::Float64, b::Float64)::String
string(a + b)
end
serve(server)
FIM Completion
Fill-in-the-Middle: generate text between a prompt (prefix) and suffix. Supported by DeepSeek (beta), Ollama, vLLM.
@kwdef struct FIMCompletion
service::ServiceEndpointSpec
model::String = "deepseek-chat"
prompt::String
suffix::Union{String,Nothing} = nothing
max_tokens::Union{Int,Nothing} = 128
# temperature, top_p, stream, stop, echo, logprobs, frequency_penalty, presence_penalty
end
struct FIMChoice; text; index; finish_reason; end
struct FIMResponse; choices; usage; model; raw; end
struct FIMSuccess <: LLMRequestResponse; response::FIMResponse; end
struct FIMFailure <: LLMRequestResponse; response; status; end
struct FIMCallError <: LLMRequestResponse; error; status; end
fim_complete(fim::FIMCompletion; retries=0) -> LLMRequestResponse
fim_complete(prompt; suffix=nothing, kwargs...) -> LLMRequestResponse # convenience
fim_text(result) -> String # extract generated text
FIM Example
result = fim_complete("def fib(a):",
service=DeepSeekEndpoint(), suffix=" return fib(a-1) + fib(a-2)",
max_tokens=128, stop=["\n\n"])
println(fim_text(result))
Chat Prefix Completion
Continue from a partial assistant message. The model generates text continuing from the assistant's prefix. DeepSeek beta feature.
prefix_complete(chat::Chat; retries=0) -> LLMRequestResponse
# Last message must be role=assistant with the prefix text
Prefix Example
chat = Chat(service=DeepSeekEndpoint(), model="deepseek-chat")
push!(chat, Message(Val(:system), "You are a coding assistant."))
push!(chat, Message(Val(:user), "Write quicksort in Python"))
push!(chat, Message(role=RoleAssistant, content="```python\n"))
result = prefix_complete(chat)
Provider Capabilities
Each endpoint declares supported features. Request functions validate before dispatch.
provider_capabilities(service) -> Set{Symbol}
has_capability(service, cap::Symbol) -> Bool
Capabilities by Provider
| Provider | Capabilities |
|---|---|
| OpenAI | :chat, :responses, :embeddings, :images, :tools, :json_output |
| Azure | :chat, :tools |
| Gemini | :chat, :embeddings, :tools, :json_output |
| DeepSeek | :chat, :tools, :fim, :prefix_completion, :json_output |
| Generic | :chat, :embeddings, :fim, :tools, :responses |
Request functions call validate_capability internally and throw ArgumentError if the provider does not support the requested feature. You do not need to check capabilities manually before making requests.
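Capabilities can still be queried explicitly, for example to pick a code path at runtime; a sketch using the DeepSeek row of the table above:

```julia
using UniLM

svc = DeepSeekEndpoint()
caps = provider_capabilities(svc)   # per the table: includes :chat, :tools, :fim, ...

# Only attempt FIM infilling when the backend supports it
if has_capability(svc, :fim)
    result = fim_complete("def add(a, b):"; service=svc, suffix="    return a + b")
    println(fim_text(result))
end
```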
Result Type Hierarchy
All API call results inherit from LLMRequestResponse:
LLMRequestResponse (abstract)
├── LLMSuccess — Chat Completions success (.message::Message, .self::Chat)
├── LLMFailure — Chat Completions HTTP error (.response::String, .status::Int, .self::Chat)
├── LLMCallError — Chat Completions exception (.error::String, .status, .self::Chat)
├── ResponseSuccess — Responses API success (.response::ResponseObject)
├── ResponseFailure — Responses API HTTP error (.response::String, .status::Int)
├── ResponseCallError — Responses API exception (.error::String, .status)
├── ImageSuccess — Image Gen success (.response::ImageResponse)
├── ImageFailure — Image Gen HTTP error (.response::String, .status::Int)
├── ImageCallError — Image Gen exception (.error::String, .status)
├── FIMSuccess — FIM success (.response::FIMResponse)
├── FIMFailure — FIM HTTP error (.response::String, .status::Int)
└── FIMCallError — FIM exception (.error::String, .status)
Standard pattern-matching idiom:
result = chatrequest!(chat)
if result isa LLMSuccess
println(result.message.content)
elseif result isa LLMFailure
@error "HTTP $(result.status): $(result.response)"
elseif result isa LLMCallError
@error "Exception: $(result.error)"
end
result = respond("Hello")
if result isa ResponseSuccess
println(output_text(result))
elseif result isa ResponseFailure
@error "HTTP $(result.status)"
elseif result isa ResponseCallError
@error result.error
end
result = generate_image("A cat")
if result isa ImageSuccess
save_image(image_data(result)[1], "cat.png")
elseif result isa ImageFailure
@error "HTTP $(result.status)"
elseif result isa ImageCallError
@error result.error
end
API Constants
const OPENAI_BASE_URL = "https://api.openai.com"
const CHAT_COMPLETIONS_PATH = "/v1/chat/completions"
const EMBEDDINGS_PATH = "/v1/embeddings"
const RESPONSES_PATH = "/v1/responses"
const IMAGES_GENERATIONS_PATH = "/v1/images/generations"
const COMPLETIONS_PATH = "/v1/completions" # FIM endpoint
const DEEPSEEK_BASE_URL = "https://api.deepseek.com"
const DEEPSEEK_BETA_BASE_URL = "https://api.deepseek.com/beta"
const GEMINI_CHAT_URL = "https://generativelanguage.googleapis.com/v1beta/openai/chat/completions"
Exceptions
struct InvalidConversationError <: Exception
reason::String
end
Thrown by issendvalid / internal validation when the conversation structure is invalid (e.g., a system message not in first position, consecutive same-role messages).
Complete Exports List
Complete Exports List
Chat Completions: Chat, Message, RoleSystem, RoleUser, RoleAssistant, GPTTool, GPTToolCall, GPTFunctionSignature, GPTFunctionCallResult, InvalidConversationError, issendvalid, chatrequest!, update!, ResponseFormat
Responses API: Respond, InputMessage, ResponseTool, FunctionTool, WebSearchTool, FileSearchTool, MCPTool, ComputerUseTool, ImageGenerationTool, CodeInterpreterTool, TextConfig, TextFormatSpec, Reasoning, ResponseObject, ResponseSuccess, ResponseFailure, ResponseCallError, respond, get_response, delete_response, list_input_items, cancel_response, compact_response, count_input_tokens, output_text, function_calls, input_text, input_image, input_file, function_tool, web_search, file_search, mcp_tool, computer_use, image_generation_tool, code_interpreter, text_format, json_schema_format, json_object_format
Image Generation: ImageGeneration, ImageObject, ImageResponse, ImageSuccess, ImageFailure, ImageCallError, generate_image, image_data, save_image
Embeddings: Embeddings, embeddingrequest!
Cost Tracking: TokenUsage, token_usage, estimated_cost, cumulative_cost, DEFAULT_PRICING
Service Endpoints: ServiceEndpoint, ServiceEndpointSpec, OPENAIServiceEndpoint, AZUREServiceEndpoint, GEMINIServiceEndpoint, GenericOpenAIEndpoint, OllamaEndpoint, MistralEndpoint, DeepSeekEndpoint, add_azure_deploy_name!
Forking: fork
Tool Loop: CallableTool, ToolCallOutcome, ToolLoopResult, tool_loop!, tool_loop, to_tool
MCP Client: MCPSession, MCPToolInfo, MCPResourceInfo, MCPPromptInfo, MCPServerCapabilities, MCPTransport, StdioTransport, HTTPTransport, MCPError, mcp_connect, mcp_disconnect!, mcp_tools, mcp_tools_respond, list_tools!, list_resources!, list_prompts!, call_tool, read_resource, get_prompt, ping
MCP Server: MCPServer, MCPServerTool, MCPServerResource, MCPServerResourceTemplate, MCPServerPrompt, MCPServerPrimitive, register_tool!, register_resource!, register_resource_template!, register_prompt!, serve, @mcp_tool, @mcp_resource, @mcp_prompt
FIM / Completions: FIMCompletion, FIMChoice, FIMResponse, FIMSuccess, FIMFailure, FIMCallError, fim_complete, fim_text, prefix_complete
Provider Capabilities: has_capability, provider_capabilities
Result Types: LLMRequestResponse, LLMSuccess, LLMFailure, LLMCallError