UniLM.jl — LLM Reference
Single-file reference for LLM code-generation systems. UniLM.jl v0.8.0 · Julia ≥ 1.12 · Deps: HTTP.jl, JSON.jl, Base64. Repo: https://github.com/algunion/UniLM.jl
Installation
using Pkg
Pkg.add("UniLM")
using UniLM

Environment Variables
| Variable | Required by | Description |
|---|---|---|
| OPENAI_API_KEY | OPENAIServiceEndpoint (default) | OpenAI API key |
| AZURE_OPENAI_BASE_URL | AZUREServiceEndpoint | Azure deployment base URL |
| AZURE_OPENAI_API_KEY | AZUREServiceEndpoint | Azure API key |
| AZURE_OPENAI_API_VERSION | AZUREServiceEndpoint | Azure API version string |
| AZURE_OPENAI_DEPLOY_NAME_GPT_5_2 | AZUREServiceEndpoint | Auto-registers Azure deployment for "gpt-5.2" |
| GEMINI_API_KEY | GEMINIServiceEndpoint | Google Gemini API key |
Four APIs
UniLM.jl wraps four OpenAI API surfaces:
- Chat Completions (`Chat` + `chatrequest!`) — stateful, message-based conversations with tool calling, streaming, structured output. Supports OpenAI, Azure, and Gemini backends.
- Responses API (`Respond` + `respond`) — newer, more flexible API with built-in tools (web search, file search), multi-turn chaining via `previous_response_id`, reasoning support for O-series models, structured output. OpenAI only.
- Image Generation (`ImageGeneration` + `generate_image`) — text-to-image with `gpt-image-1.5`. OpenAI only.
- Embeddings (`Embeddings` + `embeddingrequest!`) — vector embeddings with `text-embedding-3-small`.
Service Endpoints
abstract type ServiceEndpoint end
struct OPENAIServiceEndpoint <: ServiceEndpoint end # default — uses OPENAI_API_KEY
struct AZUREServiceEndpoint <: ServiceEndpoint end # uses AZURE_OPENAI_* env vars
struct GEMINIServiceEndpoint <: ServiceEndpoint end # uses GEMINI_API_KEY
struct GenericOpenAIEndpoint <: ServiceEndpoint # any OpenAI-compatible provider
base_url::String
api_key::String
end
# Convenience constructors
OllamaEndpoint(; base_url="http://localhost:11434") # Ollama local
MistralEndpoint(; api_key=ENV["MISTRAL_API_KEY"]) # Mistral AI

Provider Compatibility
| API Surface | Status | Providers |
|---|---|---|
| Chat Completions | De facto standard | OpenAI, Azure, Gemini, Mistral, Ollama, vLLM, LM Studio, Anthropic* |
| Embeddings | Widely adopted | OpenAI, Gemini, Mistral, Ollama, vLLM |
| Responses API | Emerging (Open Responses) | OpenAI, Ollama, vLLM, Amazon Bedrock |
| Image Generation | Limited | OpenAI, Gemini, Ollama |
*Anthropic compat layer not production-recommended by Anthropic.
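For providers without a dedicated endpoint type, `GenericOpenAIEndpoint` targets any OpenAI-compatible server. A minimal sketch, assuming a local Ollama instance with an installed model named `llama3` (both illustrative), and assuming the `service` keyword also accepts endpoint instances (suggested by the `GenericOpenAIEndpoint` fields and the `ServiceEndpointSpec` export):

```julia
using UniLM

# Explicit endpoint instance for any OpenAI-compatible server.
# Ollama ignores the api_key, but the struct requires one.
endpoint = GenericOpenAIEndpoint("http://localhost:11434", "ollama")

chat = Chat(service=endpoint, model="llama3")   # "llama3" is an assumed local model
push!(chat, Message(Val(:user), "Say hi in one word."))
result = chatrequest!(chat)

# Equivalent via the convenience constructor:
# chat = Chat(service=OllamaEndpoint(), model="llama3")
```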
Register additional Azure deployments at runtime:
add_azure_deploy_name!(model::String, deploy_name::String)
# e.g. add_azure_deploy_name!("gpt-5.2", "my-deployment")

Pass the backend via the `service` keyword on `Chat`, `Respond`, or `ImageGeneration`:
Chat(service=AZUREServiceEndpoint, model="gpt-5.2")
Respond(service=OPENAIServiceEndpoint, input="Hello")

Chat Completions API
Chat
@kwdef struct Chat
service::Type{<:ServiceEndpoint} = OPENAIServiceEndpoint
model::String = "gpt-5.2"
messages::Vector{Message} = Message[]
history::Bool = true
tools::Union{Vector{GPTTool},Nothing} = nothing
tool_choice::Union{String,GPTToolChoice,Nothing} = nothing
parallel_tool_calls::Union{Bool,Nothing} = false
temperature::Union{Float64,Nothing} = nothing # 0.0–2.0, mutually exclusive with top_p
top_p::Union{Float64,Nothing} = nothing # 0.0–1.0, mutually exclusive with temperature
n::Union{Int64,Nothing} = nothing
stream::Union{Bool,Nothing} = nothing
stop::Union{Vector{String},String,Nothing} = nothing # max 4 sequences
max_tokens::Union{Int64,Nothing} = nothing
presence_penalty::Union{Float64,Nothing} = nothing # -2.0 to 2.0
response_format::Union{ResponseFormat,Nothing} = nothing
frequency_penalty::Union{Float64,Nothing} = nothing # -2.0 to 2.0
logit_bias::Union{AbstractDict{String,Float64},Nothing} = nothing
user::Union{String,Nothing} = nothing
seed::Union{Int64,Nothing} = nothing
end

- `history=true`: responses are automatically appended to `messages`.
- `temperature` and `top_p` are mutually exclusive (the constructor throws `ArgumentError`).
- `parallel_tool_calls` is auto-set to `nothing` when `tools` is `nothing`.
- Parameter validation: the constructor validates ranges at construction time — `temperature` ∈ [0.0, 2.0], `top_p` ∈ [0.0, 1.0], `n` ∈ [1, 10], `presence_penalty` ∈ [-2.0, 2.0], `frequency_penalty` ∈ [-2.0, 2.0]. Out-of-range values throw `ArgumentError`.
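A quick sketch exercising the construction-time validation described above (illustrative values; per the documented rules, both constructions throw `ArgumentError`):

```julia
using UniLM

# temperature and top_p are mutually exclusive
try
    Chat(temperature=0.7, top_p=0.9)
catch e
    @assert e isa ArgumentError
end

# temperature outside [0.0, 2.0]
try
    Chat(temperature=3.5)
catch e
    @assert e isa ArgumentError
end
```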
Message
@kwdef struct Message
role::String # RoleSystem, RoleUser, RoleAssistant, or "tool"
content::Union{String,Nothing} = nothing
name::Union{String,Nothing} = nothing
finish_reason::Union{String,Nothing} = nothing # "stop", "tool_calls", "content_filter"
refusal_message::Union{String,Nothing} = nothing
tool_calls::Union{Nothing,Vector{GPTToolCall}} = nothing
tool_call_id::Union{String,Nothing} = nothing # required when role == "tool"
end

Validation: at least one of `content`, `tool_calls`, or `refusal_message` must be non-`nothing`. `tool_call_id` is required when `role == "tool"`.
Convenience constructors:
Message(Val(:system), "You are a helpful assistant")
Message(Val(:user), "Hello!")

Role constants: `RoleSystem = "system"`, `RoleUser = "user"`, `RoleAssistant = "assistant"`.
chatrequest!
# Mutating form — sends chat.messages, appends response when history=true
chatrequest!(chat::Chat; retries::Int=0, callback=nothing) -> LLMSuccess | LLMFailure | LLMCallError | Task
# Keyword-argument convenience form — builds a Chat internally
chatrequest!(; service=OPENAIServiceEndpoint, model="gpt-5.2",
systemprompt, userprompt, messages=Message[], history=true,
             tools=nothing, tool_choice=nothing, temperature=nothing, ...) -> same

- Non-streaming: returns `LLMSuccess`, `LLMFailure`, or `LLMCallError`.
- Streaming (`stream=true`): returns a `Task`. Pass a `callback(chunk::Union{String,Message}, close::Ref{Bool})`.
- Auto-retries on HTTP 429/500/503 with exponential backoff and jitter (up to 30 attempts). Respects `Retry-After` headers on 429 responses.
Conversation Management
push!(chat, message) # append a Message
pop!(chat) # remove last message
update!(chat, message) # append if history=true
issendvalid(chat) -> Bool # check conversation rules (≥1 message, system first if present, etc.)
length(chat) # number of messages
isempty(chat) # true if no messages
chat[i] # index into messages

Tool Calling Types
# Define a function the model can call
@kwdef struct GPTFunctionSignature
name::String
description::Union{String,Nothing} = nothing
parameters::Union{AbstractDict,Nothing} = nothing # JSON Schema dict
end
# Wrap it for the tools parameter
@kwdef struct GPTTool
type::String = "function"
func::GPTFunctionSignature
end
GPTTool(d::AbstractDict) # construct from dict with keys "name", "description", "parameters"
# Returned by model when it wants to call a function
@kwdef struct GPTToolCall
id::String
type::String = "function"
func::GPTFunction # has .name::String and .arguments::AbstractDict
end
# Your result after executing the function
struct GPTFunctionCallResult{T}
name::Union{String,Symbol}
origincall::GPTFunction
result::T
end

ResponseFormat (Structured Output)
@kwdef struct ResponseFormat
type::String = "json_object" # "json_object" or "json_schema"
json_schema::Union{JsonSchemaAPI,AbstractDict,Nothing} = nothing
end
ResponseFormat(json_schema) # shorthand, sets type="json_schema"

Chat Completions Example
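A sketch of Chat-side structured output via `response_format`. The dict layout below ("name"/"strict"/"schema") mirrors the OpenAI json_schema wire format and is an assumption here; adjust to whatever `JsonSchemaAPI` expects:

```julia
using UniLM, JSON

person_schema = Dict(
    "name" => "person",
    "strict" => true,
    "schema" => Dict(
        "type" => "object",
        "properties" => Dict("name" => Dict("type" => "string"),
                             "age"  => Dict("type" => "integer")),
        "required" => ["name", "age"],
        "additionalProperties" => false
    )
)

chat = Chat(model="gpt-5.2", response_format=ResponseFormat(person_schema))  # type="json_schema"
push!(chat, Message(Val(:user), "Extract: John is 30 years old."))
result = chatrequest!(chat)
result isa LLMSuccess && println(JSON.parse(result.message.content))
```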
using UniLM
# Build conversation
chat = Chat(model="gpt-5.2")
push!(chat, Message(Val(:system), "You are a helpful assistant."))
push!(chat, Message(Val(:user), "What is the capital of France?"))
result = chatrequest!(chat)
if result isa LLMSuccess
println(result.message.content) # "Paris..."
# chat.messages already has the response appended (history=true)
end
# One-shot via keywords
result = chatrequest!(
systemprompt="You are a translator.",
userprompt="Translate 'hello' to French.",
model="gpt-5.2"
)

Tool Calling Example (Chat)
weather_tool = GPTTool(func=GPTFunctionSignature(
name="get_weather",
description="Get current weather",
parameters=Dict(
"type" => "object",
"properties" => Dict("location" => Dict("type" => "string")),
"required" => ["location"]
)
))
chat = Chat(model="gpt-5.2", tools=[weather_tool])
push!(chat, Message(Val(:system), "You help with weather."))
push!(chat, Message(Val(:user), "Weather in Paris?"))
result = chatrequest!(chat)
if result isa LLMSuccess && result.message.finish_reason == "tool_calls"
for tc in result.message.tool_calls
# tc.func.name == "get_weather", tc.func.arguments == Dict("location" => "Paris")
answer = "22°C, sunny" # your function result
push!(chat, Message(role="tool", content=answer, tool_call_id=tc.id))
end
result2 = chatrequest!(chat)
println(result2.message.content)
end

Streaming Example (Chat)
chat = Chat(model="gpt-5.2", stream=true)
push!(chat, Message(Val(:system), "You are helpful."))
push!(chat, Message(Val(:user), "Tell me a story."))
task = chatrequest!(chat) do chunk, close_ref
if chunk isa String
print(chunk) # partial text delta
elseif chunk isa Message
println("\n[Done]") # final assembled message
# close_ref[] = true # to stop early
end
end
result = fetch(task) # LLMSuccess when complete

Responses API
Respond
@kwdef struct Respond
service::Type{<:ServiceEndpoint} = OPENAIServiceEndpoint
model::String = "gpt-5.2"
input::Union{String, Vector} # String or Vector{InputMessage}
instructions::Union{String,Nothing} = nothing
tools::Union{Vector,Nothing} = nothing # Vector of ResponseTool subtypes
tool_choice::Union{String,Nothing} = nothing # "auto", "none", "required"
parallel_tool_calls::Union{Bool,Nothing} = nothing
temperature::Union{Float64,Nothing} = nothing # 0.0–2.0, mutually exclusive with top_p
top_p::Union{Float64,Nothing} = nothing # 0.0–1.0
max_output_tokens::Union{Int64,Nothing} = nothing
stream::Union{Bool,Nothing} = nothing
text::Union{TextConfig,Nothing} = nothing # output format
reasoning::Union{Reasoning,Nothing} = nothing # O-series models
truncation::Union{String,Nothing} = nothing # "auto" or "disabled"
store::Union{Bool,Nothing} = nothing # store for later retrieval
metadata::Union{AbstractDict,Nothing} = nothing
previous_response_id::Union{String,Nothing} = nothing # multi-turn chaining
user::Union{String,Nothing} = nothing
background::Union{Bool,Nothing} = nothing
include::Union{Vector{String},Nothing} = nothing
max_tool_calls::Union{Int64,Nothing} = nothing
service_tier::Union{String,Nothing} = nothing # "auto","default","flex","priority"
top_logprobs::Union{Int64,Nothing} = nothing # 0–20
prompt::Union{AbstractDict,Nothing} = nothing
prompt_cache_key::Union{String,Nothing} = nothing
prompt_cache_retention::Union{String,Nothing} = nothing # "in-memory","24h"
safety_identifier::Union{String,Nothing} = nothing
conversation::Union{Any,Nothing} = nothing
context_management::Union{Vector,Nothing} = nothing
stream_options::Union{AbstractDict,Nothing} = nothing
end

Input Helpers
# Structured input messages
InputMessage(role="user", content="Hello")
InputMessage(role="user", content=[input_text("Describe:"), input_image("https://...")])
# Content part constructors
input_text(text::String) # → Dict(:type=>"input_text", :text=>...)
input_image(url::String; detail=nothing) # → Dict(:type=>"input_image", ...) detail: "auto","low","high"
input_file(; url=nothing, id=nothing) # → Dict(:type=>"input_file", ...) provide url or file id

Tool Types
abstract type ResponseTool end
@kwdef struct FunctionTool <: ResponseTool
name::String
description::Union{String,Nothing} = nothing
parameters::Union{AbstractDict,Nothing} = nothing
strict::Union{Bool,Nothing} = nothing
end
@kwdef struct WebSearchTool <: ResponseTool
search_context_size::String = "medium" # "low","medium","high"
user_location::Union{AbstractDict,Nothing} = nothing
end
@kwdef struct FileSearchTool <: ResponseTool
vector_store_ids::Vector{String}
max_num_results::Union{Int,Nothing} = nothing
ranking_options::Union{AbstractDict,Nothing} = nothing
filters::Union{AbstractDict,Nothing} = nothing
end

@kwdef struct MCPTool <: ResponseTool
server_label::String
server_url::String
require_approval::Union{String, AbstractDict, Nothing} = "never"
allowed_tools::Union{Vector{String}, Nothing} = nothing
headers::Union{AbstractDict, Nothing} = nothing
end
@kwdef struct ComputerUseTool <: ResponseTool
display_width::Int = 1024
display_height::Int = 768
environment::Union{String, Nothing} = nothing
end
@kwdef struct ImageGenerationTool <: ResponseTool
background::Union{String, Nothing} = nothing
output_format::Union{String, Nothing} = nothing
output_compression::Union{Int, Nothing} = nothing
quality::Union{String, Nothing} = nothing
size::Union{String, Nothing} = nothing
end
@kwdef struct CodeInterpreterTool <: ResponseTool
container::Union{AbstractDict, Nothing} = nothing
file_ids::Union{Vector{String}, Nothing} = nothing
end

Convenience constructors:
function_tool(name, description=nothing; parameters=nothing, strict=nothing)
function_tool(d::AbstractDict) # from dict with keys "name", "description", "parameters"
web_search(; context_size="medium", location=nothing)
file_search(store_ids; max_results=nothing, ranking=nothing, filters=nothing)
mcp_tool(label, url; require_approval="never", allowed_tools=nothing, headers=nothing)
computer_use(; display_width=1024, display_height=768, environment=nothing)
image_generation_tool(; kwargs...)
code_interpreter(; container=nothing, file_ids=nothing)

Text Format / Structured Output
@kwdef struct TextFormatSpec
type::String = "text" # "text","json_object","json_schema"
name::Union{String,Nothing} = nothing
description::Union{String,Nothing} = nothing
schema::Union{AbstractDict,Nothing} = nothing
strict::Union{Bool,Nothing} = nothing
end
@kwdef struct TextConfig
format::TextFormatSpec = TextFormatSpec()
end

Convenience constructors:
text_format(; kwargs...) # generic TextConfig
json_schema_format(name, description, schema; strict=nothing) # JSON Schema output
json_schema_format(d::AbstractDict) # from dict with keys "name", "description", "schema"
json_object_format() # unstructured JSON

Reasoning (O-series models)
@kwdef struct Reasoning
effort::Union{String,Nothing} = nothing # "none","low","medium","high"
generate_summary::Union{String,Nothing} = nothing # "auto","concise","detailed"
summary::Union{String,Nothing} = nothing # deprecated alias
end

Respond(input="Hard math problem", model="o3", reasoning=Reasoning(effort="high"))

respond
# Struct form
respond(r::Respond; retries=0, callback=nothing) -> ResponseSuccess | ResponseFailure | ResponseCallError | Task
# Convenience — builds Respond internally
respond(input; kwargs...) -> same
# do-block streaming — auto-sets stream=true
respond(callback::Function, input; kwargs...) -> Task

- Streaming callback signature: `callback(chunk::Union{String, ResponseObject}, close::Ref{Bool})`
- Auto-retries on HTTP 429/500/503 with exponential backoff and jitter (up to 30 attempts). Respects `Retry-After` headers.
- Parameter validation: `temperature` ∈ [0.0, 2.0], `top_p` ∈ [0.0, 1.0], `max_output_tokens` ≥ 1, `top_logprobs` ∈ [0, 20]. Out-of-range values throw `ArgumentError`.
Response Accessors
output_text(result::ResponseSuccess)::String # concatenated text output
output_text(result::ResponseFailure)::String # error message
output_text(result::ResponseCallError)::String # error message
function_calls(result::ResponseSuccess)::Vector{Dict{String,Any}}
# Each dict has: "id", "call_id", "name", "arguments" (JSON string), "status"

Response Management Functions
get_response(id::String; service=OPENAIServiceEndpoint) -> ResponseSuccess | ResponseFailure | ResponseCallError
delete_response(id::String; service=OPENAIServiceEndpoint) -> Dict | ResponseFailure | ResponseCallError
list_input_items(id::String; limit=20, order="desc", after=nothing, service=OPENAIServiceEndpoint) -> Dict | ...
cancel_response(id::String; service=OPENAIServiceEndpoint) -> ResponseSuccess | ...
compact_response(; model="gpt-5.2", input, service=OPENAIServiceEndpoint) -> Dict | ...
count_input_tokens(; model="gpt-5.2", input, instructions=nothing, tools=nothing, service=OPENAIServiceEndpoint) -> Dict | ...

ResponseObject
@kwdef struct ResponseObject
id::String
status::String
model::String
output::Vector{Any}
usage::Union{Dict{String,Any},Nothing} = nothing
error::Union{Any,Nothing} = nothing
metadata::Union{Dict{String,Any},Nothing} = nothing
raw::Dict{String,Any}
end

Responses API Examples
using UniLM
# Basic
result = respond("Tell me a joke")
if result isa ResponseSuccess
println(output_text(result))
end
# With instructions
result = respond("Hello", instructions="You are a pirate. Respond in pirate speak.")
# Multi-turn via chaining
r1 = respond("Tell me a joke")
r2 = respond("Tell me another", previous_response_id=r1.response.id)
# Structured output
schema = Dict(
"type" => "object",
"properties" => Dict(
"name" => Dict("type" => "string"),
"age" => Dict("type" => "integer")
),
"required" => ["name", "age"],
"additionalProperties" => false
)
result = respond("Extract: John is 30 years old",
text=json_schema_format("person", "A person", schema, strict=true))
parsed = JSON.parse(output_text(result))
# Web search
result = respond("Latest Julia language news", tools=[web_search()])
# Function calling
tool = function_tool("get_weather", "Get weather",
parameters=Dict(
"type" => "object",
"properties" => Dict("location" => Dict("type" => "string")),
"required" => ["location"]
))
result = respond("Weather in NYC?", tools=ResponseTool[tool])
for call in function_calls(result)
println(call["name"], ": ", call["arguments"])
end
# Streaming (do-block)
respond("Tell me a story") do chunk, close_ref
if chunk isa String
print(chunk)
elseif chunk isa ResponseObject
println("\nDone: ", chunk.status)
end
end
# Reasoning (O-series)
result = respond("Prove that √2 is irrational", model="o3",
reasoning=Reasoning(effort="high", generate_summary="concise"))
# Multimodal input
result = respond([
InputMessage(role="user", content=[
input_text("What's in this image?"),
input_image("https://example.com/photo.jpg")
])
])
# Count tokens without generating
tokens = count_input_tokens(model="gpt-5.2", input="Hello world")
println(tokens["input_tokens"])

Image Generation API
ImageGeneration
@kwdef struct ImageGeneration
service::Type{<:ServiceEndpoint} = OPENAIServiceEndpoint
model::String = "gpt-image-1.5"
prompt::String
n::Union{Int,Nothing} = nothing # 1–10
size::Union{String,Nothing} = nothing # "1024x1024","1536x1024","1024x1536","auto"
quality::Union{String,Nothing} = nothing # "low","medium","high","auto"
background::Union{String,Nothing} = nothing # "transparent","opaque","auto"
output_format::Union{String,Nothing} = nothing # "png","webp","jpeg"
output_compression::Union{Int,Nothing} = nothing # 0–100 (webp/jpeg only)
user::Union{String,Nothing} = nothing
end

generate_image
generate_image(ig::ImageGeneration; retries=0) -> ImageSuccess | ImageFailure | ImageCallError
generate_image(prompt::String; kwargs...) -> same # convenience

Auto-retries on 429/500/503 with exponential backoff and jitter (up to 30 attempts). Respects `Retry-After` headers.
Response Types
struct ImageObject
b64_json::Union{String,Nothing}
revised_prompt::Union{String,Nothing}
end
struct ImageResponse
created::Int64
data::Vector{ImageObject}
usage::Union{Dict{String,Any},Nothing}
raw::Dict{String,Any}
end

Accessors
image_data(result::ImageSuccess)::Vector{String}  # base64-encoded image strings
image_data(result::ImageFailure)::Vector{String}  # empty
image_data(result::ImageCallError)::Vector{String} # empty
save_image(img_b64::String, filepath::String) # decode + write to disk, returns filepath

Image Generation Example
using UniLM
result = generate_image("A watercolor painting of a Julia butterfly",
size="1024x1024", quality="high")
if result isa ImageSuccess
imgs = image_data(result)
save_image(imgs[1], "butterfly.png")
println("Saved! Revised prompt: ", result.response.data[1].revised_prompt)
end
# Multiple images with transparent background
result = generate_image("Minimalist logo",
    n=3, background="transparent", output_format="png")

Embeddings API
Embeddings
struct Embeddings
model::String # "text-embedding-3-small" (1536 dims)
input::Union{String,Vector{String}}
embeddings::Union{Vector{Float64},Vector{Vector{Float64}}}
user::Union{String,Nothing}
end
Embeddings(input::String) # single input, pre-allocates 1536-dim vector
Embeddings(input::Vector{String}) # batch input, pre-allocates one vector per input

embeddingrequest!
embeddingrequest!(emb::Embeddings; retries=0) -> (response_dict, emb) | nothing

Fills `emb.embeddings` in place. Auto-retries on 429/500/503 with exponential backoff and jitter (up to 30 attempts). Respects `Retry-After` headers.
Embeddings Example
using UniLM, LinearAlgebra
emb = Embeddings("What is Julia?")
embeddingrequest!(emb)
println(emb.embeddings[1:5]) # first 5 dimensions
# Batch + cosine similarity
emb = Embeddings(["cat", "dog", "airplane"])
embeddingrequest!(emb)
similarity = dot(emb.embeddings[1], emb.embeddings[2]) /
             (norm(emb.embeddings[1]) * norm(emb.embeddings[2]))

Cost Tracking
TokenUsage
@kwdef struct TokenUsage
prompt_tokens::Int = 0
completion_tokens::Int = 0
total_tokens::Int = 0
end

Functions
token_usage(result::LLMSuccess)::Union{TokenUsage, Nothing} # extract TokenUsage from a Chat result
token_usage(result::ResponseSuccess)::Union{TokenUsage, Nothing} # extract from Responses API result
estimated_cost(result; model=nothing, pricing=DEFAULT_PRICING) # per-call cost estimate (Float64)
cumulative_cost(chat::Chat)::Float64 # running total for a Chat instance
DEFAULT_PRICING # Dict{String, Tuple{Float64, Float64}} — model → (input_price, output_price) per token

Cost Tracking Example
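The cost arithmetic can be sketched without a network call. The per-token formula below (input tokens * input price + output tokens * output price) and the prices themselves are assumptions for illustration, not real rates:

```julia
using UniLM

# Hypothetical model entry added to the default pricing table
pricing = merge(DEFAULT_PRICING, Dict("my-model" => (1.0e-6, 3.0e-6)))

usage = TokenUsage(prompt_tokens=1_000, completion_tokens=200, total_tokens=1_200)
in_price, out_price = pricing["my-model"]

# Assumed formula: input tokens * input price + output tokens * output price
manual_cost = usage.prompt_tokens * in_price + usage.completion_tokens * out_price
# 1_000 * 1.0e-6 + 200 * 3.0e-6 = 0.0016
```

For a real call, the same table can be passed through `estimated_cost(result; pricing=pricing)`.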
chat = Chat(model="gpt-5.2")
push!(chat, Message(Val(:system), "You are helpful."))
push!(chat, Message(Val(:user), "Hello!"))
result = chatrequest!(chat)
if result isa LLMSuccess
usage = token_usage(result)
cost = estimated_cost(result)
println("Tokens: $(usage.total_tokens), Cost: \$$(round(cost; digits=6))")
println("Cumulative: \$$(round(cumulative_cost(chat); digits=6))")
end

Conversation Forking
fork(chat::Chat)::Chat # deep-copy a Chat, resetting cumulative cost
fork(chat::Chat, n::Int)::Vector{Chat} # create n independent forks

Fork Example
chat = Chat(model="gpt-5.2")
push!(chat, Message(Val(:system), "You are a creative writer."))
push!(chat, Message(Val(:user), "Start a story about a robot."))
chatrequest!(chat)
# Fork into 3 independent continuations
forks = fork(chat, 3)
for (i, f) in enumerate(forks)
push!(f, Message(Val(:user), "Continue the story with ending $i."))
chatrequest!(f)
end

Tool Loop
Automated tool dispatch for both APIs. Wraps a tool schema with a callable function.
CallableTool
struct CallableTool{T}
tool::T # GPTTool or FunctionTool
callable::Function # (name::String, args::Dict{String,Any}) -> String
end

to_tool
to_tool(x) # identity for GPTTool, FunctionTool, CallableTool; converts AbstractDict to GPTTool

ToolCallOutcome / ToolLoopResult
# Per-call record
struct ToolCallOutcome
tool_name::String
arguments::Dict{String,Any}
result::Union{GPTFunctionCallResult,Nothing}
success::Bool
error::Union{String,Nothing}
end
# Loop result
struct ToolLoopResult
response::LLMRequestResponse
tool_calls::Vector{ToolCallOutcome}
turns_used::Int
completed::Bool
llm_error::Union{String,Nothing}
end

tool_loop! (Chat Completions)
tool_loop!(chat, dispatcher; max_turns=10, retries=0) -> ToolLoopResult
tool_loop!(chat; tools::Vector{<:CallableTool}, kwargs...) -> ToolLoopResult

tool_loop (Responses API)
tool_loop(r::Respond, dispatcher; max_turns=10, retries=0) -> ToolLoopResult
tool_loop(r::Respond; max_turns=10, retries=0) -> ToolLoopResult # extracts callables from r.tools
tool_loop(input, dispatcher; tools, kwargs...) -> ToolLoopResult # convenience form

MCP Client
Native MCP client (JSON-RPC 2.0 over stdio or HTTP, spec 2025-11-25).
Types
MCPSession # live connection — manages transport + cached tools/resources/prompts
MCPToolInfo # tool definition from tools/list
MCPResourceInfo # resource definition from resources/list
MCPPromptInfo # prompt definition from prompts/list
MCPServerCapabilities # capabilities from initialize
MCPTransport # abstract (subtypes: StdioTransport, HTTPTransport)
MCPError <: Exception # JSON-RPC error (code, message, data)

Lifecycle
mcp_connect(command::Cmd; ...) -> MCPSession # stdio subprocess
mcp_connect(url::String; headers=[], ...) -> MCPSession # HTTP
mcp_connect(f::Function, args...; ...) # do-block, auto-disconnect
mcp_disconnect!(session)

Discovery
list_tools!(session) -> Vector{MCPToolInfo}
list_resources!(session) -> Vector{MCPResourceInfo}
list_prompts!(session) -> Vector{MCPPromptInfo}

Operations
call_tool(session, name, arguments) -> String
read_resource(session, uri) -> String
get_prompt(session, name, arguments) -> Vector{Dict}
ping(session)

Tool Bridge
mcp_tools(session) -> Vector{CallableTool{GPTTool}} # for tool_loop!
mcp_tools_respond(session) -> Vector{CallableTool{FunctionTool}} # for tool_loop

Client Example
session = mcp_connect(`npx -y @modelcontextprotocol/server-filesystem /tmp`)
tools = mcp_tools(session)
chat = Chat(model="gpt-5.2", tools=map(t -> t.tool, tools))
push!(chat, Message(Val(:user), "List files"))
result = tool_loop!(chat; tools)
mcp_disconnect!(session)

MCP Server
Build MCP servers that expose tools, resources, and prompts.
Types
MCPServer(name, version; description=nothing)
MCPServerPrimitive # abstract (MCPServerTool, MCPServerResource, MCPServerResourceTemplate, MCPServerPrompt)

Registration
register_tool!(server, name, description, schema, handler)
register_tool!(server, name, description, handler) # auto-schema from signature
register_tool!(server, ct::CallableTool{GPTTool}) # bridge from Chat API
register_tool!(server, ct::CallableTool{FunctionTool}) # bridge from Responses API
register_resource!(server, uri, name, handler; mime_type="text/plain", description=nothing)
register_resource_template!(server, uri_template, name, handler; ...)
register_prompt!(server, name, handler; description=nothing, arguments=[])

Macros
@mcp_tool server function name(args...) body end
@mcp_resource server uri function(args...) body end
@mcp_prompt server name function(args...) body end

Serving
serve(server; transport=:stdio) # default — stdio
serve(server; transport=:http, host="127.0.0.1", port=8080) # HTTP

Server Example
server = MCPServer("calc", "1.0.0")
@mcp_tool server function add(a::Float64, b::Float64)::String
string(a + b)
end
serve(server)

Result Type Hierarchy
All API call results inherit from LLMRequestResponse:
LLMRequestResponse (abstract)
├── LLMSuccess — Chat Completions success (.message::Message, .self::Chat)
├── LLMFailure — Chat Completions HTTP error (.response::String, .status::Int, .self::Chat)
├── LLMCallError — Chat Completions exception (.error::String, .status, .self::Chat)
├── ResponseSuccess — Responses API success (.response::ResponseObject)
├── ResponseFailure — Responses API HTTP error (.response::String, .status::Int)
├── ResponseCallError — Responses API exception (.error::String, .status)
├── ImageSuccess — Image Gen success (.response::ImageResponse)
├── ImageFailure — Image Gen HTTP error (.response::String, .status::Int)
└── ImageCallError — Image Gen exception (.error::String, .status)

Standard pattern-matching idiom:
result = chatrequest!(chat)
if result isa LLMSuccess
println(result.message.content)
elseif result isa LLMFailure
@error "HTTP $(result.status): $(result.response)"
elseif result isa LLMCallError
@error "Exception: $(result.error)"
end
result = respond("Hello")
if result isa ResponseSuccess
println(output_text(result))
elseif result isa ResponseFailure
@error "HTTP $(result.status)"
elseif result isa ResponseCallError
@error result.error
end
result = generate_image("A cat")
if result isa ImageSuccess
save_image(image_data(result)[1], "cat.png")
elseif result isa ImageFailure
@error "HTTP $(result.status)"
elseif result isa ImageCallError
@error result.error
end

API Constants
const OPENAI_BASE_URL = "https://api.openai.com"
const CHAT_COMPLETIONS_PATH = "/v1/chat/completions"
const EMBEDDINGS_PATH = "/v1/embeddings"
const RESPONSES_PATH = "/v1/responses"
const IMAGES_GENERATIONS_PATH = "/v1/images/generations"
const GEMINI_CHAT_URL = "https://generativelanguage.googleapis.com/v1beta/openai/chat/completions"

Exceptions
struct InvalidConversationError <: Exception
reason::String
end

Thrown by `issendvalid` / internal validation when the conversation structure is invalid (e.g., a system message not in first position, or consecutive same-role messages).
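A sketch of guarding a send with `issendvalid` and raising `InvalidConversationError` on failure (the guard pattern is illustrative, not a library requirement):

```julia
using UniLM

chat = Chat(model="gpt-5.2")
@assert !issendvalid(chat)   # empty conversation violates the ≥1 message rule

push!(chat, Message(Val(:system), "You are helpful."))
push!(chat, Message(Val(:user), "Hi"))

issendvalid(chat) || throw(InvalidConversationError("conversation failed validation"))
result = chatrequest!(chat)
```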
Complete Exports List
Chat Completions: Chat, Message, RoleSystem, RoleUser, RoleAssistant, GPTTool, GPTToolCall, GPTFunctionSignature, GPTFunctionCallResult, InvalidConversationError, issendvalid, chatrequest!, update!, ResponseFormat
Responses API: Respond, InputMessage, ResponseTool, FunctionTool, WebSearchTool, FileSearchTool, MCPTool, ComputerUseTool, ImageGenerationTool, CodeInterpreterTool, TextConfig, TextFormatSpec, Reasoning, ResponseObject, ResponseSuccess, ResponseFailure, ResponseCallError, respond, get_response, delete_response, list_input_items, cancel_response, compact_response, count_input_tokens, output_text, function_calls, input_text, input_image, input_file, function_tool, web_search, file_search, mcp_tool, computer_use, image_generation_tool, code_interpreter, text_format, json_schema_format, json_object_format
Image Generation: ImageGeneration, ImageObject, ImageResponse, ImageSuccess, ImageFailure, ImageCallError, generate_image, image_data, save_image
Embeddings: Embeddings, embeddingrequest!
Cost Tracking: TokenUsage, token_usage, estimated_cost, cumulative_cost, DEFAULT_PRICING
Service Endpoints: ServiceEndpoint, ServiceEndpointSpec, OPENAIServiceEndpoint, AZUREServiceEndpoint, GEMINIServiceEndpoint, GenericOpenAIEndpoint, OllamaEndpoint, MistralEndpoint, add_azure_deploy_name!
Forking: fork
Tool Loop: CallableTool, ToolCallOutcome, ToolLoopResult, tool_loop!, tool_loop, to_tool
MCP Client: MCPSession, MCPToolInfo, MCPResourceInfo, MCPPromptInfo, MCPServerCapabilities, MCPTransport, StdioTransport, HTTPTransport, MCPError, mcp_connect, mcp_disconnect!, mcp_tools, mcp_tools_respond, list_tools!, list_resources!, list_prompts!, call_tool, read_resource, get_prompt, ping
MCP Server: MCPServer, MCPServerTool, MCPServerResource, MCPServerResourceTemplate, MCPServerPrompt, MCPServerPrimitive, register_tool!, register_resource!, register_resource_template!, register_prompt!, serve, @mcp_tool, @mcp_resource, @mcp_prompt
Result Types: LLMRequestResponse, LLMSuccess, LLMFailure, LLMCallError