Responses API
The Responses API is OpenAI's newer, more flexible alternative to Chat Completions. Key advantages include built-in tools (web search, file search), server-managed multi-turn conversations via previous_response_id, and reasoning support for O-series models.
Basic Usage
The simplest call — just a string:
result = respond("Explain Julia's multiple dispatch in 2-3 sentences.")
if result isa ResponseSuccess
println(output_text(result))
else
println("Request failed — ", output_text(result))
end

Julia uses multiple dispatch: when you call a function, Julia chooses which method to run based on the types of *all* the arguments, not just the first one. You can define several methods with the same function name but different type signatures, and Julia picks the most specific match at runtime. This makes it easy to write generic code that becomes fast and specialized for different combinations of input types.

if result isa ResponseSuccess
println("ID: ", result.response.id)
println("Status: ", result.response.status)
println("Model: ", result.response.model)
else
println("No response metadata available")
end

ID: resp_029c615b652fbe390069c7ae15125881908a991da02d6e249f
Status: completed
Model: gpt-5.2-2025-12-11

The Respond Type
For full control, construct a Respond object:
r = Respond(
model="gpt-5.2",
input="Explain monads simply",
instructions="You are a functional programming expert. Be concise.",
temperature=0.5,
max_output_tokens=500,
)
println("Model: ", r.model)
println("Request body:")
println(JSON.json(r))

Model: gpt-5.2
Request body:
{"temperature":0.5,"max_output_tokens":500,"model":"gpt-5.2","input":"Explain monads simply","instructions":"You are a functional programming expert. Be concise."}

Instructions (System Prompt)
Unlike Chat Completions, where you prepend a system Message to the conversation, the Responses API takes a dedicated instructions parameter:
result = respond(
"Translate to French: The quick brown fox jumps over the lazy dog.",
instructions="You are a professional translator. Respond only with the translation."
)
if result isa ResponseSuccess
println(output_text(result))
else
println("Request failed — ", output_text(result))
end

Le rapide renard brun saute par-dessus le chien paresseux.

Structured Input
For multimodal inputs, use InputMessage with content helpers:
# Structured input mixing text and image content parts
msgs = [
InputMessage(role="system", content="You analyze images."),
InputMessage(role="user", content=[
input_text("What do you see in this image?"),
input_image("https://example.com/photo.jpg"),
]),
]
r = Respond(input=msgs, model="gpt-5.2")
println("Input is structured: ", r.input isa Vector)
println("Number of input messages: ", length(r.input))

Input is structured: true
Number of input messages: 2

Input Helpers
| Function | Purpose |
|---|---|
| input_text | Text content part |
| input_image | Image URL content part |
| input_file | File (URL or ID) content part |
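Each helper wraps a tagged content-part object. As a rough sketch of the shapes they serialize to (the field names here are assumptions based on the OpenAI Responses API wire format, not this library's internals):

```julia
# Assumed wire shapes for the three content-part kinds (illustrative only;
# the real helpers return library types that serialize similarly).
text_part  = Dict("type" => "input_text",  "text" => "What do you see?")
image_part = Dict("type" => "input_image", "image_url" => "https://example.com/photo.jpg")
file_part  = Dict("type" => "input_file",  "file_id" => "file-abc123")

# Every content part is distinguished by its "type" field.
println(join((p["type"] for p in (text_part, image_part, file_part)), ", "))
```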
Multi-Turn Conversations
Chain requests using previous_response_id — no need to re-send the full history:
r1 = respond("Tell me a one-liner programming joke.", instructions="Be concise.")
if r1 isa ResponseSuccess
println(output_text(r1))
else
println("Request failed — ", output_text(r1))
end

There are only 10 kinds of people in the world: those who understand binary and those who don’t.

if r1 isa ResponseSuccess
r2 = respond("Explain why that's funny, in one sentence.", previous_response_id=r1.response.id)
if r2 isa ResponseSuccess
println(output_text(r2))
else
println("Request failed — ", output_text(r2))
end
else
println("Skipped — first request failed")
end

It’s funny because “10” is binary for 2, so the line itself only makes sense if you understand binary—creating a self-referential nerdy twist.

Built-in Tools
Web Search
result = respond(
"What is the latest stable release of the Julia programming language?",
tools=[web_search()]
)
if result isa ResponseSuccess
println(output_text(result))
else
println("Request failed — ", output_text(result))
end

The latest **stable** release of the Julia programming language is **Julia 1.12.4**. ([julialang.org](https://julialang.org/blog/2026/01/this-month-in-julia-world/))

File Search
result = respond(
"Find information about error handling",
tools=[file_search(["vs_abc123"])]
)

Function Tools
weather_tool = function_tool(
"get_weather",
"Get current weather for a location",
parameters=Dict(
"type" => "object",
"properties" => Dict(
"location" => Dict("type" => "string", "description" => "City name")
),
"required" => ["location"]
)
)
println("Tool name: ", weather_tool.name)
println("Tool JSON: ", JSON.json(JSON.lower(weather_tool)))

Tool name: get_weather
Tool JSON: {"type":"function","name":"get_weather","description":"Get current weather for a location","parameters":{"properties":{"location":{"type":"string","description":"City name"}},"required":["location"],"type":"object"}}

result = respond("What's the weather in Tokyo? Use celsius.", tools=[weather_tool])
calls = function_calls(result)
if !isempty(calls)
println("Function: ", calls[1]["name"])
println("Arguments: ", JSON.json(JSON.parse(calls[1]["arguments"]), 2))
else
println("No function calls — ", output_text(result))
end

Function: get_weather
Arguments: {
"location": "Tokyo"
}

Reasoning (O-Series Models)
For models like o3 that support extended reasoning:
r = Respond(
input="Prove that √2 is irrational",
model="o3",
reasoning=Reasoning(effort="high", summary="detailed")
)
println("Model: ", r.model)
println("Reasoning effort: ", r.reasoning.effort)
println(JSON.json(r))

Model: o3
Reasoning effort: high
{"reasoning":{"effort":"high","summary":"detailed"},"model":"o3","input":"Prove that √2 is irrational"}

Structured Output
Force JSON-conformant output:
# JSON Schema format
fmt = json_schema_format(
"colors",
"A list of colors",
Dict(
"type" => "object",
"properties" => Dict(
"colors" => Dict(
"type" => "array",
"items" => Dict("type" => "string")
)
),
"required" => ["colors"],
"additionalProperties" => false
),
strict=true
)
println("Format type: ", fmt.format.type)
println("Schema name: ", fmt.format.name)

Format type: json_schema
Schema name: colors

result = respond("List 5 popular colors", text=fmt)
if result isa ResponseSuccess
println(JSON.json(JSON.parse(output_text(result)), 2))
else
println("Request failed — ", output_text(result))
end

{
"colors": [
"Red",
"Blue",
"Green",
"Black",
"White"
]
}

Response Accessors
result = respond("Hello!")
if result isa ResponseSuccess
r = result.response
output_text(result) # full text output
function_calls(result) # Vector of function call Dicts (empty if none)
r.id # "resp_00e791c8..."
r.status # "completed"
r.model # "gpt-5.2-2025-12-11"
r.output # full output array
r.usage # Dict with token counts
end

Managing Stored Responses
When you pass store=true, the response is saved on OpenAI's servers and can be retrieved, inspected, or deleted later:
r = respond("Say 'stored response test' and nothing else.", store=true)
if r isa ResponseSuccess
rid = r.response.id
println("Stored response ID: ", rid)
# Retrieve
retrieved = get_response(rid)
if retrieved isa ResponseSuccess
println("Retrieved text: ", output_text(retrieved))
end
# List input items
items = list_input_items(rid)
if items isa Dict
println("Input items: ", length(items["data"]))
end
# Delete
del = delete_response(rid)
if del isa Dict
println("Deleted: ", del["deleted"])
end
else
println("Request failed — ", output_text(r))
end

Stored response ID: resp_059c532252752d140069c7ae24c28881a180e4cab2d6d05cd7
Retrieved text: stored response test
Input items: 1
Deleted: true

Metadata
Attach arbitrary key-value metadata to any request for tracking, filtering, or debugging:
result = respond(
"Say 'metadata test' and nothing else.",
metadata=Dict("env" => "docs", "request_id" => "demo_123")
)
if result isa ResponseSuccess
println(output_text(result))
else
println("Request failed — ", output_text(result))
end

metadata test

Service Tier
Control the processing tier for your request ("auto", "default", "flex", "priority"):
result = respond("Say 'tier test' and nothing else.", service_tier="auto")
if result isa ResponseSuccess
println(output_text(result))
else
println("Request failed — ", output_text(result))
end

tier test

Counting Input Tokens
Estimate token usage before making a full request — useful for cost estimation or verifying that input fits within the context window:
result = count_input_tokens(input="Tell me a joke about programming")
if result isa Dict
println("Input tokens: ", result["input_tokens"])
else
println("Request failed — see result for details")
end

Input tokens: 12

With tools and instructions:
tool = function_tool("search", "Search for information",
parameters=Dict("type" => "object", "properties" => Dict(
"query" => Dict("type" => "string")
), "required" => ["query"], "additionalProperties" => false),
strict=true
)
result = count_input_tokens(
input="Search for Julia language news",
instructions="You are a helpful assistant.",
tools=[tool]
)
if result isa Dict
println("Tokens with tools: ", result["input_tokens"])
else
println("Request failed — see result for details")
end

Tokens with tools: 54

Compacting Conversations
For long conversations, compact_response compresses the history into opaque, encrypted items that reduce token usage while preserving context:
items = [
Dict("role" => "user", "content" => "Hello, I want to learn about Julia."),
Dict("type" => "message", "role" => "assistant", "status" => "completed",
"content" => [Dict("type" => "output_text",
"text" => "Julia is a high-performance programming language for technical computing.")])
]
result = compact_response(input=items)
if result isa Dict
println("Compact succeeded")
println("Output items: ", length(result["output"]))
println("Usage: ", result["usage"])
else
println("Request failed — see result for details")
end

Compact succeeded
Output items: 2
Usage: Dict{String, Any}("input_tokens" => 125, "input_tokens_details" => Dict{String, Any}("cached_tokens" => 0), "output_tokens_details" => Dict{String, Any}("reasoning_tokens" => 0), "total_tokens" => 537, "output_tokens" => 412)

Cancelling Responses
Cancel an in-progress (background) response:
# Start a background response
result = respond("Write a very long essay about Julia", background=true)
# Cancel it
if result isa ResponseSuccess
cancel_result = cancel_response(result.response.id)
if cancel_result isa ResponseSuccess
println("Cancelled: ", cancel_result.response.status)
end
end

Parameters Reference
| Parameter | Type | Default | Description |
|---|---|---|---|
| model | String | "gpt-5.2" | Model to use |
| input | Union{String, Vector} | (required) | String or Vector{InputMessage} |
| instructions | String | — | System-level instructions |
| tools | Vector | — | Available tools (function, web search, file search) |
| tool_choice | String | — | "auto", "none", "required" |
| parallel_tool_calls | Bool | — | Allow parallel tool calls |
| temperature | Float64 | — | 0.0–2.0 (mutually exclusive with top_p) |
| top_p | Float64 | — | 0.0–1.0 (mutually exclusive with temperature) |
| max_output_tokens | Int64 | — | Maximum tokens in the response |
| stream | Bool | — | Enable streaming |
| text | TextConfig | — | Output format (text, json_object, json_schema) |
| reasoning | Reasoning | — | Reasoning config for O-series models |
| truncation | String | — | "auto" or "disabled" |
| store | Bool | — | Store response for later retrieval |
| metadata | Dict | — | Arbitrary key-value metadata |
| previous_response_id | String | — | Chain to a previous response for multi-turn |
| user | String | — | End-user identifier |
| background | Bool | — | Run in background (cancellable) |
| include | Vector{String} | — | Extra data to include (e.g. "file_search_call.results") |
| max_tool_calls | Int64 | — | Max number of tool calls per turn |
| service_tier | String | — | "auto", "default", "flex", "priority" |
| top_logprobs | Int64 | — | 0–20, top log probabilities |
| prompt | Dict | — | Prompt template reference |
| prompt_cache_key | String | — | Cache key for prompt caching |
| prompt_cache_retention | String | — | "in-memory" or "24h" |
| conversation | Any | — | Conversation context (String or Dict) |
| context_management | Vector | — | Context management strategies |
| stream_options | Dict | — | Streaming options (e.g. include_usage) |
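Everything except input is optional, and only the parameters you actually set are serialized, which is why the request bodies printed earlier contain just a few keys. A minimal sketch of that kind of nothing-dropping serialization (illustrative; not the library's actual implementation):

```julia
# Hypothetical request-body builder: unset (nothing) parameters are dropped.
build_body(; kwargs...) = Dict(String(k) => v for (k, v) in kwargs if v !== nothing)

body = build_body(model="gpt-5.2", input="Hello", temperature=nothing, top_p=nothing)
println(sort!(collect(keys(body))))  # → ["input", "model"]
```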
Retry Behaviour
respond automatically retries on HTTP 429, 500, and 503 errors with exponential backoff and jitter (up to 30 attempts, max 60s delay). On 429 responses, the Retry-After header is respected. This applies to all Responses API functions (respond, get_response, delete_response, etc.).
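As an illustration of the policy described above (not the library's actual code), an exponential-backoff-with-jitter schedule capped at 60 seconds might look like:

```julia
# Illustrative retry delay: exponential growth, multiplicative jitter, 60 s cap.
backoff_delay(attempt; base = 1.0, cap = 60.0) =
    min(cap, base * 2.0^(attempt - 1)) * (0.5 + 0.5 * rand())

for attempt in (1, 2, 3, 10)
    println("attempt $attempt: up to $(round(backoff_delay(attempt); digits = 2)) s")
end
```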
Parameter Validation
The Respond constructor validates parameter ranges at construction time:
| Parameter | Valid Range |
|---|---|
| temperature | 0.0–2.0 |
| top_p | 0.0–1.0 |
| max_output_tokens | ≥ 1 |
| top_logprobs | 0–20 |
Out-of-range values throw ArgumentError. Additionally, temperature and top_p are mutually exclusive.
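The checks behave like the following sketch (illustrative; the actual validation lives inside the Respond constructor):

```julia
# Illustrative version of the sampling-parameter validation.
function validate_sampling(; temperature = nothing, top_p = nothing)
    temperature !== nothing && top_p !== nothing &&
        throw(ArgumentError("temperature and top_p are mutually exclusive"))
    temperature !== nothing && !(0.0 <= temperature <= 2.0) &&
        throw(ArgumentError("temperature must be between 0.0 and 2.0"))
    top_p !== nothing && !(0.0 <= top_p <= 1.0) &&
        throw(ArgumentError("top_p must be between 0.0 and 1.0"))
    return nothing
end

validate_sampling(temperature = 0.7)   # fine
try
    validate_sampling(temperature = 3.0)
catch err
    println(err isa ArgumentError)     # prints true
end
```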
See Also
- Respond — full type reference
- ResponseObject — response structure
- Tool Calling — detailed tool calling guide
- Streaming — streaming with do-blocks