Responses API
The Responses API is OpenAI's newer, more flexible alternative to Chat Completions. Key advantages include built-in tools (web search, file search), multi-turn conversations chained via previous_response_id without re-sending history, and reasoning support for O-series models.
Basic Usage
The simplest call — just a string:
result = respond("Explain Julia's multiple dispatch in 2-3 sentences.")
if result isa ResponseSuccess
println(output_text(result))
else
println("Request failed — ", output_text(result))
end

Julia’s multiple dispatch means that when you call a function, Julia chooses which method to run based on the types of *all* the arguments, not just the first one. This makes it easy to write generic code and then add specialized, efficient methods for particular type combinations (e.g., different implementations for `+` on `Int`, `Float64`, or custom numeric types).

if result isa ResponseSuccess
println("ID: ", result.response.id)
println("Status: ", result.response.status)
println("Model: ", result.response.model)
else
println("No response metadata available")
end

ID: resp_0b5a281a117550930069c2b81cf9e481979919a4f79465b821
Status: completed
Model: gpt-5.2-2025-12-11

The Respond Type
For full control, construct a Respond object:
r = Respond(
model="gpt-5.2",
input="Explain monads simply",
instructions="You are a functional programming expert. Be concise.",
temperature=0.5,
max_output_tokens=500,
)
println("Model: ", r.model)
println("Request body:")
println(JSON.json(r))

Model: gpt-5.2
Request body:
{"temperature":0.5,"max_output_tokens":500,"model":"gpt-5.2","input":"Explain monads simply","instructions":"You are a functional programming expert. Be concise."}Instructions (System Prompt)
Unlike Chat Completions where you push a system Message, the Responses API uses the instructions parameter:
result = respond(
"Translate to French: The quick brown fox jumps over the lazy dog.",
instructions="You are a professional translator. Respond only with the translation."
)
if result isa ResponseSuccess
println(output_text(result))
else
println("Request failed — ", output_text(result))
end

Le rapide renard brun saute par-dessus le chien paresseux.

Structured Input
For multimodal inputs, use InputMessage with content helpers:
# Text-only structured input
msgs = [
InputMessage(role="system", content="You analyze images."),
InputMessage(role="user", content=[
input_text("What do you see in this image?"),
input_image("https://example.com/photo.jpg"),
]),
]
r = Respond(input=msgs, model="gpt-5.2")
println("Input is structured: ", r.input isa Vector)
println("Number of input messages: ", length(r.input))Input is structured: true
Number of input messages: 2

Input Helpers
| Function | Purpose |
|---|---|
| input_text | Text content part |
| input_image | Image URL content part |
| input_file | File (URL or ID) content part |
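As an illustration, the helpers compose inside an InputMessage. This is a hypothetical sketch (not executed here): the file ID "file-abc123" is a placeholder, and the snippet assumes the package is loaded with a valid API key.

```julia
# Hypothetical sketch: mix a text part with a file part in one user message.
# "file-abc123" is a placeholder ID, not a real uploaded file.
msgs = [
    InputMessage(role="user", content=[
        input_text("Summarize the attached report."),
        input_file("file-abc123"),
    ]),
]
r = Respond(input=msgs, model="gpt-5.2")
```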
Multi-Turn Conversations
Chain requests using previous_response_id — no need to re-send the full history:
r1 = respond("Tell me a one-liner programming joke.", instructions="Be concise.")
if r1 isa ResponseSuccess
println(output_text(r1))
else
println("Request failed — ", output_text(r1))
end

There are only 10 types of people in the world: those who understand binary and those who don’t.

if r1 isa ResponseSuccess
r2 = respond("Explain why that's funny, in one sentence.", previous_response_id=r1.response.id)
if r2 isa ResponseSuccess
println(output_text(r2))
else
println("Request failed — ", output_text(r2))
end
else
println("Skipped — first request failed")
end

It’s funny because “10” looks like ten in decimal but equals two in binary, splitting people into those who get that and those who don’t.

Built-in Tools
Web Search
result = respond(
"What is the latest stable release of the Julia programming language?",
tools=[web_search()]
)
if result isa ResponseSuccess
println(output_text(result))
else
println("Request failed — ", output_text(result))
end

The latest **stable** release of the Julia programming language is **Julia v1.12.5 (February 9, 2026)**. ([julialang.org](https://www.julialang.org/downloads/manual-downloads/))

File Search
result = respond(
"Find information about error handling",
tools=[file_search(["vs_abc123"])]
)

Function Tools
weather_tool = function_tool(
"get_weather",
"Get current weather for a location",
parameters=Dict(
"type" => "object",
"properties" => Dict(
"location" => Dict("type" => "string", "description" => "City name")
),
"required" => ["location"]
)
)
println("Tool name: ", weather_tool.name)
println("Tool JSON: ", JSON.json(JSON.lower(weather_tool)))Tool name: get_weather
Tool JSON: {"type":"function","name":"get_weather","description":"Get current weather for a location","parameters":{"properties":{"location":{"type":"string","description":"City name"}},"required":["location"],"type":"object"}}result = respond("What's the weather in Tokyo? Use celsius.", tools=[weather_tool])
calls = function_calls(result)
if !isempty(calls)
println("Function: ", calls[1]["name"])
println("Arguments: ", JSON.json(JSON.parse(calls[1]["arguments"]), 2))
else
println("No function calls — ", output_text(result))
end

Function: get_weather
Arguments: {
"location": "Tokyo"
}

Reasoning (O-Series Models)
For models like o3 that support extended reasoning:
r = Respond(
input="Prove that √2 is irrational",
model="o3",
reasoning=Reasoning(effort="high", summary="detailed")
)
println("Model: ", r.model)
println("Reasoning effort: ", r.reasoning.effort)
println(JSON.json(r))

Model: o3
Reasoning effort: high
{"reasoning":{"effort":"high","summary":"detailed"},"model":"o3","input":"Prove that √2 is irrational"}Structured Output
Force JSON-conformant output:
# JSON Schema format
fmt = json_schema_format(
"colors",
"A list of colors",
Dict(
"type" => "object",
"properties" => Dict(
"colors" => Dict(
"type" => "array",
"items" => Dict("type" => "string")
)
),
"required" => ["colors"],
"additionalProperties" => false
),
strict=true
)
println("Format type: ", fmt.format.type)
println("Schema name: ", fmt.format.name)Format type: json_schema
Schema name: colors

result = respond("List 5 popular colors", text=fmt)
if result isa ResponseSuccess
println(JSON.json(JSON.parse(output_text(result)), 2))
else
println("Request failed — ", output_text(result))
end

{
"colors": [
"Red",
"Blue",
"Green",
"Black",
"White"
]
}

Response Accessors
result = respond("Hello!")
if result isa ResponseSuccess
r = result.response
output_text(result) # full text output
function_calls(result) # Vector of function call Dicts (empty if none)
r.id # "resp_00e791c8..."
r.status # "completed"
r.model # "gpt-5.2-2025-12-11"
r.output # full output array
r.usage # Dict with token counts
end

Managing Stored Responses
When you pass store=true, the response is saved on OpenAI's servers and can be retrieved, inspected, or deleted later:
r = respond("Say 'stored response test' and nothing else.", store=true)
if r isa ResponseSuccess
rid = r.response.id
println("Stored response ID: ", rid)
# Retrieve
retrieved = get_response(rid)
if retrieved isa ResponseSuccess
println("Retrieved text: ", output_text(retrieved))
end
# List input items
items = list_input_items(rid)
if items isa Dict
println("Input items: ", length(items["data"]))
end
# Delete
del = delete_response(rid)
if del isa Dict
println("Deleted: ", del["deleted"])
end
else
println("Request failed — ", output_text(r))
end

Stored response ID: resp_0335ff51490d9b4a0069c2b834c3408194a874b2c54f293a9d
Retrieved text: stored response test
Input items: 1
Deleted: true

Metadata
Attach arbitrary key-value metadata to any request for tracking, filtering, or debugging:
result = respond(
"Say 'metadata test' and nothing else.",
metadata=Dict("env" => "docs", "request_id" => "demo_123")
)
if result isa ResponseSuccess
println(output_text(result))
else
println("Request failed — ", output_text(result))
end

metadata test

Service Tier
Control the processing tier for your request ("auto", "default", "flex", "priority"):
result = respond("Say 'tier test' and nothing else.", service_tier="auto")
if result isa ResponseSuccess
println(output_text(result))
else
println("Request failed — ", output_text(result))
end

tier test

Counting Input Tokens
Estimate token usage before making a full request — useful for cost estimation or verifying that input fits within the context window:
result = count_input_tokens(input="Tell me a joke about programming")
if result isa Dict
println("Input tokens: ", result["input_tokens"])
else
println("Request failed — see result for details")
end

Input tokens: 12

With tools and instructions:
tool = function_tool("search", "Search for information",
parameters=Dict("type" => "object", "properties" => Dict(
"query" => Dict("type" => "string")
), "required" => ["query"], "additionalProperties" => false),
strict=true
)
result = count_input_tokens(
input="Search for Julia language news",
instructions="You are a helpful assistant.",
tools=[tool]
)
if result isa Dict
println("Tokens with tools: ", result["input_tokens"])
else
println("Request failed — see result for details")
end

Tokens with tools: 54

Compacting Conversations
For long conversations, compact_response compresses the history into opaque, encrypted items that reduce token usage while preserving context:
items = [
Dict("role" => "user", "content" => "Hello, I want to learn about Julia."),
Dict("type" => "message", "role" => "assistant", "status" => "completed",
"content" => [Dict("type" => "output_text",
"text" => "Julia is a high-performance programming language for technical computing.")])
]
result = compact_response(input=items)
if result isa Dict
println("Compact succeeded")
println("Output items: ", length(result["output"]))
println("Usage: ", result["usage"])
else
println("Request failed — see result for details")
end

Compact succeeded
Output items: 2
Usage: Dict{String, Any}("input_tokens" => 125, "input_tokens_details" => Dict{String, Any}("cached_tokens" => 0), "output_tokens_details" => Dict{String, Any}("reasoning_tokens" => 0), "total_tokens" => 495, "output_tokens" => 370)

Cancelling Responses
Cancel an in-progress (background) response:
# Start a background response
result = respond("Write a very long essay about Julia", background=true)
# Cancel it
if result isa ResponseSuccess
cancel_result = cancel_response(result.response.id)
if cancel_result isa ResponseSuccess
println("Cancelled: ", cancel_result.response.status)
end
end

Parameters Reference
| Parameter | Type | Default | Description |
|---|---|---|---|
model | String | "gpt-5.2" | Model to use |
input | Union{String, Vector} | (required) | String or Vector{InputMessage} |
instructions | String | — | System-level instructions |
tools | Vector | — | Available tools (function, web search, file search) |
tool_choice | String | — | "auto", "none", "required" |
parallel_tool_calls | Bool | — | Allow parallel tool calls |
temperature | Float64 | — | 0.0–2.0 (mutually exclusive with top_p) |
top_p | Float64 | — | 0.0–1.0 (mutually exclusive with temperature) |
max_output_tokens | Int64 | — | Maximum tokens in the response |
stream | Bool | — | Enable streaming |
text | TextConfig | — | Output format (text, json_object, json_schema) |
reasoning | Reasoning | — | Reasoning config for O-series models |
truncation | String | — | "auto" or "disabled" |
store | Bool | — | Store response for later retrieval |
metadata | Dict | — | Arbitrary key-value metadata |
previous_response_id | String | — | Chain to a previous response for multi-turn |
user | String | — | End-user identifier |
background | Bool | — | Run in background (cancellable) |
include | Vector{String} | — | Extra data to include (e.g. "file_search_call.results") |
max_tool_calls | Int64 | — | Max number of tool calls per turn |
service_tier | String | — | "auto", "default", "flex", "priority" |
top_logprobs | Int64 | — | 0–20, top log probabilities |
prompt | Dict | — | Prompt template reference |
prompt_cache_key | String | — | Cache key for prompt caching |
prompt_cache_retention | String | — | "in-memory" or "24h" |
conversation | Any | — | Conversation context (String or Dict) |
context_management | Vector | — | Context management strategies |
stream_options | Dict | — | Streaming options (e.g. include_usage) |
Retry Behaviour
respond automatically retries on HTTP 429, 500, and 503 errors with exponential backoff and jitter (up to 30 attempts, max 60s delay). On 429 responses, the Retry-After header is respected. This applies to all Responses API functions (respond, get_response, delete_response, etc.).
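The delay schedule described above can be sketched in plain Julia. This is illustrative only; the base delay is an assumption, not the package's actual constant, and the real implementation may differ.

```julia
# Exponential backoff with full jitter, capped at the documented 60 s maximum.
# base_delay is an assumed starting point; the library's real value may differ.
base_delay = 1.0
max_delay  = 60.0

retry_delay(attempt) = rand() * min(max_delay, base_delay * 2.0^(attempt - 1))

# Sampled delays grow with the attempt number but never exceed the cap.
@assert all(retry_delay(a) <= max_delay for a in 1:30)
```

Full jitter (multiplying the whole capped delay by a uniform random factor) spreads concurrent clients' retries apart, which matters most on 429 responses.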
Parameter Validation
The Respond constructor validates parameter ranges at construction time:
| Parameter | Valid Range |
|---|---|
temperature | 0.0–2.0 |
top_p | 0.0–1.0 |
max_output_tokens | ≥ 1 |
top_logprobs | 0–20 |
Out-of-range values throw ArgumentError. Additionally, temperature and top_p are mutually exclusive.
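A minimal sketch of these checks (illustrative only; not the package's actual implementation) might look like:

```julia
# Range checks mirroring the table above; each failure throws ArgumentError,
# and temperature/top_p together are rejected as mutually exclusive.
function check_params(; temperature=nothing, top_p=nothing,
                        max_output_tokens=nothing, top_logprobs=nothing)
    temperature !== nothing && !(0.0 <= temperature <= 2.0) &&
        throw(ArgumentError("temperature must be in 0.0–2.0"))
    top_p !== nothing && !(0.0 <= top_p <= 1.0) &&
        throw(ArgumentError("top_p must be in 0.0–1.0"))
    max_output_tokens !== nothing && max_output_tokens < 1 &&
        throw(ArgumentError("max_output_tokens must be ≥ 1"))
    top_logprobs !== nothing && !(0 <= top_logprobs <= 20) &&
        throw(ArgumentError("top_logprobs must be in 0–20"))
    temperature !== nothing && top_p !== nothing &&
        throw(ArgumentError("temperature and top_p are mutually exclusive"))
    return true
end

check_params(temperature=0.5)      # passes
# check_params(temperature=3.0)    # throws ArgumentError
```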
See Also
- Respond — full type reference
- ResponseObject — response structure
- Tool Calling — detailed tool calling guide
- Streaming — streaming with do-blocks