Responses API

The Responses API is OpenAI's newer, more flexible alternative to Chat Completions. Key advantages include built-in tools (web search, file search), server-managed multi-turn conversations via previous_response_id, and reasoning support for O-series models.

Basic Usage

The simplest call — just a string:

result = respond("Explain Julia's multiple dispatch in 2-3 sentences.")
if result isa ResponseSuccess
    println(output_text(result))
else
    println("Request failed — ", output_text(result))
end
Julia’s multiple dispatch means that when you call a function, Julia chooses which method to run based on the types of *all* the arguments, not just the first one. This makes it easy to write generic code and then add specialized, efficient methods for particular type combinations (e.g., different implementations for `+` on `Int`, `Float64`, or custom numeric types).
if result isa ResponseSuccess
    println("ID:     ", result.response.id)
    println("Status: ", result.response.status)
    println("Model:  ", result.response.model)
else
    println("No response metadata available")
end
ID:     resp_0b5a281a117550930069c2b81cf9e481979919a4f79465b821
Status: completed
Model:  gpt-5.2-2025-12-11

The Respond Type

For full control, construct a Respond object:

r = Respond(
    model="gpt-5.2",
    input="Explain monads simply",
    instructions="You are a functional programming expert. Be concise.",
    temperature=0.5,
    max_output_tokens=500,
)
println("Model: ", r.model)
println("Request body:")
println(JSON.json(r))
Model: gpt-5.2
Request body:
{"temperature":0.5,"max_output_tokens":500,"model":"gpt-5.2","input":"Explain monads simply","instructions":"You are a functional programming expert. Be concise."}

Instructions (System Prompt)

Unlike Chat Completions, where the system prompt is sent as a Message in the conversation, the Responses API takes an instructions parameter:

result = respond(
    "Translate to French: The quick brown fox jumps over the lazy dog.",
    instructions="You are a professional translator. Respond only with the translation."
)
if result isa ResponseSuccess
    println(output_text(result))
else
    println("Request failed — ", output_text(result))
end
Le rapide renard brun saute par-dessus le chien paresseux.

Structured Input

For multimodal inputs, use InputMessage with content helpers:

# Structured input mixing text and image content parts
msgs = [
    InputMessage(role="system", content="You analyze images."),
    InputMessage(role="user", content=[
        input_text("What do you see in this image?"),
        input_image("https://example.com/photo.jpg"),
    ]),
]
r = Respond(input=msgs, model="gpt-5.2")
println("Input is structured: ", r.input isa Vector)
println("Number of input messages: ", length(r.input))
Input is structured: true
Number of input messages: 2

Input Helpers

| Function    | Purpose                       |
|-------------|-------------------------------|
| input_text  | Text content part             |
| input_image | Image URL content part        |
| input_file  | File (URL or ID) content part |
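
Each helper builds a typed content part. The wire shapes can be approximated with hand-built Dicts; the part type names mirror the helper names, though the exact key for a file part ("file_id" here) is an assumption for illustration:

```julia
# Hand-built Dicts approximating what the content helpers serialize to.
# The "file_id" key is an assumption, shown for illustration only.
parts = [
    Dict("type" => "input_text", "text" => "Summarize this file."),
    Dict("type" => "input_file", "file_id" => "file-abc123"),
]
println([p["type"] for p in parts])  # ["input_text", "input_file"]
```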

Multi-Turn Conversations

Chain requests using previous_response_id — no need to re-send the full history:

r1 = respond("Tell me a one-liner programming joke.", instructions="Be concise.")
if r1 isa ResponseSuccess
    println(output_text(r1))
else
    println("Request failed — ", output_text(r1))
end
There are only 10 types of people in the world: those who understand binary and those who don’t.
if r1 isa ResponseSuccess
    r2 = respond("Explain why that's funny, in one sentence.", previous_response_id=r1.response.id)
    if r2 isa ResponseSuccess
        println(output_text(r2))
    else
        println("Request failed — ", output_text(r2))
    end
else
    println("Skipped — first request failed")
end
It’s funny because “10” looks like ten in decimal but equals two in binary, splitting people into those who get that and those who don’t.

Built-in Tools

Hosted tools run on OpenAI's servers; pass them via the tools keyword. Web search lets the model consult live results:

result = respond(
    "What is the latest stable release of the Julia programming language?",
    tools=[web_search()]
)
if result isa ResponseSuccess
    println(output_text(result))
else
    println("Request failed — ", output_text(result))
end
The latest **stable** release of the Julia programming language is **Julia v1.12.5 (February 9, 2026)**. ([julialang.org](https://www.julialang.org/downloads/manual-downloads/))

File search queries one or more vector stores by ID:

result = respond(
    "Find information about error handling",
    tools=[file_search(["vs_abc123"])]
)

Function Tools

weather_tool = function_tool(
    "get_weather",
    "Get current weather for a location",
    parameters=Dict(
        "type" => "object",
        "properties" => Dict(
            "location" => Dict("type" => "string", "description" => "City name")
        ),
        "required" => ["location"]
    )
)
println("Tool name: ", weather_tool.name)
println("Tool JSON: ", JSON.json(JSON.lower(weather_tool)))
Tool name: get_weather
Tool JSON: {"type":"function","name":"get_weather","description":"Get current weather for a location","parameters":{"properties":{"location":{"type":"string","description":"City name"}},"required":["location"],"type":"object"}}
result = respond("What's the weather in Tokyo? Use celsius.", tools=[weather_tool])
calls = function_calls(result)
if !isempty(calls)
    println("Function: ", calls[1]["name"])
    println("Arguments: ", JSON.json(JSON.parse(calls[1]["arguments"]), 2))
else
    println("No function calls — ", output_text(result))
end
Function: get_weather
Arguments: {
  "location": "Tokyo"
}
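
Once the model returns a function call, you execute it locally and decide what to do with the result. A minimal dispatch sketch (the handler and its canned reply are hypothetical; args stands in for the parsed calls[1]["arguments"] string):

```julia
# Illustrative only: route a model-issued function call to a local handler.
# The handler and its reply are hypothetical examples.
handlers = Dict{String,Function}(
    "get_weather" => args -> "22°C and sunny in $(args["location"])",
)

call_name = "get_weather"           # would come from calls[1]["name"]
args = Dict("location" => "Tokyo")  # would come from parsing calls[1]["arguments"]
println(handlers[call_name](args))  # 22°C and sunny in Tokyo
```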

Reasoning (O-Series Models)

For models like o3 that support extended reasoning:

r = Respond(
    input="Prove that √2 is irrational",
    model="o3",
    reasoning=Reasoning(effort="high", summary="detailed")
)
println("Model: ", r.model)
println("Reasoning effort: ", r.reasoning.effort)
println(JSON.json(r))
Model: o3
Reasoning effort: high
{"reasoning":{"effort":"high","summary":"detailed"},"model":"o3","input":"Prove that √2 is irrational"}

Structured Output

Force JSON-conformant output:

# JSON Schema format
fmt = json_schema_format(
    "colors",
    "A list of colors",
    Dict(
        "type" => "object",
        "properties" => Dict(
            "colors" => Dict(
                "type" => "array",
                "items" => Dict("type" => "string")
            )
        ),
        "required" => ["colors"],
        "additionalProperties" => false
    ),
    strict=true
)
println("Format type: ", fmt.format.type)
println("Schema name: ", fmt.format.name)
Format type: json_schema
Schema name: colors
result = respond("List 5 popular colors", text=fmt)
if result isa ResponseSuccess
    println(JSON.json(JSON.parse(output_text(result)), 2))
else
    println("Request failed — ", output_text(result))
end
{
  "colors": [
    "Red",
    "Blue",
    "Green",
    "Black",
    "White"
  ]
}

Response Accessors

result = respond("Hello!")

if result isa ResponseSuccess
    r = result.response

    output_text(result)      # full text output
    function_calls(result)   # Vector of function call Dicts (empty if none)

    r.id                     # "resp_00e791c8..."
    r.status                 # "completed"
    r.model                  # "gpt-5.2-2025-12-11"
    r.output                 # full output array
    r.usage                  # Dict with token counts
end

Managing Stored Responses

When you pass store=true, the response is saved on OpenAI's servers and can be retrieved, inspected, or deleted later:

r = respond("Say 'stored response test' and nothing else.", store=true)
if r isa ResponseSuccess
    rid = r.response.id
    println("Stored response ID: ", rid)

    # Retrieve
    retrieved = get_response(rid)
    if retrieved isa ResponseSuccess
        println("Retrieved text: ", output_text(retrieved))
    end

    # List input items
    items = list_input_items(rid)
    if items isa Dict
        println("Input items: ", length(items["data"]))
    end

    # Delete
    del = delete_response(rid)
    if del isa Dict
        println("Deleted: ", del["deleted"])
    end
else
    println("Request failed — ", output_text(r))
end
Stored response ID: resp_0335ff51490d9b4a0069c2b834c3408194a874b2c54f293a9d
Retrieved text: stored response test
Input items: 1
Deleted: true

Metadata

Attach arbitrary key-value metadata to any request for tracking, filtering, or debugging:

result = respond(
    "Say 'metadata test' and nothing else.",
    metadata=Dict("env" => "docs", "request_id" => "demo_123")
)
if result isa ResponseSuccess
    println(output_text(result))
else
    println("Request failed — ", output_text(result))
end
metadata test

Service Tier

Control the processing tier for your request ("auto", "default", "flex", "priority"):

result = respond("Say 'tier test' and nothing else.", service_tier="auto")
if result isa ResponseSuccess
    println(output_text(result))
else
    println("Request failed — ", output_text(result))
end
tier test

Counting Input Tokens

Estimate token usage before making a full request — useful for cost estimation or verifying that input fits within the context window:

result = count_input_tokens(input="Tell me a joke about programming")
if result isa Dict
    println("Input tokens: ", result["input_tokens"])
else
    println("Request failed — see result for details")
end
Input tokens: 12

With tools and instructions:

tool = function_tool("search", "Search for information",
    parameters=Dict("type" => "object", "properties" => Dict(
        "query" => Dict("type" => "string")
    ), "required" => ["query"], "additionalProperties" => false),
    strict=true
)
result = count_input_tokens(
    input="Search for Julia language news",
    instructions="You are a helpful assistant.",
    tools=[tool]
)
if result isa Dict
    println("Tokens with tools: ", result["input_tokens"])
else
    println("Request failed — see result for details")
end
Tokens with tools: 54

Compacting Conversations

For long conversations, compact_response compresses the history into opaque, encrypted items that reduce token usage while preserving context:

items = [
    Dict("role" => "user", "content" => "Hello, I want to learn about Julia."),
    Dict("type" => "message", "role" => "assistant", "status" => "completed",
         "content" => [Dict("type" => "output_text",
            "text" => "Julia is a high-performance programming language for technical computing.")])
]
result = compact_response(input=items)
if result isa Dict
    println("Compact succeeded")
    println("Output items: ", length(result["output"]))
    println("Usage: ", result["usage"])
else
    println("Request failed — see result for details")
end
Compact succeeded
Output items: 2
Usage: Dict{String, Any}("input_tokens" => 125, "input_tokens_details" => Dict{String, Any}("cached_tokens" => 0), "output_tokens_details" => Dict{String, Any}("reasoning_tokens" => 0), "total_tokens" => 495, "output_tokens" => 370)

Cancelling Responses

Cancel an in-progress (background) response:

# Start a background response
result = respond("Write a very long essay about Julia", background=true)

# Cancel it
if result isa ResponseSuccess
    cancel_result = cancel_response(result.response.id)
    if cancel_result isa ResponseSuccess
        println("Cancelled: ", cancel_result.response.status)
    end
end

Parameters Reference

| Parameter | Type | Default | Description |
|---|---|---|---|
| model | String | "gpt-5.2" | Model to use |
| input | Union{String, Vector} | (required) | String or Vector{InputMessage} |
| instructions | String | | System-level instructions |
| tools | Vector | | Available tools (function, web search, file search) |
| tool_choice | String | | "auto", "none", "required" |
| parallel_tool_calls | Bool | | Allow parallel tool calls |
| temperature | Float64 | | 0.0–2.0 (mutually exclusive with top_p) |
| top_p | Float64 | | 0.0–1.0 (mutually exclusive with temperature) |
| max_output_tokens | Int64 | | Maximum tokens in the response |
| stream | Bool | | Enable streaming |
| text | TextConfig | | Output format (text, json_object, json_schema) |
| reasoning | Reasoning | | Reasoning config for O-series models |
| truncation | String | | "auto" or "disabled" |
| store | Bool | | Store response for later retrieval |
| metadata | Dict | | Arbitrary key-value metadata |
| previous_response_id | String | | Chain to a previous response for multi-turn |
| user | String | | End-user identifier |
| background | Bool | | Run in background (cancellable) |
| include | Vector{String} | | Extra data to include (e.g. "file_search_call.results") |
| max_tool_calls | Int64 | | Max number of tool calls per turn |
| service_tier | String | | "auto", "default", "flex", "priority" |
| top_logprobs | Int64 | | 0–20, top log probabilities |
| prompt | Dict | | Prompt template reference |
| prompt_cache_key | String | | Cache key for prompt caching |
| prompt_cache_retention | String | | "in-memory" or "24h" |
| conversation | Any | | Conversation context (String or Dict) |
| context_management | Vector | | Context management strategies |
| stream_options | Dict | | Streaming options (e.g. include_usage) |

Retry Behaviour

respond automatically retries on HTTP 429, 500, and 503 errors with exponential backoff and jitter (up to 30 attempts, max 60s delay). On 429 responses, the Retry-After header is respected. This applies to all Responses API functions (respond, get_response, delete_response, etc.).
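
The schedule described above can be pictured with a small standalone sketch (illustrative; not the library's actual implementation):

```julia
# Illustrative sketch of capped exponential backoff with full jitter;
# not the library's internal code.
function backoff_delay(attempt::Integer; base=1.0, cap=60.0)
    ceiling = min(cap, base * 2.0^(attempt - 1))  # 1, 2, 4, ... capped at 60 s
    return rand() * ceiling                       # full jitter in [0, ceiling)
end

delays = [backoff_delay(a) for a in 1:30]
println(all(d -> 0.0 <= d <= 60.0, delays))  # true
```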

Parameter Validation

The Respond constructor validates parameter ranges at construction time:

| Parameter | Valid Range |
|---|---|
| temperature | 0.0–2.0 |
| top_p | 0.0–1.0 |
| max_output_tokens | ≥ 1 |
| top_logprobs | 0–20 |

Out-of-range values throw ArgumentError. Additionally, temperature and top_p are mutually exclusive.
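
A minimal sketch of the kind of range check the constructor performs (illustrative; not the package's actual code):

```julia
# Illustrative only: the style of range check Respond applies at
# construction time, shown for temperature.
function check_temperature(t::Real)
    0.0 <= t <= 2.0 || throw(ArgumentError("temperature must be in [0.0, 2.0], got $t"))
    return t
end

check_temperature(0.7)              # returns 0.7
try
    check_temperature(3.0)
catch err
    println(err isa ArgumentError)  # true
end
```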

See Also