Responses API

The Responses API is OpenAI's newer, more flexible alternative to Chat Completions. Key advantages include built-in tools (web search, file search), lightweight multi-turn conversations via previous_response_id (the server keeps the history, so you don't re-send it), and reasoning support for O-series models.

Basic Usage

The simplest call — just a string:

result = respond("Explain Julia's multiple dispatch in 2-3 sentences.")
if result isa ResponseSuccess
    println(output_text(result))
else
    println("Request failed — ", output_text(result))
end
Julia uses multiple dispatch: when you call a function, Julia chooses which method to run based on the types of *all* the arguments, not just the first one. You can define several methods with the same function name but different type signatures, and Julia picks the most specific match at runtime. This makes it easy to write generic code that becomes fast and specialized for different combinations of input types.
if result isa ResponseSuccess
    println("ID:     ", result.response.id)
    println("Status: ", result.response.status)
    println("Model:  ", result.response.model)
else
    println("No response metadata available")
end
ID:     resp_029c615b652fbe390069c7ae15125881908a991da02d6e249f
Status: completed
Model:  gpt-5.2-2025-12-11

The Respond Type

For full control, construct a Respond object:

r = Respond(
    model="gpt-5.2",
    input="Explain monads simply",
    instructions="You are a functional programming expert. Be concise.",
    temperature=0.5,
    max_output_tokens=500,
)
println("Model: ", r.model)
println("Request body:")
println(JSON.json(r))
Model: gpt-5.2
Request body:
{"temperature":0.5,"max_output_tokens":500,"model":"gpt-5.2","input":"Explain monads simply","instructions":"You are a functional programming expert. Be concise."}

Instructions (System Prompt)

Unlike Chat Completions, where you prepend a system Message to the conversation, the Responses API takes a dedicated instructions parameter:

result = respond(
    "Translate to French: The quick brown fox jumps over the lazy dog.",
    instructions="You are a professional translator. Respond only with the translation."
)
if result isa ResponseSuccess
    println(output_text(result))
else
    println("Request failed — ", output_text(result))
end
Le rapide renard brun saute par-dessus le chien paresseux.

Structured Input

For multimodal inputs, use InputMessage with content helpers:

# Structured multimodal input (text + image)
msgs = [
    InputMessage(role="system", content="You analyze images."),
    InputMessage(role="user", content=[
        input_text("What do you see in this image?"),
        input_image("https://example.com/photo.jpg"),
    ]),
]
r = Respond(input=msgs, model="gpt-5.2")
println("Input is structured: ", r.input isa Vector)
println("Number of input messages: ", length(r.input))
Input is structured: true
Number of input messages: 2

Input Helpers

| Function | Purpose |
|---|---|
| input_text | Text content part |
| input_image | Image URL content part |
| input_file | File (URL or ID) content part |
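The three helpers can be mixed in a single user message. A minimal sketch, assuming input_file accepts a single positional argument (a file ID) the same way input_image accepts a URL; the URL and file ID below are placeholders:

```julia
# One user message combining all three content helpers.
# The URL and file ID are illustrative placeholders.
msg = InputMessage(role="user", content=[
    input_text("Summarize the attached file and describe the chart."),
    input_image("https://example.com/chart.png"),
    input_file("file-abc123"),   # ID of a previously uploaded file
])
r = Respond(input=[msg], model="gpt-5.2")
println("Content parts: ", length(msg.content))
```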

Multi-Turn Conversations

Chain requests using previous_response_id — no need to re-send the full history:

r1 = respond("Tell me a one-liner programming joke.", instructions="Be concise.")
if r1 isa ResponseSuccess
    println(output_text(r1))
else
    println("Request failed — ", output_text(r1))
end
There are only 10 kinds of people in the world: those who understand binary and those who don’t.
if r1 isa ResponseSuccess
    r2 = respond("Explain why that's funny, in one sentence.", previous_response_id=r1.response.id)
    if r2 isa ResponseSuccess
        println(output_text(r2))
    else
        println("Request failed — ", output_text(r2))
    end
else
    println("Skipped — first request failed")
end
It’s funny because “10” is binary for 2, so the line itself only makes sense if you understand binary—creating a self-referential nerdy twist.

Built-in Tools

result = respond(
    "What is the latest stable release of the Julia programming language?",
    tools=[web_search()]
)
if result isa ResponseSuccess
    println(output_text(result))
else
    println("Request failed — ", output_text(result))
end
The latest **stable** release of the Julia programming language is **Julia 1.12.4**. ([julialang.org](https://julialang.org/blog/2026/01/this-month-in-julia-world/))
result = respond(
    "Find information about error handling",
    tools=[file_search(["vs_abc123"])]
)

Function Tools

weather_tool = function_tool(
    "get_weather",
    "Get current weather for a location",
    parameters=Dict(
        "type" => "object",
        "properties" => Dict(
            "location" => Dict("type" => "string", "description" => "City name")
        ),
        "required" => ["location"]
    )
)
println("Tool name: ", weather_tool.name)
println("Tool JSON: ", JSON.json(JSON.lower(weather_tool)))
Tool name: get_weather
Tool JSON: {"type":"function","name":"get_weather","description":"Get current weather for a location","parameters":{"properties":{"location":{"type":"string","description":"City name"}},"required":["location"],"type":"object"}}
result = respond("What's the weather in Tokyo? Use celsius.", tools=[weather_tool])
calls = function_calls(result)
if !isempty(calls)
    println("Function: ", calls[1]["name"])
    println("Arguments: ", JSON.json(JSON.parse(calls[1]["arguments"]), 2))
else
    println("No function calls — ", output_text(result))
end
Function: get_weather
Arguments: {
  "location": "Tokyo"
}

Reasoning (O-Series Models)

For models like o3 that support extended reasoning:

r = Respond(
    input="Prove that √2 is irrational",
    model="o3",
    reasoning=Reasoning(effort="high", summary="detailed")
)
println("Model: ", r.model)
println("Reasoning effort: ", r.reasoning.effort)
println(JSON.json(r))
Model: o3
Reasoning effort: high
{"reasoning":{"effort":"high","summary":"detailed"},"model":"o3","input":"Prove that √2 is irrational"}

Structured Output

Force JSON-conformant output:

# JSON Schema format
fmt = json_schema_format(
    "colors",
    "A list of colors",
    Dict(
        "type" => "object",
        "properties" => Dict(
            "colors" => Dict(
                "type" => "array",
                "items" => Dict("type" => "string")
            )
        ),
        "required" => ["colors"],
        "additionalProperties" => false
    ),
    strict=true
)
println("Format type: ", fmt.format.type)
println("Schema name: ", fmt.format.name)
Format type: json_schema
Schema name: colors
result = respond("List 5 popular colors", text=fmt)
if result isa ResponseSuccess
    println(JSON.json(JSON.parse(output_text(result)), 2))
else
    println("Request failed — ", output_text(result))
end
{
  "colors": [
    "Red",
    "Blue",
    "Green",
    "Black",
    "White"
  ]
}

Response Accessors

result = respond("Hello!")

if result isa ResponseSuccess
    r = result.response

    output_text(result)      # full text output
    function_calls(result)   # Vector of function call Dicts (empty if none)

    r.id                     # "resp_00e791c8..."
    r.status                 # "completed"
    r.model                  # "gpt-5.2-2025-12-11"
    r.output                 # full output array
    r.usage                  # Dict with token counts
end

Managing Stored Responses

When you pass store=true, the response is saved on OpenAI's servers and can be retrieved, inspected, or deleted later:

r = respond("Say 'stored response test' and nothing else.", store=true)
if r isa ResponseSuccess
    rid = r.response.id
    println("Stored response ID: ", rid)

    # Retrieve
    retrieved = get_response(rid)
    if retrieved isa ResponseSuccess
        println("Retrieved text: ", output_text(retrieved))
    end

    # List input items
    items = list_input_items(rid)
    if items isa Dict
        println("Input items: ", length(items["data"]))
    end

    # Delete
    del = delete_response(rid)
    if del isa Dict
        println("Deleted: ", del["deleted"])
    end
else
    println("Request failed — ", output_text(r))
end
Stored response ID: resp_059c532252752d140069c7ae24c28881a180e4cab2d6d05cd7
Retrieved text: stored response test
Input items: 1
Deleted: true

Metadata

Attach arbitrary key-value metadata to any request for tracking, filtering, or debugging:

result = respond(
    "Say 'metadata test' and nothing else.",
    metadata=Dict("env" => "docs", "request_id" => "demo_123")
)
if result isa ResponseSuccess
    println(output_text(result))
else
    println("Request failed — ", output_text(result))
end
metadata test

Service Tier

Control the processing tier for your request ("auto", "default", "flex", "priority"):

result = respond("Say 'tier test' and nothing else.", service_tier="auto")
if result isa ResponseSuccess
    println(output_text(result))
else
    println("Request failed — ", output_text(result))
end
tier test

Counting Input Tokens

Estimate token usage before making a full request — useful for cost estimation or verifying that input fits within the context window:

result = count_input_tokens(input="Tell me a joke about programming")
if result isa Dict
    println("Input tokens: ", result["input_tokens"])
else
    println("Request failed — see result for details")
end
Input tokens: 12

With tools and instructions:

tool = function_tool("search", "Search for information",
    parameters=Dict("type" => "object", "properties" => Dict(
        "query" => Dict("type" => "string")
    ), "required" => ["query"], "additionalProperties" => false),
    strict=true
)
result = count_input_tokens(
    input="Search for Julia language news",
    instructions="You are a helpful assistant.",
    tools=[tool]
)
if result isa Dict
    println("Tokens with tools: ", result["input_tokens"])
else
    println("Request failed — see result for details")
end
Tokens with tools: 54

Compacting Conversations

For long conversations, compact_response compresses the history into opaque, encrypted items that reduce token usage while preserving context:

items = [
    Dict("role" => "user", "content" => "Hello, I want to learn about Julia."),
    Dict("type" => "message", "role" => "assistant", "status" => "completed",
         "content" => [Dict("type" => "output_text",
            "text" => "Julia is a high-performance programming language for technical computing.")])
]
result = compact_response(input=items)
if result isa Dict
    println("Compact succeeded")
    println("Output items: ", length(result["output"]))
    println("Usage: ", result["usage"])
else
    println("Request failed — see result for details")
end
Compact succeeded
Output items: 2
Usage: Dict{String, Any}("input_tokens" => 125, "input_tokens_details" => Dict{String, Any}("cached_tokens" => 0), "output_tokens_details" => Dict{String, Any}("reasoning_tokens" => 0), "total_tokens" => 537, "output_tokens" => 412)

Cancelling Responses

Cancel an in-progress (background) response:

# Start a background response
result = respond("Write a very long essay about Julia", background=true)

# Cancel it
if result isa ResponseSuccess
    cancel_result = cancel_response(result.response.id)
    if cancel_result isa ResponseSuccess
        println("Cancelled: ", cancel_result.response.status)
    end
end

Parameters Reference

| Parameter | Type | Default | Description |
|---|---|---|---|
| model | String | "gpt-5.2" | Model to use |
| input | Union{String, Vector} | (required) | String or Vector{InputMessage} |
| instructions | String | — | System-level instructions |
| tools | Vector | — | Available tools (function, web search, file search) |
| tool_choice | String | — | "auto", "none", or "required" |
| parallel_tool_calls | Bool | — | Allow parallel tool calls |
| temperature | Float64 | — | 0.0–2.0 (mutually exclusive with top_p) |
| top_p | Float64 | — | 0.0–1.0 (mutually exclusive with temperature) |
| max_output_tokens | Int64 | — | Maximum tokens in the response |
| stream | Bool | — | Enable streaming |
| text | TextConfig | — | Output format (text, json_object, json_schema) |
| reasoning | Reasoning | — | Reasoning config for O-series models |
| truncation | String | — | "auto" or "disabled" |
| store | Bool | — | Store response for later retrieval |
| metadata | Dict | — | Arbitrary key-value metadata |
| previous_response_id | String | — | Chain to a previous response for multi-turn |
| user | String | — | End-user identifier |
| background | Bool | — | Run in background (cancellable) |
| include | Vector{String} | — | Extra data to include (e.g. "file_search_call.results") |
| max_tool_calls | Int64 | — | Max number of tool calls per turn |
| service_tier | String | — | "auto", "default", "flex", "priority" |
| top_logprobs | Int64 | — | 0–20, top log probabilities |
| prompt | Dict | — | Prompt template reference |
| prompt_cache_key | String | — | Cache key for prompt caching |
| prompt_cache_retention | String | — | "in-memory" or "24h" |
| conversation | Any | — | Conversation context (String or Dict) |
| context_management | Vector | — | Context management strategies |
| stream_options | Dict | — | Streaming options (e.g. include_usage) |

Retry Behaviour

respond automatically retries on HTTP 429, 500, and 503 errors with exponential backoff and jitter (up to 30 attempts, max 60s delay). On 429 responses, the Retry-After header is respected. This applies to all Responses API functions (respond, get_response, delete_response, etc.).
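As an illustration of that schedule (not the library's exact constants), capped exponential backoff with jitter looks roughly like this in plain Julia:

```julia
# Illustrative capped exponential backoff with jitter:
# double the delay each attempt, cap it at 60s, and add a little
# randomness so concurrent clients don't retry in lockstep.
backoff_delay(attempt; base=1.0, cap=60.0, jitter=0.5) =
    min(cap, base * 2.0^(attempt - 1)) + jitter * rand()

for attempt in 1:6
    println("attempt $attempt → wait ≈ $(round(backoff_delay(attempt); digits=2))s")
end
```

The base delay and jitter range here are placeholders; only the 60s cap comes from the text above.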

Parameter Validation

The Respond constructor validates parameter ranges at construction time:

| Parameter | Valid Range |
|---|---|
| temperature | 0.0–2.0 |
| top_p | 0.0–1.0 |
| max_output_tokens | ≥ 1 |
| top_logprobs | 0–20 |

Out-of-range values throw ArgumentError. Additionally, temperature and top_p are mutually exclusive.
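For instance, a temperature outside the documented range should fail at construction time, before any network call. A sketch relying only on the ranges above:

```julia
# temperature must lie in 0.0–2.0, so 3.0 is rejected immediately
# by the Respond constructor — no request is ever sent.
try
    Respond(input="hi", temperature=3.0)
    println("unexpectedly accepted")
catch e
    e isa ArgumentError && println("Rejected: ", sprint(showerror, e))
end
```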

See Also