Responses API

The Responses API is OpenAI's newer, more flexible alternative to Chat Completions. Key advantages include built-in tools (web search, file search), lightweight multi-turn conversations via previous_response_id (the server keeps the history, so you don't re-send it), and reasoning support for O-series models.

Basic Usage

The simplest call — just a string:

result = respond("Explain Julia's multiple dispatch in 2-3 sentences.")
if result isa ResponseSuccess
    println(output_text(result))
else
    println("Request failed — ", output_text(result))
end
Julia uses multiple dispatch: when you call a function, Julia chooses which method to run based on the types of *all* the arguments, not just the first one. You can define several methods with the same function name but different type signatures, and Julia picks the most specific match at runtime. This makes it easy to write generic code that becomes fast and specialized for different combinations of input types.
if result isa ResponseSuccess
    println("ID:     ", result.response.id)
    println("Status: ", result.response.status)
    println("Model:  ", result.response.model)
else
    println("No response metadata available")
end
ID:     resp_029c615b652fbe390069c7ae15125881908a991da02d6e249f
Status: completed
Model:  gpt-5.2-2025-12-11

The Respond Type

For full control, construct a Respond object:

r = Respond(
    model="gpt-5.2",
    input="Explain monads simply",
    instructions="You are a functional programming expert. Be concise.",
    temperature=0.5,
    max_output_tokens=500,
)
println("Model: ", r.model)
println("Request body:")
println(JSON.json(r))
Model: gpt-5.2
Request body:
{"temperature":0.5,"max_output_tokens":500,"model":"gpt-5.2","input":"Explain monads simply","instructions":"You are a functional programming expert. Be concise."}

Instructions (System Prompt)

Unlike Chat Completions, where you prepend a system Message to the conversation, the Responses API takes a dedicated instructions parameter:

result = respond(
    "Translate to French: The quick brown fox jumps over the lazy dog.",
    instructions="You are a professional translator. Respond only with the translation."
)
if result isa ResponseSuccess
    println(output_text(result))
else
    println("Request failed — ", output_text(result))
end
Le rapide renard brun saute par-dessus le chien paresseux.

Structured Input

For multimodal inputs, use InputMessage with content helpers:

# Structured multimodal input (text + image)
msgs = [
    InputMessage(role="system", content="You analyze images."),
    InputMessage(role="user", content=[
        input_text("What do you see in this image?"),
        input_image("https://example.com/photo.jpg"),
    ]),
]
r = Respond(input=msgs, model="gpt-5.2")
println("Input is structured: ", r.input isa Vector)
println("Number of input messages: ", length(r.input))
Input is structured: true
Number of input messages: 2

Input Helpers

| Function | Purpose |
|---|---|
| input_text | Text content part |
| input_image | Image URL content part |
| input_file | File (URL or ID) content part |
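The three helpers can be mixed in a single user message. A minimal sketch, assuming input_file accepts a single positional argument (a file ID) the same way input_image accepts a URL; the URL and file ID below are placeholders:

```julia
# One user message combining all three content helpers.
# The URL and file ID are illustrative placeholders.
msg = InputMessage(role="user", content=[
    input_text("Summarize the attached file and describe the chart."),
    input_image("https://example.com/chart.png"),
    input_file("file-abc123"),   # ID of a previously uploaded file
])
r = Respond(input=[msg], model="gpt-5.2")
println("Content parts: ", length(msg.content))
```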

Multi-Turn Conversations

Chain requests using previous_response_id — no need to re-send the full history:

r1 = respond("Tell me a one-liner programming joke.", instructions="Be concise.")
if r1 isa ResponseSuccess
    println(output_text(r1))
else
    println("Request failed — ", output_text(r1))
end
There are only 10 kinds of people in the world: those who understand binary and those who don’t.
if r1 isa ResponseSuccess
    r2 = respond("Explain why that's funny, in one sentence.", previous_response_id=r1.response.id)
    if r2 isa ResponseSuccess
        println(output_text(r2))
    else
        println("Request failed — ", output_text(r2))
    end
else
    println("Skipped — first request failed")
end
It’s funny because “10” is binary for 2, so the line itself only makes sense if you understand binary—creating a self-referential nerdy twist.

Built-in Tools

result = respond(
    "What is the latest stable release of the Julia programming language?",
    tools=[web_search()]
)
if result isa ResponseSuccess
    println(output_text(result))
else
    println("Request failed — ", output_text(result))
end
The latest **stable** release of the Julia programming language is **Julia 1.12.4**. ([julialang.org](https://julialang.org/blog/2026/01/this-month-in-julia-world/))
result = respond(
    "Find information about error handling",
    tools=[file_search(["vs_abc123"])]
)

Function Tools

weather_tool = function_tool(
    "get_weather",
    "Get current weather for a location",
    parameters=Dict(
        "type" => "object",
        "properties" => Dict(
            "location" => Dict("type" => "string", "description" => "City name")
        ),
        "required" => ["location"]
    )
)
println("Tool name: ", weather_tool.name)
println("Tool JSON: ", JSON.json(JSON.lower(weather_tool)))
Tool name: get_weather
Tool JSON: {"type":"function","name":"get_weather","description":"Get current weather for a location","parameters":{"properties":{"location":{"type":"string","description":"City name"}},"required":["location"],"type":"object"}}
result = respond("What's the weather in Tokyo? Use celsius.", tools=[weather_tool])
calls = function_calls(result)
if !isempty(calls)
    println("Function: ", calls[1]["name"])
    println("Arguments: ", JSON.json(JSON.parse(calls[1]["arguments"]), 2))
else
    println("No function calls — ", output_text(result))
end
Function: get_weather
Arguments: {
  "location": "Tokyo"
}

Reasoning (O-Series Models)

For models like o3 that support extended reasoning:

r = Respond(
    input="Prove that √2 is irrational",
    model="o3",
    reasoning=Reasoning(effort="high", summary="detailed")
)
println("Model: ", r.model)
println("Reasoning effort: ", r.reasoning.effort)
println(JSON.json(r))
Model: o3
Reasoning effort: high
{"reasoning":{"effort":"high","summary":"detailed"},"model":"o3","input":"Prove that √2 is irrational"}

Structured Output

Force JSON-conformant output:

# JSON Schema format
fmt = json_schema_format(
    "colors",
    "A list of colors",
    Dict(
        "type" => "object",
        "properties" => Dict(
            "colors" => Dict(
                "type" => "array",
                "items" => Dict("type" => "string")
            )
        ),
        "required" => ["colors"],
        "additionalProperties" => false
    ),
    strict=true
)
println("Format type: ", fmt.format.type)
println("Schema name: ", fmt.format.name)
Format type: json_schema
Schema name: colors
result = respond("List 5 popular colors", text=fmt)
if result isa ResponseSuccess
    println(JSON.json(JSON.parse(output_text(result)), 2))
else
    println("Request failed — ", output_text(result))
end
{
  "colors": [
    "Red",
    "Blue",
    "Green",
    "Black",
    "White"
  ]
}

Response Accessors

result = respond("Hello!")

if result isa ResponseSuccess
    r = result.response

    output_text(result)      # full text output
    function_calls(result)   # Vector of function call Dicts (empty if none)

    r.id                     # "resp_00e791c8..."
    r.status                 # "completed"
    r.model                  # "gpt-5.2-2025-12-11"
    r.output                 # full output array
    r.usage                  # Dict with token counts
end

Managing Stored Responses

When you pass store=true, the response is saved on OpenAI's servers and can be retrieved, inspected, or deleted later:

r = respond("Say 'stored response test' and nothing else.", store=true)
if r isa ResponseSuccess
    rid = r.response.id
    println("Stored response ID: ", rid)

    # Retrieve
    retrieved = get_response(rid)
    if retrieved isa ResponseSuccess
        println("Retrieved text: ", output_text(retrieved))
    end

    # List input items
    items = list_input_items(rid)
    if items isa Dict
        println("Input items: ", length(items["data"]))
    end

    # Delete
    del = delete_response(rid)
    if del isa Dict
        println("Deleted: ", del["deleted"])
    end
else
    println("Request failed — ", output_text(r))
end
Stored response ID: resp_059c532252752d140069c7ae24c28881a180e4cab2d6d05cd7
Retrieved text: stored response test
Input items: 1
Deleted: true

Metadata

Attach arbitrary key-value metadata to any request for tracking, filtering, or debugging:

result = respond(
    "Say 'metadata test' and nothing else.",
    metadata=Dict("env" => "docs", "request_id" => "demo_123")
)
if result isa ResponseSuccess
    println(output_text(result))
else
    println("Request failed — ", output_text(result))
end
metadata test

Service Tier

Control the processing tier for your request ("auto", "default", "flex", "priority"):

result = respond("Say 'tier test' and nothing else.", service_tier="auto")
if result isa ResponseSuccess
    println(output_text(result))
else
    println("Request failed — ", output_text(result))
end
tier test

Counting Input Tokens

Estimate token usage before making a full request — useful for cost estimation or verifying that input fits within the context window:

result = count_input_tokens(input="Tell me a joke about programming")
if result isa Dict
    println("Input tokens: ", result["input_tokens"])
else
    println("Request failed — see result for details")
end
Input tokens: 12

With tools and instructions:

tool = function_tool("search", "Search for information",
    parameters=Dict("type" => "object", "properties" => Dict(
        "query" => Dict("type" => "string")
    ), "required" => ["query"], "additionalProperties" => false),
    strict=true
)
result = count_input_tokens(
    input="Search for Julia language news",
    instructions="You are a helpful assistant.",
    tools=[tool]
)
if result isa Dict
    println("Tokens with tools: ", result["input_tokens"])
else
    println("Request failed — see result for details")
end
Tokens with tools: 54

Compacting Conversations

For long conversations, compact_response compresses the history into opaque, encrypted items that reduce token usage while preserving context:

items = [
    Dict("role" => "user", "content" => "Hello, I want to learn about Julia."),
    Dict("type" => "message", "role" => "assistant", "status" => "completed",
         "content" => [Dict("type" => "output_text",
            "text" => "Julia is a high-performance programming language for technical computing.")])
]
result = compact_response(input=items)
if result isa Dict
    println("Compact succeeded")
    println("Output items: ", length(result["output"]))
    println("Usage: ", result["usage"])
else
    println("Request failed — see result for details")
end
Compact succeeded
Output items: 2
Usage: Dict{String, Any}("input_tokens" => 125, "input_tokens_details" => Dict{String, Any}("cached_tokens" => 0), "output_tokens_details" => Dict{String, Any}("reasoning_tokens" => 0), "total_tokens" => 537, "output_tokens" => 412)

Cancelling Responses

Cancel an in-progress (background) response:

# Start a background response
result = respond("Write a very long essay about Julia", background=true)

# Cancel it
if result isa ResponseSuccess
    cancel_result = cancel_response(result.response.id)
    if cancel_result isa ResponseSuccess
        println("Cancelled: ", cancel_result.response.status)
    end
end

Parameters Reference

| Parameter | Type | Default | Description |
|---|---|---|---|
| model | String | "gpt-5.2" | Model to use |
| input | Union{String, Vector} | (required) | String or Vector{InputMessage} |
| instructions | String | — | System-level instructions |
| tools | Vector | — | Available tools (function, web search, file search) |
| tool_choice | String | — | "auto", "none", or "required" |
| parallel_tool_calls | Bool | — | Allow parallel tool calls |
| temperature | Float64 | — | 0.0–2.0 (mutually exclusive with top_p) |
| top_p | Float64 | — | 0.0–1.0 (mutually exclusive with temperature) |
| max_output_tokens | Int64 | — | Maximum tokens in the response |
| stream | Bool | — | Enable streaming |
| text | TextConfig | — | Output format (text, json_object, json_schema) |
| reasoning | Reasoning | — | Reasoning config for O-series models |
| truncation | String | — | "auto" or "disabled" |
| store | Bool | — | Store response for later retrieval |
| metadata | Dict | — | Arbitrary key-value metadata |
| previous_response_id | String | — | Chain to a previous response for multi-turn |
| user | String | — | End-user identifier |
| background | Bool | — | Run in background (cancellable) |
| include | Vector{String} | — | Extra data to include (e.g. "file_search_call.results") |
| max_tool_calls | Int64 | — | Max number of tool calls per turn |
| service_tier | String | — | "auto", "default", "flex", "priority" |
| top_logprobs | Int64 | — | 0–20, top log probabilities |
| prompt | Dict | — | Prompt template reference |
| prompt_cache_key | String | — | Cache key for prompt caching |
| prompt_cache_retention | String | — | "in-memory" or "24h" |
| conversation | Any | — | Conversation context (String or Dict) |
| context_management | Vector | — | Context management strategies |
| stream_options | Dict | — | Streaming options (e.g. include_usage) |

Retry Behaviour

respond automatically retries on HTTP 429, 500, and 503 errors with exponential backoff and jitter (up to 30 attempts, max 60s delay). On 429 responses, the Retry-After header is respected. This applies to all Responses API functions (respond, get_response, delete_response, etc.).
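As an illustration of that schedule (not the library's exact constants), capped exponential backoff with jitter looks roughly like this in plain Julia:

```julia
# Illustrative capped exponential backoff with jitter:
# double the delay each attempt, cap it at 60s, and add a little
# randomness so concurrent clients don't retry in lockstep.
backoff_delay(attempt; base=1.0, cap=60.0, jitter=0.5) =
    min(cap, base * 2.0^(attempt - 1)) + jitter * rand()

for attempt in 1:6
    println("attempt $attempt → wait ≈ $(round(backoff_delay(attempt); digits=2))s")
end
```

The base delay and jitter range here are placeholders; only the 60s cap comes from the text above.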

Parameter Validation

The Respond constructor validates parameter ranges at construction time:

| Parameter | Valid Range |
|---|---|
| temperature | 0.0–2.0 |
| top_p | 0.0–1.0 |
| max_output_tokens | ≥ 1 |
| top_logprobs | 0–20 |

Out-of-range values throw ArgumentError. Additionally, temperature and top_p are mutually exclusive.
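For instance, a temperature outside the documented range should fail at construction time, before any network call. A sketch relying only on the ranges above:

```julia
# temperature must lie in 0.0–2.0, so 3.0 is rejected immediately
# by the Respond constructor — no request is ever sent.
try
    Respond(input="hi", temperature=3.0)
    println("unexpectedly accepted")
catch e
    e isa ArgumentError && println("Rejected: ", sprint(showerror, e))
end
```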

See Also