OpenAI has released a research preview of GPT-5.3-Codex-Spark, a smaller model built for real-time coding, and is rolling it out to ChatGPT Pro users through its Codex tools.
It is positioned as OpenAI's first system designed specifically for interactive coding workflows where response time is the primary constraint, and the first publicly disclosed milestone in its partnership with AI hardware company Cerebras.
GPT-5.3-Codex-Spark runs on Cerebras hardware and is tuned for fast inference. When served on ultra-low-latency infrastructure, OpenAI says it can deliver more than 1,000 tokens per second.
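To put the headline figure in perspective, a rough back-of-envelope sketch (the token count of a "typical" edit and the characters-per-token ratio below are assumptions, not figures from the announcement):

```python
# Back-of-envelope estimate of what 1,000 tokens/sec means in practice.
# Assumptions (not from the announcement): ~4 characters per token,
# and a 500-token patch as a representative small code edit.
TOKENS_PER_SECOND = 1_000
CHARS_PER_TOKEN = 4          # rough heuristic for English text and code
patch_tokens = 500

seconds_to_stream = patch_tokens / TOKENS_PER_SECOND
chars_per_second = TOKENS_PER_SECOND * CHARS_PER_TOKEN

print(f"A {patch_tokens}-token patch would stream in ~{seconds_to_stream:.1f}s")
print(f"That is roughly {chars_per_second:,} characters per second")
```

At that rate, most small edits would appear effectively instantly, which is the interaction pattern the release targets.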
Real-time coding
The release adds a second working mode to Codex. OpenAI has recently highlighted that its frontier models can run long tasks for extended periods with little intervention. Codex-Spark targets a different pattern: quick edits, fast iteration, and immediate feedback during coding sessions.
In this preview, the model is text-only with a 128k context window. It has separate rate limits from standard usage, and preview usage does not count toward standard limits. OpenAI warned that demand spikes could cause limited access or temporary queuing as it balances reliability across users.
By default, the model is geared toward lightweight interaction, making minimal, targeted edits. It will not run tests unless a user asks.
Benchmarks and speed
OpenAI cited results on SWE-Bench Pro and Terminal-Bench 2.0, which it described as benchmarks for agentic software engineering capability. It says GPT-5.3-Codex-Spark performs strongly on those tests while completing tasks in a fraction of the time required by GPT-5.3-Codex.
No detailed benchmark scores were included in the announcement. OpenAI framed the preview as an early period for experimentation and developer feedback, while it works with Cerebras on capacity and the broader user experience.
Pipeline changes
Alongside the model release, OpenAI described changes intended to reduce latency across Codex's request-response pipeline. The work went beyond model inference speed, addressing response streaming between client and server, how inference components operate, and how sessions initialize.
OpenAI introduced a persistent WebSocket connection and made changes to its Responses API. It reported an 80% reduction in overhead per client/server round trip, a 30% reduction in per-token overhead, and a 50% reduction in time to first token. The WebSocket path is enabled for Codex-Spark by default and is expected to become the default for all models soon.
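The reported reductions compound in a simple end-to-end latency model. The following is an illustrative sketch only: the baseline timings are hypothetical, and only the percentage reductions come from the announcement.

```python
# Toy latency model for one streamed response:
#   total = time_to_first_token + n_tokens * per_token_overhead
#           + n_round_trips * round_trip_overhead
# All baseline numbers below are hypothetical; only the 50% / 30% / 80%
# reductions are taken from the announcement.

def response_latency(ttft, per_token, n_tokens, round_trip, n_round_trips):
    """Total wall-clock time for one streamed response, in seconds."""
    return ttft + n_tokens * per_token + n_round_trips * round_trip

# Hypothetical baseline: 600ms to first token, 4ms per-token overhead,
# a 500-token response, and four 50ms client/server round trips.
baseline = response_latency(ttft=0.60, per_token=0.004, n_tokens=500,
                            round_trip=0.05, n_round_trips=4)

# Apply the reported reductions: 50% TTFT, 30% per-token, 80% per round trip.
improved = response_latency(ttft=0.60 * 0.5, per_token=0.004 * 0.7,
                            n_tokens=500, round_trip=0.05 * 0.2,
                            n_round_trips=4)

print(f"baseline: {baseline:.2f}s, with pipeline changes: {improved:.2f}s")
```

The point of the sketch is that the round-trip and per-token savings matter most for short, chatty exchanges, which is exactly the interactive workload Codex-Spark targets.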
Cerebras hardware
The model runs on Cerebras' Wafer Scale Engine 3, designed for low-latency inference. OpenAI says it integrated this path into the same production serving stack used across its broader fleet, enabling it to support Codex and future models.
OpenAI also described Cerebras as complementary to GPUs. GPUs remain central for training and inference and provide the most cost-effective tokens for broad usage, while the two kinds of hardware can be combined within a single workload.
Sean Lie, CTO and co-founder of Cerebras, framed the announcement around the potential for new usage patterns enabled by faster interaction.
"What excites us most about GPT-5.3-Codex-Spark is partnering with OpenAI and the developer community to discover what fast inference makes possible: new interaction patterns, new use cases, and a fundamentally different model experience. This preview is just the beginning," he said.
Access and expansion
GPT-5.3-Codex-Spark is available to ChatGPT Pro users in the latest versions of the Codex app, Codex CLI, and the VS Code extension. It is also available via API to a small set of design partners, with a focus on how developers want to integrate it into products.
OpenAI plans to expand access over the coming weeks as it tunes the integration under real workloads. It also described Codex-Spark as the first in a family of ultra-fast models, with future updates expected to include larger models, longer context lengths, and multimodal input.
On safety, OpenAI says Codex-Spark includes the same safety training as its mainline models, including cyber-relevant training. It evaluated the model through its standard deployment process and determined it does not have a plausible chance of reaching its Preparedness Framework threshold for high capability in cybersecurity or biology.
Longer term, OpenAI envisions a Codex experience that combines longer-horizon reasoning and execution with real-time collaboration. It expects these modes to blend over time through background work by sub-agents and parallel task execution.