OpenAI’s new Spark model codes 15x faster than GPT-5.3-Codex – but there’s a catch

ZDNET's key takeaways
- OpenAI targets "conversational" coding, not slow batch-style agents.
- Big latency wins: 80% less roundtrip overhead, 50% faster time-to-first-token.
- Runs on Cerebras WSE-3 chips for a latency-first Codex serving tier.
The Codex team at OpenAI is on fire. Less than two weeks after releasing a dedicated agent-based Codex app for Macs, and only a week after releasing the faster and more steerable GPT-5.3-Codex language model, OpenAI is counting on lightning striking a third time.
Also: OpenAI's new GPT-5.3-Codex is 25% faster and goes way beyond coding now - what's new
Today, the company announced a research preview of GPT-5.3-Codex-Spark, a smaller version of GPT-5.3-Codex built for real-time coding in Codex. The company reports that it generates code 15 times faster while "remaining highly capable for real-world coding tasks." There is a catch, and I'll get to it in a minute.
Also: OpenAI's Codex just got its own Mac app - and anyone can try it for free now
Codex-Spark will initially be available only to $200/mo Pro tier users, with separate rate limits during the preview period. If OpenAI follows its usual rollout pattern for Codex, Plus users should be next, with other tiers gaining access fairly quickly.
(Disclosure: Ziff Davis, ZDNET's parent company, filed an April 2025 lawsuit against OpenAI, alleging it infringed Ziff Davis copyrights in training and operating its AI systems.)
Expanding the Codex family for real-time collaboration
OpenAI says Codex-Spark is its "first model designed specifically for working with Codex in real-time -- making targeted edits, reshaping logic, or refining interfaces and seeing results immediately."
Let's deconstruct this briefly. Most agentic AI programming tools take a while to respond to instructions. In my programming work, I can give an instruction (and this applies to both Codex and Claude Code) and go off and work on something else for a while. Sometimes it's just a few minutes. Other times, it can be long enough to get lunch.
Also: I got 4 years of product development done in 4 days for $200, and I'm still stunned
Codex-Spark is apparently able to respond much faster, allowing for quick and continuous work. This could speed up development considerably, especially for simpler prompts and queries.
I've occasionally been frustrated when I've asked an AI a super simple question that should have generated an immediate response, but instead had to wait five minutes for an answer.
By making responsiveness a core feature, the model supports more fluid, conversational coding. Sometimes, using coding agents feels more like old-school batch-style programming. Spark is designed to overcome that feeling.
GPT-5.3-Codex-Spark isn't intended to replace the base GPT-5.3-Codex. Instead, Spark was designed to complement high-performance AI models built for long-running, autonomous tasks lasting hours, days, or weeks.
Performance
The Codex-Spark model is intended for work where responsiveness matters as much as intelligence. It supports interruption and redirection mid-task, enabling tight iteration loops.
This is something that appeals to me, because I always think of something more to tell the AI ten seconds after I've given it an assignment.
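To make that concrete, here's a minimal sketch of the interrupt-and-redirect loop this kind of model enables. OpenAI hasn't published Spark's client API, so generate() below is a stand-in for a real inference call, and the prompts are invented; the point is the pattern of canceling an in-flight request the moment you think of a refinement, then immediately reissuing the amended instruction.
```python
# Illustrative only: generate() is a stand-in for a real model call,
# not Codex's actual API.
import asyncio

async def generate(prompt: str) -> str:
    await asyncio.sleep(3.0)  # pretend the model is thinking
    return f"result for: {prompt}"

async def main() -> None:
    task = asyncio.create_task(generate("add a dark-mode toggle"))
    await asyncio.sleep(0.5)   # seconds later, a better idea arrives...
    task.cancel()              # ...so interrupt the in-flight request
    try:
        await task
    except asyncio.CancelledError:
        pass
    # Redirect: reissue the amended instruction right away.
    print(await generate("add a dark-mode toggle that follows the system theme"))

asyncio.run(main())
```
The point is the loop, not the code: when a model answers in seconds, canceling and reissuing becomes cheaper than waiting out a response you already know is stale.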
Also: I used Claude Code to vibe code a Mac app in 8 hours, but it was more work than magic
The Spark model defaults to lightweight, targeted edits, making quick tweaks rather than taking big swings. It also doesn't automatically run tests unless requested.
OpenAI says it has reduced latency across the full request-response pipeline: overhead per client/server roundtrip is down 80%, per-token overhead is down 30%, and time-to-first-token is down 50%, thanks to session-initialization and streaming optimizations.
Another mechanism that improves responsiveness during iteration is a persistent WebSocket connection, which spares the client and server from renegotiating the connection for every request.
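If you want a feel for why that matters, here's a minimal sketch of the pattern, assuming a hypothetical endpoint and message format (this is not OpenAI's actual protocol): the TCP/TLS handshake is paid once at connect time, and every subsequent edit reuses the open socket.
```python
# Hypothetical endpoint and message schema, for illustration only.
import asyncio
import json

import websockets  # pip install websockets

EDITS = [
    {"op": "edit", "prompt": "rename fetchUser to loadUser"},
    {"op": "edit", "prompt": "add a null check to loadUser"},
    {"op": "edit", "prompt": "inline the formatName helper"},
]

async def session(url: str) -> None:
    # One connect() pays for the handshake and session setup once.
    async with websockets.connect(url) as ws:
        for edit in EDITS:
            await ws.send(json.dumps(edit))
            # Each turn reuses the open socket -- no renegotiation,
            # so time-to-first-token is bounded by the model, not setup.
            print(json.loads(await ws.recv()))

# asyncio.run(session("wss://example.invalid/spark"))  # no real server here
```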
Powered by Cerebras AI chips
In January, OpenAI announced a partnership with AI chipmaker Cerebras. We've been covering Cerebras for a while, including its inference service, its work with DeepSeek, its work boosting the performance of Meta's Llama models, and its announcement of a really big AI chip meant to double LLM performance.
GPT-5.3-Codex-Spark is the first milestone of that partnership. The Spark model runs on Cerebras' Wafer Scale Engine 3 (WSE-3), a high-performance AI chip architecture that boosts speed by putting all the compute resources on a single wafer-scale processor the size of a pancake.
Also: 7 ChatGPT settings tweaks that I can no longer work without - and I'm a power user
Usually, a semiconductor wafer contains a whole bunch of processors, which later in the production process get cut apart and put into their own packaging. The Cerebras wafer contains just one chip, making it a very, very big processor with very, very closely coupled connections.
According to Sean Lie, CTO and co-founder of Cerebras, "What excites us most about GPT-5.3-Codex-Spark is partnering with OpenAI and the developer community to discover what fast inference makes possible -- new interaction patterns, new use cases, and a fundamentally different model experience. This preview is just the beginning."
The gotchas
Now, here are the gotchas.
First, OpenAI says that "when demand is high, you may see slower access or temporary queuing as we balance reliability across users." So, fast, unless too many people want to go fast.
Here's the kicker. The company says, "On SWE-Bench Pro and Terminal-Bench 2.0, two benchmarks evaluating agentic software engineering capability, GPT-5.3-Codex-Spark underperforms GPT-5.3-Codex, but can accomplish the tasks in a fraction of the time."
Last week, in the GPT-5.3-Codex announcement, OpenAI said that GPT-5.3-Codex was the first model it classifies as "high capability" for cybersecurity, according to its published Preparedness Framework. On the other hand, the company admitted that GPT-5.3-Codex-Spark "does not have a plausible chance of reaching our Preparedness Framework threshold for high capability in cybersecurity."
Think on these statements, dear reader. This AI isn't as smart, but it does do those not-as-smart things a lot faster. 15x speed is certainly nothing to sneeze at. But do you really want an AI to make coding mistakes 15 times faster and produce code that is less secure?
Let me tell you this. "Eh, it's good enough" isn't really good enough when you have thousands of pissed-off users coming at you with torches and pitchforks because you suddenly broke their software with a new release. Ask me how I know.
Last week, we learned that OpenAI uses Codex to write Codex, and that doing so helps the company build code much faster. So OpenAI clearly has a use case for something that's way faster, but not as smart. As I get a better handle on what that is and where Spark fits, I'll let you know.
What's next?
OpenAI shared that it is working toward dual modes for its Codex models: longer-horizon reasoning and real-time work.
The company says, "Codex-Spark is the first step toward a Codex with two complementary modes: longer-horizon reasoning and execution, and real-time collaboration for rapid iteration. Over time, the modes will blend."
The workflow model it envisions is interesting. According to OpenAI, the intent is that eventually "Codex can keep you in a tight interactive loop while delegating longer-running work to sub-agents in the background, or fanning out tasks to many models in parallel when you want breadth and speed, so you don't have to choose a single mode up front."
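In code terms, the fan-out half of that vision looks something like the sketch below. To be clear, run_model() is a mock, and the model names and latencies are invented to show the interaction pattern, not real benchmarks: the fast draft lands almost immediately, while the deeper result arrives in the background.
```python
# Illustrative fan-out: both calls start at once; neither blocks the other.
import asyncio

async def run_model(model: str, prompt: str) -> str:
    latency = 0.2 if "spark" in model else 2.0  # fast vs. deep (made up)
    await asyncio.sleep(latency)
    return f"[{model}] draft for: {prompt}"

async def fan_out(prompt: str) -> None:
    fast = asyncio.create_task(run_model("codex-spark", prompt))
    deep = asyncio.create_task(run_model("codex", prompt))
    print(await fast)  # keep the interactive loop tight with the quick draft
    print(await deep)  # the longer-horizon result shows up when it's ready

asyncio.run(fan_out("refactor the settings panel"))
```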
Also: I tried a Claude Code rival that's local, open source, and completely free - how it went
Essentially, OpenAI is working toward the best of both worlds. But for now, you must choose fast or accurate. That's a tough choice. The accurate option keeps getting more accurate, though, and now, at least, you can opt for fast when you want it (as long as you keep the trade-offs in mind and you're paying for the Pro tier).
What about you? Would you trade some intelligence and security capability for 15x faster coding responses? Does the idea of a real-time, interruptible AI collaborator appeal to you, or do you prefer a more deliberate, higher-accuracy model for serious development work?
How concerned are you about the cybersecurity distinction between Codex-Spark and the full GPT-5.3-Codex model? And if you're a Pro user, do you see yourself switching between "fast" and "smart" modes depending on the task? Let us know in the comments below.
You can follow my day-to-day project updates on social media. Be sure to subscribe to my weekly update newsletter, and follow me on Twitter/X at @DavidGewirtz, on Facebook at Facebook.com/DavidGewirtz, on Instagram at Instagram.com/DavidGewirtz, on Bluesky at @DavidGewirtz.com, and on YouTube at YouTube.com/DavidGewirtzTV.