The Increasing Similarity Between Claude Code and Codex

Explore the evolving similarities between OpenAI's Codex and Anthropic's Claude Code as both tools adapt to meet developer needs.


Recently, OpenAI officially released its new large model, GPT-5.4-Cyber. Many users have noted a strong sense of déjà vu with this model.

This new model closely mirrors Anthropic's recently launched Claude Mythos in target users, application scenarios, and promotional strategy. The competition between the two companies is now out in the open, as a recent New York Times article highlighted.


This trend of homogenization is not limited to the foundational models. A look at the recent products released by both companies shows they are becoming reflections of each other.

In the capital market, this convergence is even more evident. The valuations of both companies are closely aligned, with Anthropic recently surpassing OpenAI in the enterprise market. Investors perceive these two unicorns as developing similar capabilities.


It appears that the homogenization of foundational models is inevitably leading to a convergence of upper-layer applications.

Today, I want to discuss the two benchmark tools representing the highest level of AI-assisted programming: OpenAI’s Codex and Anthropic’s Claude Code. How have they evolved from distinct paths to become so alike?

From Divergence to Convergence: The Evolution of Two Giants

Going back a few years, Codex and Claude Code were products of entirely different technological philosophies.

Codex’s underlying logic is “speed is the ultimate weapon.” It functions like an experienced developer ready to assist with code completion at any moment.


In OpenAI’s vision, Codex is a lightweight, highly interactive terminal agent focused on rapid iteration and interactive programming. With the support of Cerebras WSE-3 hardware, it can achieve a throughput of 1,000 tokens per second. Within a given workflow, Codex offers three approval modes (suggest, auto-edit, and full-auto), keeping developers in the loop to whatever degree they choose. This design particularly suits developers who need to build prototypes quickly and iterate at high frequency.

In contrast, Claude Code was designed with a more reserved and architect-like attribute from the start.


Anthropic infused it with the ability to tackle extremely complex tasks. It relies on a vast context window of up to 1 million tokens and a context “compression” technique to keep dialogues going indefinitely. Claude Code’s mantra is “survey globally, act only after careful consideration.” Before executing any action, it uses agentic search to build a thorough understanding of the entire codebase, then coordinates consistent modifications across multiple files. On enterprise-level refactoring tasks spanning thousands of lines of code, Claude Code is impressively dominant.
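The “compression” idea can be sketched in a few lines. The following is a minimal, hypothetical illustration, not Anthropic’s actual algorithm: the function names, the use of text length as a stand-in for token count, and the keep-half-the-budget policy are all assumptions. The core move is the same, though: when the transcript nears a budget, collapse older turns into a single summary so the dialogue can continue.

```python
# Hypothetical sketch of context "compression" (compaction). When the
# transcript exceeds a budget, older turns are collapsed into one summary
# turn so the conversation can keep going. Text length is used here as a
# crude proxy for token count.
def compact(history, budget, summarize):
    """history: list of (role, text) turns; summarize: callable(turns) -> str."""
    total = sum(len(text) for _, text in history)
    if total <= budget:
        return history  # still fits, nothing to do
    # Keep the most recent turns verbatim, up to half the budget...
    keep, kept_size = [], 0
    for turn in reversed(history):
        kept_size += len(turn[1])
        if kept_size > budget // 2:
            break
        keep.append(turn)
    keep.reverse()
    # ...and squash everything older into a single summary turn.
    older = history[: len(history) - len(keep)]
    return [("system", summarize(older))] + keep
```

In a real agent the `summarize` callable would itself be a model call; here it can be any function that turns the older turns into a short string.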

However, as time has passed and application scenarios have expanded, these two originally distinct tools have begun to borrow from each other.


In handling complex projects, the biggest bottleneck faced by single AI models is context pollution. If you ask an AI to refactor an authentication module, after reading 40 files, it often forgets the design patterns of the first file. To address this pain point, both companies have provided nearly identical solutions: assigning independent context windows for each sub-task.

OpenAI quickly launched a new macOS desktop application that isolates tasks by project in different threads and runs independently in a cloud sandbox. Anthropic introduced an agent team architecture, allowing developers to derive multiple sub-agents that share task lists and dependencies while working in their independent windows. Whether termed a “cloud sandbox” or an “agent team,” the core engineering concepts have completely aligned.
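The shared engineering concept behind both the “cloud sandbox” and the “agent team” can be sketched as follows. This is an illustrative toy, not either company’s implementation: the `SubAgent` class, the task dictionary shape, and the result board are all hypothetical. The point is simply that each sub-task reads files into its own private context, so one task’s reading cannot pollute another’s, while results flow back through a shared list.

```python
# Hypothetical sketch of isolated per-task context windows. Each sub-agent
# only ever sees the files its own task needs; a shared result list plays
# the role of the shared task board.
class SubAgent:
    def __init__(self, task):
        self.task = task
        self.context = []  # private context window, empty at start

    def run(self, read_file):
        # Only this task's files enter this agent's context.
        for path in self.task["files"]:
            self.context.append(read_file(path))
        return {"task": self.task["name"], "files_seen": len(self.context)}

def orchestrate(tasks, read_file):
    shared_results = []  # shared between agents; their contexts are not
    for task in tasks:
        shared_results.append(SubAgent(task).run(read_file))
    return shared_results
```

The design choice worth noticing is that isolation is structural: an agent physically cannot reference another agent’s context, which is exactly the property both products converged on.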

In benchmark testing, both models exhibit a subtle balance. GPT-5.3-Codex leads with a score of 77.3% in the Terminal-Bench 2.0 tasks, while Claude Code achieves 80.8% in the complex SWE-bench Verified leaderboard. Each has excelled in its respective strengths while striving to overcome its weaknesses.

The OpenClaw Effect: The Invisible Hand Breaking Down Barriers

If the internal strategies of both companies have driven them toward homogenization, the pressure from the open-source ecosystem is an undeniable external force. Here, we must mention the profound impact of OpenClaw on the entire AI programming tool landscape.

As a workflow framework launched by the open-source community, OpenClaw has effectively dismantled the ecosystem walls the giants painstakingly built. It standardizes how large models interact with local toolchains. Previously, elegantly invoking local Git commits, safely running test scripts in a sandbox, and verifying multi-step reasoning were proprietary tricks that Codex and Claude Code each took pride in.

OpenClaw, however, has abstracted these processes into a universal protocol. Developers are no longer locked into a specific platform to get a particular collaboration style. The open-source community’s enthusiastic adoption has made standardization an irreversible trend, and both OpenAI and Anthropic have had to open up and accommodate these standards.
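To make “universal protocol” concrete, here is a hypothetical shape such a tool-invocation message might take. The field names (`type`, `id`, `tool`, `args`) are illustrative assumptions, not taken from any real OpenClaw specification; the pattern is simply a model emitting a structured request and a host dispatching it to a registered local tool.

```python
import json

# Hypothetical tool-invocation envelope: the model asks the host to run a
# named tool; the host looks the tool up in a registry and replies with a
# result envelope carrying the same id.
def make_tool_call(tool, args, call_id):
    return json.dumps({"type": "tool_call", "id": call_id, "tool": tool, "args": args})

def handle_tool_call(message, registry):
    msg = json.loads(message)
    handler = registry.get(msg["tool"])
    if handler is None:
        # Unknown tools produce an error result instead of crashing the host.
        return {"type": "tool_result", "id": msg["id"], "error": "unknown tool"}
    return {"type": "tool_result", "id": msg["id"], "output": handler(**msg["args"])}
```

Once every vendor speaks a shape like this, swapping one agent for another becomes a registry change rather than a rewrite, which is precisely why the moats eroded.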

As the technical barriers are leveled by the open-source power of OpenClaw, and as all advanced features become standard configurations in the industry, the only path for Codex and Claude Code is to engage in endless competition at the finer user experience level. This is why they seem increasingly alike; within a standardized framework, the optimal solution often becomes singular—much like convergent evolution in biology.

Codex is Catching Up to Claude Code

Although Claude Code and Codex are evolving toward convergence, differences still exist, and Codex has become more favored by developers in certain aspects.

Recently, in the r/ClaudeCode community, a senior engineer with 14 years of experience shared a rigorous evaluation after spending 100 hours using Claude Code and 20 hours using Codex on a complex project containing 80,000 lines of code.


From his perspective, using Claude Code felt like guiding an engineer racing against a deadline; it was fast but often ignored the specifications written by the developer in CLAUDE.md, preferring to pile on code in existing files to complete tasks, lacking a refactoring mindset.

In contrast, Codex felt more like a steady developer with 5 to 6 years of experience. Although its processing speed was 3 to 4 times slower, it would pause to think and refactor code while strictly adhering to instruction boundaries. This high degree of autonomy allowed the engineer to confidently assign tasks to it and focus on other work.

Similar sentiments have surfaced on social networks like X. Researcher Aran Komatsuzaki noted that while Claude Code excels at front-end tasks, Codex is clearly more robust at back-end planning and stays more current thanks to its frequent web searches.


The comment sections are filled with real-world experiences and critiques. Developers have pointed out sharply that while the Opus-based model runs quickly, it tends to pile up “code cleanliness debt” in projects, whereas Codex, though slower, keeps things tidy as it goes. Some users even distilled a survival rule: once context window usage reaches 70%, start a new session, or risk subtle bugs creeping in.
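The community’s 70% rule is simple enough to state as code. This is a sketch of the heuristic as users described it, not a feature of either tool; the function name and parameters are mine.

```python
# The community "70% rule": once context usage crosses the threshold,
# start a fresh session rather than trust increasingly degraded output.
def should_restart_session(used_tokens, window_tokens, threshold=0.70):
    return used_tokens / window_tokens >= threshold
```

For example, with the 1-million-token window mentioned earlier, the rule says to restart once roughly 700,000 tokens are in play.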

Image 19

These genuine complaints from the front lines make one thing clear: as the two tools’ capabilities increasingly overlap, developers’ final allegiance often hinges on small experiential differences, such as the cost of paying down code debt and each tool’s maintenance mindset. Chinese users, meanwhile, face additional hurdles of their own.

Cold Reflection: The Ecological Struggle Behind Homogenization

Of course, the advantages and disadvantages of Codex and Claude Code also depend on the developers themselves. As noted in the evaluation report by u/Canamerican726: if you lack software engineering knowledge, both tools will yield poor results; tools do not equate to skills.

This statement shatters the illusion that AI programming tools have long cultivated. We once believed that with a powerful AI assistant, even a novice coder could single-handedly build enterprise-level applications. The reality is that Claude Code requires a highly focused and skilled “driver” to avoid getting lost in a vast codebase. While Codex is more independent, it also needs developers to provide precise contextual information to maximize its effectiveness.

So, in an era where tool capabilities are highly homogenized, where have these companies’ competitive advantages shifted?

The answer lies in the unglamorous details of financial reports and pricing strategy. On similar tasks, Claude Code often consumes 3 to 4 times as many tokens as Codex, which translates into higher usage costs. For enterprise teams, Claude Code can run $100 to $200 per developer per month, while Codex offers its capabilities in a more affordable subscription plan and has built a large user base through its extensive GitHub community.
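The token-multiplier claim has a direct cost consequence that is worth making explicit. The sketch below uses the article’s 3–4x figure (taking 3.5x as a midpoint); the per-task token count, task volume, and $10-per-million-tokens price are purely illustrative assumptions, chosen only to show that at any fixed per-token price, cost scales by the same multiplier as token usage.

```python
# Back-of-envelope cost comparison. Only the 3-4x token multiplier comes
# from the article; tokens per task, task count, and price are illustrative.
def monthly_cost(tokens_per_task, tasks_per_month, price_per_million):
    return tokens_per_task * tasks_per_month / 1_000_000 * price_per_million

codex_tokens = 50_000                 # assumed tokens per task
claude_tokens = codex_tokens * 3.5    # article's 3-4x multiplier, midpoint

codex_monthly = monthly_cost(codex_tokens, 100, 10)    # -> 50.0 dollars
claude_monthly = monthly_cost(claude_tokens, 100, 10)  # -> 175.0 dollars
```

Whatever the actual prices, the ratio is the point: a 3.5x token multiplier means a 3.5x usage bill under identical per-token pricing.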


Anthropic’s ambition is to deeply embed Claude Code into the workflows of tech giants that are not short on cash. For instance, Stripe has allowed 1,370 engineers to use Claude Code, completing a cross-language code migration that would have taken 10 people weeks in just 4 days. Ramp has even managed to reduce incident response times by 80% using it. OpenAI, on the other hand, has made Codex the default choice for many ordinary developers due to its pervasive ecosystem penetration.

This is no longer a simple technological competition but a war of ecological binding, pricing strategies, and reshaping user habits.

Developers at a Crossroads

Reflecting on the technological evolution of the past year, the release of GPT-5.4-Cyber is just a minor footnote in this long battle. That Codex and Claude Code are growing into mirror images of each other marks the transition of AI programming tools from an early phase full of uncertainty and curiosity to a mature, even mundane, phase of industrial production.

Claude Code currently generates 135,000 GitHub commits a day, about 4% of all public commits. It is easy to foresee that, in the near future, most boilerplate code, basic test cases, and routine refactoring will be quietly handled in the background by these increasingly similar AI agents.


Faced with two super tools that are infinitely approaching each other in capabilities and mimicking each other in experience, what remains of our core value as human developers? Perhaps the tool dividend period is about to come to a complete end. When everyone wields the same sharp weapon, the true determinant of victory will no longer be who has better code completion speed, but who can better define problems, who possesses a broader architectural vision, and who can find that irreplaceable human uniqueness in a code world filled with AI.

So, which one will you choose?
