Beyond Reasoning: Why Agentic Thinking Is the New Frontier for Global AI

Former Alibaba Qwen lead Lin Junyang argues that AI is shifting from a 'reasoning' phase to an 'agentic' phase where models are trained to prioritize action and environmental interaction. He highlights the technical difficulties in merging deep thinking with instruction-following and predicts that future AI success will depend on building integrated systems that can independently determine the necessary level of deliberation for any given task.


Key Takeaways

  • The AI industry is moving from 'reasoning-style' thinking to 'agentic-style' thinking focused on real-world action.
  • Merging instruction-following and deep reasoning in a single model remains a major technical challenge due to data distribution conflicts.
  • Longer reasoning chains are often a waste of compute; efficiency and 'action-oriented' thinking are becoming the new benchmarks for quality.
  • AI training is evolving from focusing solely on the model to optimizing the entire 'model-plus-environment' system.
  • Alibaba's Qwen team chose to release separate specialized models after finding that unified versions often delivered mediocre performance.

Editor's Desk

Strategic Analysis

Lin Junyang’s insights reflect a sobering realization within the top tiers of Chinese AI development: the 'brute force' approach to reasoning compute is running into diminishing returns. While DeepSeek-R1 demonstrated that reasoning could be democratized, Lin is signaling that the next competitive advantage lies in 'agency'—the ability of a model to interface with the world. This strategic pivot shifts the bottleneck from sheer compute and data to the complexity of 'environment design' and 'closed-loop interactions.' For global observers, this suggests that the next phase of the US-China AI rivalry will be fought not just on benchmarks of logic, but on the reliability of AI operating as autonomous agents in professional and industrial workflows.

China Daily Brief Editorial

In a post-departure manifesto that has sent ripples through the Chinese tech ecosystem, Lin Junyang, the former technical lead of Alibaba’s Qwen (Tongyi Qianwen) large language model team, has outlined a fundamental shift in the evolution of artificial intelligence. Following his exit from Alibaba, Lin argues that the industry is rapidly outgrowing the 'reasoning' phase characterized by OpenAI’s o1 and DeepSeek’s R1. The next epoch, he contends, will be defined by 'agentic thinking'—intelligence designed not just to deliberate, but to act within the constraints of the real world.

Reflecting on the progress made in late 2024 and early 2025, Lin notes that the industry successfully proved that 'thinking' could be a trained capability. However, the current focus on lengthening reasoning chains—allowing models to 'think longer' before answering—is reaching a point of diminishing returns. Lin warns that a longer reasoning chain does not necessarily equate to a smarter model; in many cases, it represents a wasteful expenditure of compute on 'noisy' or redundant internal deliberation that fails to produce better outcomes.

The technical struggle at Alibaba provides a cautionary tale for the global AI race. Lin reveals that the Qwen team harbored the ambitious goal of creating a unified system that could seamlessly toggle between 'instruction mode' and 'thinking mode.' Yet the team discovered that merging these two behaviors is fraught with difficulty because their data distributions and training objectives are fundamentally different. This friction often resulted in 'mediocre' performance in both directions, leading Alibaba to eventually release separate 'Instruct' and 'Thinking' versions of its Qwen3 models.

Looking toward the second half of 2025, Lin predicts that the core object of AI training will shift from the model itself to a 'model-plus-environment' system. In this new paradigm, the quality of a model will be judged by its ability to achieve progress through continuous interaction with its environment and feedback loops. The definition of 'good thinking' is thus being redefined: it is no longer the most visible or exhaustive chain of thought, but the specific trajectory that most effectively supports decisive action under real-world pressure.
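The 'model-plus-environment' loop Lin describes can be sketched in a few lines of code. The sketch below is purely illustrative and assumes a toy environment and a hand-written stand-in policy (the names `Environment`, `policy`, and `rollout` are not from any real framework); the point is the closed loop itself, in which the agent acts, observes environmental feedback, and acts again until the task is done.

```python
class Environment:
    """Toy environment: the agent must drive a counter to a target value."""

    def __init__(self, target: int):
        self.state = 0
        self.target = target

    def step(self, action: str) -> tuple[int, bool]:
        # Apply the action and report the new state plus a 'done' signal —
        # the feedback the agent uses to decide its next move.
        if action == "increment":
            self.state += 1
        elif action == "decrement":
            self.state -= 1
        return self.state, self.state == self.target

def policy(state: int, target: int) -> str:
    """Stand-in for a model: pick the action that moves toward the goal."""
    return "increment" if state < target else "decrement"

def rollout(env: Environment, max_steps: int = 20) -> list[str]:
    """Closed-loop interaction: act, observe feedback, act again."""
    trajectory = []
    for _ in range(max_steps):
        action = policy(env.state, env.target)
        trajectory.append(action)
        _, done = env.step(action)
        if done:
            break
    return trajectory
```

In this framing, the object being optimized is the whole trajectory produced by `rollout`, not any single model output: a 'good' policy is one whose actions make measurable progress against the environment, which is exactly the shift in evaluation criteria Lin predicts.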

This transition marks the end of the era of static model training and the beginning of the 'Agent' era. For developers, this means the 'core circle' of AI development now includes environment design, rollout infrastructure, and robust evaluators rather than just raw datasets and model architectures. As AI agents begin to handle more complex, multi-step tasks in the physical and digital worlds, the industry’s obsession with internal reasoning is likely to be eclipsed by a demand for reliable, interactive execution.
