As the global race for artificial intelligence supremacy shifts from sheer scale to precision, Chinese state scientists are proposing a paradigm shift in how Large Language Models (LLMs) are trained. At a recent high-level briefing in Beijing, experts from the Chinese Academy of Sciences (CAS) argued that the persistent problem of 'AI hallucinations'—the tendency of models to confidently generate false information—can only be solved by pivoting away from erratic internet-scraped data toward high-quality, verifiable scientific data.
Zhou Yuanchun, Deputy Director of the CAS Computer Network Information Center, framed scientific data as both a 'stabilizer' and an 'accelerator' for the next generation of AI. Unlike the noisy data sets commonly used to train general-purpose models, he argued, scientific data is derived from rigorous observation and self-consistent physical logic. Embedding these natural laws into AI training would give developers a foundation that keeps models from drifting into logical fallacies or 'nonsense' that contradicts the laws of physics.
The strategic push for scientific data also addresses the 'black box' problem of AI interpretability. Zhou noted that most current data sets lack unique identifiers, making it nearly impossible to trace the provenance of information or determine ownership. China's proposed solution is a 'digital ID' system for data, which would make the chain of reasoning traceable during AI inference and is intended to increase the transparency and security of autonomous systems.
This drive for data-centric AI is unfolding alongside China's broader industrial transformation, particularly in the energy sector. Xu Liangfei of Tsinghua University emphasized that just as AI requires high-quality inputs, China's decarbonization goals require high-density energy carriers. Hydrogen is being positioned not just as a transport fuel but as a critical long-term storage solution for the country's massive but volatile wind and solar power investments.
Ultimately, Beijing is betting on 'Physical AI'—intelligence that understands the material world as deeply as it does human language. By integrating scientific rigor into digital models and hydrogen into the physical grid, China seeks a path to technological self-reliance that moves beyond mimicry toward genuine industrial and scientific innovation.
