China’s National Data Bureau has announced that the country has compiled more than 116,000 high-quality datasets to support its domestic artificial intelligence sector. As of the first quarter of this year, these datasets totaled more than 960 petabytes, roughly 336 times the digital resource holdings of the National Library of China. This aggressive data consolidation underscores Beijing's strategic push to treat data as a primary factor of production.
At the heart of this expansion is the National Dataset Management Service Platform, which began trial operation during the 9th Digital China Summit. The platform is designed to provide full-lifecycle management, ensuring that data is not only collected but also processed, circulated, and put to effective use. By certifying more than 200 entities on the supply and demand sides and listing more than 1,000 initial datasets, the bureau aims to build a centralized ecosystem that bridges the gap between raw information and model training.
Liu Liehong, director of the National Data Bureau, emphasized that the next phase of China’s AI development will shift from general-purpose large language models toward specialized industry models and 'embodied intelligence.' This shift reflects a move toward more practical applications in manufacturing and autonomous decision-making. To support it, the government is promoting 'Data Empowerment Factories,' specialized hubs dedicated to producing the high-fidelity data that sophisticated multimodal AI systems and autonomous agents require.
This infrastructure build-out is a direct response to the global AI race, in which the quality of training data is increasingly seen as the decisive differentiator. While Western AI developers face hurdles over copyright and data transparency, China is using its centralized administrative power to standardize and mobilize vast quantities of sector-specific data. This top-down approach is intended to give Chinese firms a competitive edge in training models that are more accurate, industry-aware, and culturally aligned.
