The Rise of Tokens in the AI Industry: A New Metric for Intelligence

Introduction

In early 2026, a set of figures sparked intense discussion across the global AI industry. According to OpenRouter, the largest aggregation platform for AI model APIs, token usage of Chinese large models reached 41.2 trillion during the week of February 9 to 15, surpassing the 29.4 trillion of U.S. models for the first time. The trend continued for several weeks: usage exceeded 73 trillion by mid-March, and four of the top five models globally came from China.

These figures are presented not to compare quantities but to highlight a quiet revolution in the AI industry’s basic unit of measurement: tokens are becoming the “kilowatt-hour” of the intelligent era. Models, computing power, data, applications, industry, and governance are all being profoundly reshaped now that this unit of measurement has taken hold. Understanding AI in 2026 begins with understanding tokens.

Sixfold Reconstruction Driven by a Measurement Unit

The measurement unit of the industrial revolution was the “kilowatt-hour,” allowing energy to be precisely measured, priced, and transmitted across domains. The information revolution’s unit was “bits” and “bandwidth,” enabling information to be packaged, transmitted, and billed for the first time. The measurement unit of the intelligent revolution is “tokens,” allowing intelligence to be segmented, measured, priced, and traded for the first time.

The popularization of the token concept and its rapid growth in usage are gradually pushing intelligence toward industrialization, marketization, and circulation.

Models

The economic value of large models is shifting from one-time training costs to long-term inference output. Model vendors no longer simply “sell capabilities” but directly “sell tokens,” and pricing per million input and output tokens has become a global industry norm. The asset value of a model is moving from its “weight files” to “the ability to continuously produce tokens.”
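As a minimal sketch of how per-million-token pricing translates into the cost of a single request, the snippet below multiplies input and output token counts by separate rates. The class name, function name, and all prices are hypothetical and chosen only for illustration; they are not any vendor’s actual price sheet.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenPrice:
    """Hypothetical price sheet, in currency units per one million tokens."""
    input_per_million: float
    output_per_million: float


def request_cost(price: TokenPrice, input_tokens: int, output_tokens: int) -> float:
    """Cost of one request when input and output tokens are billed separately."""
    billed = (input_tokens * price.input_per_million
              + output_tokens * price.output_per_million)
    return billed / 1_000_000


# Illustrative numbers only (assumption): 0.5 per million input tokens,
# 2.0 per million output tokens.
example_price = TokenPrice(input_per_million=0.5, output_per_million=2.0)
print(request_cost(example_price, input_tokens=12_000, output_tokens=3_000))  # 0.012
```

Output tokens are typically priced higher than input tokens, which is why the two rates are kept separate here rather than merged into one.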

Computing Power

The focus is shifting from “training computing power” to “inference computing power.” Training workloads are bursty and centralized, while inference workloads are continuous and distributed, which places new demands on latency, energy efficiency, and geographical distribution. Cloud-edge-device collaboration, inference-specific chips, silicon photonics interconnects, and computing networks are becoming the new focus of infrastructure. JPMorgan predicts that China’s inference token consumption in 2030 will be more than two orders of magnitude higher than in 2025.
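To put the “two orders of magnitude” forecast in perspective, the short calculation below converts a total growth multiple into the compound annual growth rate it implies. It assumes exactly 100x growth over the five years from 2025 to 2030 and smooth compounding; both are simplifying assumptions, not part of the forecast itself.

```python
def implied_annual_growth(total_multiple: float, years: int) -> float:
    """Compound annual growth rate implied by a total multiple over `years` years."""
    return total_multiple ** (1 / years) - 1


# Assumption: exactly 100x (two orders of magnitude) between 2025 and 2030.
rate = implied_annual_growth(total_multiple=100, years=5)
print(f"{rate:.1%}")  # 151.2%
```

In other words, consumption would need to grow roughly two and a half times every year to reach that level.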

Data

Just as raw energy must be refined into standardized fuel before it can generate power, data entering large models must first be cleaned, labeled, and tokenized. In long-tail scenarios such as autonomous driving, robot training, and scientific discovery, synthetic data generated from simulations is already being applied at scale. The construction of a market for data as a factor of production has entered a substantive phase, in which “trainability” and “token output density,” rather than data volume alone, are becoming the new benchmarks for pricing data assets. This shift is significant: the valuation of data is beginning to be tied to its actual contribution to the token production chain, which provides a solid economic basis for the market-oriented allocation of data factors.

Applications

Applications are shifting from “function delivery” to “token consumption.” Traditional software charges by seat or by feature; today’s applications charge by token usage and business results. Intelligent agents are becoming the primary consumers of tokens, and a single complex task can consume hundreds of thousands or even millions of tokens. The “intelligent agent as a service” market is expanding rapidly, with performance-based billing being rolled out at scale in customer service, marketing, compliance, and programming. In essence, applications are moving from “delivering functions” to “consuming intelligence.”
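One reason a single agent task can reach hundreds of thousands of tokens is that a multi-step agent usually re-sends its accumulated history on every step. The sketch below is a rough model of that effect under stated assumptions: every step re-sends the full history, there is no prompt caching or context truncation, and the per-step sizes are constant. The function name and all numbers are hypothetical.

```python
def agent_task_tokens(steps: int, system_tokens: int,
                      output_tokens_per_step: int,
                      tool_result_tokens_per_step: int) -> tuple[int, int]:
    """Return (total_input_tokens, total_output_tokens) for one agent task.

    Assumes each step sends the system prompt plus the entire history so far,
    and that the model output and tool result of each step join the history.
    """
    history = 0
    total_input = 0
    total_output = 0
    for _ in range(steps):
        total_input += system_tokens + history      # prompt sent at this step
        total_output += output_tokens_per_step      # tokens generated at this step
        history += output_tokens_per_step + tool_result_tokens_per_step
    return total_input, total_output


# Illustrative numbers only (assumption): a 30-step task.
total_in, total_out = agent_task_tokens(
    steps=30, system_tokens=2_000,
    output_tokens_per_step=500, tool_result_tokens_per_step=1_500)
print(total_in, total_out)  # 930000 15000
```

Under these assumptions input tokens grow roughly with the square of the number of steps, so a task with only 15,000 generated tokens still consumes close to a million tokens in total.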

Industry

The industry is transitioning from a “software industry chain” to a “token industry chain.” A new chain is forming around the production of tokens (models and computing power), their distribution (inference networks, APIs, intelligent agent protocols), their consumption (applications and intelligent agents), and their measurement (evaluation benchmarks, auditing, and trusted verification). The boundaries between the model layer, the inference service layer, the intelligent agent middleware layer, and the industry application layer are becoming increasingly clear, and industry-specific intelligent agents are becoming a mainstream investment focus. Model vendors, cloud vendors, chip manufacturers, green electricity operators, and content delivery network vendors are together forming a collaborative ecosystem for the token industry chain. According to the China Academy of Information and Communications Technology, the scale of China’s core AI industry is expected to exceed 1.2 trillion yuan by 2026, with collaborative effects across the entire industry chain becoming evident.

Governance

The focus of governance is shifting from “algorithm governance” to “full-chain governance of tokens.” As the AI industry has developed, the targets of governance have expanded from “algorithms and code” to the full chain of token production, circulation, consumption, and cross-border flow. Token traceability, identification of synthetic content, cross-border token flows, constraints on computing power and energy consumption, and trusted evaluation and benchmarks are all new issues that call for new governance tools and rules. The year 2026 may become a critical year in which global AI governance rules are put into practice in concentrated fashion.

China’s Position in the Global Token Wave

In the global wave driven by tokens, China is forming a unique position supported by multiple factors.

On the production side of tokens, a cluster of domestic models is rising. Models from MiniMax, Moonshot AI, DeepSeek, Zhipu, Alibaba (Qwen), and ByteDance (Doubao) are using mixture-of-experts architectures and extreme engineering optimization to keep improving performance while reducing inference costs to a fraction of those of comparable global models. On the OpenRouter platform, U.S. users account for 47% and Chinese users for only about 6%, yet Chinese models lead in usage. This recognition comes from global developers voting with their feet.

On the consumption side, applications are reaching deeper than ever, and tokens are entering people’s daily lives at unprecedented speed. A general practitioner in a county hospital can have a suspicious lung CT scan analyzed in seconds, with AI highlighting nodules and suggesting differential diagnoses, compressing a two-week consultation into a single outpatient visit. A farmer in Shouguang, Shandong, can photograph a curled cucumber with a phone, and a smart agriculture app draws on tokenized agricultural knowledge to identify whether the cause is thrips or a viral disease and recommend the appropriate pesticide. An elderly person living alone can tell a smart speaker in their own dialect, “I feel tight in my chest,” and after a few thousand tokens of conversation, their children’s phones receive an alert and their location is shared for emergency response. Delivery riders no longer hear mechanical navigation instructions but receive routes planned around real-time traffic and elevator wait times. AI assistants in government service halls respond around the clock to inquiries about medical insurance transfers and property registration, turning “people running errands” into “tokens running errands.” Tokens are becoming the “invisible labor force” across industries.

At the industry chain level, a full-stack collaborative ecosystem is rapidly taking shape. From domestic chips such as Ascend, Cambricon, and Hygon to inference service platforms such as Volcano Engine, Alibaba Cloud, and Tencent Cloud, the entire chain spanning chips, computing power, models, middleware, and applications is maturing quickly. The “East Data, West Computing” project provides low-cost computing power, and green electricity supplied directly to data centers strengthens the energy foundation.

However, it is important to recognize that significant room for improvement remains in areas such as the originality of frontier models, the high-end computing power base, cross-language and cross-cultural ecosystem influence, and participation in global rule-making.

The second half of the token wave is not a story of “having already won” but of “just beginning.” In the global landscape opened up by the tiny token, China is not only a vast market but also an active builder and a responsible co-governor. Understanding tokens means understanding the next phase of artificial intelligence.