AI Patents in the United States and China: Measurement, Organization, and Knowledge Flows
This paper introduces a high-precision AI patent classifier to reveal that while the United States and China exhibit converging AI patenting growth and market value premiums, they differ significantly in organizational structures—with the U.S. dominated by large private firms and China by diverse institutions—and remain technologically interdependent through cross-border knowledge flows.
Original authors: Hanming Fang, Xian Gu, Hanyin Yan, Wu Zhu
This is an AI-generated explanation of the paper below. It is not written or endorsed by the authors. For technical accuracy, refer to the original paper.
The Big Picture: Building a Better Ruler
Imagine the United States and China are two giant chefs competing to see who can cook the best "Artificial Intelligence" (AI) meal. For years, they've been counting how many AI dishes they've made by looking at a menu. But the paper argues that the menu the US government (the USPTO) was using was broken. It was missing most of the real AI dishes and accidentally labeling regular food as AI.
The authors of this paper decided to build a far more accurate AI detector. They used a high-tech "smart brain" (a Large Language Model) that they trained by reading thousands of patent documents. Think of it like teaching a dog to sniff out specific scents; this digital dog learned to smell the difference between a real AI invention and a fake one with 97% precision (when it flags a patent as AI, it is right about 97% of the time).
Once they had this new, much sharper ruler, they measured the AI "cooking" in both countries: US patents from 1976 to 2023 and Chinese patents from 2010 to 2023. Here is what they found:
1. The Race: Who is Winning?
The Volume: Both countries are cooking up a storm. The number of AI patents is exploding.
The Scoreboard: For a long time, the US was ahead. But recently, China has overtaken the US in the sheer number of AI patents filed every year.
The Menu: Interestingly, both countries are cooking very similar dishes. They are both focusing heavily on "Planning" (making robots think ahead), "Vision" (computer eyes), and "Hardware" (chips and circuits).
The Twist: The US got a head start on "Natural Language Processing" (like chatbots), but China's patenting in this area has accelerated sharply since 2020.
2. The Kitchen: Who is Cooking?
This is where the two countries look very different.
The US Kitchen: It's like a private club of giants. Almost all the big AI patents come from a small group of massive, private tech companies (like Google, Microsoft, IBM, Amazon). It's a "winner-take-all" game where the big players dominate.
The Chinese Kitchen: It's a busy, diverse bazaar. While private giants like Tencent and Huawei are there, the government is also cooking. State-owned companies and universities are major players. In China, the government and schools are deeply involved in creating the tech, whereas in the US, it's mostly the private sector.
3. The Map: Where is the Cooking Happening?
The US: The AI innovation is stuck in a few super-hubs. It's like a lighthouse that stays bright in Silicon Valley and Boston. Over the last 20 years, the light has barely spread to new towns. The US is mature; the hot spots are established and staying put.
China: The innovation is spreading like wildfire. It started in big cities like Beijing and Shanghai, but it has rapidly exploded into smaller cities and provinces. The Chinese "AI map" is getting bigger and more crowded every year.
4. The Value: Is the Food Worth Eating?
A common criticism is that China is just making "fake" patents to get government cash (subsidies), and that these patents aren't actually valuable.
The Test: The authors looked at the stock market. When a company gets an AI patent, does its stock price go up?
The Result: Yes. In both the US and China, the market values AI patents more highly than non-AI patents. Even patents from Chinese universities and state-owned companies are valued highly by the market. This suggests that China's AI boom isn't just "junk" for show; it's creating real, valuable technology.
5. The Connection: Are They Breaking Up?
There is a lot of talk about the US and China "decoupling" or cutting ties.
The Reality: They are still very much in love (technologically speaking).
The Flow: Chinese inventors are still heavily relying on US knowledge. They cite US patents constantly. It's like a student in China still reading the textbooks written by US professors.
The Reverse: The US cites Chinese patents less often, and mostly in areas that aren't the "core" of AI.
The Verdict: They aren't separating. They are in a fierce competition, but they are still standing on each other's shoulders to build the future.
Summary Analogy
Imagine two construction crews building skyscrapers.
The US crew is made of a few elite, private master builders who have been working on the same few city blocks for decades. They are very efficient and their buildings are worth a fortune.
The Chinese crew is a massive army including private builders, government workers, and university teams. They are building skyscrapers everywhere, spreading out to new neighborhoods rapidly.
The Secret: Even though they are racing to see who builds the tallest tower, the Chinese crew is still using the blueprints and tools invented by the US crew. They haven't stopped talking to each other; they are just trying to build faster than the other guy.
The Bottom Line: The US and China are converging in what they are building (similar AI tech), but they are diverging in how they are building it (private giants vs. mixed government/university effort) and where it is spreading (stable hubs vs. rapid expansion). And despite the political noise, they are still deeply connected.
1. Problem Statement
The paper addresses a critical gap in the empirical analysis of global AI competition: the lack of a reliable, high-precision metric to measure AI innovation at scale.
The Identification Problem: Existing measures, specifically the US Patent and Trademark Office's (USPTO) Artificial Intelligence Patent Dataset (AIPD), suffer from significant measurement error. The USPTO's current LSTM-based classifier yields a precision of 40.5% and recall of 37.5%. This implies that nearly 60% of patents flagged as "AI" are false positives, and over 60% of true AI patents are missed (false negatives).
Consequence: These errors introduce substantial attenuation bias into economic analyses regarding firm-level innovation, productivity, and the comparative dynamics between the US and China.
Goal: To develop a superior classifier to accurately identify AI patents, validate its performance across borders, and use it to analyze the structural differences in AI innovation between the US and China.
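The arithmetic behind the error figures above, and a rough illustration of why such errors matter, can be sketched as follows. The first function simply restates precision and recall as error shares. The second is an assumption-laden illustration, not the paper's calculation: it assumes misclassification is unrelated to the outcome being studied (non-differential error) and uses a hypothetical 5% share of true AI patents, a number that does not come from the paper.

```python
def error_shares(precision: float, recall: float) -> tuple[float, float]:
    """Return (share of flagged patents that are false positives,
    share of true AI patents that are missed)."""
    return 1.0 - precision, 1.0 - recall


def attenuation_factor(precision: float, recall: float, prevalence: float) -> float:
    """Fraction of a true AI-vs-non-AI gap that survives when the noisy flag
    replaces the true label (assumes non-differential misclassification).

    The observed gap in a mean outcome between flagged and unflagged patents
    equals the true gap times [P(AI | flagged) - P(AI | unflagged)].
    """
    flagged_share = prevalence * recall / precision          # P(flagged)
    ai_among_unflagged = prevalence * (1.0 - recall) / (1.0 - flagged_share)
    return precision - ai_among_unflagged                    # P(AI | flagged) = precision


# Figures reported for the USPTO classifier in the text above.
fp_share, miss_rate = error_shares(precision=0.405, recall=0.375)
print(f"False positives among flagged patents: {fp_share:.1%}")
print(f"True AI patents missed: {miss_rate:.1%}")

# Hypothetical prevalence of 5% (an assumption for illustration only).
factor = attenuation_factor(precision=0.405, recall=0.375, prevalence=0.05)
print(f"Share of the true effect recovered: {factor:.2f}")
```

Under these assumptions, a regression on the noisy flag would recover well under half of the true difference between AI and non-AI patents, which is the sense in which measurement error attenuates estimates. A perfect classifier (precision and recall of 1) would give a factor of 1.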
2. Methodology
The authors propose a new classification framework, the FGYZ classifier, leveraging modern Natural Language Processing (NLP) and Large Language Models (LLMs).
A. Data Construction
Source Data: The study utilizes the USPTO's AIPD seed (positive examples) and anti-seed (negative examples) datasets, originally manually labeled by Giczy et al. (2022) and Pairolero et al. (2025).
Scope: The analysis covers granted utility patents from the US (1976–2023) and invention patents from China (2010–2023).
Subfields: Patents are classified into eight AI subfields: Machine Learning, Natural Language Processing (NLP), Speech, Vision, Planning, Knowledge Processing, Hardware, and Evolutionary Computation.
B. Model Architecture
Base Model: The authors fine-tune PatentSBERTa, a transformer-based language model pre-trained specifically on patent texts (abstracts, claims, and full descriptions).
Training Strategy:
The model is fine-tuned using contrastive learning objectives to capture semantic and contextual relationships specific to the patent domain.
An 80/20 training-test split is used with five-fold cross-validation for hyperparameter tuning.
Eight separate binary classifiers are trained, one for each AI subfield, allowing for multi-label classification (a patent can belong to multiple subfields).
Exclusion: The "Evolutionary Computation" subfield is excluded from empirical analysis due to a small training sample size (only 128 labeled observations), which resulted in poor predictive accuracy.
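The data handling around the classifiers can be sketched in plain Python. This is a minimal illustration, not the authors' code: the function names, the random seed, the 0.5 decision threshold, and the example scores are all hypothetical, and the fine-tuned PatentSBERTa models themselves are represented only by the per-subfield scores they would produce.

```python
import random

# The eight subfields named in the text; identifiers are illustrative.
SUBFIELDS = [
    "machine_learning", "nlp", "speech", "vision",
    "planning", "knowledge_processing", "hardware", "evolutionary_computation",
]


def train_test_split(n_items: int, test_share: float = 0.2, seed: int = 0):
    """Shuffle item indices and hold out `test_share` of them for testing."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    n_test = int(round(n_items * test_share))
    return indices[n_test:], indices[:n_test]


def k_folds(train_indices: list[int], k: int = 5):
    """Yield (fit, validation) index lists for k-fold cross-validation."""
    folds = [train_indices[i::k] for i in range(k)]
    for i in range(k):
        fit = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield fit, folds[i]


def label_patent(scores: dict[str, float], threshold: float = 0.5) -> list[str]:
    """Multi-label output: each binary classifier votes independently,
    so a patent may receive zero, one, or several subfield labels."""
    return [s for s in SUBFIELDS if scores.get(s, 0.0) >= threshold]


train_idx, test_idx = train_test_split(1000)
folds = list(k_folds(train_idx))
print(len(train_idx), len(test_idx), len(folds))

# Hypothetical classifier scores for one patent.
labels = label_patent({"vision": 0.91, "hardware": 0.67, "nlp": 0.12})
print(labels)
```

Training one binary classifier per subfield, rather than a single multi-class model, is what lets a patent on, say, a vision chip carry both the "vision" and "hardware" labels.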
C. Validation Framework
To ensure the classifier's validity, the authors employ a multi-pronged validation strategy:
Citation-Based Connectivity: They define a "Directional Citation Preference" to measure how strongly a group of patents cites a benchmark group (Dual-Positive: patents identified as AI by both USPTO and FGYZ).
Hypothesis: True AI patents should cite the AI frontier more frequently than random chance.
Lexical Similarity Analysis: Using TF-IDF re-weighting, they measure the lexical overlap between patent groups.
Method: They penalize common legal/technical boilerplate (using IDF weights derived from non-AI patents) to isolate distinctive AI vocabulary.
Cross-Border Generalization: The US-trained model is applied to Chinese patents. Validation checks if Chinese patents classified as AI by FGYZ exhibit stronger citation and lexical links to US AI patents than to US non-AI patents.
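The citation and lexical checks above can be sketched as follows. This summary does not give the paper's exact formulas, so both definitions here are assumptions: the citation preference is written as the share of a group's citations that point to the benchmark, divided by the benchmark's share of all citable patents (values above 1 mean more than random chance), and the IDF uses a common smoothed form computed from a non-AI reference corpus. All patent IDs and texts are made up.

```python
import math
import re
from collections import Counter


def directional_citation_preference(citations, group, benchmark, universe):
    """Observed share of the group's citations that hit the benchmark,
    relative to the share expected under random citing (assumed definition)."""
    made = [(src, dst) for src, dst in citations if src in group]
    expected = len(benchmark & universe) / len(universe) if universe else 0.0
    if not made or expected == 0.0:
        return float("nan")
    observed = sum(dst in benchmark for _, dst in made) / len(made)
    return observed / expected


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def idf_from_reference(reference_docs):
    """Smoothed IDF from non-AI patents, so boilerplate gets low weight.
    Terms never seen in the reference corpus receive the maximum weight."""
    n = len(reference_docs)
    doc_freq = Counter()
    for doc in reference_docs:
        doc_freq.update(set(tokenize(doc)))
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in doc_freq.items()}
    return idf, math.log(1 + n) + 1.0


def tfidf_vector(text, idf, default_idf):
    counts = Counter(tokenize(text))
    return {t: c * idf.get(t, default_idf) for t, c in counts.items()}


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Citation check on toy data: the benchmark is 20% of the universe.
universe = {f"P{i}" for i in range(10)}
benchmark = {"P0", "P1"}
group = {"P8", "P9"}
citations = [("P8", "P0"), ("P8", "P1"), ("P9", "P0"), ("P9", "P5")]
dcp = directional_citation_preference(citations, group, benchmark, universe)

# Lexical check on toy data: boilerplate is down-weighted by the non-AI IDF.
non_ai = [
    "a method according to claim one wherein the apparatus comprises a housing",
    "the apparatus according to claim two wherein the method comprises a valve",
    "a system wherein the housing comprises a valve and a pump",
]
idf, default_idf = idf_from_reference(non_ai)
frontier = tfidf_vector("a method wherein a neural network is trained on labeled images", idf, default_idf)
sim_ai = cosine(tfidf_vector("the apparatus comprises a neural network trained on images", idf, default_idf), frontier)
sim_other = cosine(tfidf_vector("the apparatus comprises a pump wherein the method uses a valve", idf, default_idf), frontier)
print(f"citation preference: {dcp:.2f}, AI similarity: {sim_ai:.2f}, non-AI similarity: {sim_other:.2f}")
```

The same two functions could in principle be pointed at Chinese patents classified by the US-trained model, comparing their links to US AI patents against their links to US non-AI patents.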
3. Key Contributions
Methodological Advancement: The FGYZ classifier significantly outperforms the USPTO's LSTM model, achieving 97.0% precision, 91.3% recall, and a 94.0% F1 score. It successfully identifies a broader and more economically meaningful set of AI inventions.
High-Resolution Dataset: The authors construct a novel dataset of 876,668 US AI patents and 651,630 Chinese AI patents, matched to nearly 400,000 unique inventors.
External Validity: They demonstrate that a model trained on US data generalizes effectively to Chinese patents, evidenced by strong cross-border citation connectivity and lexical alignment with the US AI frontier.
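As a quick check on how the headline metrics fit together, the F1 score is the harmonic mean of precision and recall. The snippet below recomputes it from the rounded figures quoted in this summary; the USPTO F1 value is derived here from the reported precision and recall and is not a number taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded figures quoted in the text.
fgyz_f1 = f1_score(precision=0.970, recall=0.913)
uspto_f1 = f1_score(precision=0.405, recall=0.375)

print(f"FGYZ F1 from rounded inputs:  {fgyz_f1:.3f}")
print(f"USPTO F1 from rounded inputs: {uspto_f1:.3f}")
```

The recomputed FGYZ value lands at roughly 0.94, in line with the reported 94.0%; any difference in the last digit is presumably due to rounding of the inputs.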
4. Key Results
A. Growth and Convergence in Volume
Rapid Expansion: Both countries have seen exponential growth in AI patenting, accelerating significantly after the mid-2010s.
Volume Shift: While the US led early, China overtook the US in total annual AI patent counts starting in 2020.
Subfield Composition: There is broad convergence in the distribution of subfields (Planning, Vision, and Hardware are dominant in both). However, divergence exists in NLP, where the US expanded earlier, while China saw a sharp acceleration post-2020.
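B. Organizational Structure and Geography
Organizational Structure:
US: Concentrated among a small group of large private technology firms (such as Google, Microsoft, IBM, and Amazon).
China: More institutionally diverse. While private giants (Tencent, Baidu, Huawei) lead, State-Owned Enterprises (SOEs) and universities play a much larger role, particularly in hardware and applied subfields.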
Geographic Diffusion:
US: AI innovation remains tightly anchored in early "super-clusters" (San Francisco Bay Area, Northeast Corridor) with limited geographic expansion over time.
China: Exhibits rapid spatial diffusion. AI activity has spread quickly from initial hubs (Beijing, Shanghai, Shenzhen) to secondary provincial capitals, reflecting state-led initiatives to distribute R&D capacity.
C. Economic Value and Knowledge Flows
Valuation Premium: AI patents command a robust market-value premium over non-AI patents in both countries. This holds true even for Chinese patents held by universities and SOEs, challenging the view that Chinese non-market patents are merely "junk" or subsidy-driven.
Knowledge Flow Dynamics:
US: Academic institutions operate as "intellectual enclaves" (high self-citation, low industry citation).
China: Strong reciprocal linkages exist between academia/SOEs and the private sector. Private firms cite state-sector patents more frequently than they cite other private firms, indicating a strategic role for state actors in generating economically relevant knowledge.
Cross-Border Dependency: Contrary to narratives of "decoupling," there is continued technological interdependence.
Chinese AI inventors rely heavily on US frontier knowledge (high citation intensity to US AI).
US inventors cite Chinese patents less intensively, and more selectively (often outside core AI domains).
The relationship is characterized by asymmetric cross-border learning rather than isolation.
5. Significance
Policy and Strategy: The findings suggest that while the US and China are converging in the volume and technical composition of AI innovation, they remain divergent in their institutional organization and geographic diffusion.
Economic Interpretation: The robust market valuation of Chinese AI patents (including those from non-market entities) suggests that China's AI ecosystem is generating genuine economic value, not just administrative output.
Global Innovation Landscape: The paper argues against the idea of a complete technological decoupling. Instead, it highlights a complex global innovation environment where China is rapidly catching up but remains deeply dependent on US frontier knowledge, particularly in core algorithmic domains.
Future Research: The FGYZ classifier provides a reliable tool for future economic research on AI, allowing for more accurate assessments of productivity, firm value, and the impact of industrial policy.