China’s generative AI governance
In the West, discussing content control runs counter to the foundations of liberal politics, particularly the libertarianism of the tech community. China, by contrast, never assumed that online content should not, or could not, be controlled. Bill Clinton famously quipped that Chinese attempts to control the Internet were tantamount to “nailing Jell-O to the wall”. As it turns out, if you have enough nails, a very big hammer, and the will to use them, a great degree of control is possible indeed.
From Chinese regulators’ point of view, therefore, the advent of generative AI is merely the next novel way in which online content can be created, distributed, managed, and accessed. Evolving alongside the technology itself, they have both adapted existing regulatory tools and developed new ones to meet the challenge. First, regulators focused on AI’s role in prioritising and ranking information. Second, they worked to control the emergence of deepfakes.
China’s media control toolkit is not merely built on censorship (the removal of information from the public realm) or propaganda (the production of politically desirable content). It also includes the ability to set priorities and to signal the relative importance of stories covered in official media.
This ability has been challenged before, however, most notably by Jinri Toutiao, an algorithm-driven news aggregator launched in 2012 by ByteDance. Its popularity rose swiftly in the following years, and it became China’s most-used news app. Yet its founder had been vocal about the goal of the algorithm: providing the content a user might want, irrespective of that content’s political value to the authorities. A groundswell of criticism of the app and some of its competitors appeared in official media outlets, followed by stricter scrutiny from the authorities and even a temporary removal from app stores.
Deep regulation
In addition to concerns about information prioritisation, regulators also became increasingly wary of potentially misleading content. Specifically, the spread of face-swapping videos on Chinese social media platforms drew regulatory attention to deepfakes. Although these started out as entertainment, it does not take a leap of the imagination to see that the growing sophistication of the technology would soon enable the production of videos in which Xi Jinping, for instance, could be made to say just about anything.
The regulatory machine sprang swiftly into operation. In December 2020, a top-level development plan for the legal system singled out recommendation algorithms and deepfakes as two phenomena requiring a regulatory response. Regulations on recommendation algorithms were finalised in 2021, followed a year later by rules covering “deep synthesis” technologies. The shift in terminology from “deepfakes” to “deep synthesis” was the result of industry lobbying to shed the pejorative connotation of the former term and to recognise that the technology might have positive uses as well as negative ones.
Apart from setting rules on content control, such as mandatory watermarking of AI-generated content and monitoring mechanisms, these regulations also include a mandatory registry for algorithms with “public opinion and social mobilization characteristics”, which by now has come to include hundreds of entries. These range from high-profile applications such as Baidu’s Ernie chatbot to AI-enhanced search engines and mobile phone functionalities.
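To give a sense of what such labelling obligations could mean in practice, the Python sketch below attaches both a visible label and embedded metadata to a generated image. The metadata keys and label text are illustrative assumptions made for this example; the actual technical specifications are set out in the regulations and accompanying national standards, not here.

```python
# A minimal, hypothetical sketch of labelling an AI-generated image,
# loosely in the spirit of the watermarking rules. The metadata keys
# and label text are illustrative, not taken from the regulations.
from PIL import Image, ImageDraw
from PIL.PngImagePlugin import PngInfo

def label_generated_image(img: Image.Image, provider: str) -> tuple[Image.Image, PngInfo]:
    """Attach a visible caption and embedded metadata to a generated image."""
    labelled = img.copy()
    draw = ImageDraw.Draw(labelled)
    # Visible label in the corner (explicit marking).
    draw.text((10, labelled.height - 20), "AI-generated", fill="white")
    # Embedded metadata (implicit marking) stored in the PNG file.
    meta = PngInfo()
    meta.add_text("ai_generated", "true")
    meta.add_text("service_provider", provider)
    return labelled, meta

if __name__ == "__main__":
    img = Image.new("RGB", (256, 256), color="gray")  # stand-in for model output
    labelled, meta = label_generated_image(img, provider="example-llm-service")
    labelled.save("output.png", pnginfo=meta)
```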
With impeccable timing, the final rules on deep synthesis algorithms were promulgated five days before the release of ChatGPT, which triggered a new cycle of attention around the potential of generative AI. Regulators felt obliged to respond, drafting dedicated “Interim Measures” on generative AI services, even though the deep synthesis regulations ostensibly already covered LLMs as well as image and video generators.
Within five months of ChatGPT’s release, a first draft regulation emerged. It echoed much of the deep synthesis rules, but went further in several areas. For instance, it required all AI developers to guarantee the “veracity, accuracy, objectivity and diversity” of their training data. As the draft applied to all stages of AI development, including research and development, this would have severely restricted China’s ability to develop homegrown generative AI technologies.
China’s AI industry and research community pushed back, and the final version of the regulations, published in July 2023, relaxed the requirements in certain areas. First, the scope of application shrank to cover only the provision of generative AI services to the public. The language on training data was softened, and the veracity and accuracy of generated content were to be assessed against the characteristics of a specific service (permitting, for instance, science fiction content or images of unicorns and dragons). The requirement to register algorithms was retained. Lastly, reflecting growing concerns about China’s laggard position in core AI technologies, the Interim Measures “encouraged” developers to use “secure and reliable” (i.e. domestic) chips, software, tools, computing power and data sources.
Training and testing
The cybersecurity standards body TC260, formally under the leadership of the Cyberspace Administration of China but with significant participation from the scholarly and corporate communities, released an additional set of standards in March 2024. Although such standards are officially not legally binding, regulators and courts treat them as representing industry best practice, which makes them de facto binding. On training data, the standard states that any data set containing more than 5% “unlawful and harmful” information should be blacklisted in its entirety, and that such information should be filtered out of the remaining data sets. This effectively excludes, for instance, the archives of large foreign media organisations such as the New York Times, the BBC, and other outlets not accessible from within China, which might have significant consequences for the availability of an English-language corpus for Chinese LLMs.
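As a rough sketch of how such a threshold rule could operate in a data pipeline, consider the Python snippet below. The `is_unlawful_or_harmful` classifier is a hypothetical stand-in for whatever keyword filters or models a provider actually deploys; the standard itself does not prescribe an implementation.

```python
# Hypothetical sketch of the training-data screening rule: sources in
# which more than 5% of content is flagged as "unlawful and harmful"
# are dropped wholesale; flagged items are filtered from the rest.
HARMFUL_THRESHOLD = 0.05

def is_unlawful_or_harmful(doc: str) -> bool:
    """Stand-in for a real content classifier or keyword filter."""
    return "<flagged>" in doc  # placeholder logic

def screen_sources(sources: dict[str, list[str]]) -> dict[str, list[str]]:
    """Return only approved sources, with flagged documents removed."""
    approved = {}
    for name, docs in sources.items():
        flags = [is_unlawful_or_harmful(d) for d in docs]
        if docs and sum(flags) / len(docs) > HARMFUL_THRESHOLD:
            continue  # blacklist the entire source
        approved[name] = [d for d, bad in zip(docs, flags) if not bad]
    return approved
```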
The standard also establishes testing protocols for generated content. Service providers must compile a question bank of at least 2000 queries covering 31 risk categories, including violations of socialist core values, discriminatory content, commercial infringements, intellectual property infringements, infringement of individuals’ legal rights, and failure to meet sector-specific security standards. Moreover, they must compile a bank of at least 5000 questions that models must refuse to answer, on topics including China’s “political system, ideology, image, culture, customs, ethnicities, geography, historical and revolutionary heroes”, as well as specific information pertaining to individuals, such as their gender, age, occupation, and health.
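In practice, compliance testing along these lines amounts to running the two banks against a model and measuring pass rates. The sketch below illustrates the idea; the model interface, the refusal heuristic and the compliance check are all assumptions made for the example, not requirements taken from the standard’s text.

```python
# Illustrative harness for the two question banks: a general bank whose
# answers are checked for compliance, and a refusal bank whose queries
# the model is expected to decline. All interfaces are hypothetical.
from typing import Callable

REFUSAL_MARKERS = ("cannot answer", "unable to discuss")  # illustrative only

def looks_like_refusal(answer: str) -> bool:
    return any(marker in answer.lower() for marker in REFUSAL_MARKERS)

def run_banks(model: Callable[[str], str],
              answer_bank: list[str],    # e.g. 2000+ queries, 31 risk areas
              refusal_bank: list[str],   # e.g. 5000+ must-refuse queries
              is_compliant: Callable[[str], bool]) -> dict[str, float]:
    """Return the share of compliant answers and of correct refusals."""
    answered_ok = sum(is_compliant(model(q)) for q in answer_bank)
    refused_ok = sum(looks_like_refusal(model(q)) for q in refusal_bank)
    return {
        "compliance_rate": answered_ok / len(answer_bank),
        "refusal_rate": refused_ok / len(refusal_bank),
    }
```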
The finalised standard also reflects geopolitical considerations, prohibiting companies from using unregistered foundation models to provide generative AI services to the public. On the one hand, such models do not necessarily comply with Chinese content regulation requirements. On the other, Beijing intends to develop indigenous capabilities in this realm as well. Currently, many Chinese domestic AI services are based on Meta’s open source LLaMA 2 model, and it remains to be seen whether and how they can migrate to domestic alternatives (or whether Meta will register LLaMA in China). Chinese businesses are rapidly catching up: 01.AI, a Beijing-based AI start-up backed by the well-known tech luminary Kai-Fu Lee, has released an open source model that is technically competitive with LLaMA 2 and other leading models.
These standards will likely have a significant impact on the specific generative AI applications that arise in China. First and foremost, limiting Chinese access to foreign technologies may pay off in the longer run as the country invests in its own ecosystem, but may slow the development of Chinese generative AI capabilities in the immediate future. Second, where ChatGPT and other Western generative AI systems are general-purpose in nature, Chinese developers have moved more rapidly towards sector-specific applications of their products, among other things by integrating them with existing product lines. For instance, Baidu announced partnerships for Ernie Bot with household appliance producers and carmakers. This more focused application of generative AI may yield profitable use cases more rapidly, as algorithms can be trained in a much more targeted manner, but at the cost of more general and innovative applications.