The dual challenges of open source large model governance and the construction of community order
Core content:
1. The value and risks of open source large models coexist
2. Building a governance framework that balances freedom to innovate with risk prevention
3. The rise and contributions of open source communities in the field of AI
Open source is valuable because its low threshold and high transparency bring a constant stream of creativity and improvement. At the same time, the risks inherent in large model technology itself, from hallucinations to unlawful abuse, are real. The governance of open source large models therefore needs to achieve "dual goals". The first is to preserve the vitality of the open source ecosystem: build a predictable safe harbor of responsibility for well-intentioned open source contributors so that they enjoy the "freedom to innovate". The second is to jointly prevent the major risks of large models. The defining characteristics of the open source ecosystem, transparency, openness, and collaboration among equals, do not fit the traditional centralized model of supervision; the governance of open source large models should return to the open source community and build a "community governance order".

1. Open source: the source of digital innovation

Open source has become the cornerstone of the digital world over the past few decades. From the Apache web server in the early days of the Internet, to the Android system that runs on more than 70% of smartphones in the mobile era [1], to the Linux operating system that is ubiquitous in cloud computing [2], open source software forms the backbone of the global information infrastructure. In the field of artificial intelligence, Google's open source TensorFlow made deep learning tools broadly accessible, and the open source large model DeepSeek-R1, released by a Chinese team, approached the performance of closed-source models with limited computing power, pushing the wave of AI innovation into a new stage [3].
The key to open source's ability to drive innovation lies in its culture of openness and collaboration. Developers around the world contribute code, uncover vulnerabilities, and optimize features around the clock, jointly shaping the future of the technology. It is particularly noteworthy that China is shifting from follower to major contributor in the open source wave. The Global Open Source Ecosystem Insight Report (2024) shows that there are nearly 8.4 million Chinese developers, accounting for one-third of the world's total; among the world's 100 most active open source projects, those led by Chinese developers account for 17%, ranking second, with substantial room for further growth [4].

2. Open source large models are rising rapidly in the field of artificial intelligence
Since 2022, open source models have been catching up: the gap between open source and closed source models in performance and range of applications has narrowed rapidly, and in some respects open source has even pulled ahead [5]. Why?

First, the open source "bazaar" style of collaboration appears to be repeating itself in the AI field, opening a new era of mass innovation. Open source large models make the underlying capability public: anyone can download the weights for free, deploy them locally, engage in "secondary creation", and fine-tune them for different industries and scenarios to produce dedicated variants, greatly enhancing flexibility and adaptability (a minimal sketch of this workflow appears at the end of this section). Developers share engineering experience and take part in the trial and error of technical routes, which accelerates the development and evolution of AI. The trend recalls the open source paradigm described in the classic "The Cathedral and the Bazaar": anyone can participate in innovation, forming a "borderless technology bazaar" [6]. The AI field looks set to continue the open source legend.

Second, open source models improve the transparency and security of AI systems. "Just as the best-known encryption algorithms are often the most secure, the best-known AI models are also likely to be the most secure." [7] Because the model architecture and weights are open, external researchers can "dissect" the model in depth, discover biases or vulnerabilities in a timely manner, and work together on improvements. Closed models, by contrast, are "black boxes" that outsiders can hardly supervise.

Finally, and most importantly, open source has broken the closed structure that previously dominated AI. The most advanced models used to be controlled by a handful of giants, and the downstream industry paid to use them through APIs, with little power to bargain or to impose constraints. Open source reduces lock-in to a single supplier and strengthens technological autonomy. In the large model field, the open source community has reproduced most of the capabilities of commercial closed-source models with less computing power, forcing large technology companies to re-evaluate their strategies and keeping competition in the industry healthy.
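To make the "download, deploy, fine-tune" path concrete, the sketch below shows how openly published weights are typically loaded and run locally with the Hugging Face transformers library. The model id is a hypothetical placeholder rather than any specific release; domain fine-tuning (for example with LoRA adapters) would start from the same loaded weights.

```python
# Minimal sketch: pull openly published weights and run them locally.
# "example-org/open-llm-7b" is a placeholder, not a real repository;
# substitute any open-weight model whose license permits your use case.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "example-org/open-llm-7b"  # hypothetical open-weight model

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# "Secondary creation" usually starts from exactly this point: the same
# weights can be further fine-tuned on domain data to form a dedicated variant.
inputs = tokenizer("Summarize the key duties of a model deployer:", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```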
3. The openness of today's major open source models

Although "open source" and "closed source" are often treated as a binary contrast, in reality the openness of models forms a graduated spectrum [8], with many levels between fully closed models that expose only an API and fully open models whose architecture, weights, and even training data are public.

In short, the degree of openness of large models forms a rich spectrum, and from a governance perspective it makes sense to classify models by level of openness and tailor policy accordingly. For ease of discussion, the open source large models referred to below follow the current mainstream industry understanding: the model parameters are open, and the open source license allows users to freely run, study, and modify the model, apart from general clauses such as a prohibition on illegal use.

4. Open source large model governance: drawing wisdom from open source culture
Open source is valuable because its low threshold and high transparency bring a steady stream of creativity and improvement. At the same time, the risks inherent in large model technology itself, from hallucinations to unlawful abuse, are real. Concretely, open source large model governance needs to achieve "dual goals":

On the one hand, we must preserve the vitality of the open source ecosystem and realize the "freedom to innovate": build a predictable safe harbor for open source developers and leave ample room for well-intentioned open source exploration, so as to attract more developers to open source and promote its prosperity.

On the other hand, given the distinctive risks of open source, a "community order" for open source AI governance should be established to avert major harm. Here the history of open source itself offers valuable experience, from community self-governance to collaborative governance, which can inform the security governance of open source models.

With these two goals in mind, this paper offers the following suggestions.

1. Build a predictable safe harbor for open source developers

Specifically, the safe harbor rests on two clear boundaries of responsibility.

First, vertically: distinguish the upstream and downstream of the industry, separate the roles of model developer and model deployer, and draw the boundary of responsibility between them.

The basic starting point for allocating responsibility is each party's capacity to manage risk. Model developers control model design, development, and training, while deployers are immersed in the details of specific application scenarios; the two differ fundamentally in their ability to manage the security risks of large models. Accordingly, one current consensus of AI governance across jurisdictions is to distinguish the roles of industry actors and assign them different governance responsibilities. For example:

The EU Artificial Intelligence Act defines two distinct actors. Providers (model developers) are chiefly responsible for ensuring that the AI systems they develop meet security and transparency requirements, carrying out risk assessments and adopting appropriate technical measures to make the systems more reliable. Deployers mainly ensure that AI systems comply with regulatory requirements in use, continuously monitoring high-risk AI systems and providing adequate rights-protection mechanisms to users.

The controversy around California's 2024 bill SB-1047 also reflects this differentiated view of governance. The bill initially did not clearly distinguish "developers" from "deployers" and instead placed almost all obligations on model developers (providers). The proposal drew strong opposition from industry figures, including Fei-Fei Li [10]. Some experts pointed out that SB-1047 shifted onto developers the responsibility for use that deployers should bear, which was likened to holding "the motor manufacturer responsible for accidents caused by the use of an electric saw" [11]. The bill was ultimately vetoed.

China's Interim Measures for the Administration of Generative Artificial Intelligence Services takes this differentiated approach a step further. The measures broadly exempt the model development stage:
only when a service is provided to the public must the provider meet requirements that trace back to the research and development stage, such as basic norms of data governance. This in effect creates a "sandbox that encourages R&D exploration": at the research and development stage, all kinds of actors, including open source communities, may explore freely; once the application stage is reached, especially for services accessible to the public, the service itself is brought under supervision.

The "disclaimer" for downstream applications in the license agreements commonly used by open source models is consistent with this logic of dividing responsibility. As the "social contract" of the open source community, the core function of an open source license is to clarify, through its terms, the rights and responsibilities of developers and downstream users. The BSD license, for example, states plainly that developers are not liable for any direct or indirect losses, and the MIT license emphasizes that the software is provided "as is", without any express or implied warranty. Open source licenses rest on the basic principles of copyright and contract law, and through this responsibility framework they relieve open source contributors of worry and promote collaborative innovation in the community. This is also the fundamental reason why open source licenses are generally recognized as legally effective in statutes and court decisions across jurisdictions.

Second, horizontally: given the significant differences between open source and closed source, open source should not bear the same responsibilities as closed source.

Training a model is like building a machine: if the model has basic and obvious flaws, developers need to foresee them and take precautions as far as possible. So whether open source or closed source, developers should assume basic security responsibilities. Compared with closed source, however, this "basic responsibility" should have limits for open source model developers.

On the one hand, open source developers often lack sufficient control. Once a model is released, it is "impossible to monitor and prevent downstream abuse" [12]: downstream actors can fine-tune the model and circumvent the security guardrails set by the original developer, and the open source developer cannot foresee every extreme use. By contrast, the companies that develop closed models are usually also the providers of commercial model services; to offer stable and reliable AI services, they typically make larger security investments at the development stage, such as internal red-team testing and safety optimization [13]. A related Brookings Institution report notes: "Open source developers usually do not profit from their contributions, and they do not have the budget and compliance department to fulfill those onerous obligations." [14]

On the other hand, governance must take the motivation of open source developers into account. If open source developers are required to bear the same obligations as closed source commercial providers, open source communities and researchers will choose not to release models publicly in order to avoid liability, stifling open source innovation. [15]
Relevant think tanks have criticized the practice of "failing to distinguish between closed source and open source AI", which entangles transparent, open projects and closed commercial systems alike "into a regulatory web, leaving open source developers unsure whether their well-intentioned contributions will be praised or punished, and thus creating a chilling effect on open source innovation." [16]

For this reason, even the European Union, with the strictest regulation, stipulates in its AI Act that free and open-source AI systems are generally not subject to the Act's obligations, and in the field of general-purpose models open source providers are likewise exempted from the obligations to compile documentation and provide information. The United States has taken an even more relaxed approach: the National Telecommunications and Information Administration (NTIA) concluded that there is currently insufficient evidence that open models are more dangerous than closed models and that there is no need to introduce mandatory rules for them immediately; instead it emphasizes continuous monitoring as the backstop and taking action only when necessary.

2. Learn governance wisdom from open source culture: trust the power of the community

The open source ecosystem is characterized by transparency, openness, and collaboration among equals. Rigidly applying the centralized, bureaucratic traditional regulatory model to it would not only fail to achieve the intended regulatory effect but would also seriously damage the vitality of the ecosystem. The security governance of open source large models should return to the open source community and draw on the governance wisdom embedded in open source culture.

Community self-discipline and mutual supervision are the core of open source governance. Hugging Face, a hub for distributing open source AI models, has developed mature community governance practices. The platform requires model publishers to provide a detailed model card describing the model's training data, performance, and ethical impact, and specifically listing what the model is and is not suitable for. The platform also reviews models and explicitly prohibits models with malicious backdoor code or illegal uses; once such a model is discovered, administrators quickly remove it from the platform. If a model is repeatedly reported for harmful content, condemnation from the community and public opinion also has a deterrent effect. For certain high-risk models, Hugging Face requires a "not-for-all-audiences" label, so that a warning appears when users access the model and its content is hidden by default. [17] Community users can flag and report harmful content or suspicious models, or contribute improved datasets for corrective fine-tuning.
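As a rough illustration of what such a model card declares, the sketch below writes a README.md with the kind of YAML metadata and disclosure sections the Hugging Face Hub surfaces (license, intended and out-of-scope use, training data, known limitations, and the not-for-all-audiences tag). Every name and field value here is a hypothetical placeholder, not any real model's documentation.

```python
# Minimal sketch of a Hugging Face-style model card (README.md with YAML
# front matter). All values are illustrative placeholders; the point is that
# intended use, out-of-scope use, training data, and audience warnings are
# declared up front where the community can review and challenge them.
from pathlib import Path

MODEL_CARD = """\
---
license: apache-2.0
language: en
tags:
  - text-generation
  - not-for-all-audiences   # per the platform, shows a warning and hides content by default
---

# example-org/open-llm-7b (hypothetical)

## Intended use
Research and domain fine-tuning for customer-support summarization.

## Out-of-scope use
Medical or legal advice; content targeting minors; any use prohibited by the license.

## Training data
Publicly available web text plus a curated instruction set (described in the accompanying report).

## Known limitations
May hallucinate facts; safety fine-tuning can be weakened by downstream tuning.
"""

Path("README.md").write_text(MODEL_CARD, encoding="utf-8")
print("model card written:", len(MODEL_CARD), "characters")
```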
In the same spirit, the Open Assistant project uses GitHub issues and Discord channels to collect user feedback on model outputs and encourages everyone to take part in "red team" testing to uncover model weaknesses [18].

Beyond the open source community itself, the development and deployment of open source AI models involve many stakeholders: cloud computing providers, model hosting platforms, downstream users and application developers, distribution channels, third-party evaluators and auditors, end users, and government regulators. Each party has distinct capabilities and responsibilities, and effective AI risk governance can only be achieved through collaboration among them all [19].

"Given enough eyeballs, all bugs are shallow." Linus's Law, formulated by Eric S. Raymond in 1997, still holds today: in an open community, whether in the software era or the AI era, errors and defects are easier to discover and correct. The same spirit runs through every community practice: the security maintenance of the Linux kernel relies on the joint scrutiny and patch submissions of thousands of developers worldwide; major distribution vendors push security updates to users in a timely manner; large IT companies fund vulnerability bounty programs; government departments issue cybersecurity baseline requirements. Likewise, in the field of open source large models, we look forward to an open governance ecosystem built on community mechanisms, in which each type of participant applies early warning, detection, correction, and risk-reduction measures to the risk links it is best positioned to control, achieving open, agile, and efficient security collaboration.