Ghosts in the machine and the hidden dangers of autopoiesis

AI-generated code is increasingly being used within software development practices globally. Over 90% of developer respondents surveyed by GitHub and Snyk reported using AI as a key tool in their day-to-day work.
However, the lackadaisical attitude with which this ostensible productivity enhancement has been embraced overlooks core security risks associated with the use of AI-generated code in software and product development settings.
An overreliance on autopoietic systems – such as AI code generation – capable of rapid, scalable self-authoring with limited human input could introduce new, more numerous, and potentially more complex security vulnerabilities into a digital ecosystem already replete with the vulnerabilities of decades past.
The EU, as the world-leading tech norms entrepreneur and equipped with the newly adopted Cyber Resilience Act, is well placed to navigate this potential storm and deliver a renewed product security-by-design philosophy fit for the AI age.
Our fragile foundations
Software vulnerabilities are ubiquitous features of the digital world: flaws invariably created in code that can allow for unintended – potentially harmful – outcomes if exploited. Largely veiled from us, and discovered only by a talented, underpaid, and largely unthanked few, these vulnerabilities present some of the biggest risks to the continued operation of digital systems. Some of the most consequential vulnerabilities sit beneath ten, twenty, or even thirty years’ worth of code, and can prove to be essential Jenga pieces in the overall structure. Exploit one of sufficient criticality, and the whole tower collapses.
The world saw this most saliently in April 2024 with the XZ Utils incident, in which a critical backdoor was discovered – by chance – in a compression library embedded in the secure shell (SSH) software used to administer servers across the internet. Had it been exploited, the backdoor could have given the actor responsible virtually unfettered access to hundreds of millions of computers worldwide.
Every addition or alteration to this fragile structure – whether innocuous, through software and patch provision, or malicious, through exploitation – risks prising new vulnerabilities into existence and realising real-world harms across the global technology supply chains upon which modern society is predicated. With an average of over 1,200 dependencies per network device, the potential for systemic risk arising from a single exploited vulnerability is enormous. The ghosts in the machine may yet come to haunt us.
Move fast and things break
The precarity of the XZ Utils incident and those like it has not dissuaded software developers from rapidly incorporating AI into software development practices. General-purpose large language models (LLMs), such as ChatGPT, as well as models tailored for code generation, such as WizardCoder, are among the platforms most widely used by developers today. Efficient though these tools may be, the security of development tooling and activity is often viewed as an encumbrance to releasing products, and appears to be even less of a consideration when using AI to code. Recent research has demonstrated that, when tested for security, AI-generated code largely fails along key axes – chiefly through exposure to consequential flaws, including those in MITRE Corporation’s ‘Top 25 Most Dangerous Software Weaknesses.’
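To illustrate the kind of weakness at issue – a minimal sketch, not drawn from any specific study, and with hypothetical function names – consider a query-building pattern that code assistants are frequently observed to reproduce. It maps to CWE-89 (SQL injection) in MITRE’s list, shown here alongside the parameterised alternative:

```python
import sqlite3

def find_user_insecure(conn: sqlite3.Connection, username: str):
    # Pattern often suggested by assistants: user input is interpolated
    # directly into the SQL string, allowing injection (CWE-89).
    query = f"SELECT id, email FROM users WHERE username = '{username}'"
    return conn.execute(query).fetchall()

def find_user_parameterised(conn: sqlite3.Connection, username: str):
    # Safer pattern: the driver binds the value, so input such as
    # "' OR '1'='1" is treated as data rather than as SQL.
    query = "SELECT id, email FROM users WHERE username = ?"
    return conn.execute(query, (username,)).fetchall()

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER, username TEXT, email TEXT)")
    conn.execute("INSERT INTO users VALUES (1, 'alice', 'alice@example.com')")
    payload = "' OR '1'='1"
    print(find_user_insecure(conn, payload))       # leaks every row
    print(find_user_parameterised(conn, payload))  # returns []
```

The flaw is trivial to avoid, yet it persists because the insecure pattern is shorter, more common in training data, and works perfectly well on benign input.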
Despite the apparent risks, the tech industry largely remains starry-eyed about AI’s use for secure software development. One study reported that 76% of tech industry leaders believe AI-generated code is more secure than human-authored code. To give them some credence, one study has found that prompting certain LLMs to adopt a ‘security persona’ reduced the number of vulnerabilities in their output; in other LLMs, however, it increased them.
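As a minimal sketch of what ‘security persona’ prompting looks like in practice – assuming the OpenAI Python client and a placeholder model name, neither of which the cited study necessarily used – the intervention amounts to prepending a system instruction before requesting code:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

SECURITY_PERSONA = (
    "You are a security-focused software engineer. Before returning code, "
    "check it against common weakness classes (injection, path traversal, "
    "unsafe deserialisation) and prefer parameterised or validated inputs."
)

def generate_code(task: str, persona: str | None = None) -> str:
    messages = []
    if persona:
        # The 'security persona' is simply an extra system message;
        # the research cited finds its effect varies by model.
        messages.append({"role": "system", "content": persona})
    messages.append({"role": "user", "content": task})
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=messages,
    )
    return response.choices[0].message.content

# Usage: compare baseline and persona-prompted output for the same task.
task = "Write a Python function that looks up a user by name in SQLite."
baseline = generate_code(task)
hardened = generate_code(task, persona=SECURITY_PERSONA)
```

That such a small change in wording can swing the security of the output in either direction underlines how brittle these mitigations remain.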
Yet code vulnerabilities are more than a merely technical problem; they ultimately exist within the broader institutional, cultural, and human contexts in which they are produced.
In July 2024, the failure to manage software flaws as part of a broader security regime manifested in real-world consequences with the largest information technology outage in history. Borne of a faulty update to – ironically enough – a security platform, CrowdStrike, the resulting incident caused global disruption to airports, banks, hospitals, and other critical infrastructure. One of the world’s biggest security vendors had failed to sufficiently vet an otherwise routine platform update before deploying it globally and simultaneously. The incident demonstrated the fragility of our shared digital foundations, and the ease with which they can be shaken.
These episodes also highlight the innate asymmetry of cyber security and the historic struggle to securely construct and maintain the systems that make our world run. A heightened consciousness of security in development lifecycles in recent years – such as through the promulgation of frameworks like security-by-design – has undoubtedly saved us from unrealised vulnerability exploitations, the reach and scale of which we are fortunate not to know. Yet, through the growing use of AI-generated code and an unfounded faith that machines can do security better than humans, we risk undoing this progress and exponentially increasing the spectrum and extent of systemic vulnerabilities.
The Brussels effect
The EU has led the way for embedding secure development practices with the Cyber Resilience Act (CRA), passed in late 2024, which demands firmer standards for the development of products with digital elements. Crucially, it does this by rebalancing the responsibility for product security onto developers and manufacturers, including requiring risk assessments, regular security updates, security-by-default, and vulnerability reporting, underwritten by potential fines up to €15 million ($15.2 million) or 2.5 percent of global annual turnover – whichever is higher – for failures to uphold these expectations. This rebalancing of responsibility has established a precedent that the United States and Australia have echoed in their respective cyber security strategies.
However, the CRA lacks specific guidance on the use of AI in product development. While AI itself could conceivably be captured as a software product, the law does not fully elucidate how the use of AI should be regulated in the development of products with digital elements. Ancillary regulatory materials, such as guidelines on the secure use of AI in software development, should be developed to give developers interim advice on Europe’s expectations. This could include advising on the use of the few dedicated security benchmarks for AI-generated code, such as CyberSecEval and CodeLMSec. More generally, it should advise on security-by-design and -default practices at each stage of the secure software development lifecycle, emphasising specific organisational checks for AI-generated code, such as the automated review step sketched below.
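As one illustration of such an organisational check – a sketch only, using the open-source Bandit static analyser and hypothetical file paths, not anything the CRA or its guidance prescribes – AI-authored changes could be gated behind an automated scan before merge:

```python
import json
import subprocess
import sys

def scan_changed_files(paths: list[str]) -> int:
    """Run the Bandit static analyser over files touched in a change and
    fail the check if any high-severity findings are reported."""
    result = subprocess.run(
        ["bandit", "-f", "json", *paths],
        capture_output=True,
        text=True,
    )
    report = json.loads(result.stdout or "{}")
    high = [
        issue for issue in report.get("results", [])
        if issue.get("issue_severity") == "HIGH"
    ]
    for issue in high:
        print(f"{issue['filename']}:{issue['line_number']}: {issue['issue_text']}")
    return 1 if high else 0

if __name__ == "__main__":
    # Hypothetical usage in CI: pass the files an AI assistant touched,
    # e.g. `python scan_ai_changes.py src/payments.py src/auth.py`.
    sys.exit(scan_changed_files(sys.argv[1:]))
```

Static analysis catches only a subset of weakness classes, so a check of this kind would complement, not replace, human review of AI-generated contributions.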
Comprehensively securing the use of AI-generated code in product development would, however, prove a herculean task. Open-source software libraries, for example, are central resources in modern software development; it would be infeasible to regulate the individual use of these repositories or to constrain LLM platforms’ ability to generate code. The CRA’s prevention-first approach, backed by significant fines for non-compliance, provides a regime that best attends to these risks. Future policy reforms will have to account for any clear failings of AI-generated code, including vulnerability exploitations and their impacts.
Ghostbusting
To entrust AI with securing software via autopoiesis – and thus defer responsibility onto it – is folly. The digital world is fragile and acutely susceptible to vulnerability exploitations that could incur vast systemic risks. The widespread use of AI within software development augurs a worrying trend in the tech industry: without diligent controls, these vulnerabilities could be compounded exponentially. To short-circuit this, Europe, through the CRA, is casting a wide net to increase development security. With an added focus on the use of AI-generated code in software development, Europe can once more move the needle on digital security regulation, not just within its jurisdiction but globally.
This is the winning essay of the AI-Cybersecurity Essay Prize Competition 2024-2025, organised in partnership between Binding Hook and the Munich Security Conference (MSC), and sponsored by Google.