The Uncertainty Maschine

Claude Shannon’s Information Theory has always had its limits. Decades later, on the cusp of generative AI, his warnings ring true
Image created with the assistance of Midjourney

Advances in artificial intelligence (AI) are drawing attention everywhere from capitals to classrooms, boardrooms to bio-labs. Most prominently, large language models (LLMs) like ChatGPT have become ubiquitous as employers, governments, and individual users probe their potential to address a variety of problems more efficiently. Critics, meanwhile, have begun raising flags about both the drawbacks and limitations of LLMs, as part of a broader conversation about AI regulation. Among their concerns: LLMs being used to generate and spread incorrect, uninformed, or even malicious information.

Neither this technology nor these concerns are necessarily new, however. Beginning in the late 1930s, Claude Shannon developed what became Information Theory, the first framework to treat a message as a substance in its own right, deconstructing it into its smallest constituent parts for the purpose of relaying it electronically. It revolutionised communication technologies, helping humans harness them by tapping into the statistical probabilities of human language. He worried, however, about efforts to extend his insights beyond their original conceptual boundaries. Decades later, LLMs are doing just that.

A stochastic theory

Shannon began from a relatively simple but profound premise. When humans communicate, rules usually dictate the next letter, word, or pineapple* they might use. (*Did you notice that? We follow these rules, even subconsciously.) A seemingly random string of symbols obeying the laws of statistical probability could thus be used to transmit messages over wires and airwaves without losing fidelity. With its 26-letter alphabet and roughly 170,000 words, the English language offered the raw material for such a mathematical framework. This interplay between the apparent randomness of language and its underlying statistical regularity is what makes it a ‘stochastic’ process, one Shannon described as neither fully unpredictable nor fully determined.
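For the technically curious, the premise can be sketched in a few lines of Python: count which word tends to follow which in a scrap of text, then ‘predict’ the next word from those counts. This is a toy illustration of our own (the corpus and function names are invented for the example), not Shannon’s notation:

```python
from collections import Counter, defaultdict

# A toy corpus standing in for the statistical "rules" of English.
corpus = ("the cat sat on the mat and the cat slept on the mat "
          "while the cat watched the dog").split()

# Count how often each word follows each other word (a simple bigram model).
following = defaultdict(Counter)
for current_word, next_word in zip(corpus, corpus[1:]):
    following[current_word][next_word] += 1

def predict_next(word):
    """Guess the statistically likeliest next word, auto-complete style."""
    candidates = following[word]
    return candidates.most_common(1)[0][0] if candidates else None

print(predict_next("the"))  # 'cat' -- the most frequent follower of 'the' in this toy corpus
print(predict_next("sat"))  # 'on'
```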

Shannon developed this theory over the course of the 1930s and 1940s. His wartime breakthroughs in coding and cryptography—alongside the codebreaking work of Alan Turing and others at Bletchley Park—were instrumental in ensuring US and allied forces could communicate securely throughout World War II. Turning written language into a series of binary yes/no questions (represented by the digits 1 and 0) helped advance the discipline of encryption, while these binary digits—or ‘bits’—would serve as the foundation of modern digital communication.

The more widely popularised Shannon’s Information Theory became, the more his contemporaries attempted to apply it to human perception and meaning-making. In more modern terms, Shannon might see this as something like applying the advent of MP3s and music streaming to explain the composition of a symphony—related and intriguing, but ultimately misguided. For instance, W. Ross Ashby sought to replicate the mechanics of the human brain; Donald MacKay hoped to quantify interpretation and subjectivity among the senders, recipients, content, and context of a given message. 

Shannon resisted these efforts, as he considered his breakthrough to be “a purely technical tool with no connection to the ideology-laden biological and social sciences.” In Shannon’s telling, “information” is measured by the amount of uncertainty that can be eliminated in the process of transmitting a message—period. In this regard, applying the theory to fields beyond the burgeoning world of telecommunications seemed outlandish, likely to generate more questions than it could possibly resolve. While he acknowledged that his theory would continue spurring valuable insights in the field of communications, “it is certainly no panacea for the communication engineer, [much less] anyone else.”
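Concretely, Shannon’s measure is the ‘entropy’ of a message source: the average number of binary yes/no questions (bits) needed to pin down the next symbol. The short Python sketch below is our own illustration of that idea, not Shannon’s worked example; the sample string and function name are invented for the purpose:

```python
import math
from collections import Counter

def entropy_bits(text):
    """Shannon entropy of a text's character distribution, in bits per symbol."""
    counts = Counter(text)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

# A 26-letter alphabet used uniformly at random would carry log2(26), about 4.7 bits
# per letter; real text carries less, because its regularities make it partly predictable.
print(round(math.log2(26), 2))                                # 4.7
print(round(entropy_bits("it is certainly no panacea"), 2))   # noticeably lower
```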

A stochastic intelligence

Decades later, in 2021, the computational linguist Emily Bender and her colleagues would echo Shannon’s critique. Their work became popular among scholars and commentators aiming to demystify LLMs, which overlay and drastically expand Shannon’s model of linguistic probability across massive swathes of data, turning what is essentially a super-charged version of the ‘auto-complete’ typing tool against problem sets like war and human behaviour.

However, like Shannon, these sceptics note that unlimited contingencies and infinitely large datasets in such disparate fields might actually inhibit our understanding, offering statistical inference as the natural solution to any problem. After all, not every system behaves according to fixed laws like mathematics or physics. For instance, the intuition and wisdom involved in driving a car, picking up on facial cues, or firing a missile all involve complex and context-specific cognitive processes that remain poorly understood—much less quantifiable—by experts. To assume otherwise, ceding control to automated processes to determine the next pineapple* (a word now statistically prevalent but substantively nonsensical in the context of this article), invites risk.

As Pepperdine University’s Jason Blakely writes, an overextension of statistical inference:

“…treats all of reality like a mechanics, like physics, like a machine. In this view, no part of reality is above being manipulated by a science of laws and turned into an instrument…but this is a bad story. It is a bad story because it cannot accomplish what it promises (prediction). It is a bad story because it orients us toward manipulating the people around us as if they were objects (technocracy). And it is a bad story because it makes us particularly inarticulate and ill-equipped to deal with the world and the moral and political dimensions of our actions.”

Meanwhile, LLMs like ChatGPT and others are frequently used to generate synthetic content—from text to graphics to audio—based on these same statistical principles, drawing upon whole swathes of the modern Internet. This content is sometimes convincing enough to be mistaken for human-made. As a result, AI-generated propaganda now appears poised to flood the information environment, misleading audiences and eroding their confidence in the authenticity of what they read, see, or hear.

A foreshadowing

In this regard, LLMs are the realisation of a super-weapon conceived at the height of WWII. While researchers like Shannon were developing advanced code-breaking and code-making computers, Nazi spies were pilfering Russian designs for a similar device with a far more devious purpose: the infinite generation of propaganda. A prototype of the so-called “Müllabfuhrwortmaschine” was developed that randomly stitched together words and phrases associated with subversion to automate the mass production of propaganda. The ‘maschine’ was known only to Hitler and a small circle of advisors and engineers until one of them—a German scholar named Hagen Krankheit—revealed its existence to US officials after the war. 

This obscure chapter of AI history is not well known, and for good reason: it never really happened. Both Krankheit and the Müllabfuhrwortmaschine are figments of a fictional yarn spun by Shannon and his colleague John Pierce (who later published the tale as a playful spoof). What the story does indicate, however, is that despite Shannon’s dogged adherence to the narrowest interpretation of his theory—that statistics could be used to simplify the sending of messages—he was keenly aware of its generative implications: that statistics could also be used to automate their authoring.

Over seventy years later, debates now rage among big tech oligopolies over (as yet largely speculative) visions of life with generative AI. Ordinary users are left to separate hype from reality, while governments are left to grapple with regulatory frameworks. One cannot help but sympathise with Shannon’s concerns.

Statistically meaningful?

There is some irony to the fact that the theory that helped set the world of digital computing in motion can now be cited to critique its most recent innovations. But there is ample reason to temper our expectations—and threat perceptions alike—of generative AI. Newton’s theories of physics didn’t render clouds into clocks, nor did Galileo’s fundamental breakthroughs place Earth at the centre of the universe. 

In a world that seems hellbent on adopting AI as a solution to any given problem, it is worth returning to the foundational question of the degree to which human intelligence—or persuasion, for that matter—is truly a matter of statistical inference. Do LLMs do more to eliminate uncertainty or to generate it? As Shannon vehemently asserted, might his theories (and the innovations they spawned) only remain “meaningful within their own proper bounds”?