Exploiting documents is stock-in-trade for spies and their masters. It is an ancient practice, long predating the democratised activism of the Epstein files, Signalgate, the Steele Dossier, and Hillary Clinton’s emails (and before them, WikiLeaks, Snowden, the Panama Papers, and the Pentagon Papers). A simple written message, sometimes discovered by chance, surreptitiously communicated, or strategically emplaced, could forewarn of enemy intentions, as Herodotus recorded in The Histories; bring down a monarch, as it did in 1587 with Mary, Queen of Scots; or short-circuit a troop deployment, as did the discovery of General Robert E. Lee’s Special Order 191 in 1862 prior to the Battle of Antietam.
'Document exploitation', however, is a relatively new intelligence practice. Since the Second World War, the US, UK, and wartime Allies have built vast systems for collecting, indexing, translating, analysing and storing captured enemy documents and foreign publications. They were the direct inheritors of industrial-age sorting, counting and production technologies, and the pursuit of military advantage helped them propel breakthrough innovations in computing and communications. Cryptography and atomic science were catalytic, ultimately helped bring about Allied victory, and made significant contributions in the formative early days of the Information Age.
Post-war geopolitical, scientific and technical competition between East and West marked an important transition. Through means fair or foul, the players on each side sought to acquire intelligence that could tip the balance. It could begin as a whispered voice, wireless transmission or radio broadcast. It might consist of one errant slip of paper, or the complete holdings of a state archive, cabinets filled with party policies, military orders, or transcribed signals intercepts, or sheaves of lab reports, equipment blueprints, and industrial manufacturing records. Intelligence needles have always been difficult to find, in other words, but now industrial-scale haystacks of information complicated the picture.
Anglo-American Intelligence
Allied document exploitation in the 1940s was a sprawling, multi-theatre set of activities in Europe and the Pacific. It was, ostensibly, a dispersed support function for processing intelligence acquired in combat, prisoner of war interrogations, signals intercepts, broadcast monitoring, coopted expertise, and large-scale acquisition of serials and other publications. A major feature of the process was (and remains) technical conversion of materials into standardised, ready-to-use formats, through decryption, transcription, translation and other means. Each of the wartime Allies had national systems and battlefield approaches for this. Among them, the Anglo-American intelligence partnership during and after the Second World War was uniquely close-knit and far-reaching in its implications.
The British and the Americans each had previous experience - more on that below - working with enemy documents and materials. American officers received specialised intelligence training at British facilities, and in the early days when Britain and Nazi Germany were at war and America was still officially neutral, the British began building intelligence channels and joint structures to ensure close cooperation with Washington. British capabilities, importantly, included the geographic reach of imperial colonies and domains, and the additional expertise Canada, Australia, and New Zealand brought to the table through Commonwealth channels. Wartime synergy laid the foundation for the 1946 UKUSA intelligence pact, its expansion into the Five Eyes alliance, and standardised practices adopted within NATO and beyond that endure to this day.
This legacy of intelligence cooperation is evident in how modern allies handle captured information, but it is the US experience that stands out. The Americans have pursued and developed their intelligence exploitation capabilities more aggressively, consistently and publicly than have any of their allies or adversaries. American document exploitation, or 'DOCEX' - pronounced 'dock-ex' in intelligence community vernacular - has evolved over time, adapting to new priorities, investigative techniques, and information and communications technologies. It has matured since the Second World War into a cornerstone intelligence discipline in the US. It now has a permanent doctrinal and institutional architecture, and it operates as a centrally coordinated, technically specialised, and industrial-scale enterprise. Importantly, its trajectory corresponds closely with an Information Age context that heavily conditioned intelligence developments, influences and technologies.
Defining Document Exploitation
Other than a handful of professional publications, trade journal articles and websites, discussion of document exploitation is limited to allusions and references to it in historicised intelligence cases, features, elements, capabilities, and applications. Traces of DOCEX influence are discernible in adjacent histories of open source intelligence, digital evidence, international fact-finding, war crimes prosecutions, and disinformation. DOCEX operations have also channelled an extensive corpus of primary sources into the public domain, a boon for researchers in multiple fields and a source of recurring controversies, debates, and ethical dilemmas.
Exploitation is a peculiar frame. Intelligence work leverages opportunities to access secret information and draw meaning from it. The methods needed to achieve this may or may not involve exploitation in the morally grey sense of manipulation or deceit. However, gaining access to and making sense of intelligence has always involved a degree of exploitation in the technical sense of the term: using supplemental methods and tools to extract encoded or hidden information from intelligence finds that have already been accessed. To put it another way, this kind of intelligence 'exploitation' - or 'exploitation intelligence' - refers to a more specific, secondary process of squeezing information of value from materials already under control. This includes documents, weapons systems, digital devices, personal effects, and physical or administrative records, whether they were gathered on the battlefield, recovered in a raid, or secured through legal process.
As far as I have been able to determine, no comprehensive history of document exploitation, as one form of exploitation intelligence, has ever been written. The problem is not merely academic. It represents a real-world issue for applied historians, political scientists, and legal scholars to unpack. My own interest in the subject, for example, is driven in part by field experience and direct observation. I worked with DOCEX units and personnel on operations, and as a forward-deployed intelligence analyst I was a direct beneficiary of DOCEX-processed primary sources. Later, as a scholar and civil servant, I observed specific inheritances and confused echoes of American DOCEX in evidence-centric discourses, practices and priorities while on assignments in Afghanistan, Iraq, Nigeria and Ukraine.
Researching Conflict Records
I founded the Conflict Records Unit as a vehicle for research and writing on the subject. Its name was directly inspired by the Conflict Records Research Center, a US Department of Defense-funded initiative at the National Defense University that had been the public face of DOCEX between 2010 and 2015. I surveyed readings on captured enemy records, battlefield evidence, and conflict archives. I produced a general bibliography and hosted a series of events with historians, archivists, and investigators from the UK, Belgium, Canada, France, the Netherlands, Poland, Ukraine and the US. It became clear from all this that document exploitation has a unique history. It has not yet been written, it is a story worth telling, and it can help illuminate historical puzzles and contemporary challenges alike. The mosaic of relevant readings points clearly to this gap, and the primary sources available to help fill it are extensive. Some of my initial findings, while still tentative, are summarised here.
Sources refer to the Second World War or the Vietnam War as historical index cases, but there are instances of explicitly self-described document exploitation dating much further back, to the middle of the 19th century. Document exploitation was a feature of the American Civil War, the Philippine-American War (1899), the First World War (1914-1918), the Irish Revolution and Civil War period (1912-1922), and Mandatory Palestine (1936-1939) - not to mention domestic American and British technical exploitation of diplomatic communications, cable intercepts and telegrams (1917-1929). The Second World War was indeed a historical turning point for early exploitation intelligence as it emerged to support signals intelligence, human intelligence, and counterintelligence. Its survivals became foundational architectures in dozens of subsequent cases, during the Cold War (1945-1991), the post-Cold War decade (1991-2001), and in the quarter century that has elapsed since September 11th, 2001.
The UK and the US are not the only state entities to have recognised the intelligence value of bits of paper and built systems to take advantage of them. Within the ambit of Anglo-American interests during the Second World War, there were notable Canadian, Australian, and New Zealand contributions, from 30 Assault Unit 'pinch' operations to steal German Enigma codes and equipment, to the specialised work of the Allied Translator and Interpreter Section in the Pacific. There is evidence too of French, German, Japanese and Soviet exploitation activities. Follow the historical breadcrumbs, and the list of cases grows. Questions of context and common practice also arise. Based on the examples of imperial and colonial administration, civil war, conventional war, and counterinsurgency alluded to in the preceding paragraph, it is entirely reasonable to expect to find more historical evidence explaining the emergence of bureaucratised document and media exploitation.
Tracing Illustrative Trajectories
These are just the minimalist cases, marked by state sanction, resource and capability allocations, operational histories, and legal and doctrinal frameworks. Extend this in maximalist terms and the list grows yet again. Consider the proliferation of independent civil society organisations whose purpose is to document and preserve an authenticated public record of war casualties, civilian deaths, human rights violations, and disinformation. At the time of writing, my research has yielded a list of 83 historical and contemporary cases. I expect the dataset to grow. The prominence, reach, and extended influence of the American experience, however, make it the intrinsic case for this work - point zero for deeper study of what came before and after.
The Americans thoroughly documented their approach, reflected and improved on their operations, and socialised them well beyond their own shores. They did this for well over a century, a trajectory that offers the potential for wider understanding of change and continuity in intelligence history. The US borrowed, deployed and codified document exploitation in fits and starts, as expedient responses to fleeting opportunities and the exigencies of war. The results have accumulated into a significant corpus of official directives, iterated doctrinal publications, training circulars, field manuals, unit archives and databases, and intelligence output. There is a lineage that is well documented. It must be meticulously traced for its causal effects to be fully understood.
Document exploitation was once an intelligence support function. In a sense, it still is, but it has been transformed into a central and permanent feature of the American and Allied intelligence enterprise, and in the US it underwrites an impressive set of interagency capabilities and resources. Managed by the Defense Intelligence Agency and rebranded 'DOMEX' (for document and media exploitation) in 2007, its senior-most element is the National Media Exploitation Center, which reports directly to the Office of the Director of National Intelligence. It is mandated, resourced and structured to maximise the intelligence utility of information through an interagency DOMEX Committee that includes the Department of Justice, Department of Homeland Security, National Security Agency (NSA), Federal Bureau of Investigation (FBI), Drug Enforcement Administration (DEA), Central Intelligence Agency (CIA), and others. Its trajectory is illustrative, not least because of historical and contemporary entanglements with information technologies that enable it. This now includes artificial intelligence, which at this stage in its development - Silicon Valley claims notwithstanding - is a fundamentally experimental and unreliable technology. There is, therefore, much to recommend research into the potential consequences of handing it delegated or automated control of sweeping exploitation infrastructure.
Mike Innes is Director of the Conflict Records Unit and a Senior Visiting Research Fellow in the Department of War Studies. His current book project, The Cipher of Polybius, is a history of Anglo-American intelligence, American DOCEX, and information technologies. He is the author, most recently, of Streets Without Joy: A Political History of Sanctuary and War, 1959-2009 (Hurst, 2021) and numerous shorter items in scholarly, trade and popular outlets. Early in his career he did a bit of soldiering before serving as a NATO official in the Balkans and Afghanistan and as a UN official in Iraq. An elected fellow of the Royal Geographical Society and of the Royal Historical Society, he holds BA and MA degrees in history, a PhD in Politics from the University of London’s School of Oriental and African Studies, and a Graduate Diploma in Law from Birkbeck.