The journey of a file - part 3

Date: 2026/02/10
Reading Time: 18 min read
Tags: privacy, legal, security

This article is part of a series: The journey of a file - The risk of US access.

In the previous parts of our series, we mapped the physical internet backbone, the hardware and operating systems, and the networks and identity access layers. In this third installment, we follow our hypothetical file into the software layer itself: Transit Point 4, where the file is created and stored, and Transit Point 5, where encryption certificates ostensibly secure its journey across the web.

Here, the abstract concepts of digital sovereignty collide with the everyday tools we all use: word processors, cloud syncing folders, and the familiar “padlock” icon in our browsers.

Transit point 4: File creation & (cloud) storage

Before a file ever travels across a network, it is born on a device. For our Dutch freelancer, this likely involves using software such as Microsoft Word, Apple Keynote, or an Adobe application. A user opens a software tool, creates a document, and saves it. That act of creation is far from neutral. The choice of application determines what telemetry is generated, what metadata is embedded in the file, and ultimately who has access to its content.

The software layer

The dominant productivity suites used worldwide — Microsoft 365 (Word, Excel, PowerPoint), Google Workspace, and Adobe Acrobat — are all products of American companies subject to US law. This means that from the moment you begin typing, the software vendor may be generating usage telemetry and file metadata tied to your identity.

Microsoft 365 in particular has been scrutinized by European data protection authorities. In 2024, the European Data Protection Supervisor (EDPS) found that the European Commission’s own use of Microsoft 365 violated EU data protection rules, specifically because diagnostic data and other telemetry were being transferred to Microsoft and, via Microsoft, to third parties in the United States. While compliance steps were taken in 2025, the underlying jurisdictional exposure for organizations using these tools remains a structural concern rather than a fully resolved issue.

Apple’s Pages and Keynote, often seen as consumer alternatives, store your files in iCloud by default. iCloud is operated by Apple Inc., a US company. Under the CLOUD Act, US authorities can compel Apple to hand over data stored in iCloud regardless of where the data center sits.

Open-source tools like LibreOffice (developed by The Document Foundation, based in Berlin) provide a meaningful alternative. LibreOffice does not phone home, generates no usage telemetry by default, and is developed under European governance. For organizations seeking to reduce dependency at the application layer, it is a well-maintained and professionally viable option.

The real-time AI trap

Before we even reach the question of where a file is stored, a newer risk has emerged at the moment of creation itself. The increasingly aggressive integration of AI assistants — Microsoft Copilot, Google Gemini, and similar tools — into word processors and other software means that your document drafts are increasingly processed in the cloud in real time, as you type. The file does not need to be saved or uploaded for this exposure to occur: there is an inescapable technical reality at play. For an AI tool to generate suggestions or summaries, it must first tokenize your input, which requires access to the plaintext version of your document before any encryption can apply. Big Tech vendors market EU data residency commitments as a reassurance, but when Microsoft France was asked directly during a June 2025 parliamentary hearing whether it could guarantee protection against US government requests, the answer was no. Data residency and legal sovereignty are not the same thing. We return to the AI layer in more depth at Transit Point 8.

The storage layer: your device and beyond

Once a file is saved locally, the question becomes: where does it actually live? In practice, most operating systems today are configured to synchronize local files to cloud storage by default. Windows 11 prompts users repeatedly to enable OneDrive sync. macOS defaults to storing Desktop and Documents folders in iCloud Drive. Both are American services subject to US legal process.

The risks at this layer are twofold. First, the access risk: under the CLOUD Act or a FISA Section 702 order, US authorities can request file contents, version history, and associated metadata from these providers without requiring a European court order. Second, the shutdown risk: OFAC sanctions or executive orders can force US companies to cut off access for specific users, organizations, or geographies with very little warning.

The dependency runs deeper than it appears. Even explicitly European cloud services often rely on underlying American infrastructure. WeTransfer, for example, is a Dutch company, but uses Amazon Web Services (AWS) for its storage layer. Dropbox runs on AWS. Box has historically used AWS and Azure. The moment your file is stored by a service running on AWS, Google Cloud, or Azure, it falls within the jurisdictional reach of US law, regardless of the European address of the service provider.

This is the essence of the beneficial owner principle described in our introductory article: US jurisdiction follows the owner, not the server. A file stored in a Frankfurt data center owned by AWS is still legally accessible under the CLOUD Act.

Metadata goldmine

Even before a file leaves a device, it carries a shadow document: its metadata. This includes the author’s name, organization, creation date, revision history, last-edited timestamp, and in many cases the IP address of the device on which it was created. What makes this particularly significant is the legal asymmetry around it: the threshold for US authorities to subpoena metadata is historically lower than for actual file content (a subpoena is a legal order that compels the recipient to hand over specific information and can be issued without a judge’s finding of probable cause). Even if your documents were encrypted end-to-end, the metadata envelope around them remains accessible through standard legal process. Metadata alone can be sufficient to map a corporate hierarchy, trace a whistleblower, or identify the participants in a sensitive negotiation, without reading a single word of the actual document. The NSA’s former director Michael Hayden was explicit about its value: “we kill people based on metadata.”
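
To make the shadow document concrete, the following sketch reads the core metadata a .docx file carries. A .docx is a ZIP archive whose docProps/core.xml part records the author, revision count, and timestamps. The author name and dates below are invented for illustration, and the archive is built in memory rather than by a word processor.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# A .docx file is a ZIP archive; its docProps/core.xml part carries
# Dublin Core metadata such as author, timestamps, and revision count.
# The values here are invented for demonstration purposes.
CORE_XML = """<?xml version="1.0" encoding="UTF-8"?>
<cp:coreProperties
    xmlns:cp="http://schemas.openxmlformats.org/package/2006/metadata/core-properties"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:dcterms="http://purl.org/dc/terms/">
  <dc:creator>J. Jansen</dc:creator>
  <cp:lastModifiedBy>J. Jansen</cp:lastModifiedBy>
  <cp:revision>17</cp:revision>
  <dcterms:created>2026-01-05T09:12:00Z</dcterms:created>
  <dcterms:modified>2026-02-01T16:40:00Z</dcterms:modified>
</cp:coreProperties>"""

def read_core_properties(docx_bytes: bytes) -> dict:
    """Extract the metadata 'shadow document' from a .docx held in memory."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as zf:
        root = ET.fromstring(zf.read("docProps/core.xml"))
    # Strip the XML namespace prefix from each tag for readability.
    return {el.tag.split("}", 1)[1]: el.text for el in root}

# Build a minimal docx-like archive purely for demonstration.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("docProps/core.xml", CORE_XML)

print(read_core_properties(buf.getvalue()))
```

Real documents carry considerably more than this core part: application metadata, template references, and tracked-changes history all travel with the file.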

A common response to concerns about cloud storage is: “but my files are encrypted.” This reassurance is increasingly complicated by the practice of client-side scanning (CSS). Rather than scanning files after they are uploaded, Big Tech companies scan files against a database of known hashes on the local device, before encryption takes place. The practical effect is that even a genuine end-to-end encryption promise can be undermined at the device level: a surveillance layer is inserted inside the user’s own device, and the encrypted upload becomes a sealed envelope containing a document that has already been read. This debate is not confined to US tech companies. As we discussed in an earlier post in this series, Europe’s proposed Chat Control regulation (CSAR) has at various stages proposed mandatory client-side scanning across EU platforms. The underlying mechanism is identical regardless of which government mandates it: a camera is placed inside the device before encryption can do its work.
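
The mechanism is simple enough to sketch in a few lines. Assuming a hypothetical watchlist of plain SHA-256 hashes (real deployments use perceptual hashes such as PhotoDNA or NeuralHash, which also match near-duplicates), a client-side scanner checks the plaintext on the device before any encryption runs:

```python
import hashlib

# Hypothetical watchlist of hashes of known prohibited files, as a CSS
# system would ship to the device. Real systems use perceptual hashes,
# not plain SHA-256; this is a structural illustration only.
KNOWN_HASHES = {
    # sha256(b"foo"), standing in for a real watchlist entry
    "2c26b46b68ffc68ff99b453c1d30413413422d706483bfa0f98a5e886266e7ae",
}

def scan_before_encrypt(plaintext: bytes) -> bool:
    """Return True if the file matches the watchlist.

    The crucial point: this runs on the *device*, against the plaintext,
    before any end-to-end encryption is applied to the upload.
    """
    digest = hashlib.sha256(plaintext).hexdigest()
    return digest in KNOWN_HASHES

# The sealed envelope has already been read: a match fires even though
# only ciphertext would leave the device afterwards.
print(scan_before_encrypt(b"foo"))          # watchlist match
print(scan_before_encrypt(b"holiday.jpg"))  # no match
```

Whatever happens after this check, encryption included, the scan itself has already seen the plaintext, which is exactly the architectural concern the paragraph above describes.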

Device encryption: a partial shield

One mitigation available to individuals is device encryption. Both Windows (BitLocker) and macOS (FileVault) offer full-disk encryption that protects data when the device is powered off or locked with no active session. However, this protection is largely irrelevant to jurisdictional risk. When your device is running and you are logged in, the files are decrypted and accessible. More importantly, if cloud sync is active, the sync client reads your files in their decrypted form and uploads them, making local disk encryption irrelevant to the cloud access risk.

European alternatives

For file creation, LibreOffice and the ONLYOFFICE suite (developed by Ascensio System SIA, a Latvian company) provide capable alternatives to Microsoft and Google. CryptPad, developed by XWiki SAS in France, offers collaborative document editing with end-to-end encryption, where even the service provider cannot read your content.

For cloud storage, Nextcloud (a German open-source project) can be self-hosted or used through European-hosted providers, giving organizations full control over their data. Databeamer (a Dutch company) offers zero-knowledge encrypted transfers and storage. Proton Drive (Switzerland) provides zero-knowledge storage under Swiss privacy law. These providers are not subject to the US CLOUD Act and have no parent company that would bring them under US jurisdiction.

For organizations considering “sovereign cloud” offerings from large vendors, the earlier-mentioned risk of sovereignty washing applies here too. A US-owned company offering a “European cloud” may still be subject to US legal orders regardless of where the servers sit.

Examples transit point 4

Example 1: Section 702 information requests to Big Tech

The scale of US government data requests to technology companies is substantial and well-documented. According to reporting based on US government transparency data, the US government requests more user data from major technology firms than any other government in the world. The legal backbone for many of these requests involving foreign persons is Section 702 of the Foreign Intelligence Surveillance Act (FISA), which allows US intelligence agencies to compel American technology companies to hand over communications and stored data related to non-US persons outside the United States, without requiring individual warrants.

Section 702 is not a narrow authority. As legal analysis of current US cybersecurity and data privacy priorities makes clear, the current administration has shown no appetite to restrict this authority, and recent legislative changes have if anything expanded the categories of providers who may be covered by it. For a European citizen or business storing files in OneDrive, Google Drive, or iCloud, Section 702 is not an abstract risk. It is a live legal mechanism under which the files you create and store today can be requested by US intelligence services.

What makes this particularly significant at the file creation and storage layer is the combination of two factors: the dominance of US productivity software (which generates detailed usage telemetry alongside the file itself) and the default cloud sync behavior of those same platforms. The result is that not only your file, but your editing history, collaboration patterns, and access logs may all be reachable under a single legal instrument.

Example 2: The “Medical Dad” Google case (2022)

Perhaps no case illustrates the hidden risks of cloud storage more vividly than what happened to Mark, a father in San Francisco, as reported by The Guardian in 2022. At his pediatrician’s suggestion, Mark photographed his toddler son’s inflamed groin area to send to the doctor for remote assessment. He used his Android phone, and the photos were automatically backed up to Google Photos, as is the default behavior on Android devices.

Within days, Google’s automated scanning systems flagged the images. Google not only permanently deleted his account, removing years of emails, contacts, and files, it also reported him to the San Francisco Police Department. Although the police investigated and cleared him entirely, finding no wrongdoing, Google refused to restore his account. The automated system had made a determination, and the human review process did not result in reinstatement.

This case is instructive in several ways. It demonstrates that cloud providers do not merely store your files passively. They actively scan content using automated systems for policy violations, and those systems make errors with real consequences. It also demonstrates the shutdown risk in practice: a user can lose access to years of stored documents, emails, and personal data based on an automated flag, with no effective right of appeal. For a business, the equivalent would be losing access to all cloud-stored documents and communications overnight. Critically, this happened without any government order; it was a unilateral decision by a private US company. The government involvement came after, as a direct consequence of that decision.

Transit point 5: Encryption certificates

When you open a browser and see a padlock icon next to a web address, you are looking at the result of a system called Transport Layer Security (TLS), implemented through digital certificates. This system is the foundation of secure communication on the web. What is rarely visible to the user is how much of this foundational trust infrastructure is controlled by American companies and US-influenced governance.

How certificates work

A TLS certificate does two things: it encrypts the connection between your browser and the website, and it asserts the identity of the website. For a browser to trust a certificate, it must be signed by a Certificate Authority (CA) that the browser already trusts. The list of trusted CAs is maintained in a “root store,” managed by browser makers and operating system vendors.

The dominant root stores are controlled by Apple, Microsoft, Mozilla, and Google — all US companies. The Certificate Authorities whose root certificates appear in these stores are predominantly American or operate under significant US influence: DigiCert, Sectigo, and Let’s Encrypt (operated by the Internet Security Research Group, a US nonprofit) together account for the large majority of certificates issued on the web.
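
You can inspect this dependency directly: the root store your software trusts is simply a list of certificates shipped with the platform. A minimal sketch using Python's standard library, which reads whatever default store the local system exposes (the exact contents, and so the output, vary by machine):

```python
import ssl

def trusted_roots():
    """List the CA certificates the local platform's root store trusts."""
    # create_default_context() loads the system's default CA certificates.
    ctx = ssl.create_default_context()
    return ctx.get_ca_certs()

roots = trusted_roots()
print(f"{len(roots)} trusted root certificates")
for cert in roots[:3]:
    # 'subject' is a tuple of relative distinguished names, e.g.
    # ((('organizationName', 'DigiCert Inc'),), ...)
    subject = dict(rdn[0] for rdn in cert["subject"])
    print(subject.get("organizationName"))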

The access risk

The interplay between certificates and surveillance is more subtle than at other transit points. A Certificate Authority can, in principle, be compelled to issue a fraudulent certificate for a domain it does not legitimately control. With such a certificate, a government agency could conduct a “Man-in-the-Middle” attack: intercepting and decrypting HTTPS traffic between a user and a website in real time, while both parties believe the connection is secure. This is not theoretical. The DigiNotar hack in 2011, discussed below, demonstrated exactly how this trust can be exploited in practice, and the mechanisms that make it possible have not structurally changed since then.

The shutdown risk

Perhaps more immediately relevant is the shutdown risk. If the US government were to pressure a Certificate Authority to revoke a certificate for a specific domain, that website would appear as “unsafe” or “untrusted” in all major browsers. For most users, this is effectively equivalent to the site going offline.

A Certificate Authority only has power if browsers trust it. The root stores that determine which CAs are trusted are maintained by the makers of the dominant browsers and operating systems — Google Chrome, Apple Safari, and Microsoft Edge — all American companies. This gives US tech firms a structural role as the ultimate border guards of the internet. If the US government were to pressure Google or Apple to remove a specific foreign CA from their root store, that CA’s certificates would instantly become untrusted for the vast majority of the world’s web users. The power to shut down secure communication does not lie only with the certificate issuers; it lies equally with the American software that reads them.

This is not a hypothetical scenario. Mozilla and Microsoft took exactly this kind of action against TrustCor in 2022, removing the company from their root stores after a Washington Post investigation revealed connections between TrustCor and government contractors with ties to spyware development. The action was taken by the browser makers themselves rather than by a government order, but it illustrates how quickly and decisively the certificate trust chain can be severed at the root level, and how little recourse an affected party has.

We return to browser influence in Transit Point 7 (The browser).

Certificate Transparency logs: security tool and intelligence feed

To prevent CAs from secretly issuing fraudulent certificates, the industry uses Certificate Transparency (CT) logs: public, searchable databases of every certificate ever issued. While this is a genuine security improvement, CT logs also function as a continuous intelligence feed. When a European company sets up an internal tool — say, project-apollo.europeancompany.eu — and obtains a TLS certificate for it, that subdomain is permanently published in a public, often US-hosted database. Competitors and foreign intelligence services can monitor these logs in real time to map a company’s hidden internal infrastructure and anticipate unannounced projects, acquisitions, or partnerships. The price of certificate transparency is corporate transparency.
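
The monitoring side of this requires no special access. The public crt.sh aggregator, for example, exposes CT results as JSON over a simple query URL. The sketch below extracts hostnames from a response in that shape; the network call is shown but not exercised, and the sample data reuses the hypothetical domain from the example above:

```python
import json
import urllib.request

# crt.sh query URL; %25 is a URL-encoded '%' wildcard, matching all
# subdomains of the queried domain.
CRT_SH = "https://crt.sh/?q=%25.{domain}&output=json"

def subdomains_from_ct(raw_json: str) -> set:
    """Extract the hostnames exposed by a CT log query result."""
    names = set()
    for entry in json.loads(raw_json):
        # name_value may hold several newline-separated names per certificate
        names.update(entry["name_value"].splitlines())
    return names

def query_ct_logs(domain: str) -> set:
    # Live network call against the public crt.sh CT aggregator.
    with urllib.request.urlopen(CRT_SH.format(domain=domain)) as resp:
        return subdomains_from_ct(resp.read().decode())

# Offline demonstration with a response in crt.sh's JSON shape,
# using the hypothetical internal hostnames discussed above:
sample = json.dumps([
    {"name_value": "project-apollo.europeancompany.eu"},
    {"name_value": "www.europeancompany.eu\nvpn.europeancompany.eu"},
])
print(sorted(subdomains_from_ct(sample)))
```

Anyone running a loop over such queries sees every new internal hostname the moment its certificate is logged, which is precisely what makes CT both a security tool and an intelligence feed.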

The BYOK illusion

Many European companies using US cloud providers believe they are protected because they use “Bring Your Own Key” (BYOK) encryption. This is a dangerous misconception. Under standard BYOK implementations on AWS, Azure, and Google Cloud, the cloud provider still temporarily loads your encryption keys into server memory to perform the encryption or decryption operation. Under the CLOUD Act, the US government can legally compel the provider to extract those keys while they are in active use, completely bypassing the European company’s encryption strategy. Holding the key is not the same as controlling it. True protection requires architectures where keys never enter infrastructure subject to US jurisdiction — a bar that standard BYOK, by design, does not meet. We return to encryption in Transit Point 6 (Transfer service infrastructure).
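
The key-lifecycle difference is easier to see in a toy model. The sketch below is illustrative only (the XOR "cipher" is a placeholder, not real cryptography), but it captures the structural point: under standard BYOK the customer key is observable in provider memory during each operation, while under a client-side architecture only ciphertext ever reaches the provider.

```python
from dataclasses import dataclass, field

def xor(data: bytes, key: bytes) -> bytes:
    # Placeholder "cipher" for illustration only; not real cryptography.
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

@dataclass
class Provider:
    # Everything the provider's infrastructure could observe (and thus
    # be compelled to hand over) during operations.
    observed_keys: list = field(default_factory=list)

    def byok_decrypt(self, customer_key: bytes, ciphertext: bytes) -> bytes:
        # Standard BYOK: the provider's KMS unwraps and *uses* the
        # customer key server-side, so it is present in provider memory.
        self.observed_keys.append(customer_key)
        return xor(ciphertext, customer_key)

key = b"customer-held-key"
ciphertext = xor(b"quarterly figures", key)

# BYOK: decryption happens inside the provider's infrastructure,
# so the key passes through memory the provider controls.
byok_provider = Provider()
byok_provider.byok_decrypt(key, ciphertext)
print(byok_provider.observed_keys != [])  # the provider saw the key

# Client-side architecture: decryption runs on the client; the provider
# only ever stores ciphertext and never observes key material.
clean_provider = Provider()
plaintext = xor(ciphertext, key)  # performed on the client
print(clean_provider.observed_keys)
```

The model is deliberately crude, but the asymmetry it shows is the one that matters legally: a compulsion order served on `byok_provider` can yield the key, while one served on `clean_provider` cannot.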

eIDAS and Europe’s attempt to reclaim the keys

The European Union has been aware of this structural dependency for some time. The revised eIDAS regulation (eIDAS 2.0) includes provisions for Qualified Web Authentication Certificates (QWACs), issued by Qualified Trust Service Providers (QTSPs) operating under EU supervision. The ambition is clear: Europe wants browser root stores to include EU-supervised Certificate Authorities, reducing dependence on US-controlled trust infrastructure.

However, the original draft of Article 45 of eIDAS 2.0 prompted an unusually unified response from the security research community and browser makers. The concern was that the regulation, as initially written, would have required browsers to trust QTSP-issued certificates regardless of the browser maker’s own security standards. Critics warned this could create a system where an EU member state government, acting through a QTSP, could issue certificates that browsers would be compelled to trust, potentially enabling state surveillance under a legal mandate.

After sustained pressure from browser vendors, security researchers, and civil society, a compromise was reached in December 2025 that preserved the ability of browser makers to apply their own security standards to QTSP certificates. This outcome restored the independent security gatekeeping role of browser vendors. However, the underlying tension remains: the EU’s desire for digital sovereignty in the certificate space creates its own centralization risks, and any mandated trust anchor becomes a potential point of failure or abuse. Europe is, justifiably, trying to reclaim the keys to its own digital borders — but doing so in a way that avoids simply replacing one concentration of power with another is genuinely difficult.

The broader lesson is that the certificate trust system is not merely a technical mechanism. It is a geopolitical instrument. Whether controlled by US companies or redirected toward EU-supervised providers, the entities who issue and validate certificates hold significant power over who can be seen as trustworthy online — and who can be effectively silenced.

European and open alternatives

At the CA level, HARICA (operated by the Greek Universities Network) and Actalis (Italian) are European Certificate Authorities whose roots are included in major browser stores. For organizations seeking to reduce dependence on US-dominated CAs, these provide viable options for obtaining TLS certificates.

For private or internal use, self-hosted certificate infrastructure using tools like step-ca (open source, developed by Smallstep) or EJBCA (originally developed by PrimeKey, a Swedish company, now part of Keyfactor) allows organizations to operate their own CA entirely within their own jurisdiction.

For encrypted communication that sits outside the TLS certificate layer altogether, end-to-end encrypted tools based on protocols like Signal or Matrix — used by the open-source Element client, developed by Element Matrix Services, a UK/French company — provide encryption whose keys are controlled by the communicating parties, not by any Certificate Authority or cloud provider.

Examples transit point 5

Example 1: TrustCor removal (2022)

In November 2022, Mozilla and Microsoft both removed TrustCor Systems from their root certificate stores, effectively ending the company’s ability to issue trusted certificates. The decision followed a Washington Post investigation that found substantial links between TrustCor and contractors who had previously worked with the US Drug Enforcement Administration (DEA) on surveillance tools. A researcher also found evidence of corporate connections to Measurement Systems, a company that had embedded data-collection code into mobile apps under the guise of an advertising SDK.

The TrustCor case is instructive because the decision to remove the CA was made by browser vendors acting on reputational and security grounds, not by a court or government order. This demonstrates that control over the root of web trust ultimately rests with a small number of US companies. Their decisions, however well-founded in this particular case, have immediate and irreversible consequences for any website that relied on TrustCor for its certificates. A service provider with thousands of customers could find its entire user base blocked from accessing it overnight, with no judicial process and no appeal.

Example 2: The DigiNotar hack (2011)

The DigiNotar incident remains one of the most consequential security failures in internet history and a foundational case study in why certificate trust matters. DigiNotar was a Dutch Certificate Authority. In the summer of 2011, it was compromised by an attacker who used the breach to issue hundreds of fraudulent certificates, including one for google.com.

That fraudulent certificate was used to intercept the encrypted web traffic of an estimated 300,000 Iranian internet users, almost certainly by the Iranian government conducting surveillance on its own citizens. The users believed their connections to Google were secure because the padlock icon was visible in their browsers. In reality, their traffic was being read in plain text by a third party.

When the breach became public in September 2011, browser vendors swiftly revoked trust in all DigiNotar certificates. The company went bankrupt within weeks. The Dutch government, which had used DigiNotar for its own certificates, was forced to migrate its entire PKI infrastructure under emergency conditions.

While the DigiNotar case was not an example of US government overreach, it is the definitive real-world demonstration of what certificate compromise means in practice: the padlock provides no protection if the CA that issued the certificate has been compromised or coerced. And as long as the root stores that determine which CAs are trusted are controlled by a handful of US companies, the geopolitical dimension of this trust system cannot be separated from its technical one.