Data Minimization and Avoiding the Over-Retention of Personal Information

The cheapest data to protect is the data you never kept

Imagine a company we'll call Acme Loyalty Corp., which runs a grocery-store rewards program. Back in 2009, an enthusiastic marketing team decided that every scrap of customer information might someday be useful, so the database captured everything: full names, home addresses, dates of birth, the last four digits of payment cards, driver's-license numbers collected during a long-abandoned check-cashing promotion, fifteen years of itemized purchase histories, and a folder of customer-service call recordings nobody had listened to since the Obama administration. The data sat there, quietly, costing a little money in storage and attracting no attention at all—until a contractor's compromised password let an intruder copy the whole archive in an afternoon.

When the dust settled, Acme's lawyers discovered the cruelest part of the story. The driver's-license numbers, the ones that turned a routine breach into a fifty-state notification nightmare with class-action exposure, came from a promotion that ended in 2012. Acme had no business reason to keep them, no legal obligation to keep them, and no policy that said when to delete them. They were pure liability, sitting on a server, waiting to ruin a quarter. Had Acme deleted that field years earlier—as the law arguably required and as common sense certainly suggested—the breach would have been smaller, the notifications fewer, the regulators calmer, and the lawsuits weaker.

That is the entire argument for data minimization and storage limitation in a single hypothetical. Data minimization is the principle that you should collect only the personal information you genuinely need for a specified, legitimate purpose. Storage limitation (sometimes loosely called the "retention" or "deletion" principle) is its twin: you should keep that information only as long as you genuinely need it, and then get rid of it. Together they reverse the instinct of the early digital age. For two decades, the working assumption was that data is an asset, and assets should be accumulated. The modern legal and security consensus is that personal data is also a liability—a barrel of something flammable in your basement—and that the prudent organization keeps the smallest barrel it can.

This guide explains where these duties come from in U.S. and global law, why over-retention is genuinely dangerous (not just theoretically untidy), and how to build a program that keeps you on the right side of the line. We will spend real time on the General Data Protection Regulation (GDPR) Article 5, the California Consumer Privacy Act (CCPA) as overhauled by the California Privacy Rights Act (CPRA), the Federal Trade Commission's increasingly muscular enforcement posture, and the sectoral rules—GLBA, HIPAA, the FCRA—that quietly govern huge swaths of American data. By the end you should be able to look at any field in any database and ask the two questions that matter: Why do we have this? And when does it go away? If a company can answer both for every category of personal data it holds, it has already done most of the hard work. For the broader framework that this principle lives inside, see our companion guide on developing a privacy compliance program.

Two principles, one instinct: collect less, keep it shorter

It helps to separate the two ideas precisely, because lawyers and regulators treat them as distinct obligations even though they spring from the same impulse.

Data minimization operates at the front door. It asks whether you should have collected the information at all. The test is one of necessity and proportionality measured against a stated purpose: collect what is adequate, relevant, and limited to what is necessary for the purpose, and no more. A pizza-delivery app needs your delivery address. It does not need your date of birth, your gender, or permission to read your contacts list. When it asks for those anyway "to improve your experience," minimization is the principle being violated.

Storage limitation operates at the back door. Even data you were perfectly entitled to collect becomes a problem if you keep it forever. The principle requires that personal data be kept in a form permitting identification of individuals for no longer than is necessary for the purposes for which it is processed. Once the purpose is exhausted—the order is delivered, the account is closed, the warranty has expired, the statute of limitations has run—the clock on legitimate retention starts ticking, and at some point continued storage stops being "retention" and becomes "hoarding."

A related concept, purpose limitation, ties the two together. You collect data for a specified purpose; you may not casually repurpose it for something the individual never contemplated; and when the purpose is spent, so is your justification for holding the data. Purpose is the thread that runs through the entire lifecycle. If you cannot articulate the current purpose for a category of data, you have probably failed both minimization (you shouldn't have it) and storage limitation (you should delete it).

The instinct behind all of this is captured by security professionals in a blunt maxim: data you don't have can't be breached, can't be subpoenaed, can't be sold to your competitors by a departing employee, and can't be the subject of a regulator's consent decree. Minimization is, in the deepest sense, a security control. It is also a litigation control, a cost control, and—because individuals increasingly resent being surveilled—a trust and brand control. Few legal principles pay dividends in so many currencies at once.

Where the law comes from: a guided tour

There is no single American statute that says, in so many words, "thou shalt minimize." Instead, the duty emerges from a patchwork: a powerful and explicit European command, a fast-evolving set of U.S. state privacy laws led by California, an active federal enforcement agency reading minimization into general consumer-protection law, and a constellation of sector-specific rules. We'll take them in turn.

GDPR Article 5: the gold standard, stated plainly

The clearest articulation of these principles in the world is Article 5(1) of the GDPR (Regulation (EU) 2016/679), which lists the core principles relating to processing of personal data. Two of its subsections are the heart of our subject:

Article 5(1)(c)—data minimisation: personal data shall be "adequate, relevant and limited to what is necessary in relation to the purposes for which they are processed."
Article 5(1)(e)—storage limitation: personal data shall be "kept in a form which permits identification of data subjects for no longer than is necessary for the purposes for which the personal data are processed."

Article 5(1)(e) contains an important carve-out worth knowing: data may be stored for longer periods "insofar as the personal data will be processed solely for archiving purposes in the public interest, scientific or historical research purposes or statistical purposes," subject to appropriate safeguards. That is the narrow exception that lets researchers and archivists keep records that would otherwise have to be deleted. It is not a loophole a marketing department can drive a truck through.

Two further pieces of Article 5 give these principles teeth. Article 5(1)(b) is purpose limitation. And Article 5(2) imposes accountability: the controller "shall be responsible for, and be able to demonstrate compliance with" the principles. The operative word, "demonstrate," is why a European regulator can show up and ask not "are you minimizing?" but "show me your retention schedule, your data map, and your deletion logs." A company that minimizes brilliantly but cannot prove it has still failed Article 5(2). The companion principle of data protection by design and by default in Article 25 reinforces minimization at the engineering level: systems must be built so that, by default, only personal data necessary for each specific purpose is processed. The default setting, in other words, must be the privacy-protective one.

The penalties give the principles their gravity. Violations of the basic processing principles in Article 5 fall into the GDPR's higher enforcement tier under Article 83(5): up to €20 million or 4% of total worldwide annual turnover, whichever is higher. Regulators have not been shy. The Irish Data Protection Commission and other European authorities have issued nine- and ten-figure fines against large platforms, and storage-limitation failures—keeping data long after the purpose ended—are a recurring theme in enforcement decisions across the EU. American companies sometimes assume the GDPR is "Europe's problem," but its extraterritorial reach under Article 3 captures any business that offers goods or services to, or monitors the behavior of, people in the EU. If you have EU customers or website visitors, Article 5 is your problem too. The mechanics of moving that data across the Atlantic raise their own minimization-adjacent questions, which we cover in international data transfers after Schrems II.
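The Article 83(5) ceiling is simple arithmetic: the higher of a fixed amount and a percentage of turnover. The following is a minimal Python sketch, assuming turnover is expressed in whole euros. The function name and the sample figures are illustrative only, and the result is the statutory maximum, not a prediction of any actual fine.

```python
def article_83_5_cap_eur(worldwide_annual_turnover_eur: int) -> int:
    """Upper-tier ceiling: the higher of EUR 20 million or 4% of turnover.

    Hypothetical helper; integer arithmetic avoids floating-point rounding.
    """
    four_percent = worldwide_annual_turnover_eur * 4 // 100
    return max(20_000_000, four_percent)


# Illustrative turnovers, not real companies.
print(article_83_5_cap_eur(100_000_000))    # 4% is 4 million, so the 20 million figure governs
print(article_83_5_cap_eur(5_000_000_000))  # 4% is 200 million, which exceeds 20 million
```

The crossover sits at EUR 500 million of turnover: below it the fixed amount is the ceiling, above it the percentage is.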

The CCPA and CPRA: California writes minimization into American law

For years, U.S. privacy law had nothing like Article 5(1)(c) on the books. The original 2018 CCPA was largely about transparency and choice—telling consumers what you collect and letting them opt out of "sales." It did not contain a freestanding minimization mandate. That changed with the California Privacy Rights Act of 2020 (CPRA), a ballot initiative (Proposition 24) that amended and expanded the CCPA, with its substantive operative provisions taking effect January 1, 2023.

The CPRA added genuine minimization and retention duties to California law. Cal. Civ. Code § 1798.100(c) now provides that a business's collection, use, retention, and sharing of a consumer's personal information "shall be reasonably necessary and proportionate to achieve the purposes for which the personal information was collected or processed," and shall not be further processed in a manner incompatible with those purposes. That is, in substance, minimization and purpose limitation imported into the California Civil Code.

The CPRA also created an explicit retention-disclosure duty. Under § 1798.100(a), at or before the point of collection, a business must inform consumers of the length of time it intends to retain each category of personal information, or—if that is not possible—the criteria used to determine that period. The implementing regulations from the California Privacy Protection Agency (CPPA) at 11 C.C.R. § 7002 flesh this out: a business may not retain personal information for longer than reasonably necessary for the disclosed purpose, and must apply a minimization and proportionality analysis to each purpose of collection. In plain English, California now requires you to publish a retention schedule (at least in summary form) and to actually follow it. "We keep everything indefinitely" is no longer a lawful answer in California.

It is worth pausing on the regulatory engine behind these rules. The CPRA created the CPPA, the first dedicated privacy regulator in the United States, with rulemaking and enforcement authority. Enforcement authority is shared with the California Attorney General. The statute carries penalties of up to $2,500 per violation and $7,500 per intentional violation (or violation involving a minor's data), imposed as administrative fines by the CPPA under § 1798.155 or as civil penalties in an action by the Attorney General under § 1798.199.90, plus a limited private right of action for certain data breaches resulting from a failure to maintain reasonable security under § 1798.150. The connection between that breach right of action and our subject is direct: every extra record you retain is one more record a plaintiff can point to as exposed. Hoarding doesn't just risk a minimization claim; it inflates the damages on every other claim.

California is not alone. A wave of "second-generation" state privacy laws—Virginia's VCDPA, Colorado's CPA, Connecticut's CTDPA, Utah, Texas, Oregon, Montana, and a growing roster of others—broadly track the same framework and almost all contain an express data-minimization requirement (typically phrased, like Virginia's, as a limit to "what is adequate, relevant and reasonably necessary" for the disclosed purposes). The details vary, and the list grows every legislative session, so any compliance program has to treat this as a moving target rather than a fixed map. For the operational nuts and bolts of handling consumer deletion requests across this patchwork, and for app-specific obligations, see legal issues for mobile applications—privacy.

The FTC: minimization by enforcement

The Federal Trade Commission has no general "privacy statute" the way Europe does. What it has is Section 5 of the FTC Act (15 U.S.C. § 45), which prohibits "unfair or deceptive acts or practices in or affecting commerce." For more than two decades the FTC has used Section 5 to build a kind of federal common law of privacy and data security, one consent decree at a time—and in the past several years it has leaned heavily into minimization and retention.

The deception prong catches companies that say one thing and do another: if your privacy policy promises you delete data after ninety days, and you keep it for nine years, that broken promise is a deceptive practice. The unfairness prong is broader and, for our purposes, more interesting. An act is "unfair" under Section 5(n) if it causes or is likely to cause substantial injury to consumers that is not reasonably avoidable and not outweighed by countervailing benefits. The FTC has increasingly argued that collecting more data than you need, or keeping it longer than you need, is itself an unfair practice, because it creates substantial, unavoidable risk of harm (a bigger breach surface, secondary uses consumers never agreed to) with little offsetting benefit to the consumer.

This is not abstract. In recent years the Commission has made minimization a centerpiece of its orders and public statements. Its enforcement actions involving sensitive geolocation data brokers, health data, and children's information have repeatedly imposed not just deletion of unlawfully collected data but forward-looking data-minimization and retention requirements—caps on how long companies may keep certain categories, mandates to delete data when a stated purpose ends, and obligations to publish retention schedules. The FTC's enforcement of the Children's Online Privacy Protection Act (COPPA) has long included a data-retention provision (the COPPA Rule requires operators to retain children's personal information only as long as reasonably necessary and to delete it securely thereafter), and the FTC's 2025 amendments to the COPPA Rule strengthened those retention and minimization expectations, including a requirement to maintain a written data-retention policy for children's data.

The throughline of the FTC's modern posture, repeated in speeches, blog posts, and complaints, is a simple warning: companies should not collect data they don't need and should not keep data they no longer need, and "we might monetize it later" is not a lawful purpose. Because FTC orders typically run for twenty years and bind the company (and often named executives), an FTC consent decree is one of the most durable and expensive consequences in American privacy law. Building minimization into your operations now is far cheaper than litigating an unfairness theory later. The security failures that draw FTC scrutiny often overlap with breach response and trade-secret loss; our guide on cybersecurity incident response and IP protection explores that intersection.

The sectoral overlay: GLBA, HIPAA, and the FCRA

Layered on top of these general regimes are industry-specific statutes that impose their own minimization-flavored and retention-flavored duties. Three matter to almost everyone.

The Gramm-Leach-Bliley Act (GLBA) governs "financial institutions," a category the FTC and banking regulators read broadly to include not just banks but mortgage brokers, auto dealers that arrange financing, tax preparers, and more. The GLBA's Safeguards Rule (16 C.F.R. Part 314), substantially amended effective 2023, requires covered businesses to implement a written information-security program. The amended Rule expressly directs institutions to dispose of customer information securely and, critically, to do so "no later than two years after the last date the information is used in connection with the provision of a product or service to the customer," unless retention is necessary for a legitimate business purpose or required by law. That is a regulatory storage-limitation default with a number attached to it: two years after last use. It is one of the most concrete retention commands in U.S. law.
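Because the Safeguards Rule default is a date calculation, it can be sketched in a few lines. The Python below is an illustration under stated assumptions: the function names are hypothetical, "two years" is read as the same calendar date two years later, and the `exception_applies` flag merely stands in for a documented business-purpose or legal-retention exception, which is a legal judgment and not something code can decide.

```python
from datetime import date


def disposal_deadline(last_used: date) -> date:
    """Same calendar date two years after the information was last used."""
    try:
        return last_used.replace(year=last_used.year + 2)
    except ValueError:
        # last_used was Feb 29 and the target year has no Feb 29.
        return last_used.replace(year=last_used.year + 2, day=28)


def disposal_overdue(last_used: date, today: date, exception_applies: bool = False) -> bool:
    """True when the default deadline has passed and no exception is documented."""
    return not exception_applies and today > disposal_deadline(last_used)


print(disposal_deadline(date(2023, 3, 15)))                   # 2025-03-15
print(disposal_overdue(date(2022, 1, 10), date(2024, 1, 11)))  # True
```

The clock runs from last use, not from collection, so a customer relationship that stays active keeps pushing the deadline out.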

The Health Insurance Portability and Accountability Act (HIPAA) is often misunderstood on this point. HIPAA's Privacy Rule does not set a flat retention period for the medical records themselves (state law usually governs that), but it builds minimization into the bloodstream through the "minimum necessary" standard at 45 C.F.R. § 164.502(b): a covered entity must make reasonable efforts to limit protected health information (PHI) to the minimum necessary to accomplish the intended purpose of a use, disclosure, or request. HIPAA does impose a hard six-year retention requirement on certain compliance documentation (policies, authorizations, accountings) under 45 C.F.R. § 164.530(j)—a useful reminder that retention obligations cut both ways: some data you must delete, and some you must keep. The cloud-computing dimension of HIPAA, including how business-associate agreements allocate these duties, is covered in HIPAA business associates and cloud computing.

The Fair Credit Reporting Act (FCRA) regulates consumer-reporting agencies and the users and furnishers of consumer reports. It contains its own minimization-adjacent and disposal rules. The companion Disposal Rule under FCRA (the FTC's rule at 16 C.F.R. Part 682, plus parallel banking-agency rules) requires anyone who possesses consumer report information to take reasonable measures to protect against unauthorized access when disposing of it—shredding paper, wiping drives, the works. FCRA also sets outer limits on how long obsolete negative information may appear in consumer reports (generally seven years, ten for bankruptcies, under 15 U.S.C. § 1681c), a different but related kind of storage limitation enforced for the benefit of consumers.

Other sectoral schedules abound—the SEC and FINRA require broker-dealers to keep certain records for set periods; tax law requires retention of supporting documents; OSHA, the IRS, and dozens of agencies publish their own schedules. The point is that a real-world retention policy is rarely a single number. It is a matrix: each category of data carries its own minimum (you must keep it this long for legal or business reasons) and its own maximum (you should delete it once the longest applicable obligation lapses). Good retention management lives in the gap between those two.

Why over-retention actually hurts: the four costs

It is easy to nod along at the principle and still do nothing, because the cost of keeping data feels like zero—storage is cheap and deletion takes effort. That intuition is wrong. Over-retention imposes four distinct and substantial costs, and naming them is the best way to motivate a program.

First, breach amplification. The size and severity of a data breach is a direct function of how much sensitive data you were holding when the intruder arrived. This is the lesson of our Acme hypothetical, and it plays out in the real world constantly. Every additional category of data multiplies the harm: a breach of email addresses is annoying; a breach of email addresses plus Social Security numbers plus health information plus geolocation history is a catastrophe with statutory-notification, identity-theft, and class-action consequences. Minimization is breach insurance you don't have to buy—it shrinks the blast radius before the bomb goes off. Regulators understand this, which is why "you collected and retained far more than you needed" is now a standard allegation in post-breach enforcement.

Second, regulatory exposure. As we've seen, over-retention is itself a violation under the GDPR, the CPRA, and the FTC's unfairness theory, independent of any breach. A company can be perfectly secure and still be fined for keeping data it had no continuing need to keep. And when a breach does occur, the retained-but-unneeded data converts a security problem into a compliance problem: every record you shouldn't have had is a separate aggravating fact.

Third, litigation and e-discovery cost. Here is the dimension lawyers feel most viscerally. In American civil litigation, a party generally must preserve and produce relevant information within its "possession, custody, or control." The more data you keep, the more there is to preserve once litigation is reasonably anticipated (triggering a litigation hold), the more there is to collect, review, and produce, and the more there is for an adversary to comb through for a damaging email. E-discovery in a document-heavy case can cost millions, and a great deal of that cost is spent on data the company kept out of pure inertia. A defensible deletion program—one that routinely and consistently disposes of data that has outlived its purpose, under a documented policy applied in good faith—directly reduces this burden. The key word is "defensible": deletion done before a duty to preserve arises, under a neutral, consistently-applied schedule, is lawful and prudent; deletion done after you anticipate litigation, to make evidence disappear, is spoliation and can draw severe sanctions under Federal Rule of Civil Procedure 37(e), up to an adverse-inference instruction or default judgment. The discipline of a retention schedule is precisely what lets you delete confidently in peacetime and stop deleting the moment a hold attaches. For the mechanics of preservation and production once a dispute begins, see a comprehensive guide to federal civil litigation for small businesses and our discovery refresher.

Fourth, operational drag and cost. Data is not actually free to keep. Storage, backups, indexing, access controls, monitoring, and—when a regulator or litigant comes calling—search and review all cost money and engineering time. Vast, undifferentiated data lakes also degrade the quality of the data you do need, burying signal in noise and making analytics slower and less reliable. Minimization is, among other things, good information hygiene. Many organizations discover, when they finally map their data, that they are paying to store and secure terabytes that serve no purpose at all.

Put the four together and the calculus inverts. Keeping data you don't need is not the safe, default, do-nothing choice. It is an affirmative decision to take on breach risk, regulatory risk, litigation cost, and operational expense in exchange for a benefit—"we might use it someday"—that almost never materializes.

Building a real program: from data map to defensible deletion

Principles are easy to state and hard to operationalize. A workable minimization-and-retention program has a recognizable architecture. The steps below are sequential in logic even though, in practice, you'll iterate. Treat this as the spine of a program that the broader privacy compliance program guide puts flesh on.

Step one: map your data (you cannot minimize what you cannot see)

Every program begins with a data inventory or data map: a documented understanding of what personal data you hold, where it lives, where it came from, why you have it, who can access it, where it flows (including to vendors and across borders), and—critically—how long you currently keep it. This is unglamorous, often surprising, and absolutely foundational. Organizations routinely find data they forgot they had: an old CRM export on a shared drive, a defunct app's user database, log files capturing more than anyone intended, backups going back a decade. You cannot apply a retention schedule to data you don't know exists.

A good data map ties each data category to a purpose and a legal basis (consent, contract, legitimate interest, legal obligation—the GDPR's Article 6 framework is a useful organizing lens even for U.S.-only data). The moment you write "purpose: unknown / legacy" next to a category, you've found a minimization candidate. The data map is also the artifact regulators ask for first and the document that makes everything downstream—DPIAs, breach response, deletion-request fulfillment—dramatically faster.
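A data map does not need special tooling to be useful; even a structured list can surface minimization candidates. The sketch below is one possible shape in Python, not a standard schema. The field names, the `minimization_candidates` helper, and the sample inventory (loosely modeled on the Acme hypothetical) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DataMapEntry:
    category: str                  # what the data is
    system: str                    # where it lives
    purpose: str                   # why we hold it
    legal_basis: str               # e.g. consent, contract, legal obligation
    retention_days: Optional[int]  # None means no retention limit has been set


def minimization_candidates(entries: List[DataMapEntry]) -> List[str]:
    """Categories with no articulable current purpose or no retention limit."""
    flagged = []
    for e in entries:
        purpose = e.purpose.strip().lower()
        no_purpose = purpose == "" or purpose.startswith("unknown") or "legacy" in purpose
        if no_purpose or e.retention_days is None:
            flagged.append(e.category)
    return flagged


# Illustrative inventory, not real data.
inventory = [
    DataMapEntry("loyalty profile", "crm", "run rewards program", "contract", 730),
    DataMapEntry("driver's-license numbers", "legacy_db", "unknown / legacy", "none identified", None),
    DataMapEntry("call recordings", "file_share", "quality assurance", "legitimate interest", None),
]
print(minimization_candidates(inventory))
```

Here the license numbers are flagged for lacking a purpose, and the call recordings for lacking any retention limit even though a purpose is stated.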

Step two: classify the data by sensitivity

Not all personal data is equal. Data classification sorts your inventory into tiers—say, public, internal, confidential, and restricted/sensitive—so that the most protective rules and the shortest retention periods apply to the most dangerous data. Sensitive personal information (a defined category under the CPRA and the other state laws, covering things like Social Security numbers, precise geolocation, biometric and health data, and account credentials) deserves the harshest scrutiny: collect it only when truly necessary, minimize who can touch it, and delete it aggressively. Biometric data carries its own statutory regime in several states (Illinois's BIPA being the famous and litigious example), which is why biometrics warrant special handling; we explore that in biometric data privacy laws and their impact on AI development. Classification is what lets you say, credibly, "we keep marketing-list emails for two years but purge raw geolocation logs in thirty days"—a differentiated, defensible posture rather than a single blunt rule.
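Classification can start as a simple lookup from field name to tier. The Python sketch below uses the four tiers named above; the field names and their assignments are hypothetical, and treating unrecognized fields as restricted until someone reviews them is a conservative design choice, not a legal requirement.

```python
from enum import IntEnum


class Tier(IntEnum):
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


# Hypothetical assignments; a real table would come from the data map and counsel.
FIELD_TIERS = {
    "store_hours": Tier.PUBLIC,
    "sku": Tier.INTERNAL,
    "email": Tier.CONFIDENTIAL,
    "home_address": Tier.CONFIDENTIAL,
    "ssn": Tier.RESTRICTED,
    "precise_geolocation": Tier.RESTRICTED,
    "biometric_template": Tier.RESTRICTED,
}


def classify(field_name: str) -> Tier:
    """Look up a field's tier; unknown fields are RESTRICTED until reviewed."""
    return FIELD_TIERS.get(field_name.lower(), Tier.RESTRICTED)


def review_order(fields):
    """Most sensitive first (ties broken by name), so strict rules are set first."""
    return sorted(fields, key=lambda f: (-classify(f), f))


print(review_order(["store_hours", "email", "ssn", "mystery_column"]))
```

Sorting the inventory this way puts the fields that deserve the shortest retention periods at the top of the review queue.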

Step three: write a retention schedule and actually follow it

The data retention schedule is the heart of the program. It is a table—usually maintained by category and system—that states, for each type of data: the retention period or the criteria for determining it; the legal or business justification for that period; the trigger that starts the clock (account closure, last transaction, end of warranty, end of the limitations period for likely claims); and the disposition method at the end (secure deletion, anonymization, archival).

Designing the periods is where law meets judgment. For each category you ask: What is the longest applicable legal minimum (tax records, employment records, HIPAA's six-year documentation rule, securities rules, statutes of limitations for foreseeable disputes)? What is the legitimate business need? Does any rule cap retention, as GLBA's two-year-after-last-use default does? Then you set the retention period at the longest of the justified periods, subject to any such cap, and no longer. The CPRA pushes you to do this category-by-category and to disclose the result; the GDPR requires you to justify and document it; the FTC will fault you for not having one. A schedule you wrote and ignore is worse than no schedule, because it's a written admission of what you should have been doing. The discipline of writing one also surfaces hard questions ("why are we keeping closed-account data for eleven years?") that are healthy to confront. A retention schedule belongs in your broader governance documentation alongside other foundational policies; see how to write an employee handbook for how retention rules interact with employment records.
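One row of a retention schedule can be represented directly as data, with the period derived from its justifications instead of typed in by hand. The following Python sketch is an assumption-laden illustration: the `ScheduleEntry` class and its field names are hypothetical, years are approximated as 365 days, and it covers only the "longest justified period" step, not any legal cap on retention.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class ScheduleEntry:
    category: str
    trigger: str                        # event that starts the clock
    legal_minimum_days: Dict[str, int]  # obligation -> minimum days to keep
    business_need_days: int
    disposition: str                    # e.g. secure deletion, anonymization

    def retention(self) -> Tuple[int, str]:
        """Longest justified period and the justification that drives it."""
        candidates = dict(self.legal_minimum_days)
        candidates["business need"] = self.business_need_days
        reason = max(candidates, key=candidates.get)
        return candidates[reason], reason


# Illustrative numbers only.
entry = ScheduleEntry(
    category="HIPAA compliance documentation",
    trigger="date created or last in effect, whichever is later",
    legal_minimum_days={"HIPAA six-year documentation rule": 6 * 365},
    business_need_days=365,
    disposition="secure deletion",
)
print(entry.retention())  # (2190, 'HIPAA six-year documentation rule')
```

Returning the driving justification alongside the number keeps the "why" attached to every period, which is what a regulator or auditor will ask for.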

Step four: build deletion that actually deletes

A retention schedule is only as good as the deletion process that enforces it. This is harder than it sounds. Personal data has a way of replicating—into backups, caches, analytics warehouses, vendor systems, exported spreadsheets, and email attachments. A deletion process that wipes the production database but leaves perfect copies in five backup sets and a marketing platform has not really deleted anything. A mature process therefore:

Automates deletion wherever possible, so records purge on schedule without depending on a human remembering;
Reaches all copies, including backups (often via shorter backup-rotation cycles or documented backup-deletion practices), logs, and downstream systems;
Flows down to vendors through contractual deletion obligations in data-processing agreements, so that when you delete, your processors delete too;
Deletes securely, so that "deleted" data is genuinely unrecoverable (cryptographic erasure, secure overwriting, or physical destruction for media), satisfying disposal rules like the FCRA Disposal Rule and the GLBA Safeguards Rule; and
Suspends for legal holds, integrating with litigation-hold processes so that automatic deletion pauses for data subject to a preservation duty the instant litigation is reasonably anticipated.

That last bullet is the crucial safety valve that makes routine deletion defensible. The combination—delete relentlessly by default, but stop cleanly when a duty to preserve attaches—is what distinguishes prudent records management from obstruction of justice.
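The interaction between scheduled deletion and a legal hold can be sketched as a selection step that runs before anything is purged. The Python below is a simplified illustration: the record layout, the function name, and the idea of tracking holds by category are assumptions, and it only selects records. Actually erasing them from backups, logs, and vendor systems is a separate problem.

```python
from datetime import date, timedelta


def due_for_deletion(records, today, held_categories):
    """Return ids whose retention period has run and that are not on hold.

    records: iterable of dicts with id, category, clock_start (date), retention_days.
    held_categories: categories currently subject to a litigation hold.
    """
    due = []
    for r in records:
        if r["category"] in held_categories:
            continue  # a preservation duty overrides the schedule
        expires = r["clock_start"] + timedelta(days=r["retention_days"])
        if today >= expires:
            due.append(r["id"])
    return due


records = [
    {"id": 1, "category": "call_recording", "clock_start": date(2024, 1, 1), "retention_days": 90},
    {"id": 2, "category": "purchase_history", "clock_start": date(2022, 1, 1), "retention_days": 730},
    {"id": 3, "category": "call_recording", "clock_start": date(2024, 6, 1), "retention_days": 90},
]
today = date(2024, 7, 1)
print(due_for_deletion(records, today, held_categories=set()))                 # [1, 2]
print(due_for_deletion(records, today, held_categories={"purchase_history"}))  # [1]
```

Checking the hold first, on every run, means a hold takes effect the next time the job executes without anyone having to disable the job itself.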

Step five: prefer de-identification and anonymization where you can

Sometimes you genuinely need the information in a dataset long after you stop needing to identify the people in it. A retailer may want to analyze purchasing trends for a decade without needing to know that customer #88213 is Jane Doe. The answer is de-identification, pseudonymization, or anonymization: techniques that sever or obscure the link between data and an identifiable individual.

The legal distinctions matter. Under the GDPR, pseudonymized data (where identifiers are replaced with tokens and the key that links them back is held separately) is still personal data, because re-identification remains possible; it's a security measure, not an escape hatch. Anonymized data, meaning data so thoroughly stripped of identifiers that individuals can no longer reasonably be re-identified, falls outside the GDPR entirely, which is precisely why true anonymization is hard and easy to get wrong. Under the CCPA/CPRA, properly deidentified or aggregated information is exempt, but the statute imposes conditions: the business must take reasonable measures to prevent re-identification, publicly commit not to re-identify, and contractually bind recipients to the same. HIPAA has its own rigorous de-identification standard at 45 C.F.R. § 164.514, offering two methods: "Expert Determination" and a "Safe Harbor" method that strips eighteen specified identifiers. The practical takeaway: de-identification is a powerful way to honor storage limitation while keeping analytical value, but only if it's done to the relevant legal standard. Sloppy "anonymization" that can be reversed with a little effort gives you the worst of both worlds: the liability of personal data and the false comfort of thinking you've escaped it.
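The difference between pseudonymizing and aggregating is easier to see side by side. The Python sketch below uses a keyed hash (HMAC-SHA256 from the standard library) as one common pseudonymization technique and a simple count as an aggregation. The function names, the sample purchases, and the placeholder key are assumptions, and neither function by itself satisfies any of the legal standards described above: the keyed hash is reversible by whoever holds the key, and small aggregate cells can still single people out.

```python
import hashlib
import hmac
from collections import Counter


def pseudonymize(customer_id: str, key: bytes) -> str:
    """Replace an identifier with a keyed hash (HMAC-SHA256).

    Whoever holds `key` can recompute the mapping, so this is
    pseudonymization, not anonymization.
    """
    return hmac.new(key, customer_id.encode("utf-8"), hashlib.sha256).hexdigest()


def monthly_totals(purchases):
    """Drop customer identifiers; keep only counts per (month, product type)."""
    return Counter((p["date"][:7], p["product"]) for p in purchases)


purchases = [
    {"customer_id": "88213", "date": "2024-03-02", "product": "produce"},
    {"customer_id": "88213", "date": "2024-03-09", "product": "produce"},
    {"customer_id": "40117", "date": "2024-03-15", "product": "dairy"},
]
key = b"example-key-store-separately"  # placeholder; a real key belongs in a secrets manager
tokens = {pseudonymize(p["customer_id"], key) for p in purchases}
print(len(tokens))                # 2 distinct customers, still linkable across rows
print(monthly_totals(purchases))
```

The pseudonymized rows still let an analyst follow one customer over time, which is exactly why they remain personal data; the monthly totals cannot do that.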

Step six: design for minimization from the start (privacy by design)

The cheapest minimization happens before any data is collected. Privacy by design and by default—a phrase the GDPR codifies in Article 25 and that the FTC and state regulators echo—means engineering systems so that the privacy-protective configuration is the standard one, and so that you collect only what each feature genuinely needs. In practice this looks like: forms that don't ask for fields the business case doesn't require; defaults set to the least data-hungry option; collection scoped to purpose at the API and schema level; short default log-retention; and a habit, baked into product reviews, of asking "do we actually need this field?" before shipping. A Data Protection Impact Assessment (DPIA) or privacy review at the design stage of any high-risk processing is the structured way to force that question. Building minimization in at design time is vastly cheaper than retrofitting it onto a system already swollen with data—and it's the part of the program engineers tend to find most reasonable, because "collect less, store less" is also good engineering.
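Scoping collection to purpose at the schema level can be as simple as an allowlist that every intake path must pass through. This Python sketch reuses the pizza-delivery example from earlier in the guide; the purposes, field names, and the `minimize` helper are hypothetical, and a production system would more likely enforce this in its API schema or validation layer.

```python
# Hypothetical purposes and field names for a delivery app.
ALLOWED_FIELDS = {
    "delivery": {"name", "delivery_address", "phone"},
    "receipt_email": {"email"},
}


def minimize(purpose: str, submitted: dict) -> dict:
    """Keep only the fields declared necessary for this purpose.

    An undeclared purpose raises KeyError, so any new collection
    needs an explicit, reviewable entry in ALLOWED_FIELDS.
    """
    allowed = ALLOWED_FIELDS[purpose]
    return {k: v for k, v in submitted.items() if k in allowed}


form = {
    "name": "Jane Doe",
    "delivery_address": "1 Main St",
    "phone": "555-0100",
    "date_of_birth": "1990-01-01",
    "contacts": ["..."],
}
print(minimize("delivery", form))
```

The date of birth and contacts list never reach storage, and adding them later requires changing a table that a privacy review can see.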

Step seven: govern, document, and audit

Finally, the program needs an owner and a paper trail. Assign accountability—a privacy officer, a data-governance committee, or both—so that the retention schedule is reviewed and updated as laws and business needs change (which is constantly). Document decisions: why each retention period was chosen, when deletions ran, how de-identification was validated. This documentation is what satisfies the GDPR's accountability principle, what you hand a regulator, and what demonstrates good faith if a deletion is ever challenged. And audit periodically: spot-check that data is actually purging on schedule, that vendors are honoring deletion obligations, and that no new "shadow" data stores have sprung up outside the inventory. A program that is written once and never revisited rots; one that is reviewed annually stays alive. Frameworks like the NIST Privacy Framework and NIST Special Publication 800-88 (on media sanitization) give you off-the-shelf scaffolding for the governance and the secure-disposal pieces, respectively.

A worked example: minimizing Acme, the right way

Let's return to Acme Loyalty Corp. and rebuild it as it should have been, to see the principles in motion.

Acme maps its data and discovers it holds eleven categories of customer information across four systems. For each, it asks the two questions—why do we have this, and when does it go away?

The loyalty profile (name, email, rewards balance) has an obvious, ongoing purpose, so it's retained for the life of the active account plus a short tail (say, two years of inactivity) before the account is deemed dormant and purged. The purchase history is useful for personalization, but only while it is recent; Acme decides to keep identified transaction data for twenty-four months and then de-identify it for long-term trend analysis, preserving analytical value while discharging storage limitation. The payment-card data is minimized at the source: Acme tokenizes through its processor and never stores full card numbers, so there's nothing to breach. The driver's-license numbers from the defunct 2012 check-cashing promotion are flagged with "purpose: expired"; they are securely deleted, and the collection point is removed so they're never gathered again. The dusty call recordings are placed on a ninety-day retention cycle going forward, with the historical backlog deleted under the new policy. Each decision is recorded in the retention schedule with its justification; deletions are automated; the schedule integrates with Acme's litigation-hold system so that any category subject to a preservation duty stops auto-deleting.
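
A retention schedule of this kind can be expressed as data plus one small decision function. The Python sketch below is illustrative only: the `RetentionRule` class, the `SCHEDULE` entries, and the `due_action` function are hypothetical, and twenty-four months is approximated as 730 days for simplicity.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class RetentionRule:
    keep_days: int       # how long identified data may be kept
    action: str          # what happens afterwards: "delete" or "deidentify"
    justification: str   # recorded so the decision can be explained later


# Hypothetical schedule mirroring three of Acme's decisions.
SCHEDULE = {
    "purchase_history": RetentionRule(730, "deidentify", "personalization, about 24 months"),
    "call_recordings": RetentionRule(90, "delete", "quality review"),
    "drivers_license": RetentionRule(0, "delete", "purpose expired (2012 promotion)"),
}


def due_action(category, record_date, today, held_categories):
    """Return "hold", "retain", or the rule's end-of-life action."""
    # A litigation hold always wins: nothing in a held category is purged.
    if category in held_categories:
        return "hold"
    rule = SCHEDULE[category]
    if today - record_date > timedelta(days=rule.keep_days):
        return rule.action
    return "retain"


today = date(2025, 1, 1)
no_holds = frozenset()

print(due_action("purchase_history", date(2022, 6, 1), today, no_holds))   # deidentify
print(due_action("call_recordings", date(2024, 12, 1), today, no_holds))   # retain
print(due_action("drivers_license", date(2012, 3, 1), today, {"drivers_license"}))  # hold
```

Checking the hold before anything else is the design choice that matters: the automated purge can run relentlessly in ordinary times and still stop cleanly for any category under a preservation duty.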

When an intruder later compromises a contractor's password (because intruders always eventually do), the archive they copy contains active loyalty profiles, twenty-four months of purchase data, and at most ninety days of call recordings, and nothing else. No license numbers, no payment cards, no fifteen-year archive. The breach is real but survivable: smaller notification, weaker class action, calmer regulator, intact brand. Acme didn't get lucky. It got minimal. That is the entire point.

Special contexts worth a flag

A few settings deserve specific mention because they sharpen the minimization analysis.

Employee and HR data. Employers hold deeply sensitive data—Social Security numbers, health and disability information, performance records, sometimes biometric clock-in data—and are subject to overlapping retention rules (EEOC, FLSA, ERISA, OSHA, immigration I-9 rules, and, in California, the CPRA's now-full application to employee data after the prior exemption sunset in 2023). The temptation to keep everything "for HR reasons" is strong and usually unjustified. Minimize at hiring (don't collect what the role doesn't require), and set deletion clocks tied to the longest applicable employment-law retention minimum.

Mobile apps and SDKs. Apps are notorious over-collectors, often because embedded third-party SDKs vacuum up data the developer never even sees. Minimization here means auditing every SDK and permission and asking whether the app truly needs location, contacts, or device identifiers. Platform rules (Apple's App Tracking Transparency, Google Play's data-safety labels) increasingly enforce minimization from the store side. Our dedicated guides on legal issues for mobile applications and on privacy and legal considerations after developing a mobile app go deep here.

AI and machine learning. Training data is the new frontier of the minimization debate. Models trained on vast personal datasets raise hard questions about purpose limitation (was the data collected for this?), storage limitation (does a trained model "retain" the data?), and de-identification at scale. Regulators are actively scrutinizing whether AI development can be squared with minimization, and the answer is unsettled and fast-moving. The intersection with biometric and other sensitive data is especially fraught—see biometric data privacy laws and their impact on AI development.

Backups and the "delete everywhere" problem. As noted, backups are where deletion promises go to die. The pragmatic answer most regulators accept: maintain reasonable backup-rotation cycles so that deleted data ages out of backups within a defined window, document the practice, and ensure that deletions are re-applied whenever data is restored from a backup. Perfect instantaneous deletion across every backup is rarely required; a reasonable, documented, consistently applied approach is.
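
One way to make sure a restore does not resurrect deleted records is to keep a small deletion log and replay it after every restore. The Python sketch below assumes records keyed by an opaque customer ID; the `delete_customer` and `restore` functions and the sample data are hypothetical.

```python
def delete_customer(live: dict, deletion_log: set, customer_id: str) -> None:
    """Delete from the live store and remember the ID so restores can honor it."""
    live.pop(customer_id, None)
    deletion_log.add(customer_id)


def restore(backup: dict, deletion_log: set) -> dict:
    """Rebuild the live store from a backup, re-applying logged deletions."""
    return {cid: row for cid, row in backup.items() if cid not in deletion_log}


# Hypothetical live store and a backup snapshot taken before the deletion.
live = {
    "c1": {"email": "pat@example.com"},
    "c2": {"email": "lee@example.com"},
}
backup = dict(live)
deletion_log = set()

delete_customer(live, deletion_log, "c2")

# Later, the system is rebuilt from the older backup.
restored = restore(backup, deletion_log)
# restored contains "c1" only; "c2" stays deleted.
```

The log holds only the identifier, not the profile, and its entries can be dropped once the backup-rotation window has passed and no backup containing the record still exists.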

Key takeaways

If you remember nothing else, remember the two questions and the four costs. For every category of personal data your organization holds, you should be able to answer "why do we have this?" and "when does it go away?" And you should keep those answers honest by remembering what over-retention actually costs: bigger breaches, regulatory fines, crushing e-discovery, and operational drag.

The legal foundations are converging. The GDPR states the principles cleanly in Article 5(1)(c) and (e) and backs them with accountability and enormous fines. California, through the CPRA's amendments to the CCPA, has imported genuine minimization and retention-disclosure duties into American law, and a growing roster of states is following. The FTC is reading minimization into Section 5 unfairness and writing retention caps into its orders. And the sectoral laws—GLBA's two-year-after-last-use disposal rule, HIPAA's minimum-necessary standard, the FCRA Disposal Rule—fill in the industry-specific gaps. The direction of travel is unmistakable and one-way: collect less, keep it shorter, prove it.

The program that satisfies all of this is not exotic. Map your data; classify it; write a retention schedule grounded in real legal minimums; build deletion that reaches every copy and pauses for legal holds; prefer de-identification where you can; design for minimization from the start; and govern, document, and audit the whole thing. Do that, and the data you never collected—and the data you safely deleted—will quietly protect you from problems you'll never have to face.

Frequently asked questions

Is data minimization legally required in the United States, or just a best practice? Both, depending on where you operate and what data you hold. There is no single federal minimization statute, but California's CPRA imposes an express "reasonably necessary and proportionate" minimization standard and a retention-disclosure duty, most other state privacy laws contain similar requirements, the FTC treats unnecessary collection and over-retention as potential unfairness under Section 5, and sectoral laws (GLBA, HIPAA, FCRA, COPPA) impose minimization- and disposal-flavored duties on covered businesses. If you serve EU residents, GDPR Article 5 makes minimization and storage limitation hard legal obligations, with fines of up to €20 million or 4% of worldwide annual turnover, whichever is higher. For most real businesses, minimization is required by something.

How long should we keep personal data? There is no universal number. The right period is the longest one that a legal mandate or a genuine business need actually supports, and no longer. For each category, identify any legal retention mandates (tax, employment, securities, HIPAA's six-year documentation rule, statutes of limitations for likely disputes) and any legal ceilings (such as GLBA's two-years-after-last-use disposal rule), weigh the legitimate business need, set the period at the longest of the justified minimums without exceeding any ceiling, document the justification, and delete on schedule. The CPRA additionally requires you to disclose the retention period (or the criteria) for each category at collection.

What's the difference between data minimization and storage limitation? Minimization is about the front door—collecting only the data you need for a specified purpose. Storage limitation is about the back door—keeping that data only as long as you need it, then deleting it. Purpose limitation ties them together: when the purpose ends, so does your justification for both having and keeping the data.

Can we just anonymize data instead of deleting it? Often, yes, if it's done right. Properly anonymized data (so individuals cannot reasonably be re-identified) falls outside the GDPR, and properly deidentified or aggregated data is exempt under the CCPA/CPRA, provided you take reasonable measures against re-identification, commit not to re-identify, and bind recipients contractually. HIPAA has its own de-identification standard with two methods, Expert Determination and Safe Harbor. The catch is that weak "anonymization" that can be reversed gives you the liability of personal data and a false sense of safety. Anonymize to the relevant legal standard or treat the data as still-personal.

Won't deleting data hurt us in litigation? The opposite, if you do it right. Routine deletion under a neutral, consistently applied retention schedule before any duty to preserve arises is lawful and reduces e-discovery burden. The danger is deleting after litigation is reasonably anticipated, which is spoliation and can draw severe sanctions under Federal Rule of Civil Procedure 37(e). The solution is a retention schedule that integrates with litigation holds: delete relentlessly in peacetime, and stop cleanly the moment a preservation duty attaches.

Does the GDPR really apply to a U.S. company? It can. Under Article 3, the GDPR reaches any business that offers goods or services to people in the EU or monitors their behavior, regardless of where the business is located. If you have EU customers, take orders from EU residents, or track EU website visitors, Article 5's minimization and storage-limitation principles apply to you, and so does the up-to-4%-of-global-turnover fine schedule.

Who in our organization should own this? Assign clear accountability—a privacy officer, a data-governance committee, or both. The owner maintains the data map and retention schedule, coordinates with legal on litigation holds, manages vendor deletion obligations, and ensures periodic audits. The GDPR's accountability principle (Article 5(2)) and the FTC's expectations both assume a responsible human and a documented program; "everyone and no one owns it" is how data hoards form.

This article is general information, not legal advice. Privacy and data-security law varies by jurisdiction and changes rapidly; consult qualified counsel about your specific circumstances before acting.