A trademark trial often turns on a single number. Sixty-one percent of cell-phone shoppers associate the color magenta with one company. Fifty-four percent of women shown two "Delicious" products think they came from the same maker. Thirty-eight percent of jacket buyers are confused about source. That number—produced by a consumer survey, dressed in the lab coat of social science, and delivered by an expert with a CV three pages deep—can win or lose the case. It speaks directly to the question that decides most infringement suits: are ordinary buyers likely to be confused?
And precisely because that number is so persuasive, the fiercest fight in the litigation is usually not whether the jury should believe it. It is whether the jury should ever hear it. That fight has a name—the Daubert challenge—and it is fought before trial, on paper, in front of the judge. Win it, and the survey vanishes from the case, sometimes taking the plaintiff's proof of confusion with it. Lose it, and a damaging number reaches the jury wearing the imprimatur of an admitted expert. Few motions in a trademark case carry more leverage, and few are more misunderstood by litigants who assume that "it's just a survey, let the jury weigh it" is a complete answer. It is not.
This guide explains the gatekeeping framework that governs survey, damages, and technical experts; what separates a reliable trademark survey from a worthless one; the recurring battle over whether a survey's flaws are grounds for exclusion or merely fodder for cross-examination; and how the courts of the Second Circuit and the Eastern District of New York actually decide these motions. It is written for lawyers and non-lawyers alike, and it takes a neutral, practitioner's-eye view—useful whether you are the party attacking a survey or the party defending one. Surveys feed directly into the substantive analysis we cover in our piece on the Polaroid factors on summary judgment in the Second Circuit and into the recovery questions in damages apportionment in trademark cases. Here we focus on the evidentiary gateway those experts must clear.
Why Surveys Matter So Much in the First Place
Start with the puzzle a trademark case has to solve. The central test of infringement—likelihood of confusion—is not a fact you can photograph or a document you can subpoena. It lives inside the heads of consumers. The Polaroid factors and their sister tests in other circuits give courts a structured way to infer confusion from circumstantial proxies (mark similarity, proximity of goods, the senior mark's strength, the defendant's intent, and so on). A survey skips the inference. It walks into the marketplace, asks the relevant buyers what they think, and reports the answer as a percentage. Done well, it is the closest thing trademark law has to direct evidence of the ultimate question.
Surveys do more than measure confusion. A trademark owner may use one to prove that a descriptive mark has acquired distinctiveness—secondary meaning—so that it qualifies for protection at all. See T-Mobile US, Inc. v. Aio Wireless LLC, 991 F. Supp. 2d 888 (S.D. Tex. 2014) (crediting a double-blind survey showing that more than half of consumers associated the color magenta with a single wireless carrier). Defendants use surveys offensively too: to show that confusion is unlikely, or that the plaintiff's claimed mark is actually a generic term that no one may own. See King-Seeley Thermos Co. v. Aladdin Industries, Inc., 321 F.2d 577 (2d Cir. 1963) (the THERMOS genericness saga). Surveys prove fame and dilution. They appear in TTAB oppositions and cancellations and even in trademark prosecution. Wherever the law asks "what do consumers think?", a survey purports to answer.
That power cuts both ways. Courts have warned for decades that surveys are easy to manipulate—a small change in question wording, a tweak to the sample, a missing control, and the "confusion" number can be conjured out of nothing. The whole point of the gatekeeping inquiry is to separate surveys that genuinely measure marketplace perception from surveys engineered to produce a litigation-friendly result. The rest of this article is about how that sorting is done.
The Gatekeeping Framework: Daubert, the Trilogy, and Rule 702
The admissibility of expert testimony in federal court is governed by Federal Rule of Evidence 702 and the line of Supreme Court cases interpreting it. The foundational decision is Daubert v. Merrell Dow Pharmaceuticals, Inc., 509 U.S. 579 (1993), which displaced the old "general acceptance" test of Frye v. United States, 293 F. 1013 (D.C. Cir. 1923), and assigned trial judges a gatekeeping role. Before expert testimony reaches the jury, the judge must be satisfied that it is both reliable (grounded in sound methodology) and relevant—that it "fits" the issues in the case. Daubert offered a flexible, non-exclusive list of reliability factors for scientific testimony: whether the theory or technique can be and has been tested; whether it has been subjected to peer review and publication; its known or potential rate of error; the existence and maintenance of standards controlling its operation; and its general acceptance in the relevant field. Id. at 593–94. The factors are a menu, not a checklist.
Two companion cases complete what practitioners call the Daubert trilogy. In General Electric Co. v. Joiner, 522 U.S. 136 (1997), the Court held that admissibility rulings are reviewed only for abuse of discretion and—crucially for survey and damages experts—that a court may exclude an opinion when there is "too great an analytical gap between the data and the opinion proffered." Id. at 146. An expert may not simply announce a conclusion and expect the gap to be filled by his own authority; the Court coined the memorable rebuke that "nothing in either Daubert or the Federal Rules of Evidence requires a district court to admit opinion evidence that is connected to existing data only by the ipse dixit of the expert." The methodology must actually support the conclusion. Then, in Kumho Tire Co. v. Carmichael, 526 U.S. 137 (1999), the Court extended the gatekeeping obligation to all expert testimony, not just the "scientific," and gave trial judges latitude in deciding how to test reliability for a given discipline. Kumho is the doctrinal bridge that pulls consumer surveys—which are social science, not bench science—squarely inside the gatekeeping regime, and we return to it below.
Rule 702 itself was significantly amended effective December 1, 2023, and the amendment is not cosmetic. It rewrote the rule to clarify two things that had drifted in practice. First, the proponent of expert testimony must now demonstrate to the court that it is more likely than not—a preponderance—that each admissibility requirement is satisfied. That language reaffirms, in the text of the rule, that the burden sits squarely on the party offering the expert. Second, the amendment makes explicit that the requirement that an opinion "reflect a reliable application of the principles and methods to the facts of the case" (Rule 702(d)) is an admissibility question for the judge, not merely a matter of weight for the jury. The Advisory Committee Note candidly explains why: many courts had been admitting shaky expert testimony on the reflexive theory that reliability "goes to the weight, not the admissibility," and the Committee found that this practice misstated the law. The amendment was designed to correct the drift and restore real gatekeeping. For anyone challenging a survey, that is a meaningful shift in the wind—the reflexive "goes to weight" rejoinder is harder to invoke for serious methodological defects than it was a decade ago.
The authoritative yardstick for evaluating a survey's reliability is the Reference Manual on Scientific Evidence, published by the Federal Judicial Center and the National Academies. Its "Reference Guide on Survey Research," authored by Shari Seidman Diamond and reproduced at pages 359 and following of the third edition, sets out the accepted criteria for a valid survey, and courts and experts alike treat it as the standard. Any serious Daubert analysis of a survey runs through Diamond's criteria. The leading treatise, McCarthy on Trademarks and Unfair Competition §§ 32:158 et seq., collects the survey case law and is the Reference Guide's natural companion. Keep both names in mind: when you see a court "evaluate" a survey, what it is usually doing is walking the survey through the Reference Guide and the case law, criterion by criterion.
Why Kumho Tire Is the Whole Ballgame for Surveys
A natural objection to applying Daubert to consumer surveys is that Daubert's original factors—testability, error rate, peer review—were written for laboratory science, and a survey of shoppers is not a chemistry experiment. There is no "error rate" for a well-worded question in the way there is for a blood-alcohol assay. So does Daubert even apply?
Kumho Tire answers that objection directly, and the answer governs everything that follows. The Court held that the trial judge's gatekeeping obligation extends to all expert testimony, whether based on scientific, technical, or other specialized knowledge. There is no carve-out for "soft" expertise; a survey expert is no more exempt than a tire-failure analyst (the expert at issue in Kumho itself) or a toxicologist. At the same time, Kumho recognized that the specific Daubert factors are not a rigid template to be mechanically applied to every field. Some factors fit some disciplines and not others, and the judge has "considerable leeway" to decide which reliability criteria are reasonable measures for the particular testimony at issue. Kumho Tire, 526 U.S. at 152. The gatekeeping obligation is constant; the tools for assessing reliability vary with the field.
For consumer surveys, this means two precise things. First, surveys are firmly inside the gatekeeping regime—the "it's just a survey, let the jury weigh it" argument does not exempt survey testimony from scrutiny, any more than calling something "engineering experience" exempted the tire expert in Kumho. Second, because surveys are a recognized social-science methodology with its own mature professional standards, the reliability inquiry is measured against those standards—the criteria collected in the Reference Guide and the trademark survey case law—rather than against the literal Daubert science factors. The operative question is whether the survey was designed and executed in accordance with accepted principles of survey research. Kumho thus does not dilute the scrutiny of surveys; it points it at the right target. A survey that departs materially from accepted survey-research practice is unreliable under Kumho just as surely as junk science is unreliable under Daubert. The discipline changes; the gatekeeper does not stand down.
What a Valid Trademark Survey Requires
You cannot understand how surveys are attacked until you understand what a sound one looks like, because every attack is, at bottom, an allegation that one of these requirements was violated. Drawing on the Reference Guide and the case law, a properly conducted trademark survey generally must satisfy a familiar set of criteria. The Manual for Complex Litigation (Fourth) § 11.493 distills them: a survey should properly define the target population, draw a representative sample from it, ask clear and non-leading questions, be conducted by a person with sufficient expertise using reliable procedures, report the data accurately, analyze it according to accepted statistical principles, and maintain objectivity throughout.
The proper universe. The survey must define and sample the population whose perceptions are legally relevant. In a typical forward-confusion case, that universe is the prospective purchasers of the junior user's (the defendant's) goods or services, because the legal question is whether those buyers are likely to be confused about source. Survey the wrong population—the senior user's existing customers, or the general public when the goods are specialized—and the results do not measure the relevant confusion, no matter how pristine the rest of the survey is. (Reverse-confusion cases flip the framing, as discussed below, but the discipline is the same: identify the buyers whose confusion the law cares about, then sample them.) From the correct universe, the survey must draw a representative sample of adequate size, so that the results can fairly be generalized to the population. A sample can be overinclusive (sweeping in people outside the target market) or underinclusive (excluding people who belong in it); both distort the result.
Non-leading questions and real-world stimuli. The survey must ask clear questions that do not suggest the answer or prime respondents to perceive a connection they would not perceive in the marketplace. Leading questions and "demand effects"—cues that telegraph what the surveyor wants to hear—are among the most common and most fatal defects. Equally important, the survey must replicate, as nearly as practicable, real marketplace conditions. The stimulus shown to respondents should resemble how consumers actually encounter the marks—with house brands, packaging, pricing context, and the clutter of competing products—rather than an artificial side-by-side comparison on a blank screen, which manufactures the very confusion it claims to detect.
A control. A sound survey isolates the confusion attributable to the challenged mark from background "noise"—guessing, pre-existing assumptions, and survey artifacts—by including a control group or control cell. The control is a stimulus identical to the test stimulus except that the accused element is swapped out. The difference between the test cell's confusion rate and the control cell's "noise" rate is the net confusion the survey actually measured. Without a control, a reported confusion figure is, in the view of many courts and survey scholars, uninterpretable: a 38% raw number tells you nothing if you do not know that 30% of respondents would have guessed "same company" no matter what they were shown.
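The subtraction described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the `Cell` class and `net_confusion` function are hypothetical names, and the respondent counts are invented so that the cells reproduce the 38% and 30% figures used in the example. It does not test statistical significance or sample adequacy, which a real survey analysis would also have to address.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    """One survey cell: total respondents and how many gave a 'confused' answer."""
    respondents: int
    confused: int

    @property
    def rate(self) -> float:
        if self.respondents <= 0:
            raise ValueError("a cell needs at least one respondent")
        return self.confused / self.respondents


def net_confusion(test: Cell, control: Cell) -> float:
    """Test-cell confusion rate minus the control cell's 'noise' rate."""
    return test.rate - control.rate


# Hypothetical counts, chosen only to match the percentages in the text.
test_cell = Cell(respondents=200, confused=76)     # 38% raw confusion
control_cell = Cell(respondents=200, confused=60)  # 30% background noise

net = net_confusion(test_cell, control_cell)
print(f"raw: {test_cell.rate:.0%}, noise: {control_cell.rate:.0%}, net: {net:.0%}")
```

On these assumed counts the headline 38% shrinks to a net figure of about 8%, which is why a raw number reported without a control tells the court so little.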
Clean administration and coding. Finally, the survey should be administered properly—ideally double-blind, so that neither the interviewer nor the respondent knows the survey's purpose or sponsor—and the data must be accurately recorded, coded, and analyzed, with open-ended responses interpreted objectively rather than massaged toward the desired result. Modern online surveys add their own quality-control demands: trap questions and timing checks to confirm that respondents actually read and engaged with the survey rather than clicking through. Each of these requirements is a potential point of failure, and each is therefore a potential ground for a Daubert challenge. For a deeper treatment of how to build a defensible survey from scratch, see our companion guide on consumer survey expert methodology in trademark cases.
A Word on Formats: Eveready, Squirt, and the Aided Hybrid
Two survey formats dominate confusion litigation, and knowing the difference is essential to both attacking and defending a survey, because using the wrong format for the marketplace at issue is itself a ground for challenge.
The Eveready format takes its name from Union Carbide Corp. v. Ever-Ready Inc., 531 F.2d 366, 385–88 (7th Cir. 1976). Its defining feature is that it does not show respondents the senior user's mark at all. The respondent is shown only the defendant's (junior) mark and asked, in effect, who puts out this product, and why they say so. Eveready is the gold standard for strong, well-known marks: if a mark is famous, a confused consumer should be able to name the senior user spontaneously, without being prompted, so the format tests genuine top-of-mind association rather than suggestion. It is the format least vulnerable to the charge of leading the witness, because nothing in the survey points the respondent toward the plaintiff.
The Squirt format takes its name from SquirtCo v. Seven-Up Co., 628 F.2d 1086 (8th Cir. 1980). Here the respondent is shown both parties' marks—often alongside others—and asked whether the products come from the same or different companies. Squirt is appropriate where consumers actually encounter the marks together or in close sequence in the real market (think two products on the same shelf), so that putting them side by side replicates reality rather than fabricating it. The danger is obvious: if consumers would never see the marks together, a Squirt survey can generate artificial confusion by creating a comparison the marketplace never presents. That is why format selection is itself a methodological judgment, and why a court will ask whether the chosen format fits the way buyers actually shop.
There is also an aided Eveready hybrid: it shows both parties' uses (a Squirt trait) but then tests for confusion with open-ended questions about the junior use (an Eveready trait), capturing some advantages of each. See Jerre B. Swann & R. Charles Henn Jr., Likelihood of Confusion Surveys: The Ever-Constant Eveready Format; The Ever-Evolving Squirt Format, 109 Trademark Rep. 671 (2019). The practical lesson is that there is no single "correct" survey design—only a design appropriate to the strength of the mark and the realities of the market. A challenger who can show that the proponent reached for Squirt to manufacture a comparison that consumers never face, or chose a format unsuited to the mark's fame, has the makings of a serious motion.
The Central Battleground: Weight Versus Admissibility
The single most important and most contested question in survey Daubert practice is whether a given flaw is serious enough to exclude the survey (admissibility) or merely something for the jury to discount after cross-examination (weight). The answer determines whether the survey reaches the jury at all, and courts have not always drawn the line in the same place.
For years, many courts—including some in the Second Circuit—defaulted to the proposition that technical and methodological deficiencies in a survey go to weight, not admissibility. Under this view, a survey with a questionable universe or imperfect questions would be admitted, and the opposing party would attack it through cross-examination and a competing expert, leaving the jury to decide how much to credit it. The intuition behind the default is genuine: surveys are rarely perfect, and the adversarial process is well suited to exposing flaws. The leading illustration is Fortune Dynamic, Inc. v. Victoria's Secret Stores Brand Management, Inc., 618 F.3d 1025 (9th Cir. 2010), where the district court had thrown out the plaintiff's confusion survey entirely—faulting it for comparing the products side by side, for being conducted online, for screening problems, and for being "highly suggestive"—and granted summary judgment of non-infringement. The Ninth Circuit reversed, holding that those criticisms went to the survey's weight, not its admissibility, and that the survey created a triable issue on likelihood of confusion. Fortune Dynamic is the case proponents cite when they want a survey to survive: real flaws, but flaws for the jury.
That default, however, has always had a limit, and the limit has grown sharper. Where a survey's flaws are fundamental—a wrong universe that measures the perceptions of the wrong population, leading questions that manufacture the very confusion being measured, the total absence of any control, a stimulus bearing no resemblance to marketplace reality—courts have held that the survey is so unreliable it should be excluded outright or given no weight, because the defects destroy the survey's probative value at its foundation rather than nibble at its margins. The line, imperfect but real, runs between flaws that affect how much a sound survey proves and flaws that mean the survey proves nothing about the relevant question. A confusion rate built on the wrong population is not a weak measurement of confusion; it is a measurement of something else entirely.
The 2023 amendment to Rule 702 has tilted this balance toward more rigorous gatekeeping. By making explicit that the reliable application of methodology is an admissibility question, and that the proponent must show admissibility by a preponderance, the amendment undercuts the reflexive "goes to weight" response for serious defects. A court today stands on firmer ground excluding a survey whose methodology does not reliably support its conclusion, and challengers increasingly frame their motions around the amended rule's text and the Advisory Committee's pointed critique of the old practice. The practical upshot: the "weight not admissibility" mantra remains valid for minor imperfections but is a weaker shield than it once was for fundamental ones. The honest framing for a 2026 litigant is that there is now a spectrum, with a meaningful and growing zone in the middle where a thoughtful judge can go either way—which is exactly where good lawyering decides the motion.
The Most Common Grounds for Excluding or Discounting a Survey
Survey challenges cluster around a recognizable set of defects. Mastering them is the heart of both attacking and defending a survey.
The wrong universe is the classic and often most powerful attack. If the survey sampled the wrong population—the senior user's customers instead of the junior user's prospective purchasers, or the general public instead of the specialized buyers who actually purchase the goods—the results do not measure legally relevant confusion. The Second Circuit has long treated selection of the proper universe as foundational, and a survey of the wrong population may be entitled to little or no weight. See Bristol-Myers Squibb Co. v. McNeil-P.P.C., Inc., 973 F.2d 1033 (2d Cir. 1992) (a Lanham Act case underscoring the centrality of the proper universe and the consequences of universe and methodology defects). A universe error is frequently fatal because it cannot be cured by cross-examination; the survey simply asked the wrong people, and no amount of jury argument turns the wrong people into the right ones.
Leading and suggestive questions are the next great vulnerability. A question that primes the respondent to assume a connection, or that hands over the "right" answer, generates confusion that exists only inside the survey instrument and nowhere in the marketplace. Courts have excluded or heavily discounted surveys whose questions telegraphed the desired response or created demand effects. The flip side is that a non-leading Eveready question—who puts out this product?—is far more defensible precisely because it supplies no cue. The challenge here often pairs with an expert's deposition admissions: if the proponent's own expert concedes that a question could be read as suggesting the connection, the motion writes itself.
The absence of a proper control is a third common ground, and one that survey scientists regard as close to disqualifying. Without a control group or cell, a survey cannot separate genuine confusion caused by the defendant's mark from background noise. A reported confusion rate with no control is, in the view of many courts and the Reference Guide, simply uninterpretable. Detailed Daubert analyses in the Southern District of New York have scrutinized control design closely. See Malletier v. Dooney & Bourke, Inc., 525 F. Supp. 2d 558 (S.D.N.Y. 2007) (rigorous, criterion-by-criterion examination of competing surveys, including control and methodology issues). Note the contrast with T-Mobile v. Aio, where the proponent's survey did employ a control (a brown color swapped in for magenta) and reported a net figure after subtracting it—one reason the court credited it. The presence or absence of a control is frequently the difference between a survey that survives and one that does not.
A mismatch between format and mark, or an artificial stimulus, is a fourth ground—the format problem discussed above, viewed through the lens of exclusion. Using a Squirt format where consumers never encounter the marks together, or constructing a stimulus that strips away each company's house branding and packaging, invites the charge that the survey measured something other than real-world confusion. Courts have excluded surveys whose stimuli or lineups bore little resemblance to marketplace conditions, while crediting surveys (like the one in T-Mobile) that took pains to replicate how consumers actually encounter the mark.
Sampling and execution defects round out the list: non-representative or too-small samples, interviewer bias, non-blind administration, failure to use quality-control screens in online surveys, and errors in coding or interpreting open-ended responses. Individually, some of these go only to weight—Fortune Dynamic shows that even "conducted online" and "imperfect screening" can be weight problems rather than admissibility problems. But in combination, or when severe, they can render a survey unreliable. The Southern District of New York has not hesitated to exclude confusion surveys riddled with methodological problems. See THOIP v. Walt Disney Co., 690 F. Supp. 2d 218 (S.D.N.Y. 2010) (excluding confusion-survey evidence on methodological grounds).
Two structural points tie these grounds together. First, one foundational flaw can be enough; you do not need a clean sweep. A fatal universe error sinks a survey even if everything else is flawless. Second, the combination matters: a survey with a borderline universe, a weak control, and a slightly leading question may survive each objection in isolation but fail under their cumulative weight, because at some point the court loses confidence that the number means anything. The disciplined challenger leads with the strongest foundational defect and uses the rest to show a pattern.
How the Second Circuit and EDNY Treat These Challenges
Because the Eastern District of New York applies Second Circuit law, the posture of the Circuit and the practice of the New York district courts are what matter most for litigants there, and the picture is one of rigorous but discerning scrutiny.
The Second Circuit's traditional framing is that survey defects often go to weight rather than admissibility, and the Circuit has cautioned against excluding surveys for imperfections the adversarial process can expose. See Schering Corp. v. Pfizer Inc., 189 F.3d 218 (2d Cir. 1999) (addressing the reliability and admissibility of survey evidence and treating surveys as a legitimate and often powerful form of proof). At the same time, the Circuit has consistently treated the proper universe as foundational and has recognized that a survey resting on fundamentally flawed methodology may be entitled to little or no probative weight, as Bristol-Myers Squibb reflects. The Circuit's stance is best understood not as a thumb on the scale for admission, but as a calibrated approach: minor flaws go to weight; fundamental flaws can be fatal. The Second Circuit's broader appellate posture in trademark matters—deferential to fact-bound determinations but exacting on questions of legal standard—reinforces this; we treat that posture at length in our guide to Second Circuit appellate standards in trademark cases.
The district courts within the Circuit—especially the Southern District of New York, whose trademark docket is among the largest in the country—have produced some of the most detailed Daubert survey analyses anywhere, and the Eastern District applies the same framework. Malletier v. Dooney & Bourke and THOIP v. Walt Disney show district courts willing to dissect surveys in depth and to exclude or sharply discount those with fundamental defects in universe, controls, questions, or stimulus. Denimafia Inc. v. New Balance Athletic Shoe, Inc., 2014 WL 814532 (S.D.N.Y. Mar. 3, 2014), shows the courts grappling capably with online surveys in a reverse-confusion posture, scrutinizing whether the survey expert correctly identified the relevant buyers (there, purchasers of premium denim) and replicated the actual online shopping environment. The lesson for practitioners in EDNY and SDNY is that these courts are sophisticated consumers of survey evidence. A methodologically sound survey will generally be admitted and its remaining quibbles left to the jury; a survey with a foundational defect faces a genuine risk of exclusion, especially under the post-2023 Rule 702 standard.
How aggressive, then, are these courts? The honest answer is selectively aggressive. They do not exclude surveys lightly for ordinary imperfections—the adversarial process remains the preferred crucible for those, and Fortune Dynamic's logic (though a Ninth Circuit case) reflects a widely shared instinct that weak surveys are often better cross-examined than excluded. But they will exclude, or strip of weight, surveys whose defects reach the foundation, and they have the methodological fluency to spot such defects without being told. A challenger who can show a fundamental flaw has a real path to exclusion; a challenger nitpicking a basically sound survey will usually be told to save it for the jury.
Can a Damages Expert Be Excluded for Failing to Apportion?
The gatekeeping framework is not confined to survey experts; it governs damages experts as well. And the answer to whether a damages expert can be excluded for failing to apportion is yes—through the same reliability and fit requirements, reinforced by Joiner's analytical-gap principle.
As we explain in damages apportionment in trademark cases, a trademark plaintiff may recover only the profits or damages attributable to the infringement, not the defendant's entire enterprise revenue. A damages expert whose model ignores that boundary—who builds a damages figure on company-wide revenue, sweeps in non-infringing product lines and unrelated business segments, or assumes without support that all of the defendant's profits flow from the infringing mark—offers an opinion untethered from the legally relevant facts. Such a model is vulnerable on two Rule 702 grounds at once. It lacks fit, because it does not measure the legally recoverable harm. And it fails the reliability and analytical-gap tests, because there is too great a gap between the data (total revenue) and the conclusion (damages caused by infringement); the apportionment assumption is ipse dixit rather than the product of reliable methodology. Joiner is the operative authority here, and it travels well from its toxic-tort origins to a damages spreadsheet: a court need not admit a number "connected to existing data only by the ipse dixit of the expert."
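To make the "analytical gap" concrete, the sketch below contrasts a company-wide revenue figure with a figure limited to accused product lines and scaled by an attribution share. Everything in it is hypothetical: the `ProductLine` class, the function names, the dollar amounts, and especially the `mark_share` values, which in a real case are exactly what the expert would have to support with reliable methodology. It is not a damages model, and it takes no position on which party bears the burden on any input.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProductLine:
    name: str
    revenue: float           # gross sales for the period (hypothetical)
    costs: float             # deductible costs tied to those sales (hypothetical)
    accused: bool            # True if the line was sold under the challenged mark
    mark_share: float = 0.0  # ASSUMED fraction of profit attributable to the mark


def unapportioned_figure(lines):
    """Company-wide revenue: the kind of starting number the text criticizes."""
    return sum(line.revenue for line in lines)


def apportioned_profit(lines):
    """Profit on accused lines only, scaled by each line's assumed attribution share."""
    total = 0.0
    for line in lines:
        if not line.accused:
            continue  # non-infringing lines are left out entirely
        if not 0.0 <= line.mark_share <= 1.0:
            raise ValueError(f"mark_share for {line.name!r} must be between 0 and 1")
        total += (line.revenue - line.costs) * line.mark_share
    return total


# Invented numbers for illustration only.
lines = [
    ProductLine("accused jackets", revenue=1_000_000, costs=600_000,
                accused=True, mark_share=0.25),
    ProductLine("unrelated footwear", revenue=4_000_000, costs=3_000_000,
                accused=False),
]

print(f"unapportioned: {unapportioned_figure(lines):,.0f}")
print(f"apportioned:   {apportioned_profit(lines):,.0f}")
```

With these assumed inputs the two figures differ by a factor of fifty. The point of the comparison is that each step between them (isolating accused sales, deducting costs, choosing an attribution share) is a place where the expert must show method rather than assertion.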
Courts therefore exclude, or sharply cabin, damages experts whose opinions rest on unapportioned or speculative models, just as they exclude survey experts whose surveys measure the wrong thing. The challenge typically argues that the expert failed to isolate infringement-related sales, applied an unsupported allocation (or none at all), and produced a figure that no reliable methodology connects to the wrong. Because the proponent now bears the explicit burden under amended Rule 702 to show reliable application by a preponderance, a damages expert who cannot defend the causal link between the infringement and the claimed figure is at serious risk. These principles dovetail with the recovery rules we discuss in Lanham Act attorneys' fees under 15 U.S.C. § 1117(a), the same subsection that governs awards of profits and damages.
There is a second, independent path to defeating a damages opinion, and a careful challenger raises both. If the expert's methodology or underlying data was not properly disclosed under Rule 26(a)(2), the testimony may be precluded under Rule 37(c)(1) regardless of how sound the methodology is. That is the "self-executing" sanction we analyze in Rule 37(c)(1) preclusion of undisclosed evidence. A damages opinion can thus fall two ways: excluded under Daubert for unreliability, or precluded under Rule 37(c)(1) for nondisclosure. The same dual structure applies to survey experts: an undisclosed survey backup file or a late-produced data set can be precluded even if the survey itself would have survived Daubert. The two motions are friends, not substitutes.
Technical and Qualification Challenges
The third category of expert challenge—aimed at technical experts and at qualifications generally—flows directly from Kumho Tire. Because gatekeeping reaches all expert testimony, a party may challenge not only a survey's methodology but also an expert's basic qualification to offer the opinion at all, and the reliability of any technical opinion the expert gives.
A qualification challenge argues that the witness lacks the knowledge, skill, experience, training, or education to opine on the subject. In the survey context, the Reference Guide is explicit that a survey expert must understand and apply best practices in sampling, questionnaire design and construction, and statistical analysis. A marketing generalist who has never run a probability sample, or a statistician with no exposure to questionnaire design, may be vulnerable—not because they lack credentials in some field, but because they lack the particular competence the opinion requires. A reliability challenge to a technical expert argues that, whatever the witness's pedigree, the specific opinion rests on no discernible methodology—pure ipse dixit—or on a methodology not reliably applied to the facts. Kumho makes clear that such experts get no pass simply because their field is not "scientific"; the court must still satisfy itself that the opinion is the product of reliable principles reliably applied, and Joiner supplies the tool when it is not.
In trademark cases, technical and qualification challenges arise around experts on marketplace conditions, financial damages, linguistics (especially in foreign-equivalents and translation disputes), and industry custom. The disciplined challenger frames the motion around the specific gap—this witness is not qualified in this discipline; this opinion rests on no stated method; this conclusion does not follow from the data offered—rather than a generic complaint that the expert is biased or wrong, which goes to weight and will be sent to the jury.
Building and Defending the Daubert Challenge
The anatomy of a survey Daubert fight is worth understanding from both sides, because the procedure shapes the strategy.
The challenge is ordinarily raised by a motion in limine to exclude under Rule 702 and Daubert; a Daubert motion is simply a species of motion in limine aimed at expert testimony. It is frequently filed alongside or in coordination with summary judgment, because excluding a survey can be the very move that wins summary judgment on the confusion issue (the dynamic we explore in Polaroid factors on summary judgment). District courts derive their authority to rule on these motions from Rules 104 and 103, and from their inherent power to manage trials. See Luce v. United States, 469 U.S. 38, 41 n.4 (1984). The court may decide the motion on the papers or hold a Daubert hearing, sometimes with live expert testimony. In ruling, it can exclude evidence that is clearly inadmissible, issue a preliminary ruling subject to revision at trial, or defer ruling until trial, when context will sharpen the question. The proponent bears the burden, now explicitly a preponderance under amended Rule 702, to establish that the expert's methodology is reliable and fits the issues.
A seasoned challenger reaches for more than Rule 702 alone. The same survey can often be attacked under Rule 403, on the ground that the danger of unfair prejudice, confusion of the issues, or misleading the jury substantially outweighs its probative value. That argument is particularly apt when a survey's pseudo-precise percentage threatens to overawe the jury, as we discuss in Federal Rule of Evidence 403 and unfair prejudice. The survey can also be attacked under Rule 401 for irrelevance, where it measures something the case does not turn on. And, as noted, Rule 37(c)(1) offers an independent disclosure-based ground. The well-built motion stacks these: lead with the foundational Rule 702 defect, add Rule 403 for the prejudice of a flashy-but-unreliable number, and reserve Rule 37(c)(1) for any disclosure gap. A judge inclined to give the survey a pass on one theory may exclude or limit it on another.
For the challenger, the disciplined approach is to target the foundation, not the margins: attack the universe, the control, the questions, the stimulus, the format choice, and the fit, framing each as a reliability or relevance failure under the amended rule rather than a mere imperfection. Marshal the Reference Guide's criteria and the survey case law to show departure from accepted practice, and retain a rebuttal survey expert to articulate why the defects are fundamental—ideally by extracting concessions from the proponent's own expert at deposition. Resist the temptation to throw every minor quibble at the survey; courts respect a focused attack on a genuine foundational flaw and discount a scattershot one (Fortune Dynamic is a cautionary tale for over-attacking).
For the proponent, the defense begins long before the motion—at the survey's design. A survey built from the start in conformity with the Reference Guide (correct universe, representative sample, non-leading questions in the right format, a properly designed control, realistic stimulus, double-blind administration, clean coding, online quality-control screens) is far more likely to survive. When the challenge comes, the proponent should meet it head-on by demonstrating compliance with accepted methodology and by characterizing the opponent's criticisms as ordinary imperfections that go to weight, not admissibility—citing Fortune Dynamic and Schering—while reserving for the jury the debate over how much the survey proves. The proponent should also be candid about limitations; an expert who overstates what the survey shows invites an analytical-gap exclusion under Joiner. And the proponent must keep the discovery house in order: a fully disclosed methodology and a complete production of backup data deprive the challenger of the Rule 37(c)(1) ground.
A Worked Example: The Confusion Survey
Consider an invented dispute. (This hypothetical is illustrative only and does not describe any real case.) "Northwind," a maker of premium outdoor jackets, sues "North Wind Gear," a startup selling similar jackets under a similar name, for trademark infringement in the Eastern District of New York. Northwind retains a survey expert who reports a 38% confusion rate. North Wind Gear moves to exclude the survey under Daubert, Rule 403, and Rule 37(c)(1).
The motion targets the foundation. First, universe: the survey sampled past purchasers of Northwind's own jackets rather than prospective purchasers of North Wind Gear's jackets, so it measured whether the senior user's existing customers recognize a similar name—not whether the junior user's buyers are likely to be confused about source. Under Bristol-Myers Squibb, that is a foundational universe error. Second, leading questions: rather than the open-ended Eveready inquiry ("who puts out this jacket?"), the survey asked whether the two jackets "come from the same company," priming a connection and supplying the answer. Third, no control: the survey reported 38% confusion with no control cell, so there is no way to know how much of that figure is genuine confusion versus guessing and noise—the very interpretability problem that doomed the surveys scrutinized in Malletier. Fourth, stimulus: respondents saw the two marks side by side on a blank screen, stripped of each company's distinct house branding and packaging—conditions that exist nowhere in the actual marketplace, and a misuse of the Squirt comparison where these jackets are never shelved together. Fifth, disclosure: the expert's underlying data file and verbatim coding were never produced, supplying a Rule 37(c)(1) hook.
How would an EDNY court likely treat this? The universe error alone is serious and arguably fatal, because the survey asked the wrong population; cross-examination cannot fix a survey that measured the wrong people's perceptions. The absence of a control compounds the problem by rendering the 38% figure uninterpretable, and the leading question and artificial stimulus suggest the figure was inflated by the instrument itself. Under the post-2023 Rule 702 standard, with the proponent bearing the burden to show reliable application by a preponderance, this survey faces a real risk of exclusion, not merely discounting. And if it is excluded, Northwind may have little admissible proof of actual confusion, weakening its position on that Polaroid factor at summary judgment—the leverage point that makes the Daubert motion worth filing.
Now change the facts and watch the result flip. Suppose the survey sampled prospective purchasers of North Wind Gear's jackets (correct universe), used a non-leading Eveready question appropriate to a reasonably strong mark, employed a properly designed control cell and reported a net confusion rate after subtracting it, showed the marks as consumers actually encounter them, was administered double-blind with timing and trap-question screens, and was fully disclosed in discovery. Now the challenger's remaining criticisms—perhaps about sample size or a single coding judgment—look like the ordinary imperfections that Fortune Dynamic consigns to the jury. A court would likely admit the survey and leave those quibbles to cross-examination. The methodology, not the topic, determines the outcome.
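The "net confusion rate" in the revised hypothetical is simple arithmetic, and a short sketch may make it concrete. The Python below is illustrative only: the class and function names and every respondent count are assumptions invented for this example, chosen so that the test cell reproduces the 38% gross figure. It is not a description of how any particular expert computes or reports results.

```python
from dataclasses import dataclass


@dataclass
class SurveyCell:
    """One arm of a confusion survey (test or control)."""
    respondents: int  # qualified respondents who completed the cell
    confused: int     # respondents whose answers were coded as confused

    @property
    def rate(self) -> float:
        if self.respondents <= 0:
            raise ValueError("cell has no respondents")
        return self.confused / self.respondents


def net_confusion(test: SurveyCell, control: SurveyCell) -> float:
    """Test-cell rate minus control-cell rate (the guessing-and-noise baseline)."""
    return test.rate - control.rate


# Hypothetical counts, invented for illustration only.
test_cell = SurveyCell(respondents=200, confused=76)     # 76/200 = 38% gross
control_cell = SurveyCell(respondents=200, confused=24)  # assumed noise level

print(f"gross confusion:   {test_cell.rate:.0%}")
print(f"control confusion: {control_cell.rate:.0%}")
print(f"net confusion:     {net_confusion(test_cell, control_cell):.0%}")
```

Without the control cell, only the first printed figure exists, and nothing in it separates genuine confusion from guessing. That is the interpretability gap the first version of the survey could not close.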
A Second Worked Example: The Unapportioned Damages Model
(Again hypothetical.) Suppose Northwind wins on liability and offers a damages expert who testifies that North Wind Gear must disgorge $9 million—the startup's entire gross revenue over the infringement period. On cross, the expert concedes that North Wind Gear sold jackets, boots, and unrelated camping equipment; that only the jacket line bore the accused mark; and that he made no attempt to separate jacket revenue from the rest, reasoning that "the brand drove the whole business." North Wind Gear moves to exclude.
This opinion is exposed on both Rule 702 prongs. It does not fit the law, because § 1117 permits recovery only of profits attributable to the infringement, not enterprise-wide revenue. And it fails Joiner: the leap from total revenue to infringement-caused damages rests on the expert's unsupported say-so that "the brand drove the whole business"—textbook ipse dixit, an analytical gap a court may refuse to bridge. Under amended Rule 702's preponderance standard, the proponent cannot carry the burden of showing a reliable application of accepted damages methodology to these facts. A court would likely exclude the opinion, or at minimum strike the portion built on non-jacket revenue, forcing Northwind to come forward with an apportioned figure. The takeaway mirrors the survey side: a damages model is only as admissible as its tether to the legally recoverable harm.
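The gap between the expert's figure and a tethered one can be sketched in a few lines. In the Python below, only the $9 million total comes from the hypothetical; the split across product lines and the function name are assumptions invented for illustration. The sketch isolates revenue only. It does not model the further steps (deducting costs, or allocating within the jacket line) that a real damages analysis would require.

```python
# Hypothetical revenue by product line. Only the $9 million total comes
# from the worked example; the split is invented for illustration.
revenue_by_line = {
    "jackets": 3_500_000,
    "boots": 2_500_000,
    "camping equipment": 3_000_000,
}
accused_lines = {"jackets"}  # only the jacket line bore the accused mark


def apportioned_base(revenue: dict[str, int], accused: set[str]) -> int:
    """Sum revenue only for product lines that bore the accused mark."""
    unknown = accused - set(revenue)
    if unknown:
        raise KeyError(f"no revenue data for: {sorted(unknown)}")
    return sum(amount for line, amount in revenue.items() if line in accused)


total = sum(revenue_by_line.values())
base = apportioned_base(revenue_by_line, accused_lines)

print(f"unapportioned figure: ${total:,}")
print(f"jacket-line revenue:  ${base:,}")
print(f"share outside the accused line: {(total - base) / total:.0%}")
```

On these assumed numbers, most of the expert's figure comes from product lines that never bore the mark. That portion is the part a court would most likely strike.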
Practical Takeaways
For the challenger, the path to exclusion runs through the foundation, not the margins. Identify the specific, fundamental defect—wrong universe, leading questions, missing control, artificial stimulus or mismatched format, or a damages model untethered from infringement-related sales—and frame it as a reliability or fit failure under amended Rule 702, invoking the proponent's preponderance burden and Joiner's analytical-gap principle. Stack independent grounds: add Rule 403 where a pseudo-precise number threatens to overawe the jury, and Rule 37(c)(1) where the methodology or data was not disclosed. Retain a rebuttal expert to explain why the defect is foundational rather than cosmetic, ground the attack in the Reference Guide and the survey case law, and coordinate the motion with summary judgment where exclusion would carry the issue. Avoid the scattershot motion; a focused attack on a genuine foundational flaw is far more persuasive than a list of minor complaints, and Fortune Dynamic shows what happens to over-aggressive ones.
For the proponent, the best defense is built at the design stage. Commission surveys that conform from the outset to the accepted criteria—correct universe, representative sample, non-leading questions in a format suited to the mark and market, a properly designed control reported as a net figure, realistic stimulus, double-blind administration with quality-control screens, objective coding—and retain experts who will not overstate what the data show. When the challenge arrives, demonstrate methodological compliance, characterize the opponent's criticisms as weight-not-admissibility imperfections under Fortune Dynamic and Schering, and be ready for a Daubert hearing. Disclose the methodology and data fully in discovery so the testimony is not lost on the independent ground of nondisclosure. And tailor damages models to recoverable, infringement-related harm, so the opinion fits the law and survives the analytical-gap test.
For both sides, the unifying principle is that gatekeeping is real and, after the 2023 amendment, more demanding than the old "goes to weight" reflex suggested. Kumho places surveys squarely within the gatekeeping regime; the Reference Guide and the case law supply the yardstick; and the courts of the Second Circuit and EDNY apply that yardstick with sophistication—admitting sound surveys and excluding fundamentally flawed ones. A survey is only as valuable as its methodology is defensible, and the Daubert motion is where that methodology is tested. Master the criteria, attack or defend the foundation, and the most powerful evidence in a trademark case becomes either a decisive asset or a decisive vulnerability—depending entirely on how well it was built.
Frequently Asked Questions
Is a consumer survey required to win a trademark case? No. Likelihood of confusion can be proved through the full multi-factor analysis—mark similarity, proximity of goods, the senior mark's strength, the defendant's intent, evidence of actual confusion, and more—without any survey at all. A survey is often the most direct evidence of confusion, but it is one input, not a prerequisite. That said, in some contexts a party with the resources to run a survey who chooses not to may face an adverse inference that the survey would have been unfavorable, so the decision not to survey is itself strategic.
What is the single most common reason trademark surveys get excluded? A wrong or poorly defined universe. Because the legal question targets a specific population—usually the junior user's prospective purchasers in a forward-confusion case—a survey that samples the wrong people measures the wrong thing, and that defect generally cannot be cured by cross-examination. Missing controls and leading questions are close behind.
Does the 2023 amendment to Rule 702 really change anything? Yes, in emphasis and in litigating posture. The amendment did not invent new requirements, but it made two things textually explicit: the proponent must satisfy each admissibility requirement by a preponderance, and the reliable application of methodology is a question for the judge, not the jury. The Advisory Committee expressly criticized courts that had been admitting shaky experts on a "goes to weight" rationale. The practical effect is that challengers can now press harder on serious methodological defects, and the reflexive "weight not admissibility" answer is a weaker shield than it was.
What is the difference between an Eveready and a Squirt survey, and why does it matter? An Eveready survey shows respondents only the defendant's mark and asks who makes the product, testing spontaneous association—ideal for strong, well-known marks. A Squirt survey shows both marks and asks whether they come from the same company—appropriate only where consumers actually encounter the marks together in the market. Using a Squirt format where the marks are never seen side by side can manufacture artificial confusion, which is itself a ground for challenge. Choosing the right format is a methodological decision a court will scrutinize.
Can a damages expert be excluded just for failing to apportion? Yes. A damages model built on the defendant's entire revenue, without isolating the sales attributable to the infringement, fails Rule 702 on two fronts: it does not fit the law (which limits recovery to infringement-related profits), and it fails Joiner's analytical-gap test because the leap from total revenue to infringement damages rests on the expert's unsupported assertion. Such opinions are routinely excluded or sharply limited.
Are there grounds to exclude a survey beyond Rule 702? Yes. Rule 403 lets a court exclude a survey whose pseudo-precise number threatens unfair prejudice or jury confusion that substantially outweighs its probative value. Rule 401 addresses surveys that measure something irrelevant to the case. And Rule 37(c)(1) precludes survey testimony whose methodology or underlying data was not properly disclosed in discovery—an independent ground that can defeat even a methodologically sound survey. The strongest motions stack these theories.
How do online surveys fare under Daubert? Generally well, when properly designed. Courts assess online surveys by the same criteria as in-person ones and do not treat the online format as a flaw in itself. See Fortune Dynamic, 618 F.3d 1025; T-Mobile v. Aio, 991 F. Supp. 2d 888. What matters is universe, control, question quality, realistic stimulus, and quality-control measures (such as trap questions and timing checks) to confirm that respondents actually engaged with the survey.
Related Articles
- Consumer Survey Expert Methodology in Trademark Cases — how to design and execute a defensible survey, including the Eveready and Squirt formats in depth.
- Polaroid Factors on Summary Judgment in the Second Circuit — how survey evidence feeds the actual-confusion and strength factors, and how exclusion can carry summary judgment.
- Navigating the Maze of Trademark Confusion — the substantive likelihood-of-confusion test surveys are offered to prove.
- Damages Apportionment in Trademark Cases — why an unapportioned damages model is vulnerable to exclusion.
- Federal Rule of Evidence 403 and Unfair Prejudice — the second-line ground for excluding a flashy-but-unreliable survey.
- Rule 37(c)(1) Preclusion of Undisclosed Evidence — the independent ground on which an expert's undisclosed methodology or data can be precluded.
- Second Circuit Appellate Standards in Trademark Cases — how appellate review treats survey and confusion findings.
- Lanham Act Attorneys' Fees Under 15 U.S.C. § 1117(a) — the monetary-relief framework that damages experts must fit.
- Bench Trial vs. Jury Trial Issues in Trademark Litigation — how the trier of fact affects the strategy around survey evidence.
This article is provided for general informational purposes and does not constitute legal advice. The admissibility of expert testimony is fact-specific and committed to the sound discretion of the trial court; consult qualified litigation counsel about any particular matter.