Trademark Consumer Survey Design: A Practical Checklist

A trademark survey's value lives entirely in its methodology: the universe must be right, the format must fit, the questions must be neutral, the administration must be blind, the conditions realistic, and the confusion netted against a control before any percentage means anything. This checklist tracks the design choices in the order they matter—which is also the order a cross-examiner attacks them. For the doctrine, see Consumer Survey Expert Methodology in Trademark Cases; for admissibility, Daubert Challenges to Consumer Survey Experts.

Phase 1 — Decide whether to commission a survey at all

Confirm the issue genuinely turns on consumer perception.
Confirm the budget and schedule permit a rigorous design (often a five- or six-figure, multi-month undertaking).
Confirm a competent expert believes a defensible survey can actually be built for these marks and this market.
Retain an experienced, independent survey expert early.

WHY / traps. A mediocre survey is often worse than none: it hands the opponent a Daubert target and may produce a number that undercuts the party that paid for it. A survey is not mandatory; likelihood of confusion can be proven through the other factors and through evidence of actual confusion in the marketplace. If a credible expert cannot in good conscience design a survey that finds what you hope, that is itself useful information about the case.

Phase 2 — Define the universe (the load-bearing decision)

Identify the theory of confusion: forward, reverse, sponsorship, etc.
For forward confusion, set the universe to prospective purchasers of the junior user's (defendant's) goods/services.
For reverse confusion, flip the universe to prospective purchasers of the senior user's goods/services.
For fame/dilution, use the broad general consuming public; for secondary meaning, the relevant purchasers.
Draft screening (filter) questions that actually capture that universe—right product category, timeframe, and price range.

WHY / traps. The wrong universe is usually fatal and cannot be cured by cross-examination or argument: you measured the wrong people. Two named errors: over-inclusive (sweeps in non-purchasers, diluting and distorting the result) and under-inclusive (excludes relevant purchasers, so the result cannot be generalized). The format does not change the universe; a Squirt survey on the wrong universe is exactly as defective as an Eveready one. Screeners that are too loose or too tight are a back door through which a universe problem walks right back in.
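
To make the screener point concrete, here is a minimal Python sketch of how three screening conditions (category, timeframe, price range) decide who enters the universe. Everything in it is an assumption for illustration: the `Respondent` fields, the `qualifies` function, the six-month window, and the price band are hypothetical and would be set by the expert for the actual market.

```python
from dataclasses import dataclass

@dataclass
class Respondent:
    # All fields are hypothetical screener answers, not a standard schema.
    plans_category_purchase: bool   # intends to buy in the relevant product category
    months_until_purchase: int      # self-reported purchase timeframe
    expected_price: float           # what the respondent expects to pay

def qualifies(r: Respondent, max_months: int, price_min: float, price_max: float) -> bool:
    """Admit a respondent only if all three screener conditions are met."""
    return (
        r.plans_category_purchase
        and r.months_until_purchase <= max_months
        and price_min <= r.expected_price <= price_max
    )

panel = [
    Respondent(True, 3, 45.0),    # in category, in window, in price band
    Respondent(True, 18, 45.0),   # purchase too far off
    Respondent(False, 3, 45.0),   # not a prospective purchaser of the category
    Respondent(True, 3, 400.0),   # shops a different price tier
]
universe = [r for r in panel if qualifies(r, max_months=6, price_min=20.0, price_max=80.0)]
print(f"{len(universe)} of {len(panel)} respondents qualify")
```

Loosening any one threshold admits people outside the relevant market (over-inclusive); tightening it drops people who belong (under-inclusive). The sketch only shows where those choices live, not what the right values are.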

Phase 3 — Choose the format

Choose Eveready when the senior mark is strong/famous: show only the junior mark/product and ask open-ended questions (who puts this out, what makes you say so, what else do they make, do they need permission).
Choose Squirt when the senior mark is weaker or consumers genuinely encounter both marks together: show both marks and ask a neutral same-source/affiliation question.
For a Squirt design, build a lineup/array that salts the test marks among unrelated marks to avoid a naked two-way face-off.
Match the format to the mark's strength and the way consumers actually encounter the marks; do not default to habit.

WHY / traps. Eveready relies on memory and minimizes suggestion, which makes it the gold standard for strong marks, but run on an obscure mark it under-counts confusion because respondents have never heard of the plaintiff. Squirt risks suggestion: juxtaposing two marks nudges respondents to find a relationship no shopper would experience. If the expert for a plainly famous mark chose Squirt, ask why; the answer is sometimes that Eveready showed too little confusion.

Phase 4 — Build the control and plan the net

Add a control cell identical to the test cell in every respect except the allegedly infringing feature (a plainly non-infringing mark or the feature altered/removed).
Plan to report net confusion = test cell − control cell, not the raw figure.
Make the control especially robust for Squirt designs, where juxtaposition inflates raw confusion.

WHY / traps. Without a control a reported confusion percentage is essentially uninterpretable—every survey generates background "noise" (guessing, pre-existing beliefs, instrument artifacts). The first question a sophisticated reader asks is not "what was the rate?" but "what was the control, and what is the net?" A Squirt survey with no control is one of the most exploitable artifacts in trademark litigation.
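
The netting arithmetic is simple enough to sketch. The Python fragment below is an illustration only: the function name `net_confusion` and the respondent counts are hypothetical, chosen to show how a raw test-cell rate shrinks once the control-cell rate is subtracted.

```python
def net_confusion(test_confused: int, test_n: int,
                  control_confused: int, control_n: int) -> dict:
    """Return the raw test rate, the control rate, and net confusion as proportions."""
    if test_n <= 0 or control_n <= 0:
        raise ValueError("each cell needs at least one respondent")
    test_rate = test_confused / test_n
    control_rate = control_confused / control_n
    return {
        "test_rate": test_rate,
        "control_rate": control_rate,
        "net": test_rate - control_rate,
    }

# Hypothetical counts: 66 of 200 test respondents and 24 of 200 control
# respondents gave a response coded as confused.
result = net_confusion(66, 200, 24, 200)
print(f"raw {result['test_rate']:.0%}, control {result['control_rate']:.0%}, "
      f"net {result['net']:.0%}")
```

With these made-up counts, a raw 33% figure nets down to 21% once the 12% of control-cell noise is removed, which is the number a sophisticated reader will ask for.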

Phase 5 — Draft questions, stimuli, and administration

Use open-ended, non-leading questions; avoid demand effects that telegraph the desired answer.
Offer a genuine "don't know / no opinion" option so uncertainty is not coded as confusion.
Rotate stimulus and answer-choice order to neutralize order effects.
Probe spontaneous answers neutrally ("What makes you say that?").
Replicate marketplace realism: show full trade dress, packaging, color, and point-of-sale context—not bare word marks on a white screen.
Administer double-blind: neither interviewer nor respondent knows the sponsor or purpose.

WHY / traps. Leading questions and demand effects generate confusion that exists only in the instrument. Single-blind is not enough—an interviewer who knows the "right" answer can signal it unconsciously. An artificial side-by-side no shopper would ever see measures something other than real-world confusion; this has special bite in trade dress cases.
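
Order rotation is often implemented by randomizing the order for each respondent. The sketch below shows one way to do that reproducibly, so the order each respondent saw can be reconstructed later. The function `rotated_order`, the `study_seed` value, and the answer choices are hypothetical; keeping the "don't know" option fixed at the end is an assumption of this sketch, not a rule stated above.

```python
import random

def rotated_order(items: list[str], respondent_id: int, study_seed: int = 2024) -> list[str]:
    """Return a shuffled copy of items that is reproducible for a given respondent."""
    # A dedicated generator per respondent keeps the order auditable.
    rng = random.Random(study_seed * 1_000_003 + respondent_id)
    shuffled = list(items)  # copy, so the master list is never mutated
    rng.shuffle(shuffled)
    return shuffled

# Hypothetical closed-ended answer choices for a same-source question.
choices = ["Same company", "Affiliated companies", "Unrelated companies"]
for rid in range(3):
    print(rid, rotated_order(choices, rid) + ["Don't know / no opinion"])
```

The same approach can be applied to the position of the test mark within a lineup; recording `respondent_id` with each response is what makes the rotation verifiable afterward.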

Phase 6 — Sample, code, and report

Choose the sampling method (probability ideal; non-probability online panel or mall intercept acceptable if properly executed and justified).
Size each cell (test and control) for a usable margin of error; report cell sizes, margin of error, and confidence level (commonly 95%).
Code open-ended responses objectively—blind coders, a written protocol, an inter-coder reliability check—and preserve and produce the verbatims.
Document the design rationale in writing and build the survey to survive a Daubert/Rule 702 challenge from day one.
Interpret the result with the net figure in mind (≈15%+ supports confusion; below ≈10% tends to negate; 10–15% is a gray zone), as one factor among the Polaroid/Sleekcraft set.
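
As a rough illustration of the sizing step, the following Python sketch computes a normal-approximation margin of error for a single cell and for the net (test minus control) figure. It assumes simple random sampling and independent cells, which a non-probability panel or mall intercept does not strictly satisfy, so treat the output as an approximation. The function names and the 33%/12%/200-respondent figures are hypothetical.

```python
import math
from statistics import NormalDist

def z_value(confidence: float = 0.95) -> float:
    """Two-sided critical value of the standard normal distribution."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)

def margin_of_error(p: float, n: int, confidence: float = 0.95) -> float:
    """Normal-approximation margin of error for one cell's proportion."""
    return z_value(confidence) * math.sqrt(p * (1 - p) / n)

def net_margin_of_error(p_test: float, n_test: int,
                        p_control: float, n_control: int,
                        confidence: float = 0.95) -> float:
    """Margin of error for the difference between two independent cells."""
    se = math.sqrt(p_test * (1 - p_test) / n_test
                   + p_control * (1 - p_control) / n_control)
    return z_value(confidence) * se

# Hypothetical cells: 33% confusion among 200 test respondents,
# 12% among 200 control respondents.
moe_test = margin_of_error(0.33, 200)
moe_net = net_margin_of_error(0.33, 200, 0.12, 200)
print(f"test cell: 33% +/- {moe_test:.1%}")
print(f"net:       21% +/- {moe_net:.1%}")
```

Note that the margin on the net figure is wider than the margin on either cell alone, because it carries the sampling error of both cells. That is one reason the control cell needs to be sized as carefully as the test cell.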

WHY / traps. Too small a sample yields a margin of error so wide the number is meaningless. Coding is where mischief hides—"I'm not really sure" can be coded as confused or uncertain depending on who holds the pen. A report that will not produce its verbatims is signaling something. Under amended Rule 702 (2023), the proponent must establish reliability by a preponderance and the opinion must reflect a reliable application of the method—fundamental defects (wrong universe, no control, leading questions, unrealistic stimulus) increasingly draw exclusion rather than mere "weight."
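
The checklist calls for an inter-coder reliability check without naming a statistic. One common choice is Cohen's kappa, which discounts the agreement two coders would reach by chance; using it here is an assumption of this sketch, and the labels and codings below are invented for illustration.

```python
from collections import Counter

def cohens_kappa(coder_a: list[str], coder_b: list[str]) -> float:
    """Cohen's kappa for two coders labeling the same set of responses."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("coders must label the same non-empty set of responses")
    n = len(coder_a)
    # Share of responses on which the two coders agree.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Agreement expected by chance, from each coder's label frequencies.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # convention for the degenerate case: one identical label throughout
    return (observed - expected) / (1 - expected)

# Hypothetical codings of ten verbatims by two blind coders.
a = ["confused", "confused", "not", "not", "unsure",
     "confused", "not", "not", "unsure", "not"]
b = ["confused", "not", "not", "not", "unsure",
     "confused", "not", "not", "confused", "not"]
kappa = cohens_kappa(a, b)
print(f"kappa = {kappa:.2f}")
```

In this made-up example the coders agree on 8 of 10 verbatims, but kappa comes out near 0.66 because some of that agreement would occur by chance. The two disagreements are exactly the borderline verbatims worth producing and reviewing.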

Surveys for other issues

Secondary meaning: test whether a descriptive term identifies a single source; control for respondents who would associate any term with one source.
Fame (dilution): sample the general consuming public, not a niche buyer pool.
Genericness — Teflon: tutor respondents on brand vs. common names, then have them classify a salted list (courts generally prefer this for cleaner data).
Genericness — Thermos: ask what respondents would call the product to capture spontaneous generic use.

WHY / traps. Survey methodology is a single discipline applied to different questions—each format must still satisfy the same universe, neutrality, execution, and coding standards.

Common mistakes

The wrong universe (the single most common fatal flaw).
No control, so the raw figure overstates confusion.
Wrong format for the mark's strength.
Leading questions and demand effects.
Artificial side-by-side stimuli divorced from the marketplace.
Single-blind (or unblinded) administration.
Tiny samples and unreported margins of error.
Subjective coding and withheld verbatims.

Primary authority

Rules: Fed. R. Evid. 702 (as amended 2023) and 703; Daubert v. Merrell Dow Pharms., Inc., 509 U.S. 579 (1993); Kumho Tire Co. v. Carmichael, 526 U.S. 137 (1999); Gen. Elec. Co. v. Joiner, 522 U.S. 136 (1997).
Formats: Union Carbide Corp. v. Ever-Ready, Inc., 531 F.2d 366 (7th Cir. 1976) (Eveready); SquirtCo v. Seven-Up Co., 628 F.2d 1086 (8th Cir. 1980) (Squirt); E.I. DuPont de Nemours & Co. v. Yoshida Int'l, Inc., 393 F. Supp. 502 (E.D.N.Y. 1975) (Teflon); King-Seeley Thermos Co. v. Aladdin Indus., 321 F.2d 577 (2d Cir. 1963) (Thermos).
Universe: Bristol-Myers Squibb Co. v. McNeil-P.P.C., Inc., 973 F.2d 1033 (2d Cir. 1992).
References: Federal Judicial Center, Reference Manual on Scientific Evidence (Reference Guide on Survey Research, Shari Seidman Diamond); McCarthy on Trademarks §§ 32:158 et seq.; INTA survey guidelines.

Survey design and admissibility are fact-specific; consult a qualified survey expert and trademark counsel.

This checklist is general information, not legal advice. Consult qualified trademark litigation counsel and a qualified survey expert about any particular matter.