A vast grid of hundreds of near-identical grey paper comparison slips, every one matching, with a single slip glowing stark red, the lone error exposed in a sea of correct decisions.

Reading · Fingerprints

What's Your Error Rate?

For a hundred years the answer was “we don’t know.” Then a black-box study of latent print examiners finally measured it. This reading walks through the fingerprint numbers, what they mean, and what they don’t. It also shows how to say your error rate out loud without overstating it or apologising for it.

15 min read · Based on Accuracy and Reliability of Forensic Latent Fingerprint Decisions

The four words you have to be ready for

Sooner or later a cross-examiner asks the simplest question in the room: “What's your error rate?” It sounds like small talk. It isn't. Your whole discipline spent a century unable to answer it.

For most of the history of fingerprint evidence, the reply was “we don't really know.” There had never been a large, properly designed study of how often examiners get it wrong. That's the gap the National Academies report seized on, and the one defence lawyers learned to press. An examiner who fumbles the question, or worse, claims a zero error rate, hands the courtroom a gift.

In 2011 that changed. A team led by Bradford Ulery, working with the FBI Laboratory, ran the first large-scale study of latent print examiners' decisions: 169 examiners, each comparing about 100 pairs of prints drawn from a pool of 744. For the first time, you could put a number on it.

A scope note before the numbers: this is a fingerprint reading. The error-rate question lands on every discipline, but the figures here were measured on latent print examiners and belong to fingerprints alone. Quoted from a firearms or document-examination bench, they're someone else's data. This reading won't make you memorise the fingerprint numbers and recite them. It's here so that when those four words come, you know what the study measured, what its numbers mean, and what they flatly do not mean. So you can say all of it without flinching.

“Five examiners made false positive errors for an overall false positive rate of 0.1%. Eighty-five percent of examiners made at least one false negative error for an overall false negative rate of 7.5%.”

— Ulery, Hicklin, Buscaglia & Roberts (2011)

A single large circular instrument filling the frame, mounted alone on a grey panel, its glass face completely empty, with no scale, no markings, no needle, no pivot, just one hard sliver of vermilion-red light glancing across the blank glass, the only colour in the frame. — Fig. 1 · An instrument built to give a reading, with nothing on its face. For a century the field’s error rate was a gauge no one had ever marked.

Inside the black box: what they measured

The design comes first, because a sharp lawyer probes it before anything else. Ulery's team used what they call a black box approach. They didn't try to dictate how an examiner should reach a decision. They put a known answer behind the curtain, fed examiners the prints, and recorded what came out. Ground truth was known to the researchers and hidden from the examiners. That's the only way to measure accuracy at all.

Learn the vocabulary, because you'll be cross-examined in it. Pairs of prints from the same finger are mated. Pairs from different fingers are nonmated. Call a nonmated pair an individualisation and you've made a false positive, an erroneous identification, the error that can convict an innocent person. Call a mated pair an exclusion and you've made a false negative, an erroneous exclusion, the error that lets a guilty one walk. Two ways to be wrong. They aren't feared equally.

The prints weren't easy ones. Subject-matter experts deliberately chose latents and exemplars across a broad range of quality, and the nonmated pairs came from hard AFIS searches against a database of more than 58 million people. These were the close, confusable non-matches — the kind where a false identification becomes possible in the first place.

“Our study is based on a black box approach, evaluating the examiners’ accuracy and consensus in making decisions rather than attempting to determine or dictate how those decisions are made.”

— Ulery, Hicklin, Buscaglia & Roberts (2011)

A plain matte-grey beam balanced across a single knife-edge fulcrum on a graphite surface; a small grey feather rests on the raised left end while a dense block of vermilion-red cast iron sits on the sunken right end, tipping the beam hard toward it, the iron block the only colour in the frame. — Fig. 2 · Two ways to be wrong are not the same weight: a false negative is a feather, a false positive an iron block that can convict the innocent.

The design left one thing out on purpose, and you need to be ready to say so directly. This test measured individual examiners at single decision points. It stripped away the safeguards operational casework wraps around those decisions: re-examining the original evidence, consulting colleagues, revisiting hard comparisons, quality-assurance review, and above all, verification by a second examiner. The authors are explicit that their results don't necessarily reflect the performance of the full operational process. That cuts both ways on the stand. We'll come back to it.

Challenge 01 · Put it to the test

What did the study test?

Counsel holds up a printout of the study and begins, almost casually.

The question

“This famous fingerprint study you’re relying on — it tested examiners on a computer, one at a time, with no second examiner checking them, didn’t it? So it doesn’t actually tell this jury how accurate *you* were in *this* case, does it?”

Your answer · Not graded · think it through

The two numbers, and why they’re so different

Take the numbers themselves, because you should be able to state them without notes. The false-positive rate (erroneous identifications) was 0.1%. Five examiners made false positives, and no two ever made the same one. The false-negative rate (erroneous exclusions) was 7.5%, and 85% of examiners made at least one.

Look at how lopsided that is, because it tells you something true about your own discipline. False positives are rare. False negatives are far more common. That's no accident. As the authors point out, examiners work inside a culture where a false identification is treated as the graver sin, so the whole field stays cautious. It would rather miss a true match than manufacture a false one. The study caught that tendency exactly: vanishingly few erroneous identifications, but a sizeable number of erroneous exclusions.
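The two rates have different denominators, and that is worth seeing once in arithmetic. The sketch below assumes the simplest reading: a false-positive rate is counted over nonmated comparisons only, and a false-negative rate over mated comparisons only. The counts are hypothetical, picked only so the results land on the published percentages; they are not the study's own tallies, and the paper itself defines which comparisons count toward each rate.

```python
from dataclasses import dataclass


@dataclass
class Tally:
    """Decision counts from a ground-truth test (all values hypothetical)."""
    nonmated_comparisons: int  # nonmated pairs that were compared
    false_positives: int       # nonmated pairs called an individualisation
    mated_comparisons: int     # mated pairs that were compared
    false_negatives: int       # mated pairs called an exclusion


def false_positive_rate(t: Tally) -> float:
    # Erroneous identifications as a share of nonmated comparisons.
    return t.false_positives / t.nonmated_comparisons


def false_negative_rate(t: Tally) -> float:
    # Erroneous exclusions as a share of mated comparisons.
    return t.false_negatives / t.mated_comparisons


# Hypothetical counts, NOT the study's data.
example = Tally(nonmated_comparisons=4000, false_positives=4,
                mated_comparisons=6000, false_negatives=450)

print(f"FPR = {false_positive_rate(example):.1%}")  # FPR = 0.1%
print(f"FNR = {false_negative_rate(example):.1%}")  # FNR = 7.5%
```

Because neither rate shares a denominator with the other, you can't add them together or quote one "overall error rate" without saying which error you mean.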

And there's a humbling detail tucked inside the false-negative finding. Sixty-five percent of the examiners said they were unaware of ever having made an erroneous exclusion after training. Yet 85% made at least one on this test. What you feel your accuracy to be is not the same thing as your accuracy. That gap alone is reason enough to talk in measured numbers on the stand, not personal confidence.

“Eighty-five percent of examiners made at least one false negative error, despite the fact that 65% of participants said that they were unaware of ever having made an erroneous exclusion after training.”

— Ulery, Hicklin, Buscaglia & Roberts (2011)

Two tall identical clear-glass cylinders standing side by side on a graphite surface in faintly grey water; the left cylinder holds a single suspended vermilion-red droplet, the right holds dozens of red droplets clouding through it, red the only colour in the frame. — Fig. 3 · Side by side: barely a drop of false positives, a cloud of false negatives. The field guards hard against false identifications and pays for it in missed identifications.

What the numbers don’t mean

This is where examiners get into trouble. They take a real, useful number and ask it to carry weight it can't bear. The Ulery numbers measure a discipline under test conditions. They aren't a readout of your personal accuracy in the case you're testifying about, and the authors are careful to fence them in.

Three fences matter most. The first is the gap between a controlled test and your casework. These were deliberately hard prints chosen for research, shown on a screen, stripped of operational safeguards. The authors state clearly that the rates are useful reference estimates but aren't representative of all situations and don't account for operational context. So “the error rate is 0.1%” isn't a sentence you can truthfully say about your own conclusion. It's a sentence about a study.

The second fence: the group isn't the individual. This is a pooled measurement across 169 examiners. The study couldn't even measure individual false-positive rates with precision, because they were too low. You can't read your own personal error rate off a population average. The third fence: the average itself can mislead. The authors warn that averaging across such a varied population has limited value. Examiners differed substantially in skill, and the study's spread of performance even overstates the true variability. A single headline number flattens all of that.
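A short calculation shows why a rare error is so hard to pin on one person. This is a sketch under stated assumptions, not the study's own analysis: it assumes an examiner's comparisons are independent and asks how high their true false-positive rate could still be, at 95% confidence, after a run of comparisons with no false positives at all. The function name and the comparison counts are hypothetical.

```python
def upper_bound_zero_errors(n_comparisons: int, confidence: float = 0.95) -> float:
    """Largest true error rate still consistent, at the given confidence,
    with seeing zero errors in n independent comparisons.

    Solves (1 - p) ** n = 1 - confidence for p.
    """
    if n_comparisons <= 0:
        raise ValueError("n_comparisons must be positive")
    return 1.0 - (1.0 - confidence) ** (1.0 / n_comparisons)


# Hypothetical numbers of error-free nonmated comparisons by one examiner.
for n in (30, 100, 1000):
    bound = upper_bound_zero_errors(n)
    print(f"{n:>5} clean comparisons: true rate could still be up to {bound:.2%}")
```

The bound works out to roughly 3 divided by the number of comparisons (the statisticians' "rule of three"). So even if every one of an examiner's hundred or so test comparisons had been nonmated and error-free, a clean sheet would only show their rate is below about 3%, around thirty times the pooled 0.1%. A spotless personal record on a test of this size can't support a claim of a near-zero personal rate.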

“The rates measured in this study provide useful reference estimates that can inform decision making and guide future research; the results are not representative of all situations, and do not account for operational context and safeguards.”

— Ulery, Hicklin, Buscaglia & Roberts (2011)

A single person seen from behind in sharp focus, the one vermilion-red figure, set against a vast out-of-focus grey crowd stretching away behind them. — Fig. 4 · A study measures the whole field. It can’t tell you the odds on your one comparison.

So don't bury the study. Cite it correctly. Offer the numbers as exactly what they are: the best large-scale evidence the field has about how often examiners err, with explicit limits on how far they travel. The examiner who says “this study found a 0.1% false-positive rate under test conditions, and here's why that doesn't transfer straight to my casework” sounds like a scientist. The one who says “my error rate is zero” sounds like someone who never read it.

Is the expertise even real? (Yes, and here’s the catch)

A deeper challenge lurks under the error-rate question, and a good lawyer reaches for it: is fingerprint expertise real at all, or are you just a confident person staring at smudges? For a long time the field had no clear answer. A companion study, published the same year by Jason Tangen and colleagues in Australia, set out to test it head-on.

The design was simple and brutal. Thirty-seven qualified, court-practising fingerprint experts and thirty-seven untrained university students judged the same pairs of prints. There were matches, random non-matches, and the hard category: similar non-matches pulled from a national database, the close, confusable look-alikes. If expertise is real, the experts should pull away from the novices exactly where it gets hardest.

They did, decisively. On the similar-but-different prints, the ones built to fool you, the experts wrongly said only about 0.68% of them came from the same source. The novices made that error on more than half of the same pairs. That gap is the training doing real work. It's the single best answer to “how do we know you're any better than this jury?” You have data showing trained examiners crush untrained ones on exactly the comparisons that matter.

“We have shown that qualified, court-practicing fingerprint experts are exceedingly accurate compared with novices, but are not infallible.”

— Tangen, Thompson & McCarthy (2011)

On a graphite surface a shallow heap of near-identical grey look-alike pebbles with a single vermilion-red pebble among them; a steady hand on the left holds aside a small fistful of pure grey pebbles with no red, while a second hand on the right has scooped up a clumsy fistful that includes the single vermilion-red pebble, red the only colour in the frame. — Fig. 5 · Same impossible heap of look-alikes: the trained hand keeps the one bad apple out; the untrained hand grabs a fistful and scoops up the error. That gap is the expertise.

Notice the rest of that sentence, because the cross-examiner will: “exceedingly accurate … but are not infallible.” The same study that vindicates your training also records that the experts still, now and then, made the worst kind of error, declaring that prints from two different people came from the same source. Tangen's team arrive at the line that ought to govern your whole approach to the stand. The question is no longer whether examiners err. It's how to acknowledge the errors they make. Expertise is real and fallible. Both halves are true, and a credible witness says both out loud.

“The issue is no longer whether fingerprint examiners make errors, but rather how to acknowledge those errors.”

— Tangen, Thompson & McCarthy (2011)

Challenge 02 · Put it to the test

Are you really any better than us?

Counsel gestures toward the jury and asks the relevance question head-on.

The question

“Strip away the uniform and the certificates. Is there any actual scientific evidence that a trained fingerprint examiner sees something the twelve people in this jury box couldn’t — or are you just more confident about your guesses?”

Your answer · Not graded · think it through

Verification, inconclusives, and the limits of consensus

If false negatives are common and false positives are rare but real, what stands between the rare false positive and a wrongful conviction? The most reassuring finding in Ulery's data is about verification. Independent re-examination by a second examiner, analogous to blind verification, caught every false positive in the study, and most of the false negatives. It worked for the same reason no two examiners ever made the same false positive: an error one examiner makes usually isn't an error the next one repeats on the same prints. That's your strongest answer to the false-positive worry. The answer isn't “I don't make mistakes.” It's “a second examiner, blind to my conclusion, checks my work independently.”

There's a catch worth saying out loud. The study notes that verification of exclusions, and blind verification in particular, isn't standard practice in many agencies, mostly because of the sheer volume. So the safeguard that catches false negatives is often the one used least. If your lab does blind verification, say so and explain it. If it doesn't, know that gap before a lawyer finds it.

A desaturated grey lab where a second examiner re-examines prints at their own workstation, the first examiner’s conclusion set face-down beside them as a single red folder, hidden from them. — Fig. 6 · A second examiner, blind to your call, re-checks independently, and rarely repeats the same mistake.

And one finding surprises juries. Examiners don't always agree with each other. Each pair in the study was looked at by an average of 23 examiners, and the consensus was limited. They frequently differed on whether prints were even suitable for a conclusion, and it wasn't unusual for one examiner to call something inconclusive while another individualised the very same comparison. An inconclusive is a legitimate call, not a failure or a dodge: the information on hand simply wasn't enough. And unlike a false positive or false negative, the study is careful to note there's no absolute ground-truth criterion for whether reaching a conclusion at all was appropriate. The best yardstick the field has is the pooled judgement of experts — and pooling genuinely works: averaging the independent judgements of several examiners measurably outperforms even strong individuals (Tangen, Kent & Searston, 2020). So if you called something inconclusive and a colleague would have individualised it, that isn't necessarily an error by either of you. It's the real texture of a subjective comparison, and you can say so without a flicker of embarrassment.

“Examiners frequently differed on whether fingerprints were suitable for reaching a conclusion.”

— Ulery, Hicklin, Buscaglia & Roberts (2011)
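The claim that pooled judgement beats strong individuals has a simple arithmetic core. The sketch below is an idealised illustration, not the method of Tangen, Kent & Searston (2020): it swaps their averaging for a plain majority vote, and it assumes every examiner is correct with the same probability and errs independently of the others. The 0.90 accuracy figure and the function name are hypothetical.

```python
from math import comb


def majority_accuracy(n_examiners: int, p_correct: float) -> float:
    """Probability that a strict majority of n independent examiners is correct,
    when each is correct with probability p_correct (n must be odd, so no ties)."""
    if n_examiners < 1 or n_examiners % 2 == 0:
        raise ValueError("use a positive, odd number of examiners")
    needed = n_examiners // 2 + 1  # smallest count that forms a majority
    return sum(
        comb(n_examiners, k) * p_correct**k * (1 - p_correct) ** (n_examiners - k)
        for k in range(needed, n_examiners + 1)
    )


# 0.90 is a hypothetical individual accuracy, chosen only for illustration.
for n in (1, 3, 5, 9):
    print(n, round(majority_accuracy(n, 0.90), 4))
```

Under those assumptions three examiners voting together are right 97.2% of the time against 90% for any one of them, and the gain keeps growing with panel size. Real examiners tend to stumble on the same difficult pairs, so their errors aren't fully independent and the real-world gain will be smaller than this idealised figure.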

How to state your error rate clearly

Put it together and you have a way to answer those four words that's calm, accurate, and almost impossible to impeach. You don't claim perfection, and you don't crumble into “well, anything's possible.” You cite the best evidence the field has, you state its limits, and you keep the two kinds of error apart.

The one claim you can never make is a zero error rate. Ulery and Tangen between them show that skilled examiners, whose expertise is real and clearly sharper than any novice's, still err. Claim zero error and you've contradicted your own discipline's flagship study and marked yourself, to any prepared lawyer, as someone who either hasn't read the science or won't be straight about it. The strongest place to stand in that room is the straight one.

Is that a fair way to state your error rate?

Each is something an examiner might say on the stand about their accuracy or error rate. Call each one defensible or an overstatement. Grounded in Ulery (2011).

“A large published study found a false-positive rate of about 0.1% under test conditions, though that’s a measure of the discipline, not of my work in this case.”

“I distinguish two errors: false identifications are rare, but erroneous exclusions are far more common, and most examiners in the study made at least one.”

“My method has a zero error rate. I have never made a mistake.”

“The error rate for fingerprints is 0.1%, so the chance I’m wrong in this case is one in a thousand.”

“I’ve never been aware of making an erroneous exclusion, so my false-negative rate must be essentially zero.”

“A second examiner verified my identification blind, without knowing my conclusion.”

“These were the very difficult test prints, so my routine casework is obviously far more accurate than that study.”

“I can’t give you a precise error rate for myself, since individual false-positive rates were too low to measure even in the study, but here is what the discipline-wide research shows.”

Look at what the defensible answers have in common. Every one cites the study correctly, names its limits, and never lets the group rate masquerade as a personal probability. Every overstatement does the reverse: it borrows the study's authority and drops its caveats. That's the whole game.
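One overstatement deserves a closer look: “the error rate is 0.1%, so the chance I'm wrong is one in a thousand.” A false-positive rate answers “how often is a nonmated pair called an identification?” The jury's question runs the other way: “given an identification, how often is the pair nonmated?” Bayes' rule connects the two, and the answer depends on inputs the headline figure doesn't supply. The sketch below is an illustration only. The Bayesian framing is an assumption of this example, and the prior shares of mated pairs and the 0.60 true-positive rate are hypothetical. None of these numbers describes real casework.

```python
def prob_wrong_given_identification(prior_mated: float,
                                    true_positive_rate: float,
                                    false_positive_rate: float) -> float:
    """P(pair is nonmated | examiner individualised), by Bayes' rule."""
    p_id_and_mated = prior_mated * true_positive_rate
    p_id_and_nonmated = (1.0 - prior_mated) * false_positive_rate
    return p_id_and_nonmated / (p_id_and_mated + p_id_and_nonmated)


FPR = 0.001  # pooled false-positive rate reported under test conditions
TPR = 0.60   # hypothetical share of mated pairs that get individualised

# Hypothetical shares of mated pairs among the comparisons an examiner sees.
for prior in (0.9, 0.5, 0.01):
    p_wrong = prob_wrong_given_identification(prior, TPR, FPR)
    print(f"prior mated = {prior:.2f} -> P(wrong | identification) = {p_wrong:.4f}")
```

The same 0.1% rate gives very different answers as the mix of mated and nonmated pairs changes: well under one in a thousand when most pairs are mated, far above it when almost none are. That is why the group rate can't be quoted as the probability that this particular identification is wrong.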

What to carry into the witness box

01 · When asked your error rate, cite the large-scale study, roughly 0.1% false positives and 7.5% false negatives under test conditions, rather than a personal guess.
02 · Always separate the two errors: false identifications (rare, can convict the innocent) from erroneous exclusions (common, can free the guilty).
03 · Say the limits out loud: it’s a controlled test of the discipline, not a readout of your accuracy in this case, and a group rate isn’t your personal probability.
04 · Never claim a zero error rate. Your own field’s flagship studies show skilled examiners still err.
05 · You have hard evidence the expertise exists: trained examiners vastly outperform novices on the hardest look-alike prints.
06 · Rely on blind verification. An independent second examiner caught every false positive in the study because they can’t repeat an error they never saw.

Challenge 03 · Put it to the test

So what’s your error rate?

You’re on the stand. Counsel has saved the simplest question for last.

The question

“Let me ask it straight. What is the error rate for what you did in this case? Is it zero — you’re certain? Or is it that famous one-in-a-thousand figure, in which case you’re telling this jury there’s a real chance you’ve identified the wrong man?”

Your answer · Not graded · think it through

References

Ulery, B. T., Hicklin, R. A., Buscaglia, J., & Roberts, M. A. (2011). Accuracy and reliability of forensic latent fingerprint decisions. Proceedings of the National Academy of Sciences, 108(19), 7733–7738.
Tangen, J. M., Thompson, M. B., & McCarthy, D. J. (2011). Identifying fingerprint expertise. Psychological Science, 22(8), 995–997.
Tangen, J. M., Kent, K. M., & Searston, R. A. (2020). Collective intelligence in fingerprint analysis. Cognitive Research: Principles and Implications, 5(1), 23.
National Research Council. (2009). Strengthening Forensic Science in the United States: A Path Forward. Washington, DC: The National Academies Press.
Office of the Inspector General. (2006). A Review of the FBI’s Handling of the Brandon Mayfield Case. Washington, DC: U.S. Department of Justice.
Cole, S. A. (2005). More than zero: Accounting for error in latent fingerprint identification. Journal of Criminal Law and Criminology, 95(3), 985–1078.
