21 Securing Your Fingerprints: Fuzzy Vaults
21.1 Some quick background on fingerprints
A fingerprint (image / sensor reading) is stored as a collection of triples, called minutiae. Each minutia is located where a ridge of the fingerprint splits into two, or where a ridge ends. The
\(\)i^th\( minutia is the triple \)(x_i, y_i, θ_i)\(, where \)x_i\( and \)y_i\( are the \)x\(- and \)y\(-coordinates of a point on the finger, and \)θ_i\( indicates the direction of the ridge that created the minutia point, relative to the plane. Since measurements cannot be infinitely precise, we quantize each minutia \)(x_i,y_i,θ_i)\( and embed it into \)F_q\( for some large enough prime power \)q\(. A (quantized) fingerprint reading is thus naturally modeled as an unordered collection of elements of \)F_q\(, i.e.\ an element \)f F_qt\( (the set of all subsets of \)F_q\( of size exactly \)t\(), for some integer \)t ≥1\( that is the number of minutiae recorded. The set-valued (rather than vector-valued) representation is deliberate: two readings of the same finger are subject to translation/rotation, varying pressure, and incomplete overlap of the sensed region, so the specific ordered values \)(x_i,y_i,θ_i)\( are not individually meaningful and two readings need not even have the same number of minutiae in correspondence -- but two readings of the same finger should have significant \emph{overlap} as sets. \)
Fix any off-the-shelf secure hash function
\(\)h\( for strings. Given a fingerprint reading \)f\(, sort its minutiae (e.g.\ lexicographically) to produce a string, and store \)h(f)\( in place of \)f\(. This naive solution is inadequate. Even setting aside the possibility of correcting translation, rotation and pressure errors, the following issue is fundamental: two readings \)f, f’\( of the very same finger essentially never yield exactly the same set of (quantized) minutiae, because \begin{enumerate} \item translation and rotation occur even between two readings of the same finger, \item the pressure applied varies between readings, \item each reading may only partially overlap the ideal fingerprint region (the finger may be tilted), and \item the minutiae form an unordered set rather than a vector, so even after sorting, small perturbations to individual coordinates can change the sorted string entirely. \end{enumerate} Consequently \)f ≠f’\( (as sets, or as sorted strings) even though \)f\( and \)f’\( come from the same finger, so \)h(f) ≠h(f’)\( in general, and a standard string hash function gives no way to test whether two hashes come from readings that are merely \emph{close} (i.e.\ have large set overlap) rather than identical. Existing hash functions for strings are built for vectors, not sets, so this naive approach fails. \)
21.2 The Fuzzy Vault Problem
We first isolate a cleaner problem than "hash a fingerprint": given a secret string \(\)s\(, use a fingerprint \)f\( to "lock" \)s\( so that (1) \)s\( can later be "unlocked" using a close-enough fingerprint reading \)f’\(, but (2) no one else should be able to recover \)s\( from the locked value. (Solving this fuzzy vault problem in turn solves the problem of designing a secure hash function for fingerprints: pick \)s\( at random, store the locked version of \)s\( together with \)h(s)\( for an off-the-shelf secure string hash function \)h\(.) \)
21.2.1 Formal Problem Statement
For any integer
\(\)t ≥1\(, let \)F_qt\( denote the set of all subsets of \)F_q\( of size exactly \)t\(. The fuzzy vault problem has the following components: \begin{itemize} \item Integers \end{itemize}\)k ≥1\(, \)n ≥t ≥1\(; \item a secret \)s F_q^k\(; \item a fingerprint \)f F_qt\(; \item a locking function \)LOCK : F_q^k ×F_qt F_qn\(; \item an unlocking function \)UNLOCK : F_qt ×F_qn F_q^k\(. \)
The goal is to design \(\)LOCK\( and \)UNLOCK\( satisfying, for some parameter \)c < t\(: \begin{enumerate} \item \textbf{($c$-completeness)}: for any \end{enumerate}\)f, f’ F_qt\( with \)|f - f’| ≤c\(, \[ \mathrm{UNLOCK}\bigl(\mathrm{LOCK}(s,f), f'\bigr) = s. \] \item \textbf{(Soundness)}: it should be "hard" for an adversary to recover \)s\( from \)LOCK(s,f)\( alone. (The notion of "hard" is only informally discussed here, via Lemma~ \ref{lem:c21-random-vault-count} in Section~ \ref{sec:c21-soundness-informal}.) \)
(\(\)c\(-completeness) corresponds to the matching property required of a fingerprint hash function (recognizing readings of the same finger), while (Soundness) corresponds to its security property. \)
21.2.2 Two Futile Attempts
For this subsection, unless stated otherwise, the original fingerprint \(\)f\( is given by a fixed set \){α_1,…,α_t}\(. \begin{thmenv}[Attempt 1: random chaff points] \label{constr:c21-attempt1} \uses{def:c21-fuzzy-vault-problem} To get good soundness, take the fingerprint \end{thmenv} \)f = {α_1,…,α_t}\( and add \)n-t\( uniformly random additional values of \)F_q\( (called \emph{chaff} points) to it, and output the resulting \)n\(-element set as the vault; the secret \)s\( itself is not stored in any recognizable form. Intuitively, an adversary who only sees the vault sees \)n\( points that look uniformly random, and cannot single out which \)t\( of them came from \)f\( (hence cannot recover \)s\(), giving good soundness. However, this scheme has terrible completeness: given a second reading \)f’\( and a value in \)f’ ∩LOCK(s,f)\(, there is no way to tell whether the matching value came from one of the genuine points of \)f\( or from one of the randomly added chaff points, so there is no way to recover \)s\( even when \)f’\( is very close to \)f\(. \)
Given
\(\)s = (s_0,…,s_k-1) F_q^k\( and \)f = {α_1,…,α_t}\( (so here \)n = t\(), let \)P_s(X) = ∑_i=0^k-1 s_i X^i\( be the associated message polynomial (as in the Reed-Solomon encoding of \)s\(), and define \[ \mathrm{LOCK}_2(s,f) = \bigl\{ (\alpha _1, P_s(\alpha _1)), \dots , (\alpha _t, P_s(\alpha _t)) \bigr\} . \] The intuition behind \)LOCK_2\( is the following: given another fingerprint \)f’ = {β_1,…,β_t}\( close enough to \)f\(, i.e. \)|f ∖f’| ≤c\(, for every value in \)f ∩f’\( we know the corresponding \)P_s\(-value from the vault, so we can decode the Reed-Solomon code from erasures to recover \)s\(. \)
Input: vault \(\){(α_1,y_1),…,(α_t,y_t)} = LOCK_2(s,f)\( and another fingerprint \)f’ = {β_1,…,β_t}\(. \textbf{Output:} \)s\(, if \)|f ∖f’| ≤c\(. \begin{enumerate} \item For \end{enumerate} \)i = 1,…,t\(: if there exists \)j [t]\( with \)α_i = β_j\(, set \)z_j ←y_i\(; otherwise leave \)z_j\( marked as an erasure ("\)?\("). \item Let \)z = (z_1,…,z_t)\(. \item Run the Reed-Solomon erasure-decoding algorithm of Theorem 14.2.1 to correct \)z\( from erasures, for the Reed-Solomon code with evaluation points \){β_1,…,β_t}\(, and output the resulting message as \)s\(. \)
The pair
\(\)(LOCK_2, UNLOCK_2)\( of functions is \)(t-k)\(-complete. Further, both functions can be implemented in polynomial time. \)
Assume
\(\)|f ∖f’| ≤t-k\(. Since both \)f\( and \)f’\( have exactly \)t\( values, this means that \)z\( (as constructed in Algorithm~ \ref{alg:c21-unlock2}) has at most \)t-k\( erasures. Thus, by Theorem 14.2.1, Step 3 of Algorithm~ \ref{alg:c21-unlock2} outputs \)s\(, and \)UNLOCK_2\( can be implemented in polynomial time. The polynomial running time of \)LOCK_2\( follows from the fact that one can encode a Reed-Solomon codeword (i.e.\ evaluate \)P_s\( at \)t\( points) in polynomial time. \)
Unfortunately the pair \(\)(LOCK_2, UNLOCK_2)\( has terrible soundness. This is because the vault \){(α_1,y_1),…,(α_t,y_t)}\( has the true fingerprint \)f\( sitting in the first coordinate of every pair, in the clear. An adversary can simply read off these values and present \)f’ = {α_1,…,α_t} = f\( as a second fingerprint reading, whence \)UNLOCK_2(LOCK_2(s,f), f’) = s\(: the vault is "broken." \)
21.3 The Final Fuzzy Vault
We now combine the good soundness (intuition) of Construction 452 with the good completeness of Construction 453 / Algorithm 454 to get the final fuzzy vault scheme.
Input: fingerprint \(\)f = {α_1,…,α_t}\( and secret \)s = (s_0,…,s_k-1) F_q^k\(. \textbf{Output:} a vault \)R\( (an \)n\(-element subset of \)F_q ×F_q\() with \)f\( locking \)s\(. \begin{enumerate} \item Initialize \end{enumerate} \)R ←∅\(, \)T ←∅\(, and let \)P_s(X) = ∑_i=0^k-1 s_i X^i\(. \item For \)i = 1,…,t\(: set \)T ←T ∪{α_i}\( and \)R ←R ∪{(α_i, P_s(α_i))}\(. \item For \)i = t+1,…,n\(: \begin{enumerate} \item let \end{enumerate}\)α\( be a uniformly random element of \)F_q ∖T\(, and set \)T ←T ∪{α}\(; \item let \)γ\( be a uniformly random element of \)F_q ∖{P_s(α)}\(, and set \)R ←R ∪{(α,γ)}\(. \)
Randomly permute \(\)R\( and return \)R\(. \)
That is, \(\)LOCK_3(s,f)\( consists of the \)t\( genuine points \)(α_i, P_s(α_i))\( from \)LOCK_2(s,f)\(, together with \)n-t\( additional random \emph{chaff} points \)(α,γ)\( with \)α\( a fresh element of \)F_q\( and \)γ≠P_s(α)\(, all randomly permuted together. \)
Input: vault \(\){(α_1,y_1),…,(α_n,y_n)} = LOCK_3(s,f)\( and another fingerprint \)f’ = {β_1,…,β_t}\(. \textbf{Output:} \)s\(, if \)|f ∖f’| ≤c\(. \begin{enumerate} \item For each \end{enumerate} \)j = 1,…,t\(: if there exists \)i [n]\( with \)α_i = β_j\(, set \)z_j ←y_i\(; otherwise leave \)z_j\( marked as an erasure ("\)?\("). \item Let \)z = (z_1,…,z_t)\(. \item Run the Reed-Solomon errors-and-erasures decoding algorithm of Theorem 14.2.2 to correct \)z\( from errors and erasures, for the Reed-Solomon code with evaluation points \){β_1,…,β_t}\(, and output the resulting message as \)s\(. \)
The pair
\(\)(LOCK_3, UNLOCK_3)\( of functions is \)(t-k)/2\(-complete. Further, both functions can be implemented in polynomial time. \)
Assume
\(\)|f ∖f’| ≤(t-k)/2\(. Since both \)f\( and \)f’\( have exactly \)t\( values, this implies \)|f ∩f’| ≥(t+k)/2\(. Further, for each \)j [t]\( with \)β_j f ∩f’\( we have (by construction of the vault in Algorithm~ \ref{constr:c21-lock3}) that \)z_j = P_s(β_j)\(. In other words, \)z\( (as constructed in Algorithm~ \ref{alg:c21-unlock3}) has at most \)(t-k)/2\( errors, where we count as an error any position \)j\( for which \)z_j\( is assigned a value but \)z_j ≠P_s(β_j)\( (this happens exactly when \)β_j f ∩f’\( but \)β_j\( nonetheless matches some vault point, i.e.\ a chaff point, or a genuine point of \)f\( not equal to \)β_j\('s intended match). More precisely, if \)z\( has \)e\( errors and \)s’\( erasures with respect to the codeword corresponding to \)s\(, then \)2e + s’ ≤t-k\(. Thus, by Theorem 14.2.2, Step 3 of Algorithm~ \ref{alg:c21-unlock3} outputs \)s\(, and \)UNLOCK_3\( can be implemented in polynomial time. The polynomial running time of \)LOCK_3\( follows from the fact that one can encode a Reed-Solomon codeword in polynomial time. \)
21.3.1 Soundness
To avoid too many technical details, the book gives only a high-level argument for the soundness of the vault produced by Construction 456. Given a vault \(\){(α_1,y_1),…,(α_n,y_n)} = LOCK_3(s,f)\(, there are exactly \)t\( values (those \)α_j f\() at which the polynomial \)P_s(X)\( agrees with the vault. An intuitive way to argue soundness is to argue that there are many \emph{other} secrets \)s’ F_q^k\( such that \)P_s’\( also agrees with the vault in exactly \)t\( positions. (One can formalize this intuition into a formal definition of soundness, but this formalization is omitted here.) Instead, the following related result -- for a vault with \emph{completely} random points, rather than one produced by \)LOCK_3\( – is proved, as a first step towards such an argument. \begin{thmenv}[Lemma 21.3.2] \label{lem:c21-random-vault-count} \uses{def:c21-fuzzy-vault-problem} Let \end{thmenv} \)V = {(x_1,y_1),…,(x_n,y_n)}\( be \)n\( independent uniformly random points from \)F_q ×F_q\(. Then, in expectation, there are at least \[ \frac{1}{3} \cdot q^k \cdot \left( \frac{n}{qt} \right)^t \] polynomials \)P(X)\( of degree at most \)k-1\( such that, for exactly \)t\( values of \)j [n]\(, we have \)P(x_j) = y_j\(. \)
Fix a polynomial
\(\)P(X)\( of degree at most \)k-1\( and an index \)j [n]\(. For any \)x_j F_q\(, since \)y_j\( is chosen uniformly at random in \)F_q\( independently of \)x_j\(, the probability that \)P(x_j) = y_j\( is exactly \)1/q\(. Furthermore, these events (over \)j [n]\() are mutually independent, since the \)n\( points of \)V\( are independent. Hence the probability that \)P(X)\( agrees with \)V\( in exactly \)t\( positions is \[ \binom {n}{t} \cdot \left(\frac{1}{q}\right)^t \cdot \left(1 - \frac{1}{q}\right)^{n-t} \; \geq \; \frac{1}{3} \left(\frac{n}{qt}\right)^t . \] Since there are \)q^k\( polynomials of degree at most \)k-1\( over \)F_q\(, linearity of expectation gives that the expected number of such polynomials agreeing with \)V\( in exactly \)t\( positions is at least \[ q^k \cdot \frac{1}{3} \left(\frac{n}{qt}\right)^t = \frac{1}{3} \cdot q^k \cdot \left(\frac{n}{qt}\right)^t, \] as claimed. \)
Two aspects of Lemma 459 fall short of a full soundness guarantee for \(\)LOCK_3\(: (i) it applies to a vault \)V\( with completely random points, whereas one would want the analogous statement for \)V = LOCK_3(s,f)\( (whose \)t\( genuine points are not random pairs, only the \)n-t\( chaff points are); and (ii) it gives a bound in expectation, whereas one would want an exponential lower bound that holds with high probability (e.g.\ via an "inverse Markov inequality"). The book leaves the resolution of both points as an exercise. \)