Proposal for Implementing Sybil-Resilient Identifier and Proof of Uniqueness in Concordium Node

chportma · September 15, 2024, 6:33am

As far as I can tell, the solution you are proposing now is essentially equivalent to Solution 2 in the other thread, which you told us was unacceptable. Adding the PRF after the hash doesn't change the cryptographic properties, because both are deterministic functions that anyone can evaluate, so brute-force attacks still apply. You have essentially just defined a new hash function, namely the composition of the original hash function and the PRF. More specifically,

if the input data is not too long, brute force can be used to recover it,
if the input data is very long but partially known (e.g., you know someone’s name but not their passport number), brute force can be used to recover the missing data,
if the input data is known but it is not known whether the user has subscribed to a service, we can compute the CUID and compare it to the subscribed CUIDs to determine whether the user is taking part.
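A minimal sketch of the last two attacks, under stated assumptions: the CUID is modeled as a PRF applied to a hash of the attributes and the context, the PRF key is publicly known (nothing secret enters the computation), and HMAC-SHA256 stands in for the unspecified PRF. `compute_cuid`, `recover_doc_number`, `is_subscribed`, the key, the context string and the 4-digit search space are all hypothetical; the space is kept tiny so the demo runs instantly, whereas a real attacker would enumerate a larger but still feasible space.

```python
import hashlib
import hmac
from typing import Optional

# Hypothetical parameters: the PRF key is assumed to be public, and
# HMAC-SHA256 is only a stand-in for the PRF in the proposal.
PUBLIC_PRF_KEY = b"publicly-known-prf-key"
CONTEXT = b"example-service"


def compute_cuid(name: str, doc_number: str, context: bytes = CONTEXT) -> bytes:
    """Hypothetical CUID: PRF(public key, H(attributes || context))."""
    digest = hashlib.sha256(f"{name}|{doc_number}|".encode() + context).digest()
    return hmac.new(PUBLIC_PRF_KEY, digest, hashlib.sha256).digest()


def recover_doc_number(name: str, target_cuid: bytes, digits: int) -> Optional[str]:
    """Brute-force the unknown attribute when the rest of the input is known."""
    for candidate in range(10 ** digits):
        doc_number = str(candidate).zfill(digits)
        if hmac.compare_digest(compute_cuid(name, doc_number), target_cuid):
            return doc_number
    return None


def is_subscribed(name: str, doc_number: str, subscribed: set) -> bool:
    """Membership test: recompute the CUID from known data and look it up."""
    return compute_cuid(name, doc_number) in subscribed


if __name__ == "__main__":
    observed = compute_cuid("Alice Example", "4821")
    print(recover_doc_number("Alice Example", observed, digits=4))  # 4821
    subscribed = {observed, compute_cuid("Bob Example", "0007")}
    print(is_subscribed("Alice Example", "4821", subscribed))  # True
```

Because every step is public and deterministic, wrapping the hash in a PRF only changes which function the attacker evaluates, not whether they can evaluate it.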

In your own words:

Sybil-Resilient Identifier and Proof of Uniqueness

Privacy Issues Under New Guidance: The proposal suggests improving privacy by hashing identity attributes before revealing them to the service provider. While hashing may initially obscure the original data, the FTC guidance emphasizes that hashes do not qualify as anonymization if they create a persistent identifier that can be tracked across contexts. Hashes are susceptible to brute-force attacks or dictionary attacks, especially when common or low-entropy identity attributes are used.

Regulatory Risks: Even though the hashed identity string might appear anonymous, it is still classified as personal data under GDPR and similar frameworks because it remains linkable to individuals. The FTC has explicitly stated that treating hashes as anonymous is misleading, and companies claiming that hashing de-identifies data could face enforcement actions. For GDPR compliance, treating hashed data as anonymous without the ability to fully de-link it from the original identity could lead to significant fines.

I have removed the sentence about consistent hashes across services from the quote above, because adding the context, as you did, solves that problem.

These privacy weaknesses were the reason we came up with Solution 3, where a seed/key generated by a trusted IDP is used in the PRF. However, setting it up is more complicated because it requires a trusted third party.
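For contrast, a sketch of the Solution 3 idea under stated assumptions: the PRF key is a random secret held only by the IDP, HMAC-SHA256 again stands in for the PRF, and `HypotheticalIdp`, `attribute_digest` and `attacker_guess` are illustrative names, not part of any Concordium API. How the user actually obtains the CUID from the IDP is not modeled here.

```python
import hashlib
import hmac
import secrets
from typing import Optional


def attribute_digest(name: str, doc_number: str, context: bytes) -> bytes:
    """Hash of the identity attributes bound to a service context."""
    return hashlib.sha256(f"{name}|{doc_number}|".encode() + context).digest()


class HypotheticalIdp:
    """Sketch of a trusted IDP that keeps the PRF key secret (assumed design)."""

    def __init__(self) -> None:
        # 256-bit secret key, never revealed to users or service providers.
        self._prf_key = secrets.token_bytes(32)

    def derive_cuid(self, name: str, doc_number: str, context: bytes) -> bytes:
        digest = attribute_digest(name, doc_number, context)
        return hmac.new(self._prf_key, digest, hashlib.sha256).digest()


def attacker_guess(name: str, target: bytes, context: bytes,
                   guessed_key: bytes, digits: int) -> Optional[str]:
    """An outsider without the IDP key can only brute-force with a guessed key."""
    for candidate in range(10 ** digits):
        doc_number = str(candidate).zfill(digits)
        digest = attribute_digest(name, doc_number, context)
        guess = hmac.new(guessed_key, digest, hashlib.sha256).digest()
        if hmac.compare_digest(guess, target):
            return doc_number
    return None


if __name__ == "__main__":
    idp = HypotheticalIdp()
    cuid = idp.derive_cuid("Alice Example", "4821", b"example-service")
    # Same user and context give the same CUID, preserving uniqueness.
    print(idp.derive_cuid("Alice Example", "4821", b"example-service") == cuid)  # True
    # Brute force with a wrong key finds nothing, even over the known space.
    print(attacker_guess("Alice Example", cuid, b"example-service",
                         b"publicly-known-prf-key", digits=4))  # None
```

The trade-off is still visible in the sketch: the IDP can recompute any CUID for attributes it knows, so the privacy gain rests entirely on trusting it with the key.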

I now have no idea whether you prefer Solution 2, with weaker privacy but easier set-up, or Solution 3, with better privacy but requiring a trusted third party to hold a secret key and help generate the uniqueness key.

Cheers,
Christopher