Sybil-Resilient Identifier and Proof of Uniqueness

VikingTechGuy · August 23, 2024, 2:52am

Thanks for posting the proposals @tschudid

Response to Proposal: Sybil-Resilient Identifier and Proof of Uniqueness

The proposal outlines several approaches for generating Sybil-resistant identifiers, with varying levels of privacy protection. Evaluating these models against AesirX’s mission for privacy, particularly focusing on indirect zero-knowledge proofs (ZKPs), and in light of the latest FTC guidance on hashing, reveals critical privacy issues that need addressing. Below is a breakdown of the concerns and regulatory risks, along with recommendations for improvement.

1. Direct Attribute Revelation (First Model):

Privacy Issues: This approach requires users to reveal identity attributes such as their first name, last name, and birthdate to the service provider. Although zero-knowledge proofs (ZKPs) are used to verify the correctness of the revealed information, the service provider still gains access to these attributes, creating significant privacy risks. Not only does this expose users to profiling and potential data breaches, but it also places trust entirely in the service provider to handle the data responsibly.

Regulatory Risks: Under GDPR, revealing and storing such identifiable information makes the service provider a data processor, triggering obligations related to data security, transparency, and obtaining explicit user consent. This approach would also likely be considered non-compliant with the FTC’s updated stance on privacy, which stresses that even when data is obscured through techniques like hashing or pseudonymization, it is still considered personal data if it can be linked back to an individual.

2. Hashing Identity Attributes (Second Model):

Privacy Issues Under New Guidance: The proposal suggests improving privacy by hashing identity attributes before revealing them to the service provider. While hashing may initially obscure the original data, the FTC guidance emphasizes that hashes do not qualify as anonymization if they create a persistent identifier that can be tracked across contexts. Hashes are susceptible to brute-force attacks or dictionary attacks, especially when common or low-entropy identity attributes are used. Additionally, consistent hashes across different services can be used to track users, thereby compromising privacy.
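
To make the dictionary-attack concern concrete, here is a minimal Python sketch. It assumes the second model hashes a normalized "first|last|birthdate" string with unsalted SHA-256; that encoding, the function names, and the tiny candidate lists are my own illustrative assumptions, not something taken from the proposal.

```python
import hashlib
from datetime import date, timedelta

def attribute_hash(first: str, last: str, dob: date) -> str:
    """Unsalted SHA-256 over normalized attributes (hypothetical encoding)."""
    data = f"{first.lower()}|{last.lower()}|{dob.isoformat()}".encode("utf-8")
    return hashlib.sha256(data).hexdigest()

def dictionary_attack(target, first_names, last_names, start: date, end: date):
    """Try every name/birthdate combination until one hashes to the target."""
    days = (end - start).days + 1
    for first in first_names:
        for last in last_names:
            for offset in range(days):
                dob = start + timedelta(days=offset)
                if attribute_hash(first, last, dob) == target:
                    return first, last, dob
    return None

# A hash as a service provider might receive it
leaked = attribute_hash("Alice", "Jensen", date(1990, 5, 17))

# The attacker only needs plausible candidate lists, not the original data
found = dictionary_attack(
    leaked,
    ["bob", "alice"],
    ["hansen", "jensen"],
    date(1990, 1, 1),
    date(1990, 12, 31),
)
print(found)
```

Because the same inputs always give the same hash, the value also works as a stable join key between any two services that receive it, which is the tracking problem described above.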

Regulatory Risks: Even though the hashed identity string might appear anonymous, it is still classified as personal data under GDPR and similar frameworks because it remains linkable to individuals. The FTC has explicitly stated that treating hashes as anonymous is misleading, and companies claiming that hashing de-identifies data could face enforcement actions. For GDPR compliance, treating hashed data as anonymous without the ability to fully de-link it from the original identity could lead to significant fines.

Link to read more on Hashing as Identifiers.

3. Context-Dependent Identifiers with PRF (Third Model):

Privacy Improvements: This approach introduces context-dependent identifiers generated using a pseudo-random function (PRF), which adds some privacy benefits by making identifiers unique to specific services. This can prevent cross-service tracking and linkability if implemented correctly. However, the model’s effectiveness still hinges on the robustness of the PRF and how consistently the identifiers are generated across contexts. According to the latest FTC guidance, even context-dependent identifiers could be treated as personal data if they act as persistent identifiers and facilitate tracking.

Regulatory Risks: Although this model performs better from a privacy perspective, companies must ensure that PRF-derived identifiers do not create patterns that can be linked across services. If any persistent identifier is generated that allows tracking, the organization would be classified as a data processor under GDPR. This means handling these identifiers would require the same level of regulatory compliance, including transparency, obtaining informed consent, and secure data handling practices.
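
As a rough illustration of the third model, the sketch below uses HMAC-SHA256 as a stand-in PRF. This is an assumption on my part: the proposal may use a different PRF, and the key handling, function name, and context strings here are hypothetical.

```python
import hmac
import hashlib

def context_identifier(user_key: bytes, context: str) -> str:
    """PRF(user_key, context), instantiated here with HMAC-SHA256 (assumed)."""
    return hmac.new(user_key, context.encode("utf-8"), hashlib.sha256).hexdigest()

# Hypothetical per-user secret; in practice it would come from the credential
user_key = bytes.fromhex("00112233445566778899aabbccddeeff" * 2)

id_a = context_identifier(user_key, "service-a.example")
id_b = context_identifier(user_key, "service-b.example")

print(id_a != id_b)                                              # different per service
print(id_a == context_identifier(user_key, "service-a.example")) # stable within a service
```

The second check is the point of the regulatory concern: within one service the identifier is stable, so it still behaves as a persistent identifier there, even though two services cannot match their identifiers without the key.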

4. General Issues with Centralized Identity Providers (IDPs):

Data Controller Classification: Centralized identity providers (IDPs) that issue credentials and store uniqueness keys are likely to be classified as data controllers under GDPR. Any site or service interacting with these IDPs would be subject to joint processing responsibilities, making them liable for how user data is managed and shared. Furthermore, placing too much reliance on centralized IDPs introduces significant privacy risks, including large-scale data breaches and profiling concerns, which run counter to AesirX’s principles of decentralization and user empowerment.

5. Updated Privacy Considerations:

Avoid Misleading Claims About Anonymization: The latest FTC guidance clearly states that companies should not claim that hashing anonymizes data if it does not fully de-link the data from individuals. In the context of this proposal, it is crucial to avoid misleading users about the privacy benefits of these models. Persistent identifiers, whether hashed or PRF-derived, must be treated as personal data and handled accordingly.

Enhance Transparency and Consent Mechanisms: Even with improved privacy protections through cryptographic techniques, organizations must still be transparent about how identifiers are generated and used. Consent mechanisms should clearly explain the potential for tracking and data sharing, ensuring that users are fully informed about how their data is being processed and safeguarded.

Focus on Indirect Zero-Knowledge Proofs: Aligning with AesirX’s privacy-first approach, solutions should focus more on indirect ZKPs that avoid revealing or storing any sensitive information, even in hashed form. This approach minimizes the risk of tracking and re-identification, better aligning with both the GDPR and the FTC’s stricter guidelines.

Conclusion and Recommendations:

The context-dependent identifier model shows potential, but it must be carefully managed to ensure that identifiers are not linkable across different services. Over-reliance on hashing or centralized identity systems introduces privacy and regulatory risks that are avoidable with better privacy-by-design principles.

For AesirX-aligned implementations, the use of first-party, decentralized solutions that avoid the need for tracking or linking user data remains the most privacy-compliant approach. Implementing indirect ZKPs ensures that users’ identities remain protected while still providing the necessary verifications for Sybil resistance.

Outline of Proposed Model from AesirX
(Focusing only on Uniqueness of ID and Anti-Sybil with optimal compliance of privacy)

1. Mechanism of ID Generation and Validation:

Inputs: The model takes three distinct components from a user’s identity credential:

Base ID credentials (e.g., first name, last name, date of birth).
Document type (e.g., passport, driver’s license).
Document ID (e.g., passport number or driver’s license number).

Hashing Process: These three components are combined and hashed together using a secure cryptographic hash function. The resulting hashed value is a unique identifier (UID) that cannot be easily reverse-engineered due to the complexity of the inputs (similar to the “three-body problem” concept in physics, where the interaction between three elements creates unpredictability).

Compare Function: The hashed UID is never directly revealed. Instead, it is stored in a secure environment where it is only used for comparison purposes. For example, when a new user attempts to register, the system hashes the same three components and checks whether the resulting UID already exists in the system. If it does, it indicates that the ID has been used before, blocking duplicate registrations. This ensures that the same ID credentials cannot be reused to create multiple identities.

Indirect Proof: The process functions as an indirect proof of uniqueness because the hashed UID is only accessible via the comparison function. It is impossible to trace the UID back to the original inputs, protecting user privacy.
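
A minimal Python sketch of the mechanism described above, under stated assumptions: SHA-256 as the hash function, length-prefixed field encoding, and an in-memory set as the "secure environment". The names compute_uid and UniquenessRegistry are hypothetical and only illustrate the compare-only interface.

```python
import hashlib

def compute_uid(base_id: str, doc_type: str, doc_id: str) -> bytes:
    """Hash the three components into one UID (SHA-256 is an assumption)."""
    h = hashlib.sha256()
    for field in (base_id, doc_type, doc_id):
        encoded = field.strip().lower().encode("utf-8")
        # Length-prefix each field so ("ab", "c") and ("a", "bc") hash differently
        h.update(len(encoded).to_bytes(4, "big"))
        h.update(encoded)
    return h.digest()

class UniquenessRegistry:
    """Stores UIDs privately and exposes only a yes/no registration result."""

    def __init__(self) -> None:
        self._uids: set[bytes] = set()

    def register(self, base_id: str, doc_type: str, doc_id: str) -> bool:
        """Return True if the identity is new, False if it was already used."""
        uid = compute_uid(base_id, doc_type, doc_id)
        if uid in self._uids:
            return False
        self._uids.add(uid)
        return True

registry = UniquenessRegistry()
first = registry.register("Alice Jensen 1990-05-17", "passport", "P1234567")
second = registry.register("alice jensen 1990-05-17", "Passport", "p1234567")
print(first, second)
```

The caller never sees the UID, only whether registration succeeded. The normalization step (trimming and lowercasing) is my own addition so that trivially different spellings of the same credential map to the same UID.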

2. Data Processing and Collection by the Site Owner:

In the AesirX model, the site owner would collect and store the hashed UID to ensure that a user is not registering multiple wallets or credentials with the same underlying identity. However, the site owner never gains access to the original ID components, only the resulting hash.

Assessment of Privacy and Compliance

1. Privacy Strengths:

Indirect Proof with Secure Hashing: Unlike direct attribute revelation or single-component hashing (as seen in the Concordium proposal), this approach leverages the complexity of combining three components, making it extremely difficult to reverse-engineer the original inputs. The “three-body” principle creates significant unpredictability, enhancing privacy.

Minimized Data Exposure: The original ID components are never exposed, and the UID is only accessible through a comparison function. This setup significantly reduces the risk of tracking, profiling, or cross-service linkability, aligning well with AesirX’s commitment to privacy by design.

No Persistent Identifiers Across Services: Since the UID is only used within the specific context of preventing duplicate registrations, it avoids the pitfalls of cross-service tracking, a key concern in the FTC’s guidance.

2. Regulatory Compliance:

FTC and GDPR Compliance: The FTC guidance emphasizes that hashing alone does not make data anonymous if the hashed value can be used as a persistent identifier. In this model, the UID is only generated for comparison and is never exposed or used beyond this function. This greatly reduces the risk of non-compliance. However, site owners would still need to ensure that the hashing function is robust and that no information is leaked during the comparison process.

Data Processor Classification: In this setup, the site owner still handles the hashed UID, making them a potential data processor under GDPR. However, since they never store or access the original ID components, the compliance burden is significantly reduced. Explicit consent would still be required from users, along with transparency about the purpose and handling of their data.

3. Sybil-Resistance and Security:

Strong Sybil-Resistance: The combined hashing of three inputs ensures that even if a user has multiple ID documents (e.g., two passports), it is much harder for them to generate multiple UIDs that evade detection. This approach provides robust protection against Sybil attacks.

Resilience Against Attacks: The three-component structure increases entropy, making brute-force or dictionary attacks highly impractical. Additionally, the inability to reverse-engineer the UID further enhances security.
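
The entropy argument can be sketched as simple arithmetic. All candidate counts below are hypothetical numbers picked for illustration, and the calculation assumes each component is chosen uniformly and independently, which real identity data is not.

```python
import math

def entropy_bits(candidates: int) -> float:
    """Bits an attacker must search for a uniformly chosen value."""
    return math.log2(candidates)

# Hypothetical candidate counts, chosen only for illustration
name_and_birthdate = 10_000 * 36_500  # assumed name pairs x birthdates over ~100 years
document_types = 4                    # assumed number of accepted document types
document_numbers = 36 ** 9            # assumed 9-character alphanumeric document ID

components = [name_and_birthdate, document_types, document_numbers]
total_bits = sum(entropy_bits(c) for c in components)
print(f"combined search space: about {total_bits:.1f} bits")
```

Under these assumptions the bits of independent components add up, which is why three inputs are harder to guess than one. The benefit shrinks for an attacker who already knows some of the components, since only the unknown ones contribute.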

Comparison with the Three-Model Proposal

Direct Attribute Revelation (First Model):

Proposal Model 1: Exposes identity attributes directly, relying on the service provider to handle this data securely.

AesirX Proposal: Never exposes the original identity attributes; the UID is only accessible via comparison, significantly reducing the risk of data breaches or misuse.

Hashing Identity Attributes (Second Model):

Proposal Model 2: Uses a simple hash of identity attributes, which can still be vulnerable to brute-force attacks or re-identification, especially given the FTC’s concerns about treating hashes as anonymous.

AesirX Proposal: Enhances security by combining three diverse components, creating a more unpredictable and secure hashed output, aligned with modern privacy guidelines.

Context-Dependent Identifiers with PRF (Third Model):

Proposal Model 3: Introduces context-specific identifiers, which improve privacy but still rely on persistent identifiers across contexts, risking tracking.

AesirX Proposal: Avoids persistent identifiers entirely by limiting the UID’s scope to a single function (detecting duplicate registrations). This minimizes the potential for tracking or misuse.

Conclusion and Recommendations

The AesirX proposal offers a more privacy-preserving solution than the three proposed models by leveraging a multi-component hashing approach that is harder to reverse-engineer and is only used within a narrowly defined context. This aligns closely with both GDPR and FTC guidelines, reducing the regulatory risks associated with persistent identifiers.

The indirect proof model, which ensures uniqueness without exposing the original data, strikes a balance between robust Sybil-resistance and user privacy, offering a superior alternative to the approaches discussed in the three-model proposal.

By focusing on minimizing data exposure and using indirect proofs, the AesirX approach provides a stronger privacy model that remains compliant with evolving regulations and privacy best practices.

I hope I have helped clarify why privacy compliance is essential when designing these solutions and features, and why the model we proposed at the start of this year is still the better solution.

PS: Our proposal (also on this forum) includes Seamless ID, which prioritizes eIDs as a trusted IdentityProviderType to further strengthen anti-Sybil measures. Government-issued eIDs are the strongest on the market and fully support eIDAS 2.0, which is the direction the EU has taken, and they can serve as a base for a future extension to Seamless KYC. In this response, however, I have focused only on the first part: resolving the problem of uniqueness in the most ideal way.