Proposal for Asynchronous Handling of Zero-Knowledge Proofs (ZKP) in Concordium Node

Introduction

This proposal aims to enhance the performance and scalability of the Concordium Node by implementing asynchronous handling for Zero-Knowledge Proof (ZKP) operations. The Concordium Node already supports ZKP generation and verification as part of its core functionality. By integrating asynchronous processing, we can improve responsiveness and reduce latency during ZKP-related operations, especially in network requests and database handling.


Motivation

While the current ZKP mechanisms in the Concordium Node ensure privacy and security, operations involving ZKP generation, verification, and transmission can become bottlenecks, especially under heavy network load. Asynchronous processing allows the node to perform these operations without blocking other critical tasks, resulting in better throughput and lower latency.

Key benefits of asynchronous ZKP processing:

  • Non-blocking ZKP operations: Improves the node’s responsiveness by ensuring that the node does not have to wait for ZKP operations to complete before moving to other tasks.
  • Concurrent task handling: Reduces the impact of high network and transaction volume on the node.
  • Scalability: Enhances the node’s ability to handle more ZKP operations simultaneously, improving overall system performance.

Conceptual Approach

1. ZKP Verification with Asynchronous Processing

  • Objective: Enable asynchronous ZKP verification during transaction and block validation to improve processing speed and ensure the node doesn’t block during heavy load.
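// Sketch only: ZKPProof, ZKPStatement, ZKPError, Block, ConsensusError and
// zkp_library are illustrative placeholders.
async fn verify_zkp_async(proof: ZKPProof, statement: ZKPStatement) -> Result<bool, ZKPError> {
    // Takes owned values so the returned future is 'static and can be spawned.
    zkp_library::verify_async(&proof, &statement)
        .await
        .map_err(|e| ZKPError::VerificationFailed(e.to_string()))
}

// Must itself be `async`, otherwise `.await` is not allowed in the body.
async fn process_block(block: &Block) -> Result<(), ConsensusError> {
    let zkp = block.get_zkp()?;
    // The first `?` handles the task's JoinError, the second the ZKPError; both
    // are assumed to convert into ConsensusError via `From`. Cloning assumes
    // that the proof and statement types implement Clone.
    let is_valid = tokio::spawn(verify_zkp_async(zkp.proof.clone(), zkp.statement.clone())).await??;
    if !is_valid {
        return Err(ConsensusError::InvalidZKP);
    }
    Ok(())
}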
  • Focus Areas:
    • Update the ZKP verification function to use asynchronous processing.
    • Ensure proper handling of async calls during block and transaction validation to prevent performance bottlenecks.
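
One point worth keeping in mind for the focus areas above: proof verification is CPU-bound, so marking a function `async` does not by itself take the work off the calling thread. The following is a minimal, standard-library-only sketch of checking a block's proofs on several threads. `Proof`, `verify_proof` and `verify_all_parallel` are hypothetical names introduced for illustration, and the validity check is a placeholder rather than real cryptography.

```rust
use std::thread;

/// Hypothetical stand-in for a proof (assumption for illustration).
#[derive(Debug, Clone)]
pub struct Proof {
    pub bytes: Vec<u8>,
}

/// Placeholder check (assumption): a non-empty proof counts as valid.
/// A real implementation would call the node's verifier here.
pub fn verify_proof(proof: &Proof) -> bool {
    !proof.bytes.is_empty()
}

/// Verifies all proofs on up to `workers` threads and returns true only
/// if every proof is valid. An empty slice is treated as valid.
pub fn verify_all_parallel(proofs: &[Proof], workers: usize) -> bool {
    if proofs.is_empty() {
        return true;
    }
    let workers = workers.max(1);
    // Ceiling division so that at most `workers` chunks are produced.
    let chunk_size = (proofs.len() + workers - 1) / workers;
    thread::scope(|scope| {
        let handles: Vec<_> = proofs
            .chunks(chunk_size)
            .map(|chunk| scope.spawn(move || chunk.iter().all(verify_proof)))
            .collect();
        // Join every worker; a panicking worker counts as a failed check.
        handles
            .into_iter()
            .map(|handle| handle.join().unwrap_or(false))
            .fold(true, |all_ok, ok| all_ok && ok)
    })
}
```

Calling `verify_all_parallel(&proofs, 4)` from block validation would spread the checks over four threads. In an async runtime, the same work would normally be handed to a blocking or dedicated thread pool rather than run directly inside a task.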

2. Asynchronous Handling of ZKP-Related Database Operations

  • Objective: Ensure that the storage and retrieval of ZKP-related data are handled asynchronously to improve I/O efficiency and prevent database access from blocking other node operations.
async fn store_zkp_async(block_id: &BlockId, zkp: &ZKPProof) -> Result<(), StorageError> {
    let encoded_zkp = serialize(zkp)?;
    db.put_async(block_id, encoded_zkp).await?;
    Ok(())
}

async fn retrieve_zkp_async(block_id: &BlockId) -> Result<ZKPProof, StorageError> {
    let encoded_zkp = db.get_async(block_id).await?;
    let zkp = deserialize(&encoded_zkp)?;
    Ok(zkp)
}
  • Focus Areas:
    • Convert ZKP-related database operations (store_zkp and retrieve_zkp) to asynchronous functions.
    • Ensure efficient I/O operations that minimize latency in database access.
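
One possible shape for this, sketched with the standard library only, is a dedicated storage thread that owns the database handle and receives requests over a channel, so callers never run database I/O on their own thread. Everything here is an assumption for illustration: `StoreCmd` and `spawn_store` are hypothetical names, and an in-memory `HashMap` stands in for the node's real database.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Sender};
use std::thread;

/// Requests understood by the storage thread (hypothetical).
pub enum StoreCmd {
    /// Store a serialized proof; `done` is signalled once it is written.
    Put { block_id: u64, proof: Vec<u8>, done: Sender<()> },
    /// Fetch a serialized proof; `reply` receives `None` if it is absent.
    Get { block_id: u64, reply: Sender<Option<Vec<u8>>> },
}

/// Spawns the storage thread and returns a handle for sending requests.
/// The thread exits once every `Sender<StoreCmd>` has been dropped.
pub fn spawn_store() -> Sender<StoreCmd> {
    let (tx, rx) = channel::<StoreCmd>();
    thread::spawn(move || {
        // Stand-in for the real database (assumption).
        let mut db: HashMap<u64, Vec<u8>> = HashMap::new();
        for cmd in rx {
            match cmd {
                StoreCmd::Put { block_id, proof, done } => {
                    db.insert(block_id, proof);
                    // The caller may have stopped waiting; ignore that case.
                    let _ = done.send(());
                }
                StoreCmd::Get { block_id, reply } => {
                    let _ = reply.send(db.get(&block_id).cloned());
                }
            }
        }
    });
    tx
}
```

A caller sends a `Put` or `Get` and waits on the reply channel only when it actually needs the result, which keeps the validation path free while the write is in flight.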

3. Asynchronous Transmission of ZKP Proofs Between Nodes

  • Objective: Ensure that ZKP proofs are transmitted between nodes asynchronously to avoid delays in communication and improve overall network performance.
async fn send_zkp_async(proof: &ZKPProof, to_node: &NodeId) -> Result<(), NetworkError> {
    let message = NetworkMessage::ZKP(proof.clone());
    network.send_async(to_node, message).await?;
    Ok(())
}

async fn receive_zkp_async(message: NetworkMessage) -> Result<(), NetworkError> {
    match message {
        NetworkMessage::ZKP(proof) => {
            process_zkp_async(proof).await?;
        },
        _ => {},
    }
    Ok(())
}
  • Focus Areas:
    • Implement asynchronous network communication for transmitting and receiving ZKP proofs.
    • Ensure that ZKP proofs are securely transmitted and received without blocking other network activities.

4. Node Startup and Configuration (Asynchronous Integration)

  • Objective: Update the node startup logic to incorporate asynchronous handling for ZKP operations, ensuring that the node remains responsive even during high load.
use std::{env, error::Error};

#[tokio::main]
async fn main() -> Result<(), Box<dyn Error>> {
    // Node initialization...

    // Optional hot sync before the async ZKP pipeline starts.
    // strip_prefix keeps the whole value, even if the URL itself contains '='.
    if let Some(url) = env::args().find_map(|arg| arg.strip_prefix("--hotsync-url=").map(str::to_owned)) {
        hotsync_async(&url).await?;
    }

    // Start the Concordium node
    // ...
    Ok(())
}
  • Focus Areas:
    • Ensure that ZKP operations are non-blocking during node startup.
    • Add logging and error-handling mechanisms to track and manage async ZKP operations.

Performance Impact and TPS Capacity Estimates

With the introduction of asynchronous handling for ZKP-related operations, we anticipate the following improvements in performance and transactions-per-second (TPS) capacity:

1. Transactions per Second (TPS) Improvement

By enabling asynchronous processing for ZKP verification and database operations, we estimate that the node’s TPS could increase due to reduced blocking in ZKP validation:

  • Current TPS: ~2000 TPS
  • Estimated TPS with Async Processing: ~4000-6000 TPS

This is based on the expected reduction in latency for ZKP verification and database operations, allowing the node to handle more transactions concurrently.
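
As a rough sanity check on these figures (not a measurement), Amdahl's law relates the overall speedup to the fraction of per-transaction time that the change actually removes. The sketch below assumes the best case, in which the blocking ZKP and database time is hidden completely. `amdahl_speedup` and `required_fraction` are names introduced here for illustration.

```rust
/// Amdahl's law: overall speedup when a fraction `p` of the work
/// (0.0 <= p <= 1.0) is accelerated by a factor `s`.
pub fn amdahl_speedup(p: f64, s: f64) -> f64 {
    1.0 / ((1.0 - p) + p / s)
}

/// Minimum fraction of per-transaction time that must be removed entirely
/// (the limit of `s` going to infinity) to reach the `target` speedup.
pub fn required_fraction(target: f64) -> f64 {
    1.0 - 1.0 / target
}
```

Under that best-case assumption, `required_fraction(2.0)` is 0.5 and `required_fraction(3.0)` is about 0.67. Going from ~2000 TPS to 4000-6000 TPS would therefore require blocking ZKP-related work to account for roughly half to two-thirds of the current per-transaction time, which profiling the node could confirm or rule out.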

2. Latency Reduction

  • Latency Improvement: We estimate that the introduction of asynchronous processing could reduce the latency of ZKP-related operations by up to 50%, as these operations would no longer block other node tasks.

3. Network and Database Throughput

Asynchronous handling of ZKP transmission and storage will improve the node’s throughput, enabling it to handle more network traffic and database operations concurrently:

  • Network Throughput: Estimated to increase by 2-3 times due to non-blocking communication for ZKP proofs.
  • Database Throughput: Expected to improve by 30-50% as database operations for ZKP storage/retrieval will be handled asynchronously.

4. Node Scalability

By optimizing ZKP processing through async operations, we expect overall node scalability to improve, enabling Concordium to handle a higher volume of nodes without performance degradation:

  • Scalability Factor: Node scalability could improve by 2-3 times, especially under heavy transaction loads.

Implementation Steps

  1. Modify ZKP Verification:
    • Convert existing ZKP verification functions into asynchronous ones and integrate them into the transaction and block validation processes.
  2. Update Database Handling:
    • Modify the storage and retrieval functions for ZKP-related data to be asynchronous, ensuring that database operations don’t block other node tasks.
  3. Implement Asynchronous Network Communication:
    • Enable asynchronous sending and receiving of ZKP proofs to ensure non-blocking network communication between nodes.
  4. Adjust Node Configuration and Startup:
    • Update the node’s startup process to include asynchronous processing for ZKP operations, ensuring smooth operation during initialization.

Testing Plan and Strategy

1. Functional Testing:

  • Ensure that asynchronous ZKP verification and transmission work correctly in different scenarios.
  • Test the async ZKP database operations to verify that data is stored and retrieved efficiently.

2. Performance Testing:

  • Measure the latency and throughput of ZKP verification before and after implementing async processing.
  • Compare node performance under load to ensure async processing improves efficiency without introducing new bottlenecks.

3. Stress Testing:

  • Simulate high network load and transaction volumes to ensure that async ZKP handling scales well.
  • Test for network interruptions or database delays to verify that async error-handling mechanisms are effective.

4. Backward Compatibility:

  • Ensure that the async handling of ZKP operations remains backward-compatible with nodes that do not implement async functionality.

Summary

This proposal builds upon the foundation laid in our first proposal for the --hotsync feature. Together, these enhancements are designed to significantly improve the performance, scalability, and responsiveness of the Concordium Node.

  • Asynchronous ZKP Handling: By implementing non-blocking operations for ZKP verification, database storage, and network communication, the node will experience:
    • Transactions per second (TPS) improvement from 2000 TPS to an estimated 4000-6000 TPS due to better resource utilization and concurrent processing.
    • Latency reduction by up to 50% for ZKP-related operations, leading to faster block validation and transaction processing.
    • Scalability improvement by 2-3 times, allowing the node to support increased transaction loads and higher node participation in the network.
  • --hotsync Feature: The first proposal introduced the --hotsync feature, allowing nodes to sync from the latest database dump instead of starting from the genesis block. This reduces sync times from several days to just 20-35 minutes on a 10 Gbps connection, or 12-15 hours on a 10 Mbps connection.

Combined Benefits of Asynchronous ZKP and --hotsync Features

Together, the asynchronous ZKP handling and --hotsync feature present a comprehensive performance upgrade for the Concordium network:

  • Node Sync Time Reduction: With --hotsync, node sync times are drastically reduced, and asynchronous handling of ZKP operations ensures faster block and transaction validation after syncing.
  • Combined TPS Improvement: When both features are implemented, the network can handle higher TPS due to reduced latency and improved efficiency in block validation. We estimate that the combined improvements could boost TPS from the current 2000 TPS to an estimated 5000-7000 TPS.
  • Improved Network Throughput: With faster sync times and higher TPS, the Concordium network can process a larger volume of transactions while maintaining the security and privacy of ZKP verification.
  • Scalability and Network Growth: Both proposals contribute to increasing the network’s scalability, ensuring that the Concordium Node can handle higher traffic and more nodes without performance degradation.

These combined optimizations will lead to a more robust and scalable Concordium Node, ensuring long-term network efficiency and resilience. The combination of reduced node sync times and increased TPS will allow Concordium to support a growing user base and transaction volume with minimal impact on performance.

Upon further review of the Haskell implementation in the Concordium node, I realize that my initial proposal for async ZKP handling was somewhat generic and did not consider some of the nuances in Concordium’s existing cryptographic routines. After studying the current structure more closely, here are the refinements I suggest for implementing asynchronous handling for Zero-Knowledge Proofs (ZKP):

  1. Parallelism in Cryptographic Operations: The Haskell concurrency model (via async and STM) can be leveraged more effectively for ZKP verification and generation. Rather than processing ZKPs in a fully sequential manner, we could offload computationally expensive parts to parallel threads, utilizing multicore capabilities to enhance throughput.
  2. Database Interactions: A specific area where async I/O operations could be beneficial is the interaction between ZKP validation and the node’s database layer. Implementing non-blocking database reads/writes for ZKP-related data can free up resources for other tasks, particularly during high transaction throughput. My deeper analysis suggests that Haskell’s IORef (updated with atomicModifyIORef') or STM’s TVar would allow safe state updates during proof verification without blocking. MVar is also an option, but takeMVar and putMVar block by design, so it is better suited to hand-off and locking than to non-blocking updates.
  3. Bounded Task Queues: To prevent overwhelming the system during peak load, a bounded task queue could be implemented to manage asynchronous ZKP verification requests, ensuring that the node doesn’t become bottlenecked by excessive ZKP operations. This is crucial for maintaining network resilience under heavy transaction load.
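
To make the bounded-queue idea concrete, here is a minimal standard-library-only sketch. It is written in Rust to match the earlier snippets; in the Haskell node, a bounded STM queue such as `TBQueue` from the stm package would play the same role. `VerifyJob`, `start_pool` and `try_submit` are hypothetical names, and the validity check is a placeholder rather than real verification.

```rust
use std::sync::mpsc::{channel, sync_channel, Receiver, SyncSender, TrySendError};
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};

/// Hypothetical verification request (assumption for illustration).
pub struct VerifyJob {
    pub id: u64,
    pub proof: Vec<u8>,
}

/// Placeholder verifier (assumption): a non-empty proof counts as valid.
fn verify(proof: &[u8]) -> bool {
    !proof.is_empty()
}

/// Starts `workers` threads that pull jobs from a queue holding at most
/// `capacity` pending jobs. Returns the job sender, a receiver of
/// `(job id, is_valid)` results, and the worker handles.
pub fn start_pool(
    capacity: usize,
    workers: usize,
) -> (SyncSender<VerifyJob>, Receiver<(u64, bool)>, Vec<JoinHandle<()>>) {
    let (job_tx, job_rx) = sync_channel::<VerifyJob>(capacity);
    let (res_tx, res_rx) = channel::<(u64, bool)>();
    let job_rx = Arc::new(Mutex::new(job_rx));
    let handles: Vec<JoinHandle<()>> = (0..workers)
        .map(|_| {
            let job_rx = Arc::clone(&job_rx);
            let res_tx = res_tx.clone();
            thread::spawn(move || loop {
                // The lock is held only while waiting for the next job.
                let job = match job_rx.lock().unwrap().recv() {
                    Ok(job) => job,
                    Err(_) => break, // all senders dropped: shut down
                };
                let _ = res_tx.send((job.id, verify(&job.proof)));
            })
        })
        .collect();
    (job_tx, res_rx, handles)
}

/// Non-blocking submit: hands the job back when the queue is full (or the
/// pool is gone) so the caller can retry later, drop it, or slow the peer.
pub fn try_submit(tx: &SyncSender<VerifyJob>, job: VerifyJob) -> Result<(), VerifyJob> {
    match tx.try_send(job) {
        Ok(()) => Ok(()),
        Err(TrySendError::Full(job)) | Err(TrySendError::Disconnected(job)) => Err(job),
    }
}
```

The design choice here is that a full queue is reported to the caller instead of growing without bound, so memory use stays capped under peak load and the node can decide how to shed or delay work. The capacity and worker count would need tuning against real measurements.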

These optimizations are critical as they align with Concordium’s performance goals, helping to reduce transaction latency and improve TPS (transactions per second) by ensuring ZKP operations do not block critical paths.

I invite feedback on these refinements, particularly on how we can best balance async cryptographic operations with the node’s consensus logic to maintain both performance and security.