Introduction
This proposal aims to enhance the performance and scalability of the Concordium Node by implementing asynchronous handling for Zero-Knowledge Proof (ZKP) operations. The Concordium Node already supports ZKP generation and verification as part of its core functionality. By integrating asynchronous processing, we can improve responsiveness and reduce latency during ZKP-related operations, especially in network requests and database handling.
Motivation
While the current ZKP mechanisms in the Concordium Node ensure privacy and security, operations involving ZKP generation, verification, and transmission can become bottlenecks, especially under heavy network load. Asynchronous processing allows the node to perform these operations without blocking other critical tasks, resulting in better throughput and lower latency.
Key benefits of asynchronous ZKP processing:
- Non-blocking ZKP operations: Improves the node’s responsiveness by ensuring that the node does not have to wait for ZKP operations to complete before moving to other tasks.
- Concurrent task handling: Reduces the impact of high network and transaction volume on the node.
- Scalability: Enhances the node’s ability to handle more ZKP operations simultaneously, improving overall system performance.
Conceptual Approach
1. ZKP Verification with Asynchronous Processing
- Objective: Enable asynchronous ZKP verification during transaction and block validation to improve processing speed and ensure the node doesn’t block during heavy load.
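// Asynchronous ZKP verification; library errors are mapped into ZKPError.
async fn verify_zkp_async(proof: &ZKPProof, statement: &ZKPStatement) -> Result<bool, ZKPError> {
    zkp_library::verify_async(proof, statement)
        .await
        .map_err(|e| ZKPError::VerificationFailed(e.to_string()))
}

// Must be `async`: `.await` is only valid inside an async function.
// Assumes `ConsensusError: From<ZKPError>` so that `?` can convert the error.
async fn process_block(block: &Block) -> Result<(), ConsensusError> {
    let zkp = block.get_zkp()?;
    // Await the verifier directly. `tokio::spawn` requires a `'static` future,
    // so it cannot be handed references that borrow the local `zkp`.
    let is_valid = verify_zkp_async(&zkp.proof, &zkp.statement).await?;
    if !is_valid {
        return Err(ConsensusError::InvalidZKP);
    }
    Ok(())
}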
- Focus Areas:
- Update the ZKP verification function to use asynchronous processing.
- Ensure proper handling of async calls during block and transaction validation to prevent performance bottlenecks.
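Proof verification is CPU-bound work, so marking a function `async` does not by itself keep the caller free; the computation still has to run on another thread. The following is a minimal sketch of that idea, using only standard-library threads and channels to stay dependency-free (in a Tokio-based node, `tokio::task::spawn_blocking` plays a similar role). The `Proof` type, the toy `verify` check, and the helper names are hypothetical stand-ins, not Concordium APIs.

```rust
use std::sync::mpsc::{self, Receiver};
use std::thread;

/// Hypothetical stand-in for a proof; the real type would come from the node.
#[derive(Clone, Debug)]
pub struct Proof {
    pub claimed_sum: u64,
    pub witness: Vec<u64>,
}

/// Toy "verification": checks that the witness sums to the claimed value.
/// A real verifier would do expensive cryptographic work here.
pub fn verify(proof: &Proof) -> bool {
    proof.witness.iter().sum::<u64>() == proof.claimed_sum
}

/// Starts verification on a worker thread and returns immediately.
/// The caller collects the result later from the returned receiver.
pub fn verify_in_background(proof: Proof) -> Receiver<bool> {
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // Ignore send errors: the caller may have dropped the receiver.
        let _ = tx.send(verify(&proof));
    });
    rx
}

/// Verifies a batch concurrently, preserving input order in the results.
pub fn verify_batch(proofs: Vec<Proof>) -> Vec<bool> {
    let pending: Vec<Receiver<bool>> =
        proofs.into_iter().map(verify_in_background).collect();
    pending
        .into_iter()
        // A worker that panicked counts as a failed verification.
        .map(|rx| rx.recv().unwrap_or(false))
        .collect()
}
```

Spawning one thread per proof keeps the sketch short; a production node would more likely use a bounded worker pool so that a burst of proofs cannot exhaust system threads.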
2. Asynchronous Handling of ZKP-Related Database Operations
- Objective: Ensure that the storage and retrieval of ZKP-related data are handled asynchronously to improve I/O efficiency and prevent database access from blocking other node operations.
async fn store_zkp_async(block_id: &BlockId, zkp: &ZKPProof) -> Result<(), StorageError> {
    let encoded_zkp = serialize(zkp)?;
    db.put_async(block_id, encoded_zkp).await?;
    Ok(())
}

async fn retrieve_zkp_async(block_id: &BlockId) -> Result<ZKPProof, StorageError> {
    let encoded_zkp = db.get_async(block_id).await?;
    let zkp = deserialize(&encoded_zkp)?;
    Ok(zkp)
}
- Focus Areas:
- Convert ZKP-related database operations (store_zkp and retrieve_zkp) to asynchronous functions.
- Ensure efficient I/O operations that minimize latency in database access.
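One common way to keep database access from blocking other work is to give a single worker thread ownership of the store and talk to it over channels. The sketch below illustrates that pattern with the standard library only. The in-memory `HashMap` stands in for the node's real database, proofs are treated as opaque bytes, and `ProofStore` and `Request` are hypothetical names introduced for illustration.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{self, Receiver, Sender};
use std::thread;

/// Requests understood by the storage worker (hypothetical protocol).
enum Request {
    Put { block_id: u64, proof: Vec<u8>, done: Sender<()> },
    Get { block_id: u64, reply: Sender<Option<Vec<u8>>> },
}

/// Handle for submitting storage requests without touching the store directly.
#[derive(Clone)]
pub struct ProofStore {
    tx: Sender<Request>,
}

impl ProofStore {
    /// Spawns the worker thread that owns the (in-memory) store.
    pub fn spawn() -> Self {
        let (tx, rx) = mpsc::channel::<Request>();
        thread::spawn(move || {
            let mut db: HashMap<u64, Vec<u8>> = HashMap::new();
            // The loop ends once every `ProofStore` handle has been dropped.
            for req in rx {
                match req {
                    Request::Put { block_id, proof, done } => {
                        db.insert(block_id, proof);
                        let _ = done.send(());
                    }
                    Request::Get { block_id, reply } => {
                        let _ = reply.send(db.get(&block_id).cloned());
                    }
                }
            }
        });
        ProofStore { tx }
    }

    /// Queues a write and returns a receiver that fires once it is applied.
    pub fn put(&self, block_id: u64, proof: Vec<u8>) -> Receiver<()> {
        let (done, ack) = mpsc::channel();
        self.tx
            .send(Request::Put { block_id, proof, done })
            .expect("storage worker stopped");
        ack
    }

    /// Queues a read and returns a receiver for the stored bytes, if any.
    pub fn get(&self, block_id: u64) -> Receiver<Option<Vec<u8>>> {
        let (reply, rx) = mpsc::channel();
        self.tx
            .send(Request::Get { block_id, reply })
            .expect("storage worker stopped");
        rx
    }
}
```

Both `put` and `get` return immediately; the caller decides when (or whether) to wait on the returned receiver. Requests sent through one handle are processed in the order they were sent, so a `get` issued after a `put` observes the written value.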
3. Asynchronous Transmission of ZKP Proofs Between Nodes
- Objective: Ensure that ZKP proofs are transmitted between nodes asynchronously to avoid delays in communication and improve overall network performance.
async fn send_zkp_async(proof: &ZKPProof, to_node: &NodeId) -> Result<(), NetworkError> {
    let message = NetworkMessage::ZKP(proof.clone());
    network.send_async(to_node, message).await?;
    Ok(())
}

async fn receive_zkp_async(message: NetworkMessage) -> Result<(), NetworkError> {
    match message {
        NetworkMessage::ZKP(proof) => {
            process_zkp_async(proof).await?;
        },
        _ => {},
    }
    Ok(())
}
- Focus Areas:
- Implement asynchronous network communication for transmitting and receiving ZKP proofs.
- Ensure that ZKP proofs are securely transmitted and received without blocking other network activities.
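Non-blocking transmission also needs a policy for what happens when a peer cannot keep up. A minimal sketch, assuming a bounded outbound queue per peer: the sender enqueues with `try_send` and gets an immediate answer instead of waiting. The `NetworkMessage` variants, the `Enqueue` outcome type, and the function names here are hypothetical and only illustrate the pattern.

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender, TrySendError};

/// Hypothetical wire message; only the ZKP variant matters here.
#[derive(Debug, PartialEq)]
pub enum NetworkMessage {
    Zkp(Vec<u8>),
    Ping,
}

/// Outcome of trying to enqueue a message without blocking.
#[derive(Debug, PartialEq)]
pub enum Enqueue {
    /// The message was accepted into the queue.
    Queued,
    /// The queue is full; the caller can retry later or drop the message.
    Busy,
    /// The receiving side of the queue has shut down.
    Closed,
}

/// Creates a bounded outbound queue holding at most `capacity` messages.
pub fn outbound_queue(
    capacity: usize,
) -> (SyncSender<NetworkMessage>, Receiver<NetworkMessage>) {
    sync_channel(capacity)
}

/// Enqueues a proof for sending and returns immediately,
/// even when the queue is full.
pub fn enqueue_proof(tx: &SyncSender<NetworkMessage>, proof: Vec<u8>) -> Enqueue {
    match tx.try_send(NetworkMessage::Zkp(proof)) {
        Ok(()) => Enqueue::Queued,
        Err(TrySendError::Full(_)) => Enqueue::Busy,
        Err(TrySendError::Disconnected(_)) => Enqueue::Closed,
    }
}
```

The bounded capacity gives backpressure: a slow peer fills its own queue rather than stalling the task that produces proofs.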
4. Node Startup and Configuration (Asynchronous Integration)
- Objective: Update the node startup logic to incorporate asynchronous handling for ZKP operations, ensuring that the node remains responsive even during high load.
use std::{env, error::Error};

#[tokio::main]
async fn main() -> Result<(), Box<dyn Error>> {
    // Node initialization...
    // ZKP async processing integration
    if let Some(url_arg) = env::args().find(|arg| arg.starts_with("--hotsync-url=")) {
        // Split only at the first '=' so URLs that contain '=' stay intact.
        let (_, url) = url_arg.split_once('=').expect("Missing URL for --hotsync-url flag");
        hotsync_async(url).await?;
    }
    // Start the Concordium node
    // ...
    Ok(())
}
- Focus Areas:
- Ensure that ZKP operations are non-blocking during node startup.
- Add logging and error-handling mechanisms to track and manage async ZKP operations.
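The flag parsing in the startup code can be isolated into a small, testable helper. The sketch below is one possible shape for it; `hotsync_url` is a hypothetical name, and the function takes any list of arguments so it can be exercised without a real command line (at startup it would be called with `std::env::args()`).

```rust
/// Extracts the value of `--hotsync-url=<url>` from an argument list.
/// Occurrences with an empty value are ignored; returns `None` if no
/// occurrence carries a value.
pub fn hotsync_url<I, S>(args: I) -> Option<String>
where
    I: IntoIterator<Item = S>,
    S: AsRef<str>,
{
    args.into_iter().find_map(|arg| {
        arg.as_ref()
            // Everything after the prefix is the URL, including any '=' in it.
            .strip_prefix("--hotsync-url=")
            .filter(|url| !url.is_empty())
            .map(str::to_owned)
    })
}
```

Using `strip_prefix` rather than splitting on `=` means a URL with query parameters such as `?v=1` is passed through unchanged.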
Performance Impact and TPS Capacity Estimates
With the introduction of asynchronous handling for ZKP-related operations, we anticipate the following improvements in performance and transactions-per-second (TPS) capacity:
1. Transactions per Second (TPS) Improvement
By enabling asynchronous processing for ZKP verification and database operations, we estimate that the node’s TPS could increase due to reduced blocking in ZKP validation:
- Current TPS: ~2000 TPS
- Estimated TPS with Async Processing: ~4000-6000 TPS
This is based on the expected reduction in latency for ZKP verification and database operations, allowing the node to handle more transactions concurrently.
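One way to sanity-check an estimate of this kind is Amdahl's law: if a fraction p of per-transaction processing time is spent in work that the change speeds up by a factor s, the overall speedup is 1 / ((1 - p) + p / s). The sketch below encodes that formula. The function names are hypothetical, and any values passed for p and s are illustrative assumptions, not measurements from a Concordium node.

```rust
/// Overall speedup by Amdahl's law, where `p` is the fraction of time
/// affected by the optimization (0..=1) and `s` is the speedup of that part.
pub fn amdahl_speedup(p: f64, s: f64) -> f64 {
    assert!((0.0..=1.0).contains(&p) && s > 0.0);
    1.0 / ((1.0 - p) + p / s)
}

/// Upper bound on the overall speedup as `s` grows without limit.
pub fn max_speedup(p: f64) -> f64 {
    assert!((0.0..1.0).contains(&p));
    1.0 / (1.0 - p)
}

/// Projected TPS for a given baseline under the same model.
pub fn projected_tps(baseline_tps: f64, p: f64, s: f64) -> f64 {
    baseline_tps * amdahl_speedup(p, s)
}
```

Under this model, going from 2000 TPS to 4000 TPS requires p of at least 0.5, and reaching 6000 TPS requires p of at least 2/3, even if the affected work became arbitrarily fast. Profiling how much of the node's time is actually spent blocked on ZKP work would show where the estimates fall within that range.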
2. Latency Reduction
- Latency Improvement: We estimate that the introduction of asynchronous processing could reduce the latency of ZKP-related operations by up to 50%, as these operations would no longer block other node tasks.
3. Network and Database Throughput
Asynchronous handling of ZKP transmission and storage will improve the node’s throughput, enabling it to handle more network traffic and database operations concurrently:
- Network Throughput: Estimated to increase by 2-3 times due to non-blocking communication for ZKP proofs.
- Database Throughput: Expected to improve by 30-50% as database operations for ZKP storage/retrieval will be handled asynchronously.
4. Node Scalability
By optimizing ZKP processing through async operations, we expect overall node scalability to improve, enabling Concordium to handle a higher volume of nodes without performance degradation:
- Scalability Factor: Node scalability could improve by 2-3 times, especially under heavy transaction loads.
Implementation Steps
- Modify ZKP Verification:
- Convert existing ZKP verification functions into asynchronous ones and integrate them into the transaction and block validation processes.
- Update Database Handling:
- Modify the storage and retrieval functions for ZKP-related data to be asynchronous, ensuring that database operations don’t block other node tasks.
- Implement Asynchronous Network Communication:
- Enable asynchronous sending and receiving of ZKP proofs to ensure non-blocking network communication between nodes.
- Adjust Node Configuration and Startup:
- Update the node’s startup process to include asynchronous processing for ZKP operations, ensuring smooth operation during initialization.
Testing Plan and Strategy
1. Functional Testing:
- Ensure that asynchronous ZKP verification and transmission work correctly in different scenarios.
- Test the async ZKP database operations to verify that data is stored and retrieved efficiently.
2. Performance Testing:
- Measure the latency and throughput of ZKP verification before and after implementing async processing.
- Compare node performance under load to ensure async processing improves efficiency without introducing new bottlenecks.
3. Stress Testing:
- Simulate high network load and transaction volumes to ensure that async ZKP handling scales well.
- Test for network interruptions or database delays to verify that async error-handling mechanisms are effective.
4. Backward Compatibility:
- Ensure that the async handling of ZKP operations remains backward-compatible with nodes that do not implement async functionality.
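For the performance tests, a small measurement helper makes the before/after comparison concrete. The sketch below times repeated runs of an operation and reports nearest-rank percentiles and throughput. It uses only the standard library, and the names `measure`, `percentile`, and `throughput` are hypothetical helpers rather than part of any existing test suite.

```rust
use std::time::{Duration, Instant};

/// Runs `op` `runs` times and returns each run's wall-clock duration.
pub fn measure<F: FnMut()>(runs: usize, mut op: F) -> Vec<Duration> {
    (0..runs)
        .map(|_| {
            let start = Instant::now();
            op();
            start.elapsed()
        })
        .collect()
}

/// Nearest-rank percentile (0 < pct <= 100) of a set of samples.
/// Returns `None` for an empty sample set or an out-of-range `pct`.
pub fn percentile(samples: &[Duration], pct: f64) -> Option<Duration> {
    if samples.is_empty() || !(pct > 0.0 && pct <= 100.0) {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort();
    // Nearest rank: smallest value with at least pct% of samples at or below it.
    let rank = ((pct / 100.0) * sorted.len() as f64).ceil() as usize;
    Some(sorted[rank - 1])
}

/// Operations per second implied by the total measured time.
pub fn throughput(samples: &[Duration]) -> f64 {
    let total: Duration = samples.iter().sum();
    samples.len() as f64 / total.as_secs_f64()
}
```

Running the same harness against the synchronous and asynchronous code paths, under the same load, gives comparable median and tail latencies instead of a single average.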
Summary
This proposal builds upon the foundation laid in our first proposal for the --hotsync feature. Together, these enhancements are designed to significantly improve the performance, scalability, and responsiveness of the Concordium Node.
- Asynchronous ZKP Handling: By implementing non-blocking operations for ZKP verification, database storage, and network communication, the node will experience:
- Transactions-per-second (TPS) improvement from 2000 TPS to an estimated 4000-6000 TPS due to better resource utilization and concurrent processing.
- Latency reduction by up to 50% for ZKP-related operations, leading to faster block validation and transaction processing.
- Scalability improvement by 2-3 times, allowing the node to support increased transaction loads and higher node participation in the network.
- --hotsync Feature: The first proposal introduced the --hotsync feature, allowing nodes to sync from the latest database dump instead of starting from the genesis block. This reduces sync times from several days to just 20-35 minutes on a 10 Gbps connection, or 12-15 hours on a 10 Mbps connection.
Combined Benefits of Asynchronous ZKP and --hotsync Features
Together, the asynchronous ZKP handling and --hotsync feature present a comprehensive performance upgrade for the Concordium network:
- Node Sync Time Reduction: With --hotsync, node sync times are drastically reduced, and asynchronous handling of ZKP operations ensures faster block and transaction validation after syncing.
- Combined TPS Improvement: When both features are implemented, the network can handle higher TPS due to reduced latency and improved efficiency in block validation. We estimate that the combined improvements could boost TPS from the current 2000 TPS to an estimated 5000-7000 TPS.
- Improved Network Throughput: With faster sync times and higher TPS, the Concordium network can process a larger volume of transactions while maintaining the security and privacy of ZKP verification.
- Scalability and Network Growth: Both proposals contribute to increasing the network’s scalability, ensuring that the Concordium Node can handle higher traffic and more nodes without performance degradation.
These combined optimizations will lead to a more robust and scalable Concordium Node, ensuring long-term network efficiency and resilience. The combination of reduced node sync times and increased TPS will allow Concordium to support a growing user base and transaction volume with minimal impact on performance.