Load Testing - Plan
This document outlines the first load test for the Shannon upgrade. IT IS NOT intended to be an exhaustive evaluation of the entire system's performance. IT IS intended to give visibility into the business logic of the platform, and create a baseline for future load tests.
All Pocket loadtest issues on GitHub can be found here: https://github.com/pokt-network/poktroll/issues?q=label%3Aloadtest+sort%3Aupdated-desc
Goals
De-risk the network’s feasibility to have completely permissionless services & actors
  This is intended for scalability purposes and does not account for Sybil attacks
Stress test the SMT (Sparse Merkle Trie) and how it is being used
Build intuition into the cost of operating the network for all of the stakeholders involved, both on & off chain
Gain visibility into basic metrics (disk, RAM, CPU, ingress/egress traffic, etc.) for our network actors
Uncover potential bugs, bottlenecks or concurrency issues in the onchain & offchain code
Document and design a process that’ll act as the foundation for future load-testing efforts
Non-Goals
Exhaustive benchmarking of all traditional performance metrics across our tools & packages (key-value stores, http libs, etc.)
Sybil attacks & tokenomic considerations
Performing any of the following tests: smoke tests, spike tests, fuzz testing, chaos testing, soak testing, etc.
Evaluating the performance or results of proxy services/data nodes
Selecting new tools or libraries as a direct outcome of these results
Mimicking the scale of Morse (v0) today
Accounting for failure cases, since the primary focus is just evaluating happy path scale
Anything to do with Quality of Service as it is concerned from today’s Gateway POV
Origin Document
This forum post from Morse is a good starting point to gain an understanding of why load testing is important and critical in Shannon:
https://forum.pokt.network/t/block-sizes-claims-and-proofs-in-the-multi-gateway-era/5060/9
Load Profiles
Variable Parameters
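| Parameter | Starting Value | Terminal Value | Increment |
|---|---|---|---|
| RelaysPerSecond | 1 rps | 10,000 rps | +100 rps every 10 blocks |
| GatewayCount | 1 | 10 | +1 gateway every 100 blocks |
| ApplicationCount | 5 | 1,000 | +10 apps every 10 blocks |
| SupplierCount | 5 | 1,00 | +1 supplier every 100 blocks |
| ProxyService / DataNode | 0 / ∞ | 0 / ∞ | Mocked to avoid being a performance bottleneck |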
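Each ramp in the load profile above can be read as a step function of block height. The following is a minimal sketch under two assumptions: the ramp starts at height 0 of the test, and a parameter holds at its terminal value once reached. `RampProfile` and `value_at` are hypothetical names for illustration, not part of any Pocket tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RampProfile:
    """One load-profile row: starting value, terminal value, and step size/interval."""
    start: int
    terminal: int
    step: int          # amount added at each increment
    every_blocks: int  # number of blocks between increments

    def value_at(self, height: int) -> int:
        """Return the parameter value at a block height (height 0 = test start)."""
        increments = height // self.every_blocks
        return min(self.start + increments * self.step, self.terminal)


# Values taken from the load profile above.
RELAYS_PER_SECOND = RampProfile(start=1, terminal=10_000, step=100, every_blocks=10)
GATEWAY_COUNT = RampProfile(start=1, terminal=10, step=1, every_blocks=100)
APPLICATION_COUNT = RampProfile(start=5, terminal=1_000, step=10, every_blocks=10)
```

Under these assumptions, `RELAYS_PER_SECOND.value_at(1000)` is capped at 10,000 and `GATEWAY_COUNT.value_at(900)` reaches 10, so both ramps complete within roughly 1,000 blocks.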
Constant Parameters
BlockTime - A constant block time between 10s and 60s will be selected for the benchmarks in this test
RequestType - We will use a “dummy” backing data node / proxy service that leverages nginx to return a 200 or 500 randomly
RequestDistribution - The RelaysPerSecond will be evenly distributed
VirtualUsers - For simplicity, we will assume a 1:1 mapping of virtual users (i.e. curl clients) to Applications
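With a 1:1 mapping of virtual users to Applications, one reading of an even RequestDistribution is that the total RelaysPerSecond is split across applications as evenly as whole numbers allow. The sketch below assumes that reading; `split_evenly` is a hypothetical helper, not an existing tool.

```python
def split_evenly(relays_per_second: int, application_count: int) -> list[int]:
    """Split a total request rate across applications as evenly as integers allow."""
    if application_count <= 0:
        raise ValueError("application_count must be positive")
    base, remainder = divmod(relays_per_second, application_count)
    # The first `remainder` applications each send one extra request per second.
    return [base + 1 if i < remainder else base for i in range(application_count)]
```

For example, `split_evenly(10, 3)` gives `[4, 3, 3]`, so the per-application rates always sum to the target rate.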
Out-of-scope Parameters
Governance Parameters have not been implemented yet and are therefore out-of-scope
Measurements
What to measure?
Chain State Size
What:
A pie chart or stacked bar chart of how the data in the Blocks is distributed
A line chart showing state growth over time
Why:
Get an estimate of the cost of data publishing (i.e. TIA tokens)
Get an estimate of data distribution (where to focus short-term optimization efforts)
Example:
Pie chart: State Distribution in one block. Proofs [30%], Claims [20%], Applications [10%], Suppliers [10%], Gateways [10%], Signatures [10%], Misc [10%]
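The shares in a chart like this one can be derived from per-category byte counts. The sketch below assumes the byte counts have already been collected by some other means (data collection is not specified by this plan); `state_distribution` is a hypothetical helper and the numbers are illustrative, not measured.

```python
def state_distribution(bytes_by_category: dict[str, int]) -> dict[str, float]:
    """Return each category's share of total block state as a percentage."""
    total = sum(bytes_by_category.values())
    if total == 0:
        return {name: 0.0 for name in bytes_by_category}
    return {name: 100.0 * size / total for name, size in bytes_by_category.items()}


# Illustrative byte counts only (not measured data), chosen to mirror the example chart.
example = {"Proofs": 300, "Claims": 200, "Applications": 100, "Suppliers": 100,
           "Gateways": 100, "Signatures": 100, "Misc": 100}
shares = state_distribution(example)
```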
Validators
What: Multiple line charts to capture Disk (size & iops), RAM, CPU, Network usage (ingress/egress)
Why:
Proof Validation - RAM & CPU could be a potential bottleneck
Block Generation - RAM & CPU could be a bottleneck in preparing new blocks
Block Publishing - Tx aggregation (ingress) and Block publishing (egress) could be more expensive than expected w.r.t network usage
Data Availability State - Disk could be a limiting factor depending on how quickly state grows
Proof Validation: ❓ ❓ ❓
Block Generation: ❓
Block Publishing: ❓
Data Availability State: ❓
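Disk and network usage are usually exposed as cumulative counters, while the line charts described above need per-second rates. The following is a minimal sketch of that conversion, assuming samples arrive as `(timestamp_seconds, cumulative_count)` pairs from whichever collection tool is eventually chosen; `counter_to_rates` is a hypothetical helper.

```python
def counter_to_rates(samples: list[tuple[float, int]]) -> list[tuple[float, float]]:
    """Convert (timestamp_s, cumulative_count) samples into (timestamp_s, per-second rate)."""
    rates = []
    for (t0, c0), (t1, c1) in zip(samples, samples[1:]):
        elapsed = t1 - t0
        if elapsed <= 0 or c1 < c0:
            # Skip out-of-order samples and counter resets (e.g. a process restart).
            continue
        rates.append((t1, (c1 - c0) / elapsed))
    return rates
```

For instance, egress byte counters of 0, 1000 and 3000 sampled 10 seconds apart yield rates of 100 and 200 bytes per second.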
PATH Gateway (Application, Gateway, etc…)
What: Multiple line charts to capture Disk (size & iops), RAM, CPU, Network usage (ingress/egress)
Why:
Relay Proxies - Ingress/egress of relays could add up to large networking costs
Caches & State - All the caching & state can have impact across the board
Request Processing - Signature generation, request marshaling / unmarshaling, etc.
Response Handling - Slow supplier responses could increase pending relays at the PATH Gateway level (i.e. RAM)
Relay Proxies: ❓ ❓
Caches & State: ❓ ❓
Request Processing: ❓ ??? ❓
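The link between slow supplier responses and pending relays can be estimated with Little's Law: in steady state, the average number of in-flight requests equals the arrival rate multiplied by the mean response time. The sketch below applies it; `bytes_per_pending_relay` is a hypothetical per-relay memory footprint that would have to be measured, not a known value.

```python
def pending_relays(relays_per_second: float, mean_latency_s: float) -> float:
    """Little's Law: average in-flight relays = arrival rate x mean response time."""
    return relays_per_second * mean_latency_s


def pending_memory_bytes(relays_per_second: float, mean_latency_s: float,
                         bytes_per_pending_relay: int) -> float:
    """Rough RAM held by in-flight relays, given a measured per-relay footprint."""
    return pending_relays(relays_per_second, mean_latency_s) * bytes_per_pending_relay
```

At 10,000 rps, a mean supplier latency of 0.5 s implies about 5,000 relays in flight at the gateway.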
RelayMiner (Supplier, SMT, etc..)
What: Multiple line charts to capture Disk (size & iops), RAM, CPU, Network usage (ingress/egress)
Why:
SMT - The SMT is one of the most important parts of the end-to-end flow, with impact on RAM, CPU & disk
Caches & State - All the caching & state can have impact across the board
Request Processing - Signature generation, request marshaling / unmarshaling, etc.
Response Generation - Generating the actual response to the request via the dummy service
SMT: ❓ ❓ ❓ ❓
Caches & State: ❓
Request Processing: ❓
Response Generation: ❓ ❓ ❓
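The dummy service behind the RelayMiner only needs to answer quickly with a random 200 or 500. The plan calls for nginx; the sketch below swaps in Python's standard-library `http.server` purely for illustration. The equal 50/50 split, the `{}` response body, the port, and the names `pick_status`, `DummyServiceHandler` and `serve` are all assumptions.

```python
import random
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def pick_status(rng: random.Random) -> int:
    """Return 200 or 500 with equal probability (the 50/50 split is an assumption)."""
    return rng.choice((200, 500))


class DummyServiceHandler(BaseHTTPRequestHandler):
    """Answers every POST with an empty JSON object and a random 200/500 status."""
    rng = random.Random()

    def do_POST(self) -> None:
        # Drain the request body before responding.
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)
        body = b"{}"
        self.send_response(pick_status(self.rng))
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format: str, *args) -> None:
        # Silence the default per-request stderr logging.
        pass


def serve(port: int = 8080) -> None:
    """Run the dummy service until interrupted (the port is an arbitrary choice)."""
    ThreadingHTTPServer(("127.0.0.1", port), DummyServiceHandler).serve_forever()
```

Calling `serve()` starts the mock on localhost; a production-grade server such as nginx would still be preferable for the actual test so that the mock itself does not become the bottleneck.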
Out-of-scope
The exact details of the implementation are out-of-scope and will be developed ad hoc. The following is a non-exhaustive list of items we will figure out along the way:
Data Collection
Concrete Analysis Methodology
Templates for report formatting
Architecture / Component Diagram
Legend:
🔵 Pocket Specific Actors
🟣 Pocket Network dependencies
🟠 New tooling that needs to be built
-- Asynchronous request
- Synchronous request
GitHub does not render colored mermaid diagrams, so the image can also be accessed here:
https://github.com/pokt-network/poktroll/assets/1892194/bf965457-bdd0-4a35-a75c-cfcddfbaab5e
TODO_IMPROVE: Improve the colors for readability purposes per the comment here: https://github.com/pokt-network/poktroll/pull/286#pullrequestreview-1806384312.
Diagram text (for reference):
Network
Blocks([Tx])
(un)Stake Txs
metrics
Claim/Prove
Req/Res
Req/Res
metrics
metrics
Suppliers / RelayMiner
Supplier N
Sparse Merkle Tree
Proxy Service
(nginx)
Supplier 1
Gateways
PATH Gateway 1
PATH Gateway N
Applications
Virtual User 1
Virtual User N
Metrics Sync
Script Trigger
(manual)
DA Network
Pocket Validator
C
Tool Requirements
Deployment Environment
Ability to deploy the environment (and tooling) on LocalNet & DevNet
Request Source / Generator
A script/tool to generate NNN requests per second
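A fixed-rate generator can be sketched by scheduling each request against an absolute deadline measured from the start time, so that small delays do not accumulate into drift. This is a minimal single-threaded sketch; `send_offsets` and `run` are hypothetical names, and `send` is a caller-supplied function that is assumed to return quickly (for example by handing the request to a worker pool).

```python
import time
from typing import Callable, Iterator


def send_offsets(requests_per_second: int, duration_s: int) -> Iterator[float]:
    """Yield the offset (seconds from start) at which each request should be sent."""
    for i in range(requests_per_second * duration_s):
        yield i / requests_per_second


def run(requests_per_second: int, duration_s: int, send: Callable[[], None]) -> int:
    """Call `send` at the target rate; returns the number of requests issued."""
    start = time.monotonic()
    sent = 0
    for offset in send_offsets(requests_per_second, duration_s):
        # Sleep until the absolute deadline so per-request delays do not accumulate.
        delay = start + offset - time.monotonic()
        if delay > 0:
            time.sleep(delay)
        send()
        sent += 1
    return sent
```

For example, `run(100, 60, send)` would issue 6,000 calls to `send` over roughly one minute.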
Script - Instructions
Ramp-up & ramp-down strategy
Instructions on when & how to execute commands (manually) to ramp-up & down
Script - Tools
Commands to periodically trigger manual stake/unstake txs
Commands to periodically scale up suppliers & gateways
Commands to periodically add a new virtual user
Command to periodically increase the number of requests per second
