Nxt forging algorithm: simulating approach
andruiman, [email protected]∗
October 17, 2014
Abstract
In this paper we investigate properties of the forging algorithm used in PoS crypto-currency networks such as Nxt. The approach we use is statistical modeling and simulation. We analyzed the currently implemented algorithm and found some weaknesses in it. We found that the block generation time depends on the balance distribution over network accounts and that, even in the simplest case with one node, it cannot converge to the specified value of 1 minute per block. We also present some newer regulation techniques which help to avoid those issues and allow nodes to adapt so that a block is generated in the specified average time interval, independently of the balance distribution, for both static and dynamic cases.
Keywords: PoS crypto-currencies, forging, statistical simulation
1 Pre-introduction
We begin a series of papers concentrated on the PoS algorithms themselves and their implementations in Nxt. Our final goal is to develop a working model which can simulate different algorithms and approaches quickly and with analyzable data. We plan to implement that model based on a mix of mathematical statistical simulations (like this paper), formal logic proofs (with the help of the Coq system, http://coq.inria.fr/) and a fast prototyping language (Haskell, http://www.haskell.org/, at the moment). While we do not yet care much about performance, we believe that would be a reasonable choice. Please see the details of our plans in the preceding papers [1] and [2].

To simulate something we need it to be predictable, measurable and modifiable. We start with some basic entities of the model and come close to the forging algorithm which we would like to investigate, so that we can play with parameters and see what happens in the simulation. So this paper considers the forging algorithm from different sides. The author must note the outstanding work of mthcl at http://www.docdroid.net/ecmz/forging0-5-2.pdf.html and his precise investigation of the probabilistic properties of the forging algorithm. In our paper we observed some of the same results as he did, using statistical simulation, and we propose some different correcting procedures.

We'd like to note that this paper is not a strict math paper. We skip some details, don't try to prove theorems, and don't describe the numerical experiments with super-accuracy. Although some of the data is of course available, our goal is to give an impression of different regulating procedures and results, in order to understand what is worth including in the simulating system and what is not (at least not yet), and which parameters are important or even critical and which are less important for the network excellence.

∗ To support this work please use Nxt address: NXT-L892-ZKXZ-2JJY-AD9JV
2 Introduction
The forging process, considered as the opposite of mining, is used in Proof-of-Stake (PoS) crypto-currency networks to build a blockchain, which is the block sequence containing all the network-specific data in a structured typeclass. For details see https://wiki.nxtcrypto.org. However, the forging algorithm can be examined from the mathematical point of view, with the goal of constructing optimal and effective core network clients whose collective work leads to the specified network behavior. We divide our paper into sections moving from the easiest case to more realistic ones, discovering the properties of the forging process that need to be implemented.

Let us consider a network of $N$ nodes, where each node corresponds to some user account, but not vice versa (we think of sleeping accounts). Each account has some balance value $V_n$, and together these do not exceed the total system balance: over all accounts we have $\sum V_n = V$, so for live accounts (in nodes) we have the inequality $\sum_n V_n \le V$. In the Nxt network $V = 10^9$. Further, we denote the blockchain sequence as $B_m$ and define a time interval within which we would like a block to be generated on average: $E t_m = \tau$, where $t_m$ is the interval between blocks $m$ and $(m-1)$. In Nxt $\tau = 60$ (measured in seconds). We also have the zero block $B_0$, which is called the genesis block. Each block has a special measure which is called the base target $H_m$.

To add a non-deterministic entity we suppose that each node can generate pseudo-random (natural) numbers $p_{nm}$ somehow distributed between 0 and a big enough number, say $P$. They are called hits. In $p_{nm}$, $n$ stands for the node number and $m$ for the current block. Hereafter we assume the uniform distribution with infinitesimal measure $dp/P$ (we may think that $p$ is continuous as $P$ is very big). In Nxt $P = 2^{64} - 1$. The starting base target is defined so that the expectation [3] $E t_1 = \tau$, and it is equal to $H_0 = \frac{P}{2V\tau}$. In Nxt $H_0 = \frac{2^{64}}{2 \cdot 10^9 \cdot 60} = 153722867$. We also assume for now a static blockchain, which means $B_{nm} = B_{n'm} \equiv B_m$ throughout the paper.

The algorithm which is examined and currently implemented in Nxt is the following (we'll refer to it as the original):

$$
\begin{aligned}
t_{nm} &= p_{nm}/(V_n H_{m-1});\\
H_{\max} &= \min(V H_0,\ 2H_{m-1});\\
H_{\min} &= \max(1,\ H_{m-1}/2);\\
H_c &= t_{nm} H_{m-1}/\tau;\\
H_m &= \min(H_{\max},\ \max(H_{\min},\ H_c)).
\end{aligned}
$$

Rewrite the latter equations as follows, substituting $t_{nm}$ and the max's and min's:

$$
\begin{aligned}
H_c &= p_{nm}/(V_n \tau);\\
H_m &= \min(V H_0,\ 2H_{m-1},\ \max(1,\ H_{m-1}/2,\ H_c)).
\end{aligned}
$$

In the next sections we consider the following cases: (1) one node - permanent balance, (2) one node - changing balance, (3) multi-node - permanent balance, (4) multi-node - changing balance.

Another important question we'd like to investigate is what we expect from the perfect algorithm. At the moment our expectations are: (1) the perfect algorithm should be immune to the total balance distribution, that is, no matter how forging coins are distributed between accounts; (2) it should be immune to sudden forging balance changes (due to transactions or just turning machines off); (3) it should be proportional to the account's forging balance, that is, over the total number of blocks generated we expect the estimated contribution of each node to be proportional to its forging balance.

[1] http://chepurnoy.org/blog/2014/10/inside-a-proof-of-stake-cryptocurrency-part-1/
[2] http://chepurnoy.org/blog/2014/10/inside-a-proof-of-stake-cryptocurrency-part-2/
[3] Hereafter we calculate the mean value as an average within the series and denote $E x_k \equiv \lim_{K\to\infty} \frac{1}{K}\sum_{i \le K} x_i$.
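To make the original rule concrete, here is a minimal Python sketch of a single forging step together with the arithmetic for $H_0$. It is an illustration under stated assumptions, not the Nxt implementation: the function name `original_step` is hypothetical, all balances are assumed positive, and we assume that the time of the winning node is the one that enters $H_c$.

```python
import random

P = 2**64 - 1                    # largest possible hit in Nxt
V = 10**9                        # total system balance
TAU = 60                         # target block time in seconds
H0 = (P + 1) // (2 * V * TAU)    # 2^64 / (2 * 10^9 * 60), rounded down


def original_step(h_prev, balances, rng):
    """One block of the original rule; returns (winner, t_m, H_m)."""
    # Local time of every node: t_nm = p_nm / (V_n * H_{m-1}).
    times = [rng.randint(1, P) / (v * h_prev) for v in balances]
    # The node with the shortest local time generates the block.
    winner = min(range(len(times)), key=times.__getitem__)
    t = times[winner]
    h_max = min(V * H0, 2 * h_prev)
    h_min = max(1, h_prev / 2)
    h_c = t * h_prev / TAU       # assumption: the winner's time enters H_c
    return winner, t, min(h_max, max(h_min, h_c))


# Single node holding the whole balance, starting from the genesis target.
winner, t, h = original_step(H0, [V], random.Random(1))
```

Calling `original_step` in a loop, feeding each returned base target back in as `h_prev`, gives a series of block times that can be averaged as in footnote [3].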
3 One node - permanent balance
So we start our examination with the simplest case: $N = 1$, $V_1 = V$. Rewriting the algorithm we have:

$$
\begin{aligned}
H_c &= p_m/(V\tau);\\
H_m &= \min(V H_0,\ 2H_{m-1},\ \max(1,\ H_{m-1}/2,\ p_m/(V\tau))).
\end{aligned}
$$

Let us further normalize all the calculated quantities by $2H_0$ to simplify the notation. So, let $p \in U[\varepsilon; 1]$ and set

$$H_0 = 0.5; \qquad H_m = \min(V/2,\ 2H_{m-1},\ \max(H_{m-1}/2,\ p_m)),$$

where we introduce $\varepsilon = 1/P > 0$, small enough. An analytical solution for $E H_m \equiv \lim_{M\to\infty} \frac{1}{M}\sum_{m \le M} H_m$ is not straightforward, so we will sometimes use numerical results to demonstrate the properties. To get fast simulation results we use Excel and Gnumeric (maybe not an excellent choice, but sometimes it works). The distribution of $H_m$ we got looks like the one in the figure below.

[Figure: distribution of $H_m$]
The distribution of $H_m$ looks pretty smooth and we've got a mean value of around 0.5. However, for one node with permanent balance we may try not to tune $H_m$ at all, setting $H_m \equiv 0.5$ for all $m$, with the expected result of $E[p_m/H_m] = E p_m/H_0 = 0.5/0.5 = 1$. As we will see later, keeping $H_m$ close to constant is a good idea when dealing with a permanent overall balance. But what happened to the mean time interval $\tau_1$? Although $E H_m = 0.5$, $\tau_1$ didn't converge to unit and is about 1.3 to 1.4. Here is the distribution of $t_m$:

[Figure: distribution of $t_m$]
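The normalized one-node recurrence of this section is easy to reproduce outside a spreadsheet. The following Python sketch is one way to do it. The function name `simulate_one_node` is hypothetical, and we assume that in normalized units the block time is $t_m = p_m/H_{m-1}$, which follows from dividing $t_{nm} = p_{nm}/(V_n H_{m-1})$ by $\tau$.

```python
import random


def simulate_one_node(blocks, seed=0, v_cap=0.5e9):
    """Return (mean H_m, mean t_m) for the normalized one-node rule."""
    rng = random.Random(seed)
    h = 0.5                                   # normalized H_0
    sum_h = 0.0
    sum_t = 0.0
    for _ in range(blocks):
        p = rng.random()                      # normalized hit in [0, 1)
        sum_t += p / h                        # t_m = p_m / H_{m-1}
        # H_m = min(V/2, 2 H_{m-1}, max(H_{m-1}/2, p_m))
        h = min(v_cap, 2 * h, max(h / 2, p))
        sum_h += h
    return sum_h / blocks, sum_t / blocks


mean_h, mean_t = simulate_one_node(200_000, seed=1)
```

Comparing `mean_h` and `mean_t` shows the effect discussed above: the base target may average near 0.5 while the block time averages noticeably above the unit.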
Again, we will not yet try to prove it analytically and consider a much easier task, i.e. what is $E[p_1/p_2]$ with both $p_{1,2} \in U[1; P]$? We believe it is

$$E[p_1/p_2] = \frac{1}{P^2}\int_1^P\!\!\int_1^P \frac{p_1}{p_2}\,dp_1\,dp_2 = \frac{1}{P^2}\left(\frac{P^2}{2} - 0.5\right)\ln P = \frac{1}{2}\left(1 - \frac{1}{P^2}\right)\ln P \approx \frac{\ln P}{2},$$

where we neglect a value of $1/P^2$ in the brackets. This example shows what we get if we regulate $H_m$ naively, setting $H_m = p_m$; $t_m = p_m/H_{m-1}$: $E t_m$ converges to $\ln P/2$ for big enough $P$ rather than to unit, although $E p_m / E H_m = 1$. The given original algorithm is much more complex to solve analytically, but what we demonstrated is that obviously $E[A/B] \ne EA/EB$ even for statistically independent variables. This question leads further to a theoretical equation for the kernel functions $f(x)$, $g(x)$, which here play the role of probability density functions:

$$E_g x \cdot E_f \frac{1}{x} = \int (1/x) f(x)\,dx \int x\, g(x)\,dx = 1.$$
So, playing with the simplest case, we have realized that the tuning of $H_m$ should not be so rough, and in the next section we'll try to overcome the problem of one-node time convergence with more interesting methods than merely holding $H_m$ constant. The main goal is to find a good way of propagating probability from $p_m$ to $t_m$ through $H_m$, caring about $E t_m$ rather than $E H_m$, as the mean time interval is a more valuable property than the mean base target of the block. The latter will play a role when we come to the dynamic blockchain and even the blocktree, where we'll be looking for the longest and therefore most trustworthy chain.
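The claim that $E[p_1/p_2]$ grows like $\ln P/2$ instead of staying at $E p_1/E p_2 = 1$ can be checked numerically. The sketch below is only an illustration: the function names are hypothetical, and the closed form uses the exact density $1/(P-1)$ on $[1; P]$, which agrees with the approximation in the text for large $P$.

```python
import math
import random


def exact_ratio_mean(P):
    """E[p1/p2] = E[p1] * E[1/p2] for independent p1, p2 ~ U[1, P]."""
    mean_p = (P + 1) / 2                  # E[p1]
    mean_inv = math.log(P) / (P - 1)      # E[1/p2]
    return mean_p * mean_inv


def sampled_ratio_mean(P, samples, seed=0):
    """Monte Carlo estimate of E[p1/p2] for the same distribution."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        total += rng.uniform(1, P) / rng.uniform(1, P)
    return total / samples


nxt_value = exact_ratio_mean(2**64 - 1)   # compare with ln(P)/2
estimate = sampled_ratio_mean(100, 200_000, seed=1)
```

A small $P$ is used for the sampled estimate because the ratio has a heavy tail and the estimate converges slowly for very large $P$.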
4 One node - changing balance
To see how we can regulate $H_m$ when the account's balance changes from block to block, let us introduce a new random entity, say $k_m$. The distribution of $k_m$ is not important to our goal, so let $k_m = k_0 + \alpha q$ with $q \in U[0; 1]$ for further simulations (this is the case when $E k_m$ exists, but in principle we may also be interested in the case when it doesn't). With the given $k_m$ we set $t_m = (p_m/H_{m-1})\,k_m$. Start with the naive approach and let $H_m = H_{m-1} t_m$, supposing that we regulate $H_m$ depending on the last block time. We have

$$H_m = p_m k_m \quad\text{and}\quad t_m = \frac{p_m k_m}{p_{m-1} k_{m-1}}.$$

And that is not very good: although we know that

$$K_p = E\,\frac{p_m}{p_{m-1}} \approx \frac{\ln P}{2}$$

and we can add this coefficient directly to $H_m$: $H_m = p_m k_m K_p$ (which in fact works perfectly), we don't know anything a priori about the distribution of $k_m$ and the corresponding value of $E\,\frac{k_m}{k_{m-1}}$, because it depends on user actions. So we need some more elegant method to regulate $H_m$.

There is no proper way to predict user actions, but we can try to calculate some mean values on the fly to achieve our goal. Doing this we pursue two aims: estimating the mean value of the account balance, and getting it relatively locally, that is, we won't wait for decades to calculate the correct mean value, because generally speaking the balance distribution might have no mean value at all. So we try to use some moving average value as a local estimate of the mean. For that we choose some window within which we calculate the average value of the forging balance and use this for local regulation of $H_m$. We have

$$R_m = \frac{1}{r}\sum_{i=m-r+1}^{m} k_i; \qquad H_m = H_0 R_m,$$

where $r$ is the window size. And actually we have rather good results, with $E t_m$ close to unit and with a distribution like:

[Figure: distribution of $t_m$]
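The moving-average regulation over $k_i$ can be sketched in a few lines of Python. This is a hedged illustration: the function name and the default values of $k_0$, $\alpha$ and $r$ are our own choices, hits are normalized to $[0; 1)$ with $H_0 = 0.5$ as in section 3, and we assume that $k_m$ enters the block time as a multiplier, $t_m = p_m k_m/H_{m-1}$, which is one reading of the formula in this section.

```python
import random
from collections import deque


def simulate_balance_window(blocks, r=10, k0=1.0, alpha=1.0, seed=0):
    """Mean block time when H_m = H_0 * (moving average of k_i)."""
    rng = random.Random(seed)
    h0 = 0.5
    h = h0
    window = deque(maxlen=r)              # last r values of k_i
    total_t = 0.0
    for _ in range(blocks):
        p = rng.random()                  # normalized hit
        k = k0 + alpha * rng.random()     # k_m = k_0 + alpha * q
        total_t += p * k / h              # t_m = p_m k_m / H_{m-1}
        window.append(k)
        # R_m is the average over the window (shorter at the very start).
        h = h0 * sum(window) / len(window)
    return total_t / blocks


mean_t = simulate_balance_window(100_000, seed=1)
```

The regulator here needs to know $k_m$ itself, which is exactly the dependence on the current balance that the next regulation types try to avoid.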
Now it's important to notice that the latter results are not as excellent as they could seem, for the following reasons: (1) we still probably don't want to know anything about the current forging balance (because it's not easily convertible to the multi-node case) and (2) the distribution of $t_m$ has a long tail and high values below the unit. It would be preferable if it looked like a gaussian distribution around the unit. So we proceed with our investigation of other types of regulation. What if we tune $H_m$ directly based on the measured block times and define:

$$R_m = \frac{1}{r}\sum_{i=m-r+1}^{m} t_i; \qquad H_m = H_0 R_m.$$

In this case we have $E t_m < 1$ with a distribution like:

[Figure: distribution of $t_m$]
We see that the distribution is better, as it has a shorter tail, but we have over-regulated $H_m$ and got a mean time less than the goal. Also, the mean value of the block time depends on the $k_m$ distribution, which is unacceptable. That is because we don't let $t_m$ relax between $H_m$ changes. So let it relax:

$$R_m = \frac{1}{r}\sum_{i=m-r+1}^{m} t_i \ \text{ if } \operatorname{mod}(m, r) = 0; \qquad R_m = 1 \ \text{ otherwise}; \qquad H_m = H_{m-1} R_m.$$
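The relaxed rule, where $H_m$ is updated only once per window of $r$ blocks, can be sketched as follows. As before this is an assumption-laden illustration rather than the author's spreadsheet: the function name and defaults are hypothetical, hits are normalized with $H_0 = 0.5$, and $k_m$ is assumed to multiply the block time.

```python
import random


def simulate_relaxed(blocks, r=10, k0=1.0, alpha=1.0, seed=0):
    """Mean block time when H_m changes only every r-th block."""
    rng = random.Random(seed)
    h = 0.5                               # normalized H_0
    recent = []                           # block times since the last update
    total_t = 0.0
    for m in range(1, blocks + 1):
        p = rng.random()
        k = k0 + alpha * rng.random()     # changing balance factor k_m
        t = p * k / h                     # t_m = p_m k_m / H_{m-1}
        total_t += t
        recent.append(t)
        if m % r == 0:
            h *= sum(recent) / r          # H_m = H_{m-1} R_m
            recent.clear()                # between updates R_m = 1
    return total_t / blocks


mean_t = simulate_relaxed(200_000, seed=1)
```

Note that the regulator only looks at measured block times, so no knowledge of the balance factor is needed inside the update step.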
And after this we actually have $E t_m$ a bit more than unit, a relatively smooth $H_m$ (it stays constant between changes) and a distribution of $t_m$ looking like

[Figure: distribution of $t_m$]

with a short tail and an almost uniform distribution before the tail. Not bad for now. Let's go to the multi-node case.
5 Multi node - permanent balance
The only thing we need from this case is to understand how the mean block time depends on the forging balance distribution between nodes. For some regulations it actually does depend on it. Suppose the nodes are not concurrent, that is, the first found block is accepted by the system and instantly redistributed between nodes. So the winner is the node which finds a block in the shortest time. In the uniform and permanent balance case this time is proportional to a random number generated by each node, so $t_{nm} \sim p_{nm}$, where $p_{nm}$ are still uniformly distributed. So $t_m = \min_n t_{nm}$.

Let's calculate the expectation $E \min_n p_{nm}$. It is

$$N\int_0^1 p\,dp\left(\int_p^1 dq\right)^{N-1} = N\int_0^1 p(1-p)^{N-1}\,dp = -\int_0^1 p\,d[(1-p)^N] = \int_0^1 (1-p)^N\,dp = \frac{1}{N+1} = \frac{2Ep}{N+1}.$$
So if we naively put $E t_{nm} = N$ (which is the case when $H \approx H_0$), to be inversely proportional to the account's balance, we have $E t_m = \frac{2N}{N+1} \to 2$ $(N \to \infty)$. It is also important that the resulting mean value depends on the balance distribution if we don't tune $H_m$ carefully. Actually, we already have a method to do this, even in the case of changing balance. By the way, we notice that for the original forging algorithm we observe a mean time of $\sim \tau_1 \cdot \beta$, where $\beta \in [1; 2]$ depends on the forging balance distribution. In the real Nxt network the final block time value is around 1.9 at the moment.
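The result $E t_m = 2N/(N+1)$ for equal balances and a fixed base target can be verified with a short Python sketch. The function names are hypothetical, and we assume normalized hits in $[0; 1)$, equal shares $V_n = 1/N$ and $H = H_0 = 0.5$, so that each local time is $2 N p_{nm}$.

```python
import random


def expected_min_hit(n_nodes):
    """E min_n p_nm for n_nodes independent hits uniform on [0, 1]."""
    return 1.0 / (n_nodes + 1)


def expected_block_time(n_nodes):
    """E t_m = 2N / (N + 1) when every local time has mean N."""
    return 2.0 * n_nodes * expected_min_hit(n_nodes)


def sampled_block_time(n_nodes, blocks, seed=0):
    """Monte Carlo estimate of E t_m with equal shares and H = H_0."""
    rng = random.Random(seed)
    h0 = 0.5
    share = 1.0 / n_nodes                 # equal balance share of each node
    total = 0.0
    for _ in range(blocks):
        # t_m = min_n p_nm / (H_0 V_n)
        total += min(rng.random() / (h0 * share) for _ in range(n_nodes))
    return total / blocks


estimate = sampled_block_time(10, 50_000, seed=1)
```

For one node the formula gives the unit, and for large $N$ it approaches 2, which is the range of $\beta$ mentioned above.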
6 Multi node - changing balance
We use the regulation method of section 4 for each node, supposing that nodes immediately share the solved block and the corresponding $H$ values. So

$$
\begin{aligned}
&R_0 = 1; \qquad R_m = \frac{R_{m-1}}{r}\sum_{i=m-r+1}^{m} t_i \ \text{ if } \operatorname{mod}(m, r) = 0; \qquad R_m = R_{m-1} \ \text{ otherwise};\\
&H_{nm} \equiv H_m = R_m H_0;\\
&t_{nm} = p_{nm}/(H_{nm} V_n); \qquad t_m = \min_n t_{nm}.
\end{aligned}
$$
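A multi-node version of the regulator can be sketched as below. Several details are our assumptions rather than part of the text: the function name, the normalized hits with $H_0 = 0.5$, the balance shares summing to one, and the way balances change (each node's balance is multiplied by an independent random factor around 1 at every block, controlled by the hypothetical `jitter` parameter).

```python
import random


def simulate_network(shares, blocks, r=10, jitter=0.5, seed=0):
    """Mean block time for several nodes sharing one regulated H_m."""
    rng = random.Random(seed)
    h0 = 0.5
    big_r = 1.0                           # R_0 = 1
    recent = []                           # block times since the last update
    total_t = 0.0
    for m in range(1, blocks + 1):
        h = big_r * h0                    # H_m = R_m H_0, same for all nodes
        # t_nm = p_nm / (H V_n), with V_n fluctuating around its share.
        t = min(
            rng.random() / (h * v * (1.0 + jitter * (rng.random() - 0.5)))
            for v in shares
        )
        total_t += t
        recent.append(t)
        if m % r == 0:
            big_r *= sum(recent) / r      # R_m = R_{m-1} * mean of last r t_i
            recent.clear()
    return total_t / blocks


even = simulate_network([0.25] * 4, 100_000, seed=1)
uneven = simulate_network([0.5, 0.3, 0.1, 0.1], 100_000, seed=1)
```

Running it for an even and an uneven split of the balance lets one compare how strongly the mean block time reacts to the balance distribution.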
We get the mean value of $t_m$ close to unit and the following simulated distribution:

[Figure: simulated distribution of $t_m$]

We see that the latter distribution decreases as the argument grows from zero, but it has almost nothing in common with the gaussian, which we believe is one of the best when talking about a value that should be close to a constant. Let's try to get it more concentrated around the goal value of the unit. To do this we present a simple method which we call pool-in-nodes.
7 mthcl's algorithm

At https://nxtforum.org/proof-of-stake-algorithm/forging-2088/40/ a new forging algorithm with two extra regulating parameters was proposed. We shall investigate it as well to reveal its statistical properties. The main idea was that it should be a bit more difficult to decrease the BaseTarget than to increase it. Here, the parameter bias is a number between 0 and 1, e.g. 1/2. This should dramatically decrease the probability of long times between blocks. Now, the constant K is chosen in such a way that the expected time between blocks is 1 minute (so, K is a function of bias). It is difficult to calculate K exactly, because the balance equation for the stationary measure of the system is too complicated. So it was proposed to simulate the process and get the parameter K numerically. However, in mthcl's paper [4] we can find the adapted algorithm with one extra parameter, namely $\gamma$, and a second one, $\beta$, defined by $\gamma$. We will use the latter version of the algorithm as it is described in the paper (see pages 21–22):

$$
\begin{aligned}
&H_0 = 1; \qquad t_{nm} = -\ln p/(H_m V_n),\ p \in U[0; 1]; \qquad t_m = \min_n t_{nm};\\
&H_{m+1} = H_m \cdot
\begin{cases}
2, & t_m \ge 2;\\
t_m, & t_m \in (1; 2);\\
1 - \gamma(1 - t_m), & t_m \in (1/2; 1];\\
1/\beta, & t_m \le 1/2;
\end{cases}\\
&\beta = (1 - \gamma/2)^{-1}.
\end{aligned}
$$
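The piecewise update of the base target translates directly into code. The Python sketch below follows the formulas of this section. The function names are hypothetical, and we assume that the balance shares $V_n$ are normalized to sum to one.

```python
import math
import random


def target_factor(t, gamma):
    """Multiplier applied to H_m given the last block time t_m."""
    if t >= 2.0:
        return 2.0
    if t > 1.0:
        return t
    if t > 0.5:
        return 1.0 - gamma * (1.0 - t)
    return 1.0 - gamma / 2.0              # 1/beta, beta = (1 - gamma/2)^-1


def simulate_mthcl(shares, blocks, gamma=0.5, seed=0):
    """Mean block time under the adapted rule with exponential hits."""
    rng = random.Random(seed)
    h = 1.0                               # H_0 = 1
    total_t = 0.0
    for _ in range(blocks):
        # t_nm = -ln p / (H_m V_n); 1 - random() lies in (0, 1], so log is safe.
        t = min(-math.log(1.0 - rng.random()) / (h * v) for v in shares)
        total_t += t
        h *= target_factor(t, gamma)
    return total_t / blocks


mean_t = simulate_mthcl([0.5, 0.3, 0.2], 100_000, seed=1)
```

The multiplier is continuous at $t_m = 1/2$, $1$ and $2$, and it is bounded below by $1 - \gamma/2$, so the base target decreases more slowly than it can increase.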
The simulation results for $H_m$ give the following distribution:

[Figure: distribution of $H_m$]

which seems very good and is also presented in the paper as a numerical solution for the PDF of the base target $H$. Also, the algorithm adapts well to the distribution of the stake between nodes, even in the case of fluctuating node balances. The number of generated blocks is quite well proportional to the stake portion of the generating node. For the block time distribution we have a picture like (it was intentionally cut off at 3):

[Figure: distribution of the block time]

with a mean value around the unit for $\gamma \approx 0.5$. This distribution is a little similar to what we got in section 6, but with a more descending shape because of the Exp distribution of the hits. However, the distribution above makes small intervals more likely and sometimes allows large intervals. In our simulation the interval $t_m$ was up to 20. We suggest that the distribution of the block time should be more concentrated around the unit and never (or almost never) run out of a reasonable neighborhood, and that is the main reason to proceed with our investigation. Nevertheless, the examined algorithm is better than the original due to its high stability with respect to the immediate balance distribution, better proportionality and good mean value. The regulating parameter $\gamma$ taken from the neighborhood of 0.5 works well within wide limits of modified stake distribution. So for the given network it can be chosen once for a long run.

[4] http://www.docdroid.net/ecmz/forging0-5-2.pdf.html
8 Pool-in-nodes
Recall that a mean value of uniformly distributed numbers is asymptotically normally distributed. Previously we calculated a node's internal block time as $t_{nm} = p_{nm}/(H_m V_n)$. Now suppose that we distinguish between a block and a sub-block, or maybe one likes to call them super-block and just block. Each sub-block is generated with less difficulty, so the normal base target is roughly multiplied by some predefined number (say 16 or 32), which is equivalent to dividing the hit by the same number. Let us denote it by $w$. Then the procedure is the same. Nodes build the sequence of sub-blocks, and after $w$ sub-blocks are built the real block is generated. The remaining questions of what information should be included in the final block and how the fee should be distributed between contributing nodes we'll analyze in the next paper. The simple idea is to behave like a pool and distribute the cumulative fee between the nodes which generated at least one sub-block, proportionally to the number of sub-blocks generated. Each sub-block time is distributed like in section 6 within $[0; P/(w H_m (\max V_n))]$, and while before we have been regulating $H_m$ to have $E t_m$ close to unit, now we believe that the mean value of a sum of $w$ such $t$-values goes to unit in a more gaussian-like way. So we define

$$t_m = \sum_{w' \le w} t^{w'}_m; \qquad t^{w'}_m = \min_n t^{w'}_{nm}; \qquad t^{w'}_{nm} = p^{w'}_{nm}/(H^{w}_{nm} V_n); \qquad H^{w}_{nm} = H_{nm}\, w \equiv H_m w.$$
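The narrowing effect of summing $w$ sub-block times can be illustrated with the Python sketch below. To isolate that effect we make simplifying assumptions that are not in the text: the base target is held fixed at a value `h` instead of being regulated (so the mean is not tuned to the unit), hits are normalized to $[0; 1)$, and the helper names `block_times` and `spread` are hypothetical.

```python
import random
import statistics


def block_times(shares, blocks, w, h=0.5, seed=0):
    """Block times built as sums of w sub-block times with target h * w."""
    rng = random.Random(seed)
    times = []
    for _ in range(blocks):
        t = 0.0
        for _ in range(w):
            # Sub-block time: min_n p_nm / (H w V_n).
            t += min(rng.random() / (h * w * v) for v in shares)
        times.append(t)
    return times


def spread(times):
    """Relative spread of block times: standard deviation over mean."""
    return statistics.pstdev(times) / statistics.mean(times)


shares = [0.25] * 4
plain = block_times(shares, 5_000, w=1, seed=1)
pooled = block_times(shares, 5_000, w=16, seed=1)
```

Comparing `spread(plain)` with `spread(pooled)` shows the block time concentrating around its mean as $w$ grows, while the mean itself stays about the same.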
And actually we've got what we hoped for. The distribution looks much more gaussian and concentrated around the unit:

[Figure: distribution of $t_m$ for pool-in-nodes]

We have now finished our examination and will continue it in future works.
9 Conclusion and future work
What we learned from our investigation is the following: (1) the original forging algorithm is not immune to the balance distribution and, even for one node, converges to a mean block time greater than unit; (2) there is an adapting algorithm which solves both issues and offers regulation that converges to unit and is immune to a changing balance distribution. The number of found blocks is proportional to the node's forging balance, as the local node's time is inversely proportional to it, hits are uniformly distributed and nodes share blocks immediately after they have been generated.

Future work includes: (1) a concurrent nodes model, that is, nodes may choose the block sequence on which to forge based on its cumulative base target; (2) a model for asynchronous and delayed block exchange; (3) analysis of attack opportunities in the different forging models.