Glossario

Seleziona una delle parole chiave a sinistra ...

Numerical ComputationPseudorandom Number Generation

Momento della lettura: ~15 min

When random numbers are needed in a scientific computing application, we generally use deterministic processes which mimic the behavior of random processes. These are called pseudo-random number generators (PRNG). A PRNG takes an initial value, called the seed, and uses it to produce a sequence of numbers which are supposed to "look random". The seed determines the sequence, so you can make the random number generation in a program reproducible by providing an explicit seed. If no seed is provided, a different one will be used each time the program runs.

A simple PRNG is the linear congruential generator: fix positive integers M, a, and c, and consider a seed X_0 \in \{0,1,\ldots,M-1\}. We return the sequence X_0, X_1, X_2, \ldots, where X_n = \mathrm{mod}(aX_{n-1}+c,M) for n \geq 1.

Exercise
A sequence of numbers X_0, X_1,\dots is periodic with period p > 0 if p is the smallest number such that X_k = X_{k+p} for all k \geq 0. We say that a linear congruential generator (LCG) with c=0 is full-cyle if the generated sequence has a period p = M - 1. For what values of a is the LCG with c = 0, M = 5, and X_0 = a full-cycle?

Solution. The LCG is full-cycle only if a \in \{2, 3\}. The period is 1 when a \in \{0, 1\} and it is 2 if a = 4.

Since the sequence of numbers produced by a PRNG is determined by the initial seed, we cannot say that the sequence of numbers is random. The pseudo part of the term pseudorandom is meant to emphasize this distinction. However, we can subject the sequence of numbers to a battery of statistical tests to check whether it can be readily distinguished from a random sequence.

For example, we can check that each number appears with approximately equal frequency. For example, a sequence purporting to sample uniformly from \{1,2,3,4,5\} which begins

\begin{align*}1,1,5,5,5,1,1,5,1,1,5,5,5,1,1,1,2,1,5,5,1,2,1,5,1,1,1,\ldots\end{align*}

is probably not a good pseudorandom number generator. However, some clearly non-random sequences pass the basic frequency test:

\begin{align*}\{a_n\}_{n=1}^\infty = \{1,2,3,4,5,1,2,3,4,5,1,2,3,4,5,1,2,3,4,5,\ldots\}\end{align*}

To detect such failures, we need more advanced tests. For example, we can split up the sequence into pairs of consecutive terms and ensure that these pairs are also approximately equally represented. Since \{1,2\} appears often and \{2,1\} never appears in \{a_n\}_{n=1}^\infty, the pair frequency test is sufficient to distinguish this sequence from a random one. We can apply the same idea with blocks of length 3 or 4 or more.

Exercise
Consider the sequence \{\mathrm{mod}(3\cdot2^n,11)\}_{n=1}^{100}. Use Julia to show that each number from 1 to 10 appears exactly 10 times in this sequence. Also, use Julia to show that a_{2k} is smaller than a_{2k-1} for far more than half the values of k from 1 to 50. Hint: countmap(a) tells you how many times each element in the collection a appears. To use this function, do using StatsBase first.

Repeat these tests on the sequence whose k th term is the k th digit in the decimal representation of \pi: reverse(digits(floor(BigInt,big(10)^99*π))).

Solution. Only 10 of the 50 pairs have the even-indexed term larger than the odd-indexed term:

using StatsBase
a = [6]
for i=1:99
  push!(a,mod(2a[end],11))
end
countmap(a) # every number appears 10 times
sum(a[2i] < a[2i-1] for i=1:50) # returns 40

Repeating with the digits of \pi, we find that 27 of the first hundred blocks of 2 have even-indexed digit smaller than the one before it.

A PRNG is cryptographically secure if an agent who knows the algorithm used to generate the numbers (but who does not know the value of the seed) cannot feasibly infer the k the number generated based on observation of the first k-1 numbers generated.

Most PRNGs are not cryptographically secure. In other words, they hold up well under the scrutiny of general statistical tests, but not to tests which exploit knowledge of the specific algorithm used.

As a simple example of a PRNG that is not cryptographically secure, consider the digits of \pi starting from some unknown position. This sequence does behave statistically as though it were random (as far as we know), but an adversary who knew you were using successive digits of \pi would be able to use several values output by your PRNG to find your position and start anticipating subsequent values.

Bruno
Bruno Bruno