RandomExtensions


This package explores a possible extension of rand-related functionality (from the Random module); the code was initially taken from JuliaLang/julia#24912. Note that type piracy is committed! While hopefully useful, this package is still experimental, and hence unstable. User feedback, and design or implementation contributions are welcome.

This does essentially four things:

  1. define distribution objects, to give first-class status to features provided by Random; for example rand(Normal(), 3) is equivalent to randn(3); other available distributions: Exponential, CloseOpen (for generation of floats in a closed-open range) and friends, Uniform (which can wrap an implicit uniform distribution);

  2. define make methods, which can combine distributions for objects made of multiple scalars, like Pair, Tuple, or Complex, or describe how to generate more complex objects, like containers;

  3. extend the rand([rng], [S], dims) API to allow the generation of other containers than arrays (like Set, Dict, SparseArray, String, BitArray);

  4. define a Rand iterator, which lazily produces random values.

Point 1) defines a Distribution type which is incompatible with the "Distributions.jl" package. Input on how to unify the two approaches is welcome.

Point 2) is really the core of this package. make provides a vocabulary to define the generation of "scalars" which require more than one argument to be described, e.g. pairs from 1:3 to Int (rand(make(Pair, 1:3, Int))) or regular containers (e.g. make(Array, 2, 3)). The point of calling make rather than putting all the arguments in rand directly is simplicity and composability: the make call always occurs as the second argument to rand (or first if the RNG is omitted). For example, rand(make(Array, 2, 3), 3) creates an array of matrices. Of course, make is not necessary, in that the same can be achieved with an ad hoc struct, which in some cases is clearer (e.g. Normal(m, s) rather than something like make(Float64, Val(:Normal), m, s)).
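As a minimal sketch of the composability described above, using only calls that appear in this README (the variable names are illustrative):

```julia
using RandomExtensions

# a random Pair: first member drawn from 1:3, second from all of Int
p = rand(make(Pair, 1:3, Int))

# a random 2×3 matrix
m = rand(make(Array, 2, 3))

# the make call stays a single argument, so the usual trailing `dims`
# still applies: this is a vector of three random 2×3 matrices
v = rand(make(Array, 2, 3), 3)
```

Because the whole description of the "scalar" lives inside the make call, it can be passed around or nested without changing the shape of the surrounding rand call.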

As an experimental feature, the following alternative API is available:

  • rand(T => x) is equivalent to rand(make(T, x))
  • rand(T => (x, y, ...)) is equivalent to rand(make(T, x, y, ...))

This is for convenience only (it may be more readable), but may be less efficient due to the fact that the type of a pair containing a type doesn't know this exact type (e.g. Pair => Int has type Pair{UnionAll,DataType}), so rand can't infer the type of the generated value. Thanks to inlining, the inferred types can however be sufficiently tight in some cases (e.g. rand(Complex => Int, 3) is of type Vector{Complex{Int64}} instead of Vector{Any}).
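The inference caveat can be checked directly; the following sketch only restates the claims made above and assumes a recent Julia version:

```julia
using RandomExtensions

# the pair itself only records that its first member is "some UnionAll",
# which is why rand cannot always infer the generated type
t = typeof(Pair => Int)          # Pair{UnionAll, DataType}

# both spellings generate the same kind of value
a = rand(make(Complex, Int), 3)
b = rand(Complex => Int, 3)      # experimental pair syntax
```

In this case both a and b should be of type Vector{Complex{Int64}}, as stated above.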

Point 3) allows something like rand(1:30, Set, 10) to produce a Set of length 10 with values from 1:30. The idea is that rand([rng], [S], Cont, etc...) should always be equivalent to rand([rng], make(Cont, [S], etc...)). This design goes somewhat against the trend in Base to create containers using their constructors -- which by the way may be achieved via the Rand iterator from point 4). Still, I like the terse approach here, as it simply generalizes to other containers the current rand API creating arrays. See the issue linked above for a discussion on these topics.
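A short sketch of the three spellings discussed in this paragraph. The argument order in the make call is an assumption that follows the rand([rng], make(Cont, [S], etc...)) rule stated above:

```julia
using RandomExtensions

# container syntax: a Set of 10 distinct values from 1:30
s1 = rand(1:30, Set, 10)

# assumed equivalent make spelling, following the rule above
s2 = rand(make(Set, 1:30, 10))

# constructor style via the Rand iterator; duplicates are merged by Set,
# so this one may end up with fewer than 10 elements
s3 = Set(Iterators.take(Rand(1:30), 10))
```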

For convenience, the following names from Random are re-exported in this package: rand!, AbstractRNG, MersenneTwister, RandomDevice (rand is in Base). Functions like randn! or randstring are considered to be obsoleted by this package so are not re-exported. It is still necessary to import Random separately in order to use functions which don't extend the rand API, namely randsubseq, shuffle, randperm, randcycle, and their mutating variants.
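In practice, a session might start as follows. This is a sketch: the seed and variable names are arbitrary, and the rand! call on an array relies on the standard rand!([rng], A, S) form from Random:

```julia
using RandomExtensions           # brings rand!, AbstractRNG, MersenneTwister, RandomDevice
using Random: shuffle, randperm  # not re-exported, must come from Random itself

rng = MersenneTwister(42)

a = rand(rng, Normal(), 5)       # instead of randn(rng, 5)
rand!(rng, a, Exponential())     # instead of randexp!(rng, a)

perm = randperm(rng, 5)          # unchanged, still a Random function
```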

There is not much documentation for now: rand's docstring is updated, and here are some examples:

julia> rand(CloseOpen(Float64)) # equivalent to rand(Float64)
0.7678877639669386

julia> rand(CloseClose(1.0f0, 10)) # generation in [1.0f0, 10.0f0]
6.62467f0

julia> rand(OpenOpen(2.0^52, 2.0^52+1)) == 2.0^52 # exactness not guaranteed for "unreasonable" values!
true

julia> rand(Normal(0.0, 10.0)) # explicit μ and σ parameters
-8.473790458128912

julia> rand(Uniform(1:3)) # equivalent to rand(1:3)
2

julia> rand(make(Pair, 1:10, Normal())) # random Pair, where both members have distinct distributions
5 => 0.674375

julia> rand(make(Pair{Number,Any}, 1:10, Normal())) # specify the Pair type
Pair{Number, Any}(1, -0.131617)

julia> rand(Pair{Float64,Int}) # equivalent to rand(make(Pair, Float64, Int))
0.321676 => -4583276276690463733

julia> rand(make(Tuple, 1:10, UInt8, OpenClose()))
(9, 0x6b, 0.34900083923775505)

julia> rand(Tuple{Float64,Int}) # equivalent to rand(make(Tuple, Float64, Int))
(0.9830769470405203, -6048436354564488035)

julia> rand(make(NTuple{3}, 1:10)) # produces a 3-tuple with values from 1:10
(5, 9, 6)

julia> rand(make(NTuple{N,UInt8} where N, 1:3, 5))
(0x02, 0x03, 0x02, 0x03, 0x02)

julia> rand(make(NTuple{3}, make(Pair, 1:9, Bool))) # make calls can be nested
(2 => false, 8 => true, 7 => false)

julia> rand(make(Complex, Normal())) # each coordinate is drawn from the normal distribution
1.5112317924121632 + 0.723463453534426im

julia> rand(make(Complex, Normal(), 1:10)) # distinct distributions
1.096731587266045 + 8.0im

julia> rand(Normal(ComplexF64)) # equivalent to randn(ComplexF64)
0.9322376894079347 + 0.2812214248483498im

julia> rand(Set, 3)
Set{Float64} with 3 elements:
  0.0675168818514279
  0.31058418699493895
  0.15029104540378424

julia> rand!(ans, Exponential())
Set{Float64} with 3 elements:
  1.082312697650858
  1.2984094155972015
  0.016146678329819485

julia> rand(1:9, Set, 3) # if you try `rand(1:3, Set, 9)`, it will take a while ;-)
Set{Int64} with 3 elements:
  4
  7
  1

julia> rand(Dict{String,Int8}, 2)
Dict{String, Int8} with 2 entries:
  "vxybIbae" => 42
  "bO2fTwuq" => -13

julia> rand(make(Pair, 1:9, Normal()), Dict, 3)
Dict{Int64, Float64} with 3 entries:
  9 => 0.916406
  3 => -2.44958
  8 => -0.703348

julia> using SparseArrays

julia> rand(SparseVector, 0.3, 9) # equivalent to sprand(9, 0.3)
9-element SparseVector{Float64, Int64} with 3 stored entries:
  [1]  =  0.173858
  [6]  =  0.568631
  [8]  =  0.297207

julia> rand(Normal(), SparseMatrixCSC, 0.3, 2, 3) # equivalent to sprandn(2, 3, 0.3)
2×3 SparseMatrixCSC{Float64, Int64} with 2 stored entries:
          -1.5617   
 0.572305           

# as with Array, sparse arrays are special-cased: `SparseVector` or `SparseMatrixCSC`
# can be omitted in the `rand` call (not in the `make` call):

julia> rand(make(SparseVector, 1:9, 0.3, 2), 0.1, 4, 3) # possible, but ugly output when non-empty :-/
4×3 SparseMatrixCSC{SparseVector{Int64,Int64},Int64} with 0 stored entries

julia> rand(String, 4) # equivalent to randstring(4)
"5o75"

julia> rand("123", String, 4) # like above, String creation with the "container" syntax ...
"2131"

julia> rand(make(String, 3, "123")) # ... which is as always equivalent to a call to make
"211"

julia> rand(String, Set, 3) # String considered as a scalar
Set{String} with 3 elements:
  "jDbjXu9b"
  "0Lo75VKo"
  "webpNhfY"

julia> rand(BitArray, 3) # equivalent to, but unfortunately more verbose than, bitrand(3)
3-element BitVector:
 1
 1
 0

julia> rand(Bernoulli(0.2), BitVector, 10) # using the Bernoulli distribution
10-element BitVector:
 0
 1
 0
 1
 0
 0
 0
 0
 0
 1

julia> rand(1:3, NTuple{3}) # NTuple{3} considered as a container, equivalent to rand(make(NTuple{3}, 1:3))
(3, 3, 1)

julia> rand(1:3, Tuple{Int,UInt8, BigFloat}) # works also with more general tuple types ...
(3, 0x02, 2.0)

julia> rand(1:3, NamedTuple{(:a, :b)}) # ... and with named tuples
(a = 3, b = 2)

julia> RandomExtensions.random_staticarrays() # poor man's conditional modules!
# ugly warning

julia> rand(make(MVector{2,AbstractString}, String), SMatrix{3, 2})
3×2 SArray{Tuple{3,2},MArray{Tuple{2},AbstractString,1,2},2,6} with indices SOneTo(3)×SOneTo(2):
 ["SzPKXHFk", "1eFXaUiM"]  ["RJnHwhb7", "jqfLcY8a"]
 ["FMTKcBY8", "eoYtNntD"]  ["FzdD530L", "ux6sWGMU"]
 ["fFJuUtJQ", "H2mAQrIV"]  ["pt0OYFJw", "O0fCfjjR"]

julia> Set(Iterators.take(Rand(RandomDevice(), 1:10), 3)) # the RNG argument is optional and defaults to Random.default_rng()
Set{Int64} with 2 elements: # note that the set can end up with fewer than 3 elements if `Rand` generates duplicates
  5
  9

julia> collect(Iterators.take(Uniform(1:10), 3)) # distributions can be iterated over, using Random.default_rng() implicitly
3-element Vector{Int64}:
 9
 6
 8

julia> rand(Complex => Int) # equivalent to rand(make(Complex, Int)) (experimental)
4610038282330316390 + 4899086469899572461im

julia> rand(Pair => (String, Int8)) # equivalent to rand(make(Pair, String, Int8)) (experimental)
"ODNXIePK" => 4

In some cases, the Rand iterator can provide efficiency gains compared to repeated calls to rand, as it uses the same mechanism as array generation. For example, given a = zeros(1000) and s = BitSet(1:1000), a .+ Rand(s).() is three times faster than a .+ rand.(Ref(s)).
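The comparison above can be reproduced with a sketch like the one below. Timings depend on the machine and Julia version, so none are shown here; wrapping each expression in @elapsed or using a benchmarking package gives local numbers:

```julia
using RandomExtensions

a = zeros(1000)
s = BitSet(1:1000)

# Rand(s) is built once and then called for every element of the broadcast
x = a .+ Rand(s).()

# baseline: an independent rand call on s for every element
y = a .+ rand.(Ref(s))
```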

Note: as seen in the examples above, String can be considered as a scalar or as a container (in the rand API). In a call like rand(String), both APIs coincide, but in rand(String, 3), should we construct a String of length 3 (container API), or an array of strings of default length 8? Currently, the package chooses the first interpretation, partly because it was the first implemented, and also because it may actually be the most useful one (and offers the tersest API to compete with randstring). But as this package is still unstable, this choice may be revisited in the future. Note that it's easy to get the result of the second interpretation via either rand(make(String), 3), rand(String, (3,)) or rand(String, Vector, 3).
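Side by side, the two interpretations look like this (a sketch using only the calls named in the note; behavior may change if the choice is revisited):

```julia
using RandomExtensions

s  = rand(String, 3)          # container API: one String of length 3

# scalar API: three strings of the default length 8
v1 = rand(make(String), 3)
v2 = rand(String, (3,))
v3 = rand(String, Vector, 3)
```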

How to extend: the make function is meant to be extensible, and there are some helper functions which make it easy, but this is still experimental. By default, make(T, args...) will create a Make{maketype(T, args...)} object, say m, which contains args... as fields. For type-stable code, the rand machinery likes to know the exact type of the object which will be generated by rand(m), and maketype(T, args...) is supposed to return that type. For example, maketype(Pair, 1:3, UInt) == Pair{Int,UInt}. Then just define rand for m as documented in the Random module, e.g. rand(rng::AbstractRNG, sp::SamplerTrivial{<:Make{P}}) where {P<:Pair} = P(rand(rng, sp[][1]), rand(rng, sp[][2])). For convenience, maketype(T, ...) defaults to T, which means that for simple cases, only the rand function has to be defined. But in cases like Pair above, if maketype is not defined, the generated type will be assumed to be Pair, which is not a concrete type (and hence suboptimal).
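The recipe above might be applied to a user-defined type roughly as follows. This is only a sketch: Point is a hypothetical type invented for illustration, the names Make and maketype are qualified with the module name on the assumption that they may not be exported, and eltype is used to compute the result type, which only suits collection arguments such as ranges:

```julia
using Random, RandomExtensions

# hypothetical type, not part of the package
struct Point{T}
    x::T
    y::T
end

# tell the rand machinery which concrete type will be generated
RandomExtensions.maketype(::Type{Point}, x, y) =
    Point{promote_type(eltype(x), eltype(y))}

# generate each coordinate from its own argument, mirroring the Pair example
Random.rand(rng::AbstractRNG,
            sp::Random.SamplerTrivial{<:RandomExtensions.Make{P}}) where {P<:Point} =
    P(rand(rng, sp[][1]), rand(rng, sp[][2]))

p = rand(make(Point, 1:3, 4:6))      # x drawn from 1:3, y drawn from 4:6
ps = rand(make(Point, 1:3, 4:6), 5)  # arrays come for free
```

Without the maketype method, the generated type would be taken to be the non-concrete Point, which is the suboptimal situation described above.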

This package started out of frustration with the limitations of the Random module. Besides generating simple scalars and arrays, very little is supported out of the box. For example, generating a random Dict is too complex. Moreover, there are too many functions for my taste: rand, randn, randexp, sprand (with its exotic rfn parameter), sprandn, sprandexp, randstring, bitrand, and mutating counterparts (but I believe randn will never go away, as it's so terse). I hope that this package can serve as a starting point towards improving Random.