## Randomized matrix algorithms.
# Our overall point: A matrix of random entries is not random.
# More technically correct is:
# A matrix of random entries has "non-random" properties.
## Point 0. A sum of indep. random values isn't all that random.
x = rand(1000)
sum(x) # ≈ 500, with trial-to-trial fluctuations of only about sqrt(1000/12) ≈ 9
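A quick standalone check of this (a sketch, with an assumed 200-trial loop): each sum of 1000 Uniform(0,1) values has mean 500 and standard deviation sqrt(1000/12) ≈ 9.1, which is tiny relative to 500.

```julia
using Statistics
# Repeat the sum many times; the spread across trials is small
# compared to the mean value of 500.
sums = [sum(rand(1000)) for _ in 1:200]
mean(sums), std(sums)   # mean near 500, std near 9
```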
## Point 1. A matrix of random indep. entries isn't all that random.
using LinearAlgebra
A = rand(50,50)
A = A + A'
maximum(eigvals(A))
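A quick check across trials (a sketch): the entries of A + A' have mean 1, so the rank-one mean component alone contributes an eigenvalue near n = 50, and the largest eigenvalue barely varies from trial to trial.

```julia
using LinearAlgebra, Statistics
# Largest eigenvalue of the symmetrized uniform random matrix,
# over 50 independent trials.
tops = map(1:50) do _
    A = rand(50, 50)
    maximum(eigvals(Symmetric(A + A')))
end
mean(tops), std(tops)   # mean near 50, small spread
```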
## Q: What is the eigenvector?
A = rand(1000,1000) # uniform entries, so the mean component dominates
A = A + A'
lams,V = eigen!(A)
i = argmax(lams)
vi = V[:,i]
## A: The eigenvector appears to be converging to the normalized
# version of all ones!
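We can check this claim numerically with a small standalone sketch: for the symmetrized Uniform(0,1) matrix, the top eigenvector is close to the normalized all-ones vector.

```julia
using LinearAlgebra
# Compare the top eigenvector of A + A' to ones(n)/sqrt(n).
n = 1000
A = rand(n, n)
lams, V = eigen(Symmetric(A + A'))
v = V[:, argmax(lams)]
v *= sign(v[1])              # eigenvectors are only defined up to sign
norm(v - ones(n) / sqrt(n))  # small
```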
##m
# ## This is an instance of a property called concentration.
# Concentration is the property that a collection of random
# variable samples behaves like its mean.
using Plots
pyplot()
n = 10_000
X = randn(n,5)
plot(cumsum(X, dims=1) ./ (1:n), xscale=:log10) # running mean of each column
gui()
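A numeric companion to the plot (a sketch): the running mean of n draws from randn shrinks toward the true mean 0, roughly like 1/sqrt(n).

```julia
using Statistics
# Running mean of 10,000 standard normals.
n = 10_000
x = randn(n)
runmean = cumsum(x) ./ (1:n)
abs(runmean[end])   # on the order of 1/sqrt(n) = 0.01
```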
## The same holds true for a collection of matrices.
using StatsBase
function eigtrial(n, nt)
    lams = Vector{Float64}()
    for t = 1:nt
        A = rand(n, n)   # use the size parameter n, not a hard-coded 50
        A = A + A'
        append!(lams, eigvals!(Symmetric(A)))
    end
    h = fit(Histogram, lams; nbins = 50)
    h = normalize(h)
    return h
end
h = eigtrial(50,100)
plot(midpoints(h.edges[1]), h.weights) # edges has one more entry than weights
##
h = eigtrial(50,100)
plot!(midpoints(h.edges[1]), h.weights)
##
h = eigtrial(50,100)
plot!(midpoints(h.edges[1]), h.weights)
##
h = eigtrial(50,100)
plot!(midpoints(h.edges[1]), h.weights)
##
# See, this is not random at all! It's basically
# the same every time!
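We can quantify "basically the same every time" with a rough numeric check (a sketch; the helper `batch_hist` and the fixed bin grid are assumptions, chosen so the two histograms share the same edges and can be compared bin by bin).

```julia
using LinearAlgebra, StatsBase
# Eigenvalue histograms from two independent batches of trials,
# on a shared, fixed bin grid.
function batch_hist(n, nt, edges)
    lams = Float64[]
    for _ in 1:nt
        A = rand(n, n)
        append!(lams, eigvals!(Symmetric(A + A')))
    end
    return normalize(fit(Histogram, lams, edges); mode=:pdf)
end
edges = -10.0:1.0:60.0
h1 = batch_hist(50, 100, edges)
h2 = batch_hist(50, 100, edges)
sum(abs, h1.weights - h2.weights) * step(edges)  # small: the two agree
```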
## The idea of randomized-matrix algorithms is to
# use this non-randomness of random-matrices to
# make things faster!
## Let's look at the eigenfaces matrix.
using DelimitedFiles
A = readdlm(download("http://www.cs.purdue.edu/homes/dgleich/cs515-2019/homeworks/Yale_64.csv"),',')
##
M = A'
U,Sig,V = svd(M)
## The product of a given matrix with a random matrix is often
# called a sketch, because random matrices are well-behaved enough
# that the product preserves certain properties of the original
# matrix. It's a "sketch" because it's smaller.
# The algorithm here is from Halko, Martinsson, and Tropp,
# "Finding Structure with Randomness: Probabilistic Algorithms for
# Constructing Approximate Matrix Decompositions", SIAM Review 53(2), 2011.
function simple_sketch(M::AbstractMatrix, k::Int, p::Int=10)
    m, n = size(M)
    S = M*randn(n, k+p)       # sketch with k+p random probe vectors
    return Matrix(qr!(S).Q)   # orthonormal basis for the range of the sketch
end
Q = simple_sketch(M, 5)
##
B = Q'*M
UB,SigB,VB = svd(B)
Uhat = Q*UB
## Compare the two sets of singular values
[Sig[1:5] SigB[1:5]]
## Look at the leading left singular vector.
# Singular vectors are only determined up to sign, so compare both signs.
min(norm(Uhat[:,1] - U[:,1]), norm(Uhat[:,1] + U[:,1]))
##
plot(
heatmap(reshape(abs.(Uhat[:,1]), 64, 64), title="Random",color=:blues),
heatmap(reshape(abs.(U[:,1]), 64, 64), title="Exact",color=:blues),
yflip=true, )
gui()
##
# So the field is often dominated by analysis showing
# that these results are not a fluke! In fact, we can
# get quite precise bounds on how much work is needed
# in a variety of situations to get good accuracy.
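One such bound can be checked on synthetic data (a sketch; the test matrix with a geometrically decaying spectrum is an assumption): the range finder's error opnorm(M - Q*Q'*M) should be on the order of the optimal rank-k error, the (k+1)-st singular value.

```julia
using LinearAlgebra
# Build a matrix with known singular values and sketch it.
m, n, k, p = 200, 100, 5, 10
U0 = Matrix(qr(randn(m, n)).Q)          # m×n orthonormal factor
V0 = Matrix(qr(randn(n, n)).Q)
sig = [2.0^(-j) for j in 1:n]           # geometrically decaying spectrum
M = U0 * Diagonal(sig) * V0'
Q = Matrix(qr!(M * randn(n, k + p)).Q)  # same sketch as simple_sketch
err = opnorm(M - Q * (Q' * M))
err, sig[k + 1]                          # err is comparable to sigma_{k+1}
```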
##
plot(
heatmap(reshape(abs.(Uhat[:,2]), 64, 64), title="Random",color=:blues),
heatmap(reshape(abs.(U[:,2]), 64, 64), title="Exact",color=:blues),
yflip=true, )
gui()