## Lecture 4: Sparse matrix operations
#= We are going to look at the Candyland matrix!
See some analysis online:
* <http://www.lscheffer.com/CandyLand.htm>
* <http://datagenetics.com/blog/december12011/index.html>
We are studying a slightly different set of rules
with respect to the sticky states. In one set
of rules (on those pages), a sticky state just holds
your character deterministically for one step. In our
set of rules, a sticky state holds you until you draw
a specific color. So if you see some differences,
that is why.
=#
## We have built the Markov transition matrix for you!
using DelimitedFiles
using SparseArrays
data = readdlm("candyland-matrix.csv",',')
TI = Int.(data[:,1])
TJ = Int.(data[:,2])
TV = data[:,3]
T = sparse(TI,TJ, TV, 140,140)
coords = readdlm("candyland-coords.csv",',')
x = coords[:,1]
y = coords[:,2]
cells = Int.(readdlm("candyland-cells.csv",','))
start,dest = cells[1:2]
2-element Vector{Int64}: 140 134
##
sum(T,dims=1)
1×140 Matrix{Float64}: 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 … 0.0 1.0 1.0 1.0 1.0 1.0 1.0
## Which column fails to sum to 1? The destination state (134): there are no transitions out of the end of the game.
findall(vec(sum(T,dims=1)) .< 0.9)
1-element Vector{Int64}: 134
## Let's see how Julia stores things
typeof(T)
SparseMatrixCSC{Float64, Int64}
## What can we do with T?
fieldnames(typeof(T))
(:m, :n, :colptr, :rowval, :nzval)
##
T.colptr
141-element Vector{Int64}: 1 19 37 55 73 74 92 110 128 146 164 182 200 ⋮ 2210 2220 2229 2237 2244 2244 2245 2246 2249 2252 2255 2273
##
T.nzval
2272-element Vector{Float64}: 0.09375 0.09375 0.09375 0.09375 0.09375 0.09375 0.0625 0.015625 0.0625 0.0625 0.046875 0.046875 0.0625 ⋮ 0.0625 0.0625 0.015625 0.0625 0.0625 0.046875 0.046875 0.015625 0.015625 0.015625 0.015625 0.015625
##
T.rowval
2272-element Vector{Int64}: 2 3 4 5 6 7 8 9 10 11 12 13 14 ⋮ 7 8 9 10 11 12 13 20 42 69 92 102
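## The colptr array records where each column's entries begin, so one
## column is a contiguous slice of rowval and nzval. A minimal sketch of
## reading a column straight from the raw arrays (colnnz is our own
## hypothetical helper, not part of SparseArrays):

```julia
using SparseArrays

# Return the row indices and values of column j of a CSC matrix.
function colnnz(A::SparseMatrixCSC, j)
    rng = A.colptr[j]:A.colptr[j+1]-1   # column j's entries are contiguous
    return A.rowval[rng], A.nzval[rng]
end

A = sparse([1,3,2], [1,1,2], [10.0,30.0,20.0], 3, 2)
rows, vals = colnnz(A, 1)   # rows == [1, 3], vals == [10.0, 30.0]
```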
## Why sparse matrices?
# The first reason is memory!
print(varinfo(r"T"))
A = Array(T)
print(varinfo(r"A")) # so sparse T takes about a quarter of the memory of dense A!
| name | size       | summary                                                          |
|:---- | ----------:|:-----------------------------------------------------------------|
| T    | 36.758 KiB | 140×140 SparseMatrixCSC{Float64, Int64} with 2272 stored entries |
| TI   | 17.789 KiB | 2272-element Vector{Int64}   |
| TJ   | 17.789 KiB | 2272-element Vector{Int64}   |
| TV   | 17.789 KiB | 2272-element Vector{Float64} |

| name | size        | summary                 |
|:---- | -----------:|:------------------------|
| A    | 153.164 KiB | 140×140 Matrix{Float64} |
## The second reason is speed as you'll see on the homework.
## The downside is that some operations need a lot more care.
## Let's write our own matvec given the raw arrays
function mymatvec(TI,TJ,TV,x,m)
    y = zeros(m)
    for nzi in 1:length(TI)
        i = TI[nzi]
        j = TJ[nzi]
        v = TV[nzi]
        y[i] += v*x[j]
    end
    # A fancy way
    #for (i,j,v) in zip(TI,TJ,TV)
    #    y[i] += v*x[j]
    #end
    return y
end
x = randn(140)
mymatvec(TI,TJ,TV,x,140) == T*x
true
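## A caveat on the == check above: the two matvecs add the same terms in
## different orders, so in general the floating-point results can differ
## in the last bits; comparing with isapprox (≈) is the safer habit.
## A quick sketch:

```julia
using SparseArrays, LinearAlgebra

A = sprandn(50, 50, 0.1)   # random sparse test matrix
v = randn(50)
# sparse matvec and dense matvec agree up to floating-point roundoff
@assert A*v ≈ Array(A)*v
```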
##
y = mymatvec(TI,TJ,TV,x,140)
z = T*x
140-element Vector{Float64}: -0.05009905647012144 -0.22061218649942774 -0.09376967937123634 -0.30269272998328123 -0.4142411662777157 -0.4142411662777157 -0.35015022066581003 -0.28714951826734675 -0.017238332370831402 -0.42714734603389765 -0.42639454696999557 -0.34917341288596915 -0.27575108695768524 ⋮ -0.14914329410221044 -0.23488323223120677 -0.19659438969249066 -0.21572021910896808 -0.14924061007912873 -0.29842399595249375 0.24772616108499199 0.46937962973020614 -0.7194342688800975 4.617057342353504 -0.25827164988195656 0.0
##
function cscmatvec(colptr,rowval,nzval,x,m)
    y = zeros(m)
    for j=1:length(colptr)-1 # for each column ...
        for nzi=colptr[j]:colptr[j+1]-1 # for each entry in the column
            i = rowval[nzi]
            v = nzval[nzi]
            y[i] += v*x[j]
        end
    end
    return y
end
cscmatvec(T.colptr,T.rowval,T.nzval,x,140) == T*x
true
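## To see why rows are awkward in CSC, here is a sketch of row extraction
## from the raw arrays (cscgetrow is our own hypothetical helper): we must
## scan every column's entry range, O(nnz) work in total, whereas a column
## was a single contiguous slice.

```julia
using SparseArrays

# Extract row i of a CSC matrix as a dense vector by scanning all columns.
function cscgetrow(A::SparseMatrixCSC, i)
    y = zeros(eltype(A), size(A, 2))
    for j in 1:size(A, 2)                       # must visit every column...
        for nzi in A.colptr[j]:A.colptr[j+1]-1
            if A.rowval[nzi] == i               # ...searching for row i
                y[j] = A.nzval[nzi]
            end
        end
    end
    return y
end

A = sparse([1,3,2], [1,1,2], [10.0,30.0,20.0], 3, 2)
cscgetrow(A, 1)   # == [10.0, 0.0]
```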
## Note that getting a random column in T is fast
# Note: run this a few times; the first run in Julia includes
# compilation time, so its timing is unreliable.
@time T[:,(rand(1:size(T,2)))]
0.033037 seconds (26.12 k allocations: 1.750 MiB, 99.78% compilation time)
140-element SparseVector{Float64, Int64} with 17 stored entries: [9 ] = 0.015625 [20 ] = 0.015625 [42 ] = 0.015625 [69 ] = 0.015625 [92 ] = 0.015625 [102] = 0.015625 [124] = 0.09375 [125] = 0.09375 [126] = 0.09375 [127] = 0.09375 [128] = 0.09375 [129] = 0.09375 [130] = 0.0625 [131] = 0.046875 [132] = 0.046875 [133] = 0.0625 [134] = 0.125
## But getting a random row in T is slow
@time T[rand(1:size(T,1)),:]
0.024496 seconds (18.01 k allocations: 1.218 MiB, 99.79% compilation time)
140-element SparseVector{Float64, Int64} with 11 stored entries: [43] = 0.0625 [44] = 0.0625 [45] = 0.0625 [46] = 0.0625 [48] = 0.0625 [49] = 0.09375 [50] = 0.09375 [51] = 0.09375 [52] = 0.09375 [53] = 0.09375 [54] = 0.09375
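## If an algorithm needs many row slices, a common workaround is to
## materialize the transpose once; row i of the matrix is then column i of
## the transpose, a cheap contiguous slice. A sketch:

```julia
using SparseArrays

B = sprandn(1000, 1000, 0.01)
Bt = sparse(B')     # one-time O(nnz) conversion; a real copy, not a lazy view
row = Bt[:, 10]     # same entries as B[10, :], but a fast column slice
@assert row == B[10, :]
```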
## Okay, maybe we need a bigger matrix
S = sprandn(100000,100000,10/100000)
100000×100000 SparseMatrixCSC{Float64, Int64} with 1000524 stored entries (sparsity-pattern display omitted)
## Note that getting a random column in S is fast
@time S[:,rand(1:size(S,2))]
0.000012 seconds (4 allocations: 288 bytes)
100000-element SparseVector{Float64, Int64} with 6 stored entries: [5492 ] = -0.73382 [21669] = -0.812271 [46908] = -0.210555 [52075] = 0.0483097 [72567] = -0.0525567 [86414] = -0.102364
## But getting a random row in S is slow
@time S[rand(1:size(S,1)),:]
0.000616 seconds (8 allocations: 1024 bytes)
100000-element SparseVector{Float64, Int64} with 11 stored entries: [299 ] = 0.345405 [8679 ] = -0.123843 [18970] = -0.228814 [24782] = 1.02155 [30110] = -0.637553 [44628] = 0.212392 [56864] = 0.706521 [82412] = 0.93487 [89347] = -1.74611 [90550] = -0.353132 [94031] = -0.818578
## Also note that there is NO way we could store S as a dense matrix.
length(S)*8/1e9 # it would take 80GB of memory, about 10x the RAM on my Mac.
80.0
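## The CSC storage cost is easy to estimate by hand: with Int64 indices and
## Float64 values it is 8 bytes per stored entry for nzval, 8 for rowval,
## and 8 per column pointer. A sketch under those assumptions:

```julia
# Approximate CSC storage for an n-column matrix with nnz stored entries.
cscbytes(n, nnz) = 8nnz + 8nnz + 8*(n + 1)   # nzval + rowval + colptr

cscbytes(100_000, 1_000_524) / 1e6   # ≈ 16.8 MB for S, vs. 80 GB dense
```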
## And we can go bigger
S = sprandn(10000000,10000000,10/10000000)
10000000×10000000 SparseMatrixCSC{Float64, Int64} with 100013479 stored entries (sparsity-pattern display omitted)
## Note that getting a random column in S is fast
@time S[:,rand(1:size(S,2))]
0.000014 seconds (4 allocations: 384 bytes)
10000000-element SparseVector{Float64, Int64} with 12 stored entries: [32841 ] = 1.07611 [199734 ] = -0.433621 [1822585] = 0.95794 [2139478] = -0.604192 [2547398] = -0.0881245 [3136716] = 0.788702 [3236962] = -1.83681 [3300971] = -0.110756 [4834678] = -1.52597 [7589493] = 0.313444 [7794805] = -1.61942 [9230011] = -0.144024
## But getting a random row in S is slow
@time S[rand(1:size(S,1)),:]
0.058894 seconds (6 allocations: 352 bytes)
10000000-element SparseVector{Float64, Int64} with 7 stored entries: [380418 ] = 0.41565 [419894 ] = -1.74231 [3724358] = -1.61303 [4332260] = -1.60416 [5362837] = -0.891753 [8527827] = 0.988961 [9115564] = 0.303262