Accurate RAPr computations with the Symbolic Toolbox
Our initial attempt at using the symbolic toolbox may have been slightly incorrect. This code attempts to use the symbolic
Contents
Experimental setup
This experiment should be run from the rapr/experiments/correctness directory. To reduce the computational load, we'll load the cs-stanford graph and just take the largest strong component.
cwd = pwd; dirtail = 'experiments/correctness'; if strcmp(cwd(end-length(dirtail)+1:end),dirtail) == 0 warning('%s should be executed from rapr/%s\n', mfilename, dirtail); end load ('../../data/harvard500.mat'); % just use the largest strong component for this experiment; P = Pcc; A = Acc; %load ('../../data/example-small.mat'); A(1,:) = ones(1,size(A,1)); n=size(P,1); v=ones(n,1)./n;
Setup where to do the integrals
symexact = 0;
Add the path
addpath('../../matlab');
Generate the linear system
The goal with this experiment is to symbolically solve the PageRank system in the value of alpha for a non-trivial case. To do this, we have to be a little careful with how we write everything. (For the harvard500 matrix, this isn't explicitly necessary, but this code was originally written for the case when the matrix was a little bit bigger.
Generate our symbolic variable alpha
clear alpha; alpha=sym('alpha','real');
Generate the right hand side and the base matrix for the PageRank vector.
e = sym(ones(n,1)); b = (1-alpha)*e/n; fprintf('... build identity matrix ... '); tic M = diag(e); fprintf('... done! %f sec\n', toc);
... build identity matrix ... ... done! 0.082332 sec
Setup the PageRank matrix as a symbolic matrix.
fprintf('... assembling Pagerank matrix ...\n'); tic [nzi nzj nzv] = find(A); d = sum(A,2); nextupdate = floor(nnz(A)/100); for i=1:nnz(A) M(nzj(i),nzi(i))=M(nzj(i),nzi(i))-alpha/d(nzi(i)); if i > nextupdate, fprintf('... ... %i/%i in %7f sec\n', i, nnz(A),toc); nextupdate=nextupdate+floor(nnz(A)/100); end end
... assembling Pagerank matrix ... ... ... 20/1963 in 0.033364 sec ... ... 39/1963 in 0.063281 sec ... ... 58/1963 in 0.094449 sec ... ... 77/1963 in 0.124971 sec ... ... 96/1963 in 0.156218 sec ... ... 115/1963 in 0.187364 sec ... ... 134/1963 in 0.217492 sec ... ... 153/1963 in 0.246644 sec ... ... 172/1963 in 0.273132 sec ... ... 191/1963 in 0.299286 sec ... ... 210/1963 in 0.329855 sec ... ... 229/1963 in 0.360720 sec ... ... 248/1963 in 0.391900 sec ... ... 267/1963 in 0.423053 sec ... ... 286/1963 in 0.453224 sec ... ... 305/1963 in 0.483371 sec ... ... 324/1963 in 0.514498 sec ... ... 343/1963 in 0.544032 sec ... ... 362/1963 in 0.572222 sec ... ... 381/1963 in 0.603010 sec ... ... 400/1963 in 0.634221 sec ... ... 419/1963 in 0.665352 sec ... ... 438/1963 in 0.696165 sec ... ... 457/1963 in 0.727348 sec ... ... 476/1963 in 0.758461 sec ... ... 495/1963 in 0.789239 sec ... ... 514/1963 in 0.820010 sec ... ... 533/1963 in 0.851174 sec ... ... 552/1963 in 0.881973 sec ... ... 571/1963 in 0.912423 sec ... ... 590/1963 in 0.943707 sec ... ... 609/1963 in 0.974840 sec ... ... 628/1963 in 1.005955 sec ... ... 647/1963 in 1.037101 sec ... ... 666/1963 in 1.067899 sec ... ... 685/1963 in 1.097357 sec ... ... 704/1963 in 1.128545 sec ... ... 723/1963 in 1.159672 sec ... ... 742/1963 in 1.190776 sec ... ... 761/1963 in 1.221835 sec ... ... 780/1963 in 1.252999 sec ... ... 799/1963 in 1.284102 sec ... ... 818/1963 in 1.315209 sec ... ... 837/1963 in 1.346376 sec ... ... 856/1963 in 1.377485 sec ... ... 875/1963 in 1.408601 sec ... ... 894/1963 in 1.439755 sec ... ... 913/1963 in 1.470873 sec ... ... 932/1963 in 1.501988 sec ... ... 951/1963 in 1.533138 sec ... ... 970/1963 in 1.564340 sec ... ... 989/1963 in 1.595483 sec ... ... 1008/1963 in 1.626715 sec ... ... 1027/1963 in 1.657833 sec ... ... 1046/1963 in 1.688943 sec ... ... 1065/1963 in 1.720074 sec ... ... 1084/1963 in 1.751224 sec ... ... 1103/1963 in 1.782309 sec ... ... 1122/1963 in 1.813447 sec ... ... 1141/1963 in 1.844595 sec ... ... 1160/1963 in 1.875715 sec ... ... 1179/1963 in 1.906833 sec ... ... 1198/1963 in 1.938006 sec ... ... 1217/1963 in 1.969126 sec ... ... 1236/1963 in 2.000345 sec ... ... 1255/1963 in 2.031504 sec ... ... 1274/1963 in 2.062650 sec ... ... 1293/1963 in 2.093765 sec ... ... 1312/1963 in 2.124944 sec ... ... 1331/1963 in 2.156123 sec ... ... 1350/1963 in 2.187233 sec ... ... 1369/1963 in 2.218331 sec ... ... 1388/1963 in 2.249475 sec ... ... 1407/1963 in 2.280579 sec ... ... 1426/1963 in 2.311653 sec ... ... 1445/1963 in 2.345934 sec ... ... 1464/1963 in 2.376740 sec ... ... 1483/1963 in 2.407848 sec ... ... 1502/1963 in 2.439243 sec ... ... 1521/1963 in 2.470549 sec ... ... 1540/1963 in 2.501904 sec ... ... 1559/1963 in 2.533212 sec ... ... 1578/1963 in 2.565560 sec ... ... 1597/1963 in 2.596850 sec ... ... 1616/1963 in 2.628116 sec ... ... 1635/1963 in 2.659341 sec ... ... 1654/1963 in 2.690476 sec ... ... 1673/1963 in 2.721677 sec ... ... 1692/1963 in 2.752877 sec ... ... 1711/1963 in 2.784009 sec ... ... 1730/1963 in 2.815109 sec ... ... 1749/1963 in 2.846259 sec ... ... 1768/1963 in 2.877385 sec ... ... 1787/1963 in 2.908501 sec ... ... 1806/1963 in 2.939660 sec ... ... 1825/1963 in 2.970777 sec ... ... 1844/1963 in 3.001583 sec ... ... 1863/1963 in 3.032765 sec ... ... 1882/1963 in 3.066104 sec ... ... 1901/1963 in 3.097307 sec ... ... 1920/1963 in 3.128570 sec ... ... 1939/1963 in 3.159696 sec ... ... 1958/1963 in 3.190809 sec
Solve the symbolic system and save the output
fprintf('... solving symbolic system ...\n'); tic x = M\b; fprintf('... done! %f sec\n', toc); save 'exact-sym-inv.mat' x
... solving symbolic system ... ... done! 257.398744 sec
Write the functions for Mathematica
For the har500 system, we cannot integrate the expressions explicitly, so we export them to Mathematica for highly accurate (32 digit) numerical integration. I really wish that Matlab's symbolic toolbox had a way to do this.
syms a; xa = subs(x,alpha,a); fid=fopen('exact-sym-inv.mf','wt'); for i=1:n, fprintf(fid,'%s\n',char(xa(i))); end fclose(fid); clear a;
Integrate expressions exactly
For some smaller systems, this function can integrate the results itself. So if symexact is set, then it'll solve systems here. Otherwise, it'll load the results from Mathematica. (Which means you have to go in and compute the results with Mathematica separately.)
% Setup our standard list of parameters params = [ 0, 0, 0.6, 0.9 % uniform [0,6,0.9] 2, 16, 0, 1 % skewed right 1, 1, 0.1, 0.9 % equicentric -0.5,-0.5, 0.2, 0.7 % bi-modal ]; if symexact for i=1:size(params,1) a=params(i,1);b=params(i,2); l=params(i,3); r=params(i,4); results(i).d = alphadist('beta',a,b,l,r); f = (alpha-sym(l))^b*(sym(r)-alpha)^a; %wf = int(f,alpha,l,r); wf = (r-l)^(a+b+1)*beta(a+1,b+1); % computed un-normalized ex and ex2 ex = zeros(n,1); ex2=ex; dpts=unique(round(linspace(1,n,100))); dp=1; tic; for j=1:n ex(j) = double(int(x(j)*f,alpha,l,r)); ex2(j) = double(int(x(j)*f,alpha,l,r)); if j>=dpts(dp) dt=toc; etr=n*dt/j; fprintf('dist=%2i node=%6i etr=%7f sec\n',i,j,etr); end end ex = ex./double(wf); ex2 = ex2./double(wf); stdx = sqrt(ex2-ex.^2); exactex(:,i) = ex; exactstdx(:,i) = stdx; results(i).ex.exact = ex; results(i).stdx.exact = stdx; end end
Load results from files
if ~symexact, for i=1:size(params,1) a=params(i,1);b=params(i,2); l=params(i,3); r=params(i,4); results(i).d = alphadist('beta',a,b,l,r); mathematicadata = struct2cell(load(sprintf('har500-b%i.ex.mat',i))); results(i).ex.exact = mathematicadata{1}; mathematicadata = struct2cell(load(sprintf('har500-b%i.ex2.mat',i))); results(i).ex2.exact = mathematicadata{1}; results(i).stdx.exact = sqrt(results(i).ex2.exact-results(i).ex.exact.^2); end end
Compare against algorithms
Now, let's see how all the algorithms to. We'll use the same parameters from the convergence computations.
Nmc=1e5; Npd=Nmc; gqNs=0:10:100; pceNs=gqNs; mcalg='direct';pcealg='direct';gqalg='direct';
Compute a monte-carlo approximation and store the convergence results. The mcrapr code takes an additional (final) parameter which allows us to check intermediate convergence against an exact solution. If this is set, then convergence results are reported with reference to these exact solutions instead of the local change in solution.
tic; for di=1:length(results) eex = results(di).ex.exact; estdx = results(di).stdx.exact; d = results(di).d; [ex stdx alphas conv]=mcrapr(P,Nmc,d,mcalg,[],[eex estdx]); results(di).conv.mc=conv; end fprintf('... %7i monte carlo samples took %f seconds\n', Nmc, toc); save 'exact-har500-conv-results.mat' results % save intermediate results
... 100000 monte carlo samples took 721.960976 seconds
Compute a path damping approximation and store the convergence results. Path damping also takes the extra exact solutions.
tic; for di=1:length(results) eex = results(di).ex.exact; estdx = results(di).stdx.exact; d = results(di).d; [ex stdx pdcoeffs conv] = pdrapr(P,results(di).d,0,Npd,[],[eex estdx]); results(di).conv.pd=conv; end fprintf('... %7i path damping iterations took %f seconds\n', Npd, toc); save 'exact-har500-conv-results.mat' results % save intermediate results
Warning: not computing stdx Warning: not computing stdx Warning: not computing stdx Warning: not computing stdx ... 100000 path damping iterations took 733.958640 seconds
Compute a series of GQ approximations and explicitly difference in the result.
tic; for di=1:length(results) eex = results(di).ex.exact; estdx = results(di).stdx.exact; d = results(di).d; conv=zeros(length(gqNs),3); for Ni=1:length(gqNs) [ex stdx] = gqrapr(P,gqNs(Ni)+1,d,gqalg); conv(Ni,:) = [gqNs(Ni),norm(ex-eex,1),norm(stdx-estdx,1)]; end results(di).conv.gq=conv; %figure(di); semilogy(conv(:,1),conv(:,2),'.-',conv(:,1),conv(:,3),'*--'); end fprintf('... %i GQPR solves took %f seconds\n',sum(gqNs)+sum(gqNs+1),toc); save 'exact-har500-conv-results.mat' results % save intermediate results
... 1111 GQPR solves took 4.207093 seconds
Compute a seris of PCE approximations and explicitly compute the changes in the result. (Note that this should be identical to the GQ results, but ends of changing a little due to numerical accuracy issues near 10^{-16}.)
tic for di=1:length(results) eex = results(di).ex.exact; estdx = results(di).stdx.exact; d = results(di).d; conv=zeros(length(pceNs),3); for Ni=1:length(pceNs) [ex stdx] = pcerapr(P,pceNs(Ni),d,pcealg); conv(Ni,:) = [pceNs(Ni),norm(ex-eex,1),norm(stdx-estdx,1)]; end results(di).conv.pce=conv; end fprintf('... %i PCEPR solves took %f seconds\n',sum(pceNs)+sum(pceNs+1),toc)
Warning: stdx not computed when N=0 Warning: stdx not computed when N=0 Warning: stdx not computed when N=0 Warning: stdx not computed when N=0 ... 1111 PCEPR solves took 36.928198 seconds
Save the results
save 'exact-har500-conv-results.mat' results % save intermediate results
Plot the convergence results by algorithm.
Each distribution is drawn with its native colors from previous plots with points indicated by '.' for an expectation and '+' for a standard deviation. We do not es
algs={'mc','pd','gq','pce'}; nas=length(algs); ls={'LineWidth',0.8};ms={'MarkerSize',15}; s={'-.','-','--',':'}; c={[1 0.5 0.5],[0.5 0.5 1],[0 0.75 0.5],[1 0 0.25]}; for ai=1:nas figure(ai); set(gcf,'Color','w','defaultaxesfontsize',12); clf; for di=1:length(results) cd=results(di).conv.(algs{ai}); x=cd(:,1); y1=cd(:,2); if log10(max(x))>2.5, pfun='loglog'; else pfun='semilogy'; end if size(cd,2)>2, h=feval(pfun,x,y1,['.',s{di}],x,cd(:,3),['+',s{di}]); else h=feval(pfun,x,y1,['.',s{di}]); end, set(h(1),ms{:}); hold on; set(h,'Color',c{di},ls{:}); pbaspect([2.5 1 1]); ylim([1e-16 1]); box off; print(gcf,sprintf('exact-har500-conv-%s.eps',algs{ai}),'-depsc2'); end end



