Show uncertainty in Morrison et al.'s GeneRank matrix
Morrison et al. define the GeneRank variant of PageRank to compute the importance of genes from a set of gene expression measurements combined with a set of connectivity information about genes. They report that alpha=0.75-0.85 gives the best results. We'll work with their data and examine what happens for a large set of parameters in our model.
Set up the experiment
This experiment should be run from the rapr/experiments/generank directory.
cwd = pwd;
dirtail = 'experiments/generank';
if strcmp(cwd(end-length(dirtail)+1:end),dirtail) == 0
    warning('rapr:dir','%s should be executed from rapr/%s\n',mfilename,dirtail);
end
addpath('../../matlab'); % ensure we have the RAPr codes available
Load the data. The file generank.mat comes from http://www.biomedcentral.com/1471-2105/6/233/additional/; we converted its 4k by 4k matrix w_All into a transition matrix for our codes.
However, we left the vector v = expr_data as is from the file, so we need to normalize it in this code. The normalization, which takes absolute values and divides by the largest entry, comes from the generank.m file available from the same webpage.
load('../../data/generank.mat');
n=size(P,1);
% rescale v as in generank.m: absolute values, divided by the largest entry
v = abs(v);
v = v/max(v);
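The conversion of w_All into the transition matrix P is not shown in this script. A minimal sketch of one common way to do it, assuming row normalization by degree, follows. The toy matrix W and the treatment of isolated genes are illustrative assumptions, not the contents of generank.mat.

```matlab
% Toy symmetric connectivity matrix (hypothetical stand-in for w_All).
W = sparse([0 1 1 0;
            1 0 1 0;
            1 1 0 1;
            0 0 1 0]);
n = size(W,1);
deg = full(sum(W,2));          % degree of each gene
deg(deg == 0) = 1;             % assumption: isolated genes keep an all-zero row
P = spdiags(1./deg,0,n,n)*W;   % divide each row by its degree
rowsums = full(sum(P,2));      % every nonzero row should now sum to one
disp(rowsums');
```

With P built this way, P' is column stochastic, which is the form used in the solves (speye(n)-eA*P')\v later in this script.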
Evaluate the algorithm for uniform measures
For the GeneRank problem, the choice of A should not be driven by a random surfer model as in the PageRank case. Instead, the choice is literally an unknown parameter alpha. Consequently, a uniform distribution makes the most sense. The authors of the GeneRank paper claim that 0.75 <= alpha <= 0.85 gives the most interesting results. However, what should we pick inside this interval?
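For reference, the quantities compared in these experiments can be written as follows. This is a sketch read off from the code comments in this script (x(E(A)), E[x(A)], Std[x(A)]); the density symbol \rho is introduced here for illustration, and the scaling of x is left implicit because the rank comparisons do not depend on it.

```latex
\[
\begin{aligned}
x(\alpha) &\propto (I - \alpha P^T)^{-1} v, \\
E[x(A)] &= \int x(\alpha)\,\rho(\alpha)\,d\alpha, \\
\mathrm{Std}[x(A)] &= \sqrt{E[x(A)^2] - E[x(A)]^2} \quad \text{(elementwise)}.
\end{aligned}
\]
```

Here \rho is the density of A. For A uniform on [l,r], E[A] = (l+r)/2, so x(E(A)) is a single solve at the midpoint of the interval, while E[x(A)] averages the solutions over the whole interval.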
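In this experiment, we'll assume A is uniformly distributed on [l,r] and test every pair of endpoints l < r drawn from 0, 0.2, ..., 1.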
% set N for the gqrapr algorithm
N=50;
pts=0:.2:1; npts = length(pts);
tic;
ktex=zeros(npts); ktstdx=ktex; % kendall tau ex,stdx
for pi=1:npts, for pj=pi+1:npts
    l=pts(pi); r=pts(pj);
    fprintf('starting [%3.1f,%3.1f]...',l,r); tic
    d=alphadist('unif',l,r); eA=d.mf(1); eA=eA(end);
    xeA=(speye(n)-eA*P')\v; xeA=xeA./norm(xeA,1); % compute x(E(A))
    [ex, stdx] = gqrapr(P,N,d,'direct',v); % compute E[x(A)], Std[x(A)]
    ktex(pi,pj)=ktau(xeA,ex); ktstdx(pi,pj)=ktau(xeA,stdx); % compute taus
    fprintf(' ... done! %f secs\n', toc);
end, end
% save the output
save 'generank-unif.mat' pts ktex ktstdx
starting [0.0,0.2]... ... done! 35.821255 secs
starting [0.0,0.4]... ... done! 33.965974 secs
starting [0.0,0.6]... ... done! 34.079978 secs
starting [0.0,0.8]... ... done! 33.965636 secs
starting [0.0,1.0]... ... done! 34.014011 secs
starting [0.2,0.4]... ... done! 34.197860 secs
starting [0.2,0.6]... ... done! 33.969022 secs
starting [0.2,0.8]... ... done! 33.911383 secs
starting [0.2,1.0]... ... done! 34.155754 secs
starting [0.4,0.6]... ... done! 33.969462 secs
starting [0.4,0.8]... ... done! 33.969453 secs
starting [0.4,1.0]... ... done! 34.150154 secs
starting [0.6,0.8]... ... done! 34.005013 secs
starting [0.6,1.0]... ... done! 33.908104 secs
starting [0.8,1.0]... ... done! 34.225825 secs
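The gqrapr routine itself is not listed in this script. As a rough illustration of the kind of computation its outputs suggest, the sketch below approximates E[x(A)] and Std[x(A)] for a uniform A using Gauss-Legendre quadrature on a toy problem. The matrix P, the vector v, the interval, the per-solve 1-norm normalization, and the Golub-Welsch construction of the nodes are all assumptions made for illustration; this is not the RAPr implementation.

```matlab
% Toy 3-node row-stochastic matrix and weight vector (hypothetical data).
P = [0 0.5 0.5; 1 0 0; 0.5 0.5 0];
v = [0.2; 0.3; 0.5];
n = size(P,1);
l = 0.6; r = 0.8;   % A ~ Uniform[l,r]
N = 10;             % number of quadrature points

% Gauss-Legendre nodes and weights on [-1,1] via the Golub-Welsch
% eigenvalue method.
k = 1:N-1;
b = k./sqrt(4*k.^2-1);
[V,D] = eig(diag(b,1)+diag(b,-1));
z = diag(D);
w = 2*V(1,:)'.^2;

% Map the nodes to [l,r]; rescale the weights so they sum to one,
% i.e. so they integrate against the uniform density on [l,r].
a = (r-l)/2*z + (l+r)/2;
w = w/2;

% Accumulate the first and second moments of x(alpha).
ex = zeros(n,1); ex2 = zeros(n,1);
for i = 1:N
    x = (eye(n) - a(i)*P')\v;
    x = x/norm(x,1);             % assumption: normalize every solve
    ex = ex + w(i)*x;
    ex2 = ex2 + w(i)*x.^2;
end
stdx = sqrt(max(ex2 - ex.^2,0)); % elementwise standard deviation
disp([ex stdx]);
```

Each quadrature node costs one linear solve, so the run time grows linearly with N. That is consistent with the roughly constant time per interval in the log above, where N is fixed at 50.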
Display the output as a LaTeX table with colored cells. This code generates the input to a table for the paper. It colors the table cells by the ktau strength. Positive values generate red cells and negative values generate blue cells. Unfortunately, we don't see any negative examples in these cases. We generate two tables, one for ex and one for stdx.
load 'generank-unif.mat'; npts=length(pts);
ccv = @(x) [1 2-x 2-x].*[1 0.5 0.5]*(x>=0)+... % red for pos
    [x+2 x+2 1].*[0.5 0.5 1]*(x<0); % blue for neg
ccc = @(x) sprintf('\\cellcolor[rgb]{%0.2f,%0.2f,%0.2f}',ccv(x));

fprintf('ktau dist between x(E(A)) and E(x(A)) for A uniform \n');
disp(ktex);
fprintf('... latex table code ... \n');
vals = ktex;
fprintf(' & %1.1f', pts(2:end)); fprintf('\\\\ \\hline \n');
for pi=1:npts
    fprintf('%1.1f & ', pts(pi));
    for pj=1:npts
        if pj>pi, fprintf('%s %5.3f', ccc(vals(pi,pj)), vals(pi,pj)); end
        if pj==npts, fprintf('\\\\ \n'); elseif pj>1, fprintf(' & '); end
    end
end, fprintf('\n');

fprintf('ktau dist between x(E(A)) and Std(x(A)) for A uniform \n');
disp(ktstdx);
fprintf('... latex table code ... \n');
vals = ktstdx;
fprintf(' & %1.1f', pts(2:end)); fprintf('\\\\ \\hline \n');
for pi=1:npts
    fprintf('%1.1f & ', pts(pi));
    for pj=1:npts
        if pj>pi, fprintf('%s %5.3f', ccc(vals(pi,pj)), vals(pi,pj)); end
        if pj==npts, fprintf('\\\\ \n'); elseif pj>1, fprintf(' & '); end
    end
end, fprintf('\n');
ktau dist between x(E(A)) and E(x(A)) for A uniform
  Columns 1 through 3
    0    0.999153910796035    0.995902751402379
    0    0                    0.998748454806139
    0    0                    0
    0    0                    0
    0    0                    0
    0    0                    0
  Columns 4 through 6
    0.988453636705646    0.972598044724126    0.934971190579693
    0.993625016100065    0.980177974331386    0.943758824880184
    0.997934794570811    0.988209166869682    0.954465025719488
    0                    0.995758622380715    0.967386599824762
    0                    0                    0.983612654549161
    0                    0                    0
... latex table code ...
 & 0.2 & 0.4 & 0.6 & 0.8 & 1.0\\ \hline
0.0 & \cellcolor[rgb]{1.00,0.50,0.50} 0.999 & \cellcolor[rgb]{1.00,0.50,0.50} 0.996 & \cellcolor[rgb]{1.00,0.51,0.51} 0.988 & \cellcolor[rgb]{1.00,0.51,0.51} 0.973 & \cellcolor[rgb]{1.00,0.53,0.53} 0.935\\
0.2 & & \cellcolor[rgb]{1.00,0.50,0.50} 0.999 & \cellcolor[rgb]{1.00,0.50,0.50} 0.994 & \cellcolor[rgb]{1.00,0.51,0.51} 0.980 & \cellcolor[rgb]{1.00,0.53,0.53} 0.944\\
0.4 & & & \cellcolor[rgb]{1.00,0.50,0.50} 0.998 & \cellcolor[rgb]{1.00,0.51,0.51} 0.988 & \cellcolor[rgb]{1.00,0.52,0.52} 0.954\\
0.6 & & & & \cellcolor[rgb]{1.00,0.50,0.50} 0.996 & \cellcolor[rgb]{1.00,0.52,0.52} 0.967\\
0.8 & & & & & \cellcolor[rgb]{1.00,0.51,0.51} 0.984\\
1.0 & & & & & \\

ktau dist between x(E(A)) and Std(x(A)) for A uniform
  Columns 1 through 3
    0    0.166309589847077    0.211662997572823
    0    0                    0.256138192362956
    0    0                    0
    0    0                    0
    0    0                    0
    0    0                    0
  Columns 4 through 6
    0.261256606597638    0.316928868412181    0.389021139955952
    0.304770583942663    0.355768479115878    0.414426755326689
    0.342394989568625    0.381473370237008    0.413101985078932
    0                    0.38231968037935     0.380679948203753
    0                    0                    0.325921247370399
    0                    0                    0
... latex table code ...
 & 0.2 & 0.4 & 0.6 & 0.8 & 1.0\\ \hline
0.0 & \cellcolor[rgb]{1.00,0.92,0.92} 0.166 & \cellcolor[rgb]{1.00,0.89,0.89} 0.212 & \cellcolor[rgb]{1.00,0.87,0.87} 0.261 & \cellcolor[rgb]{1.00,0.84,0.84} 0.317 & \cellcolor[rgb]{1.00,0.81,0.81} 0.389\\
0.2 & & \cellcolor[rgb]{1.00,0.87,0.87} 0.256 & \cellcolor[rgb]{1.00,0.85,0.85} 0.305 & \cellcolor[rgb]{1.00,0.82,0.82} 0.356 & \cellcolor[rgb]{1.00,0.79,0.79} 0.414\\
0.4 & & & \cellcolor[rgb]{1.00,0.83,0.83} 0.342 & \cellcolor[rgb]{1.00,0.81,0.81} 0.381 & \cellcolor[rgb]{1.00,0.79,0.79} 0.413\\
0.6 & & & & \cellcolor[rgb]{1.00,0.81,0.81} 0.382 & \cellcolor[rgb]{1.00,0.81,0.81} 0.381\\
0.8 & & & & & \cellcolor[rgb]{1.00,0.84,0.84} 0.326\\
1.0 & & & & & \\
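The tables above report Kendall's tau between two score vectors, computed by the ktau function from the RAPr codes. To make the statistic concrete, here is a naive O(n^2) sketch of the tau-a variant on toy vectors. It is illustrative only: the ktau function used in this script may treat ties differently and is certainly faster, and the vectors a, b, c are made up.

```matlab
% Naive O(n^2) Kendall tau-a between score vectors (illustrative only).
a = [0.9; 0.5; 0.7; 0.1];
b = [0.8; 0.6; 0.7; 0.2];   % orders the items the same way as a
c = -a;                     % orders the items in reverse
n = numel(a);
tau_ab = 0; tau_ac = 0;
for i = 1:n-1
    for j = i+1:n
        % +1 for a concordant pair, -1 for a discordant pair
        tau_ab = tau_ab + sign(a(i)-a(j))*sign(b(i)-b(j));
        tau_ac = tau_ac + sign(a(i)-a(j))*sign(c(i)-c(j));
    end
end
tau_ab = tau_ab/(n*(n-1)/2);   % 1: identical ordering
tau_ac = tau_ac/(n*(n-1)/2);   % -1: reversed ordering
fprintf('tau(a,b) = %g, tau(a,c) = %g\n', tau_ab, tau_ac);
```

Read this way, the values near 1 in the first table say that x(E(A)) and E(x(A)) rank the genes almost identically, while the values between about 0.17 and 0.41 in the second table say that Std(x(A)) orders the genes quite differently.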
Evaluate the algorithm for a set of beta measures
All of our codes work for a beta distribution as well. In this case, we'll look at a beta distribution over the interval [0,1]. Using an appropriate beta distribution allows us to heavily weight choices in the interval [0.75,0.85], but still consider values outside that interval. A Beta(4,13) puts approximately as much weight on [0.75,0.85] as on [0.65,0.75] and isn't a bad guesstimate of this range. In this experiment, we'll test a range of beta distributions with parameters a, b taken from 1, 4, ..., 16.
% set N for the gqrapr algorithm
N=50;
pts=1:3:16; npts = length(pts);
tic;
ktex=zeros(npts); ktstdx=ktex; % kendall tau ex,stdx
for pi=1:npts, for pj=1:npts
    a=pts(pi); b=pts(pj);
    fprintf('starting (%i,%i)...',a,b); tic
    d=alphadist('beta',a,b); eA=d.mf(1); eA=eA(end);
    xeA=(speye(n)-eA*P')\v; xeA=xeA./norm(xeA,1); % compute x(E(A))
    [ex, stdx] = gqrapr(P,N,d,'direct',v); % compute E[x(A)], Std[x(A)]
    ktex(pi,pj)=ktau(xeA,ex); ktstdx(pi,pj)=ktau(xeA,stdx); % compute taus
    fprintf(' ... done! %f secs\n', toc);
end, end
save 'generank-beta.mat' pts ktex ktstdx
starting (1,1)... ... done! 34.097790 secs
starting (1,4)... ... done! 33.919597 secs
starting (1,7)... ... done! 34.160217 secs
starting (1,10)... ... done! 33.837667 secs
starting (1,13)... ... done! 33.742732 secs
starting (1,16)... ... done! 33.697770 secs
starting (4,1)... ... done! 33.950472 secs
starting (4,4)... ... done! 33.495354 secs
starting (4,7)... ... done! 33.525062 secs
starting (4,10)... ... done! 33.567836 secs
starting (4,13)... ... done! 33.579954 secs
starting (4,16)... ... done! 34.099114 secs
starting (7,1)... ... done! 33.928657 secs
starting (7,4)... ... done! 34.027759 secs
starting (7,7)... ... done! 33.752655 secs
starting (7,10)... ... done! 33.937168 secs
starting (7,13)... ... done! 33.999789 secs
starting (7,16)... ... done! 34.254916 secs
starting (10,1)... ... done! 33.865671 secs
starting (10,4)... ... done! 33.984544 secs
starting (10,7)... ... done! 34.146307 secs
starting (10,10)... ... done! 33.598635 secs
starting (10,13)... ... done! 32.956274 secs
starting (10,16)... ... done! 32.921526 secs
starting (13,1)... ... done! 32.842540 secs
starting (13,4)... ... done! 33.113293 secs
starting (13,7)... ... done! 33.923637 secs
starting (13,10)... ... done! 34.066755 secs
starting (13,13)... ... done! 33.661312 secs
starting (13,16)... ... done! 33.391877 secs
starting (16,1)... ... done! 33.539222 secs
starting (16,4)... ... done! 33.716831 secs
starting (16,7)... ... done! 34.202432 secs
starting (16,10)... ... done! 33.927360 secs
starting (16,13)... ... done! 33.986841 secs
starting (16,16)... ... done! 34.110226 secs
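The earlier remark that Beta(4,13) weights [0.75,0.85] about the same as [0.65,0.75] can be checked numerically. The sketch below does so under one loud assumption: that Beta(a,b) on [0,1] has density proportional to x^b*(1-x)^a. That convention is a guess at what alphadist('beta',a,b) uses and is not taken from its source.

```matlab
% Assumption: Beta(a,b) on [0,1] has density proportional to x^b*(1-x)^a.
% This is a guess at the alphadist convention, not read from its code.
a = 4; b = 13;
f = @(x) x.^b.*(1-x).^a;
x0 = linspace(0,1,10001);
x1 = linspace(0.65,0.75,1001);
x2 = linspace(0.75,0.85,1001);
total = trapz(x0,f(x0));       % normalizing constant
m1 = trapz(x1,f(x1))/total;    % probability mass on [0.65,0.75]
m2 = trapz(x2,f(x2))/total;    % probability mass on [0.75,0.85]
fprintf('mass on [0.65,0.75] = %.3f, on [0.75,0.85] = %.3f\n', m1, m2);
```

Under the more common convention, with density proportional to x^(a-1)*(1-x)^(b-1), a Beta(4,13) has mean 4/17 and would put almost no mass near 0.8, so the statement in the text only fits a convention like the one assumed here.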
Now write the tables, using the same color codes as before. This time, we'll write the entire table instead of skipping the lower half.
load 'generank-beta.mat'; npts=length(pts);

fprintf('ktau dist between x(E(A)) and E(x(A)) for A beta \n');
disp(ktex);
fprintf('... latex table code ... \n');
vals=ktex;
fprintf(' & %i', pts); fprintf('\\\\ \\hline \n');
for pi=1:npts
    fprintf('%i & ', pts(pi));
    for pj=1:npts
        fprintf('%s %5.3f', ccc(vals(pi,pj)), vals(pi,pj));
        if pj==npts, fprintf('\\\\ \n'); else, fprintf(' & '); end
    end
end, fprintf('\n');

fprintf('ktau dist between x(E(A)) and Std(x(A)) for A beta \n');
disp(ktstdx);
fprintf('... latex table code ... \n');
vals = ktstdx;
fprintf(' & %i', pts); fprintf('\\\\ \\hline \n');
for pi=1:npts
    fprintf('%i & ', pts(pi));
    for pj=1:npts
        fprintf('%s %5.3f', ccc(vals(pi,pj)), vals(pi,pj));
        if pj==npts, fprintf('\\\\ \n'); else, fprintf(' & '); end
    end
end, fprintf('\n');
ktau dist between x(E(A)) and E(x(A)) for A beta
  Columns 1 through 3
    0.963935308066816    0.964999853427596    0.970467614827264
    0.989690828682611    0.984854915898576    0.984205600068107
    0.995215815358824    0.991861261393641    0.990465463934085
    0.997214513599484    0.994877965946222    0.993733541140431
    0.99819416701717     0.996481529434182    0.995466026959551
    0.998724942226051    0.997477183408322    0.99662626968346
  Columns 4 through 6
    0.975210212284452    0.978923314473741    0.981798392320387
    0.984679030863263    0.985447252588669    0.986442479274815
    0.990008586089828    0.990034541012735    0.990253057256067
    0.993048253653823    0.992735077264123    0.992662096864816
    0.994855430751868    0.994548178016351    0.994352259557271
    0.996047308686377    0.995735964470849    0.995481905084829
... latex table code ...
 & 1 & 4 & 7 & 10 & 13 & 16\\ \hline
1 & \cellcolor[rgb]{1.00,0.52,0.52} 0.964 & \cellcolor[rgb]{1.00,0.52,0.52} 0.965 & \cellcolor[rgb]{1.00,0.51,0.51} 0.970 & \cellcolor[rgb]{1.00,0.51,0.51} 0.975 & \cellcolor[rgb]{1.00,0.51,0.51} 0.979 & \cellcolor[rgb]{1.00,0.51,0.51} 0.982\\
4 & \cellcolor[rgb]{1.00,0.51,0.51} 0.990 & \cellcolor[rgb]{1.00,0.51,0.51} 0.985 & \cellcolor[rgb]{1.00,0.51,0.51} 0.984 & \cellcolor[rgb]{1.00,0.51,0.51} 0.985 & \cellcolor[rgb]{1.00,0.51,0.51} 0.985 & \cellcolor[rgb]{1.00,0.51,0.51} 0.986\\
7 & \cellcolor[rgb]{1.00,0.50,0.50} 0.995 & \cellcolor[rgb]{1.00,0.50,0.50} 0.992 & \cellcolor[rgb]{1.00,0.50,0.50} 0.990 & \cellcolor[rgb]{1.00,0.50,0.50} 0.990 & \cellcolor[rgb]{1.00,0.50,0.50} 0.990 & \cellcolor[rgb]{1.00,0.50,0.50} 0.990\\
10 & \cellcolor[rgb]{1.00,0.50,0.50} 0.997 & \cellcolor[rgb]{1.00,0.50,0.50} 0.995 & \cellcolor[rgb]{1.00,0.50,0.50} 0.994 & \cellcolor[rgb]{1.00,0.50,0.50} 0.993 & \cellcolor[rgb]{1.00,0.50,0.50} 0.993 & \cellcolor[rgb]{1.00,0.50,0.50} 0.993\\
13 & \cellcolor[rgb]{1.00,0.50,0.50} 0.998 & \cellcolor[rgb]{1.00,0.50,0.50} 0.996 & \cellcolor[rgb]{1.00,0.50,0.50} 0.995 & \cellcolor[rgb]{1.00,0.50,0.50} 0.995 & \cellcolor[rgb]{1.00,0.50,0.50} 0.995 & \cellcolor[rgb]{1.00,0.50,0.50} 0.994\\
16 & \cellcolor[rgb]{1.00,0.50,0.50} 0.999 & \cellcolor[rgb]{1.00,0.50,0.50} 0.997 & \cellcolor[rgb]{1.00,0.50,0.50} 0.997 & \cellcolor[rgb]{1.00,0.50,0.50} 0.996 & \cellcolor[rgb]{1.00,0.50,0.50} 0.996 & \cellcolor[rgb]{1.00,0.50,0.50} 0.995\\

ktau dist between x(E(A)) and Std(x(A)) for A beta
  Columns 1 through 3
    0.378260938868487    0.410174495690102    0.385725045698525
    0.26290536588202     0.361814368603603    0.395077169769585
    0.216902620769714    0.30526257867656     0.355458988763943
    0.193631417206659    0.268250202254638    0.319201942621671
    0.17967992375304     0.24351855340863     0.291067633393102
    0.170402735724735    0.225699164152492    0.269493241852621
  Columns 4 through 6
    0.361762024829118    0.344073344831124    0.331233988872345
    0.399240657184287    0.392466544848713    0.382965500701441
    0.381516853383627    0.392090366289958    0.393836263967129
    0.352192133959369    0.372957536322935    0.38452879414883
    0.325763744475214    0.350160396024004    0.367225311099931
    0.303385944683577    0.329414394387254    0.348785546787472
... latex table code ...
 & 1 & 4 & 7 & 10 & 13 & 16\\ \hline
1 & \cellcolor[rgb]{1.00,0.81,0.81} 0.378 & \cellcolor[rgb]{1.00,0.79,0.79} 0.410 & \cellcolor[rgb]{1.00,0.81,0.81} 0.386 & \cellcolor[rgb]{1.00,0.82,0.82} 0.362 & \cellcolor[rgb]{1.00,0.83,0.83} 0.344 & \cellcolor[rgb]{1.00,0.83,0.83} 0.331\\
4 & \cellcolor[rgb]{1.00,0.87,0.87} 0.263 & \cellcolor[rgb]{1.00,0.82,0.82} 0.362 & \cellcolor[rgb]{1.00,0.80,0.80} 0.395 & \cellcolor[rgb]{1.00,0.80,0.80} 0.399 & \cellcolor[rgb]{1.00,0.80,0.80} 0.392 & \cellcolor[rgb]{1.00,0.81,0.81} 0.383\\
7 & \cellcolor[rgb]{1.00,0.89,0.89} 0.217 & \cellcolor[rgb]{1.00,0.85,0.85} 0.305 & \cellcolor[rgb]{1.00,0.82,0.82} 0.355 & \cellcolor[rgb]{1.00,0.81,0.81} 0.382 & \cellcolor[rgb]{1.00,0.80,0.80} 0.392 & \cellcolor[rgb]{1.00,0.80,0.80} 0.394\\
10 & \cellcolor[rgb]{1.00,0.90,0.90} 0.194 & \cellcolor[rgb]{1.00,0.87,0.87} 0.268 & \cellcolor[rgb]{1.00,0.84,0.84} 0.319 & \cellcolor[rgb]{1.00,0.82,0.82} 0.352 & \cellcolor[rgb]{1.00,0.81,0.81} 0.373 & \cellcolor[rgb]{1.00,0.81,0.81} 0.385\\
13 & \cellcolor[rgb]{1.00,0.91,0.91} 0.180 & \cellcolor[rgb]{1.00,0.88,0.88} 0.244 & \cellcolor[rgb]{1.00,0.85,0.85} 0.291 & \cellcolor[rgb]{1.00,0.84,0.84} 0.326 & \cellcolor[rgb]{1.00,0.82,0.82} 0.350 & \cellcolor[rgb]{1.00,0.82,0.82} 0.367\\
16 & \cellcolor[rgb]{1.00,0.91,0.91} 0.170 & \cellcolor[rgb]{1.00,0.89,0.89} 0.226 & \cellcolor[rgb]{1.00,0.87,0.87} 0.269 & \cellcolor[rgb]{1.00,0.85,0.85} 0.303 & \cellcolor[rgb]{1.00,0.84,0.84} 0.329 & \cellcolor[rgb]{1.00,0.83,0.83} 0.349\\
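The \cellcolor triples in all four tables come from the ccv color map defined earlier. As a small worked check, the sketch below evaluates that same expression at three values; 0.935 is the [0.0,1.0] entry of the first uniform table, while 0 and -1 are hypothetical inputs chosen to show the rest of the scale.

```matlab
% Same expression as the ccv color map used for the tables:
% returns an [r g b] triple for a ktau value x in [-1,1].
ccv = @(x) [1 2-x 2-x].*[1 0.5 0.5]*(x>=0) + ...
           [x+2 x+2 1].*[0.5 0.5 1]*(x<0);
c_pos  = ccv(0.935);   % strong positive tau: saturated red
c_zero = ccv(0);       % no correlation: white
c_neg  = ccv(-1);      % perfect disagreement: saturated blue
fprintf('%0.2f,%0.2f,%0.2f\n', c_pos);   % prints 1.00,0.53,0.53
```

For x >= 0 the green and blue channels are (2-x)/2, so they fall from 1 at x = 0 to 0.5 at x = 1. This is why the E(x(A)) tables, with tau close to 1, are nearly uniform in color, while the Std(x(A)) tables show visible gradation.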