Fast Katz and Commuters codes
Pooya Esfandiar, Francesco Bonchi, David F. Gleich, Chen Greif, Laks V. S. Lakshmanan, and Byung-Won On
Fast Katz and Commuters: Efficient Estimation of Social Relatedness in Large Networks. Submitted for publication.
These codes are research prototypes and may not work for you. No promises.
Superceded: see http://www.cs.purdue.edu/homes/dgleich/codes/fast-katz-2011/ for a new version of these codes.
Download
-
fast-katz-20100915.tar.gz last updated 2010-09-15 {: .nobullets}
Overview
The package is organized by directory
matlab
- The main algorithms
data
- data files
experiments
- the codes to generate the figures
tools
- some extra codes we used
Usage
Here is a usage synopsis. Please read the codes for more information or send questions.
cd('../matlab');
load('../data/arxiv_lcc.mat');
% compute Katz and pairwise approximation using alg 1
% with a=1e-4, tol=1e-4, and b=||A||_1
[Kij bounds time] = katz_pairwise(A,i,j,1e-4,alpha);
[Cij bounds time] = commute_pairwise(A,i,j,1e-4,alpha);
% compute a sparse approximation of the top-k results for Katz
x = katz_push_mex(A,alpha,i,1e-4);
Experiments
|Experiment|Description|Figure|
|:------------------|:------------------------------------|:------------------|
|pairwise/katz_pairwise_graphs.m
| Compute data for Katz pairwise experiment and plot figs | Fig.1 |
|pairwise/commute_pairwise_graphs.m
| Compute data for Commute pairwise experiment and plot figs | Fig.2 |
|katz_topk/katz_topk.m
| Compute data for Katz top-k experiment and plot figs| Fig.3 |
Data
DBLP coauthor graph
We extracted the DBLP coauthors graph from a recent snapshot (2005-2008) of the DBLP database. We considered only nodes (authors) that have at least three publications in the snapshot. There is an undirected edge between two authors if they have coauthored a paper. From the resulting set of nodes, we randomly chose a sample of 100,000 nodes, extracted the largest connected component, and discarded any weights on the edges.
arXiv coauthor graph
This dataset contains another coauthorship graph extracted by a snapshot (1990-2000) of arXiv, which is an e-print service owned, operated and funded by Cornell University, and which contains bibliographies in many fields including computer science and physics. This graph is much denser than DBLP. Again, we extracted the largest connected component of this graph and only work with that subset.
Flickr contacts
Flickr is a popular online-community for sharing photos, with millions of users. The first graph we construct is representative of its social network, in which the node set V represents users, and the edge set E is such that (u, v) is in E if and only if a user u has added user v as his/her contact. We start with a crawl extracted from Flickr in May 2006. This crawl began with a single user and continued until the total personalized PageRank on the set of uncrawled nodes was less than 0.0001. The result of the crawl was a graph with 820,878 nodes and 9,837,214 edges. In order to create a sub-graph suitable for our experimentation we performed the following steps. First, we created a graph from Flickr by taking all the contact relationships that were reciprocal, and second, we again took the largest connected component.