Random Walk Concepts in Networks

The Average First Passage Time

Imagine something is diffusing through a network starting with some seed node \(i\) following a series of discrete time steps. If the graph is connected, the thing will eventually reach every node in the network. However, depending on the connectivity structure of the graph, it will reach some nodes (e.g., those at a smaller graph theoretic distance from the seed node) sooner than others.

Here’s a function called first.pass1 that records the minimum number of steps that it takes for something to get to each node in a graph starting from a given seed node:

   first.pass1 <- function(w, seed = 1) {
      fp <- rep(0, ncol(w))
      k <- 1
      i <- seed
      while(sum(fp[-seed] == 0) != 0) {
         j <- sample(c(1:ncol(w)), 1, prob = w[i, ])
         if (fp[j] == 0 & j != seed) {fp[j] <- k}
         i <- j
         k <- k + 1
         }
      names(fp) <- 1:ncol(w)
      return(fp)
   }

The function takes a transition matrix \(\mathbf{P} = \mathbf{D}^{-1}\mathbf{A}\)—where \(\mathbf{D}^{-1}\) is a diagonal matrix containing the inverse of each node's degree along the diagonal and zeros everywhere else—as the primary input (of the sort we discussed when talking about status and prestige) and returns a vector containing the time step at which the thing diffusing through the network first reached the \(j^{th}\) node. It works like this:

  • Line 2 initializes the “first pass” vector fp to all zeros. Line 3 initializes the step counter k and line 4 initializes the scalar i to whatever node was specified as the seed node in the seed argument of the function.

  • Lines 5-10 implement a little while loop that runs until there are no more zeros among the non-seed entries of the fp vector. Line 6 samples a random neighbor j of the current node i with probability given by the relevant entry of the transition matrix (\(p_{ij} = \frac{1}{d_i}\)). Line 7 records the current step counter k as the time at which the thing diffusing through the network first got to j (provided j has not been visited before and is not the seed node), and lines 8-9 turn j into the new sender node i and increment the counter.

  • Line 11 names the fp vector using the node indices (1 through the number of columns of the transition matrix), and line 12 returns the vector.

Let’s test the first.pass1 function out using the undirected version of the friendship network from the Krackhardt high-tech managers data:

   library(networkdata)
   library(igraph)
   g <- as_undirected(ht_friends, mode = "collapse") #undirected graph
   A <- as.matrix(as_adjacency_matrix(g)) #adjacency matrix
   D <- diag(rowSums(A)^-1) #inverse of degree matrix
   P <- D %*% A  #transition matrix
   set.seed(456) 
   first.pass1(P)
  1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20 
  0   2 137  24  14  28  43  36   8   9  16  29  15  26   7   1  25  33   6  32 
 21 
 12 

We can see for instance that the rumor got to node 16 in just one time step, but that it took 26 time steps to get to node 14 and 137 (!) to get to node 3.

So it seems that this quantity, the minimum time it takes for something that starts with me to get to you (Fouss, Pirotte, and Saerens 2004), is a good measure of the proximity between me and you in the graph.

However, we wouldn’t want to use just one run of the diffusion process to calculate this proximity measure. Instead, a better approach is to use the average time it takes for something to reach the other nodes when it starts from a given node, across a large number of realizations of the diffusion process.

To do that, we can just replicate the first.pass1 function some number of times (e.g., \(n = 100\)) and take an average:

   set.seed(456)
   first.res <- replicate(100, first.pass1(P))
   round(rowMeans(first.res), 2)
    1     2     3     4     5     6     7     8     9    10    11    12    13 
 0.00 16.14 32.88 21.24 17.17 24.35 59.98 27.83 25.85 17.75 10.16 15.78 86.78 
   14    15    16    17    18    19    20    21 
36.32 15.43 32.94  8.65 44.59 19.17 31.47 33.85 

We can see that according to this measure, called the average first passage time (Fouss, Saerens, and Shimbo 2016, 36), when we start with node 1, things get relatively quickly to node 17 (\(\hat{t} = 8.7\)) but they take forever to get to node 13 (\(\hat{t} = 86.8\)).

We can of course, write a wrapper around the first.pass1 function to compute the average first passage time from one node to another using every node in the graph as the initial seed, and put all of the results in a matrix \(\mathbf{M}\) where each entry \(m_{ij}\) is the average first passage time between nodes \(i\) and \(j\):

   first.pass2 <- function(w, n) {
      m <- rowMeans(replicate(n, first.pass1(w)))
      for (i in 2:nrow(w)) {
         m <- rbind(m, rowMeans(replicate(n, first.pass1(w, seed = i))))
      }
      rownames(m) <- 1:nrow(w)
      return(m)
   }

And the result (showing the first ten rows and columns of \(\mathbf{M}\)) is:

   set.seed(456)
   round(first.pass2(P, 100), 1)[1:10, 1:10]
      1    2    3    4    5    6    7    8    9   10
1   0.0 16.1 32.9 21.2 17.2 24.4 60.0 27.8 25.9 17.8
2  18.1  0.0 23.4 24.0 14.6 20.5 60.6 32.6 32.6 16.6
3  20.6 22.2  0.0 19.9 17.1 29.1 63.1 29.8 26.8 19.0
4  14.1 14.1 27.4  0.0 18.1 21.7 57.8 28.9 22.7 21.8
5  16.9 16.7 30.0 25.1  0.0 25.0 52.9 36.5 19.9 24.1
6  19.5 18.1 24.4 28.2 15.2  0.0 41.5 38.6 19.5 25.2
7  20.7 15.9 26.8 24.8 16.3 14.2  0.0 33.0 26.1 23.7
8  14.1 14.9 27.2 16.6 17.2 29.2 73.3  0.0 23.8 19.1
9  18.1 16.1 27.5 23.9 16.3 20.7 52.1 32.7  0.0 20.1
10 20.0 18.7 26.2 23.3 16.2 23.8 64.7 27.0 21.2  0.0

Note that the \(\mathbf{M}\) matrix is asymmetric. That is, the average number of steps that it takes for something to get from me to you (\(m_{ij}\)), is not necessarily the same as the average number of steps that it takes for something to get from you to me (\(m_{ji}\)).
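
We can check this directly. Assuming we store the simulated matrix in an object (M.sim below is just an illustrative name), a couple of quick comparisons make the asymmetry visible:

   set.seed(456)
   M.sim <- first.pass2(P, 100) #store the simulated average first passage times
   round(M.sim[1, 13], 1) #average time from node 1 to node 13
   round(M.sim[13, 1], 1) #average time from node 13 to node 1
   round(max(abs(M.sim - t(M.sim))), 1) #largest asymmetry across all pairs of nodes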

The accuracy of the average first passage time estimates depends on the number of replications we use to compute the average: the larger the number, the more accurate the estimates. However, increasing n to a very large number can take a lot of computing time.
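
A quick way to see both effects, as a rough sketch, is to time a small and a larger number of replications for a single seed node and compare how much the estimates move (est.small, est.large, t.small, and t.large are just illustrative names, and the exact numbers will depend on your machine and the random seed):

   set.seed(456)
   t.small <- system.time(est.small <- rowMeans(replicate(10, first.pass1(P))))
   t.large <- system.time(est.large <- rowMeans(replicate(1000, first.pass1(P))))
   round(c(small = t.small["elapsed"], large = t.large["elapsed"]), 2) #larger n takes longer
   round(max(abs(est.small - est.large)), 1) #small-n estimates can be far off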

Computing the AFPT by Iterating Until Convergence

There’s a more efficient approach to computing the entries in the average first passage time matrix \(\mathbf{M}\) that involves iterating through a matrix using the information in the transition matrix \(\mathbf{P}\). The basic idea is that the average first passage time for a random walker that starts at node \(i\) and ends at node \(j\) is given by:

\[ m_{ij} = 1 + \sum^N_{k \neq j}p_{ik}m_{kj} \tag{1}\]

With the proviso that \(m_{ii}\) is always equal to zero. What this tells us is that the average first passage time between any two nodes \(i\) and \(j\) in the graph is given by one plus the sum, over every intermediary node \(k\) other than the destination \(j\), of the product of the probability that something transitions from the sender node \(i\) to that intermediary node \(k\)—given by \(p_{ik}\)—and \(m_{kj}\), the average first passage time from the intermediary node \(k\) to the destination node \(j\).

Note that since we need to know \(m_{kj}\) to get \(m_{ij}\), then this opens up a chicken or the egg problem that we can solve through iteration like we did for the status scores in the prestige lesson.

Here’s how to proceed. We first start with a null value for all the entries of the \(\mathbf{M}\) matrix (e.g., \(m_{ij} = 0\) for all \(i\) and \(j\)), compute an initial round of \(m_{ij}\) estimates for all pairs of nodes \(i\) and \(j\) using Equation 1, recompute \(m_{ij}\) using those new values, rinse, repeat, and stop after we don’t get any big changes between successive versions of the \(\mathbf{M}\) matrix.

A function that implements this idea goes like this:

   first.pass3 <- function(w) {
      n <- nrow(w)
      nodes <- 1:n
      m <- matrix(0, n, n) #initializing M matrix values
      d <- 1 #initialize delta
      while(d > 1e-06) { #do while delta is bigger than a tiny number
         old.m <- m
         for (i in nodes) { #loop through every node
            for (j in nodes[-i]) { #loop through every node except i
               m[i,j] <- 1
               for (k in nodes[-c(i,j)]) { #loop through every node except i and j
                  m[i,j] <- m[i,j] + (w[i,k]*old.m[k,j]) #update M matrix values
                  }
               }
            }
         d <- abs(sum(abs(m) - abs(old.m))) #delta equals difference between current and previous M matrix
         } #end while loop
      return(m)
   }

This function takes the probability transition matrix \(\mathbf{P}\) as input and returns the \(\mathbf{M}\) matrix of average first passage times between every pair of nodes in the network.

And the result is:

   round(first.pass3(P), 1)[1:10, 1:10]
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]  0.0 15.0 28.5 21.1 17.7 25.4 59.7 29.9 28.5  21.2
 [2,] 17.0  0.0 29.0 22.3 16.1 23.2 59.0 33.8 28.2  21.8
 [3,] 19.1 17.7  0.0 25.7 16.6 25.4 57.6 34.1 27.4  18.6
 [4,] 15.1 14.4 29.1  0.0 18.0 25.6 59.9 28.6 28.7  20.9
 [5,] 19.4 15.8 27.6 25.7  0.0 24.8 58.4 34.5 25.1  19.9
 [6,] 19.3 15.0 28.6 25.3 16.9  0.0 50.5 35.0 24.2  21.8
 [7,] 19.9 17.3 27.1 26.1 17.0 16.9  0.0 35.2 27.1  22.2
 [8,] 14.9 16.8 28.3 19.5 17.8 26.1 59.9  0.0 28.0  17.7
 [9,] 19.4 17.2 27.7 25.6 14.4 21.3 57.8 34.1  0.0  18.4
[10,] 18.7 17.3 25.4 24.3 15.7 25.4 59.4 30.2 24.9   0.0

These are the exact average first passage times between each pair of nodes in the network (up to the small convergence tolerance used in the while loop).
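
One quick way to convince ourselves of this (a small sanity check; M.it and chk are just illustrative names) is to verify that the converged matrix satisfies Equation 1: since \(m_{jj} = 0\), every off-diagonal entry of \(\mathbf{M}\) should equal the corresponding entry of \(1 + \mathbf{P}\mathbf{M}\):

   M.it <- first.pass3(P) #converged average first passage times
   chk <- 1 + P %*% M.it #right-hand side of Equation 1 for every pair of nodes
   max(abs((M.it - chk)[row(M.it) != col(M.it)])) #off-diagonal difference; should be close to zero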

Computing the AFPT Using the Laplacian Matrix

Finally, there is a way to use matrix algebra magic to compute the average first passage time exactly, corresponding to the limit of infinitely many realizations (\(n \to \infty\)), in closed form without averaging or iteration.

To do that, first we need to compute a matrix called the graph Laplacian, which is defined as:

\[ \mathbf{L} = \mathbf{D} - \mathbf{A} \tag{2}\]

Where \(\mathbf{D}\) is a diagonal matrix containing the degrees of each node in the graph along the diagonal and zeroes everywhere else.

In R we can compute \(\mathbf{L}\) like this:

   D <- diag(rowSums(A))
   L <- D - A

Once we have \(\mathbf{L}\), we need to compute a variation known by the less than memorable name of the Moore-Penrose Pseudo-Inverse of the Laplacian, written as \(\mathbf{L}^+\), which is given by:

\[ \mathbf{L}^+ = \left(\mathbf{L} + \frac{\mathbf{E}}{N}\right)^{-1} - \frac{\mathbf{E}}{N} \tag{3}\]

Where \(\mathbf{E}\) is the all ones matrix of the same dimensions as \(\mathbf{L}\).

So despite the terrible name, \(\mathbf{L}^+\) is easy to compute in R:

   E <- matrix(1, nrow(A), nrow(A))/nrow(A)
   L.p <- solve(L + E) - E

As we can see, the E we create in the code is a matrix containing the inverse of the number of nodes in the network (\(\frac{1}{N}\)) in every cell, so it already corresponds to the \(\frac{\mathbf{E}}{N}\) term in Equation 3. We then use the function solve to obtain the matrix inverse of the expression in parentheses in Equation 3.
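
If you want to double-check that this shortcut really yields the Moore-Penrose pseudo-inverse, one option (a quick sanity check, assuming the MASS package is installed) is to verify the defining property \(\mathbf{L}\mathbf{L}^{+}\mathbf{L} = \mathbf{L}\) or to compare the result against MASS::ginv:

   max(abs(L %*% L.p %*% L - L)) #defining property of a pseudo-inverse; should be near zero
   max(abs(L.p - MASS::ginv(L))) #should match the pseudo-inverse computed by MASS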

What’s in the \(\mathbf{L}^+\) matrix? Let’s see the first ten rows and columns:

   round(L.p[1:10, 1:10], 3)
        [,1]   [,2]   [,3]   [,4]   [,5]   [,6]   [,7]   [,8]   [,9]  [,10]
 [1,]  0.112  0.004 -0.011  0.015 -0.012 -0.013 -0.025  0.015 -0.013 -0.008
 [2,]  0.004  0.099 -0.015  0.007 -0.002  0.001 -0.021 -0.010 -0.012 -0.012
 [3,] -0.011 -0.015  0.167 -0.017 -0.007 -0.015 -0.014 -0.014 -0.009  0.006
 [4,]  0.015  0.007 -0.017  0.147 -0.015 -0.015 -0.028  0.022 -0.016 -0.007
 [5,] -0.012 -0.002 -0.007 -0.015  0.099 -0.010 -0.019 -0.016  0.007 -0.001
 [6,] -0.013  0.001 -0.015 -0.015 -0.010  0.145  0.030 -0.020  0.011 -0.014
 [7,] -0.025 -0.021 -0.014 -0.028 -0.019  0.030  0.341 -0.030 -0.016 -0.025
 [8,]  0.015 -0.010 -0.014  0.022 -0.016 -0.020 -0.030  0.200 -0.014  0.011
 [9,] -0.013 -0.012 -0.009 -0.016  0.007  0.011 -0.016 -0.014  0.164  0.007
[10,] -0.008 -0.012  0.006 -0.007 -0.001 -0.014 -0.025  0.011  0.007  0.125

As we can see, some of the entries of \(\mathbf{L}^+\) are positive and others are negative. More interestingly, we can see that \(\mathbf{L}^+\) is a symmetric matrix: \(l^+_{ij} = l^+_{ji}\).

Even more interestingly, \(\mathbf{L}^+\) is doubly centered, similar to the modularity matrix considered here. That means both the row and column sums of the matrix are equal to the all-zeros vector:

   round(rowSums(L.p), 2)
 [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
   round(colSums(L.p), 2)
 [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

The average first passage time between every pair of nodes in the network, \(m_{ij}\), expressed in terms of the entries of \(\mathbf{L}^+\) and the degree vector \(\mathbf{d}\), is then given by (Fouss, Saerens, and Shimbo 2016, 158):

\[ m_{ij} = \sum_{k=1}^N \left(l^+_{jj} - l^+_{ij} + l^+_{ik} - l^+_{kj}\right)d_k \tag{4}\]

Where \(d_k\) is the degree of node \(k\).

So let’s try it out:

   nodes <- 1:nrow(A) #node id vector
   d <- rowSums(A) #degree vector
   m <- matrix(0, nrow(A), nrow(A)) #initializing M to all zeros
   for (i in nodes) { 
      for (j in nodes) { 
         for (k in nodes) { 
            m[i,j] <- m[i,j] + ((L.p[j,j] - L.p[i,j] + L.p[i,k] - L.p[k,j])*d[k])
            }
         }
      }
   round(m[1:10, 1:10], 1)
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]  0.0 15.0 28.5 21.1 17.7 25.4 59.7 29.9 28.5  21.2
 [2,] 17.0  0.0 29.0 22.3 16.1 23.2 59.0 33.8 28.2  21.8
 [3,] 19.1 17.7  0.0 25.7 16.6 25.4 57.6 34.1 27.4  18.6
 [4,] 15.1 14.4 29.1  0.0 18.0 25.6 59.9 28.6 28.7  20.9
 [5,] 19.4 15.8 27.6 25.7  0.0 24.8 58.4 34.5 25.1  19.9
 [6,] 19.3 15.0 28.6 25.3 16.9  0.0 50.5 35.0 24.2  21.8
 [7,] 19.9 17.3 27.1 26.1 17.0 16.9  0.0 35.2 27.1  22.2
 [8,] 14.9 16.8 28.3 19.5 17.8 26.1 59.9  0.0 28.0  17.7
 [9,] 19.4 17.2 27.7 25.6 14.4 21.3 57.8 34.1  0.0  18.4
[10,] 18.7 17.3 25.4 24.3 15.7 25.4 59.4 30.2 24.9   0.0

Which produces exactly the same results we obtained earlier via iteration.
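
We can make that claim precise with a one-line check (the small residual reflects the finite convergence tolerance of the iterative function rather than a real difference):

   max(abs(m - first.pass3(P))) #difference between closed-form and iterative estimates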

Computing the AFPT Using a Big Matrix Formula

There is an even more straightforward way to compute the average first passage time that does not involve any looping at all, just simple matrix operations. We can do that because we can express Equation 4 in matrix form like this:

\[ \mathbf{M} = \left[vol(A)\mathbf{e}(\mathbf{l}^+_{d})^T - vol(A)\mathbf{L}^+\right] + \left[(\mathbf{L}^+\mathbf{d})\mathbf{e}^T -\mathbf{e}(\mathbf{d}^T\mathbf{L}^+)\right] \]

Now, this formula looks long and monstrous, but it is composed of simple quantities we know and love. We have already been introduced to \(\mathbf{L}^+\); \(\mathbf{e}\) is a column vector of ones with as many rows as there are nodes in the graph; \(\mathbf{d}\) is a vector of the same length containing the degree of each node; \(vol(A) = \sum_{i} d_{i}\) is the volume of the graph, which for a binary adjacency matrix is just the sum of its entries; and \(\mathbf{l}^+_{d}\) is a vector containing the diagonal entries of \(\mathbf{L}^+\).

The following R code constructs \(\mathbf{M}\) step by step:

   e <- matrix(1, nrow(A), 1) #column vector of all ones
   d <- rowSums(A) #degree vector 
   l <- diag(L.p) #diagonal vector of Laplacian inverse
   vol.A <- sum(d) #graph volume 
   M <- vol.A * (e %*% t(l)) #first term: vol(A) * l+_jj, constant down each column j
   M <- M - (vol.A * L.p) #second term: subtract vol(A) * L+
   M <- M + (L.p %*% d) %*% t(e) #third term: add (L+ d)_i, constant across each row i
   M <- M - (e %*% (t(d) %*% L.p)) #fourth term: subtract (d^T L+)_j, constant down each column j

And now for the big reveal:

   round(M, 1)[1:10, 1:10]
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]  0.0 15.0 28.5 21.1 17.7 25.4 59.7 29.9 28.5  21.2
 [2,] 17.0  0.0 29.0 22.3 16.1 23.2 59.0 33.8 28.2  21.8
 [3,] 19.1 17.7  0.0 25.7 16.6 25.4 57.6 34.1 27.4  18.6
 [4,] 15.1 14.4 29.1  0.0 18.0 25.6 59.9 28.6 28.7  20.9
 [5,] 19.4 15.8 27.6 25.7  0.0 24.8 58.4 34.5 25.1  19.9
 [6,] 19.3 15.0 28.6 25.3 16.9  0.0 50.5 35.0 24.2  21.8
 [7,] 19.9 17.3 27.1 26.1 17.0 16.9  0.0 35.2 27.1  22.2
 [8,] 14.9 16.8 28.3 19.5 17.8 26.1 59.9  0.0 28.0  17.7
 [9,] 19.4 17.2 27.7 25.6 14.4 21.3 57.8 34.1  0.0  18.4
[10,] 18.7 17.3 25.4 24.3 15.7 25.4 59.4 30.2 24.9   0.0

Which once again, gives us the results we seek.

As we saw earlier, the interpretation of the entries of the \(\mathbf{M}\) matrix is also straightforward. For instance, if we start a message from node 2, we should expect it to take an average of \(m_{2, 6} =\) 23.2 steps to get to node 6 by following random walks in the network.
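
We can pull that number straight out of the matrix, and also confirm that this version of \(\mathbf{M}\) matches the one we computed entry by entry from Equation 4 (stored above in m):

   round(M[2, 6], 1) #average first passage time from node 2 to node 6
   max(abs(M - m)) #difference from the looped version; should be essentially zero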

The Average Commute Distance

Once we have the average first passage time, we can compute another important quantity called the average commute distance between two nodes \(i\) and \(j\) (\(n_{ij}\)). This is the average number of steps it takes for a random walker to start at node \(i\), reach another specific node \(j\), and then get back to the original node \(i\) (hence commuting, like going from home to work and then back from work to home again).

It turns out that \(n_{ij}\) is pretty simple to compute, once we know the average first passage time between every pair of nodes \(m_{ij}\), since it is given by:

\[ n_{ij} = m_{ij} + m_{ji} \]

So the entries of the Average Commute Distance matrix \(\mathbf{N}\) are just the entries of \(\mathbf{M}\) on the upper triangle added to the corresponding entries in the lower triangle. In R we can obtain that by just adding \(\mathbf{M}\) to its transpose \(\mathbf{M}^T\):

   N <- M + t(M)
   round(N[1:10, 1:10], 1)
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]  0.0 32.0 47.6 36.2 37.1 44.7 79.6 44.8 47.9  39.9
 [2,] 32.0  0.0 46.7 36.7 31.9 38.2 76.3 50.6 45.5  39.0
 [3,] 47.6 46.7  0.0 54.8 44.2 54.0 84.7 62.4 55.0  44.0
 [4,] 36.2 36.7 54.8  0.0 43.7 50.9 86.0 48.1 54.4  45.2
 [5,] 37.1 31.9 44.2 43.7  0.0 41.8 75.4 52.3 39.5  35.6
 [6,] 44.7 38.2 54.0 50.9 41.8  0.0 67.4 61.1 45.5  47.2
 [7,] 79.6 76.3 84.7 86.0 75.4 67.4  0.0 95.1 84.9  81.6
 [8,] 44.8 50.6 62.4 48.1 52.3 61.1 95.1  0.0 62.1  47.9
 [9,] 47.9 45.5 55.0 54.4 39.5 45.5 84.9 62.1  0.0  43.3
[10,] 39.9 39.0 44.0 45.2 35.6 47.2 81.6 47.9 43.3   0.0

Note that the average commute distance matrix entries are symmetric, which makes sense, for it takes as long for something to go from you to me and back to you as from me to you and back to me. The average commute distance thus functions as a distance metric between nodes: the lower the average commute distance, the closer or more similar two nodes are in the network.
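
As a small illustration of that interpretation (N.off is just a throwaway name), we can ask which other node is nearest to each node in average commute distance terms, masking the zero diagonal so a node is never its own nearest neighbor:

   N.off <- N + diag(Inf, nrow(N)) #put Inf on the diagonal so we ignore self-distances
   apply(N.off, 1, which.min) #nearest node to each node by average commute distance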

Of course, we can also use math to compute \(\mathbf{N}\) directly from the entries of \(\mathbf{L}^+\) using a recombination of the same ingredients we used before to compute the average first passage time. It goes like this:

\[ n_{ij} = vol(A) \left(l^+_{ii} + l^+_{jj} - 2l^+_{ij}\right) \]

Which in (more complicated) matrix form looks like:

\[ \mathbf{N} = vol(A)\left[\mathbf{l}^+_{d}\mathbf{e}^T + \mathbf{e}(\mathbf{l}^+_{d})^T - 2\mathbf{L}^+\right] \]

This is actually a much simpler and less monstrous expression than the one before.

The following R code constructs the \(\mathbf{N}\) matrix of average commute distances between each pair of nodes in the network step by step:

   N <- l %*% t(e) #l+_ii, constant across each row i
   N <- N + (e %*% l) #add l+_jj, constant down each column j
   N <- N - (2 * L.p) #subtract twice the pseudo-inverse of the Laplacian
   N <- vol.A * N #scale everything by the graph volume
   round(N, 1)[1:10, 1:10]
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]  0.0 32.0 47.6 36.2 37.1 44.7 79.6 44.8 47.9  39.9
 [2,] 32.0  0.0 46.7 36.7 31.9 38.2 76.3 50.6 45.5  39.0
 [3,] 47.6 46.7  0.0 54.8 44.2 54.0 84.7 62.4 55.0  44.0
 [4,] 36.2 36.7 54.8  0.0 43.7 50.9 86.0 48.1 54.4  45.2
 [5,] 37.1 31.9 44.2 43.7  0.0 41.8 75.4 52.3 39.5  35.6
 [6,] 44.7 38.2 54.0 50.9 41.8  0.0 67.4 61.1 45.5  47.2
 [7,] 79.6 76.3 84.7 86.0 75.4 67.4  0.0 95.1 84.9  81.6
 [8,] 44.8 50.6 62.4 48.1 52.3 61.1 95.1  0.0 62.1  47.9
 [9,] 47.9 45.5 55.0 54.4 39.5 45.5 84.9 62.1  0.0  43.3
[10,] 39.9 39.0 44.0 45.2 35.6 47.2 81.6 47.9 43.3   0.0

Which as you can see, gives us the results we seek.

The interpretation of the entries of the \(\mathbf{N}\) matrix is also straightforward. For instance, if a message starts at node 4, it would take an average of 54.4 steps for it to get to node 9 and then back to the original node 4 by following a random walk in the network governed by the node-to-node probabilities stored in the transition matrix \(\mathbf{P}\).
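
As a quick consistency check, that entry should equal the sum of the two corresponding average first passage times:

   round(N[4, 9], 1) #average commute distance between nodes 4 and 9
   round(M[4, 9] + M[9, 4], 1) #same number, reconstructed from the average first passage times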

Interestingly, we can obtain the average commute distance between any pair of nodes in yet another way. For instance, from the above matrix, we know that the average commute distance between nodes 3 and 8 is 62.38.

Let’s construct two vectors full of zeros of the same length as the number of nodes in the graph, except they have a one in the third and eighth spot respectively:

   n <- ncol(A)
   i <- rep(0, n)
   j <- rep(0, n)
   i[3] <- 1
   j[8] <- 1

Fouss, Pirotte, and Saerens (2004) show that the average commute time distance between nodes 3 and 8 is also given by:

   round(sum(A) * (t((i - j)) %*% L.p %*% (i - j)), 2)
      [,1]
[1,] 62.38

Neat!
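
In fact, the same indicator-vector recipe works for every pair of nodes at once. The short double loop below (a sketch; N.chk, a, and b are just illustrative names) rebuilds the full \(\mathbf{N}\) matrix this way and checks it against the version computed earlier:

   N.chk <- matrix(0, n, n)
   for (a in 1:n) {
      for (b in 1:n) {
         i <- rep(0, n); i[a] <- 1 #indicator vector selecting the first node
         j <- rep(0, n); j[b] <- 1 #indicator vector selecting the second node
         N.chk[a, b] <- sum(A) * (t(i - j) %*% L.p %*% (i - j))
         }
      }
   max(abs(N.chk - N)) #should be essentially zero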

References

Fouss, François, Alain Pirotte, and Marco Saerens. 2004. “The Application of New Concepts of Dissimilarities Between Nodes of a Graph to Collaborative Filtering.” In Proc. Workshop on Statistical Approaches for Web Mining, 26–37.
Fouss, François, Marco Saerens, and Masashi Shimbo. 2016. Algorithms and Models for Network Data and Link Analysis. Cambridge University Press.