The Centrality Cube

Brandes et al. (2016) discuss the centrality “cube,” an interesting and intuitive way to understand the way betweenness centrality works, as well as the dual connection between closeness and betweenness.

Let us illustrate using a simple example. We begin by creating an Erdos-Renyi graph with eight nodes and connection probability \(p = 0.5\):

Code

library(igraph)
set.seed(345)
g <- sample_gnp(8, 0.5)

The resulting graph looks like:

The basic innovation behind the centrality cube is to store the intermediation information among every node triplet in the graph \(s\), \(r\), \(b\) (standing for “sender,” receiver,” and “broker”) in a three dimensional array rather than the usual two dimensional matrix.

The three dimensional array can be thought of as a “cube” by stacking multiple reachability matrices between every pair of nodes \(s\) and \(r\) along a three dimensional dimension \(b\). So each “b-slice” of the cube will contain the number of times node \(b\) stands in a shortest path between \(s\) and \(r\) divided by the total number of paths between \(s\) and \(r\) which as you recall from our discussion of betweenness centrality computes the pairwise dependence (Brandes 2008) of \(s\) and \(r\) with respect to \(b\).

Let’s see how that works.

1 Building the Cube

We begin by writing a simple user-defined function to compute the number of shortest paths between every pair of nodes in the graph. We need to do this because the cube is built on the ratio of the number of paths that go through a broker node \(b\) to the total number of paths between \(s\) and \(r\):

Code

nsp <- function(w) {
  n <- vcount(w) # number of nodes in the graph
  nodes <- 1:n # vector of node ids
  P <- matrix(1, nrow = n, ncol = n) # initializing the matrix of paths with ones
  for (s in nodes) {
    r <- setdiff(nodes, c(s, as.numeric(neighbors(w, s)))) # non-neighbors in upper triangle
    if (length(r) > 0) {
      P[s, r] <- 0 # Set sr to 0 for disconnected nodes
      vpaths <- all_shortest_paths(w, from = s, to = r)$vpaths
      if (length(vpaths) > 0) {
        dests <- unlist(lapply(vpaths, function(x) as.numeric(x[length(x)])))
        counts <- table(dests)
        P[s, as.numeric(names(counts))] <- as.numeric(counts)
      }
    }
  }
  P[lower.tri(P)] <- t(P)[lower.tri(P)] # copying upper triangle into lower triangle
  return(P)
}

The function is called nsp and takes a graph w as input and returns and matrix called \(\mathbf{P}\) with entries \(p_{sr}\) equal to the total number of shortest paths between \(s\) and \(r\). This is done in four steps:

First, we initialize an \(n \times n\) matrix of ones called \(P\) in line 4 to the all ones matrix. We will update the entries of this matrix with the actual number of shortest paths between each pair of nodes in the graph.
Next, we loop through each node \(s\) in the graph starting in line 5. For each node \(s\), we find the set of target nodes that are not neighbors of \(s\) using the neighbors function in igraph and the setdiff function in base R in line 6. We then check if the set of receiver nodes is not empty using the base R length function in line 7, setting the initial value of \(p_{sr}\) to zero for all non-neighboring pairs in line 8.
Next, we compute the shortest paths from node \(s\) to the receiver nodes using the all_shortest_paths function in igraph in line 9. This function returns a list of paths in the object vpaths, where each path is represented as a vector of node ids. We check if this list is empty or not in line 10. If the list is not empty, we extract the destination nodes from the paths using lapply and unlist in line 11, and then count the number of paths to each destination node using the base R function table in line 12.
Finally, we update the corresponding entries in the \(P\) matrix with the counts of paths in line 13. After going through all nodes, we copy the upper triangle of the \(P\) matrix to the lower triangle to ensure symmetry in line 17, and return the final \(P\) matrix.

Let’s see the function in action:

Code

nsp(g)

     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,]    1    1    1    1    1    2    1    1
[2,]    1    1    1    2    1    1    2    1
[3,]    1    1    1    1    1    1    1    3
[4,]    1    2    1    1    1    1    2    1
[5,]    1    1    1    1    1    1    1    3
[6,]    2    1    1    1    1    1    1    1
[7,]    1    2    1    2    1    1    1    1
[8,]    1    1    3    1    3    1    1    1

Great! Looks like it works.

Now we are ready to write a user defined function to build the cube. Here’s a working example:

Code

cube <- function(w) {
  n <- vcount(w) # number of nodes
  c <- array(0, dim = c(n, n, n)) # initializing the cube with zeros
  S <- nsp(w) # computing number of shortest paths between nodes
  for (s in 1:n) {
    neigh_s <- neighbors(w, s) # pre-fetch neighbors
    for (r in 1:n) {
      if (s != r && !(r %in% neigh_s)) {
        p.sr <- all_shortest_paths(w, s, r)$vpaths
        brokers <- unlist(lapply(p.sr, function(x) as.numeric(x[-c(1, length(x))])))
        if (length(brokers) > 0) {
          b_counts <- table(brokers)
          b_idx <- as.numeric(names(b_counts))
          c[s, r, b_idx] <- as.numeric(b_counts) / S[s, r]
        }
      }
    }
  }
  c[is.na(c)] <- 0
  return(c)
}

The function cube takes a graph w as input and returns a three-dimensional array c with dimensions \(n \times n \times n\) where \(n\) is the number of nodes in the graph. The entries of the array correspond to the pairwise dependence of sender node \(s\) and receiver node \(r\) on broker node \(b\) for all triplets of nodes in the graph.

The function works as follows:

First, we initialize the cube as a three-dimensional array of zeros with dimensions \(n \times n \times n\) in line 3. In line 4, we also compute the matrix \(S\) containing the number of shortest paths between every pair of nodes in the graph using the nsp function defined earlier.
Next, we loop through each sender node \(s\) in line 5. For each sender node, we pre-fetch its neighbors to speed up the check for non-neighboring receiver nodes in line 6. We then loop through each potential receiver node \(r\) in line 7. We check if \(s\) and \(r\) are different and not neighbors in line 8. If they are valid sender-receiver pairs, we compute all shortest paths from \(s\) to \(r\) using the all_shortest_paths function in line 9. We then extract the broker nodes from these paths by removing the source and destination nodes using lapply and unlist in line 10. If there are any broker nodes, we count how many times each broker node appears in the paths using table in line 11, and then update the corresponding entries in the cube with the ratio of the counts to the total number of shortest paths \(S[s, r]\) in line 12.
Finally, we replace any NA values in the cube with zeros in line 13 (this can happen if there are no paths between \(s\) and \(r\)), and return the cube in line 14.

2 Exploring the Cube

Once we have our array, we can create all kinds of interesting sub-matrices containing the intermediation information in the graph by summing rows and columns of the array along different dimensions.

First, let us see what’s in the cube. We can query specific two-dimensional sub-matrices using an extension of the usual format for querying matrices in R for three-dimensional arrays. For instance this:

Code

srb <- cube(g)
round(srb[, , 2], 2)

     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,]  0.0    0 0.00    0 0.00  0.5    0 0.00
[2,]  0.0    0 0.00    0 0.00  0.0    0 0.00
[3,]  0.0    0 0.00    0 0.00  0.0    0 0.33
[4,]  0.0    0 0.00    0 0.00  0.0    0 0.00
[5,]  0.0    0 0.00    0 0.00  0.0    0 0.33
[6,]  0.5    0 0.00    0 0.00  0.0    0 1.00
[7,]  0.0    0 0.00    0 0.00  0.0    0 0.00
[8,]  0.0    0 0.33    0 0.33  1.0    0 0.00

Creates a three-dimensional matrix of pairwise betweenness probabilities and assigns it to the srb object in line 1, and looks at the \(s_{} \times r_{} \times b_2\) entry in line 2.

Each entry in the matrix is the probability that node 2 stands on a shortest path between the row sender and the column receiver node. For instance, the 0.33 in the entry corresponding to row 5 and column 8 tells us that node 2 stands in one third of the shortest paths between nodes 5 and 8 (there are 3 distinct shortest paths between 5 and 8).

Because each sub-matrix in the cube is a matrix, we can do the usual matrix operations on them. For instance, let’s take the row sums of the \(s\) to \(r\) matrix corresponding to node 3 as the broker. This can be done like this:

Code

d.3 <- rowSums(srb[, , 3])
names(d.3) <- 1:vcount(g)
d.3

  1   2   3   4   5   6   7   8 
1.5 2.0 0.0 3.0 6.0 3.5 3.0 1.0

As Brandes et al. (2016), note this vector gives us the one-sided dependence of each node in the graph on node 3. Obviously node 3 doesn’t depend on itself so there is a zero on the third spot in the vector. As is clear from the plot, node 5 is the most dependent on 3 for intermediation with the rest of the nodes in the graph.

We can also pick a particular sender and receiver node and sum all their dyadic entries in the cube across the third (broker) dimension:

Code

sum(srb[1, 6, ])

[1] 2

This number is equivalent to the geodesic distance between the nodes minus one:

Code

distances(g)[1, 6] - 1

[1] 2

3 Betweenness and Closeness in the Cube

The betweenness centrality of each node is encoded in the cube, because we already computed the main ratio that the measure depends on, namely, the pairwise dependence of each pair of nodes in the graph on any given node. For instance, let’s look at the matrix obtained from taking the slice of cube that corresponds to node 3 as a broker:

Code

srb[, , 3]

     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,]  0.0  0.0    0  0.0    1  0.5  0.0    0
[2,]  0.0  0.0    0  0.5    1  0.0  0.5    0
[3,]  0.0  0.0    0  0.0    0  0.0  0.0    0
[4,]  0.0  0.5    0  0.0    1  1.0  0.5    0
[5,]  1.0  1.0    0  1.0    0  1.0  1.0    1
[6,]  0.5  0.0    0  1.0    1  0.0  1.0    0
[7,]  0.0  0.5    0  0.5    1  1.0  0.0    0
[8,]  0.0  0.0    0  0.0    1  0.0  0.0    0

The sum of the all the cells in this matrix (divided by two) correspond to node 3’s betweenness centrality:

Code

sum(srb[, , 3]) / 2

[1] 10

Code

betweenness(g)[3]

[1] 10

So to get each node’s betweenness we just can just sum up the entries in each of the cube’s sub-matrices:

Code

b.cube <- round(colSums(srb, dims = 2) / 2, 2)
b.igraph <- round(betweenness(g), 2)
names(b.cube) <- 1:vcount(g)
names(b.igraph) <- 1:vcount(g)
b.cube

    1     2     3     4     5     6     7     8 
 0.00  2.17 10.00  0.67  0.00  0.00  3.17  4.00

Code

b.igraph

    1     2     3     4     5     6     7     8 
 0.00  2.17 10.00  0.67  0.00  0.00  3.17  4.00

Note the neat trick of using the argument dims = 2 in the usual colSums command. This tells colSums that we are dealing with a three dimensional matrix, and that what we want is the sum of the columns across the cube’s third dimension (the brokers). Note also that we divide the cube betweenness by two because we are summing identical entries across the upper and lower triangle of the symmetric dyadic brokerage matrices inside the cube (not surprisingly, node 3 is the top betweenness centrality node).

As Brandes et al. (2016) point out, using the cube info, we can build a matrix of one-sided dependencies between each pair of nodes. In this matrix, the rows correspond to a sender (or receiver) node, the columns to a broker node and the \(sb^{th}\) entry contains the sum of the proportion of paths containing the broker nodes that starts with the sender node and end with some other node in the graph.

Here’s a function that uses the cube info to build the dependency matrix that Brandes et al. (2016) talk about using the cube as input:

Code

dep.ij <- function(c) {
  n <- nrow(c)
  dep.ij <- rowSums(c[, , 1])
  for (i in 2:n) {
    dep.ij <- cbind(dep.ij, rowSums(c[, , i]))
  }
  rownames(dep.ij) <- 1:n
  colnames(dep.ij) <- 1:n
  return(dep.ij)
}

This function just takes the various vectors formed by the row sums of the sender-receiver matrix across each value of the third dimension (which is just each node in the graph when playing the broker role). It then returns a regular old \(n \times n\) containing the info.

Here’s the result when applied to our little example:

Code

library(kableExtra)
kbl(round(dep.ij(srb), 2),
  format = "html", align = "c", row.names = TRUE,
  caption = "Dependence Matrix."
) %>%
  column_spec(1, bold = TRUE) %>%
  kable_styling(bootstrap_options = c("hover", "condensed", "responsive"))

Dependence Matrix.
	2	3	4	7	8
1	0.50	1.5	0.00	2.50	2.5
2	0.00	2.0	0.00	0.00	2.0
3	0.33	0.0	0.33	1.33	0.0
4	0.00	3.0	0.00	0.00	2.0
5	0.33	6.0	0.33	1.33	0.0
6	1.50	3.5	0.00	0.50	0.5
7	0.00	3.0	0.00	0.00	1.0
8	1.67	1.0	0.67	0.67	0.0

Note that this valued matrix of one-sided dependencies is asymmetric, because the one-dependence of \(i\) on \(j\) does not have to be the same as the one-sided dependency of \(j\) on \(i\). Take for instance, node 3. Every node in the graph depends on node 3 for access to other nodes, but node 3 does not depend on nodes 1, 5, 6, or 8.

Interestingly, as Brandes et al. (2016) also show, the betweenness centrality also can be calculated from the dependency matrix! All we need to do is compute the column sums, equivalent to in-degree in the directed dependence network:

Code

round(colSums(dep.ij(srb)) / 2, 2)

    1     2     3     4     5     6     7     8 
 0.00  2.17 10.00  0.67  0.00  0.00  3.17  4.00

Even more interestingly, closeness centrality is also in the dependence matrix! It is given by the outdegree of each actor in the directed dependence network, corresponding to the row sums of the matrix (shifted by a constant given by \(n-1\)).

Code

c.c <- rowSums(distances(g))
c.d <- rowSums(dep.ij(srb)) + (vcount(g) - 1)
names(c.c) <- 1:vcount(g)
names(c.d) <- 1:vcount(g)
round(1 / c.c, 3)

    1     2     3     4     5     6     7     8 
0.071 0.091 0.111 0.083 0.067 0.077 0.091 0.091

Code

round(1 / c.d, 3)

    1     2     3     4     5     6     7     8 
0.071 0.091 0.111 0.083 0.067 0.077 0.091 0.091

Here we see that node 3 is also the top in closeness, followed closely (pun intended) by nodes 2, 7, and 8. This makes sense because an actor with high closeness is one that has low dependence on key nodes to be able to reach others.

Of course, closeness is also in the cube because of the mathematical relationship we saw earlier between the sum of entries between senders and receivers across brokers in the cube and the geodesic distance.

For instance, let’s get the matrix corresponding to node 3’s role as sender across all brokers and receivers:

Code

round(srb[3, , ], 2)

     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,]    0 0.00    0 0.00    0    0 1.00    0
[2,]    0 0.00    0 0.00    0    0 0.00    0
[3,]    0 0.00    0 0.00    0    0 0.00    0
[4,]    0 0.00    0 0.00    0    0 0.00    0
[5,]    0 0.00    0 0.00    0    0 0.00    0
[6,]    0 0.00    0 0.00    0    0 0.00    0
[7,]    0 0.00    0 0.00    0    0 0.00    0
[8,]    0 0.33    0 0.33    0    0 0.33    0

The entries of this matrix give us the probability that node 3 is the sender, whenever the row node is the is the broker (an inner node in the path) and the column node is the receiver.

For instance, the value 0.3 in row 8 and column 1 tells us that node 3 is the sender node in one third of the paths that end in node 1 and feature node 8 as a broker.

Interestingly, the sums of the entries in this matrix are equivalent to the sum of the geodesic distances between node 3 and every other node in the graph shifted by a constant (\(n- 1\)):

Code

sum(srb[3, , ]) + (vcount(g) - 1)

[1] 9

Code

sum(distances(g)[3, ])

[1] 9

So the closeness centrality can be computed from the cube as follows:

Code

c <- 0
n <- vcount(g)
for (i in 1:n) {
  c[i] <- 1 / (sum(srb[i, , ]) + (n - 1))
}
round(c, 3)

[1] 0.071 0.091 0.111 0.083 0.067 0.077 0.091 0.091

Which is the same as:

Code

round(closeness(g), 3)

[1] 0.071 0.091 0.111 0.083 0.067 0.077 0.091 0.091

References

Brandes, Ulrik. 2008. “On Variants of Shortest-Path Betweenness Centrality and Their Generic Computation.” Social Networks 30 (2): 136–45.

Brandes, Ulrik, Stephen P Borgatti, and Linton C Freeman. 2016. “Maintaining the Duality of Closeness and Betweenness Centrality.” Social Networks 44: 153–59.