Centrality

In these lecture notes we will go through the basic centrality metrics. Particularly, the “big three” according to Freeman (1979), namely, degree, closeness (in two flavors) and betweenness.

We first load our trusty Pulp Fiction data set from the networkdata package, which is an undirected graph of character scene co-appearances in the film:

   library(networkdata)
   library(igraph)
   g <- movie_559

Degree Centrality

Degree centrality is the simplest and most straightforward measure. In fact, we are already computed in the lecture notes on basic network statistics since it is the same as obtaining the graph’s degree sequence. So the igraph function degree would do it as we already saw.

Here we follow a different approach using the row (or column) sums of the graph’s adjacency matrix:

   A <- as_adjacency_matrix(g)
   A <- as.matrix(A)
   rowSums(A)
          BRETT           BUDDY           BUTCH      CAPT KOONS     ED SULLIVAN 
              7               2              17               5               2 
   ENGLISH DAVE       ESMARELDA        FABIENNE      FOURTH MAN       GAWKER #2 
              4               1               3               2               3 
    HONEY BUNNY          JIMMIE            JODY           JULES           LANCE 
              8               3               4              16               4 
        MANAGER       MARSELLUS          MARVIN         MAYNARD             MIA 
              5              10               6               3              11 
         MOTHER          PATRON      PEDESTRIAN        PREACHER         PUMPKIN 
              5               5               3               3               8 
         RAQUEL           ROGER SPORTSCASTER #1 SPORTSCASTER #2        THE GIMP 
              3               6               2               1               2 
       THE WOLF         VINCENT        WAITRESS         WINSTON           WOMAN 
              3              25               4               3               5 
      YOUNG MAN     YOUNG WOMAN             ZED 
              4               4               2 

The igraph function as_adjancency_matrix doesn’t quite return a regular R matrix object, so we have to further coerce the resulting object into a numerical matrix containing zeroes and ones using the as.matrix function in line 2. Then we can apply the native rowSums function to obtain each node’s degree. Note that this is same output we got using the degree function before.

Indegree and Outdegree

The movie network is based on the relationship of co-appearance in a scene which by nature lacks any natural directionality (it’s a symmetric relation) and can therefore be represented in an undirected graph. The concepts of in and outdegree, by contrast, are only applicable to directed relations. So to illustrate them, we need to switch to a different source of data.

We pick an advice network which is a classical directed kind of (asymmetric) relation. I can give advice to you, but that doesn’t necessarily mean you can give advice to me. The networkdata package contains one such data set collected in the late 80s early 1990s in a New England law firm (see the description here), called law_advice:

   d.g <- law_advice
   V(d.g)
+ 71/71 vertices, from d1a9da7:
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
[26] 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50
[51] 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71
   vertex_attr(d.g)
$status
 [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2
[39] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2

$gender
 [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 2 1 1 1 1 2 1 1 1 2
[39] 2 1 1 1 2 2 1 2 1 2 1 1 2 1 1 1 1 1 2 1 2 2 2 1 1 2 1 1 2 1 2 1 2

$office
 [1] 1 1 2 1 2 2 2 1 1 1 1 1 1 2 3 1 1 2 1 1 1 1 1 1 2 1 1 2 1 2 2 2 2 1 2 1 3 1
[39] 1 1 1 1 1 3 1 2 3 1 1 2 2 1 1 1 1 1 1 2 2 1 1 1 2 1 1 1 1 1 1 1 1

$seniority
 [1] 31 32 13 31 31 29 29 28 25 25 23 24 22  1 21 20 23 18 19 19 17  9 16 15 15
[26] 15 13 11 10  7  8  8  8  8  8  5  5  7  6  6  5  4  5  5  3  3  3  1  4  3
[51]  4  4 10  3  3  3  3  3  2  2  2  2  2  2  2  1  1  1  1  1  1

$age
 [1] 64 62 67 59 59 55 63 53 53 53 50 52 57 56 48 46 50 45 46 49 43 49 45 44 43
[26] 41 47 38 38 39 34 33 37 36 33 43 44 53 37 34 31 31 47 53 38 42 38 35 36 31
[51] 29 29 38 29 34 38 33 33 30 31 34 32 29 45 28 43 35 26 38 31 26

$practice
 [1] 1 2 1 2 1 1 2 1 2 2 1 2 1 2 2 2 2 1 2 1 1 1 1 1 2 1 1 2 2 1 1 1 1 2 2 1 2 1
[39] 1 1 1 2 1 2 2 2 1 2 1 2 1 1 2 1 1 1 1 1 1 2 2 2 2 2 1 1 1 1 1 2 1

$law_school
 [1] 1 1 1 3 2 1 3 3 1 3 1 2 2 1 3 1 1 2 1 1 2 3 2 2 2 3 1 2 3 3 2 3 3 2 3 3 3 2
[39] 1 1 2 2 2 1 3 2 3 3 2 2 3 3 3 3 3 2 2 3 2 2 3 2 2 2 3 3 2 3 3 2 2

We can see that the graph has 71 vertices, and that there are various attributes associated with each vertex, like gender, age, seniority, status in the law firm, etc. We can query those attributes using the igraph function vertex_attr, which takes the graph object as input.

Subsetting the Graph According to a Node Attribute

To keep things manageable, we will restrict our analysis to partners. To do that we need to select the subgraph that only includes the vertices with value of 1 in the “status” vertex attribute. From the data description, we know the first 36 nodes (with value of 1 in the status attribute) are the law firm’s partners (the rest are associates). In igraph we can do this as using the subgraph function:

   d.g <- subgraph(d.g, 1:36)
   V(d.g)
+ 36/36 vertices, from d2f9459:
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
[26] 26 27 28 29 30 31 32 33 34 35 36
   V(d.g)$status
 [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

The first line just tells igraph to generate the subgraph containing the first 36 nodes (the partners). The subgraph function thus takes two main inputs: The graph object, and then a vector of node ids (or node labels) telling the function which nodes to select to create the node-induced subgraph.

Of course we already knew from the data description that the first 36 nodes where the partners. But let’s say we have a large data set and we don’t know which nodes are the partners. A smarter way of selecting a subgraph based on a node attribute is as follows:

   partners <- which(V(law_advice)$status == 1)
   d.g <- subgraph(law_advice, partners)
   V(d.g)
+ 36/36 vertices, from d3010d4:
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
[26] 26 27 28 29 30 31 32 33 34 35 36

The first line using the native R vector function which allowing us to subset a vector based on a logical condition. The function takes a vector followed by a logical condition as input, and returns the position of the vector elements that meet that condition. In this case, we took the vector of values for the attribute of status and selected the node ids where status is equal to 1. We then fed that vector to the subgraph function in line 2.

We could do this with any other attribute:

   older <- which(V(law_advice)$age > 50)
   older
 [1]  1  2  3  4  5  6  7  8  9 10 12 13 14 38 44
   og <- subgraph(law_advice, older)
   V(og)
+ 15/15 vertices, from d309ed5:
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15

Here we selected the subgraph (called “og”, get it, get it) formed by the subset of nodes over the age of 50 at the firm. The values of the vector older tell us which of the 71 members meet the relevant condition.

Computing in and outdegree

OK, going back to the partners subgraph, we can now create our (asymmetric) adjacency matrix and compute the row and column sums:

   d.A <- as_adjacency_matrix(d.g)
   d.A <- as.matrix(d.A)
   rowSums(d.A)
 [1]  3  7  7 17  4  0  4  2  3  7  5 18 11 13 10 19 17  5 21 10  9 12  9 16  8
[26] 22 18 22 13 15 16  9 15  6 15  7
   colSums(d.A)
 [1] 18 17  8 14 10 17  4  8 10  6 11 14 14 12 15 13 21  7  7 17 11 10  2 14  7
[26] 20  2 14 12 11  8 13  2 16  8  2

Note that in contrast to the undirected case the row and column sums give you two different sets of numbers. The row sums provide the directed graph’s outdegree set (number of outgoing links incident to each node), and the column sums provide the graph’s indegree set (number of incoming links incident to each node). So if you are high in the first vector, you are an advice giver (perhaps indicating informal status or experience) and if you are high in the second you are advice taker.

Of course igraph has a dedicated function for this, which is just our old friend degree with an extra option mode, indicating whether you want the in or outdegrees:

   d.o <- degree(d.g, mode = "out")
   d.i <- degree(d.g, mode = "in")
   d.o
 [1]  3  7  7 17  4  0  4  2  3  7  5 18 11 13 10 19 17  5 21 10  9 12  9 16  8
[26] 22 18 22 13 15 16  9 15  6 15  7
   d.i
 [1] 18 17  8 14 10 17  4  8 10  6 11 14 14 12 15 13 21  7  7 17 11 10  2 14  7
[26] 20  2 14 12 11  8 13  2 16  8  2

Note that the graph attributes are just vectors of values, and can be accessed from the graph object using the $ operator attached to the V() function as we did above.

So if we wanted to figure out the correlation between some vertex attribute and in or out degree centrality, all we need to do is correlate the two vectors:

   r <- cor(d.o, V(d.g)$age)
   round(r, 2)
[1] -0.43

Which tells us that at least in this case, younger partners are more sought after as sources of advice than older partners.

Closeness Centrality

Recall that the closeness centrality is defined as the inverse of the sum of the lengths of shortest paths from each node to every other node. That means that to compute it, we first need to calculate the geodesic distance matrix. This is matrix in which each entry \(g_{ij}\) records the length of the shortest path(s) between row node \(i\) and column node \(j\). Then, we sum the rows (or columns) of this symmetric matrix and then we obtain the inverse to get the closeness of each node:

   S <- distances(g) #length of shortest paths matrix
   d.sum <- rowSums(S)
   close1 <- round(1/d.sum, 4)
   close1
          BRETT           BUDDY           BUTCH      CAPT KOONS     ED SULLIVAN 
         0.0125          0.0108          0.0143          0.0125          0.0108 
   ENGLISH DAVE       ESMARELDA        FABIENNE      FOURTH MAN       GAWKER #2 
         0.0105          0.0070          0.0096          0.0104          0.0097 
    HONEY BUNNY          JIMMIE            JODY           JULES           LANCE 
         0.0114          0.0093          0.0093          0.0132          0.0093 
        MANAGER       MARSELLUS          MARVIN         MAYNARD             MIA 
         0.0111          0.0104          0.0103          0.0098          0.0115 
         MOTHER          PATRON      PEDESTRIAN        PREACHER         PUMPKIN 
         0.0125          0.0111          0.0097          0.0097          0.0114 
         RAQUEL           ROGER SPORTSCASTER #1 SPORTSCASTER #2        THE GIMP 
         0.0105          0.0125          0.0096          0.0057          0.0073 
       THE WOLF         VINCENT        WAITRESS         WINSTON           WOMAN 
         0.0056          0.0139          0.0083          0.0105          0.0125 
      YOUNG MAN     YOUNG WOMAN             ZED 
         0.0083          0.0083          0.0073 

Of course, we could have just used the available function in igraph and computed the closeness centrality directly from the graph object using the function closeness:

   close2 <- round(closeness(g), 4)
   close2
          BRETT           BUDDY           BUTCH      CAPT KOONS     ED SULLIVAN 
         0.0125          0.0108          0.0143          0.0125          0.0108 
   ENGLISH DAVE       ESMARELDA        FABIENNE      FOURTH MAN       GAWKER #2 
         0.0105          0.0070          0.0096          0.0104          0.0097 
    HONEY BUNNY          JIMMIE            JODY           JULES           LANCE 
         0.0114          0.0093          0.0093          0.0132          0.0093 
        MANAGER       MARSELLUS          MARVIN         MAYNARD             MIA 
         0.0111          0.0104          0.0103          0.0098          0.0115 
         MOTHER          PATRON      PEDESTRIAN        PREACHER         PUMPKIN 
         0.0125          0.0111          0.0097          0.0097          0.0114 
         RAQUEL           ROGER SPORTSCASTER #1 SPORTSCASTER #2        THE GIMP 
         0.0105          0.0125          0.0096          0.0057          0.0073 
       THE WOLF         VINCENT        WAITRESS         WINSTON           WOMAN 
         0.0056          0.0139          0.0083          0.0105          0.0125 
      YOUNG MAN     YOUNG WOMAN             ZED 
         0.0083          0.0083          0.0073 

Once we have the closeness centrality values, we are interested in who are the top nodes. The following code creates a table with the top five:

   library(kableExtra)
   close2 <- sort(close2, decreasing = TRUE)
   close2 <- data.frame(close2[1:5])
   kbl(close2, format = "pipe", align = c("l", "c"),
       col.names = c("Character", "Closeness"),
       caption = "Top Five Closeness Characters in Pulp Fiction Network.") %>% 
   kable_styling(bootstrap_options = c("hover", "condensed", "responsive"))
Top Five Closeness Characters in Pulp Fiction Network.
Character Closeness
BUTCH 0.0143
VINCENT 0.0139
JULES 0.0132
BRETT 0.0125
CAPT KOONS 0.0125

It makes sense that the three main characters are also the ones that are at closest distances from everyone else!

Edge Closeness

Bröhl and Lehnertz (2022) define the closeness of an edge as a function of the closeness of the two nodes incident to it. An edge \(e_{jk}\) linking vertex \(v_j\) to \(v_k\) has high closeness whenever vertices \(v_j\) and \(v_k\) also have high closeness.

More specifically, the closeness centrality of an edge is proportional to the ratio of the product of the closeness of the two nodes incident to it divided by their sum:

\[ C(e_{jk}) = (E - 1)\frac{C(v_j) \times C(v_k)}{C(v_j)+C(v_k)} \]

Note that the equation normalizes the ratio of the product to the sum of the vertex closeness centralities by the number of edges minus one.

To compute edge closeness in a real network, we can use the same approach to data wrangling we used to compute the degree correlation. The goal is to create an edge list data frame containing five columns. The ids of the two nodes in the edge, the closeness centralities of the two nodes in the edge, and the closeness centrality of the edge calculated according to the above equation.

In the Pulp Fiction network this looks like:

   library(dplyr)
   g.el <- as_edgelist(g) #transforming graph to edgelist
   c <- round(closeness(g), 3)  #closeness centrality vector
   c.dat <- data.frame(name1 = names(c), name2 = names(c), c)
   el.temp <- data.frame(name2 = g.el[, 2]) %>% 
      left_join(c.dat, by = "name2") %>% 
      dplyr::select(c("name2", "c")) %>% 
      rename(c2 = c) 
   c.el <- data.frame(name1 = g.el[, 1]) %>% 
      left_join(c.dat, by = "name1") %>% 
      dplyr::select(c("name1", "c")) %>% 
      rename(c1 = c) %>% 
      cbind(el.temp) %>% 
      mutate(e.clos = round((ecount(g)-1)*(c1*c2)/(c+c2), 3))
head(c.el)
  name1    c1     name2    c2 e.clos
1 BRETT 0.013 MARSELLUS 0.010  0.571
2 BRETT 0.013    MARVIN 0.010  0.625
3 BRETT 0.013     ROGER 0.013  0.632
4 BRETT 0.013   VINCENT 0.014  0.681
5 BUDDY 0.011       MIA 0.011  0.555
6 BUDDY 0.011   VINCENT 0.014  0.622

To create a table of the top five closeness centrality edges, we just order the data frame by the last column and table it:

   c.el <- c.el[order(c.el$e.clos, decreasing = TRUE), ] %>% 
      dplyr::select(c("name1", "name2", "e.clos"))

   kbl(c.el[1:5, ], format = "pipe", align = c("l", "l", "c"),
       col.names = c("i", "j", "Edge Clos."), row.names = FALSE,
       caption = "Edges Sorted by Closeness in the Pulp Fiction Network") %>% 
   kable_styling(bootstrap_options = c("hover", "condensed", "responsive"))
Edges Sorted by Closeness in the Pulp Fiction Network
i j Edge Clos.
BUTCH VINCENT 0.900
BRETT BUTCH 0.875
MOTHER VINCENT 0.875
BUTCH MIA 0.864
JULES PUMPKIN 0.850

Interestingly, the top closeness edges tend to bring somewhat strange bedfellows together, characters that themselves don’t spend much time together in the film (e.g., the Butch/Vincent interaction is relatively brief and somewhat embarrassing for Vincent) but who themselves can reach other character clusters in the film via relatively short paths.

Closeness Centrality in Directed Graphs

What about closeness centrality for a directed network? Let us see how this works using a subgraph of the advice network, this time selecting just women under the age of forty:

   women <- which(V(law_advice)$gender == 2)
   wg <- subgraph(law_advice, women)
   young <- which(V(wg)$age < 40)
   wg <- subgraph(wg, young)
   V(wg)
+ 12/12 vertices, from d5a286c:
 [1]  1  2  3  4  5  6  7  8  9 10 11 12

This network is small enough that a plot could be informative about its structure. Let us plot it using the package ggraph, a visualization package that follows the same principles as the ggplot grammar of graphics but for network graphs (see here).

   #install.packages("ggraph")
   library(ggraph)
    p <- ggraph(wg, layout = 'auto')
    p <- p + geom_edge_parallel(color = "steelblue", edge_width = 0.5,
                                arrow = arrow(length = unit(2.5, 'mm')),
                                end_cap = circle(4, 'mm'), 
                                sep = unit(3, 'mm'))
    p <- p + geom_node_point(aes(x = x, y = y), size = 8, color = "tan2") 
    p <- p + geom_node_text(aes(label = 1:vcount(wg)), size = 4, color = "white")
    p <- p + theme_graph() 
    p

Women lawyers advice network

Now a question we might ask is who has the greatest closeness centrality in this advice network. We could proceed as usual and compute the geodesic distances between actors:

   S <- distances(wg)
   S
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
 [1,]    0    1    2    1    3    3    4    2    2     3     3     3
 [2,]    1    0    2    1    2    3    3    1    1     3     3     3
 [3,]    2    2    0    1    1    1    2    2    3     1     1     1
 [4,]    1    1    1    0    2    2    3    2    2     2     2     2
 [5,]    3    2    1    2    0    1    1    1    2     2     2     2
 [6,]    3    3    1    2    1    0    2    2    3     1     2     1
 [7,]    4    3    2    3    1    2    0    2    3     3     3     3
 [8,]    2    1    2    2    1    2    2    0    1     3     3     3
 [9,]    2    1    3    2    2    3    3    1    0     4     4     4
[10,]    3    3    1    2    2    1    3    3    4     0     1     1
[11,]    3    3    1    2    2    2    3    3    4     1     0     1
[12,]    3    3    1    2    2    1    3    3    4     1     1     0

Note that this is not quite right. In igraph the default settings of the distance function treats the graph as undirected. So it doesn’t use the strict directed paths, but it just treats them all as semi-paths ignoring direction. That is why, for instance, it counts node 1 as being “adjacent” to node 4 even though there is only one incoming link from 4 to 1 and why the whole matrix is symmetric, when we know from just eyeballing the network that there is a lot of asymmetry in terms of who can reach who via directed paths.

To get the actual directed distance matrix, we need to specify the “mode” option, asking whether we want in or out paths. Here, let’s select out-paths:

   S <- distances(wg, mode = "out")
   S
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
 [1,]    0    1  Inf  Inf  Inf  Inf  Inf  Inf  Inf   Inf   Inf   Inf
 [2,]    1    0  Inf  Inf  Inf  Inf  Inf  Inf  Inf   Inf   Inf   Inf
 [3,]    2    2    0    1  Inf    1  Inf  Inf  Inf   Inf   Inf   Inf
 [4,]    1    1    1    0  Inf    2  Inf  Inf  Inf   Inf   Inf   Inf
 [5,]    3    2    1    2    0    1    1    1    2   Inf   Inf   Inf
 [6,]    3    3    1    2  Inf    0  Inf  Inf  Inf   Inf   Inf   Inf
 [7,]    4    3    2    3    1    2    0    2    3   Inf   Inf   Inf
 [8,]    2    1  Inf  Inf  Inf  Inf  Inf    0    1   Inf   Inf   Inf
 [9,]    2    1  Inf  Inf  Inf  Inf  Inf    1    0   Inf   Inf   Inf
[10,]    3    3    1    2  Inf    1  Inf  Inf  Inf     0     1     2
[11,]    3    3    1    2  Inf    2  Inf  Inf  Inf     1     0     1
[12,]    3    3    1    2  Inf    1  Inf  Inf  Inf     1     1     0

This is better but introduces a problem. The directed graph is not strongly connected, so it means that some nodes cannot reach other ones via a directed path of any length. That means that the geodesic distances from a node to an unreachable node is coded as “infinite” (Inf). The problem with infinity is that it gets in the way of calculating sums of distances, a requirement for the closeness centrality.

   S <- distances(wg, mode = "out")
   rowSums(S)
 [1] Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf Inf

Adding infinity to a number just returns infinity so all the rows with at least one “Inf” in the distance matrix get an Inf for the row sum. In this case that’s all of them. A bummer.

Harmonic Centrality

But dont’ worry there’s a patch. It is called the harmonic centrality (Rochat 2009).1 This is a variation on the closeness centrality that works whether you are working with connected or disconnected graphs (or in the case of directed graphs regardless of whether the graph is strongly or weakly connected), and therefore regardless of whether the geodesic distance matrix contains Infs.2

The main difference between the harmonic and regular closeness centrality is that instead of calculating the inverse of the sum of the distances for each node, we calculate the sum of the inverses:

   S <- distances(wg, mode = "out")
   S = round(1/S, 2) 
   diag(S) <- 0 #setting diagonals to zero
   S
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
 [1,] 0.00 1.00  0.0 0.00    0  0.0    0  0.0 0.00     0     0   0.0
 [2,] 1.00 0.00  0.0 0.00    0  0.0    0  0.0 0.00     0     0   0.0
 [3,] 0.50 0.50  0.0 1.00    0  1.0    0  0.0 0.00     0     0   0.0
 [4,] 1.00 1.00  1.0 0.00    0  0.5    0  0.0 0.00     0     0   0.0
 [5,] 0.33 0.50  1.0 0.50    0  1.0    1  1.0 0.50     0     0   0.0
 [6,] 0.33 0.33  1.0 0.50    0  0.0    0  0.0 0.00     0     0   0.0
 [7,] 0.25 0.33  0.5 0.33    1  0.5    0  0.5 0.33     0     0   0.0
 [8,] 0.50 1.00  0.0 0.00    0  0.0    0  0.0 1.00     0     0   0.0
 [9,] 0.50 1.00  0.0 0.00    0  0.0    0  1.0 0.00     0     0   0.0
[10,] 0.33 0.33  1.0 0.50    0  1.0    0  0.0 0.00     0     1   0.5
[11,] 0.33 0.33  1.0 0.50    0  0.5    0  0.0 0.00     1     0   1.0
[12,] 0.33 0.33  1.0 0.50    0  1.0    0  0.0 0.00     1     1   0.0

Note that in this matrix of inverse distances, the closest (adjacent) nodes get the maximum score of one, and nodes farther apart when smaller scores (approaching zero). More importantly, those pesky Infs disappear (!) because unreachable directed pairs of nodes get the lowest score, corresponding to \(1/\infty = 0\). Turns out the mathematics of infinity weren’t our enemy after all.

Also note that the reachability relation expressed in this matrix is asymmetric: So node 4 and reach node 1 (there is a directed tie from 4 to 1), but node 1 cannot reach 4. This is precisely what we want.

Once we have this matrix of inverse distances, we can then we can compute the harmonic centrality the same way as regular closeness by adding up the row scores for each node and dividing by the number of nodes minus one (to get the average):

   d.harm <- rowSums(S)
   d.harm <- d.harm/(vcount(wg) - 1)
   d.harm <- round(d.harm, 2)
   d.harm
 [1] 0.09 0.09 0.27 0.32 0.53 0.20 0.34 0.23 0.23 0.42 0.42 0.47

We can see that the highest harmonic closeness centrality node is 5, followed by 12. Here’s a plot of the network highlighting the highest harmonic (closeness) centrality node.

   col <- rep("tan2", vcount(wg)) #creating node color vector
   col[which(d.harm == max(d.harm))] <- "red" #changing color of max centrality node to red
   p <- p + geom_node_point(aes(x = x, y = y), size = 8, color = col)
   p <- p + geom_node_text(aes(label = 1:vcount(wg)), size = 4, color = "white")
   p

Women lawyers advice network with highest closeness centrality node in red

Of course, igraph has a built in function to calculate the harmonic centrality called (you guessed it) harmonic_centrality:

   d.harm <- harmonic_centrality(wg, normalized = TRUE)
   d.harm <- round(d.harm, 2)
   d.harm
 [1] 0.09 0.09 0.27 0.32 0.53 0.20 0.34 0.23 0.23 0.42 0.42 0.47

Which gives us the same results.

Generalized Harmonic Centrality

Agneessens, Borgatti, and Everett (2017) propose a “generalized” version of the harmonic centrality that yields plain old degree centrality and the regular harmonic centrality as special cases. The key is to introduce a parameter \(\delta\) governing how much weight we give to shortest paths based on distance. Let’s see how this works.

Recall that the harmonic centrality we defined earlier is given by:

\[ \frac{\sum_{j \neq i} (g_{ij})^{-1}}{n-1} \]

For any node \(i\), where \(g_{ij}\) is the geodesic distance between \(i\) and every other node in the graph \(j\), which could be “infinite” if there is no path linking them.

Agneessens et al’s tweak is to instead compute:

\[ \frac{\sum_{j \neq i} (g_{ij})^{-\delta}}{n-1} \]

Where \(\delta\) is a free parameter chosen by the researcher with the restriction that \(\delta \geq 0\) (if you want to calculate a closeness measure as we will see below).

When \(\delta = \infty\) the numerator element \(1/(g_{ij})^{\infty} = 1\) only when nodes are adjacent and \(g_{ij} = 1\) (because \(1^{\infty} = 1\)); otherwise, for \(g_{ij} > 1\) then \(1/(g_{ij})^{\infty} = 0\), and therefore the generalized harmonic centrality just becomes a (normalized) version of degree centrality. Alternatively, when \(\delta = 1\) we just get the plain old harmonic centrality we defined earlier.

The interesting cases come from \(1 > \delta < \infty\) and \(0 > \delta < 1\). In the first case, nodes at shorter distances are weighted more (like in the standard harmonic centrality measure) as \(\delta\) becomes bigger and bigger then the generalized harmonic centrality approximates degree. For values below one, as \(\delta\) approaches zero, then indirect connections to nodes of greater length are discounted less, and thus count for “more” in defining your generalized harmonic centrality score.

Let us see a real-world example of the generalized harmonic centrality in action:

First, we create a custom function to compute the generalized harmonic centrality:

   g.harm <- function(x, d) {
      library(igraph)
      S <- distances(x) #get distances from graph object
      S <- 1/S^d #matrix of generalized inverse distances
      diag(S) <- 0 #set diagonals to zero
      c <- rowSums(S)/(vcount(x) - 1) #summing and averaging
      return(c)
   }

Second, we compute three versions of the harmonic centrality, with \(\delta = 5\), \(\delta = 0.05\), and \(\delta = -5\), using the full (unrestricted by age) subgraph of the law_advice network composed of the women lawyers at the firm, with relations constrained to be undirected:

   women <- which(V(law_advice)$gender == 2)
   wg <- subgraph(law_advice, women)
   wg <- as.undirected(wg)
   c1 <- g.harm(wg, d = 5)
   c2 <- g.harm(wg, d = 0.05)
   c3 <- g.harm(wg, d = -5)
  • The first version of the harmonic centrality in line 5, with a positive value of \(\delta\) above zero, will compute centrality scores emphasizing direct (one-step) connections, thus coming closer to degree.

  • The second version, in line 6, with a value of \(\delta\) close to zero, will give comparatively more emphasis to indirect connections weighing longer paths almost as much as shorter paths (but always a little less), thus being more similar to closeness centrality.

  • Finally, the last version, in line 7, with \(\delta < 0\), will weigh longer paths more than shorter ones, serving as a measure of eccentricity (farness from others) not closeness.

Full women lawyers advice network

Above is a plot of the women lawyers network showing the top node for each of the centralities:

  • In red we have node 3 who has the largest degree (\(k(3) = 8\)) and thus comes out on top using the generalized harmonic centrality version emphasizing direct connections (\(\delta > 1\)).

  • Then in blue we have node 9 who can reach the most others via the shortest paths, and thus comes out on top when the generalized harmonic centrality emphasizes indirect connectivity.

  • Finally, in purple we have node 12, which is farthest from everyone else, and thus comes out on “top” when longer indirect connections count for more (\(\delta < 0)\).

As we said earlier, both regular harmonic centrality and degree are special cases of the generalized measure. We can check this by setting \(\delta\) to either one or infinity.

When we set \(\delta=1\) the generalized harmonic centrality is the same as (normalized by number of nodes minus one) the regular harmonic centrality:

   g.harm(wg, d = 1)
 [1] 0.5980392 0.5343137 0.7058824 0.5980392 0.6568627 0.6274510 0.4264706
 [8] 0.5392157 0.6960784 0.6470588 0.6666667 0.4068627 0.5882353 0.5196078
[15] 0.5833333 0.6029412 0.5588235 0.5441176
   harmonic_centrality(wg, normalized = TRUE)
 [1] 0.5980392 0.5343137 0.7058824 0.5980392 0.6568627 0.6274510 0.4264706
 [8] 0.5392157 0.6960784 0.6470588 0.6666667 0.4068627 0.5882353 0.5196078
[15] 0.5833333 0.6029412 0.5588235 0.5441176

When we set \(\delta=\infty\) the generalized harmonic centrality is the same as (normalized by number of nodes minus one) degree centrality:

   g.harm(wg, d = Inf)
 [1] 0.23529412 0.17647059 0.47058824 0.23529412 0.35294118 0.29411765
 [7] 0.05882353 0.17647059 0.41176471 0.35294118 0.41176471 0.05882353
[13] 0.23529412 0.17647059 0.23529412 0.35294118 0.23529412 0.23529412
   degree(wg, normalized = TRUE)
 [1] 0.23529412 0.17647059 0.47058824 0.23529412 0.35294118 0.29411765
 [7] 0.05882353 0.17647059 0.41176471 0.35294118 0.41176471 0.05882353
[13] 0.23529412 0.17647059 0.23529412 0.35294118 0.23529412 0.23529412

Betweenness

We finally come to betweenness centrality. Recall that the key conceptual distinction between closeness and betweenness according to Freeman (1979) is that between (pun intended) the capacity to reach others quickly (e.g., via the shortest paths) and the capacity to intermediate among those same paths. High betweenness nodes control the flow of information in the network between other nodes.

This is evident in the way betweenness is calculated. Recall that the betweenness of a node k relative to any pair of nodes i and j in the network is simply:

\[ \frac{\sigma_{i(k)j}}{\sigma_{ij}} \]

Where the denominator of the fraction (\(\sigma_{ij}\)) is a count of the total number of shortest paths that start and end with nodes i and j and the numerator of the fraction (\(\sigma_{i(k)j}\)) is the subset of those paths that include node k as an inner node.

As Freeman (1979) also notes because this is a ratio, it can range from zero to one, with everything in between. As such the betweenness centrality of a node relative to any two others has an intuitive interpretation as a probability, namely the probability that if you send something from i to j it has to go through k. This probability is 1.0 if k stands in every shortest path between i and j and zero if they stand in none of the shortest paths indirectly connecting i and j.

The betweenness of a given node is just the sum all of these probabilities across every pair of nodes in the graph for each node:

\[ \sum_{i \neq j, i \neq n, j \neq v} \frac{\sigma_{i(k)j}}{\sigma_{ij}} \]

Below we can see a point and line diagram of the undirected Pulp Fiction network we have been working with.

Pulp Fiction character schene co-appearance network.

We should expect a character to have high betweenness in this network to the extent that they appear in scenes with characters who themselves don’t appear in any scenes together, thus inter-mediating between different parts of the story. Characters who only appear in one scene with some others (like The Wolf or The Gimp) are likely to be low in betweenness.

Let’s create a top ten table of betweenness for the Pulp Fiction network. We use the igraph function betweenness to calculate the scores:

   pulp.bet <- betweenness(g)
   top.5.bet <- sort(pulp.bet, decreasing = TRUE)[1:10]
   kbl(round(top.5.bet, 2), format = "pipe", align = c("l", "c"),
       col.names = c("Character", "Betweenness"),
       caption = "Top Five Betweenness Characters in Pulp Fiction Network.") %>% 
   kable_styling(bootstrap_options = c("hover", "condensed", "responsive"))
Top Five Betweenness Characters in Pulp Fiction Network.
Character Betweenness
BUTCH 275.52
VINCENT 230.19
JULES 142.11
MIA 76.68
MAYNARD 70.00
HONEY BUNNY 49.97
PUMPKIN 49.97
SPORTSCASTER #1 36.00
BRETT 29.85
PREACHER 28.23

Unsurprisingly, the top four characters are also the highest in betweenness. Somewhat surprisingly, the main antagonist of the story (the pawn shop owner) is also up there. After that we see a big drop in the bottom five of the top ten.

Now let us examine betweenness centrality in our directed young women lawyers advice network:

   women <- which(V(law_advice)$gender == 2)
   wg <- subgraph(law_advice, women)
   young <- which(V(wg)$age < 40)
   wg <- subgraph(wg, young)
   w.bet <- betweenness(wg)
   w.bet
 [1]  0.0000000  3.0000000 16.3333333 11.0000000  7.0000000  0.0000000
 [7]  0.0000000  5.0000000  0.0000000  0.3333333  1.0000000  0.3333333

Here we see that node 3 is the highest in betweenness, pictured below:

Women lawyers advice network with highest closeness centrality node in blue and highest betweenness centrality node in red

This result makes sense. Node 3 intermediates all the connections linking the tightly knit group of nodes on the left side (6, 10, 11, 12) with the rest of the network. Also if nodes 5 and 7 need to pass something along to the rest, they have to use 3 at least half time. Node 4 also needs 3 to reach 6.

This result nicely illustrates the difference between closeness and betweenness.

Edge Betweenness

Edge betweenness is defined in similar fashion as node betweenness:

\[ \sum_{i \neq j} \frac{\sigma_{i(e)j}}{\sigma_{ij}} \]

Where \(\sigma_{i(e)j}\) is a count of the number of shortest paths between i and j that feature edge e as an intermediary link. This tells us that the betweenness of an edge e is the sum of the ratios of the number of times that edge appears in the middle of a shortest path connecting every pair of nodes in the graph i and j divided by the total number of shortest paths linking each pair of nodes.

Like before, the edge betweenness with respect to a specific pair of nodes in the graph is a probability: Namely, that if you send something–using a shortest path–from any node i to any other node j it has to go through edge e. The resulting edge betweenness scores is the sum of these probabilities across every possible pair of nodes for each edge in the graph.

For this example, we will work with a simplified version of the women lawyers advice network, in which we transform it into an undirected graph. We use the igraph function as.undirected for that:

   wg <- as.undirected(wg, mode = "collapse")

The “collapse” value in the “mode” argument tells as.undirected to link every connected dyad in the original directed graph using an undirected edge. It does that by removing the directional arrow of the single directed links and collapsing (hence the name) all the bi-directional links into a single undirected one.

The resulting undirected graph looks like this:

Looking at this point and line plot of the women lawyers advice network, which edge do you think has the top betweenness?

Well no need to figure that out via eyeballing! We can just use the igraph function edge_betweenness:

   w.ebet <- edge_betweenness(wg)

The edge_betweenness function takes the igraph graph object as input and produces a vector of edge betweenness values of the same length as the number of edges in the graph, which happens to be 20 in this case.

Using this information, we can then create a table of the top ten edges ordered by betweenness:

   edges <- as_edgelist(wg) #creating an edgelist
   etab <- data.frame(edges, bet = round(w.ebet, 2)) #adding bet. scores to edgelist
   etab <- etab[order(etab$bet, decreasing = TRUE), ] #ordering by bet.
   kbl(etab[1:10, ], format = "pipe", align = c("l", "l", "c"),
       col.names = c("i", "j", "Edge Bet."), row.names = FALSE,
       caption = "Edges Sorted by Betweenness in the Women Lawyers Advice Network") %>% 
   kable_styling(bootstrap_options = c("hover", "condensed", "responsive"))
Edges Sorted by Betweenness in the Women Lawyers Advice Network
i j Edge Bet.
3 4 19.17
5 8 15.83
3 5 13.67
5 7 11.00
2 4 9.17
3 11 8.33
5 6 8.17
1 4 7.00
2 8 6.50
8 9 6.33

Not surprisingly, the top edges are the ones linking nodes 3 and 4 and nodes 5 and 8.

Disconnecting a Graph Via Bridge Removal

High betweenness edges are likely to function as bridges being the only point of indirect connectivity between most nodes in the social structure. That means that an easy way to disconnect a connected graph is to remove the bridges (Girvan and Newman 2002).

In igraph we can produce an edge deleted subgraph of an original graph using the “minus” operator, along with the edge function like this:

   del.g <- wg - edge("3|4")
   del.g <- del.g - edge("5|8")

The first line creates a new graph object (a subgraph) which equals the original graph minus the edge linking nodes 3 and 4. The second line takes this last subgraph and further deletes the edge linking nodes 5 and 8.

The resulting subgraph, minus the top two high-betweenness edges, looks like:

Which is indeed disconnected!

Induced Betweenness

Borgatti and Everett (2020, 340–45) argue that another way of thinking about centrality of a node (or edge) is to calculate the difference that removing that node makes for some graph property in the network. They further suggest that the sum of the centrality scores of each node is just such a property, proposing that betweenness is particularly interesting in this regard. Let’s see how this works.

We will use the undirected version of the women lawyers advice network for this example. Let’s say we are interested in the difference that node 10 makes for the betweenness centralities of everyone else. In that case we would proceed as follows:

   bet <- betweenness(wg) #original centrality scores
   Sbet <- sum(bet) #sum of original centrality scores
   wg.d <- wg - vertex("10") #removing vertex 10 from the graph
   bet.d <- betweenness(wg.d) #centrality scores of node deleted subgraph
   Sbet.d <- sum(bet.d) #sum of centrality scores of node deleted subgraph
   total.c <- Sbet - Sbet.d #total centrality
   indirect.c <- total.c - bet[10] #indirect centrality
   indirect.c
[1] 12.66667

Line 1 just calculates the regular betweenness centrality vector for the graph. Line 2 sums up all of the entries of this vector. Line 3 creates a node deleted subgraph by removing node 10. This is done using the “minus” operator and the igraph function vertex, which works just like the edge function we used earlier to create an edge deleted subgraph, except it takes a node id or name as input.

Lines 4-5 just recalculate the sum of betweenness centralities in the subgraph that excludes node 10. Then in line 6 we subtract the sum of centralities of the node deleted subgraph from the sum of centralities of the original graph. If this number, which Borgatti and Everett call the “total” centrality, is large and positive then that means that node 10 makes a difference for the centrality of others.

However, part of that difference is node 10’s own “direct” centrality, so to get a more accurate sense of node 10’s impact on other people’s centrality we need to subtract node 10’s direct centrality from the total number, which we do in line 7 to get node 10’s “indirect” centrality. The result is shown in the last line, which indicates that node 10 has a pretty big impact on other people’s betweenness centralities, net of their own (which is pretty small).

Now all we need to do is do the same for each node to create a vector of indirect betweenness centralities. So we incorporate the code above into a short loop through all vertices:

   total.c <- 0 #empty vector
   indirect.c <- 0 #empty vector
   for (i in 1:vcount(wg)) {
      wg.d <- wg - vertex(i)
      bet.d <- betweenness(wg.d) #centrality scores of node deleted subgraph
      Sbet.d <- sum(bet.d) #sum of centrality scores of node deleted subgraph
      total.c[i] <- Sbet - Sbet.d #total centrality
   indirect.c[i] <- total.c[i] - bet[i] #total minus direct
   }

We can now list the total, direct, and indirect betweenness centralities for the women lawyers graph using a nice table:

   i.bet <- data.frame(n = 1:vcount(wg), total.c, round(betweenness(wg), 1), round(indirect.c, 1))
   kbl(i.bet, format = "pipe", align = c("l", "c", "c", "c"),
       col.names = c("Node", "Total", "Direct", "Indirect"), row.names = FALSE,
       caption = "Induced Betweenness Scores in the Women Lawyers Advice Network") %>% 
   kable_styling(bootstrap_options = c("hover", "condensed", "responsive"))
Induced Betweenness Scores in the Women Lawyers Advice Network
Node Total Direct Indirect
1 16 0.0 16.0
2 4 6.7 -2.7
3 -24 23.2 -47.2
4 -4 12.2 -16.2
5 19 18.8 0.2
6 10 3.7 6.3
7 18 0.0 18.0
8 4 8.8 -4.8
9 18 0.0 18.0
10 13 0.3 12.7
11 14 0.0 14.0
12 13 0.3 12.7

This approach to decomposing betweenness centrality provides a new way to categorize actors in a network:

  • On the one hand, we have actors like nodes 3 and 4 who “hog” centrality from others. Perhaps these are the prototypical high betweenness actors who monopolize the flow through the network. Their own direct centrality is high, but their indirect centrality is negative, suggesting that others become more central when they are removed from the graph as they can now become intermediaries themselves.

  • In contrast, we also have actors like node 5 who are high centrality themselves, but who’s removal from the network does not affect anyone else’s centrality. These actors are high betweenness but themselves don’t monopolize the flow of information in the network.

  • Then we have actors (like nodes 9-12) who have low centrality, but whose removal from the network makes a positive difference for other people’s centrality, which overall decreases when they are removed from the network.

  • Finally, we have actors line nodes 2 and 8, who are not particularly central, but who also hog centrality from others, in that removing them from the network also increases other people’s centrality (although not such an extent as the hogs).

References

Agneessens, Filip, Stephen P Borgatti, and Martin G Everett. 2017. “Geodesic Based Centrality: Unifying the Local and the Global.” Social Networks 49: 12–26.
Boldi, Paolo, and Sebastiano Vigna. 2014. “Axioms for Centrality.” Internet Mathematics 10 (3-4): 222–62.
Borgatti, Stephen P, and Martin G. Everett. 2020. “Three Perspectives on Centrality.” In The Oxford Handbook of Social Networks, edited by R. Light and J. Moody, 334--351. Sage Publications.
Bröhl, Timo, and Klaus Lehnertz. 2022. “A Straightforward Edge Centrality Concept Derived from Generalizing Degree and Strength.” Scientific Reports 12 (1): 4407.
Freeman, Linton C. 1979. “Centrality in Social Networks Conceptual Clarification.” Social Networks 1: 215–39.
Girvan, Michelle, and Mark EJ Newman. 2002. “Community Structure in Social and Biological Networks.” Proceedings of the National Academy of Sciences 99 (12): 7821–26.
Rochat, Yannick. 2009. “Closeness Centrality Extended to Unconnected Graphs: The Harmonic Centrality Index.” Switzerland: Institute of Applied Mathematics University of Lausanne. https://infoscience.epfl.ch/server/api/core/bitstreams/7f3097a0-d7e0-4cd4-a3f3-82673aab1909/content.

Footnotes

  1. Agneessens, Borgatti, and Everett (2017) call the harmonic centrality “reciprocal closeness”↩︎

  2. Some people (Boldi and Vigna 2014) claim that the harmonic centrality is the only centrality measure that could be called by that name from a purely axiomatic mathematical approach, but that’s a different story.↩︎