Degree Centrality

In the following lecture notes we will go through the basic centrality metrics. Particularly, the “big three” according to Freeman (1979), namely, degree (treated here), followed by closeness and betweenness.

Degree Centrality in Undirected Graphs

We first load our trusty Pulp Fiction data set from the networkdata package, which is an undirected graph of character scene co-appearances in the film:

    library(networkdata)
    library(igraph)
    library(stringr) #using stringr to change names from all caps to title case
    g <- movie_559
    V(g)$name <- str_to_title(V(g)$name)
    V(g)$name[which(V(g)$name == "Esmarelda")] <- "Esmeralda" #fixing misspelled name

Degree centrality is the simplest and most straightforward measure. In fact, we are already computed in the lecture notes on basic network statistics since it is the same as obtaining the graph’s degree sequence.

We can obtain that by computing the row (or column) sums of the graph’s adjacency matrix:

\[ C_i^{DEG}=\sum_j a_{ij} \tag{1}\]

Which in R we calculate as follows:

   A <- as_adjacency_matrix(g)
   A <- as.matrix(A)
   d <- rowSums(A)
   sort(d, decreasing = TRUE)

        Vincent           Butch           Jules             Mia       Marsellus 
             25              17              16              11              10 
    Honey Bunny         Pumpkin           Brett          Marvin           Roger 
              8               8               7               6               6 
     Capt Koons         Manager          Mother          Patron           Woman 
              5               5               5               5               5 
   English Dave            Jody           Lance        Waitress       Young Man 
              4               4               4               4               4 
    Young Woman        Fabienne       Gawker #2          Jimmie         Maynard 
              4               3               3               3               3 
     Pedestrian        Preacher          Raquel        The Wolf         Winston 
              3               3               3               3               3 
          Buddy     Ed Sullivan      Fourth Man Sportscaster #1        The Gimp 
              2               2               2               2               2 
            Zed       Esmeralda Sportscaster #2 
              2               1               1

The igraph function as_adjancency_matrix doesn’t quite return a regular R matrix object, so we have to further coerce the resulting object into a numerical matrix containing zeroes and ones using the as.matrix function in line 2. Then we can apply the native rowSums function to obtain each node’s degree.

We can also express the degree in “matrix” form as follows:

\[ \mathbf{d} = \mathbf{A}\mathbf{e} \tag{2}\]

Where $\mathbf{d}$ is the column vector containing each node’s degree and $\mathbf{e}$ is the all ones column vector of dimensions $N \times 1$ with $N$ equal to the number of nodes in the graph.

In R we can calculate the degree using matrix multiplication as follows:

   e <- matrix(1, nrow(A), 1) #all ones column vector
   d <- A %*% e
   names(d) <- rownames(A)
   sort(d, decreasing = TRUE)

        Vincent           Butch           Jules             Mia       Marsellus 
             25              17              16              11              10 
    Honey Bunny         Pumpkin           Brett          Marvin           Roger 
              8               8               7               6               6 
     Capt Koons         Manager          Mother          Patron           Woman 
              5               5               5               5               5 
   English Dave            Jody           Lance        Waitress       Young Man 
              4               4               4               4               4 
    Young Woman        Fabienne       Gawker #2          Jimmie         Maynard 
              4               3               3               3               3 
     Pedestrian        Preacher          Raquel        The Wolf         Winston 
              3               3               3               3               3 
          Buddy     Ed Sullivan      Fourth Man Sportscaster #1        The Gimp 
              2               2               2               2               2 
            Zed       Esmeralda Sportscaster #2 
              2               1               1

Of course, we can also use the igraph function degree which returns the same named vector with the degrees of each node as output:

   d <- degree(g)
   sort(d, decreasing = TRUE)

        Vincent           Butch           Jules             Mia       Marsellus 
             25              17              16              11              10 
    Honey Bunny         Pumpkin           Brett          Marvin           Roger 
              8               8               7               6               6 
     Capt Koons         Manager          Mother          Patron           Woman 
              5               5               5               5               5 
   English Dave            Jody           Lance        Waitress       Young Man 
              4               4               4               4               4 
    Young Woman        Fabienne       Gawker #2          Jimmie         Maynard 
              4               3               3               3               3 
     Pedestrian        Preacher          Raquel        The Wolf         Winston 
              3               3               3               3               3 
          Buddy     Ed Sullivan      Fourth Man Sportscaster #1        The Gimp 
              2               2               2               2               2 
            Zed       Esmeralda Sportscaster #2 
              2               1               1

Degree Centrality in Weighted Graphs

Typically, undirected graphs record the interactions between actors in the networks as binary (1/0) ties. Nevertheless in many cases edges have a weight associated with them, recording the intensity of the interactions between pair of actors, resulting in a weighted graph.

In igraph we can check if the network in question is weighted by seeing whether there is a weight attribute associated with the edges in the graph:

   E(g)$weight

  [1] 1 1 1 1 1 1 1 1 2 1 1 3 1 1 1 1 2 1 1 1 1 1 2 1 2 1 1 2 2 1 1 1 1 1 3 1 4
 [38] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 2 6 4 3 3 1 1 2
 [75] 1 1 3 9 1 3 2 6 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

If the graph was unweighted, this call would return a NULL value. We can see however, that there are weights associated with each edge. In the Pulp Fiction network, the weights represent the number of scenes in the movie that two characters co-appeared in.

We can take a quick peek at how these edge weights look by creating an edge list data frame using the igraph function as_edgelist and then attaching the edge weight attribute:

   dat <- data.frame(as_edgelist(g), w = E(g)$weight)

And looking at the first twenty rows:

   dat[1:20, ]

           X1              X2 w
1       Brett       Marsellus 1
2       Brett          Marvin 1
3       Brett           Roger 1
4       Brett         Vincent 1
5       Buddy             Mia 1
6       Buddy         Vincent 1
7       Brett           Butch 1
8       Butch      Capt Koons 1
9       Butch       Esmeralda 2
10      Butch       Gawker #2 1
11      Butch           Jules 1
12      Butch       Marsellus 3
13      Butch      Pedestrian 1
14      Butch Sportscaster #1 1
15      Butch    English Dave 1
16      Brett        Fabienne 1
17      Butch        Fabienne 2
18   Fabienne           Jules 1
19 Fourth Man           Jules 1
20 Fourth Man         Vincent 1

We can see, for instance, that Butch and Marsellus co-appear in three different scenes.

Vertex Strength

How do we incorporate edge weights into the calculation of degree centrality? One approach, proposed by Barrat et al. (2004) is to compute:

\[ C^{VS}_i = \sum_j w_{ij} \tag{3}\]

Where the weighted degree (Barrat et al. (2004) call it the vertex strength) of a node is just the sum of the weights of the edges incident to that node.

In R we can compute the vertex strength by calculating the row (or column) sums of the weighted adjacency matrix $\mathbf{W}$ containing the weight of the edge between $i$ and $j$ in each cell $w_{ij}$.

We can do that like this:

   W <- as.matrix(as_adjacency_matrix(g, attr="weight"))
   vs <- rowSums(W)
   sort(vs, decreasing = TRUE)

        Vincent           Jules           Butch             Mia       Marsellus 
             51              31              23              16              15 
       The Wolf     Honey Bunny          Jimmie           Lance         Pumpkin 
             13               9               9               9               9 
          Brett            Jody          Marvin           Roger      Capt Koons 
              8               8               8               6               5 
   English Dave         Manager          Mother          Patron           Woman 
              5               5               5               5               5 
       Fabienne        Waitress       Young Man     Young Woman       Gawker #2 
              4               4               4               4               3 
        Maynard      Pedestrian        Preacher          Raquel Sportscaster #1 
              3               3               3               3               3 
        Winston           Buddy     Ed Sullivan       Esmeralda      Fourth Man 
              3               2               2               2               2 
Sportscaster #2        The Gimp             Zed 
              2               2               2

Note that in igraph, we obtain the weighted adjacency matrix , by specifying "weight" as the value of the attr argument (for edge attribute) in the call to the as_adjacency matrix function.

Not surprisingly, the main characters in the story have the largest vertex strengths, as they tend to co-appear with other characters several times in the film.

The vertex strength approach is simple but it has the drawback of giving the same score to low degree nodes with large weights and high degree nodes with small weights (e.g., a node with two contacts each with a weight of 3.0 gets the same score as node with six contacts each with a weight of 1.0).

Generalized Degree

To deal with this issue, Opsahl, Agneessens, and Skvoretz (2010) propose that we instead compute:

\[ C^{WD}_i = \left(\sum_j w_{ij}\right)^{\alpha}\left(\sum_j a_{ij}\right)^{1-\alpha} \tag{4}\]

With the parameter $\alpha$ chosen by the researcher, with the restriction that $0 \geq \alpha \leq 1$. Note that in the right-hand side of the equation, the factor on the left is just vertex strength as defined in Equation 3 raised to the power of $\alpha$, and the factor on the right is just the usual binary degree centrality, counting the number of alters ego is connected to, raised to the power of $1-\alpha$.

The $\alpha$ parameter thus controls the importance we give the number of contacts versus the weight of the edges in computing the degree. When $\alpha = 1$ then Equation 4 reduces to Equation 3 the vertex strength; when $\alpha = 0$ then Equation 4 reduces to Equation 1 the regular unweighted degree.

$\alpha$ values between zero and one combine both quantities, with numbers closer to zero giving more weight to number of contacts, and numbers closer to one giving more importance to the weight of the edges.

Here’s function that computes the Opsahl, Agneessens, and Skvoretz (2010) generalized degree:

   gen.deg <- function(x, alpha) {
      s <- rowSums(as.matrix(as_adjacency_matrix(x, attr="weight")))
      d <- rowSums(as.matrix(as_adjacency_matrix(x)))
      wd <- s^alpha * d^(1-alpha)
      return(wd)
      }

The function takes the graph and a value for $\alpha$ as inputs and returns the vector of generalized degrees as output. In lines 2 and 3 we compute the vertex strength and the regular degree from the row sums of the weighted adjacency matrix and the regular adjacency matrix.

We can see that when we set $\alpha = 0$ we get values identical to the degree:

   dr <- gen.deg(g, alpha = 0)
   sort(dr, decreasing = TRUE)[1:10]

    Vincent       Butch       Jules         Mia   Marsellus Honey Bunny 
         25          17          16          11          10           8 
    Pumpkin       Brett      Marvin       Roger 
          8           7           6           6

And when we set $\alpha = 1$ we get values identical to the vertex strength:

   dr <- gen.deg(g, alpha = 1)
   sort(dr, decreasing = TRUE)[1:10]

    Vincent       Jules       Butch         Mia   Marsellus    The Wolf 
         51          31          23          16          15          13 
Honey Bunny      Jimmie       Lance     Pumpkin 
          9           9           9           9

Note that the top 10 ranks differ; while Vincent is the top character by either measure, Butch is second and Jules is third by degree, but their places switch when we rank them by vertex strength. Also, while the Wolf is not in the top 10 by degree, he is by vertex strength.

The interesting thing is what happens with other values. Let’s set $\alpha = 0.6$:

   dr <- gen.deg(g, alpha = 0.6)
   sort(round(dr, 1), decreasing = TRUE)[1:10]

    Vincent       Jules       Butch         Mia   Marsellus Honey Bunny 
       38.3        23.8        20.4        13.8        12.8         8.6 
    Pumpkin       Brett    The Wolf      Marvin 
        8.6         7.6         7.2         7.1

Which ends up being a compromise between the two top 10 lists, containing characters from both.

Degree Centrality in Directed Graphs

Indegree and Outdegree Centrality

The movie network is based on the relationship of co-appearance in a scene which by nature lacks any natural directionality (it’s a symmetric relation) and can therefore be represented in an undirected graph. The concepts of in and outdegree, by contrast, are only applicable to directed ties, which are represented by a directed graph. So to illustrate them, we need to switch to a different source of data.

We pick an advice network which is a classical directed kind of (asymmetric) relation. I can give advice to you, but that doesn’t necessarily mean you can give advice to me. The networkdata package contains one such data set collected in the late 80s early 1990s in a New England law firm (Lazega 2001) called law_advice:

   g <- law_advice
   V(g)

+ 71/71 vertices, from d1a9da7:
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
[26] 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50
[51] 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71

   vertex_attr(g)

$status
 [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2
[39] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2

$gender
 [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 2 1 1 1 1 2 1 1 1 2
[39] 2 1 1 1 2 2 1 2 1 2 1 1 2 1 1 1 1 1 2 1 2 2 2 1 1 2 1 1 2 1 2 1 2

$office
 [1] 1 1 2 1 2 2 2 1 1 1 1 1 1 2 3 1 1 2 1 1 1 1 1 1 2 1 1 2 1 2 2 2 2 1 2 1 3 1
[39] 1 1 1 1 1 3 1 2 3 1 1 2 2 1 1 1 1 1 1 2 2 1 1 1 2 1 1 1 1 1 1 1 1

$seniority
 [1] 31 32 13 31 31 29 29 28 25 25 23 24 22  1 21 20 23 18 19 19 17  9 16 15 15
[26] 15 13 11 10  7  8  8  8  8  8  5  5  7  6  6  5  4  5  5  3  3  3  1  4  3
[51]  4  4 10  3  3  3  3  3  2  2  2  2  2  2  2  1  1  1  1  1  1

$age
 [1] 64 62 67 59 59 55 63 53 53 53 50 52 57 56 48 46 50 45 46 49 43 49 45 44 43
[26] 41 47 38 38 39 34 33 37 36 33 43 44 53 37 34 31 31 47 53 38 42 38 35 36 31
[51] 29 29 38 29 34 38 33 33 30 31 34 32 29 45 28 43 35 26 38 31 26

$practice
 [1] 1 2 1 2 1 1 2 1 2 2 1 2 1 2 2 2 2 1 2 1 1 1 1 1 2 1 1 2 2 1 1 1 1 2 2 1 2 1
[39] 1 1 1 2 1 2 2 2 1 2 1 2 1 1 2 1 1 1 1 1 1 2 2 2 2 2 1 1 1 1 1 2 1

$law_school
 [1] 1 1 1 3 2 1 3 3 1 3 1 2 2 1 3 1 1 2 1 1 2 3 2 2 2 3 1 2 3 3 2 3 3 2 3 3 3 2
[39] 1 1 2 2 2 1 3 2 3 3 2 2 3 3 3 3 3 2 2 3 2 2 3 2 2 2 3 3 2 3 3 2 2

We can see that the graph has 71 vertices, and that there are various attributes associated with each vertex, like gender, age, seniority, status in the law firm, etc. We can query those attributes using the igraph function vertex_attr, which takes the graph object as input.

Note that in contrast to the Pulp Fiction network the vertices in this graph don’t have a name attribute. We can assign one like this:

   V(g)$name <- paste("v", 1:vcount(g), sep = "_")

Now each node is just going to be named “v_1”, “v_2” and so forth:

   V(g)$name

 [1] "v_1"  "v_2"  "v_3"  "v_4"  "v_5"  "v_6"  "v_7"  "v_8"  "v_9"  "v_10"
[11] "v_11" "v_12" "v_13" "v_14" "v_15" "v_16" "v_17" "v_18" "v_19" "v_20"
[21] "v_21" "v_22" "v_23" "v_24" "v_25" "v_26" "v_27" "v_28" "v_29" "v_30"
[31] "v_31" "v_32" "v_33" "v_34" "v_35" "v_36" "v_37" "v_38" "v_39" "v_40"
[41] "v_41" "v_42" "v_43" "v_44" "v_45" "v_46" "v_47" "v_48" "v_49" "v_50"
[51] "v_51" "v_52" "v_53" "v_54" "v_55" "v_56" "v_57" "v_58" "v_59" "v_60"
[61] "v_61" "v_62" "v_63" "v_64" "v_65" "v_66" "v_67" "v_68" "v_69" "v_70"
[71] "v_71"

To keep things manageable, we will restrict our analysis to partners. To do that we need to select the subgraph that only includes the vertices with value of 1 in the “status” vertex attribute. From the data description, we know the first 36 nodes (with value of 1 in the status attribute) are the law firm’s partners (the rest are associates). In igraph we can do this as using the subgraph function:

   g <- subgraph(g, 1:36)
   V(g)

+ 36/36 vertices, named, from 3a2f38b:
 [1] v_1  v_2  v_3  v_4  v_5  v_6  v_7  v_8  v_9  v_10 v_11 v_12 v_13 v_14 v_15
[16] v_16 v_17 v_18 v_19 v_20 v_21 v_22 v_23 v_24 v_25 v_26 v_27 v_28 v_29 v_30
[31] v_31 v_32 v_33 v_34 v_35 v_36

   V(g)$status

 [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

The first line just tells igraph to generate the subgraph containing the first 36 nodes (the partners). The subgraph function thus takes two main inputs: The graph object, and then a vector of node ids (or node labels) telling the function which nodes to select to create the node-induced subgraph.

   A <- as_adjacency_matrix(g)
   A <- as.matrix(A)

As we saw in the lecture notes on basic statistics for directed graphs, in contrast to the undirected case where the row and column sums give you the same number (each node’s degree), in the directed case, the row and column sums give you two different sets of numbers.

The row sums provide each node’s outdegree centrality (number of outgoing links incident to each node), and the column sums provide the each nodes’s indegree centrality (number of incoming links incident to each node):

\[ C_i^{OUT}=\sum_j a_{ij} \tag{5}\]

\[ C_i^{IN}=\sum_i a_{ij} \tag{6}\]

Note that to compute $C_i^{OUT}$ we sum across the columns ($j$ subscript) and to compute $C_i^{IN}$ we sum across the rows ($i$ subscript).

In the law_advice network case, each directed tie indicates that the source node $i$ seeks advice from the destination $j$. This means that if you are high in in-degree centrality, you are an advice giver (perhaps indicating informal status or experience) and if you are high in the outdegree centrality you are advice seeker (perhaps indicating novice or subordinate status).

In R we can compute that in and out degrees using the colSums and rowSums matrix functions like this:

   d.i <- colSums(A)
   d.o <- rowSums(A)
   sort(d.i, decreasing = TRUE)[1:10] #top ten nodes by indegree

v_17 v_26  v_1  v_2  v_6 v_20 v_34 v_15  v_4 v_12 
  21   20   18   17   17   17   16   15   14   14

   sort(d.o, decreasing = TRUE)[1:10] #top ten nodes by outdegree

v_26 v_28 v_19 v_16 v_12 v_27  v_4 v_17 v_24 v_31 
  22   22   21   19   18   18   17   17   16   16

We can see from these results that $v_{17}$, $v_{26}$ and $v_1$ are the top advice givers and $v_{26}$, $v_{28}$ and $v_{19}$ are the top advice seekers.

We can also express the in ($\mathbf{d}_i$) and out ($\mathbf{d}_o$) degree vectors in matrix form:

\[ \mathbf{d}_i = \mathbf{A}^T\mathbf{e} \tag{7}\]

\[ \mathbf{d}_o = \mathbf{A}\mathbf{e} \tag{8}\]

Where $\mathbf{e}$ is the all ones column vector of dimensions $N \times 1$ with $N$ equal to the number of nodes in the graph, and $\mathbf{A}^T$ is the transpose of the adjacency matrix.

In R we can compute the in and outdegrees using matrix multiplication as follows:

   e <- matrix(1, nrow(A), 1) #all ones column vector
   d.i <- t(A) %*% e
   d.o <- A %*% e
   names(d.i) <- rownames(A)
   names(d.o) <- rownames(A)
   sort(d.i, decreasing = TRUE)[1:10] #top ten nodes by indegree

v_17 v_26  v_1  v_2  v_6 v_20 v_34 v_15  v_4 v_12 
  21   20   18   17   17   17   16   15   14   14

   sort(d.o, decreasing = TRUE)[1:10] #top ten nodes by outdegree

v_26 v_28 v_19 v_16 v_12 v_27  v_4 v_17 v_24 v_31 
  22   22   21   19   18   18   17   17   16   16

Of course igraph has a dedicated function for this, which is just our old friend degree with an extra option mode, indicating whether you want the in or outdegrees:

   d.o <- degree(g, mode = "out")
   d.i <- degree(g, mode = "in")
   sort(d.i, decreasing = TRUE)[1:10] #top ten nodes by indegree

v_17 v_26  v_1  v_2  v_6 v_20 v_34 v_15  v_4 v_12 
  21   20   18   17   17   17   16   15   14   14

   sort(d.o, decreasing = TRUE)[1:10] #top ten nodes by outdegree

v_26 v_28 v_19 v_16 v_12 v_27  v_4 v_17 v_24 v_31 
  22   22   21   19   18   18   17   17   16   16

Correlation Between Degree and Vertex Attributes

Note that the graph attributes are just vectors of values, and can be accessed from the graph object using the $ operator attached to the V() function as we did above.

So if we wanted to figure out the correlation between some vertex attribute and in or out degree centrality, all we need to do is correlate the two vectors:

   r <- cor(d.o, V(g)$age)
   round(r, 2)

[1] -0.43

Which tells us that at least in this case, younger partners are more sought after as sources of advice than older partners.

References

Barrat, Alain, Marc Barthelemy, Romualdo Pastor-Satorras, and Alessandro Vespignani. 2004. “The Architecture of Complex Weighted Networks.” Proceedings of the National Academy of Sciences 101 (11): 3747–52.

Freeman, Linton C. 1979. “Centrality in Social Networks Conceptual Clarification.” Social Networks 1: 215–39.

Lazega, Emmanuel. 2001. The Collegial Phenomenon: The Social Mechanisms of Cooperation Among Peers in a Corporate Law Partnership. Oxford University Press.

Opsahl, Tore, Filip Agneessens, and John Skvoretz. 2010. “Node Centrality in Weighted Networks: Generalizing Degree and Shortest Paths.” Social Networks 32 (3): 245–51.