Degree Centrality
In the following lecture notes we will go through the basic centrality metrics. Particularly, the “big three” according to Freeman (1979), namely, degree (treated here), followed by closeness and betweenness.
Degree Centrality in Undirected Graphs
We first load our trusty Pulp Fiction data set from the networkdata package, which is an undirected graph of character scene co-appearances in the film:
Degree centrality is the simplest and most straightforward measure. In fact, we are already computed in the lecture notes on basic network statistics since it is the same as obtaining the graph’s degree sequence.
We can obtain that by computing the row (or column) sums of the graph’s adjacency matrix:
\[ C_i^{DEG}=\sum_j a_{ij} \tag{1}\]
Which in R we calculate as follows:
Vincent Butch Jules Mia Marsellus
25 17 16 11 10
Honey Bunny Pumpkin Brett Marvin Roger
8 8 7 6 6
Capt Koons Manager Mother Patron Woman
5 5 5 5 5
English Dave Jody Lance Waitress Young Man
4 4 4 4 4
Young Woman Fabienne Gawker #2 Jimmie Maynard
4 3 3 3 3
Pedestrian Preacher Raquel The Wolf Winston
3 3 3 3 3
Buddy Ed Sullivan Fourth Man Sportscaster #1 The Gimp
2 2 2 2 2
Zed Esmeralda Sportscaster #2
2 1 1
The igraph function as_adjancency_matrix doesn’t quite return a regular R matrix object, so we have to further coerce the resulting object into a numerical matrix containing zeroes and ones using the as.matrix function in line 2. Then we can apply the native rowSums function to obtain each node’s degree.
We can also express the degree in “matrix” form as follows:
\[ \mathbf{d} = \mathbf{A}\mathbf{e} \tag{2}\]
Where \(\mathbf{d}\) is the column vector containing each node’s degree and \(\mathbf{e}\) is the all ones column vector of dimensions \(N \times 1\) with \(N\) equal to the number of nodes in the graph.
In R we can calculate the degree using matrix multiplication as follows:
e <- matrix(1, nrow(A), 1) #all ones column vector
d <- A %*% e
names(d) <- rownames(A)
sort(d, decreasing = TRUE) Vincent Butch Jules Mia Marsellus
25 17 16 11 10
Honey Bunny Pumpkin Brett Marvin Roger
8 8 7 6 6
Capt Koons Manager Mother Patron Woman
5 5 5 5 5
English Dave Jody Lance Waitress Young Man
4 4 4 4 4
Young Woman Fabienne Gawker #2 Jimmie Maynard
4 3 3 3 3
Pedestrian Preacher Raquel The Wolf Winston
3 3 3 3 3
Buddy Ed Sullivan Fourth Man Sportscaster #1 The Gimp
2 2 2 2 2
Zed Esmeralda Sportscaster #2
2 1 1
Of course, we can also use the igraph function degree which returns the same named vector with the degrees of each node as output:
Vincent Butch Jules Mia Marsellus
25 17 16 11 10
Honey Bunny Pumpkin Brett Marvin Roger
8 8 7 6 6
Capt Koons Manager Mother Patron Woman
5 5 5 5 5
English Dave Jody Lance Waitress Young Man
4 4 4 4 4
Young Woman Fabienne Gawker #2 Jimmie Maynard
4 3 3 3 3
Pedestrian Preacher Raquel The Wolf Winston
3 3 3 3 3
Buddy Ed Sullivan Fourth Man Sportscaster #1 The Gimp
2 2 2 2 2
Zed Esmeralda Sportscaster #2
2 1 1
Degree Centrality in Weighted Graphs
Typically, undirected graphs record the interactions between actors in the networks as binary (1/0) ties. Nevertheless in many cases edges have a weight associated with them, recording the intensity of the interactions between pair of actors, resulting in a weighted graph.
In igraph we can check if the network in question is weighted by seeing whether there is a weight attribute associated with the edges in the graph:
[1] 1 1 1 1 1 1 1 1 2 1 1 3 1 1 1 1 2 1 1 1 1 1 2 1 2 1 1 2 2 1 1 1 1 1 3 1 4
[38] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 2 6 4 3 3 1 1 2
[75] 1 1 3 9 1 3 2 6 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
If the graph was unweighted, this call would return a NULL value. We can see however, that there are weights associated with each edge. In the Pulp Fiction network, the weights represent the number of scenes in the movie that two characters co-appeared in.
We can take a quick peek at how these edge weights look by creating an edge list data frame using the igraph function as_edgelist and then attaching the edge weight attribute:
And looking at the first twenty rows:
X1 X2 w
1 Brett Marsellus 1
2 Brett Marvin 1
3 Brett Roger 1
4 Brett Vincent 1
5 Buddy Mia 1
6 Buddy Vincent 1
7 Brett Butch 1
8 Butch Capt Koons 1
9 Butch Esmeralda 2
10 Butch Gawker #2 1
11 Butch Jules 1
12 Butch Marsellus 3
13 Butch Pedestrian 1
14 Butch Sportscaster #1 1
15 Butch English Dave 1
16 Brett Fabienne 1
17 Butch Fabienne 2
18 Fabienne Jules 1
19 Fourth Man Jules 1
20 Fourth Man Vincent 1
We can see, for instance, that Butch and Marsellus co-appear in three different scenes.
Vertex Strength
How do we incorporate edge weights into the calculation of degree centrality? One approach, proposed by Barrat et al. (2004) is to compute:
\[ C^{VS}_i = \sum_j w_{ij} \tag{3}\]
Where the weighted degree (Barrat et al. (2004) call it the vertex strength) of a node is just the sum of the weights of the edges incident to that node.
In R we can compute the vertex strength by calculating the row (or column) sums of the weighted adjacency matrix \(\mathbf{W}\) containing the weight of the edge between \(i\) and \(j\) in each cell \(w_{ij}\).
We can do that like this:
Vincent Jules Butch Mia Marsellus
51 31 23 16 15
The Wolf Honey Bunny Jimmie Lance Pumpkin
13 9 9 9 9
Brett Jody Marvin Roger Capt Koons
8 8 8 6 5
English Dave Manager Mother Patron Woman
5 5 5 5 5
Fabienne Waitress Young Man Young Woman Gawker #2
4 4 4 4 3
Maynard Pedestrian Preacher Raquel Sportscaster #1
3 3 3 3 3
Winston Buddy Ed Sullivan Esmeralda Fourth Man
3 2 2 2 2
Sportscaster #2 The Gimp Zed
2 2 2
Note that in igraph, we obtain the weighted adjacency matrix , by specifying "weight" as the value of the attr argument (for edge attribute) in the call to the as_adjacency matrix function.
Not surprisingly, the main characters in the story have the largest vertex strengths, as they tend to co-appear with other characters several times in the film.
The vertex strength approach is simple but it has the drawback of giving the same score to low degree nodes with large weights and high degree nodes with small weights (e.g., a node with two contacts each with a weight of 3.0 gets the same score as node with six contacts each with a weight of 1.0).
Generalized Degree
To deal with this issue, Opsahl, Agneessens, and Skvoretz (2010) propose that we instead compute:
\[ C^{WD}_i = \left(\sum_j w_{ij}\right)^{\alpha}\left(\sum_j a_{ij}\right)^{1-\alpha} \tag{4}\]
With the parameter \(\alpha\) chosen by the researcher, with the restriction that \(0 \geq \alpha \leq 1\). Note that in the right-hand side of the equation, the factor on the left is just vertex strength as defined in Equation 3 raised to the power of \(\alpha\), and the factor on the right is just the usual binary degree centrality, counting the number of alters ego is connected to, raised to the power of \(1-\alpha\).
The \(\alpha\) parameter thus controls the importance we give the number of contacts versus the weight of the edges in computing the degree. When \(\alpha = 1\) then Equation 4 reduces to Equation 3 the vertex strength; when \(\alpha = 0\) then Equation 4 reduces to Equation 1 the regular unweighted degree.
\(\alpha\) values between zero and one combine both quantities, with numbers closer to zero giving more weight to number of contacts, and numbers closer to one giving more importance to the weight of the edges.
Here’s function that computes the Opsahl, Agneessens, and Skvoretz (2010) generalized degree:
The function takes the graph and a value for \(\alpha\) as inputs and returns the vector of generalized degrees as output. In lines 2 and 3 we compute the vertex strength and the regular degree from the row sums of the weighted adjacency matrix and the regular adjacency matrix.
We can see that when we set \(\alpha = 0\) we get values identical to the degree:
Vincent Butch Jules Mia Marsellus Honey Bunny
25 17 16 11 10 8
Pumpkin Brett Marvin Roger
8 7 6 6
And when we set \(\alpha = 1\) we get values identical to the vertex strength:
Vincent Jules Butch Mia Marsellus The Wolf
51 31 23 16 15 13
Honey Bunny Jimmie Lance Pumpkin
9 9 9 9
Note that the top 10 ranks differ; while Vincent is the top character by either measure, Butch is second and Jules is third by degree, but their places switch when we rank them by vertex strength. Also, while the Wolf is not in the top 10 by degree, he is by vertex strength.
The interesting thing is what happens with other values. Let’s set \(\alpha = 0.6\):
Vincent Jules Butch Mia Marsellus Honey Bunny
38.3 23.8 20.4 13.8 12.8 8.6
Pumpkin Brett The Wolf Marvin
8.6 7.6 7.2 7.1
Which ends up being a compromise between the two top 10 lists, containing characters from both.
Degree Centrality in Directed Graphs
Indegree and Outdegree Centrality
The movie network is based on the relationship of co-appearance in a scene which by nature lacks any natural directionality (it’s a symmetric relation) and can therefore be represented in an undirected graph. The concepts of in and outdegree, by contrast, are only applicable to directed ties, which are represented by a directed graph. So to illustrate them, we need to switch to a different source of data.
We pick an advice network which is a classical directed kind of (asymmetric) relation. I can give advice to you, but that doesn’t necessarily mean you can give advice to me. The networkdata package contains one such data set collected in the late 80s early 1990s in a New England law firm (Lazega 2001) called law_advice:
+ 71/71 vertices, from d1a9da7:
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
[26] 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50
[51] 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71
$status
[1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2
[39] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
$gender
[1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 2 1 1 1 1 2 1 1 1 2
[39] 2 1 1 1 2 2 1 2 1 2 1 1 2 1 1 1 1 1 2 1 2 2 2 1 1 2 1 1 2 1 2 1 2
$office
[1] 1 1 2 1 2 2 2 1 1 1 1 1 1 2 3 1 1 2 1 1 1 1 1 1 2 1 1 2 1 2 2 2 2 1 2 1 3 1
[39] 1 1 1 1 1 3 1 2 3 1 1 2 2 1 1 1 1 1 1 2 2 1 1 1 2 1 1 1 1 1 1 1 1
$seniority
[1] 31 32 13 31 31 29 29 28 25 25 23 24 22 1 21 20 23 18 19 19 17 9 16 15 15
[26] 15 13 11 10 7 8 8 8 8 8 5 5 7 6 6 5 4 5 5 3 3 3 1 4 3
[51] 4 4 10 3 3 3 3 3 2 2 2 2 2 2 2 1 1 1 1 1 1
$age
[1] 64 62 67 59 59 55 63 53 53 53 50 52 57 56 48 46 50 45 46 49 43 49 45 44 43
[26] 41 47 38 38 39 34 33 37 36 33 43 44 53 37 34 31 31 47 53 38 42 38 35 36 31
[51] 29 29 38 29 34 38 33 33 30 31 34 32 29 45 28 43 35 26 38 31 26
$practice
[1] 1 2 1 2 1 1 2 1 2 2 1 2 1 2 2 2 2 1 2 1 1 1 1 1 2 1 1 2 2 1 1 1 1 2 2 1 2 1
[39] 1 1 1 2 1 2 2 2 1 2 1 2 1 1 2 1 1 1 1 1 1 2 2 2 2 2 1 1 1 1 1 2 1
$law_school
[1] 1 1 1 3 2 1 3 3 1 3 1 2 2 1 3 1 1 2 1 1 2 3 2 2 2 3 1 2 3 3 2 3 3 2 3 3 3 2
[39] 1 1 2 2 2 1 3 2 3 3 2 2 3 3 3 3 3 2 2 3 2 2 3 2 2 2 3 3 2 3 3 2 2
We can see that the graph has 71 vertices, and that there are various attributes associated with each vertex, like gender, age, seniority, status in the law firm, etc. We can query those attributes using the igraph function vertex_attr, which takes the graph object as input.
Note that in contrast to the Pulp Fiction network the vertices in this graph don’t have a name attribute. We can assign one like this:
Now each node is just going to be named “v_1”, “v_2” and so forth:
[1] "v_1" "v_2" "v_3" "v_4" "v_5" "v_6" "v_7" "v_8" "v_9" "v_10"
[11] "v_11" "v_12" "v_13" "v_14" "v_15" "v_16" "v_17" "v_18" "v_19" "v_20"
[21] "v_21" "v_22" "v_23" "v_24" "v_25" "v_26" "v_27" "v_28" "v_29" "v_30"
[31] "v_31" "v_32" "v_33" "v_34" "v_35" "v_36" "v_37" "v_38" "v_39" "v_40"
[41] "v_41" "v_42" "v_43" "v_44" "v_45" "v_46" "v_47" "v_48" "v_49" "v_50"
[51] "v_51" "v_52" "v_53" "v_54" "v_55" "v_56" "v_57" "v_58" "v_59" "v_60"
[61] "v_61" "v_62" "v_63" "v_64" "v_65" "v_66" "v_67" "v_68" "v_69" "v_70"
[71] "v_71"
To keep things manageable, we will restrict our analysis to partners. To do that we need to select the subgraph that only includes the vertices with value of 1 in the “status” vertex attribute. From the data description, we know the first 36 nodes (with value of 1 in the status attribute) are the law firm’s partners (the rest are associates). In igraph we can do this as using the subgraph function:
+ 36/36 vertices, named, from 3a2f38b:
[1] v_1 v_2 v_3 v_4 v_5 v_6 v_7 v_8 v_9 v_10 v_11 v_12 v_13 v_14 v_15
[16] v_16 v_17 v_18 v_19 v_20 v_21 v_22 v_23 v_24 v_25 v_26 v_27 v_28 v_29 v_30
[31] v_31 v_32 v_33 v_34 v_35 v_36
[1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
The first line just tells igraph to generate the subgraph containing the first 36 nodes (the partners). The subgraph function thus takes two main inputs: The graph object, and then a vector of node ids (or node labels) telling the function which nodes to select to create the node-induced subgraph.
As we saw in the lecture notes on basic statistics for directed graphs, in contrast to the undirected case where the row and column sums give you the same number (each node’s degree), in the directed case, the row and column sums give you two different sets of numbers.
The row sums provide each node’s outdegree centrality (number of outgoing links incident to each node), and the column sums provide the each nodes’s indegree centrality (number of incoming links incident to each node):
\[ C_i^{OUT}=\sum_j a_{ij} \tag{5}\]
\[ C_i^{IN}=\sum_i a_{ij} \tag{6}\]
Note that to compute \(C_i^{OUT}\) we sum across the columns (\(j\) subscript) and to compute \(C_i^{IN}\) we sum across the rows (\(i\) subscript).
In the law_advice network case, each directed tie indicates that the source node \(i\) seeks advice from the destination \(j\). This means that if you are high in in-degree centrality, you are an advice giver (perhaps indicating informal status or experience) and if you are high in the outdegree centrality you are advice seeker (perhaps indicating novice or subordinate status).
In R we can compute that in and out degrees using the colSums and rowSums matrix functions like this:
v_17 v_26 v_1 v_2 v_6 v_20 v_34 v_15 v_4 v_12
21 20 18 17 17 17 16 15 14 14
v_26 v_28 v_19 v_16 v_12 v_27 v_4 v_17 v_24 v_31
22 22 21 19 18 18 17 17 16 16
We can see from these results that \(v_{17}\), \(v_{26}\) and \(v_1\) are the top advice givers and \(v_{26}\), \(v_{28}\) and \(v_{19}\) are the top advice seekers.
We can also express the in (\(\mathbf{d}_i\)) and out (\(\mathbf{d}_o\)) degree vectors in matrix form:
\[ \mathbf{d}_i = \mathbf{A}^T\mathbf{e} \tag{7}\]
\[ \mathbf{d}_o = \mathbf{A}\mathbf{e} \tag{8}\]
Where \(\mathbf{e}\) is the all ones column vector of dimensions \(N \times 1\) with \(N\) equal to the number of nodes in the graph, and \(\mathbf{A}^T\) is the transpose of the adjacency matrix.
In R we can compute the in and outdegrees using matrix multiplication as follows:
e <- matrix(1, nrow(A), 1) #all ones column vector
d.i <- t(A) %*% e
d.o <- A %*% e
names(d.i) <- rownames(A)
names(d.o) <- rownames(A)
sort(d.i, decreasing = TRUE)[1:10] #top ten nodes by indegreev_17 v_26 v_1 v_2 v_6 v_20 v_34 v_15 v_4 v_12
21 20 18 17 17 17 16 15 14 14
v_26 v_28 v_19 v_16 v_12 v_27 v_4 v_17 v_24 v_31
22 22 21 19 18 18 17 17 16 16
Of course igraph has a dedicated function for this, which is just our old friend degree with an extra option mode, indicating whether you want the in or outdegrees:
Correlation Between Degree and Vertex Attributes
Note that the graph attributes are just vectors of values, and can be accessed from the graph object using the $ operator attached to the V() function as we did above.
So if we wanted to figure out the correlation between some vertex attribute and in or out degree centrality, all we need to do is correlate the two vectors:
Which tells us that at least in this case, younger partners are more sought after as sources of advice than older partners.