Graph Degree Metrics in Directed Networks
As we saw in the Basic Network Statistics Handout, the degree distribution and the degree correlation are two basic things we want to have a sense of when characterizing a network, and we showed examples for the undirected graph case.
As we also saw in the Centrality handout, the number of degree-based quantities we have to compute “doubles” in the directed graph case; for instance, instead of a single degree set and sequence, we now have two: an outdegree and an indegree set and sequence.
The same thing applies to the degree distribution and the degree correlation.
In and Out Degree Distributions
In the case of the degree distribution, now we have two distributions: An outdegree distribution and an indegree distribution.
Let’s see an example using the law_advice data.
The main complication is that we now have to specify a value for the “mode” argument of the degree functions: “in” for indegree and “out” for outdegree.
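To make the role of the “mode” argument concrete, here is a minimal sketch on a small toy graph (a hypothetical four-node example, not the law_advice data). It assumes the igraph functions degree and degree_distribution, and the object names starting with toy are made up for illustration.

```r
library(igraph)

# Toy directed graph (hypothetical): edges 1->2, 1->3, 2->3, 4->3
toy <- make_graph(edges = c(1, 2, 1, 3, 2, 3, 4, 3), directed = TRUE)

# Raw indegree and outdegree of each node
toy.i.d <- degree(toy, mode = "in")
toy.o.d <- degree(toy, mode = "out")

# Proportion of nodes at each degree value; the first entry is for degree zero,
# so each vector has max(degree) + 1 entries
toy.i.prop <- degree_distribution(toy, mode = "in")
toy.o.prop <- degree_distribution(toy, mode = "out")
```

Because the first entry corresponds to degree zero, the matching vector of degree values runs from 0 to the maximum degree.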
That also means that when plotting, we have to create two data frames and present two plots.
First the data frames:
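library(igraph)
# Assumes g holds the law_advice network as an igraph object
i.d <- degree(g, mode = "in")
o.d <- degree(g, mode = "out")
# Proportion of nodes at each degree value, starting at degree zero
i.prop <- degree_distribution(g, mode = "in")
o.prop <- degree_distribution(g, mode = "out")
i.d.vals <- 0:max(i.d)
o.d.vals <- 0:max(o.d)
i.deg.dist <- data.frame(i.d.vals, i.prop)
o.deg.dist <- data.frame(o.d.vals, o.prop)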
head(i.deg.dist)
  i.d.vals     i.prop
1        0 0.01408451
2        1 0.02816901
3        2 0.07042254
4        3 0.02816901
5        4 0.08450704
6        5 0.02816901
head(o.deg.dist)
  o.d.vals     o.prop
1        0 0.01408451
2        1 0.00000000
3        2 0.01408451
4        3 0.05633803
5        4 0.04225352
6        5 0.04225352
Now, to plotting. To be effective, the resulting plot has to show the outdegree and indegree distribution side by side so as to allow the reader to compare. To do that, we first generate each plot separately:
library(ggplot2)
p <- ggplot(data = o.deg.dist, aes(x = o.d.vals, y = o.prop))
p <- p + geom_bar(stat = "identity", fill = "red", color = "red")
p <- p + theme_minimal()
p <- p + labs(x = "", y = "Proportion",
              title = "Outdegree Distribution in Law Advice Network")
# Dashed blue line at the mean outdegree
p <- p + geom_vline(xintercept = mean(o.d),
                    linetype = 2, linewidth = 0.75, color = "blue")
# Zoom with coord_cartesian(); adding xlim() would replace the x scale and drop the breaks
p1 <- p + scale_x_continuous(breaks = seq(0, 40, by = 5)) + coord_cartesian(xlim = c(0, 40))
p <- ggplot(data = i.deg.dist, aes(x = i.d.vals, y = i.prop))
p <- p + geom_bar(stat = "identity", fill = "red", color = "red")
p <- p + theme_minimal()
p <- p + labs(x = "", y = "Proportion",
              title = "Indegree Distribution in Law Advice Network")
# Dashed blue line at the mean indegree
p <- p + geom_vline(xintercept = mean(i.d),
                    linetype = 2, linewidth = 0.75, color = "blue")
# Zoom with coord_cartesian(); adding xlim() would replace the x scale and drop the breaks
p2 <- p + scale_x_continuous(breaks = seq(0, 40, by = 5)) + coord_cartesian(xlim = c(0, 40))
Then we use the magical package patchwork to combine the plots:
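As a minimal sketch of the combining step, the code below builds two toy bar plots (hypothetical data standing in for p1 and p2) and places them next to each other. It assumes patchwork's | operator for side-by-side layouts; the layout used for the original figure may have been different.

```r
library(ggplot2)
library(patchwork)

# Two toy data frames standing in for the out and indegree distributions
toy.out <- data.frame(deg = 0:3, prop = c(0.4, 0.3, 0.2, 0.1))
toy.in <- data.frame(deg = 0:3, prop = c(0.1, 0.2, 0.3, 0.4))

toy.p1 <- ggplot(toy.out, aes(x = deg, y = prop)) +
  geom_col() + labs(title = "Outdegree")
toy.p2 <- ggplot(toy.in, aes(x = deg, y = prop)) +
  geom_col() + labs(title = "Indegree")

# patchwork overloads `|` to place plots side by side
toy.both <- toy.p1 | toy.p2
```

Printing toy.both draws the two panels in a single figure; replacing | with / would stack them vertically instead.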
The plots show that while both distributions are skewed, the indegree distribution is more heterogeneous, with a larger proportion of nodes at the high end of receiving advice than at the high end of giving advice.
Note also that the mean degree is the same whether we use the outdegree or the indegree distribution, because every directed edge adds one to the outdegree of its sender and one to the indegree of its receiver. The blue line marking the mean degree therefore falls in the same spot on the x-axis in both plots.
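The equality of the two means is easy to check on a small example. The sketch below uses a hypothetical four-node toy graph (not the law_advice data) and verifies that the indegrees and the outdegrees both add up to the number of edges, which forces the two means to coincide.

```r
library(igraph)

# Toy directed graph (hypothetical): edges 1->2, 1->3, 2->3, 4->3
toy <- make_graph(edges = c(1, 2, 1, 3, 2, 3, 4, 3), directed = TRUE)

toy.i.d <- degree(toy, mode = "in")
toy.o.d <- degree(toy, mode = "out")

# Each edge is counted once as an outgoing tie and once as an incoming tie
toy.sum.in <- sum(toy.i.d)
toy.sum.out <- sum(toy.o.d)
toy.edges <- ecount(toy)

# Same total over the same number of nodes, so the same mean
toy.mean.in <- mean(toy.i.d)
toy.mean.out <- mean(toy.o.d)
```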
In and Out Degree Correlations
The same doubling (really quadrupling) happens to degree correlations in directed graphs. While in an undirected graph, there is a single degree correlation, in the directed case we have four quantities to compute: The out-out degree correlation, the in-in degree correlation, the out-in degree correlation, and the in-out degree correlation (see here, p. 38).
To proceed, we need to create an edge list data set with six columns: The node id of the “from” node, the node id of the “to” node, the indegree of the “from” node, the outdegree of the “from” node, the indegree of the “to” node, and the outdegree of the “to” node.
We can adapt the code we used for the undirected case for this purpose. First, we create an edge list data frame using the igraph function as_data_frame:
  fr to
1  1  2
2  1 17
3  1 20
4  2  1
5  2  6
6  2 17
Note that we have to specify that this is an igraph function by typing igraph:: in front of as_data_frame, because there is an (older) dplyr function with the same name that was used for data wrangling.
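The edge list step can be sketched as follows on a hypothetical toy graph. The rename of the sender column to fr is an assumption inferred from the column names in the printed output above; the object names toy and toy.el are made up for illustration.

```r
library(igraph)
library(dplyr)

# Toy directed graph (hypothetical): edges 1->2, 1->3, 2->3, 4->3
toy <- make_graph(edges = c(1, 2, 1, 3, 2, 3, 4, 3), directed = TRUE)

# The igraph:: prefix makes sure igraph's as_data_frame is the one called;
# by default it returns one row per edge with columns "from" and "to"
toy.el <- igraph::as_data_frame(toy, what = "edges") %>%
  rename(fr = from)
```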
Second, we create data frames containing the in and outdegrees of each node in the network:
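A minimal sketch of this step on a hypothetical toy graph is shown here. It assumes that nodes are identified by their integer position in the graph, and it builds two copies of the same degree table whose id columns are named fr and to, so that each can later be matched against the corresponding column of the edge list. All object names are made up for illustration.

```r
library(igraph)

# Toy directed graph (hypothetical): edges 1->2, 1->3, 2->3, 4->3
toy <- make_graph(edges = c(1, 2, 1, 3, 2, 3, 4, 3), directed = TRUE)

# One row per node: id, outdegree, and indegree
toy.deg.fr <- data.frame(fr = as.numeric(seq_len(vcount(toy))),
                         o.d = degree(toy, mode = "out"),
                         i.d = degree(toy, mode = "in"))

# Same table, with the id column named "to" for matching receiving nodes
toy.deg.to <- toy.deg.fr
names(toy.deg.to)[1] <- "to"
```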
Third, we merge this info into the edge list data frame to get the in and outdegrees of the from and to nodes in the directed edge:
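library(dplyr)
# Attach the degrees of the sending ("fr") node, then those of the receiving ("to") node
d.el <- g.el %>%
  left_join(deg.dat.fr, by = "fr") %>%
  rename(o.d.fr = o.d, i.d.fr = i.d) %>%
  left_join(deg.dat.to, by = "to") %>%
  rename(o.d.to = o.d, i.d.to = i.d)
head(d.el)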
  fr to o.d.fr i.d.fr o.d.to i.d.to
1  1  2      3     22      7     23
2  1 17      3     22     21     26
3  1 20      3     22     11     22
4  2  1      7     23      3     22
5  2  6      7     23      0     21
6  2 17      7     23     21     26
Now we can compute the four different flavors of the degree correlation for directed graphs:
[1] -0.0054
[1] 0.187
[1] -0.0283
[1] -0.0843
These results tell us that there is not much degree assortativity in the law advice network, except for a slight tendency of people who receive advice from many others to give advice to people who also receive advice from many others (the “in-in” correlation).
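As a sketch of how the four quantities can be obtained, the code below computes them as Pearson correlations across edges on a hypothetical six-edge toy graph. It assumes that each correlation pairs a degree of the sending node with a degree of the receiving node, with the sender's degree type named first; the toy values say nothing about the law advice network.

```r
library(igraph)

# Toy directed graph (hypothetical): 1->2, 1->3, 2->3, 3->1, 4->3, 4->1
toy <- make_graph(edges = c(1, 2, 1, 3, 2, 3, 3, 1, 4, 3, 4, 1), directed = TRUE)
toy.el <- igraph::as_data_frame(toy, what = "edges")
toy.o.d <- degree(toy, mode = "out")
toy.i.d <- degree(toy, mode = "in")

# Edge list with the degrees of the sender (fr) and the receiver (to)
toy.d.el <- data.frame(fr = toy.el$from, to = toy.el$to,
                       o.d.fr = toy.o.d[toy.el$from],
                       i.d.fr = toy.i.d[toy.el$from],
                       o.d.to = toy.o.d[toy.el$to],
                       i.d.to = toy.i.d[toy.el$to])

# Four flavors: sender degree type first, receiver degree type second
toy.r <- c(out.out = cor(toy.d.el$o.d.fr, toy.d.el$o.d.to),
           in.in   = cor(toy.d.el$i.d.fr, toy.d.el$i.d.to),
           out.in  = cor(toy.d.el$o.d.fr, toy.d.el$i.d.to),
           in.out  = cor(toy.d.el$i.d.fr, toy.d.el$o.d.to))
round(toy.r, 4)
```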
Note that by default, the assortativity_degree function in igraph only returns the out-in correlation for directed graphs:
That is, assortativity_degree checks if more active senders are more likely to send ties to people who are popular receivers of ties.
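A minimal sketch of this check on a hypothetical toy graph: we call assortativity_degree with directed = TRUE and compare the result with an out-in correlation computed by hand, that is, the Pearson correlation across edges between the sender's outdegree and the receiver's indegree. If the statement above holds, the two numbers should agree.

```r
library(igraph)

# Toy directed graph (hypothetical): 1->2, 1->3, 2->3, 3->1, 4->3, 4->1
toy <- make_graph(edges = c(1, 2, 1, 3, 2, 3, 3, 1, 4, 3, 4, 1), directed = TRUE)

# Built-in degree assortativity for a directed graph
toy.assort <- assortativity_degree(toy, directed = TRUE)

# Manual out-in correlation: sender outdegree vs. receiver indegree, one pair per edge
toy.el <- igraph::as_data_frame(toy, what = "edges")
toy.out.in <- cor(degree(toy, mode = "out")[toy.el$from],
                  degree(toy, mode = "in")[toy.el$to])
```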