Graph Degree Metrics in Directed Networks

As we saw in the Basic Network Statistics Handout, the degree distribution and the degree correlation are two basic things we want to have a sense of when characterizing a network, and we showed examples for the undirected graph case.

As we also saw in the Centrality handout, the number of things you have to compute in terms of degrees “doubles” in the directed graph case; for instance, instead of a single degree set and sequence, we now have two: an outdegree and an indegree set and sequence.

The same thing applies to the degree distribution and the degree correlation.

In and Out Degree Distributions

In the case of the degree distribution, now we have two distributions: An outdegree distribution and an indegree distribution.

Let’s see an example using the law_advice data.

   library(networkdata)
   library(igraph)
   g <- law_advice
   i.prop <- degree_distribution(g, mode = "in")
   o.prop <- degree_distribution(g, mode = "out")

So the main complication is that now we have to specify a value for the “mode” argument; “in” for indegree and “out” for outdegree.
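As a quick check on what degree_distribution returns, the following sketch rebuilds the indegree proportions by hand. It uses a small made-up directed graph (not the law_advice data) so that it runs with igraph alone; the object names g.toy, i.d.toy, and by.hand are illustrative assumptions.

```r
library(igraph)

# Toy directed graph (an assumption for illustration): 4 nodes, 5 edges
g.toy <- make_graph(c(1,2, 1,3, 2,3, 3,1, 4,1), directed = TRUE)

i.d.toy <- degree(g.toy, mode = "in")

# Count the nodes at each indegree 0, 1, ..., max, then divide by the number of nodes
by.hand <- tabulate(i.d.toy + 1, nbins = max(i.d.toy) + 1) / vcount(g.toy)

by.hand
degree_distribution(g.toy, mode = "in")
```

The first element of the returned vector corresponds to degree zero, which is why the degree values paired with these proportions need to start at 0.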

That also means that when plotting, we have to create two data frames and present two plots.

First the data frames:

   i.d <- degree(g, mode = "in")
   o.d <- degree(g, mode = "out")
   i.d.vals <- 0:max(i.d)
   o.d.vals <- 0:max(o.d)
   i.deg.dist <- data.frame(i.d.vals, i.prop)
   o.deg.dist <- data.frame(o.d.vals, o.prop)
   head(i.deg.dist)
  i.d.vals     i.prop
1        0 0.01408451
2        1 0.02816901
3        2 0.07042254
4        3 0.02816901
5        4 0.08450704
6        5 0.02816901
   head(o.deg.dist)
  o.d.vals     o.prop
1        0 0.01408451
2        1 0.00000000
3        2 0.01408451
4        3 0.05633803
5        4 0.04225352
6        5 0.04225352

Now, to plotting. To be effective, the resulting figure has to show the outdegree and indegree distributions together, over the same x-axis range, so that the reader can compare them. To do that, we first generate each plot separately:

   library(ggplot2)
   p <- ggplot(data = o.deg.dist, aes(x = o.d.vals, y = o.prop))
   p <- p + geom_bar(stat = "identity", fill = "red", color = "red")
   p <- p + theme_minimal()
   p <- p + labs(x = "", y = "Proportion", 
                 title = "Outdegree Distribution in Law Advice Network") 
   p <- p + geom_vline(xintercept = mean(o.d), 
                       linetype = 2, linewidth = 0.75, color = "blue")
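   # xlim() would replace the x scale (dropping the custom breaks);
   # coord_cartesian() sets the visible range without discarding any bars
   p1 <- p + scale_x_continuous(breaks = seq(0, 40, by = 5)) + 
      coord_cartesian(xlim = c(0, 40))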

   p <- ggplot(data = i.deg.dist, aes(x = i.d.vals, y = i.prop))
   p <- p + geom_bar(stat = "identity", fill = "red", color = "red")
   p <- p + theme_minimal()
   p <- p + labs(x = "", y = "Proportion", 
                 title = "Indegree Distribution in Law Advice Network") 
   p <- p + geom_vline(xintercept = mean(i.d), 
                       linetype = 2, linewidth = 0.75, color = "blue")
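   # xlim() would replace the x scale (dropping the custom breaks);
   # coord_cartesian() sets the visible range without discarding any bars
   p2 <- p + scale_x_continuous(breaks = seq(0, 40, by = 5)) + 
      coord_cartesian(xlim = c(0, 40))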

Then we use the magical package patchwork to combine the plots; the / operator stacks the first plot on top of the second:

   # install.packages("patchwork")
   library(patchwork)
   p <- p1 / p2
   p
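An alternative to building two plots and combining them is to put both distributions in one long data frame and let ggplot2 split the panels. The sketch below is one possible way to do that, not the approach used in this handout; it runs on a made-up toy graph, and the helper dist.df and the objects g.toy and deg.long are hypothetical names introduced for illustration.

```r
library(igraph)
library(ggplot2)

# Toy directed graph (an assumption for illustration): 4 nodes, 5 edges
g.toy <- make_graph(c(1,2, 1,3, 2,3, 3,1, 4,1), directed = TRUE)

# Hypothetical helper: one data frame per mode, with degree values starting at 0
dist.df <- function(g, mode) {
   prop <- degree_distribution(g, mode = mode)
   data.frame(mode = mode, degree = seq_along(prop) - 1, prop = prop)
}

# Stack the indegree and outdegree distributions into one long data frame
deg.long <- rbind(dist.df(g.toy, "in"), dist.df(g.toy, "out"))

# One panel per mode, stacked vertically so the x-axes line up
p.toy <- ggplot(deg.long, aes(x = degree, y = prop)) +
   geom_col(fill = "red") +
   facet_wrap(~ mode, ncol = 1) +
   theme_minimal() +
   labs(x = "Degree", y = "Proportion")
```

Calling print(p.toy) draws the figure. Because facet_wrap shares the x-axis across panels by default, no manual axis limits are needed to make the two panels comparable.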

The plots show that while both distributions are skewed, the indegree distribution is more heterogeneous, with a larger proportion of nodes at the high end of receiving advice than of giving advice.

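Note also that the mean degree is the same whether we use the outdegree or the indegree distribution: every directed edge adds one to some node’s outdegree and one to some node’s indegree, so both degree sequences sum to the number of edges. That is why the blue line marking the mean degree falls on the same spot on the x-axis in both plots.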
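The equality of the two means can be checked directly. The sketch below does so on a small made-up directed graph (an assumption for illustration, not the law_advice data); the names g.toy, i.d.toy, and o.d.toy are illustrative.

```r
library(igraph)

# Toy directed graph (an assumption for illustration): 4 nodes, 5 edges
g.toy <- make_graph(c(1,2, 1,3, 2,3, 3,1, 4,1), directed = TRUE)

i.d.toy <- degree(g.toy, mode = "in")
o.d.toy <- degree(g.toy, mode = "out")

# Both degree sequences sum to the number of edges
sum(i.d.toy) == ecount(g.toy)
sum(o.d.toy) == ecount(g.toy)

# So both means equal edges divided by nodes
mean(i.d.toy)
mean(o.d.toy)
ecount(g.toy) / vcount(g.toy)
```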

In and Out Degree Correlations

The same doubling (really quadrupling) happens to degree correlations in directed graphs. While an undirected graph has a single degree correlation, in the directed case we have four quantities to compute: the out-out degree correlation, the in-in degree correlation, the out-in degree correlation, and the in-out degree correlation (see here, p. 38).

To proceed, we need to create an edge list data set with six columns: The node id of the “from” node, the node id of the “to” node, the indegree of the “from” node, the outdegree of the “from” node, the indegree of the “to” node, and the outdegree of the “to” node.

We can adapt the code we used for the undirected case for this purpose. First, we create an edge list data frame using the igraph function as_data_frame:

   library(dplyr)
   g.el <- igraph::as_data_frame(g) %>% 
      rename(fr = from)
   head(g.el)
  fr to
1  1  2
2  1 17
3  1 20
4  2  1
5  2  6
6  2 17

Note that we have to mark this as an igraph function by typing igraph:: in front of as_data_frame, because there is an (older) dplyr function with the same name, used for data wrangling.

Second, we create data frames containing the in and outdegrees of each node in the network:

   deg.dat.fr <- data.frame(fr = 1:vcount(g), o.d, i.d)
   deg.dat.to <- data.frame(to = 1:vcount(g), o.d, i.d)

Third, we merge this info into the edge list data frame to get the in and outdegrees of the from and to nodes in the directed edge:

   d.el <- g.el %>% 
      left_join(deg.dat.fr, by = "fr") %>% 
      rename(o.d.fr = o.d, i.d.fr = i.d) %>% 
      left_join(deg.dat.to, by = "to") %>% 
      rename(o.d.to = o.d, i.d.to = i.d) 
   head(d.el)
  fr to o.d.fr i.d.fr o.d.to i.d.to
1  1  2      3     22      7     23
2  1 17      3     22     21     26
3  1 20      3     22     11     22
4  2  1      7     23      3     22
5  2  6      7     23      0     21
6  2 17      7     23     21     26
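The same six-column table can also be built without joins, by indexing the degree vectors with the node ids in the edge list. This is a sketch of that alternative on a made-up toy graph (an assumption for illustration, not the law_advice data), and it relies on the node ids being the integers 1 to n; the object names are illustrative.

```r
library(igraph)

# Toy directed graph (an assumption for illustration): 4 nodes, 5 edges
g.toy <- make_graph(c(1,2, 1,3, 2,3, 3,1, 4,1), directed = TRUE)

i.d.toy <- degree(g.toy, mode = "in")
o.d.toy <- degree(g.toy, mode = "out")

# Two-column matrix of numeric node ids: column 1 is "from", column 2 is "to"
el <- as_edgelist(g.toy, names = FALSE)

# Look up each endpoint's degrees by position in the degree vectors
d.el.toy <- data.frame(
   fr = el[, 1], to = el[, 2],
   o.d.fr = o.d.toy[el[, 1]], i.d.fr = i.d.toy[el[, 1]],
   o.d.to = o.d.toy[el[, 2]], i.d.to = i.d.toy[el[, 2]]
)
d.el.toy
```

The join-based version is more robust when nodes are identified by names rather than consecutive integers, since it matches on the id column instead of on position.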

Now we can compute the four different flavors of the degree correlation for directed graphs:

   round(cor(d.el$o.d.fr, d.el$o.d.to), 4) #out-out correlation
[1] -0.0054
   round(cor(d.el$i.d.fr, d.el$i.d.to), 4) #in-in correlation
[1] 0.187
   round(cor(d.el$i.d.fr, d.el$o.d.to), 4) #in-out correlation
[1] -0.0283
   round(cor(d.el$o.d.fr, d.el$i.d.to), 4) #out-in correlation
[1] -0.0843

These results tell us that there is not much degree assortativity going on in the law advice network, except for a slight tendency of people who receive advice from lots of others to give advice to people who also receive advice from a lot of other people (the “in-in” correlation).

Note that by default, the assortativity_degree function in igraph only returns the out-in correlation for directed graphs:

   round(assortativity_degree(g, directed = TRUE), 4)
[1] -0.0843

That is, assortativity_degree checks if more active senders are more likely to send ties to people who are popular receivers of ties.
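To see the correspondence on a case small enough to check by hand, the sketch below wraps the four correlations in a helper and compares its out-in value with assortativity_degree. The helper name deg.cor.directed and the toy graph are assumptions introduced for illustration; they are not part of igraph or of the law_advice analysis.

```r
library(igraph)

# Hypothetical helper: the four edge-level degree correlations of a directed graph
deg.cor.directed <- function(g) {
   el <- as_edgelist(g, names = FALSE)   # numeric node ids, one row per edge
   i.d <- degree(g, mode = "in")
   o.d <- degree(g, mode = "out")
   fr <- el[, 1]
   to <- el[, 2]
   c(out.out = cor(o.d[fr], o.d[to]),
     in.in   = cor(i.d[fr], i.d[to]),
     in.out  = cor(i.d[fr], o.d[to]),
     out.in  = cor(o.d[fr], i.d[to]))
}

# Toy directed graph (an assumption for illustration): 4 nodes, 5 edges
g.toy <- make_graph(c(1,2, 1,3, 2,3, 3,1, 4,1), directed = TRUE)

r.toy <- deg.cor.directed(g.toy)
round(r.toy, 4)

# The out-in entry should agree with igraph's directed degree assortativity
round(assortativity_degree(g.toy, directed = TRUE), 4)
```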