Fitness and Complexity in Two-Mode Networks

The Fitness/Complexity Score

In a highly cited piece, Tacchella et al. (2012) introduce a new prestige metric for two-mode networks that relies on the same “prismatic” model of status distribution we considered before. They called the prestige metrics they obtained “fitness” and “complexity” because they developed them in the empirical context of ranking nations by their competitive advantage in exporting products, which means analyzing a two-mode country-by-product matrix (Hidalgo and Hausmann 2009).

However, when considered in the more general context of two-mode network link analysis, their approach is best seen as a prestige metric for two-mode networks that combines ideas from the Bonacich eigenvector and PageRank scoring approaches we covered in the two-mode prestige lesson.

Their basic idea is that when we are (asymmetrically) interested in determining the status or prestige of nodes in one particular mode (e.g., the row-mode nodes), we should not use summaries (e.g., sums or averages) of the scores for nodes in the other (e.g., column) mode in determining their status. Instead, we should deeply discount those nodes that connect to low-status nodes at the other end.

To understand what they are getting at, it helps to write down the Bonacich prestige scoring in equation form, as we go through each iteration of the status distribution game.

As you may remember from the function tm.status, at each iteration \(q\), the vectors of status scores for the row nodes \(\mathbf{s}^R\) and the column nodes \(\mathbf{s}^C\) are given by:

\[ s^R_i(q) = \sum_j\mathbf{A}_{ij}s^C_j(q-1) \tag{1}\]

\[ s^C_j(q) = \sum_i\mathbf{A}_{ij}s^R_i(q-1) \tag{2}\]

Where \(\mathbf{A}\) is the two-mode network’s biadjacency matrix, with the restriction that at the initial step \(\mathbf{s}^C(0) = \mathbf{1}\), where \(\mathbf{1}\) is the all-ones vector of length equal to the number of columns of \(\mathbf{A}\).

At each iteration \(q > 0\) we normalize both score vectors:

\[ \mathbf{s}^R(q) = \frac{\mathbf{s}^R(q)}{\left\langle\mathbf{s}^R(q)\right\rangle} \tag{3}\]

\[ \mathbf{s}^C(q) = \frac{\mathbf{s}^C(q)}{\left\langle\mathbf{s}^C(q)\right\rangle} \tag{4}\]

Where \(\left\langle \mathbf{s} \right\rangle\) is the Euclidean vector norm, and then continue iterating until the differences between the vectors across successive iterations are minimal.
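
As a minimal sketch of Equations 1 through 4, the loop below runs the linear status game on a small made-up biadjacency matrix. The function name bonacich.iter, the toy matrix B, and the fixed number of iterations are illustrative assumptions, not the tm.status function from the earlier lesson. As in the code used later in this lesson, the column update uses the row scores that were just computed.

```r
# Hypothetical helper: linear (Bonacich-style) status game for a biadjacency matrix w
bonacich.iter <- function(w, iter = 100) {
   y <- matrix(1, ncol(w), 1) # s^C(0): all-ones starting vector
   for (q in seq_len(iter)) {
      x <- w %*% y # Equation 1: row scores as sums of column scores
      x <- x / sqrt(sum(x^2)) # Equation 3: divide by the Euclidean norm
      y <- t(w) %*% x # Equation 2: column scores as sums of row scores
      y <- y / sqrt(sum(y^2)) # Equation 4: divide by the Euclidean norm
   }
   list(r.s = drop(x), c.s = drop(y))
}

# Toy 3 x 3 two-mode network (made up for illustration)
B <- matrix(c(1, 1, 1,
              1, 1, 0,
              0, 1, 1), nrow = 3, byrow = TRUE)
b <- bonacich.iter(B)

# The iterated row scores should agree with the leading eigenvector of B %*% t(B)
ev <- abs(eigen(B %*% t(B))$vectors[, 1])
round(cbind(iterated = b$r.s, eigen = ev), 3)
```

The comparison in the last line is the reason the linear game is usually described as eigenvector scoring: repeated summing and rescaling settles on the leading eigenvector of the one-mode projection.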

So far, this is what we covered before. What Tacchella et al. (2012) propose is to substitute Equation 2 above with:

\[ s^C_j(q) = \left[\sum_i\mathbf{A}_{ij}\left(s^R_i(q-1)\right)^{-1}\right]^{-1} \tag{5}\]

This means that we first (inner parentheses) take the reciprocal of the row-mode nodes’ status scores and then, for each column-mode node, sum those reciprocals over the row-mode nodes it connects to, so that column-mode nodes connected to low-status row-mode nodes get a large sum. Finally (outer brackets), we take the reciprocal of that sum to get back to a measure of status for column-mode nodes. This non-linear transformation heavily discounts the status scores assigned to column-mode nodes whenever they connect to lower-status row-mode nodes.
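
To see how strongly Equation 5 discounts, here is a small worked example with made-up numbers (the vector s.row is hypothetical). A column-mode node is attached to three row-mode nodes, two with high status and one with low status:

```r
s.row <- c(2, 2, 0.1) # made-up status scores of the three attached row-mode nodes

linear.sum <- sum(s.row) # Equation 2 style: 2 + 2 + 0.1 = 4.1
recip.sum <- 1 / sum(1 / s.row) # Equation 5 style: 1 / (0.5 + 0.5 + 10) = 1/11

c(linear = linear.sum, reciprocal = recip.sum)
```

The linear sum barely registers the low-status tie, while the reciprocal version (about 0.09) ends up below the lowest attached score. In general the Equation 5 quantity can never exceed the smallest score among the attached row-mode nodes, which is what makes a single low-status tie so consequential.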

Beyond Status as Popularity/Activity

How should we understand this modification? Recall that standard Bonacich prestige scoring rests on equating status/prestige with popularity/activity. In the canonical case of persons and groups (Breiger 1974), an event receives status from being attended by high-status individuals and an individual receives status from being affiliated with a high-status event; in each case, status from the point of view of the event means having highly active members, and from the point of view of the individual it means being affiliated with popular events.

But status may not always work this way. Consider the world-economic network linking countries to the products they have a competitive advantage in producing (Hidalgo and Hausmann 2009). Analysts noticed that the most developed countries produce both “complex” (i.e., high-status) products that only a select few other highly developed economies produce (like semiconductors) and also less “complex” (i.e., low-status) products, like extractive natural resources, that the less developed economies also produce (Tacchella et al. 2012). That means that the “complexity” (i.e., status score) of a product cannot be derived simply by taking a summary (e.g., sum or average) of the status scores of the countries that produce it, because high-status countries engage in both high- and low-status forms of production. However, knowing that a product is produced by a low-status country is more informative (and should weigh more heavily in the determination of a product’s status score) because low-status countries only produce low-status products.

Applying the same reasoning to the aforementioned case of persons and groups (Breiger 1974), an equivalent situation would go as follows. Imagine there is a set of elite women and a set of elite events that only the elite women attend. However, elite women are also endowed with a spirit of noblesse oblige, which means that the most elite of them also attend non-elite events. This means that when determining the status of the events, it is not very informative to know that elites affiliate with them. Rather, we should weigh more heavily whether non-elites affiliate with an event: as the number of non-elite women who affiliate with an event increases, that event’s status is downgraded in a non-linear way, which feeds back into the computation of each woman’s prestige.

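To see how this works in practice, we load the Southern Women (SW) data and extract its biadjacency matrix:

   library(networkdata) #contains the southern_women graph
   library(igraph)
   g <- southern_women
   A <- as.matrix(as_biadjacency_matrix(g)) #persons in rows, events in columns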

And here’s a function called tm.fitness that modifies the two-mode status distribution function we used before to compute the fitness and complexity prestige scores for persons and groups. Note that it normalizes each score vector by its mean rather than by the Euclidean norm:

   tm.fitness <- function(w) {
      y <- matrix(1, ncol(w), 1) #initial group status column vector set to a constant
      z <- t(w) #transpose of the biadjacency matrix
      epsilon <- 1 
      k <- 0
      while (epsilon > 1e-15) {
         o.y <- y #group status scores from the previous iteration
         x <- w %*% o.y #fitness status scores for people
         x <- x/mean(x) #normalizing new people status scores 
         y <- (z %*% x^-1)^-1 #complexity status scores for groups (Equation 5)
         y <- y/mean(y) #normalizing new group status scores 
         if (k > 1) {
            #change in the summed group scores; y always has mean 1, so this gets small quickly
            epsilon <- abs(sum(abs(y) - abs(o.y))) 
            }
         k <- k + 1
         }
      return(list(p.s = x, g.s = y, k = k))
   }

And we apply it to the SW data:

    fc <- tm.fitness(A)
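
One detail of tm.fitness worth noting is its stopping rule. Because y is rescaled to have mean one on every pass, the two sums being compared are almost identical from the second pass onward, so the loop can stop after only a few iterations. The sketch below is one possible alternative, not the function used for the results in this lesson: it compares the score vectors element by element and caps the number of passes. The name tm.fitness.tol, the arguments tol and max.iter, and the toy matrix B are assumptions made for illustration, and its scores need not match those produced by tm.fitness.

```r
# Hypothetical variant of tm.fitness with an element-wise stopping rule
tm.fitness.tol <- function(w, tol = 1e-8, max.iter = 1000) {
   y <- matrix(1, ncol(w), 1) # initial group scores set to a constant
   z <- t(w)
   k <- 0
   epsilon <- Inf
   while (epsilon > tol && k < max.iter) {
      o.y <- y
      x <- w %*% o.y # fitness: linear sum of group scores
      x <- x / mean(x)
      y <- (z %*% x^-1)^-1 # complexity: reciprocal of summed reciprocals
      y <- y / mean(y)
      epsilon <- max(abs(y - o.y)) # largest change in any single group score
      k <- k + 1
   }
   list(p.s = x, g.s = y, k = k)
}

# Toy 3 x 3 two-mode network (made up for illustration)
B <- matrix(c(1, 1, 1,
              1, 1, 0,
              0, 1, 1), nrow = 3, byrow = TRUE)
fc.toy <- tm.fitness.tol(B)
fc.toy$k # number of passes actually used
```

The cap on passes is a deliberate design choice in this sketch: on some matrices individual scores can shrink very slowly, so a pure tolerance check may take a long time to trigger.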

We also calculate the usual Bonacich eigenvector scores for comparison purposes:

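   eig.p <- eigen(A %*% t(A)) #person-by-person projection
   eig.g <- eigen(t(A) %*% A) #group-by-group projection
   p.s <- abs(eig.p$vectors[, 1]) #leading eigenvector; abs() removes the arbitrary sign
   g.s <- abs(eig.g$vectors[, 1])
   names(p.s) <- rownames(A)
   names(g.s) <- colnames(A)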

And we put them in a table. Here are the people:

Table 1: Status Scores for Persons
Person Bonacich Fitness/Complexity
EVELYN 0.903 1.000
NORA 0.712 0.911
LAURA 0.834 0.854
SYLVIA 0.748 0.833
KATHERINE 0.594 0.792
THERESA 1.000 0.744
BRENDA 0.845 0.720
CHARLOTTE 0.454 0.355
HELEN 0.542 0.286
MYRNA 0.504 0.214
FRANCES 0.564 0.214
ELEANOR 0.616 0.160
VERNE 0.589 0.149
RUTH 0.637 0.128
PEARL 0.486 0.086
OLIVIA 0.188 0.066
FLORA 0.188 0.066
DOROTHY 0.355 0.037
* Scores normalized by dividing by the maximum.

And the groups:

Table 2: Status Scores for Groups.
Group Bonacich Fitness/Complexity
3/2 0.297 1.000
6/27 0.280 0.988
11/21 0.223 0.987
8/3 0.223 0.987
9/26 0.347 0.537
6/10 0.336 0.300
4/12 0.499 0.284
4/7 0.400 0.191
2/25 0.635 0.133
5/19 0.647 0.123
3/15 0.757 0.106
2/23 0.177 0.100
9/16 1.000 0.044
4/8 0.749 0.037
* Scores normalized by dividing by the maximum.

Each table sorts persons and groups according to the fitness/complexity score. We can see that the status order changes once we introduce the fitness/complexity mode of scoring. While Theresa is the top person according to the usual dual Bonacich prestige score, once we heavily discount the status of events that include low-status people, Evelyn becomes the top person, with Theresa dropping to the sixth spot. In the same way, while Nora is ranked sixth by the Bonacich prestige score, her standing improves to second in the fitness scoring.

The status of groups changes even more dramatically once complexity is calculated by heavily discounting the status of groups that include lower-status people. While 9/16 is the top event by the usual eigenvector scoring, this event has minimal status according to the complexity scoring, ending up second from the bottom. Instead, the top event by complexity is 3/2, a relatively low-status event according to the Bonacich score. In fact, all of the other top events according to the complexity scoring were ranked near the bottom by the Bonacich scoring. Event 2/23 is the exception to this reversal, being a low-status event on both accountings. This means that the Bonacich prestige and complexity scores for events have a strong negative correlation (r = -0.68). This is different from the person scores, which agree more closely (r = 0.78).
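
The two correlations just mentioned can be recomputed from the rounded scores printed in Tables 1 and 2. This is a sketch that types those values in by hand (the object names are made up for illustration); with the fc, p.s and g.s objects in memory, one could pass the unrounded score vectors to cor() instead.

```r
# Group scores transcribed from Table 2 (same row order as the table)
g.bon <- c(0.297, 0.280, 0.223, 0.223, 0.347, 0.336, 0.499,
           0.400, 0.635, 0.647, 0.757, 0.177, 1.000, 0.749)
g.fc <- c(1.000, 0.988, 0.987, 0.987, 0.537, 0.300, 0.284,
          0.191, 0.133, 0.123, 0.106, 0.100, 0.044, 0.037)

# Person scores transcribed from Table 1 (same row order as the table)
p.bon <- c(0.903, 0.712, 0.834, 0.748, 0.594, 1.000, 0.845, 0.454, 0.542,
           0.504, 0.564, 0.616, 0.589, 0.637, 0.486, 0.188, 0.188, 0.355)
p.fc <- c(1.000, 0.911, 0.854, 0.833, 0.792, 0.744, 0.720, 0.355, 0.286,
          0.214, 0.214, 0.160, 0.149, 0.128, 0.086, 0.066, 0.066, 0.037)

r.groups <- cor(g.bon, g.fc) # Pearson correlation across the 14 events
r.persons <- cor(p.bon, p.fc) # Pearson correlation across the 18 women
round(c(groups = r.groups, persons = r.persons), 2)
```

Because the inputs are rounded to three digits, the results should come out close to, but not necessarily identical with, correlations computed on the full-precision scores.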

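We can also generalize the function so that the scores of either mode can be transformed before they are summed. The exponents delta (for the person update) and gamma (for the group update) control the transformation, and the loop now runs for a fixed number of iterations. We then run it under four settings:

   tm.gen.fitness <- function(w, delta = 1, gamma = 1, iter = 1000) {
      y <- matrix(1, ncol(w), 1) #initial group status column vector set to a constant
      z <- t(w)
      k <- 0
      while (k < iter) { #fixed number of iterations instead of a convergence check
         o.y <- y 
         x <- (w %*% o.y^-delta)^-(1/delta) #fitness status scores for people
         x <- x/mean(x) #normalizing new people status scores 
         y <- (z %*% x^-gamma)^-(1/gamma) #complexity status scores for groups
         y <- y/mean(y) #normalizing new group status scores 
         k <- k + 1
         }
      return(list(p.s = x, g.s = y, k = k))
   }
   fc1 <- tm.gen.fitness(A, delta = -1, gamma = -1) #linear sums in both modes
   fc2 <- tm.gen.fitness(A, delta = -1, gamma = 1) #linear for people, reciprocal for groups
   fc3 <- tm.gen.fitness(A, delta = 1, gamma = -1) #reciprocal for people, linear for groups
   fc4 <- tm.gen.fitness(A, delta = 2, gamma = 0.5) #non-unit exponents in both modes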
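
In the notation of Equations 1 and 5, the updates that tm.gen.fitness appears to implement can be written as follows. This is a reading of the code above rather than a formula taken from Tacchella et al. (2012), and, like tm.fitness, the code plugs the just-updated row scores into the column update:

\[ s^R_i(q) = \left[\sum_j\mathbf{A}_{ij}\left(s^C_j(q-1)\right)^{-\delta}\right]^{-1/\delta} \]

\[ s^C_j(q) = \left[\sum_i\mathbf{A}_{ij}\left(s^R_i(q-1)\right)^{-\gamma}\right]^{-1/\gamma} \]

Setting \(\delta = \gamma = -1\) turns every exponent into \(+1\) and recovers the linear sums of Equations 1 and 2, while \(\delta = -1\) and \(\gamma = 1\) recovers Equations 1 and 5. Setting \(\delta = 1\) and \(\gamma = -1\) instead applies the reciprocal transformation to the row-mode update and leaves the column-mode update linear.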

The person scores under the four settings, with each column divided by its maximum, are:

                   fc1         fc2         fc3       fc4
   THERESA   1.0000000 0.669147044 0.008893852 0.4261160
   EVELYN    0.9033105 1.000000000 0.005951295 0.4530310
   BRENDA    0.8446830 0.666303533 0.008931808 0.4883959
   LAURA     0.8344714 0.841534959 0.007071952 0.4888818
   SYLVIA    0.7479813 0.759149640 0.007839423 0.4473089
   NORA      0.7121701 0.776947977 0.007659837 0.6469804
   RUTH      0.6369764 0.049414054 0.120437296 0.4418250
   ELEANOR   0.6162130 0.065996969 0.090175278 0.4943145
   KATHERINE 0.5944761 0.741487639 0.008026155 0.4790866
   VERNE     0.5893687 0.055131552 0.107947171 0.4496324
   FRANCES   0.5639176 0.126239496 0.047142893 0.5322269
   HELEN     0.5415181 0.110118995 0.054044219 0.5211563
   MYRNA     0.5040700 0.090985628 0.065409177 0.4796601
   PEARL     0.4858270 0.025798717 0.230681825 0.4713670
   CHARLOTTE 0.4539402 0.294229824 0.020226689 1.0000000
   DOROTHY   0.3546904 0.006975837 0.853129872 0.4869986
   OLIVIA    0.1877427 0.005951295 1.000000000 0.8693094
   FLORA     0.1877427 0.005951295 1.000000000 0.8693094

These scores can then be formatted as a table with kableExtra, assuming s.tab holds the person names alongside the four score columns:

   library(kableExtra)
   kbl(s.tab, format = "pipe", digits = 3,
       align = c("l", "c", "c", "c", "c"),
       col.names = c("Person", "Bonacich", "Fitness (C)", "Fitness (R)", "Fitness (RC)")) %>%
      kable_styling(bootstrap_options = c("hover", "condensed", "responsive")) %>%
      column_spec(1, bold = TRUE) %>%
      footnote(symbol = c("Scores normalized by dividing by the maximum."))

References

Breiger, Ronald L. 1974. “The Duality of Persons and Groups.” Social Forces 53 (2): 181–90.
Hidalgo, César A, and Ricardo Hausmann. 2009. “The Building Blocks of Economic Complexity.” Proceedings of the National Academy of Sciences 106 (26): 10570–75.
Tacchella, Andrea, Matthieu Cristelli, Guido Caldarelli, Andrea Gabrielli, and Luciano Pietronero. 2012. “A New Metrics for Countries’ Fitness and Products’ Complexity.” Scientific Reports 2 (1): 723.