Similarity and Equivalence in Two-Mode Networks

1 Normalized Vertex Similarity Metrics

Note that the one-mode projections can be considered unnormalized similarity matrices just like in the case of regular networks. That means that if we have the degrees of nodes in each mode, we can transform this matrix into any of the normalized vertex similarity metrics we discussed before, including Jaccard, Cosine, Dice, LHN, and so on.

Let’s see how this would work in our trusty Southern Women dataset:

Code

library(igraph)
library(networkdata)
g <- southern_women
A <- as_biadjacency_matrix(g)

Repackaging our vertex similarity function for the two-mode case, we have:

Code

vertex.sim <- function(x) {
  A <- as.matrix(as_biadjacency_matrix(x))
  M <- nrow(A) # number of persons
  N <- ncol(A) # number of groups
  p.d <- rowSums(A) # person degrees
  g.d <- colSums(A) # group degrees
  P <- A %*% t(A) # person projection
  G <- t(A) %*% A # group projection
  J.p <- diag(1, M, M)
  J.g <- diag(1, N, N)
  C.p <- diag(1, M, M)
  C.g <- diag(1, N, N)
  D.p <- diag(1, M, M)
  D.g <- diag(1, N, N)
  L.p <- diag(1, M, M)
  L.g <- diag(1, N, N)
  for (i in 1:M) {
    for (j in 1:M) {
      if (i < j) {
        J.p[i, j] <- P[i, j] / (P[i, j] + p.d[i] + p.d[j])
        J.p[j, i] <- P[i, j] / (P[i, j] + p.d[i] + p.d[j])
        C.p[i, j] <- P[i, j] / (sqrt(p.d[i] * p.d[j]))
        C.p[j, i] <- P[i, j] / (sqrt(p.d[i] * p.d[j]))
        D.p[i, j] <- (2 * P[i, j]) / (2 * P[i, j] + p.d[i] + p.d[j])
        D.p[j, i] <- (2 * P[i, j]) / (2 * P[i, j] + p.d[i] + p.d[j])
        L.p[i, j] <- P[i, j] / (p.d[i] * p.d[j])
        L.p[j, i] <- P[i, j] / (p.d[i] * p.d[j])
      }
    }
  }
  for (i in 1:N) {
    for (j in 1:N) {
      if (i < j) {
        J.g[i, j] <- G[i, j] / (G[i, j] + g.d[i] + g.d[j])
        J.g[j, i] <- G[i, j] / (G[i, j] + g.d[i] + g.d[j])
        C.g[i, j] <- G[i, j] / (sqrt(g.d[i] * g.d[j]))
        C.g[j, i] <- G[i, j] / (sqrt(g.d[i] * g.d[j]))
        D.g[i, j] <- (2 * G[i, j]) / (2 * G[i, j] + g.d[i] + g.d[j])
        D.g[j, i] <- (2 * G[i, j]) / (2 * G[i, j] + g.d[i] + g.d[j])
        L.g[i, j] <- G[i, j] / (g.d[i] * g.d[j])
        L.g[j, i] <- G[i, j] / (g.d[i] * g.d[j])
      }
    }
  }
  return(list(
    J.p = J.p, C.p = C.p, D.p = D.p, L.p = L.p,
    J.g = J.g, C.g = C.g, D.g = D.g, L.g = L.g
  ))
}

Using this function to compute the Jaccard similarity between people yields:

Code

J.p <- vertex.sim(g)$J.p
rownames(J.p) <- rownames(A)
colnames(J.p) <- rownames(A)
round(J.p, 2)

          EVELYN LAURA THERESA BRENDA CHARLOTTE FRANCES ELEANOR PEARL RUTH
EVELYN      1.00  0.29    0.30   0.29      0.20    0.25    0.20  0.21 0.20
LAURA       0.29  1.00    0.29   0.30      0.21    0.27    0.27  0.17 0.21
THERESA     0.30  0.29    1.00   0.29      0.25    0.25    0.25  0.21 0.25
BRENDA      0.29  0.30    0.29   1.00      0.27    0.27    0.27  0.17 0.21
CHARLOTTE   0.20  0.21    0.25   0.27      1.00    0.20    0.20  0.00 0.20
FRANCES     0.25  0.27    0.25   0.27      0.20    1.00    0.27  0.22 0.20
ELEANOR     0.20  0.27    0.25   0.27      0.20    0.27    1.00  0.22 0.27
PEARL       0.21  0.17    0.21   0.17      0.00    0.22    0.22  1.00 0.22
RUTH        0.20  0.21    0.25   0.21      0.20    0.20    0.27  0.22 1.00
VERNE       0.14  0.15    0.20   0.15      0.11    0.11    0.20  0.22 0.27
MYRNA       0.14  0.08    0.14   0.08      0.00    0.11    0.11  0.22 0.20
KATHERINE   0.12  0.07    0.12   0.07      0.00    0.09    0.09  0.18 0.17
SYLVIA      0.12  0.12    0.17   0.12      0.08    0.08    0.15  0.17 0.21
NORA        0.11  0.12    0.16   0.12      0.08    0.08    0.14  0.15 0.14
HELEN       0.07  0.14    0.13   0.14      0.10    0.10    0.18  0.11 0.18
DOROTHY     0.17  0.10    0.17   0.10      0.00    0.14    0.14  0.29 0.25
OLIVIA      0.09  0.00    0.09   0.00      0.00    0.00    0.00  0.17 0.14
FLORA       0.09  0.00    0.09   0.00      0.00    0.00    0.00  0.17 0.14
          VERNE MYRNA KATHERINE SYLVIA NORA HELEN DOROTHY OLIVIA FLORA
EVELYN     0.14  0.14      0.12   0.12 0.11  0.07    0.17   0.09  0.09
LAURA      0.15  0.08      0.07   0.12 0.12  0.14    0.10   0.00  0.00
THERESA    0.20  0.14      0.12   0.17 0.16  0.13    0.17   0.09  0.09
BRENDA     0.15  0.08      0.07   0.12 0.12  0.14    0.10   0.00  0.00
CHARLOTTE  0.11  0.00      0.00   0.08 0.08  0.10    0.00   0.00  0.00
FRANCES    0.11  0.11      0.09   0.08 0.08  0.10    0.14   0.00  0.00
ELEANOR    0.20  0.11      0.09   0.15 0.14  0.18    0.14   0.00  0.00
PEARL      0.22  0.22      0.18   0.17 0.15  0.11    0.29   0.17  0.17
RUTH       0.27  0.20      0.17   0.21 0.14  0.18    0.25   0.14  0.14
VERNE      1.00  0.27      0.23   0.27 0.20  0.25    0.25   0.14  0.14
MYRNA      0.27  1.00      0.29   0.27 0.20  0.25    0.25   0.14  0.14
KATHERINE  0.23  0.29      1.00   0.32 0.26  0.21    0.20   0.11  0.11
SYLVIA     0.27  0.27      0.32   1.00 0.29  0.25    0.18   0.10  0.10
NORA       0.20  0.20      0.26   0.29 1.00  0.24    0.09   0.17  0.17
HELEN      0.25  0.25      0.21   0.25 0.24  1.00    0.12   0.12  0.12
DOROTHY    0.25  0.25      0.20   0.18 0.09  0.12    1.00   0.20  0.20
OLIVIA     0.14  0.14      0.11   0.10 0.17  0.12    0.20   1.00  0.33
FLORA      0.14  0.14      0.11   0.10 0.17  0.12    0.20   0.33  1.00

2 Structural Equivalence

And, of course, once we have a similarity we can cluster nodes based on approximate structural equivalence by transforming proximities to distances:

Code

D <- as.dist(1 - J.p)
hc.p <- hclust(D, method = "ward.D2")
plot(hc.p)

And for events:

Code

J.g <- vertex.sim(g)$J.g
rownames(J.g) <- colnames(A)
colnames(J.g) <- colnames(A)
D <- as.dist(1 - J.g)
hc.g <- hclust(D, method = "ward.D2")
plot(hc.g)

We can then derive cluster memberships for people and groups from the hclust object:

Code

library(dendextend)
clus.p <- sort(cutree(hc.p, 4)) # selecting four clusters for people
clus.p

   EVELYN     LAURA   THERESA    BRENDA CHARLOTTE   FRANCES   ELEANOR     PEARL 
        1         1         1         1         1         1         1         2 
     RUTH     VERNE   DOROTHY     MYRNA KATHERINE    SYLVIA      NORA     HELEN 
        2         2         2         3         3         3         3         3 
   OLIVIA     FLORA 
        4         4

Code

clus.g <- sort(cutree(hc.g, 3)) # selecting three clusters for groups
clus.g

 6/27   3/2  4/12  9/26  2/25  5/19  3/15  9/16   4/8  6/10  2/23   4/7 11/21 
    1     1     1     1     1     1     2     2     2     3     3     3     3 
  8/3 
    3

And finally we can block the original affiliation matrix, as recommended by Everett and Borgatti (2013, 210, table 5):

Code

library(ggcorrplot)
p <- ggcorrplot(t(A[names(clus.p), names(clus.g)]),
  colors = c("white", "white", "red")
)
p <- p + theme(
  legend.position = "none",
  axis.text.y = element_text(size = 8),
  axis.text.x = element_text(size = 8, angle = 0),
)
p <- p + scale_x_discrete(position = "top")
p <- p + geom_hline(yintercept = 7.5, linewidth = 2, color = "blue")
p <- p + geom_hline(yintercept = 11.5, linewidth = 2, color = "blue")
p <- p + geom_hline(yintercept = 16.5, linewidth = 2, color = "blue")
p <- p + geom_vline(xintercept = 6.5, linewidth = 2, color = "blue")
p <- p + geom_vline(xintercept = 9.5, linewidth = 2, color = "blue")
p

Which reveals a number of almost complete (one-blocks) and almost null (zero-blocks) in the social structure, with a reduced image matrix that looks like:

Code

library(kableExtra)
IM <- matrix(0, 4, 3)
IM[1, ] <- c(0, 1, 0)
IM[2, ] <- c(0, 1, 1)
IM[3, ] <- c(0, 1, 0)
IM[4, ] <- c(1, 1, 0)
rownames(IM) <- c("P.Block1", "P.Block2", "P.Block3", "P.Block4")
colnames(IM) <- c("E.Block1", "E.Block2", "E.Block3")
kbl(IM, format = "html", , align = "c") %>%
  column_spec(1, bold = TRUE) %>%
  kable_styling(
    full_width = TRUE,
    bootstrap_options = c("hover", "condensed", "responsive")
  )

	E.Block1	E.Block2	E.Block3
P.Block1	0	1	0
P.Block2	0	1	1
P.Block3	0	1	0
P.Block4	1	1	0

3 Generalized Vertex Similarity

Recall that vertex similarity works using the principle of structural equivalence: Two people are similar if the choose the same objects (groups), and two objects (groups) are similar if they are chosen by the same people.

We can, like we did in the one mode case, be after a more general version of similarity, which says that: Two people are similar if they choose similar (not necessarily the same) objects, and two objects are similar if they are chosen by similar (not necessarily the same) people.

This leads to the same problem setup that inspired the SimRank approach (Jeh and Widom 2002).

A (longish) function to compute the SimRank similarity between nodes in a two mode network goes as follows:

Code

TM.SimRank <- function(A, C = 0.8, iter = 10) {
  nr <- nrow(A)
  nc <- ncol(A)
  dr <- rowSums(A)
  dc <- colSums(A)
  Sr <- diag(1, nr, nr) # baseline similarity: every node maximally similar to themselves
  Sc <- diag(1, nc, nc) # baseline similarity: every node maximally similar to themselves
  rn <- rownames(A)
  cn <- colnames(A)
  rownames(Sr) <- rn
  colnames(Sr) <- rn
  rownames(Sc) <- cn
  colnames(Sc) <- cn
  m <- 1
  while (m < iter) {
    Sr.pre <- Sr
    Sc.pre <- Sc
    for (i in 1:nr) {
      for (j in 1:nr) {
        if (i != j) {
          a <- names(which(A[i, ] == 1)) # objects chosen by i
          b <- names(which(A[j, ] == 1)) # objects chosen by j
          Scij <- 0
          for (k in a) {
            for (l in b) {
              Scij <- Scij + Sc[k, l] # i's similarity to j
            }
          }
          Sr[i, j] <- C / (dr[i] * dr[j]) * Scij
        }
      }
    }
    for (i in 1:nc) {
      for (j in 1:nc) {
        if (i != j) {
          a <- names(which(A[, i] == 1)) # people who chose object i
          b <- names(which(A[, j] == 1)) # people who chose object j
          Srij <- 0
          for (k in a) {
            for (l in b) {
              Srij <- Srij + Sr[k, l] # i's similarity to j
            }
          }
          Sc[i, j] <- C / (dc[i] * dc[j]) * Srij
        }
      }
    }
    m <- m + 1
  }
  return(list(Sr = Sr, Sc = Sc))
}

This function takes the biadjacency matrix \(\mathbf{A}\) as input and returns two generalized relational similarity matrices: One for the people (row objects) and the other one for the groups (column objects).

Here’s how that would work in the SW data. First we compute the SimRank scores:

Code

sim.res <- TM.SimRank(A)

Then we peek inside the people similarity matrix:

Code

round(sim.res$Sr[1:10, 1:10], 3)

          EVELYN LAURA THERESA BRENDA CHARLOTTE FRANCES ELEANOR PEARL  RUTH
EVELYN     1.000 0.267   0.262  0.266     0.259   0.275   0.248 0.255 0.237
LAURA      0.267 1.000   0.262  0.277     0.270   0.287   0.280 0.237 0.247
THERESA    0.262 0.262   1.000  0.262     0.273   0.270   0.264 0.254 0.256
BRENDA     0.266 0.277   0.262  1.000     0.290   0.287   0.279 0.235 0.246
CHARLOTTE  0.259 0.270   0.273  0.290     1.000   0.276   0.269 0.175 0.256
FRANCES    0.275 0.287   0.270  0.287     0.276   1.000   0.305 0.280 0.256
ELEANOR    0.248 0.280   0.264  0.279     0.269   0.305   1.000 0.279 0.294
PEARL      0.255 0.237   0.254  0.235     0.175   0.280   0.279 1.000 0.279
RUTH       0.237 0.247   0.256  0.246     0.256   0.256   0.294 0.279 1.000
VERNE      0.201 0.207   0.222  0.206     0.198   0.202   0.246 0.276 0.288
          VERNE
EVELYN    0.201
LAURA     0.207
THERESA   0.222
BRENDA    0.206
CHARLOTTE 0.198
FRANCES   0.202
ELEANOR   0.246
PEARL     0.276
RUTH      0.288
VERNE     1.000

And the group similarity matrix:

Code

round(sim.res$Sc[1:10, 1:10], 3)

      6/27   3/2  4/12  9/26  2/25  5/19  3/15  9/16   4/8  6/10
6/27 1.000 0.343 0.314 0.312 0.287 0.277 0.224 0.226 0.178 0.137
3/2  0.343 1.000 0.312 0.311 0.285 0.276 0.224 0.228 0.200 0.141
4/12 0.314 0.312 1.000 0.314 0.288 0.265 0.226 0.220 0.179 0.138
9/26 0.312 0.311 0.314 1.000 0.287 0.256 0.230 0.214 0.186 0.137
2/25 0.287 0.285 0.288 0.287 1.000 0.260 0.235 0.226 0.187 0.146
5/19 0.277 0.276 0.265 0.256 0.260 1.000 0.224 0.226 0.200 0.171
3/15 0.224 0.224 0.226 0.230 0.235 0.224 1.000 0.221 0.204 0.209
9/16 0.226 0.228 0.220 0.214 0.226 0.226 0.221 1.000 0.221 0.214
4/8  0.178 0.200 0.179 0.186 0.187 0.200 0.204 0.221 1.000 0.234
6/10 0.137 0.141 0.138 0.137 0.146 0.171 0.209 0.214 0.234 1.000

Like before we can use these results to define two sets of distances:

Code

D.p <- as.dist(1 - sim.res$Sr)
D.g <- as.dist(1 - sim.res$Sc)

Subject to hierarchical clustering:

Code

hc.p <- hclust(D.p, method = "ward.D2")
hc.g <- hclust(D.g, method = "ward.D2")

And plot:

Code

plot(hc.p)

Code

plot(hc.g)

Get cluster memberships for people and groups from the hclust object:

Code

clus.p <- sort(cutree(hc.p, 4)) # selecting four clusters for people
clus.p

   EVELYN     LAURA   THERESA    BRENDA CHARLOTTE   FRANCES   ELEANOR     PEARL 
        1         1         1         1         1         1         1         2 
     RUTH     VERNE   DOROTHY     MYRNA KATHERINE    SYLVIA      NORA     HELEN 
        2         2         2         3         3         3         3         3 
   OLIVIA     FLORA 
        4         4

Code

clus.g <- sort(cutree(hc.g, 3)) # selecting three clusters for groups
clus.g

 6/27   3/2  4/12  9/26  2/25  5/19  3/15  9/16   4/8  2/23  6/10   4/7 11/21 
    1     1     1     1     1     1     2     2     2     2     3     3     3 
  8/3 
    3

And block the biadjacency matrix:

Code

p <- ggcorrplot(t(A[names(clus.p), names(clus.g)]),
  colors = c("white", "white", "red")
)
p <- p + theme(
  legend.position = "none",
  axis.text.y = element_text(size = 8),
  axis.text.x = element_text(size = 8, angle = 0),
)
p <- p + scale_x_discrete(position = "top")
p <- p + geom_hline(yintercept = 7.5, linewidth = 2, color = "blue")
p <- p + geom_hline(yintercept = 11.5, linewidth = 2, color = "blue")
p <- p + geom_hline(yintercept = 16.5, linewidth = 2, color = "blue")
p <- p + geom_vline(xintercept = 6.5, linewidth = 2, color = "blue")
p <- p + geom_vline(xintercept = 10.5, linewidth = 2, color = "blue")
p

Note that this block solution is similar (pun intended) but not exactly the same as the one based on structural equivalence we obtained earlier, although it would lead to the same reduced image matrix for the blocks.

References

Everett, Martin G, and Stephen P Borgatti. 2013. “The Dual-Projection Approach for Two-Mode Networks.” Social Networks 35 (2): 204–10.

Jeh, Glen, and Jennifer Widom. 2002. “Simrank: A Measure of Structural-Context Similarity.” Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 538–43.