Analyzing Ego Networks

Ego Networks and Ego Network Data

An ego network is just a subgraph of a larger network that includes a node of interest (“ego”), all of the connections between ego and their neighbors (called “alters”), and usually all of the connections between each of the alters.

Ego network data is social network data collected in such a way (e.g., using standard social survey techniques) that you capture the ego networks of some set of people, usually a convenience sample or, more rarely, a probability sample of some population.

Once you have ego network data you can analyze each ego graph using the standard techniques we have learned so far (if you are only interested in the structural characteristics of the ego graph).

If you have attributes on each alter, you can alternatively compute measures of diversity or homophily to get a sense of how likely ego is to connect to similar or diverse others.

Structural Measures

The Clustering Coefficient

Perhaps the most basic structural characteristic of an ego network is the density of the subgraph formed by all of the connections between the alters. This is called ego’s clustering coefficient.

Let’s see how it works. First we load up the New Hope Star Wars social network included in the `networkdata` package (Gabasova 2016):

   library(networkdata)
   g <- starwars[[4]]

As we said, an ego graph is just a subgraph centered on a particular actor. So R2-D2’s ego graph is just:

   library(igraph)
   N <- neighbors(g, "R2-D2")
   x <- subgraph(g, c("R2-D2", names(N)))

And we can just plot it like we would any igraph object:

   V(x)$color <- c(1, rep(2, length(N)))
   plot(x, 
     vertex.size=10, vertex.frame.color="lightgray", 
     vertex.label.dist=2, 
     layout = layout_(x, as_star()),
     vertex.label.cex = 1.25, edge.color = "lightgray")

R2-D2’s Ego Network.

Note that we use the as_star() option for the layout, so that the ego is put in the center of the star graph surrounded by their alters.

R2-D2’s clustering coefficient is just the density of the graph that includes only the alters:

   x.alters <- x - vertex("R2-D2")
   C <- round(edge_density(x.alters), 2)
   C
[1] 0.67

The clustering coefficient \(C_i\) for an ego \(i\) ranges from zero to one. \(C = 0\) means that none of ego’s alters are connected to one another and \(C = 1\) means that all of ego’s alters are connected to one another. In this case, \(C = 0.67\) means that 67% of the possible pairs among R2-D2’s alters are connected (co-appear in scenes) with one another.
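To make the "proportion of connected alter pairs" reading concrete, here is a minimal sketch on a made-up graph (the graph `toy` and its vertex names are hypothetical, not part of the Star Wars data). It counts tied alter pairs by hand and compares the result with igraph's own functions:

```r
library(igraph)

# Hypothetical toy graph: ego "E" has four alters; only A-B and B-C are tied
toy <- graph_from_literal(E - A:B:C:D, A - B, B - C)

alters <- neighbors(toy, "E")                # ego's four alters
alter.net <- induced_subgraph(toy, alters)   # alters only, ego removed
n.pairs <- choose(length(alters), 2)         # 6 possible alter pairs
n.tied <- ecount(alter.net)                  # 2 tied alter pairs
C.toy <- n.tied / n.pairs                    # 2/6 = 0.33
round(C.toy, 2)
```

So one third of the pairs of alters are connected, which is the same number that `edge_density(alter.net)` returns.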

Typically we would want to compute the clustering coefficient of every node in a graph. This can be done using our trusty lapply and sapply meta-functions:

   create.ego <- function(x, w) {
      alter.net <- subgraph(w, neighbors(w, x))
      return(alter.net)
      }
   ego.graphs <- lapply(V(g)$name, create.ego, w = g)
   head(ego.graphs)
[[1]]
IGRAPH 487f306 UNW- 9 24 -- Episode IV – A New Hope
+ attr: name (g/c), name (v/c), height (v/n), mass (v/n), hair_color
| (v/c), skin_color (v/c), eye_color (v/c), birth_year (v/n), sex
| (v/c), homeworld (v/c), species (v/c), weight (e/n)
+ edges from 487f306 (vertex names):
 [1] CHEWBACCA--C-3PO   CHEWBACCA--LUKE    C-3PO    --LUKE    C-3PO    --BIGGS  
 [5] LUKE     --BIGGS   CHEWBACCA--LEIA    C-3PO    --LEIA    LUKE     --LEIA   
 [9] BIGGS    --LEIA    C-3PO    --BERU    LUKE     --BERU    LEIA     --BERU   
[13] C-3PO    --OWEN    LUKE     --OWEN    BERU     --OWEN    CHEWBACCA--OBI-WAN
[17] C-3PO    --OBI-WAN LUKE     --OBI-WAN LEIA     --OBI-WAN CHEWBACCA--HAN    
[21] C-3PO    --HAN     LUKE     --HAN     LEIA     --HAN     OBI-WAN  --HAN    

[[2]]
IGRAPH 487f99a UNW- 6 15 -- Episode IV – A New Hope
+ attr: name (g/c), name (v/c), height (v/n), mass (v/n), hair_color
| (v/c), skin_color (v/c), eye_color (v/c), birth_year (v/n), sex
| (v/c), homeworld (v/c), species (v/c), weight (e/n)
+ edges from 487f99a (vertex names):
 [1] R2-D2  --C-3PO   R2-D2  --LUKE    C-3PO  --LUKE    R2-D2  --LEIA   
 [5] C-3PO  --LEIA    LUKE   --LEIA    R2-D2  --OBI-WAN C-3PO  --OBI-WAN
 [9] LUKE   --OBI-WAN LEIA   --OBI-WAN R2-D2  --HAN     C-3PO  --HAN    
[13] LUKE   --HAN     LEIA   --HAN     OBI-WAN--HAN    

[[3]]
IGRAPH 487faa6 UNW- 10 27 -- Episode IV – A New Hope
+ attr: name (g/c), name (v/c), height (v/n), mass (v/n), hair_color
| (v/c), skin_color (v/c), eye_color (v/c), birth_year (v/n), sex
| (v/c), homeworld (v/c), species (v/c), weight (e/n)
+ edges from 487faa6 (vertex names):
 [1] R2-D2    --CHEWBACCA R2-D2    --LUKE      CHEWBACCA--LUKE     
 [4] R2-D2    --BIGGS     LUKE     --BIGGS     R2-D2    --LEIA     
 [7] CHEWBACCA--LEIA      LUKE     --LEIA      BIGGS    --LEIA     
[10] R2-D2    --BERU      LUKE     --BERU      LEIA     --BERU     
[13] R2-D2    --OWEN      LUKE     --OWEN      BERU     --OWEN     
[16] R2-D2    --OBI-WAN   CHEWBACCA--OBI-WAN   LUKE     --OBI-WAN  
+ ... omitted several edges

[[4]]
IGRAPH 487fbb2 UNW- 15 36 -- Episode IV – A New Hope
+ attr: name (g/c), name (v/c), height (v/n), mass (v/n), hair_color
| (v/c), skin_color (v/c), eye_color (v/c), birth_year (v/n), sex
| (v/c), homeworld (v/c), species (v/c), weight (e/n)
+ edges from 487fbb2 (vertex names):
 [1] R2-D2    --CHEWBACCA R2-D2    --C-3PO     R2-D2    --BERU     
 [4] R2-D2    --OWEN      R2-D2    --OBI-WAN   R2-D2    --LEIA     
 [7] R2-D2    --BIGGS     R2-D2    --HAN       CHEWBACCA--OBI-WAN  
[10] CHEWBACCA--C-3PO     CHEWBACCA--HAN       CHEWBACCA--LEIA     
[13] CAMIE    --BIGGS     BERU     --OWEN      C-3PO    --BERU     
[16] C-3PO    --OWEN      C-3PO    --LEIA      LEIA     --BERU     
+ ... omitted several edges

[[5]]
IGRAPH 487fce6 UNW- 4 4 -- Episode IV – A New Hope
+ attr: name (g/c), name (v/c), height (v/n), mass (v/n), hair_color
| (v/c), skin_color (v/c), eye_color (v/c), birth_year (v/n), sex
| (v/c), homeworld (v/c), species (v/c), weight (e/n)
+ edges from 487fce6 (vertex names):
[1] LEIA --OBI-WAN LEIA --MOTTI   LEIA --TARKIN  MOTTI--TARKIN 

[[6]]
IGRAPH 487fde4 UNW- 2 1 -- Episode IV – A New Hope
+ attr: name (g/c), name (v/c), height (v/n), mass (v/n), hair_color
| (v/c), skin_color (v/c), eye_color (v/c), birth_year (v/n), sex
| (v/c), homeworld (v/c), species (v/c), weight (e/n)
+ edge from 487fde4 (vertex names):
[1] LUKE--BIGGS

First, we turn the code we used to find R2-D2’s ego graph into a function, then we apply the function to each node in the network. The result is a list object with \(|V| = 21\) ego subgraphs composed of each node’s alters and their connections.
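As a possible shortcut, igraph's `make_ego_graph` can build the same list in one call. The sketch below assumes that setting `mindist = 1` leaves ego out of each subgraph, and uses a hypothetical toy graph (two triangles sharing a node) rather than the Star Wars data:

```r
library(igraph)

# Hypothetical toy graph: two triangles that share vertex "c"
toy <- graph_from_literal(a - b - c - a, c - d - e - c)

# order = 1 keeps direct neighbors only; mindist = 1 drops ego itself
alter.nets <- make_ego_graph(toy, order = 1, mindist = 1)
names(alter.nets) <- V(toy)$name

# Density of each alter subgraph, one value per ego
C.toy <- sapply(alter.nets, edge_density)
round(C.toy, 2)
```

Here the shared node "c" has four alters but only two of the six possible alter pairs are tied, so its value is 0.33, while every other node scores 1.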

Now, to find out the clustering coefficient of each node, we just type:

   C <- round(sapply(ego.graphs, edge_density), 2)
   names(C) <- V(g)$name
   C
      R2-D2   CHEWBACCA       C-3PO        LUKE DARTH VADER       CAMIE 
       0.67        1.00        0.60        0.34        0.67        1.00 
      BIGGS        LEIA        BERU        OWEN     OBI-WAN       MOTTI 
       0.54        0.42        0.90        1.00        0.76        1.00 
     TARKIN         HAN      GREEDO       JABBA     DODONNA GOLD LEADER 
       1.00        0.54         NaN         NaN        1.00        0.80 
      WEDGE  RED LEADER     RED TEN 
       0.80        0.57        1.00 

Note we have a couple of NaN values in the slots corresponding to Greedo and Jabba in the clustering coefficient vector.

Let’s check out why:

   degree(g)
      R2-D2   CHEWBACCA       C-3PO        LUKE DARTH VADER       CAMIE 
          9           6          10          15           4           2 
      BIGGS        LEIA        BERU        OWEN     OBI-WAN       MOTTI 
          8          12           5           4           7           3 
     TARKIN         HAN      GREEDO       JABBA     DODONNA GOLD LEADER 
          3           8           1           1           3           5 
      WEDGE  RED LEADER     RED TEN 
          5           7           2 

Here we see the problem: both Greedo and Jabba have degree equal to one, so each has a single alter. The alter graph is then just an isolated node, and its density (zero ties out of zero possible pairs) is undefined, so it doesn’t make sense to analyze their clustering coefficients!
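The undefined case is easy to reproduce, and to guard against, on a small example. The following is only a sketch: the graph `toy` and the helper `alter.density` are hypothetical names introduced here, and returning `NA` for egos with fewer than two alters is just one reasonable convention:

```r
library(igraph)

# Hypothetical toy graph: a triangle a-b-c plus a node d tied only to c
toy <- graph_from_literal(a - b - c - a, c - d)

# Density among ego's alters; NA when ego has fewer than two alters
alter.density <- function(v, w) {
   alters <- neighbors(w, v)
   if (length(alters) < 2) return(NA_real_)   # density would be 0/0
   edge_density(induced_subgraph(w, alters))
}

C.toy <- sapply(V(toy)$name, alter.density, w = toy)
round(C.toy, 2)
```

Node "d" has a single alter, so it gets `NA` instead of `NaN`, and the other three nodes keep their usual scores.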

We can just drop them and re-analyze:

   g <- subgraph(g, degree(g)> 1)
   ego.graphs <- lapply(V(g)$name, create.ego, w = g)
   C <- round(sapply(ego.graphs, edge_density), 2)
   names(C) <- V(g)$name
   C
      R2-D2   CHEWBACCA       C-3PO        LUKE DARTH VADER       CAMIE 
       0.67        1.00        0.60        0.34        0.67        1.00 
      BIGGS        LEIA        BERU        OWEN     OBI-WAN       MOTTI 
       0.54        0.42        0.90        1.00        0.76        1.00 
     TARKIN         HAN     DODONNA GOLD LEADER       WEDGE  RED LEADER 
       1.00        1.00        1.00        0.80        0.80        0.57 
    RED TEN 
       1.00 

Much better!

Note that in this analysis, Luke has the lowest clustering coefficient (\(C = 0.34\)). This usually indicates an ego whose alters are partitioned into distinct clusters (and hence are not connected across clusters), with ego acting as a mediator or broker between those clusters.

Let’s see what that looks like:

   set.seed(456)
   N <- neighbors(g, "LUKE")
   luke.alters <- subgraph(g, N)  
   V(luke.alters)$color <- cluster_leading_eigen(luke.alters)$membership
   luke <- subgraph(g, c("LUKE", names(N)))
   luke <- simplify(union(luke, luke.alters))
   V(luke)$color[which(is.na(V(luke)$color))] <- "red"
   plot(luke, 
     vertex.size=6, vertex.frame.color="lightgray", 
     vertex.label.dist=1.25,  
     vertex.label.cex = 0.75, edge.color = "lightgray")

Luke’s ego network with alter nodes colored by community assignment via Newman’s leading eigenvector method.

Here we can see that Luke mediates between the Rebel Pilot community on the left and the Obi-Wan, Leia, Chewbacca, Han Solo and Droid communities on the right.

Note that in the ego graph that includes ego, each connected pair of alters forms a triangle with ego. So the clustering coefficient is simply the number of undirected triangles centered on ego (equivalently, the number of cycles of length three that pass through ego) divided by the maximum number possible.

So that means that the diagonal of the cube of the adjacency matrix also contains the information needed to compute the clustering coefficient:

   A <- as.matrix(as_adjacency_matrix(g))
   A3 <- A %*% A %*% A
   diag(A3)
      R2-D2   CHEWBACCA       C-3PO        LUKE DARTH VADER       CAMIE 
         48          30          54          72           8           2 
      BIGGS        LEIA        BERU        OWEN     OBI-WAN       MOTTI 
         30          56          18          12          32           6 
     TARKIN         HAN     DODONNA GOLD LEADER       WEDGE  RED LEADER 
          6          30           6          16          16          24 
    RED TEN 
          2 

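Each diagonal entry counts the closed walks of length three that start and end at a node, and every undirected triangle is traversed twice (once in each direction). So all we need to do is divide these numbers by twice the maximum possible number of undirected triangles that could be centered on a node, which is \(k_i(k_i - 1)\), where \(k_i\) is ego’s degree: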

   k <- degree(g)
   C <- diag(A3)/(k*(k - 1))
   round(C, 2)
      R2-D2   CHEWBACCA       C-3PO        LUKE DARTH VADER       CAMIE 
       0.67        1.00        0.60        0.34        0.67        1.00 
      BIGGS        LEIA        BERU        OWEN     OBI-WAN       MOTTI 
       0.54        0.42        0.90        1.00        0.76        1.00 
     TARKIN         HAN     DODONNA GOLD LEADER       WEDGE  RED LEADER 
       1.00        1.00        1.00        0.80        0.80        0.57 
    RED TEN 
       1.00 

Which gives us the answer as before!
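A quick way to convince ourselves of the matrix shortcut is to check it on a graph small enough to count by hand. This is a minimal sketch on a hypothetical toy graph; it relies on igraph's `count_triangles` to get the number of triangles each node belongs to:

```r
library(igraph)

# Hypothetical toy graph: two triangles that share vertex "c"
toy <- graph_from_literal(a - b - c - a, c - d - e - c)

A <- as_adjacency_matrix(toy, sparse = FALSE)
A3 <- A %*% A %*% A
k <- degree(toy)

closed.walks <- diag(A3)             # closed walks of length three per node
triangles <- count_triangles(toy)    # triangles each node belongs to
rbind(closed.walks, triangles)       # first row is twice the second

C.toy <- closed.walks / (k * (k - 1))
round(C.toy, 2)
```

Node "c" sits on two triangles (four closed walks) and has degree four, so its coefficient is 4/12 = 0.33.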

Finally, from each ego’s clustering coefficient (sometimes called the local clustering coefficient of each node) we can compute the graph’s global clustering coefficient, which is just the average of this quantity across the nodes in the graph:

\[ C(G) = \frac{1}{N}\sum_iC_i \]

In R:

   C.glob <- mean(C)
   round(C.glob, 2)
[1] 0.79

Which indicates a fairly clustered graph.

In igraph we can use the function transitivity to compute the local and global clustering coefficients, which can be specified using the argument type. For the local version, the function also expects a list of vertices:

   round(transitivity(g, V(g)$name, type = "local"), 2)
      R2-D2   CHEWBACCA       C-3PO        LUKE DARTH VADER       CAMIE 
       0.67        1.00        0.60        0.34        0.67        1.00 
      BIGGS        LEIA        BERU        OWEN     OBI-WAN       MOTTI 
       0.54        0.42        0.90        1.00        0.76        1.00 
     TARKIN         HAN     DODONNA GOLD LEADER       WEDGE  RED LEADER 
       1.00        1.00        1.00        0.80        0.80        0.57 
    RED TEN 
       1.00 

And the graph’s global clustering coefficient is:

   round(transitivity(g, type = "average"), 2)
[1] 0.79
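One thing to keep in mind is that the `type` argument of `transitivity` also accepts `"global"`, which computes a different quantity: the ratio of closed to connected triples in the whole graph, rather than the average of the local scores. The sketch below, on a hypothetical toy graph, shows that the two can differ:

```r
library(igraph)

# Hypothetical toy graph: two triangles that share vertex "c"
toy <- graph_from_literal(a - b - c - a, c - d - e - c)

C.local <- transitivity(toy, type = "local")     # one score per node
C.avg <- transitivity(toy, type = "average")     # mean of the local scores
C.ratio <- transitivity(toy, type = "global")    # 3 * triangles / connected triples

round(c(average = C.avg, ratio = C.ratio), 2)
```

In this toy graph the average of the local scores is 0.87, while the triple-based ratio is 6/10 = 0.6, so it matters which version is reported.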

Ego-Network Betweenness

An alternative structural measure of ego’s position in the ego network, closely related to the clustering coefficient, is ego network betweenness. As Everett and Borgatti (2005) note, in an ego network betweenness is determined by the number of paths of length two that involve ego, that is, by the number of disconnected pairs of alters.

Therefore, if \(\mathbf{A}\) is the adjacency matrix of the ego network (ego plus alters), then \(\mathbf{A}^2\) will contain the number of paths of length two between each pair of nodes. Because we are only interested in the number of paths of length two between each pair of disconnected alters, we multiply (element-wise) this matrix by the adjacency matrix corresponding to the graph complement (a matrix with a one for every zero in the original adjacency matrix and a zero for each one). We then take the reciprocal of each non-zero entry and sum these reciprocals over one of the triangles (excluding the main diagonal) of the resulting matrix to find the betweenness of ego.

In math:

\[ C_B(Ego) = \sum_{i < j}\left[\mathbf{A}^2 \bullet (\mathbf{J} - \mathbf{A})\right]_{ij}^{-1} \]

Where \(\mathbf{J}\) is a matrix full of ones of the same dimensions as \(\mathbf{A}\) and \(\bullet\) indicates element-wise matrix multiplication. Entries equal to zero (connected pairs) are skipped rather than inverted.

A simple function that does this looks like:

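   ego.bet <- function(x, n) {
      N <- neighbors(x, n)
      ego.net <- subgraph(x, c(n, names(N))) #ego plus alters
      A <- as.matrix(as_adjacency_matrix(ego.net))
      A2 <- A %*% A #number of two-paths between each pair
      J <- matrix(1, nrow(A), ncol(A))
      cb <- A2 * (J - A) #keep two-paths between disconnected pairs only
      cb <- 1/cb
      cb[is.infinite(cb)] <- 0 #connected pairs contribute nothing
      cb <- sum(cb[upper.tri(cb)]) #count each pair once
      return(cb)
   }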

Let’s see how it works for Luke:

   round(ego.bet(g, "LUKE"), 2)
[1] 45.92

Which says that Luke has pretty high ego-network betweenness.
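To see that the matrix recipe really measures betweenness, we can compare it with igraph's general `betweenness` function on a small ego network. This is only a sketch on a hypothetical toy graph in which the whole graph is the ego network of node "E":

```r
library(igraph)

# Hypothetical ego network: ego "E", four alters, alter ties a-b and b-c
toy <- graph_from_literal(E - a:b:c:d, a - b, b - c)

A <- as_adjacency_matrix(toy, sparse = FALSE)
A2 <- A %*% A                          # two-paths between each pair
J <- matrix(1, nrow(A), ncol(A))
cb <- A2 * (J - A)                     # two-paths between disconnected pairs
cb <- ifelse(cb > 0, 1 / cb, 0)        # reciprocals of the non-zero entries
ego.bet.toy <- sum(cb[upper.tri(cb)])  # count each pair once

full.bet.toy <- betweenness(toy, v = "E")   # standard betweenness of ego
c(ego.bet.toy, full.bet.toy)
```

Alters a and c can reach each other through either E or b, so that pair contributes 1/2; the pairs a-d, b-d and c-d can only go through E and contribute one each, for a total of 3.5 under both computations.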

We can, of course, compute it for everyone in the network like before:

   round(sapply(V(g)$name, ego.bet, x = g), 2)
      R2-D2   CHEWBACCA       C-3PO        LUKE DARTH VADER       CAMIE 
       3.33        0.00        5.42       45.92        1.00        0.00 
      BIGGS        LEIA        BERU        OWEN     OBI-WAN       MOTTI 
       5.58       25.83        0.25        0.00        2.50        0.00 
     TARKIN         HAN     DODONNA GOLD LEADER       WEDGE  RED LEADER 
       0.00        0.00        0.00        0.67        0.67        3.83 
    RED TEN 
       0.00 

Which confirms our original impression: Luke has the highest ego-network betweenness of any character, with Leia in second place.

What does it mean to have an ego-network betweenness of zero? Well, this is only possible if your clustering coefficient is 1.0, that is, when all of your alters are directly connected to one another. This is evident in Chewbacca’s ego network:

   N <- neighbors(g, "CHEWBACCA")
   chew <- subgraph(g, c("CHEWBACCA", names(N)))
   cols <- rep("#56B4E9", vcount(chew))
   names(cols) <- V(chew)$name
   cols[which(names(cols) == "CHEWBACCA")] <- categorical_pal(1)
   V(chew)$color <- cols
   plot(chew, 
     vertex.size=10, vertex.frame.color="lightgray", 
     vertex.label.dist=2, 
     layout = layout_as_star(chew, center = "CHEWBACCA"),
     vertex.label.cex = 1.25, edge.color = "lightgray")

Chewbacca’s Ego Network.

Which is a complete clique of size seven.

Compositional Measures

Ego Network Diversity

If we have information on the categorical vertex attributes of ego’s alters we may be interested in how diverse ego’s choices are across those attributes.

The most common measure is Blau’s Diversity Index (\(H\)). For a categorical attribute with \(m\) levels, this is given by:

\[ H = 1 - \sum_{k=1}^m p_k^2 \]

Where \(p_k\) is the proportion of ego’s alters that fall under category level \(k\).

The \(H\) measure ranges from a minimum of \(H = 0\) (all of ego’s alters belong to a single category) to a maximum of \(H = 1- \frac{1}{m}\) (ego’s alters are spread evenly across the \(m\) categories, as when every alter belongs to a different category).

When the Blau diversity index is normalized by its theoretical maximum, it is sometimes referred to as the Index of Qualitative Variation or \(IQV\):

\[ IQV = \frac{1 - \sum_{k=1}^m p_k^2}{1-\frac{1}{m}} \]

The main difference between \(H\) and \(IQV\) is that the latter has a maximum of \(IQV = 1.0\) indicating the top diversity that can be observed for a categorical attribute with \(m\) categories.

Let’s see how this would work.

First, let’s switch to the Attack of the Clones Star Wars graph:

   g <- starwars[[2]]
   g <- subgraph(g, degree(g)> 1) #removing singletons

Now, we will pick the vertex attribute homeworld and try to measure how diverse each character’s ego network is on this score.

To do that, we need to get the proportion of characters from each homeworld in the network.

Let’s check out this vertex attribute:

   V(g)$homeworld
 [1] "Naboo"          "Naboo"          "Naboo"          NA              
 [5] NA               "Haruun Kal"     NA               "Cerea"         
 [9] "Alderaan"       "Naboo"          "Stewjon"        "Tatooine"      
[13] "Naboo"          NA               NA               NA              
[17] "Kamino"         "Kamino"         "Kamino"         "Concord Dawn"  
[21] "Tatooine"       "Tatooine"       "Tatooine"       "Tatooine"      
[25] "Serenno"        NA               "Geonosis"       "Cato Neimoidia"

There are some NA values here, so let’s create a residual category called “Other”:

   V(g)$homeworld[is.na(V(g)$homeworld)] <- "Other"
   V(g)$homeworld
 [1] "Naboo"          "Naboo"          "Naboo"          "Other"         
 [5] "Other"          "Haruun Kal"     "Other"          "Cerea"         
 [9] "Alderaan"       "Naboo"          "Stewjon"        "Tatooine"      
[13] "Naboo"          "Other"          "Other"          "Other"         
[17] "Kamino"         "Kamino"         "Kamino"         "Concord Dawn"  
[21] "Tatooine"       "Tatooine"       "Tatooine"       "Tatooine"      
[25] "Serenno"        "Other"          "Geonosis"       "Cato Neimoidia"

Great! Now we can use the native R function table to get the relevant proportions.

The function table gives us the count of characters in each category, and then we divide by the total number of actors in the network, given by vcount:

   p.hw <- round(table(V(g)$homeworld)/vcount(g), 3)
   p.hw

      Alderaan Cato Neimoidia          Cerea   Concord Dawn       Geonosis 
         0.036          0.036          0.036          0.036          0.036 
    Haruun Kal         Kamino          Naboo          Other        Serenno 
         0.036          0.107          0.179          0.250          0.036 
       Stewjon       Tatooine 
         0.036          0.179 

Now that we know how to get the proportions we need, we can write a custom function that will compute \(H\) (or its normalized counterpart the \(IQV\)) for a given ego network for any given attribute:

   blau <- function(n, w, a, norm = FALSE) {
      x <- subgraph(w, neighbors(w, n)) #alter subgraph (ego excluded)
      att.vec <- vertex_attr(x, a) #attribute value of each alter
      H <- 1 - sum((table(att.vec)/vcount(x))^2) #Blau index
      if (norm == TRUE) {H <- H /(1 - (1/length(att.vec)))} #IQV, with m set to the number of alters
      return(H)
      }

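This function takes three required inputs: the name of the ego n, the graph object w, and the name of the attribute a. It returns the Blau diversity index score for that ego on that attribute by default; when the optional argument norm is set to TRUE it returns the normalized Blau score (a.k.a. the \(IQV\)). Note that the normalization divides by one minus the reciprocal of the number of alters, so \(IQV = 1\) exactly when every alter falls in a different category.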
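To check the arithmetic by hand, here is a small base R sketch that uses a made-up vector of alter homeworlds (the vector `att` is hypothetical). It computes \(H\) and then shows two ways of normalizing it: by the number of observed categories, and by the number of alters, which is what `length(att.vec)` amounts to in the function above:

```r
# Hypothetical homeworlds of four alters
att <- c("Naboo", "Naboo", "Tatooine", "Kamino")

p <- table(att) / length(att)           # proportions: 0.25, 0.50, 0.25
H <- 1 - sum(p^2)                       # 1 - (0.0625 + 0.25 + 0.0625) = 0.625

m <- length(p)                          # three observed categories
IQV.cats <- H / (1 - 1/m)               # 0.625 / (2/3) = 0.9375
IQV.alters <- H / (1 - 1/length(att))   # 0.625 / (3/4) = 0.8333

round(c(H = H, IQV.cats = IQV.cats, IQV.alters = IQV.alters), 3)
```

The two normalizations agree only when the number of categories equals the number of alters, so it is worth stating which one is being used when reporting an \(IQV\).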

Let’s see Padme’s Home World ego network diversity score:

   round(blau("PADME", g, "homeworld"), 3)
[1] 0.796

Which says that Padme has a fairly diverse ego network when it comes to Home World.

We can, of course, just use sapply to compute everyone’s Home World ego network diversity score:

   round(sapply(V(g)$name, blau, w = g, a = "homeworld"), 3)
          R2-D2   CAPTAIN TYPHO         EMPEROR SENATOR ASK AAK    ORN FREE TAA 
          0.444           0.667           0.815           0.776           0.444 
     MACE WINDU            YODA    KI-ADI-MUNDI     BAIL ORGANA         JAR JAR 
          0.820           0.840           0.750           0.776           0.780 
        OBI-WAN          ANAKIN           PADME            SOLA           JOBAL 
          0.858           0.809           0.796           0.625           0.625 
          RUWEE         TAUN WE         LAMA SU       BOBA FETT      JANGO FETT 
          0.625           0.625           0.500           0.667           0.625 
          C-3PO            OWEN            BERU          CLIEGG     COUNT DOOKU 
          0.444           0.320           0.320           0.320           0.860 
        SUN RIT          POGGLE     NUTE GUNRAY 
          0.833           0.833           0.833 

In this network Count Dooku stands out as having a very diverse ego network by Home World, while Owen sports a very homogeneous ego network on the same attribute.

Let’s see a side-by-side comparison:

Two Ego Networks with Nodes Colored by Homeworld

And here are the \(IQV\) scores for everyone in the network:

   round(sapply(V(g)$name, blau, w = g, 
                a = "homeworld", norm = TRUE), 3)
          R2-D2   CAPTAIN TYPHO         EMPEROR SENATOR ASK AAK    ORN FREE TAA 
          0.667           1.000           0.917           0.905           0.667 
     MACE WINDU            YODA    KI-ADI-MUNDI     BAIL ORGANA         JAR JAR 
          0.911           0.933           1.000           0.905           0.867 
        OBI-WAN          ANAKIN           PADME            SOLA           JOBAL 
          0.912           0.856           0.846           0.833           0.833 
          RUWEE         TAUN WE         LAMA SU       BOBA FETT      JANGO FETT 
          0.833           0.833           1.000           1.000           0.833 
          C-3PO            OWEN            BERU          CLIEGG     COUNT DOOKU 
          0.533           0.400           0.400           0.400           0.956 
        SUN RIT          POGGLE     NUTE GUNRAY 
          1.000           1.000           1.000 

As we noted, an ego network with maximum diversity \(IQV = 1.0\) is one where every alter is in a different category. Here are two examples:

Two Ego Networks with Maximum Homeworld Diversity.

Ego Network Homophily

Diversity measures the extent to which egos connect to alters who are different from one another. We may also want to get a sense of how homophilous an ego network is, namely, the extent to which ego connects to alters that are the same or different from them.

For instance, a person can have an ego network composed of 100% alters who are different from them on a given attribute (maximum “heterophily”) but those alters could be 100% homogeneous—e.g., all come from the same planet—and thus ego will have the minimum Blau diversity score (\(H = 0\)).

To measure homophily in the ego network we use the EI homophily index. This is given by:

\[ EI = \frac{E-I}{E+I} \]

Where \(E\) is the number of “external” ties (alter different from ego on attribute), and \(I\) is the number of “internal” ties (alter same as ego on attribute).

The \(EI\) index ranges from a minimum of \(EI = -1\), indicating maximum homophily, to a maximum of \(EI = 1\), indicating maximum heterophily. An EI index value of zero indicates no preference for internal over external ties.
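As a quick worked example of the formula, the base R sketch below computes the index by hand for a made-up ego and five made-up alters (all values here are hypothetical):

```r
# Hypothetical ego and alter attribute values
ego.att <- "female"
alter.att <- c("male", "male", "none", "female", "male")

n.ext <- sum(alter.att != ego.att)            # 4 external ties
n.int <- sum(alter.att == ego.att)            # 1 internal tie
EI.toy <- (n.ext - n.int) / (n.ext + n.int)   # (4 - 1) / 5 = 0.6
EI.toy
```

A value of 0.6 indicates a mostly heterophilous ego network: four of the five ties go to alters in a different category than ego.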

So let’s write a function that does what we want to calculate EI:

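   EI <- function(n, w, a) {
      x <- subgraph(w, neighbors(w, n)) #alter subgraph (ego excluded)
      E <- vertex_attr(w, a, n) != vertex_attr(x, a) #alters unlike ego
      E <- sum(as.numeric(E)) #number of external ties
      I <- vertex_attr(w, a, n) == vertex_attr(x, a) #alters like ego
      I <- sum(as.numeric(I)) #number of internal ties
      ei <- (E - I)/(E + I)
      return(ei)
      }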

Let’s look at the attribute sex:

   V(g)$sex
 [1] "none"   "male"   "male"   NA       NA       "male"   "male"   "male"  
 [9] "male"   "male"   "male"   "male"   "female" NA       NA       NA      
[17] "female" "male"   "male"   "male"   "none"   "male"   "female" "male"  
[25] "male"   NA       "male"   "male"  

Getting rid of the NA values:

   V(g)$sex[is.na(V(g)$sex)] <- "Other"
   V(g)$sex
 [1] "none"   "male"   "male"   "Other"  "Other"  "male"   "male"   "male"  
 [9] "male"   "male"   "male"   "male"   "female" "Other"  "Other"  "Other" 
[17] "female" "male"   "male"   "male"   "none"   "male"   "female" "male"  
[25] "male"   "Other"  "male"   "male"  

And calculating the homophily index on gender for everyone:

   #note: this assignment replaces the function EI with the vector of scores
   EI <- sapply(V(g)$name, EI, w = g, a = "sex")
   round(EI, 2)
          R2-D2   CAPTAIN TYPHO         EMPEROR SENATOR ASK AAK    ORN FREE TAA 
           0.33           -1.00           -0.56            0.71            0.33 
     MACE WINDU            YODA    KI-ADI-MUNDI     BAIL ORGANA         JAR JAR 
          -0.60           -0.60           -1.00           -0.71           -0.40 
        OBI-WAN          ANAKIN           PADME            SOLA           JOBAL 
          -0.53           -0.11            0.88            0.00            0.00 
          RUWEE         TAUN WE         LAMA SU       BOBA FETT      JANGO FETT 
           0.00            1.00            0.00           -0.33           -0.50 
          C-3PO            OWEN            BERU          CLIEGG     COUNT DOOKU 
           0.67            0.20            0.60            0.20           -0.60 
        SUN RIT          POGGLE     NUTE GUNRAY 
           1.00           -0.33           -0.33 

As we can see, the Emperor, Mace Windu, Obi-Wan and other such characters have a homophilous “bro” network. Padme, on the other hand, has a heterophilous network with respect to gender.

Let’s see a side-by-side comparison:

Two Ego Networks with Nodes Colored by Gender (Mace Windu, EI = -0.60; Padme, EI = 0.88)

As we can see, Mace Windu is mostly surrounded by other men (like him), but Padme’s network includes only one other woman; the rest of her alters have a different gender presentation than her (or have no discernible gender, like the droids).

Like the clustering coefficient, we can compute a graph level index of homophily on a given attribute. This is given by the average EI index of nodes in the graph for that attribute.

In the case of gender in Attack of the Clones:

   EI.gender = mean(EI)
   round(EI.gender, 2)
[1] -0.06

Which shows a slight preference for same-gender ties in the network.

Normalized EI

Sometimes we may want to take into account that the group sizes of different categories of people are unequal in the network. For instance, Star Wars is full of characters gendered as men, which means that any homophily index will score men as more homophilous simply because there are more men around to form ties with.

Everett and Borgatti (2012, 564–65) propose an approach to normalizing the EI index to account for unequal group sizes, yielding the \(NEI\). So instead of computing EI they suggest calculating:

\[ E^* = \frac{E}{N - N_s} \]

\[ I^* = \frac{I}{N_s} \]

\[ NEI = \frac{E^*-I^*}{E^*+I^*} \]

With \(N_s\) being the number of nodes “similar” to ego in the whole graph (or external population) and \(N\) being the total number of nodes (or persons in the population). As you can see, the \(NEI\) weights both the number of external ties \(E\) and the number of internal ties \(I\) by their maximum possible values in the network.

Here’s a function that does that:

   NEI <- function(n, w, a) {
      x <- subgraph(w, neighbors(w, n)) #alter subgraph (ego excluded)
      Ns <- sum(as.numeric(vertex_attr(w, a) == vertex_attr(w, a, n))) #nodes in ego's category (ego included)
      E <- vertex_attr(w, a, n) != vertex_attr(x, a)
      E <- sum(as.numeric(E))/(vcount(w) - Ns) #external ties over nodes unlike ego
      I <- vertex_attr(w, a, n) == vertex_attr(x, a)
      I <- sum(as.numeric(I))/Ns #internal ties over nodes like ego
      nei <- (E - I)/(E + I)
      return(nei)
   }

Let’s re-check Mace Windu’s and Padme’s EI index using the normalized scale:

   round(NEI("MACE WINDU", g, "sex"), 2)
[1] -0.44
   round(NEI("PADME", g, "sex"), 2)
[1] 0.32

As we can see the NEI scores are less extreme than the unnormalized ones, once we take into account that the majority of characters in the film are men.

Here are the NEI scores with respect to gender for everyone:

   #note: this assignment replaces the function NEI with the vector of scores
   NEI <- sapply(V(g)$name, NEI, w = g, a = "sex")
   round(NEI, 2)
          R2-D2   CAPTAIN TYPHO         EMPEROR SENATOR ASK AAK    ORN FREE TAA 
          -0.73           -1.00           -0.39            0.24           -0.29 
     MACE WINDU            YODA    KI-ADI-MUNDI     BAIL ORGANA         JAR JAR 
          -0.44           -0.44           -1.00           -0.59           -0.20 
        OBI-WAN          ANAKIN           PADME            SOLA           JOBAL 
          -0.36            0.11            0.32           -0.57           -0.57 
          RUWEE         TAUN WE         LAMA SU       BOBA FETT      JANGO FETT 
          -0.57            1.00            0.21           -0.13           -0.32 
          C-3PO            OWEN            BERU          CLIEGG     COUNT DOOKU 
          -0.44            0.40           -0.35            0.40           -0.44 
        SUN RIT          POGGLE     NUTE GUNRAY 
           1.00           -0.13           -0.13 

Interestingly, while most people’s scores are attenuated towards zero in the normalized scale, R2-D2’s becomes more extreme, going from weakly positive (demonstrating “gender” heterophily) to strongly negative (showing same “gender” preference).

Let’s see what’s going on:

R2-D2’s Ego Network with Nodes Colored by Gender

Here we can see why R2-D2 ends up being high in homophily in the NEI despite having an ego network with just three alters and only a single “same-gender” (i.e., none) tie: he is connected to C-3PO, who is the only other character (a droid) whose gender is also coded as “none.”
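We can redo R2-D2's numbers by hand to see where the sign flip comes from. The counts in this base R sketch are read off the vectors shown earlier (28 characters, two of them coded "none", and R2-D2 tied to one same-category and two different-category alters); the variable names are just for illustration:

```r
N <- 28        # characters in the graph
Ns <- 2        # characters coded "none" (R2-D2 and C-3PO)
n.ext <- 2     # R2-D2's ties to alters in a different category
n.int <- 1     # R2-D2's ties to alters in the same category

EI.r2 <- (n.ext - n.int) / (n.ext + n.int)         # (2 - 1) / 3 = 0.33

E.star <- n.ext / (N - Ns)                         # 2 / 26 = 0.077
I.star <- n.int / Ns                               # 1 / 2 = 0.5
NEI.r2 <- (E.star - I.star) / (E.star + I.star)    # -0.73

round(c(EI = EI.r2, NEI = NEI.r2), 2)
```

Because there are so few "none" characters to connect to, a single same-category tie already uses up half of the available pool, while two different-category ties are a tiny share of the 26 others.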

And here’s the graph’s overall NEI:

   NEI.gender = mean(NEI)
   round(NEI.gender, 2)
[1] -0.19

Which shows that our previous unnormalized average under-estimated homophily in this network. Instead, there is a moderately strong tendency for characters to co-appear with others of the same gender classification once the imbalance favoring men is accounted for.

Other Ways of Accounting for Imbalanced Group Sizes in Homophily Metrics

As may already be evident when constructing a homophily measure that takes into account the population (or local network) proportions of various types of alters, there are four pieces of information that we have to take into account:

  1. Number of alters linked to ego of the same category as ego.
  2. Number of alters linked to ego of a different category from ego.
  3. Number of alters not linked to ego of the same category as ego.
  4. Number of alters not linked to ego of a different category from ego.

Which yields a classic 2 by 2 table.

Here’s a function that produces such a table for each ego for a given attribute:

   abcd <- function(n, w, a) {
      same <- vertex_attr(w, a, n) == vertex_attr(w, a) #same category as ego?
      tied <- as.vector(V(w)$name) %in% names(neighbors(w, n)) #linked to ego?
      #ego itself stays in both vectors (same = TRUE, tied = FALSE),
      #so it is counted in the bottom-left cell of the table
      tab <- table(tied, same)
      tab <- tab[2:1, 2:1] #put TRUE before FALSE in rows and columns
      return(tab)
   }

So for Count Dooku with respect to gender, this 2 X 2 table looks like:

   abcd("COUNT DOOKU", g, "sex")
       same
tied    TRUE FALSE
  TRUE     8     2
  FALSE    9     9

So here we can see that Dooku is linked to eight others of the same gender, while the nine same-gender nodes he is not linked to are eight other men plus Dooku himself, since the function does not drop ego from the count. In the same way, he has two different-gender ties, but there are nine others of a different gender he’s not tied to.

As Everett and Borgatti (2012) note, we can label the cells of the 2 X 2 EI table with the letters from the above list to highlight each piece of information in each cell:

       same
tied    TRUE FALSE
  TRUE  a    b    
  FALSE c    d    

The EI index uses information only from the first row (alters tied to ego), and is thus a rescaling of \(a/(a + b)\); it is sensitive to group size imbalances because it ignores the other pieces of information.

The NEI, on the other hand, uses information from all four cells and is therefore a rescaling of \((a/(a + c)) - (b/(b + d))\); therefore, it is insensitive to group size imbalances.
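As a quick check on the two expressions above, the following sketch plugs Dooku's cell counts (typed in by hand from the table printed earlier) into each one. The helper names `first.row` and `all.cells` and the altered table `dooku.big` are illustrative assumptions, not part of the original code, and the values are the raw quantities before any rescaling into the EI or NEI.

```r
# Dooku's 2 X 2 table, typed in by hand from the output above
dooku <- matrix(c(8, 2,
                  9, 9), nrow = 2, byrow = TRUE,
                dimnames = list(tied = c("TRUE", "FALSE"),
                                same = c("TRUE", "FALSE")))

# a / (a + b): uses only the first row (nodes tied to ego)
first.row <- function(x) x[1, 1] / sum(x[1, ])

# a / (a + c) - b / (b + d): uses all four cells
all.cells <- function(x) x[1, 1] / sum(x[, 1]) - x[1, 2] / sum(x[, 2])

# Hypothetical variant: same ties, but ten times as many unlinked same-gender nodes
dooku.big <- dooku
dooku.big[2, 1] <- 90

round(c(first.row(dooku), first.row(dooku.big)), 2)   # 0.8 0.8
round(c(all.cells(dooku), all.cells(dooku.big)), 2)   # 0.29 -0.10
```

The first-row quantity does not move when the number of unlinked same-gender nodes changes, while the four-cell quantity does, which is the sense in which only the latter responds to group sizes.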

Other measures of homophily that, like the NEI, are not sensitive to group sizes can thus be constructed from the information in the 2 X 2 table. One measure Everett and Borgatti (2012, 565) recommend is the point biserial correlation coefficient (\(r^{pb}\)), which is given, using the cell labels in the table above, by:

\[ r^{pb} = \frac{ad-bc}{\sqrt{(a+c)(b+d)(a+b)(c+d)}} \]

A function that computes this from the output of abcd above is:

   pb.corr <- function(x) {
      # cells of the 2 X 2 table, labeled as in the text
      a <- x[1,1]
      b <- x[1,2]
      c <- x[2,1]
      d <- x[2,2]
      num <- (a*d - b*c)
      den <- sqrt((a+c)*(b+d)*(a+b)*(c+d))
      return(num/den)
   }

For Dooku, \(r^{pb}\) is:

   round(pb.corr(abcd("COUNT DOOKU", g, "sex")), 2)
[1] 0.29

Which shows a positive tendency toward same-gender ties.
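One way to see why this quantity behaves like a correlation: for a 2 X 2 table, the formula above coincides with the ordinary Pearson correlation between two 0/1 indicators (tied to ego or not, same category as ego or not). The sketch below rebuilds those indicators from Dooku's cell counts and compares the two calculations. The objects `counts`, `tied`, `same`, `r.pearson`, and `r.formula` are constructed by hand for illustration and are not part of the original analysis.

```r
# Cell counts a, b, c, d from Dooku's table above
counts <- c(a = 8, b = 2, c = 9, d = 9)

# One 0/1 entry per node: cells a and b are tied to ego,
# cells a and c match ego's category
tied <- rep(c(1, 1, 0, 0), times = counts)
same <- rep(c(1, 0, 1, 0), times = counts)

# Pearson correlation of the two indicators
r.pearson <- cor(tied, same)

# Closed-form expression from the 2 X 2 cells
r.formula <- with(as.list(counts),
                  (a * d - b * c) / sqrt((a + c) * (b + d) * (a + b) * (c + d)))

round(c(pearson = r.pearson, formula = r.formula), 2)   # both 0.29
```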

To calculate \(r^{pb}\) for the whole network, first we need to create a list containing the corresponding 2 X 2 EI tables for each node for the gender attribute:

   abcd.list <- lapply(V(g)$name, abcd, w = g, a = "sex")
   names(abcd.list) <- V(g)$name
   head(abcd.list)
$`R2-D2`
       same
tied    TRUE FALSE
  TRUE     1     2
  FALSE    1    24

$`CAPTAIN TYPHO`
       same
tied    TRUE FALSE
  TRUE     3     0
  FALSE   14    11

$EMPEROR
       same
tied    TRUE FALSE
  TRUE     7     2
  FALSE   10     9

$`SENATOR ASK AAK`
       same
tied    TRUE FALSE
  TRUE     1     6
  FALSE    5    16

$`ORN FREE TAA`
       same
tied    TRUE FALSE
  TRUE     1     2
  FALSE    5    20

$`MACE WINDU`
       same
tied    TRUE FALSE
  TRUE     8     2
  FALSE    9     9

And then sapply the function pb.corr to each element of this list:

   PBC <- sapply(abcd.list, pb.corr)
   round(PBC, 2)
          R2-D2   CAPTAIN TYPHO         EMPEROR SENATOR ASK AAK    ORN FREE TAA 
           0.35            0.28            0.24           -0.10            0.10 
     MACE WINDU            YODA    KI-ADI-MUNDI     BAIL ORGANA         JAR JAR 
           0.29            0.29            0.33            0.30            0.14 
        OBI-WAN          ANAKIN           PADME            SOLA           JOBAL 
           0.40           -0.14           -0.19            0.28            0.28 
          RUWEE         TAUN WE         LAMA SU       BOBA FETT      JANGO FETT 
           0.28           -0.14           -0.06            0.04            0.12 
          C-3PO            OWEN            BERU          CLIEGG     COUNT DOOKU 
           0.19           -0.20            0.14           -0.20            0.29 
        SUN RIT          POGGLE     NUTE GUNRAY 
          -0.27            0.06            0.06 

Which shows that, after accounting for group sizes, most characters display slight to moderate preferences for same-gender ties. The exceptions, with negative coefficients, are Senator Ask Aak, Anakin, Padme, Taun We, Lama Su, Owen, Cliegg, and Sun Rit.

And at the network level:

   round(mean(PBC), 4)
[1] 0.1139

Which reveals a slight preference for same-gender ties in this network.
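One caveat when averaging: the formula divides by zero whenever a row or column of the 2 X 2 table is empty (for example, an ego tied to every other node), in which case the result is `NaN` and the mean would be too. Below is a minimal sketch of one way to guard against this, assuming you would rather skip such egos than count them as zero. The function `pb.corr.safe` and the toy tables are hypothetical and not part of the original analysis.

```r
# Guarded variant of the correlation: NA when a margin of the table is empty
pb.corr.safe <- function(x) {
   # product of the four margins: (a+b)(c+d)(a+c)(b+d)
   den <- sqrt(prod(rowSums(x), colSums(x)))
   if (den == 0) return(NA_real_)
   (x[1, 1] * x[2, 2] - x[1, 2] * x[2, 1]) / den
}

# Two toy tables: Dooku's counts, and a hypothetical ego tied to every node
toy.list <- list(
   dooku    = matrix(c(8, 2, 9, 9), nrow = 2, byrow = TRUE),
   tied.all = matrix(c(5, 4, 0, 0), nrow = 2, byrow = TRUE)
)

toy.pbc <- sapply(toy.list, pb.corr.safe)
mean(toy.pbc, na.rm = TRUE)   # averages only over egos with a defined value
```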

Combining Diversity and Homophily

Sometimes you may not care about the difference between diversity and homophily. Instead, you want to know whether nodes connect to others that are the same as them (or to others of a single type), or whether they connect to others that are different from them (or to others of different types).

In this case, Yule’s \(Q\) is a good option (Borgatti et al. 2024, 159). It is given by:

\[ Q = \frac{ad-bc}{ad+bc} \]

\(Q\) works like a correlation coefficient, and it is 1.0 when ego’s network is either completely homophilous or completely homogeneous, and it is -1.0 when ego’s network is either completely heterophilous or completely heterogeneous.

A function to calculate \(Q\) from the abcd function output is as follows:

   y.Q <- function(x) {
      # cells of the 2 X 2 table, labeled as in the text
      a <- x[1,1]
      b <- x[1,2]
      c <- x[2,1]
      d <- x[2,2]
      num <- a*d - b*c
      den <- a*d + b*c
      return(num/den)
   }

And here are the values for the Attack of the Clones network:

   Q <- sapply(abcd.list, y.Q)
   round(Q, 2)
          R2-D2   CAPTAIN TYPHO         EMPEROR SENATOR ASK AAK    ORN FREE TAA 
           0.85            1.00            0.52           -0.30            0.33 
     MACE WINDU            YODA    KI-ADI-MUNDI     BAIL ORGANA         JAR JAR 
           0.60            0.60            1.00            0.69            0.30 
        OBI-WAN          ANAKIN           PADME            SOLA           JOBAL 
           0.70           -0.30           -0.56            0.67            0.67 
          RUWEE         TAUN WE         LAMA SU       BOBA FETT      JANGO FETT 
           0.67           -1.00           -0.23            0.14            0.36 
          C-3PO            OWEN            BERU          CLIEGG     COUNT DOOKU 
           0.62           -0.48            0.45           -0.48            0.60 
        SUN RIT          POGGLE     NUTE GUNRAY 
          -1.00            0.16            0.16 
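The values of exactly 1.00 and -1.00 arise because \(Q\) hits its bounds as soon as one cell of the table is empty: it equals 1 when \(b\) or \(c\) is zero and -1 when \(a\) or \(d\) is zero, whatever the other cells contain. The sketch below illustrates this with Captain Typho's table from the earlier output (where \(b = 0\)) and a made-up table with \(a = 0\), and sets each \(Q\) next to the correlation computed from the same cells. The helpers `q.from.cells` and `r.from.cells` and the table `no.same` are illustrative assumptions.

```r
# Yule's Q for a 2 X 2 matrix with cells a, b (row 1) and c, d (row 2)
q.from.cells <- function(x) {
   ad <- x[1, 1] * x[2, 2]
   bc <- x[1, 2] * x[2, 1]
   (ad - bc) / (ad + bc)
}

# Correlation from the same cells: (ad - bc) over the root of the four margins
r.from.cells <- function(x) {
   (x[1, 1] * x[2, 2] - x[1, 2] * x[2, 1]) /
      sqrt(prod(rowSums(x), colSums(x)))
}

# Captain Typho's table from the output above: no different-gender ties (b = 0)
typho <- matrix(c(3, 0, 14, 11), nrow = 2, byrow = TRUE)

# Hypothetical table with no same-category ties (a = 0)
no.same <- matrix(c(0, 4, 6, 10), nrow = 2, byrow = TRUE)

round(c(Q = q.from.cells(typho), r = r.from.cells(typho)), 2)       # Q = 1.00, r = 0.28
round(c(Q = q.from.cells(no.same), r = r.from.cells(no.same)), 2)   # Q = -1.00, r = -0.33
```

In both cases \(Q\) is at its limit while the correlation is only moderate, so extreme \(Q\) values are best read as "one cell is empty" rather than as a very strong association.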

Let’s compare two egos with extreme opposite values on \(Q\):

Two Ego Networks with Maximum and Minimum Q Values (panels: Q = 1 and Q = -1)

As we can see, Ki-Adi-Mundi has an ego network that combines maximum homophily and homogeneity with respect to gender, whereas Taun We has an ego network that combines maximum heterophily with maximum homogeneity of alters with respect to the same attribute.

References

Borgatti, Stephen P, Filip Agneessens, Jeffrey C Johnson, and Martin G Everett. 2024. “Analyzing Social Networks.”
Everett, Martin, and Stephen P Borgatti. 2005. “Ego Network Betweenness.” Social Networks 27 (1): 31–38.
———. 2012. “Categorical Attribute Based Centrality: E–i and g–f Centrality.” Social Networks 34 (4): 562–69.
Gabasova, Evelina. 2016. “Star Wars Social Network.” https://doi.org/10.5281/zenodo.1411479.