Handling Graph Objects in Lists

Sometimes network data comes pre-stored as an R list. This is typical if you have a network with multiple kinds of ties recorded on the same set of actors (and thus multiple networks), or longitudinal network data, where we collect multiple “snapshots” of the same system (containing the same or, more typically, a different set of actors per time slice).

The networkdata package contains one such data set called atp. It’s a list of networks of tennis players who played in Grand Slam or other official matches of the Association of Tennis Professionals (hence ATP), with one network per season covering the years 1968-2021 (Radicchi 2011).

In the directed graph representing each network, a tie goes from the loser to the winner of each match. Accordingly, it can be interpreted as a directed “deference” network (it would be a dominance network if it were the other way around), where actor i “defers” to actor j by getting their ass kicked by them.

Let’s see how this list of networks works:

   library(networkdata)
   library(igraph)
   g <- atp
   head(g)
[[1]]
IGRAPH 08a202a DNW- 497 1213 -- ATP Season 1968
+ attr: name (g/c), name (v/c), age (v/n), hand (v/c), country (v/c),
| surface (e/c), weight (e/n)
+ edges from 08a202a (vertex names):
 [1] U Unknown    ->Jose Mandarino     Alfredo Acuna->Alan Fox          
 [3] Andres Gimeno->Ken Rosewall       Andres Gimeno->Raymond Moore     
 [5] Juan Gisbert ->Tom Okker          Juan Gisbert ->Zeljko Franulovic 
 [7] Onny Parun   ->Jan Kodes          Peter Curtis ->Tom Okker         
 [9] Premjit Lall ->Clark Graebner     Rod Laver    ->Ken Rosewall      
[11] Thomas Lejus ->Nicola Pietrangeli Tom Okker    ->Arthur Ashe       
[13] U Unknown    ->Jaidip Mukherjea   U Unknown    ->Jose Luis Arillla 
+ ... omitted several edges

[[2]]
IGRAPH c895c57 DNW- 446 1418 -- ATP Season 1969
+ attr: name (g/c), name (v/c), age (v/n), hand (v/c), country (v/c),
| surface (e/c), weight (e/n)
+ edges from c895c57 (vertex names):
 [1] U Unknown           ->U Unknown        
 [2] Alejandro Olmedo    ->Ron Holmberg     
 [3] Arthur Ashe         ->Rod Laver        
 [4] Bob Carmichael      ->Jim Osborne      
 [5] Cliff Richey        ->Zeljko Franulovic
 [6] Francois Jauffret   ->Martin Mulligan  
 [7] Fred Stolle         ->John Newcombe    
+ ... omitted several edges

[[3]]
IGRAPH d6709af DNW- 451 1650 -- ATP Season 1970
+ attr: name (g/c), name (v/c), age (v/n), hand (v/c), country (v/c),
| surface (e/c), weight (e/n)
+ edges from d6709af (vertex names):
 [1] Mark Cox            ->Jan Kodes        
 [2] Charlie Pasarell    ->Stan Smith       
 [3] Cliff Richey        ->Arthur Ashe      
 [4] Francois Jauffret   ->Manuel Orantes   
 [5] Georges Goven       ->Jan Kodes        
 [6] Harald Elschenbroich->Zeljko Franulovic
 [7] Ilie Nastase        ->Zeljko Franulovic
+ ... omitted several edges

[[4]]
IGRAPH a73a020 DNW- 459 2580 -- ATP Season 1971
+ attr: name (g/c), name (v/c), age (v/n), hand (v/c), country (v/c),
| surface (e/c), weight (e/n)
+ edges from a73a020 (vertex names):
 [1] Andres Gimeno    ->Ken Rosewall   Arthur Ashe      ->Rod Laver     
 [3] Charlie Pasarell ->Cliff Drysdale Frank Froehling  ->Clark Graebner
 [5] Joaquin Loyo Mayo->Thomaz Koch    John Alexander   ->John Newcombe 
 [7] John Newcombe    ->Marty Riessen  Nikola Pilic     ->Cliff Drysdale
 [9] Owen Davidson    ->Cliff Drysdale Robert Maud      ->Cliff Drysdale
[11] Roger Taylor     ->Marty Riessen  Roy Emerson      ->Rod Laver     
[13] Tom Okker        ->John Newcombe  Allan Stone      ->Bob Carmichael
+ ... omitted several edges

[[5]]
IGRAPH 761ed05 DNW- 504 2767 -- ATP Season 1972
+ attr: name (g/c), name (v/c), age (v/n), hand (v/c), country (v/c),
| surface (e/c), weight (e/n)
+ edges from 761ed05 (vertex names):
 [1] Jean Loup Rouyer->Stan Smith     Marty Riessen   ->Cliff Drysdale
 [3] Roy Emerson     ->Arthur Ashe    Roy Emerson     ->Arthur Ashe   
 [5] Tom Leonard     ->John Newcombe  Tom Okker       ->Arthur Ashe   
 [7] Tom Okker       ->Rod Laver      Adriano Panatta ->Andres Gimeno 
 [9] Adriano Panatta ->Ilie Nastase   Allan Stone     ->John Alexander
[11] Allan Stone     ->Marty Riessen  Andres Gimeno   ->Jan Kodes     
[13] Andres Gimeno   ->Stan Smith     Andrew Pattison ->Ilie Nastase  
+ ... omitted several edges

[[6]]
IGRAPH 92ff576 DNW- 592 3653 -- ATP Season 1973
+ attr: name (g/c), name (v/c), age (v/n), hand (v/c), country (v/c),
| surface (e/c), weight (e/n)
+ edges from 92ff576 (vertex names):
 [1] Harold Solomon    ->Stan Smith       John Alexander    ->Stan Smith      
 [3] Patrice Dominguez ->Paolo Bertolucci Paul Gerken       ->Jimmy Connors   
 [5] Roy Emerson       ->Rod Laver        Adriano Panatta   ->Ilie Nastase    
 [7] Bjorn Borg        ->Adriano Panatta  Brian Gottfried   ->Cliff Richey    
 [9] Charlie Pasarell  ->John Alexander   Cliff Richey      ->Stan Smith      
[11] Corrado Barazzutti->Bjorn Borg       Francois Jauffret ->Ilie Nastase    
[13] Georges Goven     ->Manuel Orantes   Manuel Orantes    ->Ilie Nastase    
+ ... omitted several edges

We assign the atp list to g and then use head to look at its first six elements, each of which is an igraph graph object. In unnamed R lists, each object inside is indexed by a number in double brackets. So [[6]] just means the sixth network in the list (corresponding to the year 1973).

Now let’s say we wanted to compute a network statistic like density. One way to proceed would be:

   edge_density(g)
Error in `ensure_igraph()`:
! Must provide a graph object (provided wrong object type).

This gives us a weird error about the wrong object type. The reason is that edge_density expects an igraph graph object as input, but g is not a graph object; it is a list of such objects. For the function to work, you have to reference a particular element inside the list, not the whole list.

To do that, we use the double bracket notation:

   edge_density(g[[6]])
[1] 0.01044096

Which gives us the density for the 1973 network.
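
The difference between single and double brackets is easy to mix up, so here is a minimal sketch using a small made-up list of graphs (the object nets is a toy stand-in, not the atp data). Double brackets pull out the graph itself, while single brackets return a shorter list that still wraps the graph:

```r
library(igraph)

# Toy stand-in for a list of networks (an assumption for illustration)
nets <- list(make_ring(5),
             make_star(6, mode = "undirected"),
             make_full_graph(4))

one_graph <- nets[[2]]  # double brackets: the graph object itself
sub_list  <- nets[2]    # single brackets: a list of length one

is_igraph(one_graph)    # TRUE
is_igraph(sub_list)     # FALSE, it is still a list
class(sub_list)         # "list"

# Works, because we hand the function a graph and not a list
edge_density(one_graph)
```

Calling edge_density(sub_list) would fail in the same way as edge_density(g) did above, because a list of length one is still a list.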

Looping Through Lists

But what if we wanted a table of network statistics for all the years or some subset of years? Of course, we could just type a million versions of the edge_density command or whatever, but that would be tedious. We could also write a for loop or something like that (less tedious). Even less tedious is to use the many apply functions in R that are designed to work with lists, which is a subject unto itself in R programming.

But here we can just use the simple version. Let’s say we wanted a vector of densities (or any other whole network statistic) for all 54 years. In that case, our friend sapply can do the job:

   sapply(g, edge_density)
 [1] 0.004920653 0.007144657 0.008130081 0.012272740 0.010914671 0.010440961
 [7] 0.010567864 0.013315132 0.012088214 0.014019237 0.014135328 0.011649909
[13] 0.011172821 0.011261426 0.012703925 0.012177336 0.012648755 0.012445937
[19] 0.012034362 0.012351377 0.010174271 0.009772014 0.019526953 0.012236462
[25] 0.014050245 0.015054181 0.013872832 0.014727924 0.014329906 0.013935502
[31] 0.013962809 0.013870042 0.013665097 0.013818887 0.012551113 0.011571679
[37] 0.012329090 0.012923683 0.011402945 0.012677988 0.012256963 0.013512884
[43] 0.012543025 0.013661748 0.013786518 0.013679697 0.015052857 0.015075622
[49] 0.015081206 0.014346468 0.015764351 0.020169225 0.011889114 0.016935400

sapply is kind of a “meta” function that takes two inputs: a list and a function (which could be built into R, come from a package, or be user-defined); sapply then “applies” that function to each element of the list and tries to simplify the results into a vector. Here we asked R to apply the function edge_density to each element of the list of networks g and it obliged, creating a vector of length 54 containing the densities.
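
To see what sapply is doing for us, it may help to compare it with the for loop it replaces. The sketch below uses a toy list called nets (an assumption for illustration, not the atp data) and also shows vapply, a stricter base R cousin of sapply that makes us declare the type of each result:

```r
library(igraph)

# Toy stand-in for a list of networks
nets <- list(make_ring(5), make_full_graph(4), make_ring(10))

# The for-loop version: pre-allocate, then fill in one density at a time
dens_loop <- numeric(length(nets))
for (i in seq_along(nets)) {
  dens_loop[i] <- edge_density(nets[[i]])
}

# The sapply version: same result in one line
dens_sapply <- sapply(nets, edge_density)

# vapply additionally checks that each result is a single number
dens_vapply <- vapply(nets, edge_density, numeric(1))

all.equal(dens_loop, dens_sapply)   # TRUE
all.equal(dens_loop, dens_vapply)   # TRUE
```

Note that the loop has to use double brackets (nets[[i]]) to get at each graph, which is exactly the bookkeeping sapply takes care of.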

We could use any igraph function, like number of nodes in the graph:

   sapply(g, vcount)
 [1] 497 446 451 459 504 592 595 535 553 524 509 572 582 573 554 532 495 513 510
[20] 523 596 597 405 542 509 498 520 496 502 499 497 480 486 479 497 517 505 492
[39] 524 488 493 464 482 459 457 453 428 430 431 438 419 364 345 393
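
Since each sapply call returns one number per network, several calls can be combined into the kind of table of network statistics mentioned earlier. Here is one possible sketch, again on a toy list (nets and the column names are assumptions for illustration):

```r
library(igraph)

# Toy stand-in for a list of networks
nets <- list(make_ring(5), make_full_graph(4), make_ring(10))

# One row per network, one column per whole-network statistic
stats <- data.frame(
  network = seq_along(nets),
  nodes   = sapply(nets, vcount),
  edges   = sapply(nets, ecount),
  density = sapply(nets, edge_density)
)

stats
```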

We could also select a subset of elements inside the list. For instance, this counts the number of nodes for the first five years:

   sapply(g[1:5], vcount)
[1] 497 446 451 459 504

Or for years 2, 6, 8, and 12:

   sapply(g[c(2, 6, 8, 12)], vcount)
[1] 446 592 535 572

Note the single bracket notation here to refer to subsets of elements in the list. Inside the brackets we could put any arbitrary vector, as long as the numbers in the vector do not exceed the length of the list.
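
Single brackets also accept logical vectors and negative numbers, which is handy for picking networks by some property rather than by position. A small sketch on a toy list (nets is an assumption for illustration):

```r
library(igraph)

# Toy stand-in for a list of networks
nets <- list(make_ring(5), make_full_graph(4), make_ring(10), make_ring(8))

# Number of nodes in each network
n_nodes <- sapply(nets, vcount)

# Keep only the networks with more than five nodes (logical subsetting)
big <- nets[n_nodes > 5]
length(big)           # 2

# Drop the first network (negative index)
rest <- nets[-1]
length(rest)          # 3
```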

Of course, sometimes the functions we apply to elements of the list don’t return single numbers but vectors or other igraph objects. In that case, it would be better to use lapply, which is just like sapply but always returns another list with the set of answers inside it.
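
One reason to prefer lapply for vector-valued functions is that the shape of what sapply returns depends on the data. The sketch below, on toy graphs (an assumption for illustration), shows sapply returning a matrix when every network has the same number of nodes and silently falling back to a list when they differ, while lapply returns a list either way:

```r
library(igraph)

# Three toy networks that all have four nodes
nets <- list(make_ring(4),
             make_star(4, mode = "undirected"),
             make_full_graph(4))

deg_s <- sapply(nets, degree)   # simplified to a 4 x 3 matrix
deg_l <- lapply(nets, degree)   # always a list

is.matrix(deg_s)                # TRUE
is.list(deg_l)                  # TRUE

# Add a network with a different number of nodes
mixed <- c(nets, list(make_ring(6)))
is.list(sapply(mixed, degree))  # TRUE: sapply could not simplify
```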

For instance, let’s say we wanted the top five players for each year. In this deference network, a “top” player is one who beats many others, which means they have high indegree (lots of losers pointing at them).

First we create a custom function to compute the indegree and return an ordered named vector of top 5 players:

   # Return the five players with the highest indegree
   top5 <- function(x) {
      d <- degree(x, mode = "in")          # incoming ties from losers
      head(sort(d, decreasing = TRUE), 5)  # five largest, names kept
   }

Now, we can just feed that function to lapply:

   top.list <- lapply(g, top5)
   head(top.list)
[[1]]
   Arthur Ashe      Rod Laver Clark Graebner   Ken Rosewall      Tom Okker 
            33             27             25             23             22 

[[2]]
John Newcombe     Tom Okker     Rod Laver    Tony Roche   Arthur Ashe 
           45            41            40            40            33 

[[3]]
      Arthur Ashe      Cliff Richey         Rod Laver        Stan Smith 
               51                49                48                45 
Zeljko Franulovic 
               42 

[[4]]
     Ilie Nastase         Tom Okker     Marty Riessen        Stan Smith 
               69                63                61                61 
Zeljko Franulovic 
               60 

[[5]]
  Ilie Nastase     Stan Smith Manuel Orantes  Jimmy Connors    Arthur Ashe 
            99             72             68             65             55 

[[6]]
 Ilie Nastase     Tom Okker Jimmy Connors   Arthur Ashe    Stan Smith 
           96            81            68            63            63 

Which is a list of named vectors containing the number of victories of the top five players each year.
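
A list of named vectors like this can be collapsed into a compact table of names with one more sapply call. The sketch below is only an illustration: the two tiny “seasons”, the player labels A to D, and the helper top2 (a two-player version of top5) are all made up.

```r
library(igraph)

# Hypothetical helper: like top5, but keeps only the top two
top2 <- function(x) {
  head(sort(degree(x, mode = "in"), decreasing = TRUE), 2)
}

# Two made-up seasons; ties go from loser to winner
s1 <- graph_from_data_frame(data.frame(
  loser  = c("B", "C", "D", "C"),
  winner = c("A", "A", "A", "B")))
s2 <- graph_from_data_frame(data.frame(
  loser  = c("A", "B", "D", "A", "B"),
  winner = c("C", "C", "C", "D", "D")))

top_list <- lapply(list(s1, s2), top2)

# Pull out just the names: one column per season, one row per rank
top_names <- sapply(top_list, names)
top_names
```

Here top_names is a character matrix with ranks in the rows and seasons in the columns; wrapping it in t() would put seasons in the rows instead.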

Because the object top.list is just a list, we can subset it just like before. Let’s say we wanted to see the top players for more recent years:

   top.list[49:54]
[[1]]
   Andy Murray  Dominic Thiem  Kei Nishikori Novak Djokovic   David Goffin 
            63             55             53             50             47 

[[2]]
         Rafael Nadal          David Goffin      Alexander Zverev 
                   58                    55                    52 
Roberto Bautista Agut         Dominic Thiem 
                   45                    43 

[[3]]
   Dominic Thiem Alexander Zverev   Novak Djokovic    Fabio Fognini 
              51               50               46               45 
   Roger Federer 
              44 

[[4]]
   Daniil Medvedev     Novak Djokovic       Rafael Nadal Stefanos Tsitsipas 
                55                 52                 52                 49 
     Roger Federer 
                47 

[[5]]
     Andrey Rublev     Novak Djokovic Stefanos Tsitsipas       Rafael Nadal 
                40                 36                 27                 26 
   Daniil Medvedev 
                24 

[[6]]
   Daniil Medvedev Stefanos Tsitsipas        Casper Ruud   Alexander Zverev 
                54                 52                 52                 51 
    Novak Djokovic 
                49 

A series of names that will make sense to you if you follow tennis.

Naming Lists

Finally, sometimes it is useful to name the elements of a list. In this case, for instance, naming each network by its year makes it easier to remember what’s what. For this, you can use the names function, which works via standard R assignment:

   names(g) <- 1968:2021
   head(g)
$`1968`
IGRAPH 08a202a DNW- 497 1213 -- ATP Season 1968
+ attr: name (g/c), name (v/c), age (v/n), hand (v/c), country (v/c),
| surface (e/c), weight (e/n)
+ edges from 08a202a (vertex names):
 [1] U Unknown    ->Jose Mandarino     Alfredo Acuna->Alan Fox          
 [3] Andres Gimeno->Ken Rosewall       Andres Gimeno->Raymond Moore     
 [5] Juan Gisbert ->Tom Okker          Juan Gisbert ->Zeljko Franulovic 
 [7] Onny Parun   ->Jan Kodes          Peter Curtis ->Tom Okker         
 [9] Premjit Lall ->Clark Graebner     Rod Laver    ->Ken Rosewall      
[11] Thomas Lejus ->Nicola Pietrangeli Tom Okker    ->Arthur Ashe       
[13] U Unknown    ->Jaidip Mukherjea   U Unknown    ->Jose Luis Arillla 
+ ... omitted several edges

$`1969`
IGRAPH c895c57 DNW- 446 1418 -- ATP Season 1969
+ attr: name (g/c), name (v/c), age (v/n), hand (v/c), country (v/c),
| surface (e/c), weight (e/n)
+ edges from c895c57 (vertex names):
 [1] U Unknown           ->U Unknown        
 [2] Alejandro Olmedo    ->Ron Holmberg     
 [3] Arthur Ashe         ->Rod Laver        
 [4] Bob Carmichael      ->Jim Osborne      
 [5] Cliff Richey        ->Zeljko Franulovic
 [6] Francois Jauffret   ->Martin Mulligan  
 [7] Fred Stolle         ->John Newcombe    
+ ... omitted several edges

$`1970`
IGRAPH d6709af DNW- 451 1650 -- ATP Season 1970
+ attr: name (g/c), name (v/c), age (v/n), hand (v/c), country (v/c),
| surface (e/c), weight (e/n)
+ edges from d6709af (vertex names):
 [1] Mark Cox            ->Jan Kodes        
 [2] Charlie Pasarell    ->Stan Smith       
 [3] Cliff Richey        ->Arthur Ashe      
 [4] Francois Jauffret   ->Manuel Orantes   
 [5] Georges Goven       ->Jan Kodes        
 [6] Harald Elschenbroich->Zeljko Franulovic
 [7] Ilie Nastase        ->Zeljko Franulovic
+ ... omitted several edges

$`1971`
IGRAPH a73a020 DNW- 459 2580 -- ATP Season 1971
+ attr: name (g/c), name (v/c), age (v/n), hand (v/c), country (v/c),
| surface (e/c), weight (e/n)
+ edges from a73a020 (vertex names):
 [1] Andres Gimeno    ->Ken Rosewall   Arthur Ashe      ->Rod Laver     
 [3] Charlie Pasarell ->Cliff Drysdale Frank Froehling  ->Clark Graebner
 [5] Joaquin Loyo Mayo->Thomaz Koch    John Alexander   ->John Newcombe 
 [7] John Newcombe    ->Marty Riessen  Nikola Pilic     ->Cliff Drysdale
 [9] Owen Davidson    ->Cliff Drysdale Robert Maud      ->Cliff Drysdale
[11] Roger Taylor     ->Marty Riessen  Roy Emerson      ->Rod Laver     
[13] Tom Okker        ->John Newcombe  Allan Stone      ->Bob Carmichael
+ ... omitted several edges

$`1972`
IGRAPH 761ed05 DNW- 504 2767 -- ATP Season 1972
+ attr: name (g/c), name (v/c), age (v/n), hand (v/c), country (v/c),
| surface (e/c), weight (e/n)
+ edges from 761ed05 (vertex names):
 [1] Jean Loup Rouyer->Stan Smith     Marty Riessen   ->Cliff Drysdale
 [3] Roy Emerson     ->Arthur Ashe    Roy Emerson     ->Arthur Ashe   
 [5] Tom Leonard     ->John Newcombe  Tom Okker       ->Arthur Ashe   
 [7] Tom Okker       ->Rod Laver      Adriano Panatta ->Andres Gimeno 
 [9] Adriano Panatta ->Ilie Nastase   Allan Stone     ->John Alexander
[11] Allan Stone     ->Marty Riessen  Andres Gimeno   ->Jan Kodes     
[13] Andres Gimeno   ->Stan Smith     Andrew Pattison ->Ilie Nastase  
+ ... omitted several edges

$`1973`
IGRAPH 92ff576 DNW- 592 3653 -- ATP Season 1973
+ attr: name (g/c), name (v/c), age (v/n), hand (v/c), country (v/c),
| surface (e/c), weight (e/n)
+ edges from 92ff576 (vertex names):
 [1] Harold Solomon    ->Stan Smith       John Alexander    ->Stan Smith      
 [3] Patrice Dominguez ->Paolo Bertolucci Paul Gerken       ->Jimmy Connors   
 [5] Roy Emerson       ->Rod Laver        Adriano Panatta   ->Ilie Nastase    
 [7] Bjorn Borg        ->Adriano Panatta  Brian Gottfried   ->Cliff Richey    
 [9] Charlie Pasarell  ->John Alexander   Cliff Richey      ->Stan Smith      
[11] Corrado Barazzutti->Bjorn Borg       Francois Jauffret ->Ilie Nastase    
[13] Georges Goven     ->Manuel Orantes   Manuel Orantes    ->Ilie Nastase    
+ ... omitted several edges

Now, instead of the useless one, two, three, etc. labels, we have the actual year numbers as the names of the elements in the list.
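
A nice side effect of naming the list is that sapply carries the names over to its output, so each statistic comes labeled with its year. A small sketch on a toy list (nets and the years 2001 to 2003 are assumptions for illustration):

```r
library(igraph)

# Toy stand-in for a list of networks, named by made-up years
nets <- list(make_ring(5), make_full_graph(4), make_ring(10))
names(nets) <- 2001:2003

# The numeric years are stored as character strings
names(nets)            # "2001" "2002" "2003"

# sapply keeps the names, so the result is a named vector
dens <- sapply(nets, edge_density)
names(dens)            # "2001" "2002" "2003"

# Look up one year by name
dens["2002"]
```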

So if we wanted to know the top five players for 1988 we could just type:

   top5(g[["1988"]])
   Stefan Edberg     Andre Agassi     Boris Becker    Mats Wilander 
              63               59               52               49 
Aaron Krickstein 
              48 

Note the double bracket notation and the fact that the name of the list element is a character string, not a number (hence the quotation marks).

If we don’t want to remember the bracket business, we could also use the $ operator to refer to particular list elements:

   top5(g$"1988")
   Stefan Edberg     Andre Agassi     Boris Becker    Mats Wilander 
              63               59               52               49 
Aaron Krickstein 
              48 
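
Because the names are character strings, it matters whether we index with "1988" or with 1988. The sketch below uses a toy list of labels rather than graphs (seasons and its contents are assumptions for illustration) to show the equivalent ways of getting an element by name, and what happens when the quotes are forgotten:

```r
# Toy named list; the elements are placeholders, not graphs
seasons <- list("first", "second", "third")
names(seasons) <- 1988:1990

seasons[["1988"]]   # by name, double brackets
seasons$"1988"      # by name, $ with quotes
seasons$`1988`      # by name, $ with backticks
seasons[[1]]        # by position: the same element

# Without quotes, 1988 is read as a position, and this
# three-element list has no 1988th element, so R throws an error
res <- tryCatch(seasons[[1988]], error = function(e) "error")
res                 # "error"
```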

Of course, we can also use the names to subset the list. Let’s say we wanted the top five players for 1970, 1980, 1990, 2000, 2010, and 2020.

All we have to do is type:

   decades <- c("1970", "1980", "1990", "2000", "2010", "2020")
   lapply(g[decades], top5)
$`1970`
      Arthur Ashe      Cliff Richey         Rod Laver        Stan Smith 
               51                49                48                45 
Zeljko Franulovic 
               42 

$`1980`
     Ivan Lendl    John Mcenroe Brian Gottfried      Bjorn Borg Eliot Teltscher 
             97              76              63              62              62 

$`1990`
  Boris Becker  Stefan Edberg     Ivan Lendl   Pete Sampras Emilio Sanchez 
            62             57             50             47             44 

$`2000`
Yevgeny Kafelnikov        Marat Safin    Gustavo Kuerten      Magnus Norman 
                63                 61                 59                 58 
    Lleyton Hewitt 
                53 

$`2010`
   Rafael Nadal   Roger Federer    David Ferrer Robin Soderling   Jurgen Melzer 
             63              54              53              53              51 

$`2020`
     Andrey Rublev     Novak Djokovic Stefanos Tsitsipas       Rafael Nadal 
                40                 36                 27                 26 
   Daniil Medvedev 
                24 

Note that we are back to the single bracket notation.

With a bit of practice, lists will become your friends!

References

Radicchi, Filippo. 2011. “Who Is the Best Player Ever? A Complex Network Analysis of the History of Professional Tennis.” PLoS ONE 6 (2): e17249.