12 Introduction to Matrices
The lessons so far have been dedicated to exploring the various ways social networks can be represented in graph form, and to computing metrics directly from graphs. This lesson will focus on using a matrix as a second way of representing networks. More accurately, as noted in sec-networks, matrices are a useful way to represent graphs quantitatively, which, in their turn, represent networks (flashback to the three-step network shuffle).
Networks represented as graphs feel quite intuitive to us. Actors are points (nodes) and links or interactions between actors are lines (edges). We feel as if we can get a good sense of the network by looking at the pictorial network diagram, with points representing actors and lines representing pairwise links between actors.
Networks represented as a matrix, by contrast, are a bit more abstract, but they are far more mathematically useful. With many tens, hundreds, thousands, or millions of nodes and edges, drawing a graph from that much data only results in what network analysts call a “hairball.” Nothing can be understood by looking at such a picture.
12.1 Matrices
Thus, switching to representing social networks as a matrix provides us with more analytic leverage. This was a brilliant idea that first occurred to Elaine Forsyth and Leo Katz in the mid 1940s (Forsyth and Katz 1946). When we represent the network as a matrix, we are able to efficiently calculate features of the network that we would not be able to estimate via “eyeballing.”
What is a matrix?1 A matrix is, quite simply, a set of attributes that represent the values of a particular case. Breaking that explanation down, we can imagine a matrix like the one in Table tbl-genmatrix. This common type of matrix, which we will refer to as an attribute-value matrix, looks a lot like a spreadsheet. Well, that is because a spreadsheet is a matrix!
|        | Attribute 1 | Attribute 2 | Attribute 3 |
|--------|-------------|-------------|-------------|
| Case 1 | Value 1     | Value 4     | Value 7     |
| Case 2 | Value 2     | Value 5     | Value 8     |
| Case 3 | Value 3     | Value 6     | Value 9     |
| Case 4 | Value 10    | Value 11    | Value 12    |
| Case 5 | Value 13    | Value 14    | Value 15    |
The most important feature of a matrix is thus its organization into rows and columns. The number of rows and the number of columns define the dimensions of the matrix (just as the length and the width of your apartment define its dimensions in space). So when we say that a matrix is 5 \(\times\) 3, we mean that it has five rows and three columns. When referring to the dimensions of a matrix, the rows always come first, and the columns always come second. So the more general way of saying this is that the dimensions of a matrix are R \(\times\) C, where R is the number of rows and C is the number of columns.
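As an illustration, here is a minimal sketch (ours, not part of the lesson) of how the 5 \(\times\) 3 attribute-value matrix in Table tbl-genmatrix could be stored in Python with NumPy; the variable name B and the use of NumPy are our own assumptions.

```python
import numpy as np

# A 5 x 3 attribute-value matrix: 5 cases (rows) by 3 attributes (columns),
# filled with the placeholder "values" from Table tbl-genmatrix.
B = np.array([
    [1,  4,  7],   # Case 1
    [2,  5,  8],   # Case 2
    [3,  6,  9],   # Case 3
    [10, 11, 12],  # Case 4
    [13, 14, 15],  # Case 5
])

# .shape reports (rows, columns), i.e., R x C: rows first, columns second.
print(B.shape)  # (5, 3)
```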
The intersection of a particular row (say, row 2 in Table tbl-genmatrix) and a particular column (say, column 3 in Table tbl-genmatrix) defines a cell in the matrix. So when referring to a particular value in Table tbl-genmatrix, we would speak of the \(ij^{th}\) cell in the matrix (or \(c_{ij}\)), where c is a generic stand-in for the value of a given cell, i is a generic stand-in for a given row, and j is a generic stand-in for a given column. We refer to i as the matrix row index and j as the matrix column index.
Typically, we give matrices names using boldfaced capital letters, so if we call the matrix shown in Table tbl-genmatrix, matrix B, then we can refer to a specific cell in the matrix using the notation b\(_{ij}\) (note the lowercase), which says “the cell corresponding to row i and column j of the B matrix.”
Thus, in Table tbl-genmatrix, cell b\(_{32}\) refers to the intersection of row 3 (representing case 3) and column 2 (representing attribute 2), where we can find value 6. For instance, let’s say cases are people and attributes are information we collected on each person (e.g., by surveying them), like their age, gender identity, racial identification, and so forth. Thus, if attribute 2 in Table tbl-genmatrix was age, and case 3 was a person, then value 6 would record that person’s age (e.g., 54 years old).
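To make the indexing concrete, here is a short sketch of looking up cell b\(_{32}\) in code (again ours, assuming NumPy; the matrix definition is repeated so the snippet runs on its own). Note that Python counts rows and columns from 0, while the matrix notation in the text counts from 1.

```python
import numpy as np

# The same 5 x 3 attribute-value matrix B from the sketch above.
B = np.array([
    [1,  4,  7],
    [2,  5,  8],
    [3,  6,  9],
    [10, 11, 12],
    [13, 14, 15],
])

# b_32 is row 3, column 2 in matrix notation; NumPy indexes from 0,
# so that is B[2, 1] in code.
print(B[2, 1])  # 6 -- e.g., that person's age, if attribute 2 were age
```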
12.1.1 Relationship Matrices
We do not generally use attribute-value matrices to represent networks. Instead, we typically use a particular type of matrix called a relationship matrix. In a relationship matrix, instead of asking what value a case has on an attribute, we ask about the value describing how a case relates to other cases. If attribute-value matrices relate cases to attributes, then relationship matrices relate cases to one another (which is precisely the idea behind a “network”).
To do that, we put the same list of cases on both the rows and the columns of the matrix. Thus, we create a matrix with the organizational properties shown in Table tbl-relmatrix.
|        | Case 1    | Case 2    | Case 3    |
|--------|-----------|-----------|-----------|
| Case 1 | *Value 1* | Value 2   | Value 3   |
| Case 2 | Value 4   | *Value 5* | Value 6   |
| Case 3 | Value 7   | Value 8   | *Value 9* |
A relationship matrix thus captures exactly that: the relationship between two cases, as shown in Table tbl-relmatrix. Each cell, as the intersection of two cases (the row case and the column case), gives us the value of the relationship between those cases. This value could be “friends” (if the two people are friends) or “not friends” (if they are not friends). The value could also be the strength of the relationship. For instance, each cell could contain the number of times a given case (e.g., a person) messaged another case.
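For example, here is a toy sketch (ours, with made-up message counts, again assuming NumPy) of a relationship matrix in which each cell records how many times the row person messaged the column person.

```python
import numpy as np

# A relationship matrix for three people: cell (i, j) holds the number of
# times the row person messaged the column person (invented counts).
messages = np.array([
    [0, 5, 2],   # messages sent by person 1
    [3, 0, 0],   # messages sent by person 2
    [1, 4, 0],   # messages sent by person 3
])

# Person 1 messaged person 2 five times; person 2 messaged person 1 three times.
print(messages[0, 1], messages[1, 0])  # 5 3
```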
Relationship matrices are different from attribute-value matrices, in that the latter are typically rectangular matrices. In a rectangular matrix, the number of rows (e.g., people) can be different from the number of columns (e.g., attributes). For instance, the typical attribute-value matrix used by social scientists who collect survey data is rectangular, containing many more cases (rows) than attributes (columns). Some networks, like two-mode networks represented as bipartite graphs, are best studied using rectangular matrices.
Relationship matrices have some unique attributes. For instance, all relationship matrices are square matrices. A square matrix is one that has the same number of rows and columns: \(R = C\). So the relationship matrix shown in Table tbl-relmatrix is \(3 \times 3\). A square matrix with n rows (and thus the same number of columns) is said to be a matrix of order n.
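The rectangular-versus-square distinction is easy to check from a matrix’s shape. A minimal sketch (ours), using empty placeholder matrices:

```python
import numpy as np

# Attribute-value matrices are typically rectangular (here 5 cases x 3 attributes),
# while relationship matrices are square (here order 3).
attribute_value = np.zeros((5, 3))
relationship = np.zeros((3, 3))

print(attribute_value.shape[0] == attribute_value.shape[1])  # False: rectangular
print(relationship.shape[0] == relationship.shape[1])        # True: square, of order 3
```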
12.1.2 Diagonal versus off-diagonal cells
In a relationship matrix, we need to distinguish between two types of cells. First, there are the cells that fall along the main diagonal, an imaginary line that runs from the uppermost left corner to the lowermost right corner; these are called diagonal cells, and their values are shown in italics in Table tbl-relmatrix. So if we name the matrix in Table tbl-relmatrix matrix A, then we can see that any cell a\(_{ij}\) in which i = j falls along the main diagonal; these are Values 1, 5, and 9 in Table tbl-relmatrix. Every other cell, in which i \(\neq\) j, is an off-diagonal cell.2
In reference to the main diagonal, off-diagonal cells are said to be above the main diagonal if the row index for that cell is smaller than the column index (e.g., a\(_{i < j}\)). So in Table tbl-relmatrix, values 2, 3, and 6, corresponding to cells a\(_{12}\), a\(_{13}\), and a\(_{23}\), respectively, are above the main diagonal. In the same way, cells in which the row index is larger than the column index are said to be below the main diagonal (e.g., a\(_{i > j}\)). So in Table tbl-relmatrix, values 4, 7, and 8, corresponding to cells a\(_{21}\), a\(_{31}\), and a\(_{32}\), respectively, are below the main diagonal.
Note that in a square matrix, the values above and below the main diagonal have a “triangular” arrangement. Accordingly, sometimes we refer to these areas of a square matrix as the upper and lower triangles.
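As a sketch of these ideas in code (ours, assuming NumPy), `np.diag`, `np.triu`, and `np.tril` pull out the main diagonal, the upper triangle, and the lower triangle of a square matrix:

```python
import numpy as np

# A 3 x 3 relationship matrix laid out like Table tbl-relmatrix,
# using the placeholder values 1 through 9.
A = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
])

print(np.diag(A))        # [1 5 9] -- the diagonal cells, where i = j
print(np.triu(A, k=1))   # upper triangle: cells above the diagonal (i < j); rest zeroed
print(np.tril(A, k=-1))  # lower triangle: cells below the diagonal (i > j); rest zeroed
```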
Note also that if the relationship matrix represents the relationships between cases, and the cases are people in a social network, then the diagonal cells in a relationship matrix represent the relationship of people with themselves! Now, if you have seen M. Night Shyamalan movies about people with split personalities, you know it is quite possible for people to have a rich set of relationships with themselves. Some of these may even form a social network inside a single head (Martin 2017). But we are not psychiatrists, so we are interested primarily in interpersonal, not intrapersonal, relations.
|        | Case 1  | Case 2  | Case 3  |
|--------|---------|---------|---------|
| Case 1 | –       | Value 1 | Value 2 |
| Case 2 | Value 3 | –       | Value 4 |
| Case 3 | Value 5 | Value 6 | –       |
This means that most of the time, we can ignore the diagonal cells in relationship matrices and rewrite them as in Table tbl-relmatrixnd, in which values appear only for the off-diagonal cells. So here we can see that the relationship between Case 1 and Case 2 is Value 1, and the relationship between Case 2 and Case 1 is Value 3. Wait, would that mean Values 1 and 3 are the same? The answer is maybe. It depends on what type of network tie is being captured, as discussed in the lesson on graph theory. If the tie is symmetric (and thus represented in an undirected graph), then the values have to be the same. But if the tie is asymmetric (and thus represented in a directed graph), then they don’t have to be.
By convention, in a relationship matrix, we say that the case located in row i sends (a tie) to the case located in column j. So if the relationship matrix captures friendship, we might say that i considers j to be a friend (sends the consideration), and so if i is Case 1 (row 1) and j is Case 2 (column 2), that would be Value 1 (e.g., “Are we friends?” Value 1 = Yes/No). But when i is Case 2 (row 2) and j is Case 1 (column 1), we are now asking whether Case 2 considers Case 1 to be a friend (e.g., “Are we friends?” Value 3 = Yes/No). If friendship is treated as an asymmetric tie, then the two answers need not agree. For instance, Case 2 can rebuff Case 1’s friendship offer.
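To see this asymmetry concretely, here is a toy sketch (ours, not from the lesson, assuming NumPy) of a directed friendship matrix in which a cell is 1 if the row person considers the column person a friend, and 0 otherwise.

```python
import numpy as np

# Directed (asymmetric) friendship ties among three people:
# friends[i, j] = 1 means the row person i considers the column person j a friend.
friends = np.array([
    [0, 1, 1],   # person 1 names persons 2 and 3 as friends
    [0, 0, 1],   # person 2 names only person 3
    [1, 1, 0],   # person 3 names persons 1 and 2
])

# Person 1 considers person 2 a friend, but not the other way around.
print(friends[0, 1], friends[1, 0])  # 1 0
```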
Note that if the tie we recorded in a relationship matrix is symmetric, we can simplify the presentation even further. The reason is that, as already noted, if a relationship is symmetric, then the value of the tie that i sends to j is necessarily the same as the value of the tie that j sends to i. This means that, in the relationship matrix, the value of cell a\(_{ij}\) has to be the same as the value of cell a\(_{ji}\) for all rows i and columns j in the matrix. This yields a symmetric relationship matrix, like the one shown in Table tbl-relmatrixndsymm.
|        | Case 1  | Case 2  | Case 3  |
|--------|---------|---------|---------|
| Case 1 | –       | Value 1 | Value 2 |
| Case 2 | Value 1 | –       | Value 3 |
| Case 3 | Value 2 | Value 3 | –       |
Note that a symmetric relationship matrix is simpler than its asymmetric counterpart, because now we only have to worry about half of the values. So before, in Table tbl-relmatrixnd, we had to worry about six distinct relationship values, but now we only have to worry about three. This means that, in a symmetric matrix, all the network information we need to look at is contained in either the lower triangle or the upper triangle. As we will see, in many applications, we can ignore one of the triangles altogether!
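Here is a short sketch (ours, assuming NumPy) of checking symmetry and pulling out just the upper-triangle values, which carry all the information in a symmetric relationship matrix; the zeros on the diagonal stand in for the ignored self-ties.

```python
import numpy as np

# A symmetric relationship matrix laid out like Table tbl-relmatrixndsymm,
# with zeros on the ignored diagonal and placeholder values 1-3 elsewhere.
S = np.array([
    [0, 1, 2],
    [1, 0, 3],
    [2, 3, 0],
])

# Symmetric means S equals its own transpose: s_ij == s_ji for all i and j.
print(np.array_equal(S, S.T))  # True

# All the distinct tie values live in one triangle; here, the upper one.
rows, cols = np.triu_indices_from(S, k=1)
print(S[rows, cols])  # [1 2 3]
```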
There are many types of relationship and attribute-value matrices, as the basic principles just stated can be varied to capture different underlying facets of relationships. Subsequent lessons will cover the various ways different aspects of networks can best be captured in matrix form and then manipulated to produce sociologically meaningful results.
Footnotes

1. Note that the answer to this question is simpler than the more profound: “What is the Matrix?” https://www.youtube.com/watch?v=O5b0ZxUWNf0
2. The mathematical symbol for “does not equal” is \(\neq\).