Basics of the igraph Package

with tags igraph -

There are multiple packages for the analysis of networks in R. This page concentrates on the igraph package, which allows for a broad range of applications. But before we get into it in more detail, it is useful to know that there are two possible ways to represent the edges, i.e. the connections, of a network:

  • Adjacency matrix: This is a square matrix, where each row and column corresponds to an entity. If two entities are conencted, the respective field in the matrix takes the value one and zero otherwise.
  • A list of connections: In its most simple form this is a list, where each edge of a network is represented by a row with one entity in the first column and the other in the second. For the purpose of this post this is our preferred representation.

Create an artificial graph

We can illustrate these two representations by looking at an artificial network. Such a network could be generated with certain functions of the igraph package. However, the following code only uses base functionalities of R. It results in a data frame with the names of the connected entities in the first and second row. The third row contains a random indicator for the strength of the connection. It is based on the square of a value from a standard normal distribution. To reduce the number of resulting edges, only values above a certain threshold are kept. Also, the code excludes the connections a node has with itself.

# Set seed for reproducibility
set.seed(12345)

# Generate data
raw <- data.frame(byer = rep(letters, 1, each = 26),
                  sllr = rep(letters, 26),
                  con = rnorm(26 * 26)^2) # Calculate arbitrary weights

# Drop connections with oneself
raw <- raw[raw$byer != raw$sllr,]

# Only keep strong connections
raw <- raw[abs(raw$con) > 1,]

# Reformat row numbers
rownames(raw) <- NULL

# Look at result
head(raw)
##   byer sllr      con
## 1    a    f 3.304964
## 2    a    l 3.302623
## 3    a    s 1.255997
## 4    a    v 2.119310
## 5    a    x 2.412236
## 6    a    y 2.552676

Basically, this is a list repesentation of the artifical network. In a next step we convert the data frame to an igraph object.

igraph objects

# Load package
library(igraph)

The transformation is straightforward and can be done with the graph_from_data_frame function, where for simplicity we set the argument directed = FALSE to indicated that the network is not directed. Note that it is important that the first and second column of the data frame contain the names of the nodes that have a connection.

graph_df <- graph_from_data_frame(raw, directed = FALSE)

The function simplify can be used to get rid of loops and multiple edges:

graph_df <- simplify(graph_df)

Now take a look at the object:

graph_df
## IGRAPH 2e4c8fd UN-- 26 173 -- 
## + attr: name (v/c)
## + edges from 2e4c8fd (vertex names):
##   [1] a--f a--g a--l a--m a--s a--v a--x a--y a--z b--d b--f b--g b--h b--k b--l
##  [16] b--m b--o b--p b--q b--r b--t b--u b--v b--w b--x b--z c--g c--h c--j c--l
##  [31] c--m c--n c--p c--q c--r c--s c--u c--v c--w c--x c--z d--e d--f d--h d--o
##  [46] d--q d--t d--x d--z e--f e--g e--h e--i e--j e--k e--o e--q e--r e--s e--t
##  [61] e--v e--w e--x e--y e--z f--g f--i f--k f--l f--m f--n f--q f--r f--u f--z
##  [76] g--l g--m g--p g--s g--t g--u g--w g--x g--y g--z h--m h--o h--r h--s h--t
##  [91] h--v h--x h--y h--z i--j i--k i--n i--q i--u i--v i--y i--z j--m j--p j--r
## [106] j--t j--u j--v j--w j--y j--z k--l k--m k--n k--p k--q k--t k--u k--v k--x
## + ... omitted several edges

It does not look like the data frame list. However, it can be seen that each element of the list of edges refers to a conenction between to entities.

Let’s compare this list to the adjacency matrix representation. It can be obtained directly from the igraph object by using the as_adj function.

graph_adj <- as_adj(graph_df)

Now look at the first six rows and columns of the resulting object.

graph_adj[1:6, 1:6]
## 6 x 6 sparse Matrix of class "dgCMatrix"
##   a b c d e f
## a . . . . . 1
## b . . . 1 . 1
## c . . . . . .
## d . 1 . . 1 1
## e . . . 1 . 1
## f 1 1 . 1 1 .

An existing connection between two nodes is indicated by the value one. If there is no connection between them, the field contains a dot. For example, the (undirected) connection between a and f is described by the 1 in the sixth value of the first column and the sixth value in the first row. This corresponds to the first entry in the list representation of the above igraph object. If the network were directed, only one field would contain a 1 - unless the connection was mutual.

Note that the resulting object is not an object of class igraph. as_adj produces a dgCMatrix object, which comes with the Matrix package and was specifically designed to implement so-called sparse matrices. Sparse matrices are simply matrices which contain a lot of zeros. By only considereing non-zero values, sparse matrices allow for a lot of additional computational efficiency, which is beneficial for the analysis of very large networks.

Now that we are familiar with the basic structure of igraph objects we can proceed with the calculation of basic network summary statistics.

References

Csardi G., & Nepusz, T. (2006). The igraph software package for complex network research, InterJournal Complex Systems, 1695.