Going viral with R’s igraph package

R’s igraph package provides a simple and flexible set of utilities for working with graphs.  In this post, we’ll use this package to animate the simulated spread of a disease through a network.

Graphs

A graph is just a collection of nodes joined by edges.

In graph models, the nodes of a graph typically represent entities and the edges represent relationships between these entities. Both nodes and edges may have attributes that characterize the entities and the relationships. A node might have an attribute of “color”. An edge might have an attribute of “weight” that encodes the strength of the relationship between the nodes that the edge joins. The igraph package makes it very simple to assign attributes to the components of a graph.
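As a minimal sketch of what attribute assignment can look like, the snippet below builds a small four-node graph and attaches a “color” to each node and a “weight” to each edge. The graph, the colors, and the weights are illustrative choices, not values from this post.

```r
library(igraph)

# Build a small undirected graph from an edge list (vertex ids 1..4).
# The shape of the graph is an arbitrary example.
g <- make_graph(edges = c(1, 2, 2, 3, 3, 4, 4, 1), directed = FALSE)

# Node attributes are set and read through V().
V(g)$color <- c("red", "white", "white", "white")

# Edge attributes work the same way through E().
E(g)$weight <- c(0.5, 1.0, 2.0, 1.5)

# Update the attribute of a single node.
V(g)$color[2] <- "red"

vertex_attr(g, "color")
edge_attr(g, "weight")
```

Because attributes live on the graph object itself, a copy of the graph carries its full state with it, which is convenient when a simulation needs to record one graph per time step.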

Building a simulation

The ease and flexibility with which you can assign and update attributes of a graph’s nodes and edges makes igraph a powerful tool for prototyping simulations with graphs. Let’s consider a very simplified example. Let G be a graph whose nodes are people in a population. Two nodes in G will share an edge if they have daily in-person contact with each other. Suppose also that a virus has infected a few nodes of G. These infected nodes can spread the virus to other uninfected nodes with whom they have daily contact. Newly infected nodes will then spread the virus to other nodes, and so on. Let’s use the simple probability model:

    \[p(x \text{ infects } y \mid x \text{ is infected},\ y \text{ is uninfected},\ x \text{ and } y \text{ share an edge}) = \frac{1}{2}.\]
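As a worked consequence, suppose each infected neighbor makes one independent infection attempt per time step. This independence is an assumption; the model above fixes only the per-edge probability. An uninfected node y with k infected neighbors then becomes infected in that step with probability

    \[p(y \text{ becomes infected} \mid k \text{ infected neighbors}) = 1 - \left(\frac{1}{2}\right)^{k},\]

so one infected neighbor gives 1/2, two give 3/4, and three give 7/8.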

Here is a function that implements this simulation model:
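A minimal sketch of how such a function might look is given below. It is a reconstruction under stated assumptions, not necessarily the original implementation: infection status is stored in a logical node attribute called infected, updates are synchronous (nodes infected in a step only start spreading in the next one), and the arguments p and max_steps are hypothetical.

```r
library(igraph)

# Sketch only: `infected` (logical node attribute), `p` and `max_steps`
# are assumptions made for this illustration.
spreadVirus <- function(G, p = 0.5, max_steps = 100) {
  infected <- V(G)$infected
  states <- list(G)                       # state 1 is the initial graph
  for (step in seq_len(max_steps)) {
    newly <- integer(0)
    trials <- 0
    for (x in which(infected)) {
      for (y in as.integer(neighbors(G, x, mode = "out"))) {
        if (!infected[y]) {
          # One independent trial per edge from an infected node
          # to an uninfected node.
          trials <- trials + 1
          if (runif(1) < p) newly <- c(newly, y)
        }
      }
    }
    if (trials == 0) break                # nothing left that can be infected
    infected[newly] <- TRUE
    V(G)$infected <- infected
    V(G)$color <- ifelse(infected, "red", "white")
    states[[length(states) + 1]] <- G     # record the state after this step
  }
  states
}

# Usage on a small directed path 1 -> 2 -> 3 -> 4 with node 1 infected.
set.seed(1)
G <- make_graph(edges = c(1, 2, 2, 3, 3, 4), directed = TRUE)
V(G)$infected <- c(TRUE, FALSE, FALSE, FALSE)
states <- spreadVirus(G)
sapply(states, function(s) sum(V(s)$infected))  # infected count per state
```

Storing a full copy of the graph per step is wasteful but keeps every state ready for plotting, which suits a prototype.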

The function spreadVirus returns an ordered list of graphs representing the discrete states of the model as the virus spreads. We’ll demonstrate the use of this function on a simple directed graph. A directed graph is a graph where edges are “one-way”, so a directed edge from node X to node Y only allows information to pass from X to Y and not from Y to X.
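To make the “one-way” property concrete, here is a small sketch on a three-node directed path. The node names X, Y, and Z are illustrative.

```r
library(igraph)

# A directed path X -> Y -> Z; in graph_from_literal, "-+" denotes
# an edge pointing to the right.
D <- graph_from_literal(X -+ Y, Y -+ Z)

# Nodes that Y can pass information to.
out_nb <- neighbors(D, "Y", mode = "out")$name

# Nodes that can pass information to Y.
in_nb <- neighbors(D, "Y", mode = "in")$name

out_nb
in_nb
```

In a virus simulation on a directed graph, only the outgoing neighbors of an infected node are at risk from that node.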

This is a very simplistic model. It does not, for example, take into account time-dependent effects in the spread of the virus. Thanks to the flexibility of igraph, however, it is not difficult to extend a basic simulation like spreadVirus to include much more complex effects.

We should note, however, that although this implementation is very easy to understand, it is not very efficient. The heavy use of loops will be quite slow when we start working with large graphs. Though we might gain some efficiency through vectorization or other optimizations, for running larger simulations we would probably want to port this routine to a compiled language such as C++. Nevertheless, the strength of R’s igraph for quickly building and testing a prototype should be apparent from this example.
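As a rough illustration of the vectorization idea, the sketch below performs a single infection step with no explicit loops by working on the edge list. The helper name stepVectorized and its arguments are hypothetical, and it assumes one independent trial per edge from an infected node to an uninfected node.

```r
library(igraph)

# Hypothetical helper: one synchronous infection step, vectorized over edges.
# `infected` is a logical vector with one entry per node.
stepVectorized <- function(G, infected, p = 0.5) {
  el <- as_edgelist(G, names = FALSE)            # one row per edge: from, to
  # An undirected edge can transmit in both directions.
  if (!is_directed(G)) el <- rbind(el, el[, 2:1, drop = FALSE])
  live <- infected[el[, 1]] & !infected[el[, 2]] # infected -> uninfected edges
  hit <- live & (runif(nrow(el)) < p)            # one trial per live edge
  infected[el[hit, 2]] <- TRUE
  infected
}

# Usage on a directed path 1 -> 2 -> 3; p = 1 and p = 0 make the step deterministic.
P <- make_graph(edges = c(1, 2, 2, 3), directed = TRUE)
always <- stepVectorized(P, c(TRUE, FALSE, FALSE), p = 1)
never <- stepVectorized(P, c(TRUE, FALSE, FALSE), p = 0)
```

This trades the nested loops for a handful of vector operations per step, although it draws a random number for every edge rather than only for the live ones.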

More realistic graphs

So far, the graphs we have worked with have been quite simple. How do we start building more complex graphs? There are many great sources of real graph data, like Stanford’s SNAP dataset collection. There are also many routines for generating random graphs in the igraph package. The degree.sequence.game function (named sample_degseq in more recent igraph releases) is one such example. The degree of a node in a graph is the number of edges attached to that node. One way to study graphs is to look at the distribution of the degrees of the nodes in the graph. For many interesting graphs, the degree distribution has the form of an exponential distribution:

    \[p(\mathrm{degree}(\mathrm{node}) = N ) \propto \exp\left(-\tau N \right),\]

see, for example, this Wikipedia article, which discusses the degree distribution of all web links. As suggested in the igraph manual, we can generate a larger random graph with an exponential degree distribution as follows:
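A sketch along those lines is shown below. It uses sample_degseq with its default method, which may produce loops and multiple edges. The number of nodes (100), the value of tau (0.5), and the seed are illustrative choices.

```r
library(igraph)

set.seed(42)  # arbitrary seed, for reproducibility only

# Draw 100 degrees from 1..100 with probability proportional to exp(-tau * N).
# tau = 0.5 is an illustrative value.
tau <- 0.5
degs <- sample(1:100, 100, replace = TRUE, prob = exp(-tau * (1:100)))

# The degrees must sum to an even number, since each edge adds 2 to the sum.
if (sum(degs) %% 2 != 0) degs[1] <- degs[1] + 1

# Generate a random graph with exactly this degree sequence.
G <- sample_degseq(degs)

all(degree(G) == degs)
```

Most of the probability mass sits on small degrees, so the resulting graph consists mainly of low-degree nodes with a few better-connected ones.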

Here’s an example simulation on this new graph:
