Following up on the string of posts about software for network analysis, I recently taught a workshop for PhD students in the social sciences here at Stanford on using Python for network analysis.  My session was part of a three-day series of workshops introducing computational social science to students who are looking to get their feet wet.  I’m posting a link (here) to the page on my website where you can download the materials I developed to teach the workshop, including commented scripts, sample datasets, and a few slides.

Some brief impressions:  I’ve taught stats/methods for grad students before, but this was a different beast.  Computational social science and network analysis are attractive areas for many grad students here, but without a ‘canon’ of some type to fall back on, it’s hard to know what to emphasize for students with little background.  I ended up focusing more on basic data and control structures in Python, which I thought would be more useful for understanding the way the networkx package handles inputs and outputs.  I’m not sure that was the most effective approach, though–at least in terms of conveying why Python is a good choice for network analysis.  Next time, I think I’ll try to integrate more substantive examples.

Also, inspired by Ben’s last post–maybe we should put a few network analysis packages to a speed test?  I get this question all the time, and I usually just refer to my own anecdotal evidence, but it’s probably worth pitting iGraph, networkx, etc. across platforms against one another in calculating, say, shortest paths in a relatively large network.  More on this later…
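
A minimal sketch of what such a speed test could look like, under some loud assumptions: the names `random_graph`, `bfs_lengths`, and `time_call` are hypothetical, and the pure-Python breadth-first search is only a stand-in workload. In a real comparison you would hand `time_call` each package's own shortest-path routine and run them all on the same graph.

```python
import random
import time
from collections import deque


def random_graph(n, m, seed=0):
    """Build an undirected random graph (dict of neighbor sets) with n nodes and up to m edges."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    for _ in range(m):
        u, v = rng.randrange(n), rng.randrange(n)
        if u != v:  # skip self-loops
            adj[u].add(v)
            adj[v].add(u)
    return adj


def bfs_lengths(adj, source):
    """Unweighted shortest-path lengths from source (the stand-in workload)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def time_call(fn, *args, repeats=5):
    """Return the best wall-clock time in seconds over several runs of fn(*args)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best


graph = random_graph(20_000, 100_000)
elapsed = time_call(bfs_lengths, graph, 0)
print(f"BFS from node 0: {elapsed:.4f}s (best of 5)")
```

Taking the best of several runs, rather than a single run, is one simple way to dampen noise from whatever else the machine happens to be doing.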

P.S. It’s only taken me 3 months to write my first post!

  • Thanks for posting. I’ve started to teach myself some Python in part to try out the network packages. The materials look very helpful.

  • Thanks for the post, Dan. I would actually really like to see a speed test between networkx in Python, igraph, and sna/network in R. I imagine there are things they each do well, but that probably differs depending on the scale and the task. For instance, I don’t know of any Python module that does something like the ERGM in Ben’s post. But Python may be better at managing memory and at running many in-memory operations.

    • Dan’s tutorial has a couple of tests–reading data and calculating transitivity–that, at least on my machine, suggest igraph has a major advantage over networkx in Python (0.023s vs 0.41s on reading and 0.03s vs 1.39s on transitivity).

      I ran roughly the same tests in R for network and igraph. I converted the data to Pajek files first, and the elapsed time using network’s read.paj() was 17.888s, compared to igraph’s read.graph() at 0.120s. The network package doesn’t have a transitivity function, to my knowledge: sna does, but I believe the data would need to be converted to a matrix first (22k nodes, forget about it), and ergm can count the transitive triads using summary(er.net~transitivity), so I went with that. igraph calculated transitivity in 0.015s and the transitive triad count took ergm 54.406s. The test isn’t perfect, but igraph seems to be much faster in both Python and R in terms of reading data and calculating transitivity.

      • Dan Wang

        Right–and I would add that there also seems to be a trade-off in usability vs. speed. Networkx, I think, is great for teaching network analysis to people with some knowledge of linked lists or hashmap data structures. However, as Ben pointed out, it’s substantially slower than iGraph. I think Ben noted in his earlier post that iGraph is natively programmed in C++ (even in the Python version), which greatly speeds things up. Networkx is built entirely in Python, making heavy use of the dictionary data structure, which is not exactly the most efficient way of storing graphs (for example, an undirected edge A-B has to be stored as two key-value pairs in a networkx graph, A->B and B->A). iGraph, however, is somewhat unintuitive, especially for simple exploratory tasks, like looking up data attached to a node or edge.

        What’s interesting, though, is that I don’t think there has to be this trade-off between speed and user-friendliness. It seems that the interactive interface of iGraph could be improved without sacrificing any speed.
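
        To make the dictionary point above concrete, here is a toy dict-of-dicts adjacency structure in plain Python. It only mimics the general idea: `add_undirected_edge` is a hypothetical helper, not part of the networkx API, and the real internals of networkx differ in detail.

        ```python
        def add_undirected_edge(adj, u, v, **attrs):
            """Store the undirected edge u-v under both endpoints."""
            data = dict(attrs)               # one attribute dict shared by both directions
            adj.setdefault(u, {})[v] = data  # entry for u -> v
            adj.setdefault(v, {})[u] = data  # entry for v -> u


        adj = {}
        add_undirected_edge(adj, "A", "B", weight=2)
        add_undirected_edge(adj, "B", "C", weight=5)

        # Each undirected edge shows up twice, once per endpoint.
        stored = sum(len(nbrs) for nbrs in adj.values())
        print(stored)            # 4 entries for 2 edges
        print(adj["B"]["A"])     # {'weight': 2}
        print(sorted(adj["B"]))  # ['A', 'C']
        ```

        The upside of this layout is that looking up a node's neighbors or an edge's attributes is a plain dictionary access; the cost is the extra hash-table overhead carried for every edge.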

        • Huh. Maybe there’s a possibility for a wrapper to iGraph, if it doesn’t exist already?

          • There’s a GUI project called OpenPajek, which is based on iGraph.

            https://code.google.com/p/openpajek/

            Though if it tries to closely emulate Pajek, then it probably won’t solve the user interface problem, despite being GUIfied. I don’t know, though; I haven’t used it. The last update was a year and a half ago, prior to iGraph’s major version 0.6 update.

        • I wish someone had raised the matter of speed AND user-friendliness in Pajek’s early days. E.g., how to calculate transitivity in Pajek:

          http://list.fmf.uni-lj.si/pipermail/pajek/2012-June/001179.html

          Unfortunately, there’s something of a path dependence built into all software. Changing one function could mean changing all functions that call it, along with the scripts everyone has written for it. For example, it was a very big deal when iGraph in R changed the starting index of its objects from 0 to 1. On the one hand, it matched conventional practice in R; on the other hand, it broke everyone’s iGraph R scripts written prior to the update. (It seems like the change isn’t complete, either, as component indexing seems to begin at 0.)

        • Dan Wang

          Alex, a friend of mine wrote a wrapper function to append some igraph commands to netx graph objects, https://github.com/spool/nxigraph, but it’s not very robust. And, of course, it doesn’t address the speed issue. It’s rare, but sometimes I do have to use both netx and igraph in the same script. For example, I think netx’s random graph generators are better–but analyzing the generated graphs is far faster with igraph.

          One of my collaborators, Jure Leskovec, developed a set of SNA tools for his dissertation project while he was at CMU, http://snap.stanford.edu/snap/. It’s all in C++, but it takes advantage of what were some cutting-edge graph-theoretic algorithms at the time.
