Week 9: Social Network Analysis | Computational Journalism

This week is about the analysis of networks of people, not the analysis of data on social networks. We might mine tweets, but fundamentally we are interested here in the people and their connections — the social network — not the content.

Slides.

Social networks have of course existed for as long as there have been people, and have been the subject of careful study since the early 20th century (see, for example, this 1951 study, which compared groups performing the same task using different network shapes and showed that “centrality” was an important predictor of behavior). Recently it has become much easier to study social networks because of the amount of data we all produce online: not just our social networking posts, but all of our emails, purchases, location data, instant messages, etc.

Different fields have different reasons to study social networks. In intelligence and law enforcement, the goal may be to identify terrorists or criminals. Marketing and PR are interested in how people influence one another to buy things or believe things. In journalism, social network analysis is potentially useful in all four places where CS might apply to journalism. That is, social network analysis could be useful for:

reporting, by identifying key people or groups in a story
presentation, to show the user how the people in a story relate to one another
filtering, to allow the publisher to target specific stories to specific communities
tracking effects, by watching how information spreads

Because we’re going to have a whole week on tracking effects (see syllabus), we did not talk about that in class.

In a complex investigative story, we might use social network analysis to identify individual people or groups based on who they are connected to. This is what ICIJ did in their Skin and Bone series on the international human tissue trade. To present a complex story, we might simply show the social network of the people and organizations involved, as in the Wall Street Journal’s Galleon’s Web interactive on the famous insider trading scandal. I haven’t yet heard of anyone in journalism targeting specific audiences identified by social network analysis, but I bet it will happen soon.

Although visualization remains the main technique, a number of algorithms have been designed for social network analysis. First, there are multiple “centrality” measures, which try to determine who is “important,” “influential,” or “powerful.” There are many of these.
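The simplest of these is degree centrality: a person’s score is just the number of people they are directly connected to, here divided by the number of other people in the network so that scores fall between 0 and 1. This is a minimal sketch, assuming an undirected network given as a list of pairs with no duplicate edges or self-loops; the names are hypothetical.

```python
from collections import Counter

def degree_centrality(edges):
    """Normalized degree: the share of the other people each person is directly tied to.

    Assumes a simple undirected graph (no self-loops, no duplicate pairs).
    """
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    n = len(degree)
    return {person: d / (n - 1) for person, d in degree.items()}

# Hypothetical toy network: who has met whom.
edges = [("Ana", "Ben"), ("Ana", "Cy"), ("Ana", "Dee"), ("Ben", "Cy"), ("Dee", "Eli")]
scores = degree_centrality(edges)
for person, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{person}: {score:.2f}")
```

Ana comes out on top (0.75) because she knows three of the other four people; Eli, with a single tie, scores lowest (0.25).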

But they don’t necessarily compute what a journalist wants to know. First, each algorithm is based on a specific assumption about how “things” flow through the network. Betweenness centrality assumes flows are always along the shortest path. Eigenvector centrality assumes a random walk. Whether this models the right thing depends on what is flowing — is it emails? information? money? orders? — and how you expect it to flow. Borgatti explains the assumptions behind centrality measures in great detail.
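Eigenvector centrality, by contrast, gives a person a high score when their neighbors have high scores. One minimal way to compute it is power iteration: repeatedly replace each score with the sum of the neighbors’ scores and rescale, which lets “importance” spread along walks through the network. This sketch assumes a connected, undirected graph stored as an adjacency dict. It iterates on A + I (each node also keeps its own current score), a common trick that leaves the leading eigenvector unchanged but avoids the oscillation that plain iteration shows on bipartite graphs such as a star. The graph is a hypothetical example.

```python
def eigenvector_centrality(adj, max_iter=200, tol=1e-9):
    """Power iteration on (A + I) for an undirected graph given as an adjacency dict.

    Assumes the graph is connected; scores are rescaled so the top node gets 1.0.
    """
    scores = {node: 1.0 for node in adj}
    for _ in range(max_iter):
        # Each node keeps its own score (the "+ I") and adds its neighbors' scores.
        updated = {node: scores[node] + sum(scores[nb] for nb in adj[node]) for node in adj}
        top = max(updated.values())
        updated = {node: s / top for node, s in updated.items()}
        # Stop once no score moves by more than the tolerance.
        if max(abs(updated[n] - scores[n]) for n in adj) < tol:
            return updated
        scores = updated
    return scores

# Hypothetical star network: one hub with four contacts who do not know each other.
star = {"hub": ["a", "b", "c", "d"], "a": ["hub"], "b": ["hub"], "c": ["hub"], "d": ["hub"]}
for node, score in eigenvector_centrality(star).items():
    print(f"{node}: {score:.3f}")
```

On the star, the hub converges to 1.0 and each leaf to 0.5; for a star with k leaves the leading eigenvector has a hub-to-leaf ratio of √k.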

Often journalists are interested in “power” or “influence.” Unfortunately, this is a very complicated concept, and while there is almost certainly some correlation between power and network centrality, it’s just not that simple. Communication intermediaries, such as a secretary, may have extremely high betweenness centrality without any real authority.
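Betweenness centrality counts, for every pair of other people, the fraction of shortest paths between them that pass through a given person. The sketch below uses Brandes’ algorithm for unweighted, undirected graphs on a hypothetical office in which the only route to the director runs through the secretary. It illustrates the secretary problem: the secretary gets the highest score, while the director, who presumably holds the authority, scores zero.

```python
from collections import deque

def betweenness(adj):
    """Brandes' algorithm for an unweighted, undirected graph (adjacency dict)."""
    scores = dict.fromkeys(adj, 0.0)
    for source in adj:
        # Breadth-first search: count shortest paths (sigma) and record predecessors.
        order = []
        preds = {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        dist = dict.fromkeys(adj, -1)
        sigma[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating each node's dependency.
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                scores[w] += delta[w]
    # Each unordered pair was counted once from each end.
    return {v: s / 2 for v, s in scores.items()}

# Hypothetical office: everyone reaches the director only through the secretary.
office = {
    "Director": ["Secretary"],
    "Secretary": ["Director", "Ana", "Ben", "Cy"],
    "Ana": ["Secretary", "Ben"],
    "Ben": ["Secretary", "Ana", "Cy"],
    "Cy": ["Secretary", "Ben"],
}
for person, score in sorted(betweenness(office).items(), key=lambda kv: -kv[1]):
    print(f"{person}: {score}")
```

The secretary scores 3.5: the three director-to-staff pairs, plus half of the Ana and Cy pair, which has two equally short routes (one through the secretary, one through Ben).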

Even worse, your network just may not contain the data you are actually interested in. You can produce a network showing corporate ownership, but if company A owns a big part of company B it doesn’t necessarily mean that A “controls” B. It depends on the precise relationship between the two companies, and how much autonomy B is given. Similar arguments can be made for links like “sits on the board of.”

This also brings up the point that there may be more than one kind of connection between people (or entities, more generally). In that case “social network analysis” is more correctly called “link analysis,” and if you use any sort of algorithm on the network, you’ll have to figure out how to treat the different types of links.
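One simple way to handle several link types is to collapse them into a single weighted graph, choosing a weight for each type. Those weights are an editorial judgment rather than something the data supplies, and changing them changes the results. This is a hypothetical illustration: the link types, names, and weights are made up, direction is ignored (so “owns” is treated like a symmetric tie), and the “strength” it reports is simply the sum of each entity’s link weights.

```python
from collections import defaultdict

def collapse_links(typed_edges, type_weights):
    """Merge (a, b, link_type) triples into one undirected, weighted graph.

    Link types missing from type_weights (or weighted 0) are dropped.
    Direction is ignored, which is itself an assumption.
    """
    graph = defaultdict(lambda: defaultdict(float))
    for a, b, kind in typed_edges:
        weight = type_weights.get(kind, 0.0)
        if weight == 0:
            continue
        graph[a][b] += weight
        graph[b][a] += weight
    return graph

def strength(graph):
    """Weighted degree: the sum of each node's link weights."""
    return {node: sum(nbrs.values()) for node, nbrs in graph.items()}

# Hypothetical typed links between people and organizations.
links = [
    ("Alice", "Bob", "email"),
    ("Alice", "Bob", "board"),
    ("Bob", "Carol", "email"),
    ("Carol", "Dan", "owns"),
    ("Alice", "Dan", "board"),
]
all_equal = strength(collapse_links(links, {"email": 1.0, "board": 1.0, "owns": 1.0}))
board_only = strength(collapse_links(links, {"board": 1.0}))
print(all_equal)
print(board_only)
```

Counting every link type equally, Alice and Bob tie at 3.0; keeping only board seats, Alice leads with 2.0 and Carol drops out of the network entirely.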

There are also algorithms for trying to find “communities” in networks. These require a mathematical definition of a “cluster” of people, and one of the most common is modularity, which compares the number of edges inside each group with the number that would be expected by chance if the same edges were placed at random while keeping each person’s number of connections fixed.
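For a proposed split of the network into groups, modularity can be written as Q = Σ_c [ L_c / m − (d_c / 2m)² ], where m is the total number of edges, L_c is the number of edges inside group c, and d_c is the sum of the degrees of the people in group c. The first term is the fraction of edges that fall inside the group; the second is the fraction expected if edges were rewired at random while keeping everyone’s degree. This sketch only scores a partition you give it; it does not search for the best one, which is what community detection algorithms do. It assumes a simple undirected graph (no self-loops or duplicate edges), and the example graph is hypothetical.

```python
from collections import defaultdict

def modularity(edges, community):
    """Newman-Girvan modularity of a partition of a simple undirected graph.

    edges: list of (u, v) pairs; community: dict mapping each node to a group label.
    """
    m = len(edges)
    internal = defaultdict(int)    # L_c: edges with both ends in group c
    degree_sum = defaultdict(int)  # d_c: total degree of the nodes in group c
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    # Observed share of internal edges minus the share expected at random.
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)

# Hypothetical example: two triangles joined by one bridging edge (3, 4).
edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
split = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "B"}
together = {n: "A" for n in range(1, 7)}
print(round(modularity(edges, split), 3))
print(round(modularity(edges, together), 3))
```

Splitting the graph into its two triangles gives Q = 5/14 (about 0.36), while putting everyone in one group gives Q = 0, since that is exactly what chance would predict.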

Overall, social network analysis algorithms are useful in journalism, but not definitive. They are just not capable of understanding the complex context of a real-world social network. But the combination of a journalist and a good analysis system can be very powerful.

The readings were:

Identifying the Community Power Structure, an old handbook for community development workers about figuring out who is influential through very manual processes. I hope this helps you think about what “power” is (not a simple topic) and about traditional “analog” methods of determining it.
Analyzing the data behind Skin and Bone, ICIJ. The best use of social network analysis in journalism that I am aware of.
Sections I and II of Community Detection in Graphs. An introduction to a basic type of social network algorithm.
Visualizing Communities, about the different ways to define a community
Centrality and Network Flow, or, one good reason to be suspicious of centrality measures
The Network of Global Corporate Control, a remarkable application of network analysis
The Dynamics of Protest Recruitment Through an Online Network, good analysis of Twitter data from Spain’s May 2011 “15-M” protest movement
Exploring Enron, social network analysis of Enron emails, by Jeffrey Heer, who went on to help create the D3 library

Here are a few other examples of the use of social network analysis in journalism:

Visualizing the Split on Toronto City Council, a social network analysis that shows evolution over time
Muckety, an entire site that only does stories based on link analysis
Theyrule.net, an old map of U.S. boards of directors
Who Runs Hong Kong?, a story explained through a social network analysis tool, South China Morning Post