Syllabus Fall 2018

The course is a hands-on, research-level introduction to the areas of computer science that have a direct relevance to journalism, and the broader project of producing an informed and engaged public. We study two big ideas: the application of computation to produce journalism (such as data science for investigative reporting), and journalism about areas that involve computation (such as the analysis of credit scoring algorithms).

Along the way we will touch on many topics: information recommendation systems but also filter bubbles; principles of statistical analysis but also the human processes that generate data; network analysis and its role in investigative journalism; and visualization techniques and the cognitive effects involved in viewing a visualization.

Assignments will require programming in Python, but the emphasis will be on clearly articulating the connection between the algorithmic and the editorial. Research-level computer science material will be discussed in class, with a focus on understanding the capabilities and limitations of this technology.

Format of the class, grading and assignments.
This is a fourteen-week, six-point course for CS & journalism dual degree students. (It is a three-point course for cross-listed students, who also do not have to complete the final project.) The class is conducted in a seminar format. Assigned readings and computational techniques will form the basis of class discussion. The course will be graded as follows:

  • Assignments: 40%. There will be five homework assignments.
  • Final project: 40%. Dual degree students will complete a medium-sized final project (for other students, this 40% comes from assignments instead).
  • Class participation: 20%

Assignments will involve experimentation with fundamental computational techniques. Some assignments will require intermediate-level coding in Python, but the emphasis will be on thoughtful and critical analysis. As this is a journalism course, you will be expected to write clearly. The final project can be a piece of software (especially a plugin or extension to an existing tool), a data-driven story, or a research paper on a relevant technique.

Dual degree students will also have a final project. This will be either a research paper, a computationally driven story, or a software project. The class is conducted on a pass/fail basis for journalism students, in line with the journalism school’s grading system. Students from other departments will receive a letter grade.

Week 1: High-dimensional data – 9/12
CS techniques can help journalism in two main ways: using computation to do journalism, and doing journalism about computation. Either way, we’ll be working a lot with the abstraction of high dimensional vectors. We’ll start with an overview of interpreting high-dimensional data, then jump right into clustering and the document vector space model, which we’ll need to study natural language processing and recommendation engines.
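To make the document vector space model concrete, here is a minimal sketch in plain Python. It assumes naive whitespace tokenization and a toy three-document corpus (both illustrative, not course data): each document becomes a sparse vector of TF-IDF weights, and cosine similarity between vectors measures how alike two documents are. Clustering methods such as k-means operate on exactly this kind of representation.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Turn each document into a sparse {term: tf-idf weight} dict."""
    tokenized = [doc.lower().split() for doc in docs]  # naive whitespace tokenizer
    n = len(docs)
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # A term appearing in every document gets idf = log(1) = 0 and drops out.
    idf = {term: math.log(n / count) for term, count in df.items()}
    return [{term: tf * idf[term] for term, tf in Counter(tokens).items()}
            for tokens in tokenized]

def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Toy corpus, invented for illustration.
docs = [
    "city council approves budget",
    "council budget vote delayed",
    "storm floods coastal towns",
]
vecs = tfidf_vectors(docs)
print(cosine(vecs[0], vecs[1]))  # shares "council" and "budget": positive
print(cosine(vecs[0], vecs[2]))  # no shared terms: 0.0
```

The two council stories land near each other in the vector space while the storm story is orthogonal to them, which is the geometric intuition behind document clustering.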

Slides.

References

Viewed in class

Week 2: Text analysis – 9/19
We’ll start by picking up the story of text analysis in journalism, including the development of the Overview document mining system. Then we move to probabilistic topic modeling (à la LDA), matrix factorization, more general plate-notation graphical models, and word embedding approaches based on deep learning. Then on to fundamental recommendation approaches such as collaborative filtering. To bring it to practice, we will look at Columbia Newsblaster (a precursor to Google News) and the New York Times recommendation engine.
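As a rough illustration of collaborative filtering, the sketch below implements a simple item-based variant on implicit feedback (whether a user read an article or not). The reading data, user names, and the `recommend` helper are all hypothetical, and this is a generic textbook approach, not the method used by any of the systems named above. Two articles count as similar when the same people read them, measured by cosine similarity of their reader sets; unread articles are then scored by their similarity to what the user has already read.

```python
import math
from collections import defaultdict

def recommend(reads, user, k=3):
    """Item-based collaborative filtering on implicit (read / not read) data.

    reads maps each user to the set of article ids they have read.
    """
    # Invert the data: for each article, the set of users who read it.
    readers = defaultdict(set)
    for u, items in reads.items():
        for item in items:
            readers[item].add(u)

    def similarity(a, b):
        # Cosine similarity between two articles' binary reader vectors.
        overlap = len(readers[a] & readers[b])
        if overlap == 0:
            return 0.0
        return overlap / math.sqrt(len(readers[a]) * len(readers[b]))

    seen = reads[user]
    scores = {}
    for candidate in readers:
        if candidate in seen:
            continue
        score = sum(similarity(candidate, item) for item in seen)
        if score > 0:  # skip articles with no co-readership signal at all
            scores[candidate] = score
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda item: (-scores[item], item))[:k]

# Hypothetical reading histories.
reads = {
    "ana": {"budget", "schools", "transit"},
    "ben": {"budget", "schools"},
    "cam": {"budget", "transit", "housing"},
    "dee": {"sports", "weather"},
}
print(recommend(reads, "ben"))  # ['transit', 'housing']
```

Note that "sports" and "weather" are never recommended to ben: with no overlapping readers there is no collaborative signal, which is one reason real systems blend this with content-based features.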

Required

References

Discussed in class

Assignment: LDA analysis of State of the Union speeches.

Week 3: Filter Design
We’ve studied filtering algorithms, but how are they used in practice — and how should they be? We will study the details of several algorithmic filtering approaches used by social networks, and effects such as polarization and filter bubbles.

Readings

References

Viewed in class

Week 4: Computational Journalism Platforms
We introduce the Overview document mining system and the Computational Journalism Workbench. Then we develop pitches for final projects, which may include writing plugins for these systems.

Readings

References

Assignment: Design a filtering algorithm for an information source of your choosing.

Week 5: Quantification, Counting, and Statistics – 10/6
Every journalist needs a basic grasp of statistics. Not t-tests and all of that, but something more grounded and more practical. How do we know we’re measuring the right thing? Why are we doing stats at all? Then a journalism-oriented tutorial on the fundamental ideas of probability, conditional probability, and Bayes’ theorem.
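A short worked example of Bayes’ theorem, P(H | E) = P(E | H) P(H) / P(E), may help. The scenario and numbers below are hypothetical: suppose 1% of records are fraudulent, an automated flag catches 90% of the fraud, and it also flags 5% of legitimate records. How likely is a flagged record to actually be fraudulent?

```python
def posterior(prior, p_evidence_given_h, p_evidence_given_not_h):
    """Bayes' theorem: P(H | E) = P(E | H) P(H) / P(E)."""
    # Law of total probability: P(E) = P(E|H) P(H) + P(E|not H) P(not H)
    p_evidence = (p_evidence_given_h * prior
                  + p_evidence_given_not_h * (1 - prior))
    return p_evidence_given_h * prior / p_evidence

# Hypothetical numbers, chosen for illustration only.
p = posterior(prior=0.01, p_evidence_given_h=0.9, p_evidence_given_not_h=0.05)
print(round(p, 3))  # 0.154
```

Even with a flag that sounds accurate, only about 15% of flagged records are fraudulent, because legitimate records vastly outnumber fraudulent ones. This base-rate effect is a common trap when reporting on screening tests or risk scores.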

Required

Recommended

No class 10/13

Week 6: Inference – 10/20
There is a long history of fields grappling with the problem of determining truth in the face of uncertainty, from statistics to intelligence analysis. We’ll start with statistics and the notion of randomness that is so crucial to the idea of statistical significance. Then we’ll talk about determining causality, p-hacking and reproducibility, and the more qualitative, closer-to-real-world method of analysis of competing hypotheses.
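One way to see the role of randomness in statistical significance is a permutation test, sketched below under simple assumptions (two small groups, difference in means as the test statistic, hypothetical data). The idea: if group labels don't matter, randomly reshuffling them should often produce a difference as large as the one observed. The p-value is the fraction of shuffles that do.

```python
import random

def mean(xs):
    return sum(xs) / len(xs)

def permutation_test(a, b, n_iter=10_000, seed=0):
    """Two-sided permutation test for a difference in group means."""
    rng = random.Random(seed)  # fixed seed so results are reproducible
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    count = 0
    for _ in range(n_iter):
        rng.shuffle(pooled)  # randomly reassign the group labels
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            count += 1
    # Add-one correction so the estimate is never exactly zero.
    return (count + 1) / (n_iter + 1)

# Hypothetical measurements for two groups.
treated = [10, 11, 12, 13, 14]
control = [1, 2, 3, 4, 5]
print(permutation_test(treated, control))
```

With these groups, only 2 of the 252 possible ways of splitting the ten values into two groups of five are as extreme as the observed split, so the estimated p-value comes out near 0.008. Running many such tests and reporting only the small p-values is exactly the p-hacking problem.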

Required

Recommended

Viewed in class

Week 7: Discrimination and Algorithmic Accountability – 10/27
Two topics this week. Discrimination is an important topic for reporters and for society, but analyzing discrimination data is more subtle and complex than it might seem. Algorithmic accountability is the study of the algorithms that regulate society, from high frequency trading to predictive policing. We’re at their mercy, unless we learn how to investigate them.

Required

References

Assignment: Analyze NYPD stop and frisk data for racial discrimination.

No class 11/3

Week 8: Visualization, Network Analysis – 11/3
Visualization helps people interpret information. We’ll look at design principles from user experience considerations, graphic design, and the study of the human visual system. Network analysis (aka social network analysis, link analysis) is a promising and popular technique for uncovering relationships between diverse individuals and organizations. It is widely used in intelligence and law enforcement, and increasingly in journalism.

Readings

References

Examples:

Assignment: Compare different centrality metrics in Gephi.
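Different centrality metrics can disagree sharply about who is "important" in a network. The plain-Python sketch below (not Gephi’s implementation) computes degree centrality and unnormalized betweenness centrality, the latter with Brandes’ algorithm for unweighted, undirected graphs, on a hypothetical graph of two tight clusters joined by a single go-between node X.

```python
from collections import deque

def degree_centrality(adj):
    """Number of direct connections for each node."""
    return {v: len(nbrs) for v, nbrs in adj.items()}

def betweenness_centrality(adj):
    """Unnormalized betweenness, unweighted undirected graph (Brandes' algorithm)."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        # BFS from s, counting shortest paths (sigma) and recording predecessors.
        stack, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        sigma[s] = 1
        dist = dict.fromkeys(adj, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating dependencies.
        delta = dict.fromkeys(adj, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each undirected pair was counted once from each endpoint.
    return {v: score / 2 for v, score in bc.items()}

# Hypothetical network: two triangles bridged by X.
edges = [("A", "B"), ("A", "C"), ("B", "C"),   # first cluster
         ("D", "E"), ("D", "F"), ("E", "F"),   # second cluster
         ("C", "X"), ("X", "D")]               # X connects the two
adj = {}
for a, b in edges:
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)

print(degree_centrality(adj))
print(betweenness_centrality(adj))
```

X has the lowest degree in the graph (2) yet the highest betweenness (9), because every shortest path between the two clusters runs through it. In an investigation, that is the broker who can look unremarkable on a contact list.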

Week 9: Knowledge representation
How can journalism benefit from encoding knowledge in some formal system? Is journalism in the media business or the data business? And could we use knowledge bases and inferential engines to do journalism better? This gets us deep into the issue of how knowledge is represented in a computer. We’ll look at traditional databases vs. linked data and graph databases, entity and relation detection from unstructured text, and both probabilistic and propositional formalisms. Plus: NLP in investigative journalism, automated fact checking, and more.

Readings

References

Viewed in class

Assignment: Text enrichment experiments using OpenCalais entity extraction.

Week 10: Truth and Trust – 11/17
We went through The Ethics of Persuasion slides.
Computational propaganda. Structure of information operations. Fake news detection and tagging. Credibility schema. Systems to detect and combat abuse and harassment.

Speaker: Ed Bice, Meedan

Readings

References

No class 11/24

Week 11: Privacy, Security, and Censorship
Who is watching our online activities? Who gets access to all of this mass intelligence, and what does the ability to surveil everything all the time mean, both practically and ethically, for journalism? In this lecture we cover both the basics of digital security and methods for dealing with specific journalistic situations: anonymous sources, handling leaks, border crossings, and so on.

Readings

  • Digital Security for Journalists, Part 1 and Part 2, Stray

References

Week 12: Tracking flow and impact – 12/8
How does information flow in the online ecosystem? What happens to a story after it’s published? How do items spread through social networks? We’re just beginning to be able to track ideas as they move through the network, but it’s still very difficult to really measure the public interest impact of journalism.

Readings

References

Week 13: Final Project Presentations – 12/15
