A community of authors, partners and contributors

Contact us at our Google group:



James Honaker is a Research Associate at the Center for Research on Computation and Society at Harvard University.



Vito D'Orazio is an Assistant Professor in the School of Economic, Political, and Policy Sciences at the University of Texas at Dallas.




Zelig: Everyone's statistical software

Zelig is a framework that brings together many common statistical models found across R packages under a unified interface. It provides a common architecture for estimation and interpretation, as well as bridging functions that allow a growing number of models to be absorbed into the collective library.

The TwoRavens project uses the Zelig statistical library to power statistical analyses.


The Dataverse Project

Dataverse is a software application that enables institutions to host research data repositories. It provides a preservation and archival infrastructure, while researchers can share, keep control of, and get recognition for their data through an easy-to-access web browser interface. Dataverse supports the sharing of research data with a persistent data citation, as well as data publishing and management workflows with versioning and metadata standards.

The TwoRavens project connects to data archived in any Dataverse repository, including over 25,000 public tabular datasets in the Harvard Dataverse instance.

Privacy Tools

Privacy Tools for Sharing Research Data

This project is a broad, multidisciplinary effort to help enable the collection, analysis, and sharing of personal data for research in social science and other fields while providing privacy for individual subjects. In particular, we aim to build an array of computational, statistical, legal, and policy tools that can be incorporated into data repositories to make privacy-protective data-sharing easier for lay researchers.

A version of the TwoRavens project serves as the interface for some of the prototype tools for releasing privacy-preserving statistical summaries.

Event Data

Modernizing Political Event Data for Big Data Social Science Research

Begun in January 2016, this project generates large-scale event data across multiple news sources in English, Spanish, and Arabic. It has a generalized architecture capable of handling additional languages with minimal changes. Modern natural language processing tools and algorithms are used to extract events, accurately identify named entities, and geolocate their actions.

Users will access the event data through the TwoRavens interface, either to query the database or to run statistical analyses.