A core objective in constructing the database was to make data on campaign finance and elections (1) more centralized and accessible, (2) easier to work with, and (3) more versatile in terms of the types of questions that can be addressed. To these ends, I have put a great deal of effort into compiling, processing, and augmenting the database. In making the database public, I hope to provide a valuable resource to fellow researchers. The main value-added features of the database are listed below:
Data processing: Names, addresses, and occupation and employer titles have been cleaned and standardized.
Unique identifiers: Entity resolution techniques were used to assign unique identifiers for all individual and institutional donors included in the database. The contributor IDs make it possible to track giving by individuals across election cycles and levels of government.
Geocoding: Each record has been geocoded and placed into a congressional district. The geocoding scheme relies on the contributor IDs to assign a complete and consistent set of geo-coordinates to donors who report their full address in some records but not in others. This is accomplished by combining the self-reported addresses across all of a donor's records. The scheme also accounts for donors with multiple addresses. Geocoding was performed using the Data Science Toolkit maintained by Pete Warden and hosted at http://www.datasciencetoolkit.org/. Shapefiles for congressional districts are from Census.gov (http://www.census.gov/rdo/data).
Ideological measures: Because the CFscores are estimated on a common space, the ideal points of a wide range of political actors from state and federal politics can be compared directly in terms of distance. In total, the database includes ideal point estimates for 51,572 candidates and 6,408 political committees (as recipients) and for 13.7 million individuals and 1.3 million organizations (as donors).
Corresponding data on candidates, committees, and elections: The recipient database includes information on voting records, fundraising statistics, election outcomes, gender, and other candidate characteristics. All candidates are assigned unique identifiers that make it possible to track them when they run for different offices. The recipient IDs can also be used to link candidates and committees to the database of contribution records. The database also includes entries for PACs, super PACs, party committees, leadership PACs, 527s, state ballot campaigns, and other committees that engage in fundraising activities.
Identifying sets of important political actors: Contribution records have been matched onto other publicly available databases of important political actors, including Fortune 500 directors and CEOs, members of the Forbes 400, state supreme court justices, health care professionals, and executive appointees to federal agencies. (Please contact firstname.lastname@example.org to request access to these databases.)
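The geocoding entry describes filling in coordinates for a donor's incomplete records by borrowing from that donor's complete ones, while allowing for donors with more than one address. The exact rules used for the database are not spelled out here, so the following is only a minimal sketch under assumed rules: coordinates are taken to be already available for records with a full address (no geocoder is called), a record missing its street address borrows coordinates from the same contributor's geocoded record in the same ZIP code, and otherwise borrows only if that contributor has a single known location. The `Record` fields (`contributor_id`, `street`, `zip5`, `coords`) and the ZIP-matching rule are hypothetical, not the database's actual schema or procedure.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    # Hypothetical fields for illustration only.
    contributor_id: str
    street: Optional[str]  # None when the record omits the street address
    zip5: Optional[str]
    coords: Optional[tuple[float, float]] = None  # (lat, lon) if geocoded from a full address


def propagate_coordinates(records: list[Record]) -> list[Record]:
    """Fill missing coordinates in place using other records of the same contributor.

    Assumed rules (a sketch, not the database's documented method):
      1. Prefer the contributor's most frequent geocoded location in the same ZIP,
         so donors with several addresses keep them separate.
      2. Otherwise reuse a location only if the contributor has exactly one.
      3. Otherwise leave the record ungeocoded rather than guess.
    """
    by_donor_zip = defaultdict(lambda: defaultdict(Counter))
    by_donor = defaultdict(Counter)
    for r in records:
        if r.coords is not None:
            by_donor_zip[r.contributor_id][r.zip5][r.coords] += 1
            by_donor[r.contributor_id][r.coords] += 1

    for r in records:
        if r.coords is not None:
            continue
        zip_matches = by_donor_zip[r.contributor_id].get(r.zip5)
        if zip_matches:
            # Rule 1: a known address for this donor in the same ZIP code.
            r.coords = zip_matches.most_common(1)[0][0]
        elif len(by_donor[r.contributor_id]) == 1:
            # Rule 2: the donor has only one known location, so it is unambiguous.
            r.coords = next(iter(by_donor[r.contributor_id]))
        # Rule 3: ambiguous or no known location; coords stays None.
    return records
```

Leaving ambiguous records ungeocoded (rule 3) is a deliberate choice in this sketch: assigning a multi-address donor to the wrong location would place the contribution in the wrong congressional district.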
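The contributor and recipient IDs are what allow giving to be followed across records. As a minimal sketch, assuming contribution and recipient records have been loaded as plain Python dicts with hypothetical field names (`contributor_id`, `recipient_id`, `cycle`, `amount`, and a recipient-level `level` coded as `"federal"` or `"state"`), matching contributions to recipients on the recipient ID and totaling by contributor shows how one donor's giving can be tracked across election cycles and levels of government. These field names and codes are illustrative assumptions, not the database's actual column names.

```python
from collections import defaultdict


def giving_by_cycle_and_level(contributions, recipients):
    """Total each contributor's giving by (contributor, election cycle, level of government).

    contributions: iterable of dicts with hypothetical keys
        'contributor_id', 'recipient_id', 'cycle', 'amount'.
    recipients: dict mapping recipient_id -> dict with a hypothetical 'level' key.

    Returns (totals, unmatched), where unmatched holds contributions whose
    recipient ID has no entry in the recipient table.
    """
    totals = defaultdict(float)
    unmatched = []
    for c in contributions:
        recipient = recipients.get(c["recipient_id"])
        if recipient is None:
            unmatched.append(c)  # keep for inspection instead of silently dropping
            continue
        key = (c["contributor_id"], c["cycle"], recipient["level"])
        totals[key] += c["amount"]
    return dict(totals), unmatched
```

For example, a contributor who gave to one federal and one state recipient in 2012 would appear under two keys for that cycle, one per level of government; returning unmatched records separately makes gaps in the ID match visible.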