under construction …
This work is still in progress as part of a larger project by a Pratt-SI Digital Humanities group.
RESEARCH GOALS
Several key moments in U.S. history have shaped how surveillance is discussed in Congress. This is one component of a larger project that aims to better understand the changing rhetoric around surveillance (other components examine mainstream media, historically leftist, and popular culture content). Using a corpus collected from these historically significant eras (Church Committee Hearings, 9/11, and the Snowden Leaks),
DATA
Text and data were collected from the Congressional Record (CR) using the Sunlight Foundation’s Capitol Words API in conjunction with the same organization’s parsing tool. My area of focus is the set of entries to the record that feature any of a list of surveillance terms. This list was generated from reports by the non-partisan Congressional Research Service, then applied to content from the record spanning 2012-2014. This period covers the year before the Snowden leaks, the year of the leaks, and the year after.
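As a rough illustration of the selection step described above, here is a minimal Python sketch that keeps only entries dated within a range and featuring at least one term. The term list, the entry layout (`date` and `text` keys), and the function names are all assumptions made for this example; they are not the project's actual term list or data format.

```python
import re
from datetime import date

# Hypothetical term list; the project's actual list came from CRS reports.
TERMS = ["surveillance", "wiretap", "metadata"]


def build_term_pattern(terms):
    """Compile one case-insensitive regex that matches any term as a whole word or phrase."""
    # Longer terms first so multi-word phrases win over their shorter prefixes.
    escaped = sorted((re.escape(t) for t in terms), key=len, reverse=True)
    return re.compile(r"\b(?:" + "|".join(escaped) + r")\b", re.IGNORECASE)


def select_entries(entries, terms, start, end):
    """Keep entries dated within [start, end] whose text features at least one term."""
    pattern = build_term_pattern(terms)
    return [
        e for e in entries
        if start <= e["date"] <= end and pattern.search(e["text"])
    ]


# Made-up sample entries, for illustration only.
entries = [
    {"date": date(2013, 6, 11), "text": "Bulk metadata collection raises questions."},
    {"date": date(2013, 7, 2), "text": "A bill to rename a post office."},
    {"date": date(2011, 5, 1), "text": "Surveillance authorities were debated."},
]

selected = select_entries(entries, TERMS, date(2012, 1, 1), date(2014, 12, 31))
```

With the sample data, only the first entry survives: the second has no matching term and the third falls outside 2012-2014. Whole-word matching is a design choice here; it would miss inflected forms such as "wiretapping" unless they are added to the list.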
PROCESS
Corpus building: Capitol Words API, Python, CR Parser
- Using Congressional Research Service reports on surveillance as a corpus, collect a list of the terms deemed most relevant to surveillance rhetoric
- With Capitol Words, collect JSON files featuring the terms from the above list
- Pull the URLs in each Capitol Words term file that point to the text of the corresponding entries in the record
- Use Python to collect text from the URLs
- Use Sunlight Foundation’s CR parser tool to collect speakers’ names
- Recompile corpus of entries to the CR based on member of Congress
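
The URL-extraction and recompiling steps above might look roughly like the following Python sketch. It is not the project's actual code: the JSON layout (`results` and `origin_url` keys), the function names, and the sample data are assumptions for illustration, and the network fetch and CR parser steps are left out so the example runs on its own.

```python
import json
from collections import defaultdict


def extract_urls(term_json):
    """Return the unique entry URLs in one term file, in first-seen order.

    Assumes a hypothetical layout: {"results": [{"origin_url": "..."}, ...]}.
    """
    data = json.loads(term_json)
    urls = []
    for result in data.get("results", []):
        url = result.get("origin_url")
        if url and url not in urls:
            urls.append(url)
    return urls


def group_by_speaker(entries):
    """Recompile (speaker, text) pairs into one list of texts per speaker."""
    corpus = defaultdict(list)
    for speaker, text in entries:
        corpus[speaker.strip()].append(text)
    return dict(corpus)


# Made-up term file and parsed entries, for illustration only.
sample_term_file = json.dumps({
    "results": [
        {"origin_url": "https://example.org/cr/1"},
        {"origin_url": "https://example.org/cr/2"},
        {"origin_url": "https://example.org/cr/1"},  # duplicate hit for the same entry
    ]
})
urls = extract_urls(sample_term_file)

parsed_entries = [
    ("Speaker A", "First remarks on surveillance."),
    ("Speaker B", "Remarks on metadata."),
    ("Speaker A ", "Second remarks on surveillance."),
]
by_speaker = group_by_speaker(parsed_entries)
```

Deduplicating URLs matters because the same entry can match more than one term. Grouping by speaker produces one body of text per member of Congress.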
Personality insights through the Watson API
- Clone Alchemy API’s Python SDK and configure it with my own API key
Visualizing with Tableau
TRACKING AND STORAGE on GitHub
Code is available on GitHub