Recent Activity

Presentations, working papers, and relevant GitHub repositories are listed on this page.

Presentations

Working Papers

GitHub Repositories

*eventdata: This is the main hub for various repositories related to this project. For more information, please visit GitHub.

*PETRARCH2: A Python Engine for Text Resolution And Related Coding Hierarchy, part 2. PETRARCH is a natural language processing tool for machine-coding event data. It is designed to process fully parsed news summaries in Penn Treebank format, from which ‘whom-did-what-to-whom’ relations are extracted. For more information, please check this on GitHub.

*UD Petrarch: Code for the new Python Engine for Text Resolution And Related Coding Hierarchy (PETRARCH) event data coder. The coder now has all of the functions from the older TABARI coder, and the new CAMEO.verbpatterns.140609.txt dictionary incorporates both parser-based matching and extensive synonym sets. Using the included dictionaries, the program coded 60,000 AFP sentences from the GigaWord corpus without crashing. For more information, please visit the documentation.

*Focus Locality Extraction: This repository provides a tool that helps to extract geolocations automatically from unstructured text-based news reports. For more details, please check this on GitHub.

*Mordecai: Full text geoparsing as a Python library. Extract the place names from a piece of text, resolve them to the correct place, and return their coordinates and structured geographic information. For more details, please check this on GitHub.

*Synset_Validator: This tool facilitates the translation of the English CAMEO dictionary into other languages. It is designed to provide: (1) a structured interface for translation, in which translators translate actions and their associated text patterns, known as “rules”; (2) collaboration between translators, serving as a channel between human translators (i.e., coders), who are usually geographically distant; and (3) automatic generation of a translated version of the CAMEO dictionary using statistics from user feedback. For more details, please check this on GitHub.

*APART: Automatic Political Actor Recommendation In Real Time. It is a frequency-based actor ranking algorithm that groups actors by partial string matching (e.g., Levenshtein edit distance or MinHash) to recommend new actors dynamically over multiple time windows. For more information, please check this on GitHub.

*Web Scraper and Crawler: This web scraper is used by the Open Event Data Alliance. The scraper works from a whitelist of trusted RSS feed URLs and scrapes the articles from those feeds. It uses goose to scrape arbitrary pages and stores the output content in a MongoDB instance. For more information, please check this on GitHub and see its documentation.
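The whitelist-then-scrape flow in the scraper entry above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the scraper's actual code: the feed URLs, the `TRUSTED_FEEDS` set, and the `article_links` and `collect` helpers are all hypothetical, a plain list stands in for the MongoDB instance, and the article-extraction step that goose performs is left out.

```python
import xml.etree.ElementTree as ET

# Hypothetical whitelist of trusted RSS feed URLs (illustrative values only).
TRUSTED_FEEDS = {"https://news.example.org/rss"}

# A tiny RSS 2.0 document standing in for a downloaded feed.
SAMPLE_RSS = """<rss version="2.0"><channel><title>Example feed</title>
<item><title>First story</title><link>https://news.example.org/a1</link></item>
<item><title>Second story</title><link>https://news.example.org/a2</link></item>
</channel></rss>"""


def article_links(rss_xml):
    """Return (title, link) pairs for every <item> in an RSS 2.0 document."""
    root = ET.fromstring(rss_xml)
    pairs = []
    for item in root.iter("item"):
        title = (item.findtext("title") or "").strip()
        link = (item.findtext("link") or "").strip()
        if link:  # skip items that carry no article URL
            pairs.append((title, link))
    return pairs


def collect(feeds, store, trusted=TRUSTED_FEEDS):
    """Append one record per article from whitelisted feeds to `store`.

    `feeds` maps a feed URL to its already-downloaded RSS text;
    `store` is a plain list used here in place of a database.
    """
    for feed_url, rss_xml in feeds.items():
        if feed_url not in trusted:
            continue  # feeds that are not whitelisted are ignored
        for title, link in article_links(rss_xml):
            store.append({"feed": feed_url, "title": title, "url": link})
    return store


if __name__ == "__main__":
    feeds = {
        "https://news.example.org/rss": SAMPLE_RSS,
        "https://untrusted.example.net/rss": SAMPLE_RSS,
    }
    for record in collect(feeds, []):
        print(record["title"], record["url"])
```

In a real deployment each collected URL would then be fetched and its article text extracted before being written to the database; those steps depend on the goose and MongoDB client APIs and are deliberately not shown here.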
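As a rough illustration of the idea behind the APART entry above (grouping name variants by edit distance, then ranking groups by frequency across time windows), the sketch below may help. It is not APART's implementation: the `canonical` and `rank_actors` helpers, the 0.25 distance ratio, the simple summing of counts across windows, and the sample names are all assumptions made for the example.

```python
from collections import Counter


def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def canonical(name, groups, max_ratio=0.25):
    """Map a name to the first existing group label within the allowed
    edit-distance ratio; otherwise start a new group (assumed rule)."""
    key = name.lower()
    for label in groups:
        if levenshtein(key, label) <= max_ratio * max(len(key), len(label)):
            return label
    groups.append(key)
    return key


def rank_actors(windows, known_actors=(), top_n=3):
    """Rank grouped actor names by total mentions over all time windows,
    skipping groups whose label is already a known actor."""
    groups = []
    known = {a.lower() for a in known_actors}
    totals = Counter()
    for mentions in windows:  # one list of extracted names per time window
        totals.update(Counter(canonical(m, groups) for m in mentions))
    ranked = [(label, n) for label, n in totals.most_common() if label not in known]
    return ranked[:top_n]


if __name__ == "__main__":
    windows = [
        ["Jon Smith", "John Smith", "Maria Lopez"],
        ["John Smith", "Maria Lopes", "Ali Khan"],
    ]
    print(rank_actors(windows, known_actors=["Maria Lopez"]))
```

Summing counts across windows is the simplest possible aggregation; a weighting that favors recent windows would be an equally plausible choice under the same description.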