Infrastructure For Political And Social Event Data Using Machine Learning

This project contributes to the automated extraction of conflict event data at a global scale. Most conflict event data are coded by humans from news reports, which is expensive. This project addresses that problem by drawing on recent advances in artificial intelligence (AI) and large language models (LLMs). It builds on earlier NSF-funded work that created ConfliBERT (Hu et al., 2022), a publicly available large language model for studying inter- and intrastate conflict. This project expands ConfliBERT to multilingual settings, including Arabic and Spanish. It also focuses on creating network data for individuals, groups, locations, and events. In addition, the project will build a research community that fosters training, education, and outreach with groups at the local, national, and international levels, including academia and government. Ultimately, our work can help researchers and policymakers understand conflict in foreign locations more accurately and in real time.

This website also includes information about our past NSF-funded real-time and historical event data API project, “Modernizing Political Event Data for Big Data Social Science Research.” That project created event data on political and social events around the globe to help researchers better understand the complex dynamics of international relations and civil conflict. The data provide deep spatial and temporal coverage of events that affect regional, national, and international domains. They are drawn from sources in multiple languages and are made freely available within hours of the events occurring. We also provide the software and methodologies needed to analyze the data (please see the Research Papers page for more information).

To access the real-time event data coded using the CAMEO framework, please see the Data page.

This material is based on work supported by the National Science Foundation under Grant No. OAC-2311142, Grant No. SBE-SMA-1539302, and Grant No. OAC-1931541, Resource Implementations for Data Intensive Research in the Social, Behavioral and Economic Sciences (RIDIR). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

This work used the Extreme Science and Engineering Discovery Environment (XSEDE), which is supported by National Science Foundation grant number ACI-1548562, and the XSEDE Jetstream resource at TACC / Indiana University through allocation SES170012.

If you would like more information about our project or would like to join, please contact Dr. Patrick T. Brandt.