Details of Grant

EPSRC Reference:

EP/T023333/1

Title:

Exaggeration, cohesion, and fragmentation in on-line forums

Principal Investigator:

Pierrehumbert, Professor J

Other Investigators:

Dong, Dr X

Researcher Co-Investigators:

Project Partners:

Department:

Engineering Science

Organisation:

University of Oxford

Scheme:

Standard Research

Starts:

01 October 2020

Ends:

30 September 2024

Value (£):

604,763

EPSRC Research Topic Classifications:

Artificial Intelligence

Computational Linguistics

EPSRC Industrial Sector Classifications:

No relevance to Underpinning Sectors

Related Grants:

Panel History:

Panel Date	Panel Name	Outcome
10 Feb 2020	Responsible NLP for Intelligent Interfaces Panel 2020	Announced

Summary on Grant Application Form

On-line forums can support the formation of social communities with shared interests and needs. They can also have a negative side if groups of users support each other in divisive attitudes or false beliefs. The social fragmentation resulting from these so-called echo-chamber effects has been identified as an engine behind the rise of violence and extremism, political gridlock, and decreases in social mobility. This project is motivated by the observation that echo-chamber effects involve a gradual shift from more moderate language to more extreme language. Further, damage repair is difficult when extreme social fragmentation has already occurred. The ability to use patterns in on-line language for early detection of on-line social fragmentation would thus be a major breakthrough in supporting earlier, and more effective, intervention against harmful trends in on-line forums.

We have identified two major challenges in creating this capability. First, current NLP methods are poor at understanding expressions whose meaning is a degree on a scale, such as a scale defined on the dimensions of cost, quality, honesty, or performance. For example, "rather racist", "really racist", and "incredibly racist" express different degrees of disapproval, but such differences are not adequately captured by current algorithms. This limitation is central to our problem, because echo-chamber effects often involve incremental exaggerations of factual claims, emotions, or attitudes. Second, methods for using linguistic content in the analysis of social behaviour are limited. While much research has uncovered systematic associations between word choices and social groups, very little has addressed relationships between linguistic inferences and social trends. However, tracking the gradual shifts towards semantic extremes in echo-chamber effects requires making certain linguistic inferences. This is because inferring which underlying dimension of meaning is relevant in any specific case critically depends on information about who is talking and what they are talking about. For example, "Liverpool is far better" might relate to a scale of cultural excellence in a discussion amongst music fans, but to a scale of costs amongst people who are discussing housing. A fundamental advance in the methodology for combining linguistic and social information is thus needed to characterise echo-chamber effects on-line and make predictions about risks of future fragmentation.

The project is a new collaboration between an experimental and computational linguist (the PI) and an expert in machine learning and social network analysis (the Co-I). Its components integrate the expertise of both collaborators. Advanced text-mining and data analytics will be used to generate the materials for a large-scale and experimentally normed data set of scalar expressions, using archives of the popular on-line forum Reddit. No normed data set of this type exists, and it will provide the training and test materials needed to develop and evaluate new algorithms. Using a modular work plan, the project team will first develop and validate separate algorithms to assess and predict the meanings of scalar expressions, and the level of fragmentation in the social network of Reddit users. These components will then be integrated using advanced graph-based machine learning methods. The primary outcome of the project will be a software package that will facilitate the work of on-line moderators by flagging subreddits or threads that display early stages of echo-chamber effects. The normed data set will also be extremely valuable for improving NLP applications that require nontrivial semantic inference, such as sentiment analysis, chatbots, and question-answering systems. More generally, the project will serve as a demonstration of advanced methodology for processing linguistic meaning in relation to social relationships and human behaviour.

Key Findings

This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk

Potential use in non-academic contexts

This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk

Impacts

This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk

Sectors submitted by the Researcher

This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk

Project URL:

Further Information:

Organisation Website:

http://www.ox.ac.uk