
Details of Grant 

EPSRC Reference: EP/K015206/1
Title: Natural Language Processing Working Together with Arabic and Islamic Studies
Principal Investigator: Atwell, Professor ES
Other Investigators:
Dickins, Professor J
Researcher Co-Investigators:
Dr C Brierley
Project Partners:
Department: Sch of Computing
Organisation: University of Leeds
Scheme: Standard Research
Starts: 01 April 2013
Ends: 30 September 2015
Value (£): 336,632
EPSRC Research Topic Classifications:
Artificial Intelligence
Comput./Corpus Linguistics
Human Communication in ICT
EPSRC Industrial Sector Classifications:
No relevance to Underpinning Sectors
Related Grants:
Panel History:
Panel Date: 06 Nov 2012
Panel Name: ICT
Outcome: Announced
Summary on Grant Application Form

This is an interdisciplinary project that addresses the ICT call of "working together" by aligning ICT expertise and research interests in Computational and Corpus Linguistics with Humanities research streams in Arabic and Islamic Studies, focusing on the Qur'an as a core text. It is also an international collaboration between the Universities of Leeds and Jordan, and further addresses the "working together" call through incoming and outgoing mobility: Visiting Researcher placements in the School of Computing at Leeds (incoming) and the Centre for the Study of Islam in the Contemporary World at Jordan (outgoing). This agreement is proactive and novel, and has high impact, ensuring knowledge transfer across different methodological perspectives and cultures.

The study of Tajwid, or Qur'anic recitation, is a sub-field and taught module in Islamic Studies programmes at both universities and elsewhere. The original insight informing this project is to view Tajwid mark-up in the Qur'an as additional text-based data for computational analysis. This mark-up is already incorporated into Qur'anic Arabic script. It identifies prosodic-syntactic phrase boundaries of different strengths, and it marks gradations of prosodic and semantic salience through colour-coded highlighting of pitch-accented syllables, and hence of prosodically and semantically salient words.

The Computational Linguistics Module in Year 1 entails development and evaluation of software for generating a phonetically transcribed, stressed and syllabified version of the entire text of the Qur'an, using the International Phonetic Alphabet (IPA). This canonical pronunciation tier for Classical Arabic will be informed and evaluated by Arabic linguists, Tajwid scholars, and phoneticians, and published in an updated version of the open-source Boundary-Annotated Qur'an Corpus [1], [2], preferably for LREC 2014. The software will also be re-usable for Natural Language Engineering applications for Modern Standard Arabic, and for constructing dictionaries for Arabic language learners.

The Text Analytics Module in Year 2 implements statistical techniques such as keyword extraction to explore semiotic relationships between sound and meaning in the Qur'an, invoking a Saussurean-type view of the sign as '...a bi-unity of expression and content...' [5]. Our investigation entails: (i) text data mining for statistically significant phonemes, syllables, words, and correlates of rhythmic juncture [6], [7]; and (ii) interpretation of results from interdisciplinary perspectives: Corpus Linguistics (ICT); Tajwid science, plus Tafsir or Qur'anic exegesis (Islamic Studies); Arabic (Language and Literature); and Phonetics and Phonology (Linguistics).
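
As an illustration only, keyword extraction of the kind mentioned above is often done in corpus linguistics by comparing how frequently an item (a phoneme, syllable, or word) occurs in a study text against a reference text, and scoring the difference with a log-likelihood (G2) statistic. The summary does not say which statistic the project uses, so the choice of G2, the function names `log_likelihood` and `keywords`, the 3.84 threshold (the chi-square critical value for p < 0.05 with one degree of freedom), and the toy tokens below are all assumptions made for this sketch.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """G2 score for an item seen a times in a study corpus of c tokens
    and b times in a reference corpus of d tokens (assumed statistic)."""
    e1 = c * (a + b) / (c + d)  # expected count in the study corpus
    e2 = d * (a + b) / (c + d)  # expected count in the reference corpus
    g2 = 0.0
    if a > 0:
        g2 += a * math.log(a / e1)
    if b > 0:
        g2 += b * math.log(b / e2)
    return 2.0 * g2


def keywords(study_tokens, reference_tokens, threshold=3.84):
    """Return (item, G2) pairs that are over-represented in the study
    corpus, highest score first. Both token lists must be non-empty."""
    study = Counter(study_tokens)
    reference = Counter(reference_tokens)
    c, d = len(study_tokens), len(reference_tokens)
    scored = []
    for item, a in study.items():
        b = reference.get(item, 0)
        # Keep only items that are relatively more frequent in the study corpus.
        if a / c > b / d:
            g2 = log_likelihood(a, b, c, d)
            if g2 >= threshold:
                scored.append((item, g2))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy placeholder tokens, not real corpus data.
study = ["x"] * 50 + ["y"] * 50
reference = ["x"] * 10 + ["y"] * 90
result = keywords(study, reference)
```

Here `result` contains only the item "x", because "y" is relatively less frequent in the study list than in the reference list. The same counting applies whether the tokens are words, syllables, or phonemes; only the tokenisation step, which is not shown, would differ.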

In terms of ICT applications, the team will collaborate with stakeholders and beneficiaries to develop an associated or follow-on funding proposal for the UK Research Councils, to include publication of project software as an advanced corpus-query and visualization tool for Islamic Studies and Humanities scholars, plus Arabic language learners. This again represents an extension of the "working together" theme.

Finally, our approach is interdisciplinary and pioneers stylistic analysis of sound and rhythm encoded in writing as a semiotic system for religious and other literary texts. As such it is entirely novel and has direct implications for research-led teaching in both partner institutions plus a broad cross-section of research groups and user communities, namely: Natural Language Processing and Artificial Intelligence; Qur'anic and Islamic Studies; Arabic Language and Literature; Linguistics and Phonetics; Digital Humanities; and Psychology.

All references appear in Case for Support.

Key Findings
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Potential use in non-academic contexts
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Impacts
Description: This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Summary
Date Materialised
Sectors submitted by the Researcher
This information can now be found on Gateway to Research (GtR) http://gtr.rcuk.ac.uk
Project URL:  
Further Information:  
Organisation Website: http://www.leeds.ac.uk