Resources and Datasets

A list of free resources and datasets provided to you by UCREL NLP

Habibi

Arabic Song Lyrics

KALIMAT

Arabic NLP Dataset

EASC

Essex Arabic Summaries Corpus

MultiLing

Multi-document Summaries Corpora

ABMC

Arabic in Business and Management Corpora

ADD

Arabic Dialects Dataset

Annual Reports Corpus

UK Annual Reports Key Sections Corpora

StratScore

N-gram list for UK annual report sections

Strategic Commentary Corpus

A corpus of UK Annual Reports Strategic Commentary

Arabic Diseases Ontology

Arabic Infectious Diseases Ontology

Arabic Infectious Diseases Corpus

A corpus of Arabic Tweets about Infectious Diseases

CFIE

2012 - 2020

COUNTER Urdu Corpus

COrpus of Urdu News TExt Reuse

Vard

2013 - 2015

Arabic COVID Corpus

Covid-19 Arabic Tweets

CLEU Urdu Corpus

Cross-Language English-Urdu Corpus

Plant Names and Historical Places

Data and scripts for extracting plant names and collocates from historical texts

Human Judgements

Human Judgements of Sentiment Values

Igbo Translations

Igbo-English Machine Translations

Arabic Influenza and Covid

Labeled Arabic Tweets on Influenza and Covid-19

S-BiDD

Self-reported BD diagnosis dataset