This package includes a pronunciation dictionary for Modern Standard Arabic (MSA) automatic speech recognition (ASR). It has been used in combination with the Kaldi GALE recipe.
This package includes files for building an Arabic ASR system using the GALE database from LDC and the Kaldi Speech Recognition Toolkit. The test set is a mix of conversational and report speech.
The QED Corpus is an open multilingual collection of subtitles for educational videos and lectures, collaboratively transcribed and translated on the AMARA web-based platform. The current release, QED Corpus v1.4, contains 20 languages distributed over 44,620 files.
This corpus contains speech from Al Jazeera with both human-annotated and automatically assigned labels for MSA and four major dialect groups (Egyptian, Levantine, North African, and Gulf).
This is a novel Arabic corpus that unifies stance detection, stance rationale, relevant document retrieval, and fact checking. The corpus contains 422 claims about the war in Syria and related Middle East political issues, where each claim is labeled for factuality, indicating whether it is True or False. The corpus also contains 3,042 articles retrieved for these claims, where each claim-article pair is annotated for stance, indicating whether the article agrees with, disagrees with, discusses, or is unrelated to the claim. The corpus also identifies which sentence(s) from each article correspond to the stance rationale. This is the first corpus to offer such a combination.
A collection of parallel Arabic-English tweets and an additional list of Twitter accounts that post parallel tweets.
Copyright Qatar Computing Research Institute. All rights reserved.