The DANTE database: what it is, how it was created, and what it can contribute to the dictionaries and lexicons of the future

author: Michael Rundell, Lexicography MasterClass
published: Dec. 2, 2011,   recorded: November 2011


Dante is a lexical database which provides a fine-grained, corpus-based description of the core vocabulary of English. Every fact recorded in the database is derived from, and explicitly supported by, evidence from a 1.7 billion-word corpus of current English. Almost all of these facts are machine-retrievable. Dante (the Database of ANalysed Texts of English) was designed and created for Foras na Gaeilge by the Lexicography MasterClass and an 18-strong team of skilled lexicographers, using the Sketch Engine for corpus-querying and IDM’s Dictionary Production System (DPS) for entry-building. The resulting database records the semantic, grammatical, combinatorial, and text-type characteristics of over 42,000 single-word lemmas and 23,000 compounds and phrasal verbs, and includes over 27,000 idioms and phrases, underpinned by over 600,000 sentence examples from the corpus.

The project pioneered new approaches in project management, software customisation, text origination, and quality control. Collectively, these initiatives enabled us to achieve significant levels of automation (hence cost saving) in the lexicographic process, as well as greater systematicity. Most of these innovations are transferable, so our experience on the Dante project has implications for lexicographic methodology as a whole.

Though Dante began life as an ‘English framework’ destined for the development of a new English-Irish dictionary, it was designed to be a linguistic resource beyond this primary function. It offers publishers a launchpad for the development or updating of monolingual or bilingual dictionaries, and provides rich data for researchers, software developers, and materials writers. In this talk we will discuss the project’s methodological innovations, demonstrate the wealth and range of data in Dante, and reflect on the long-term potential of this unique database.

See Also:

Download slides: elex2011_rundell_dante_01.pdf (443.6 KB)
