Croeso! (Welcome!)
Eurfa is the largest Welsh dictionary available under a free license, and it was the first dictionary of a Celtic language to list verbal inflections and mutated forms. You can find out more about Eurfa here, or see a list of the main abbreviations used in the dictionary. For interest, there is also a poem in Old Irish about words: Pangur Bán.
For most words, Eurfa now includes in-context citations drawn from the following corpora:
Bilingual (Welsh-English, Welsh-Spanish)
- The 18m-word Kynulliad3 corpus (K3). This contains formal written Welsh (the majority of it translated from English).
- The 450k-word Siarad corpus (S). These transcribed conversations contain "Welsh as she is spoke", including English codeswitches. For readability, the version here (download) removes much of the transcription marking.
- The 200k-word Patagonia corpus (P). These transcribed conversations contain spoken Welsh from Patagonia. The corpus has fewer codeswitches than Siarad, and many of them are into Spanish rather than English. For readability, the version here (download) removes much of the transcription marking.
- The 200k-word Korrect/Kywiro corpus (Ko). This contains Welsh translations of English text in free/open software programs.
Monolingual (Welsh only)
- A 220k-word subset of the 300k-word CIG1 corpus of child language acquisition (ages 18-30 months) (Kig1). The subset contains non-child utterances only. The version here removes much of the transcription marking.
- A 100k-word subset of the 560k-word CIG2 corpus of child language acquisition (ages 3-7 years) (Kig2). The subset contains non-child utterances only. The version here removes much of the transcription marking.
Cyfieithu (mae ieithmon yn athronydd ...)
Hudol yw tân syniadau y dewin
sy'n dewis hen dalpiau
o lofa iaith, a'i hailfywhau
yn aur o newydd eiriau.
(David Evan Morris)