In conclusion, this dictionary is calibrated to learners’ needs and organized so that it is easy for the reader to use. Corpus linguistics is at the core of the effort, but a wide array of human skills and computational linguistic techniques was vital to the process.
The main frequency index
The frequency index is the main portion of this dictionary: it contains a ranked list of the top 5000
lemmas in French, starting with the highest-scoring one and progressing to the lowest-scoring word.
Each entry has the following information:
rank (1, 2, 3…), headword, part(s) of speech, English gloss, sample context, English translation of sample context, dispersion value, raw frequency total, indication of register variation.
For example, here is the entry for the word “aimer”:
242 aimer v to like, love
* tu sais que je t’aime -- you know I love you
71 | 10085 −n
This entry shows that the word (and all of its related forms) ranks 242nd among all French words in terms of combined frequency and dispersion. The part-of-speech code shows that it’s a verb. Two possible English glosses are “to like” and “to love”. One context from the corpus is shown, which uses one of the related forms of this verb: “aime”. An English translation for the usage context then appears. Next, the number “71” flags the dispersion value for the word on a scale from 27 to 100; the word and its forms are reasonably evenly spread across the corpus. The number “10085”
indicates the raw frequency, or how many times the word and its related forms occur in the corpus.
Finally, the register code −n indicates that this word is noticeably infrequent in nonfiction.
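For readers who want to process entries programmatically, the fields described above can be sketched as a small record type. This is only an illustrative sketch, not part of the dictionary itself: it assumes each entry is laid out on three lines (a header line with rank, headword, part of speech, and gloss; a context line with the French and English separated by " -- "; and a statistics line with dispersion, raw frequency, and an optional register code). The names `Entry` and `parse_entry` are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class Entry:
    rank: int          # position in the frequency index (1, 2, 3...)
    headword: str
    pos: str           # part-of-speech code, e.g. "v"
    gloss: str         # indicative English gloss
    context: str       # French usage context from the corpus
    translation: str   # English translation of the context
    dispersion: int    # dispersion value
    frequency: int     # raw frequency total
    register: str      # register code such as "−n"; empty if absent


def parse_entry(text: str) -> Entry:
    """Parse one entry, assuming the three-line layout described above."""
    head, sample, stats = [line.strip() for line in text.strip().splitlines()]
    # Header: rank, headword, POS code, then the gloss as the remainder.
    rank, headword, pos, gloss = head.split(maxsplit=3)
    # Context line: drop the leading "*" marker, split French from English.
    context, translation = sample.lstrip("* ").split(" -- ", 1)
    # Statistics line: "dispersion | frequency [register code]".
    match = re.fullmatch(r"(\d+)\s*\|\s*(\d+)\s*(.*)", stats)
    if match is None:
        raise ValueError(f"unrecognized statistics line: {stats!r}")
    return Entry(
        rank=int(rank),
        headword=headword,
        pos=pos,
        gloss=gloss,
        context=context.strip(),
        translation=translation.strip(),
        dispersion=int(match.group(1)),
        frequency=int(match.group(2)),
        register=match.group(3).strip(),
    )


SAMPLE = """242 aimer v to like, love
* tu sais que je t’aime -- you know I love you
71 | 10085 −n"""

entry = parse_entry(SAMPLE)
print(entry.headword, entry.rank, entry.dispersion, entry.frequency)
```

The sketch treats the headword and the part-of-speech code as single tokens, which holds for the “aimer” entry but may not hold for every entry in the index.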
Here are some additional notes for the items appearing in the entries.
The part(s) of speech
Several categories have been combined to increase readability. For example, nadj
The English gloss
The gloss is meant to be indicative only – it’s not a complete listing of all possibilities. This is not an exhaustive bilingual dictionary. Many of these words also participate in idioms, fixed expressions, collocations, or multi-word expressions. These meanings are not included in the glosses since the focus is on single words. The glosses are written in standard American English. In certain parts of the dictionary (e.g. in the thematic lists) only shortened forms of the glosses are used.
The French usage context
As noted above, all of the usage contexts come from the corpus itself. In selecting them the goal was to find contexts that illustrate clearly the core meaning of the word as concisely as possible.
Contexts will sometimes unavoidably include words that are not in the top 5000 words, as well as occasional idiomatic usages. The contexts are taken verbatim, with only very infrequent correction (e.g. spelling errors). Capitalization is (for the most part) neutralized to improve readability.
The contexts are not always grammatically correct, especially when taken from a spoken-language transcript, where speech errors, non-standard usage, and non-prescribed forms are common (e.g. “j’sais pas” vs. “je ne sais pas”). Finally, the contexts reflect real-world usage, and hence may not always be factually or politically correct. No editorial endorsement or philosophical conclusions should be ascribed to the authors on the basis of the contexts used.
2 See http://www.illc.uva.nl/EuroWordNet/.
The English translation of the context
Translations are rendered into American English, with its attendant spelling and vocabulary choices.