Creative Commons License

The Manulex and Manulex-infra data (including the result sets obtained through use of the site) are licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.

Permissions beyond the scope of this license may be available by contacting us.

Key Numbers
  • 54 schoolbooks

  • 1.9 million words extracted

  • 48,886 wordforms

  • 23,812 lemmas


Feel free to send us any questions or suggestions.

If you find a problem in the site, please let us know so that we can fix it as soon as we can!

Contact email: Bernard Lété

Contact form


Financial support for this work was obtained through grants from the French National Research Agency (ANR) awarded to Bernard Lété:

  • Developing a neuro-computational model of learning to read (ANR-06-BLAN-0337-03)

  • Differential Diagnostic of learning of mild-retarded children (ANR-06-APPR-005)

The research was under the direction of Jonathan Grainger and Bernard Lété, respectively.


The eManulex website benefits from resources made freely available on the Web, and we thank their authors: flag icons by mayosoft, the Tango Desktop Library project...


This website lets you easily search the Manulex databases, which provide a comprehensive description of the written French addressed to children while they learn to read in the primary grades (1st to 5th grade).

The databases make it possible to manipulate and control experimental variables in empirical studies on the basis of objective data, and to develop instructional methods in keeping with the distributional characteristics of French orthography.

Searches can be made on any available criterion, and the results can be exported as files compatible with Excel and OpenOffice Calc. Our request wizard makes complex queries easier, especially when text criteria are given.


Manulex is based on a corpus of 1.9 million words extracted from 54 readers used in French primary schools between the first and fifth grades. The readers cover a range of topic areas, each with an appreciable amount of data coming from different types of texts (from novels to various kinds of fiction, from newspaper reporting to technical writing, and from poetry to theater plays) written by different authors from a variety of backgrounds.

The database contains two lexicons: the wordform lexicon (48,886 entries) and the lemma lexicon (23,812 entries). Each lexicon provides a grade-level-based list of words found in first-grade, second-grade, and third-to-fifth-grade readers (hereafter called levels G1, G2, and G3-5, respectively). A fourth level (G1-5) was generated by combining all readers.


Manulex-infra is an extension of Manulex. It was developed to describe the distributional characteristics of the sublexical and lexical units in Manulex.

All entries in the Manulex-wordform lexicon were used for the computations, except abbreviations, interjections, and compound entries. This left a total of 45,080 entries. Among these, 10,861 were in G1, 18,131 were in G2, and 42,422 were in G3-5.
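
The per-level counts sum to more than the total because a single wordform can occur at several grade levels. The following is a minimal sketch of the exclusion step and the per-level counting on a few made-up entries. The field names (`category`, `levels`) and the category labels are hypothetical and do not reflect the actual Manulex file layout.

```python
# Toy entries only. Field names and category labels are illustrative
# assumptions, not the actual Manulex column names.
EXCLUDED = {"abbreviation", "interjection", "compound"}

entries = [
    {"form": "chat", "category": "noun", "levels": {"G1", "G2", "G3-5"}},
    {"form": "ah", "category": "interjection", "levels": {"G1"}},
    {"form": "maison", "category": "noun", "levels": {"G2", "G3-5"}},
    {"form": "arc-en-ciel", "category": "compound", "levels": {"G3-5"}},
    {"form": "lire", "category": "verb", "levels": {"G3-5"}},
]


def keep_for_infra(entries):
    """Drop abbreviations, interjections, and compound entries."""
    return [e for e in entries if e["category"] not in EXCLUDED]


def count_per_level(entries):
    """Count how many entries are attested at each grade level."""
    counts = {}
    for entry in entries:
        for level in entry["levels"]:
            counts[level] = counts.get(level, 0) + 1
    return counts


kept = keep_for_infra(entries)
per_level = count_per_level(kept)
```

In this toy data three entries survive the filter, yet the per-level counts sum to six, since "chat" is counted once at each of its three levels.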

At each grade level, quantitative estimates were computed for several infralexical variables, such as grapheme-to-phoneme mappings, bigrams, and syllables, and for lexical variables, such as lexical neighborhood, homophony, and homography.
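
As an illustration of what a per-level infralexical estimate can look like, the sketch below counts letter bigrams separately for each grade level. It is a simplified assumption, not the procedure used for Manulex-infra: each word contributes once regardless of its frequency, and the word lists and the `words_by_level` structure are invented for the example.

```python
from collections import Counter


def bigrams(word):
    """Return the adjacent letter pairs of a word, in order."""
    return [word[i:i + 2] for i in range(len(word) - 1)]


def bigram_counts(words):
    """Count bigram occurrences over a list of words (one count per word)."""
    counts = Counter()
    for word in words:
        counts.update(bigrams(word))
    return counts


# Hypothetical word lists per grade level (not taken from the database).
words_by_level = {
    "G1": ["chat", "chien"],
    "G2": ["chat", "chien", "maison"],
}

counts_by_level = {
    level: bigram_counts(words) for level, words in words_by_level.items()
}
```

Here `counts_by_level["G1"]["ch"]` is 2, because both toy G1 words begin with "ch". A frequency-weighted version would multiply each word's bigrams by that word's frequency at the given level.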


These are the people who have been involved in creating the databases and this website. You can find more information about our work on our respective pages. You can also get the original papers from the download page.


The electronic eManulex website was created by:


The Manulex database was developed by:

  • Bernard Lété, INRP & Louis Lumière University, Laboratory of Cognitive Mechanism Studies, Lyon, France
  • Liliane Sprenger-Charolles, CNRS & Paris Descartes University, Laboratory of Psychology of Perception, Paris, France
  • Pascale Colé, CNRS & Provence University, Laboratory of Cognitive Psychology, Marseille, France


The Manulex-infra database was developed by:

  • Ronald Peereman, CNRS & Pierre Mendes France University, Laboratory of Psychology and Neurocognition, Grenoble, France
  • Bernard Lété, INRP & Louis Lumière University, Laboratory of Cognitive Mechanism Studies, Lyon, France
  • Liliane Sprenger-Charolles, CNRS & Paris Descartes University, Laboratory of Psychology of Perception, Paris, France


For Manulex:

Lété, B., Sprenger-Charolles, L., & Colé, P. (2004). Manulex: A grade-level lexical database from French elementary-school readers. Behavior Research Methods, Instruments, & Computers, 36, 156-166.

For Manulex-infra:

Peereman, R., Lété, B., & Sprenger-Charolles, L. (2007). Manulex-infra: Distributional characteristics of grapheme-phoneme mappings, infra-lexical and lexical units in child-directed written material. Behavior Research Methods, 39, 593-603.

For the electronic version eManulex:

Ortéga, É., & Lété, B. (2010). eManulex: Electronic version of Manulex and Manulex-infra databases. Retrieved from

Note: any reference to the papers must always be accompanied by a reference to the electronic version.