
Department of Linguistics

Corpora

The Providence (English) Corpus

The Providence Corpus consists of twice-monthly digital audio/video recordings of hour-long, spontaneous mother-child speech interactions from 6 English-speaking children aged approximately 1-3 years. The data were collected in and around southern New England from 2000 to 2004 and total approximately 363 hours. The child utterances are transcribed in broad phonetic transcription. This work was funded by NIH and carried out by Katherine Demuth and colleagues at Brown University in Providence, RI. The data are available on the CHILDES database.

Those wishing to use the corpus should cite the following reference:
Demuth, K., Culbertson, J. & Alter, J. 2006. Word-minimality, epenthesis, and coda licensing in the acquisition of English. Language & Speech, 49, 137-174.

The Lyon (French) Corpus

The Lyon Corpus consists of twice-monthly digital audio/video recordings of hour-long, spontaneous mother-child speech interactions from 4 French-speaking children aged approximately 1-3 years. The data were collected in Lyon, France from 2000 to 2004 and total approximately 185 hours. The child utterances are transcribed in broad phonetic transcription. The work was funded by NIH and carried out in collaboration with Harriet Jisa and colleagues at Dynamique du Langage at the University of Lyon 2, France. The data are available on the CHILDES database.

Those wishing to use the corpus should cite the following reference:
Demuth, K. & Tremblay, A. 2008. Prosodically-conditioned variability in children's production of French determiners. Journal of Child Language, 35, 99-127.

The Demuth Sesotho Corpus

The Demuth Sesotho Corpus contains 98 hours of spontaneous speech interactions with four children aged 2-4. Audio recording took place at monthly intervals for 3-4 hours during interactions with family and peers in rural Lesotho. The data are morphologically tagged and available as part of the CHILDES database. For a more detailed description of the Sesotho files, please refer to pages 23-30 in the CHILDES documentation. Corpus preparation and research have been funded by NSF, Fulbright, and SSRC.

Those wishing to use this corpus should notify Katherine Demuth and cite the following reference:
Demuth, K. 1992. Acquisition of Sesotho. In D. Slobin (ed.), The Cross-Linguistic Study of Language Acquisition, vol. 3, 557-638. Hillsdale, N.J.: Lawrence Erlbaum Associates.

To download the Demuth Sesotho Corpus, click here.

Want to learn some Sesotho? Here are 13 easy lessons for getting started.


Contact us

Level 3, Australian Hearing Hub
16 University Ave
Macquarie University
Sydney, NSW, 2109, Australia

phone: +61 2 9850 6705 
email: ling.cll@mq.edu.au

Researcher Enquiries 

Postdocs and PhD, MRes or Honours students
Email: Katherine Demuth (Lab Director) or Titia Benders (Deputy Lab Director) 

Undergraduates interested in gaining research experience
Email: Nyaradzai Marunda (Lab Coordinator)

Affiliations:
Macquarie University Centre for Language Sciences (CLaS)
ARC Centre of Excellence in Cognition and its Disorders (CCD)
The Hearing CRC