Research within the Centre for Language Technology is largely carried out in the context of projects. Current projects are listed below; past projects are listed on a separate page. See the web pages of individual Centre members for further details of their research interests.
AnswerFinder (coordinator: Diego Molla)
This project looks at how existing document collections can be used as a source of answers to questions posed in natural language. Traditional information retrieval systems (such as Web search engines) return complete documents, but in many cases the user wants answers, not documents. We are convinced that, to find the answer to a question in a large document collection, current corpus-based and statistical information retrieval methods need to be combined with more linguistically informed approaches.
AnswerFinder combines traditional approaches to natural language processing with current robust approaches. In the indexing stage, AnswerFinder exploits the linguistic information in the target documents to produce a simplified representation of the logical form of each document sentence. In the retrieval stage, the question is analysed in depth. A fall-back retrieval procedure then finds the sentences whose logical forms indicate that they contain the answer, and extracts the answers from them.
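To make the split between indexing and retrieval concrete, here is a minimal Python sketch of answer extraction by logical-form matching. The flat term representation, the predicate names and the functions `unify`, `match` and `answer` are hypothetical illustrations, not AnswerFinder's actual data structures. The sketch covers only exact matching of the question's terms against each indexed sentence; question analysis and the fall-back procedure are not modelled.

```python
from typing import Dict, Iterator, List, Optional, Tuple

# Hypothetical flat logical form: a list of terms (predicate, arg1, arg2, ...).
# In questions, arguments beginning with "?" are variables.
Term = Tuple[str, ...]


def unify(pattern: Term, fact: Term, bindings: Dict[str, str]) -> Optional[Dict[str, str]]:
    """Match one question term against one sentence term; return extended bindings or None."""
    if len(pattern) != len(fact) or pattern[0] != fact[0]:
        return None
    new = dict(bindings)
    for p, f in zip(pattern[1:], fact[1:]):
        if p.startswith("?"):
            if new.get(p, f) != f:  # variable already bound to a different value
                return None
            new[p] = f
        elif p != f:  # constants must be identical
            return None
    return new


def match(question: List[Term], sentence: List[Term],
          bindings: Optional[Dict[str, str]] = None) -> Iterator[Dict[str, str]]:
    """Yield every set of bindings under which all question terms occur in the sentence."""
    bindings = {} if bindings is None else bindings
    if not question:
        yield bindings
        return
    first, rest = question[0], question[1:]
    for fact in sentence:
        extended = unify(first, fact, bindings)
        if extended is not None:
            yield from match(rest, sentence, extended)


def answer(question: List[Term], answer_var: str,
           index: Dict[str, List[Term]]) -> List[Tuple[str, str]]:
    """Return (sentence id, answer) for each indexed sentence that matches the question."""
    return [(sid, b[answer_var]) for sid, lf in index.items() for b in match(question, lf)]


# Indexing stage (done offline): one simplified logical form per sentence.
INDEX = {
    "s1": [("invent", "e1", "x1", "x2"), ("named", "x1", "Marconi"), ("radio", "x2")],
    "s2": [("invent", "e2", "x3", "x4"), ("named", "x3", "Edison"), ("phonograph", "x4")],
}
# Retrieval stage: "Who invented the radio?" as a logical form with variables.
QUESTION = [("invent", "?e", "?x", "?y"), ("radio", "?y"), ("named", "?x", "?name")]

print(answer(QUESTION, "?name", INDEX))  # [('s1', 'Marconi')]
```

Matching on logical forms rather than on keywords is what lets the question's "radio" be tied to the object of the inventing event. A practical system would also need some relaxed matching for questions that no sentence satisfies exactly.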
DADA-HCS (coordinator: Steve Cassidy)
The DADA-HCS (Distributed Access and Data Annotation for the Human Communication Science) project is funded by the ARC Special Research Initiative on eResearch. The project aims to build infrastructure that allows researchers around Australia to annotate linguistic resources collaboratively and to share these resources.
DANTE (coordinator: Robert Dale)
DANTE (Detection and Normalisation of Temporal Expressions) is a project concerned with identifying and interpreting references to times and dates in documents with high reliability, so that the information can then be used for high-quality temporal tracking of entities and events. This work is funded by the DSTO.
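The two halves of the task named in DANTE's title can be illustrated with a minimal Python sketch: detection finds a temporal expression in the text, and normalisation maps it to a calendar value, here an ISO 8601 date computed relative to an assumed document date. The three patterns and the function `normalise` are hypothetical and far simpler than what a high-reliability system requires; they are not DANTE's grammar.

```python
import re
from datetime import date, timedelta
from typing import List, Tuple

MONTHS = {name: i for i, name in enumerate(
    ["january", "february", "march", "april", "may", "june", "july",
     "august", "september", "october", "november", "december"], start=1)}
RELATIVE_DAYS = {"yesterday": -1, "today": 0, "tomorrow": 1}

# Detection: a few illustrative expression types, one named group each.
PATTERN = re.compile(
    r"\b(?:(?P<rel>yesterday|today|tomorrow)"
    r"|(?P<n>\d+) days ago"
    r"|(?P<day>\d{1,2}) (?P<month>" + "|".join(MONTHS) + r") (?P<year>\d{4}))\b",
    re.IGNORECASE,
)


def normalise(text: str, anchor: date) -> List[Tuple[str, str]]:
    """Return (surface string, ISO 8601 date) for each expression found,
    resolving relative expressions against the document date `anchor`."""
    results = []
    for m in PATTERN.finditer(text):
        if m.group("rel"):
            value = anchor + timedelta(days=RELATIVE_DAYS[m.group("rel").lower()])
        elif m.group("n"):
            value = anchor - timedelta(days=int(m.group("n")))
        else:
            try:
                value = date(int(m.group("year")),
                             MONTHS[m.group("month").lower()],
                             int(m.group("day")))
            except ValueError:  # e.g. "31 February 2005": detected but not a real date
                continue
        results.append((m.group(0), value.isoformat()))
    return results


print(normalise("The director resigned yesterday; the board met 3 days ago "
                "and will report on 14 April 2005.", date(2005, 3, 10)))
# [('yesterday', '2005-03-09'), ('3 days ago', '2005-03-07'), ('14 April 2005', '2005-04-14')]
```

The example shows why normalisation needs an anchor: "yesterday" and "3 days ago" only become trackable dates once the document's own date is known.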
GainSpring (coordinator: Robert Dale)
Each year, millions of company announcements around the world are made available via national stock exchanges, disclosing information considered important to the marketplace. These documents vary in length from short one-page announcements of a director's resignation to very long annual reports, often in excess of 100 pages. Secondary news sources such as Reuters and AP multiply the quantity of information further still. In such a data-rich environment, it can be impossible to find key information. The GainSpring project, funded by the Capital Markets Cooperative Research Centre, explores the use of language technology techniques to extract useful information from these documents. Our prototype systems combine text categorisation, named entity recognition, information extraction and text summarisation to deliver the right information to the right people in a timely manner, via web, email, voice or SMS.
The Meeting Room Project (coordinator: Steve Cassidy)
The Meeting Room Project aims to build speech technology applications for use in a meeting room. One of our core goals is to build technology that can be deployed without invasive or complex instrumentation (e.g. headsets or large microphone arrays). We are also interested in the confluence of speech technology and language technology, and in how both can be applied to make speech technology a viable and useful resource in the meeting room.
PENG Online (coordinator: Rolf Schwitter)
PENG is a computer-processable controlled natural language designed for knowledge representation. Texts written in PENG look informal, are easy for humans to understand and are easy for machines to process. In contrast to other controlled natural languages, PENG does not require the author to remember the rules of the controlled language, since the PENG editor guides the writing process via predictive interface techniques. If you would like to try out the PENG editor online, please contact us and we will send you a login and password.
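The idea of a predictive interface can be illustrated with a lookahead function: given the words typed so far, it returns the words that may legally come next. The minimal Python sketch below uses a toy finite-state grammar for this. The lexicon, the single sentence pattern and the function `lookahead` are hypothetical; PENG's actual grammar is far richer, and this sketch does not reproduce how the PENG editor computes its predictions.

```python
from typing import Dict, List, Set

# Toy lexicon (hypothetical): each word category lists the words it allows.
LEXICON: Dict[str, Set[str]] = {
    "det": {"a", "every", "no"},
    "noun": {"customer", "book", "library"},
    "verb": {"borrows", "owns"},
    "stop": {"."},
}

# Toy grammar as a finite-state machine: state -> {category: next state}.
# Accepts sequences of sentences of the form: det noun verb det noun "."
TRANSITIONS: Dict[int, Dict[str, int]] = {
    0: {"det": 1},
    1: {"noun": 2},
    2: {"verb": 3},
    3: {"det": 4},
    4: {"noun": 5},
    5: {"stop": 0},  # after a full stop, a new sentence may begin
}


def lookahead(prefix: List[str]) -> Set[str]:
    """Return the words the editor would offer after the words typed so far."""
    states = {0}
    for word in prefix:
        # Follow every transition whose category contains the typed word.
        states = {
            nxt
            for s in states
            for cat, nxt in TRANSITIONS[s].items()
            if word in LEXICON[cat]
        }
        if not states:
            raise ValueError(f"{word!r} is not allowed at this point")
    # Offer every word from every category that can legally come next.
    return {w for s in states for cat in TRANSITIONS[s] for w in LEXICON[cat]}


print(sorted(lookahead(["every", "customer"])))  # ['borrows', 'owns']
```

Because the editor only ever offers continuations that keep the text well-formed, the author never has to consult the grammar rules directly.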