Topic Detection in Meeting Dialog

Topic Detection in Meeting Dialogs

This project, part of our Meeting Room Project, aims to find ways to automatically locate topic breaks in meeting transcripts: given the text of a meeting, find where the topic of conversation changes significantly. To evaluate our technology we need human-annotated meetings that include topic change information. We're asking for volunteers to read through some transcripts and record where they think topic changes occur.

Guidelines

  • A 'topic' is not a very well defined unit of semantic content, so we can't give a hard and fast definition of what to look for when marking topics up. We would like you to use your intuitive notion of 'topic' to decide when a break occurs in these dialogs.
  • In some cases there may be obvious changes, e.g. from talking about economics to talking about television. At other times the change is more subtle (perhaps from micro to macro economics), and unless you are an expert in the field you might not pick it up. To record these differences, please mark each topic break with a number between 1 and 5: 1 means a very subtle break, 5 means a major change in subject.
  • Don't force topic changes if you aren't finding any; it's possible that only one topic is discussed in some of these meetings.

Markup Format

Each meeting transcript is presented as a web page with each speaker clearly marked. Within a turn, an underlined section means that another speaker is interrupting the first, possibly overlapping with them. Each speaker has an identifier: in some cases this is S1, S2, etc.; in others it's a name. Each speaker turn is also numbered in the top left of the box containing that turn. Use this number to identify where breaks occur: write down the turn number of the first turn in each new topic, followed by the break strength (1-5) and any comments you might have. We ask that you return to us a small text file containing this information, e.g.:

12 2
29 3
89 5   Start talking about fruit
103 1  now fruit canning
130 4  now vegetable production
    

If you think that a topic change happens inside a turn, please say so in a comment and estimate roughly where it occurs, e.g. "topic change about 30% into turn after 'Eggplant'".
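If you want to check your file before sending it, the format above is simple enough to validate mechanically. The following is only a minimal sketch, assuming one break per line written as turn number, strength, then an optional comment; the names `TopicBreak` and `parse_annotations` are made up for this illustration and are not part of any tool we provide.

```python
import re
from typing import List, NamedTuple


class TopicBreak(NamedTuple):
    turn: int      # number of the first turn in the new topic
    strength: int  # 1 (very subtle break) to 5 (major change in subject)
    comment: str   # free-text comment, empty if none was given


# Assumed line layout: "<turn> <strength> [comment]"
_LINE = re.compile(r"^\s*(\d+)\s+(\d+)\s*(.*?)\s*$")


def parse_annotations(text: str) -> List[TopicBreak]:
    """Parse an annotation file, raising ValueError on malformed lines."""
    breaks = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        match = _LINE.match(line)
        if match is None:
            raise ValueError(
                f"line {lineno}: expected 'TURN STRENGTH [comment]', got {line!r}"
            )
        turn = int(match.group(1))
        strength = int(match.group(2))
        if not 1 <= strength <= 5:
            raise ValueError(f"line {lineno}: strength must be 1-5, got {strength}")
        breaks.append(TopicBreak(turn, strength, match.group(3)))
    return breaks


example = """\
12 2
29 3
89 5   Start talking about fruit
103 1  now fruit canning
130 4  now vegetable production
"""

for brk in parse_annotations(example):
    print(brk.turn, brk.strength, brk.comment)
```

Notes about a change inside a turn stay in the free-text comment; this sketch does not try to interpret them.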

Dialogs

In the first instance we are targeting the following dialogs, which we'd like to have annotated by as many people as possible so that we can measure inter-annotator reliability. If you are willing to help, please select a dialog (at random) from the list below and then work your way through the list as your patience allows. Even if you can only do one dialog, that would be useful. Please email the completed text file (as above) to me (Steve Cassidy).

I've added links to MP3 files for the meetings that I have audio for. Listening to these while following the transcript may make annotation a little easier.

adv105su068
ICSI_1450 (MP3)
ICSI_1430 (MP3)
mtg485sg142
NIST_1148 (MP3)
NIST_1007 (MP3)
CMU_1500 (MP3)
CMU_1400 (MP3)
LDC_1400 (MP3)
LDC_1500 (MP3)
tut301mu021
dis115ju087
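To give an idea of how two returned files for the same dialog might be compared when measuring inter-annotator reliability, here is a rough sketch. It is an assumption on our part for illustration, not a statement of the measure the project will use: it treats every turn as a yes/no "new topic starts here" decision and computes Cohen's kappa over those decisions. The function name `boundary_kappa` and the turn counts in the example are hypothetical.

```python
from typing import Iterable


def boundary_kappa(breaks_a: Iterable[int], breaks_b: Iterable[int], n_turns: int) -> float:
    """Cohen's kappa over per-turn boundary / no-boundary decisions.

    breaks_a, breaks_b: turn numbers each annotator marked as starting a new topic.
    n_turns: total number of turns in the dialog (turns numbered 1..n_turns).
    """
    a, b = set(breaks_a), set(breaks_b)
    if n_turns <= 0:
        raise ValueError("n_turns must be positive")
    if any(t < 1 or t > n_turns for t in a | b):
        raise ValueError("break turn numbers must lie in 1..n_turns")

    both = len(a & b)                    # both annotators marked a break
    neither = n_turns - len(a | b)       # neither annotator marked a break
    p_observed = (both + neither) / n_turns

    # Agreement expected by chance, from each annotator's own break rate.
    rate_a, rate_b = len(a) / n_turns, len(b) / n_turns
    p_expected = rate_a * rate_b + (1 - rate_a) * (1 - rate_b)

    if p_expected == 1.0:
        return 1.0  # both annotators made the same constant decision everywhere
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical example: two annotators, a dialog of 150 turns.
print(boundary_kappa([12, 29, 89, 103, 130], [12, 30, 89, 130], 150))
```

This simple version ignores the break strength and gives no credit for near misses (turn 29 versus turn 30 counts as a disagreement), so it is likely to understate how closely two annotators agree.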

Many thanks for any help that you can provide with this task.

