ICSI_20010208-1430
1me013: So , uh
2fe008: Alright .
3me011: Um , so I wanted to discuss digits briefly
,
4me013: Oh good .
5me011: but that won't take too long .
6me013: Right .
7me013: OK , agenda items ,
8me013: Uh , we have digits ,
9me013: What else we got ?
10mn014: New version of the presegmentation .
11me013: New version of presegmentation .
12fe008: Um ,
13fe008: do we wanna say something about the ,
14fe016: Yeah , why don't you summarize the
15fe008: an update of the , uh , transcript ?
16me013: Update on transcripts .
17fe016: And I guess that includes some
18fe016: the filtering for the ,
19fe016: the ASI refs , too .
20fe008: Mm .
21me013: Filtering for what ?
22fe016: For the references that we need to go from the
,
23fe016: the ,
24fe016: fancy transcripts to the sort of
25fe008: It'll it'll be basically it'll be a recap of a
meeting that we had jointly this morning .
26fe016: starts to say something that starts with s
27fe016: braindead .
28me013: Uhhuh .
29fe016: With Don , as well .
30fe008: Mmhmm .
31me013: Got it .
32me013: Anything else more pressing than those things
? So
33me013: So , why don't we just do those . You said
yours was brief , so
34me011: OK .
35me011: OK well , the , w uh
36me011: as you can see from the numbers on the digits
we're almost done .
37me011: The digits goes up to .. about four thousand
.
38me011: Um ,
39me011: and so , uh ,
40me011: we probably will be done with the TI digits in
,
41me011: um , another couple weeks .
42me011: um , depending on how many we read each time
.
43me011: So there were a bunch that we skipped .
44me011: You know , someone fills out the form and then
they're not at the meeting and so it's blank .
45me011: Um , but those are almost all filled in as
well .
46me011: And so , once we're it's done it would be very
nice to train up a recognizer and actually start working with
this data .
47me018: So we'll have a corpus that's the size of TI
digits ?
48me011: And so
49me011: One particular test set of TI digits .
50me018: Test set , OK .
51me011: So , I I extracted ,
52me011: Ther there was a file sitting around which
people have used here as a test set .
53me011: It had been randomized and so on and that's
just what I used to generate the order .
54me018: Aah , great .
55me011: of these particular ones .
56me018: Great .
57me011: Um
58me013: So , I'm impressed by what we could do ,
59me013: Is take the standard training set for TI
digits , train up with whatever , you know ,
60me013: great features we think we have , uh for
instance ,
61me013: and then test on uh this test set .
62me013: And presumably uh it should do reasonably well
on that , and then , presumably , we should go to the distant
mike , and it should do poorly .
63me018: Yeah .
64me011: Right .
65me013: And then we should get really smart over the
next year or two , and it that should get better .
66me011:
67me011: And inc increase it by one or two percent ,
yeah .
68me013: Yeah ,
69me013:
70me001:
71me018:
72me011:
73me013: Yeah .
74me011: Um , but ,
75me011: in order to do that we need to extract out the
actual digits .
76me013: Right .
77me011: Um ,
78me011: so that the reason it's not just a transcript
is that there're false starts , and misreads , and miscues
and things like that .
79me011: And so I have a set of scripts and Xwaves
where you just select the portion ,
80me011: hit R ,
81me011: um , it tells you what the next one should be
,
82me011: and you just look for that .
83me011: You know , so it it'll put on the screen ,
84me011: The next set is six nine ,
85me011: nine two two .
86me011: And you find that ,
87me011: and ,
88me011: hit the key and it records it in a file in a
particular format .
89me013: So is this
90me011: And so the
91me011: the question is , should we have the
transcribers do that or should we just do it ?
92me011: Well , some of us . I've been do I've done
,
93me011: eight meetings , something like that ,
94me011: just by hand .
95me011: Just myself , rather .
96me011: So it will not take long .
97me011: Um
98me013: Uh , what what do you think ?
99fe008: My feeling is that we discussed this right
before coffee and I think it's a it's a fine idea partly
because , um , it's not un unrelated to their present skill
set ,
100fe008: but it will add , for them , an extra
dimension , it might be an interesting break for them . And
also it is contributing to the , uh ,
101fe008: c composition of the transcript 'cause we can
incorporate those numbers directly and it'll be a more
complete transcript . So I'm I think it's fine , that part
.
102me011: There is there is
103me013: So you think it's fine to have the
transcribers do it ?
104fe008: Mmhmm .
105me013: Yeah , OK .
106me011: There's one other small bit , which is just
entering the information which at s which is at the top of
this form ,
107me018: Good .
108me011: onto the computer ,
109me011: to go along with the where the digits are
recorded automatically .
110me013: Yeah .
111me011: And so it's just ,
112me011: you know , typing in name , times time , date
, and so on .
113me011: Um , which again either they can do ,
114me011: but it is , you know ,
115me011: firing up an editor ,
116me018:
117me011: or ,
118me011: again , I can do .
119me011: Or someone else can do .
120fe008: And , that , you know , I'm not ,
121fe008: that that one I'm not so sure if it's into the
the ,
122fe008: things that ,
123fe008: I ,
124fe008: wanted to use the hours for , because the
,
125fe008: the time that they'd be spending doing that
they wouldn't be able to be putting more words on .
126me013: Mmm .
127fe008: But that's really your choice , it's your
128me018: So are these two separate tasks that can
happen ?
129me018: Or do they have to happen at the same time
before
130me011: No they don't have this
131me011: you have to enter the data before ,
132me011: you do the second task , but they don't have
to happen at the same time .
133me011: So it's it's just I have a file whi which has
this information on it ,
134me018: OK .
135me011: and then when you start using my scripts ,
136me011: for extracting the times ,
137me011: it adds the times at the bottom of the file
.
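The two-step workflow described above (enter the form information first, then let the extraction scripts add times at the bottom of the same file) can be sketched roughly as follows. This is only an illustration: the actual file format is not given in the meeting, so the `key: value` header, the `start end digits` line layout, the function names, and all values in the usage example are hypothetical.

```python
import tempfile
from pathlib import Path


def create_header_file(path, name, date, time):
    """Step 1: write the form information (hypothetical 'key: value' layout)."""
    Path(path).write_text(f"name: {name}\ndate: {date}\ntime: {time}\n")


def append_extraction(path, digits, start_sec, end_sec):
    """Step 2: append one extracted digit string with its start/end times.

    The 'start end digits' layout is an assumption made for illustration.
    """
    with open(path, "a") as f:
        f.write(f"{start_sec:.2f} {end_sec:.2f} {digits}\n")


def read_lines(path):
    """Return the file contents as a list of lines."""
    return Path(path).read_text().splitlines()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "speaker_form.txt"
        # Placeholder values, not taken from any real form.
        create_header_file(path, "speaker-A", "2001-02-08", "14:30")
        append_extraction(path, "6 9 9 2 2", 12.40, 15.10)
        print("\n".join(read_lines(path)))
```

Because the times are only ever appended, the header can also be created blank and filled in later, which matches the remark that the two tasks can be done in either order.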
138me011: And so , um ,
139me011: I mean , it's easy to create the files and
leave them blank , and so actually we could do it in either
order .
140me018: Oh , OK .
141me011: Um ,
142me011: it's it's sort of nice to have the same person
do it just as a double-check ,
143me011: to make sure you're entering for the right
person .
144me011: But ,
145me011: either way .
146me013: Yeah .
147me013: Yeah just by way of uh , uh , a uh ,
148me011:
149me013: order of magnitude , uh ,
150me013: um , we've been working with this AURORA , uh
data set .
151me013: And , uh , the best score ,
152me018:
153me013: on the ,
154me013: nicest part of the data , that is , where
you've got training and test set that are basically the same
kinds of noise and so forth ,
155me013: uh , is about , uh
156me013: I think the best score was something like five
percent ,
157me013: uh , error , per digit .
158me013: So , that
159me011: Per digit .
160mn014: Per digit .
161me013: You're right . So if you were doing .. ten
digit ,
162me013: uh , recognition , you would really be in
trouble .
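To put a rough number on the ten-digit remark above: if digit errors were independent (a simplifying assumption that is not stated in the meeting and rarely holds exactly), a five percent per-digit error rate would compound across the string. The function name below is made up for this sketch.

```python
def string_error_rate(per_digit_error: float, n_digits: int) -> float:
    """Probability that at least one of n_digits is misrecognized.

    Assumes each digit fails independently with the same probability.
    """
    return 1.0 - (1.0 - per_digit_error) ** n_digits


if __name__ == "__main__":
    # Five percent per digit, ten-digit string.
    print(f"{string_error_rate(0.05, 10):.3f}")  # prints 0.401
```

Under that assumption roughly four out of ten ten-digit strings would contain at least one error.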
163me018: Mmhmm .
164me013: So So the
165me013: The point there , and this is uh car noise uh
, uh things , but but real
166me013: real situation , well , real ,
167me013: Um , the uh there's one microphone that's
close , that they have as as this sort of thing , close
versus distant .
168me013: Uh but in a car , instead of instead of having
a projector noise it's it's car noise .
169me011:
170me013: Uh but it wasn't artificially added to get
some some artificial signal to noise ratio . It was just
people driving around in a car .
171me018:
172me013: So , that's that's an indication , uh that was
with ,
173me013: many sites competing , and this was the very
best score and so forth , so .
174me018: Although the models weren't ,
175me013: More typical numbers like
176me018: that good , right ? I mean ,
177me018: the models are pretty crappy ?
178me013: You're right . I think that we could have done
better on the models , but the thing is that we got this this
is the kind of typical number ,
179me013: for all of the , uh , uh ,
180me013: things in this task , all of the , um ,
181me013: languages .
182me013: And so I I think we'd probably the models
would be better in some than in others . Um ,
183me018: Hmm .
184me013: so , uh .
185me013: Anyway , just an indication once you get into
this kind of realm even if you're looking at connected digits
it can be pretty hard .
186fe008: Hmm .
187fe008: It's gonna be fun to see how we ,
188fe008: compare at this .
189me018: How did we do on the TI digits ?
190me013: Yeah .
191fe008: Very exciting .
192me011: Well the prosodics are so much different s ,
it's gonna be ,
193me011: strange .
194me011: I mean the prosodics are not the same as TI
digits , for example .
195me013: Yeah .
196me011: So I'm I'm not sure how much of effect that
will have .
197me018: H how do
198fe016: What do you mean , the prosodics ?
199me011: Um , just what we were talking about with
grouping .
200me011: That with these , the grouping ,
201me011: there's no grouping at all , and so it's
just
202me011: the only sort of
203me011: discontinuity you have is at the beginning and
the end .
204fe016: So what are they doing in AURORA , are they
reading actual phone numbers , or ,
205me011: AURORA I don't know .
206me011: I don't know what they do in AURORA .
207fe016: a a digit at a time , or ?
208me013: Uh , I'm not sure how no , no I mean it's
connected it's connected , uh ,
209fe016: 'cause it's
210fe016: Connected .
211me013: digits , yeah .
212fe016: So there's also the
213me013: But .
214me011: But
215fe016: not just the prosody but the cross
216fe016: the cross-word modeling is probably quite
different .
217me011: Right .
218me011: But in TI digits ,
219me018: H
220me011: they're reading things like zip codes and
phone numbers and things like that , so it's gonna be
different .
221fe016: Right .
222me018: How do we do on TI digits ?
223me011: I don't remember . I mean , very good , right
?
224me013: Yeah , I mean we were in the .
225me011: One and a half percent , two percent ,
something like that ?
226me013: Uh , I th no I think we got under a percent ,
but it was but it's but I mean .
227me011: Oh really ? OK .
228fe008: s
229me013: The very best system that I saw in the
literature was a point two five percent or something that
somebody had at at Bell Labs , or .
230me011: Alright .
231me013: Uh , but .
232me018: Hmm .
233me013: But , uh , sort of pulling out all the stops .
But I think a lot of systems sort of get half a percent , or
three-quarters a percent , and we're we're in there somewhere
.
234me011: Right .
235me011: But that I mean it's really it's it's
close-talking mikes , no noise , clean signal ,
236fe016:
237me011: just digits , I mean ,
238me013: Yeah .
239me011: every everything is good .
240fe016: It's the beginning of time in speech
recognition .
241me011: Yes , exactly .
242me013:
243me018:
244me013: Yeah .
245me011: And we've only recently got it to anywhere
near human .
246fe016: It's like the ,
247fe016: single cell ,
248me018: Pre
249fe016: you know ,
250me018: prehistory .
251fe016: it's the beginning of life , yeah .
252me013:
253me011: And it's still like an order of magnitude
worse than what humans do .
254fe016: Right .
255me013: Yeah .
256me011: So .
257me013: When When they're wide awake , yeah .
258me001:
259me011: Yeah .
260me013: Um , after coffee , you're right .
261me018:
262fe016:
263me011: After coffee .
264me018:
265mn014:
266me013: Not after lunch .
267me011: OK , so , um ,
268me011: what I'll do then is I'll go ahead and enter
,
269me011: this data .
270me011: And then , hand off to Jane ,
271me011: and the transcribers to do the actual
extraction of the digits .
272me013: Yeah .
273me013: Yeah .
274me013: One question I have that
275me013: that I mean , we wouldn't know the answer to
now but might ,
276me011: Hmm .
277me013: do some guessing , but I was talking before
about doing some model modeling of arti uh ,
278me013: uh , marking of articulatory ,
279me013: features ,
280me013: with overlap and so on .
281me013: And ,
282me013: and ,
283me013: um ,
284me013: On some subset .
285me013: One thought might be to do this uh , on on the
digits , or some piece of the digits . Uh , it'd be easier
,
286me013: uh , and so forth . The only thing is I'm a
little concerned that maybe the kind of phenomena ,
287me013: in w i i
288me013: The reason for doing it is because the the
argument is that certainly with conversational speech , the
stuff that we've looked at here before ,
289me013: um , just doing the simple mapping ,
290me013: from , um ,
291me013: the phone ,
292me013: to the corresponding features that you could
look up in a book ,
293me013: uh , isn't right .
294me013: It isn't actually right . In fact there's
these overlapping processes where some voicing comes up and
then some , you know ,
295me013: some nasality is comes in here , and so forth
. And you do this gross thing saying Well I guess it's this
phone starting there .
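The "simple mapping" being criticized here, from each phone to features looked up in a book, can be sketched as a plain table lookup. The table, phone symbols, and function names below are illustrative assumptions, not the group's tooling; the feature values follow standard phonetic descriptions (for example, /n/ is a voiced nasal and /s/ is voiceless).

```python
# Illustrative canonical table: phone -> textbook feature values.
CANONICAL_FEATURES = {
    "s":  {"voiced": False, "nasal": False},
    "ih": {"voiced": True,  "nasal": False},
    "k":  {"voiced": False, "nasal": False},
    "n":  {"voiced": True,  "nasal": True},
    "ay": {"voiced": True,  "nasal": False},
}


def feature_track(phones, feature):
    """Value of one feature for each phone, taken straight from the table."""
    return [CANONICAL_FEATURES[p][feature] for p in phones]


def feature_changes(phones, feature):
    """Indices of phone boundaries at which the feature value flips."""
    track = feature_track(phones, feature)
    return [i for i in range(1, len(track)) if track[i] != track[i - 1]]


if __name__ == "__main__":
    nine = ["n", "ay", "n"]
    print(feature_track(nine, "nasal"))    # [True, False, True]
    print(feature_changes(nine, "nasal"))  # [1, 2]
```

With this lookup every feature can only change exactly at a phone boundary, which is the simplification the speaker objects to: in real speech nasality or voicing may start before or after the nominal boundary.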
296me013: So ,
297me013: uh , that's the reasoning . But ,
298me013: It could be that when we're reading digits ,
because it's it's for such a limited set ,
299me013: that maybe
300me013: maybe that phenomenon doesn't occur as much .
I don't know .
301me013: D e anybody ,
302fe008: .
303me013: do you have any ,..
304me013: anybody have any opinion about that ?
305fe008: It s strikes me that there are more each of
them is more informative because it's so ,
306me018:
307fe008: random , and that people might articulate more
, and you that might end up with more a closer correspondence
.
308me013: Mmhmm .
309me011: Yeah that's I I agree . That it's just
310me013: Yeah .
311me018: Sort of less predictability , and
312fe008: Mmhmm .
313me013: Yeah .
314fe016: Well it's definitely true that , when people
are ,
315me011: It's a
316fe016: reading ,
317fe016: even if they're rereading what ,
318fe016: they had said spontaneously , that they have
very different patterns . Mitch showed that , and some ,
319me013: Right .
320fe016: dissertations have shown that .
321fe016: So the fact that they're reading , first of
all , whether they're reading in a room of ,
322fe016: people , or rea you know , just the fact that
they're reading will make a difference .
323me013: Yeah .
324fe016: And ,
325me011: Well
326fe016: depends what you're interested in .
327me011: Would ,
328me011: this corpus really be the right one to even
try that on ?