ICSI_20010208-1430

1me013: So , uh
2fe008: Alright .
3me011: Um , so I wanted to discuss digits briefly ,
4me013: Oh good .
5me011: but that won't take too long .
6me013: Right .
7me013: OK , agenda items ,
8me013: Uh , we have digits ,
9me013: What else we got ?
10mn014: New version of the presegmentation .
11me013: New version of presegmentation .
12fe008: Um ,
13fe008: do we wanna say something about the ,
14fe016: Yeah , why don't you summarize the
15fe008: an update of the , uh , transcript ?
16me013: Update on transcripts .
17fe016: And I guess that includes some
18fe016: the filtering for the ,
19fe016: the ASR refs , too .
20fe008: Mm .
21me013: Filtering for what ?
22fe016: For the references that we need to go from the ,
23fe016: the ,
24fe016: fancy transcripts to the sort of
25fe008: It'll it'll be basically it'll be a recap of a meeting that we had jointly this morning .
26fe016: starts to say something that starts with s
28me013: Uhhuh .
29fe016: With Don , as well .
30fe008: Mmhmm .
31me013: Got it .
32me013: Anything else more pressing than those things ? So
33me013: So , why don't we just do those . You said yours was brief , so
34me011: OK .
35me011: OK well , the , w uh
36me011: as you can see from the numbers on the digits we're almost done .
37me011: The digits goes up to .. about four thousand .
38me011: Um ,
39me011: and so , uh ,
40me011: we probably will be done with the TI digits in ,
41me011: um , another couple weeks .
42me011: um , depending on how many we read each time .
43me011: So there were a bunch that we skipped .
44me011: You know , someone fills out the form and then they're not at the meeting and so it's blank .
45me011: Um , but those are almost all filled in as well .
46me011: And so , once we're it's done it would be very nice to train up a recognizer and actually start working with this data .
47me018: So we'll have a corpus that's the size of TI digits ?
48me011: And so
49me011: One particular test set of TI digits .
50me018: Test set , OK .
51me011: So , I I extracted ,
52me011: Ther there was a file sitting around which people have used here as a test set .
53me011: It had been randomized and so on and that's just what I used to generate the order .
54me018: Aah , great .
55me011: of these particular ones .
56me018: Great .
57me011: Um
58me013: So , I'm impressed by what we could do ,
59me013: Is take the standard training set for TI digits , train up with whatever , you know ,
60me013: great features we think we have , uh for instance ,
61me013: and then test on uh this test set .
62me013: And presumably uh it should do reasonably well on that , and then , presumably , we should go to the distant mike , and it should do poorly .
63me018: Yeah .
64me011: Right .
65me013: And then we should get really smart over the next year or two , and it that should get better .
67me011: And inc increase it by one or two percent , yeah .
68me013: Yeah ,
73me013: Yeah .
74me011: Um , but ,
75me011: in order to do that we need to extract out the actual digits .
76me013: Right .
77me011: Um ,
78me011: so that the reason it's not just a transcript is that there're false starts , and misreads , and miscues and things like that .
79me011: And so I have a set of scripts and Xwaves where you just select the portion ,
80me011: hit R ,
81me011: um , it tells you what the next one should be ,
82me011: and you just look for that .
83me011: You know , so it it'll put on the screen ,
84me011: The next set is six nine ,
85me011: nine two two .
86me011: And you find that ,
87me011: and ,
88me011: hit the key and it records it in a file in a particular format .
89me013: So is this
90me011: And so the
91me011: the question is , should we have the transcribers do that or should we just do it ?
92me011: Well , some of us . I've been do I've done ,
93me011: eight meetings , something like that ,
94me011: just by hand .
95me011: Just myself , rather .
96me011: So it will not take long .
97me011: Um
98me013: Uh , what what do you think ?
99fe008: My feeling is that we discussed this right before coffee and I think it's a it's a fine idea partly because , um , it's not un unrelated to their present skill set ,
100fe008: but it will add , for them , an extra dimension , it might be an interesting break for them . And also it is contributing to the , uh ,
101fe008: c composition of the transcript 'cause we can incorporate those numbers directly and it'll be a more complete transcript . So I'm I think it's fine , that part .
102me011: There is there is
103me013: So you think it's fine to have the transcribers do it ?
104fe008: Mmhmm .
105me013: Yeah , OK .
106me011: There's one other small bit , which is just entering the information which at s which is at the top of this form ,
107me018: Good .
108me011: onto the computer ,
109me011: to go along with the where the digits are recorded automatically .
110me013: Yeah .
111me011: And so it's just ,
112me011: you know , typing in name , times time , date , and so on .
113me011: Um , which again either they can do ,
114me011: but it is , you know ,
115me011: firing up an editor ,
117me011: or ,
118me011: again , I can do .
119me011: Or someone else can do .
120fe008: And , that , you know , I'm not ,
121fe008: that that one I'm not so sure if it's into the the ,
122fe008: things that ,
123fe008: I ,
124fe008: wanted to use the hours for , because the ,
125fe008: the time that they'd be spending doing that they wouldn't be able to be putting more words on .
126me013: Mmm .
127fe008: But that's really your choice , it's your
128me018: So are these two separate tasks that can happen ?
129me018: Or do they have to happen at the same time before
130me011: No they don't have this
131me011: you have to enter the data before ,
132me011: you do the second task , but they don't have to happen at the same time .
133me011: So it's it's just I have a file whi which has this information on it ,
134me018: OK .
135me011: and then when you start using my scripts ,
136me011: for extracting the times ,
137me011: it adds the times at the bottom of the file .
138me011: And so , um ,
139me011: I mean , it's easy to create the files and leave them blank , and so actually we could do it in either order .
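The workflow described here (a per-speaker file holding the form header, a prompt for the next expected digit string, and extracted times appended at the bottom) can be sketched roughly as follows. This is only an illustration under assumptions: the class name `DigitsForm`, the field names, and the text layout are hypothetical, and the actual scripts and Xwaves bindings are not shown in the meeting.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class DigitsForm:
    """Hypothetical record for one speaker's digits form."""
    prompts: List[str]                 # expected digit strings, in reading order
    speaker: str = ""                  # header fields may stay blank and be filled in later
    date: str = ""
    time: str = ""
    segments: List[Tuple[str, float, float]] = field(default_factory=list)

    def next_prompt(self) -> str:
        """Return the digit string the annotator should look for next.

        Raises IndexError once every prompt has a recorded segment.
        """
        return self.prompts[len(self.segments)]

    def record(self, start: float, end: float) -> None:
        """Store the selected region (in seconds) for the current prompt."""
        if end <= start:
            raise ValueError("segment end must be after segment start")
        self.segments.append((self.next_prompt(), start, end))

    def to_text(self) -> str:
        """Serialize header lines first, then one line per extracted segment."""
        header = [f"speaker: {self.speaker}", f"date: {self.date}", f"time: {self.time}"]
        body = [f"{start:.2f} {end:.2f} {digits}" for digits, start, end in self.segments]
        return "\n".join(header + body)


if __name__ == "__main__":
    form = DigitsForm(prompts=["6 9 9 2 2", "0 1"])
    form.record(12.5, 14.75)       # times extracted before the header is filled in
    form.speaker = "me011"         # header entered afterwards; either order works
    print(form.to_text())
```

Because the header fields default to blank, the times can be extracted first and the form information typed in later, which matches the point that the two tasks can happen in either order.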
140me018: Oh , OK .
141me011: Um ,
142me011: it's it's sort of nice to have the same person do it just as a doublecheck ,
143me011: to make sure you're entering for the right person .
144me011: But ,
145me011: either way .
146me013: Yeah .
147me013: Yeah just by way of uh , uh , a uh ,
149me013: order of magnitude , uh ,
150me013: um , we've been working with this AURORA , uh data set .
151me013: And , uh , the best score ,
153me013: on the ,
154me013: nicest part of the data , that is , where you've got training and test set that are basically the same kinds of noise and so forth ,
155me013: uh , is about , uh
156me013: I think the best score was something like five percent ,
157me013: uh , error , per digit .
158me013: So , that
159me011: Per digit .
160mn014: Per digit .
161me013: You're right . So if you were doing .. ten digit ,
162me013: uh , recognition , you would really be in trouble .
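To make the arithmetic behind this remark concrete, here is a small sketch of how a per-digit error rate compounds over a digit string. It assumes that digit errors are independent and ignores insertions and deletions. That is a simplification for illustration, not how the AURORA scores were computed. The function name `string_error_rate` is made up for this example.

```python
def string_error_rate(per_digit_error: float, n_digits: int) -> float:
    """Probability that at least one of n_digits is wrong.

    Assumes each digit fails independently with probability per_digit_error.
    """
    if not 0.0 <= per_digit_error <= 1.0:
        raise ValueError("per_digit_error must be between 0 and 1")
    if n_digits < 0:
        raise ValueError("n_digits must be non-negative")
    return 1.0 - (1.0 - per_digit_error) ** n_digits


if __name__ == "__main__":
    # Five percent per digit, as quoted for the best AURORA result.
    for n in (1, 7, 10):
        print(n, round(string_error_rate(0.05, n), 3))
```

Under that independence assumption, a five percent per-digit error rate leaves roughly four in ten ten-digit strings with at least one error.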
163me018: Mmhmm .
164me013: So So the
165me013: The point there , and this is uh car noise uh , uh things , but but real
166me013: real situation , well , real ,
167me013: Um , the uh there's one microphone that's close , that they have as as this sort of thing , close versus distant .
168me013: Uh but in a car , instead of instead of having a projector noise it's it's car noise .
170me013: Uh but it wasn't artificially added to get some some artificial signal to noise ratio . It was just people driving around in a car .
172me013: So , that's that's an indication , uh that was with ,
173me013: many sites competing , and this was the very best score and so forth , so .
174me018: Although the models weren't ,
175me013: More typical numbers like
176me018: that good , right ? I mean ,
177me018: the models are pretty crappy ?
178me013: You're right . I think that we could have done better on the models , but the thing is that we got this this is the kind of typical number ,
179me013: for all of the , uh , uh ,
180me013: things in this task , all of the , um ,
181me013: languages .
182me013: And so I I think we'd probably the models would be better in some than in others . Um ,
183me018: Hmm .
184me013: so , uh .
185me013: Anyway , just an indication once you get into this kind of realm even if you're looking at connected digits it can be pretty hard .
186fe008: Hmm .
187fe008: It's gonna be fun to see how we ,
188fe008: compare at this .
189me018: How did we do on the TI digits ?
190me013: Yeah .
191fe008: Very exciting .
192me011: Well the prosodics are so much different s , it's gonna be ,
193me011: strange .
194me011: I mean the prosodics are not the same as TI digits , for example .
195me013: Yeah .
196me011: So I'm I'm not sure how much of an effect that will have .
197me018: H how do
198fe016: What do you mean , the prosodics ?
199me011: Um , just what we were talking about with grouping .
200me011: That with these , the grouping ,
201me011: there's no grouping at all , and so it's just
202me011: the only sort of
203me011: discontinuity you have is at the beginning and the end .
204fe016: So what are they doing in AURORA , are they reading actual phone numbers , or ,
205me011: AURORA I don't know .
206me011: I don't know what they do in AURORA .
207fe016: a a digit at a time , or ?
208me013: Uh , I'm not sure how no , no I mean it's connected it's connected , uh ,
209fe016: 'cause it's
210fe016: Connected .
211me013: digits , yeah .
212fe016: So there's also the
213me013: But .
214me011: But
215fe016: not just the prosody but the cross
216fe016: the crossword modeling is probably quite different .
217me011: Right .
218me011: But in TI digits ,
219me018: H
220me011: they're reading things like zip codes and phone numbers and things like that , so it's gonna be different .
221fe016: Right .
222me018: How do we do on TI digits ?
223me011: I don't remember . I mean , very good , right ?
224me013: Yeah , I mean we were in the .
225me011: One and a half percent , two percent , something like that ?
226me013: Uh , I th no I think we got under a percent , but it was but it's but I mean .
227me011: Oh really ? OK .
228fe008: s
229me013: The very best system that I saw in the literature was a point two five percent or something that somebody had at at Bell Labs , or .
230me011: Alright .
231me013: Uh , but .
232me018: Hmm .
233me013: But , uh , sort of pulling out all the stops . But I think a lot of systems sort of get half a percent , or threequarters a percent , and we're we're in there somewhere .
234me011: Right .
235me011: But that I mean it's really it's it's closetalking mikes , no noise , clean signal ,
237me011: just digits , I mean ,
238me013: Yeah .
239me011: every everything is good .
240fe016: It's the beginning of time in speech recognition .
241me011: Yes , exactly .
244me013: Yeah .
245me011: And we've only recently got it to anywhere near human .
246fe016: It's like the ,
247fe016: single cell ,
248me018: Pre
249fe016: you know ,
250me018: prehistory .
251fe016: it's the beginning of life , yeah .
253me011: And it's still like an order of magnitude worse than what humans do .
254fe016: Right .
255me013: Yeah .
256me011: So .
257me013: When When they're wide awake , yeah .
259me011: Yeah .
260me013: Um , after coffee , you're right .
263me011: After coffee .
266me013: Not after lunch .
267me011: OK , so , um ,
268me011: what I'll do then is I'll go ahead and enter ,
269me011: this data .
270me011: And then , hand off to Jane ,
271me011: and the transcribers to do the actual extraction of the digits .
272me013: Yeah .
273me013: Yeah .
274me013: One question I have that
275me013: that I mean , we wouldn't know the answer to now but might ,
276me011: Hmm .
277me013: do some guessing , but I was talking before about doing some model modeling of arti uh ,
278me013: uh , marking of articulatory ,
279me013: features ,
280me013: with overlap and so on .
281me013: And ,
282me013: and ,
283me013: um ,
284me013: On some subset .
285me013: One thought might be to do this uh , on on the digits , or some piece of the digits . Uh , it'd be easier ,
286me013: uh , and so forth . The only thing is I'm a little concerned that maybe the kind of phenomena ,
287me013: in w i i
288me013: The reason for doing it is because the the argument is that certainly with conversational speech , the stuff that we've looked at here before ,
289me013: um , just doing the simple mapping ,
290me013: from , um ,
291me013: the phone ,
292me013: to the corresponding features that you could look up in a book ,
293me013: uh , isn't right .
294me013: It isn't actually right . In fact there's these overlapping processes where some voicing comes up and then some , you know ,
295me013: some nasality comes in here , and so forth . And you do this gross thing saying Well I guess it's this phone starting there .
296me013: So ,
297me013: uh , that's the reasoning . But ,
298me013: It could be that when we're reading digits , because it's it's for such a limited set ,
299me013: that maybe
300me013: maybe that phenomenon doesn't occur as much . I don't know .
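The "simple mapping from the phone to the corresponding features that you could look up in a book" can be pictured with a toy sketch like the one below. The feature table is a hypothetical, heavily reduced illustration (two binary features, ARPAbet-style phone labels), not the feature inventory anyone in the meeting proposed. It only shows why a per-phone lookup cannot express overlap: every feature changes exactly at a phone boundary.

```python
from typing import Dict, List, Tuple

# Hypothetical, simplified textbook values; a real inventory would be much larger.
CANONICAL_FEATURES: Dict[str, Dict[str, bool]] = {
    "n":  {"voiced": True,  "nasal": True},
    "ay": {"voiced": True,  "nasal": False},
    "s":  {"voiced": False, "nasal": False},
    "ih": {"voiced": True,  "nasal": False},
    "k":  {"voiced": False, "nasal": False},
}

Segment = Tuple[str, float, float]  # (phone, start time, end time) in seconds


def canonical_feature_track(phones: List[Segment]) -> List[Tuple[float, float, Dict[str, bool]]]:
    """Assign each phone its looked-up features over the phone's whole duration."""
    return [(start, end, dict(CANONICAL_FEATURES[phone])) for phone, start, end in phones]


if __name__ == "__main__":
    # "nine" as n-ay-n with made-up segment times.
    nine = [("n", 0.00, 0.08), ("ay", 0.08, 0.30), ("n", 0.30, 0.40)]
    for start, end, feats in canonical_feature_track(nine):
        print(f"{start:.2f}-{end:.2f}", feats)
```

In this lookup the vowel of "nine" is marked non-nasal for its full duration, so nasality that starts partway through the vowel has nowhere to go. Whether read digits show less of that overlap than conversational speech is the open question raised here.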
301me013: D e anybody ,
303me013: do you have any ,..
304me013: anybody have any opinion about that ?
305fe008: It s strikes me that there are more each of them is more informative because it's so ,
307fe008: random , and that people might articulate more , and you that might end up with more a closer correspondence .
308me013: Mmhmm .
309me011: Yeah that's I I agree . That it's just
310me013: Yeah .
311me018: Sort of less predictability , and
312fe008: Mmhmm .
313me013: Yeah .
314fe016: Well it's definitely true that , when people are ,
315me011: It's a