ICSI_20010208-1430

1me013: So , uh

2fe008: Alright .

3me011: Um , so I wanted to discuss digits briefly ,

4me013: Oh good .

5me011: but that won't take too long .

6me013: Right .

7me013: OK , agenda items ,

8me013: Uh , we have digits ,

9me013: What else we got ?

10mn014: New version of the presegmentation .

11me013: New version of presegmentation .

12fe008: Um ,

13fe008: do we wanna say something about the ,

14fe016: Yeah , why don't you summarize the

15fe008: an update of the , uh , transcript ?

16me013: Update on transcripts .

17fe016: And I guess that includes some

18fe016: the filtering for the ,

19fe016: the ASR refs , too .

20fe008: Mm .

21me013: Filtering for what ?

22fe016: For the references that we need to go from the ,

23fe016: the ,

24fe016: fancy transcripts to the sort of

25fe008: It'll it'll be basically it'll be a recap of a meeting that we had jointly this morning .

26fe016: starts to say something that starts with s

27fe016: braindead .

28me013: Uh-huh .

29fe016: With Don , as well .

30fe008: Mm-hmm .

31me013: Got it .

32me013: Anything else more pressing than those things ? So

33me013: So , why don't we just do those . You said yours was brief , so

34me011: OK .

35me011: OK well , the , w uh

36me011: as you can see from the numbers on the digits we're almost done .

37me011: The digits goes up to .. about four thousand .

38me011: Um ,

39me011: and so , uh ,

40me011: we probably will be done with the TI digits in ,

41me011: um , another couple weeks .

42me011: um , depending on how many we read each time .

43me011: So there were a bunch that we skipped .

44me011: You know , someone fills out the form and then they're not at the meeting and so it's blank .

45me011: Um , but those are almost all filled in as well .

46me011: And so , once we're it's done it would be very nice to train up a recognizer and actually start working with this data .

47me018: So we'll have a corpus that's the size of TI digits ?

48me011: And so

49me011: One particular test set of TI digits .

50me018: Test set , OK .

51me011: So , I I extracted ,

52me011: Ther there was a file sitting around which people have used here as a test set .

53me011: It had been randomized and so on and that's just what I used to generate the order .

54me018: Aah , great .

55me011: of these particular ones .

56me018: Great .

57me011: Um

58me013: So , I'm impressed by what we could do ,

59me013: Is take the standard training set for TI digits , train up with whatever , you know ,

60me013: great features we think we have , uh for instance ,

61me013: and then test on uh this test set .

62me013: And presumably uh it should do reasonably well on that , and then , presumably , we should go to the distant mike , and it should do poorly .

63me018: Yeah .

64me011: Right .

65me013: And then we should get really smart over the next year or two , and it that should get better .

66me011:

67me011: And inc increase it by one or two percent , yeah .

68me013: Yeah ,

69me013:

70me001:

71me018:

72me011:

73me013: Yeah .

74me011: Um , but ,

75me011: in order to do that we need to extract out the actual digits .

76me013: Right .

77me011: Um ,

78me011: so that the reason it's not just a transcript is that there're false starts , and misreads , and miscues and things like that .

79me011: And so I have a set of scripts and Xwaves where you just select the portion ,

80me011: hit R ,

81me011: um , it tells you what the next one should be ,

82me011: and you just look for that .

83me011: You know , so it it'll put on the screen ,

84me011: The next set is six nine ,

85me011: nine two two .

86me011: And you find that ,

87me011: and ,

88me011: hit the key and it records it in a file in a particular format .

89me013: So is this

90me011: And so the

91me011: the question is , should we have the transcribers do that or should we just do it ?

92me011: Well , some of us . I've been do I've done ,

93me011: eight meetings , something like that ,

94me011: just by hand .

95me011: Just myself , rather .

96me011: So it will not take long .

97me011: Um

98me013: Uh , what what do you think ?

99fe008: My feeling is that we discussed this right before coffee and I think it's a it's a fine idea partly because , um , it's not un unrelated to their present skill set ,

100fe008: but it will add , for them , an extra dimension , it might be an interesting break for them . And also it is contributing to the , uh ,

101fe008: c composition of the transcript 'cause we can incorporate those numbers directly and it'll be a more complete transcript . So I'm I think it's fine , that part .

102me011: There is there is

103me013: So you think it's fine to have the transcribers do it ?

104fe008: Mm-hmm .

105me013: Yeah , OK .

106me011: There's one other small bit , which is just entering the information which at s which is at the top of this form ,

107me018: Good .

108me011: onto the computer ,

109me011: to go along with the where the digits are recorded automatically .

110me013: Yeah .

111me011: And so it's just ,

112me011: you know , typing in name , times time , date , and so on .

113me011: Um , which again either they can do ,

114me011: but it is , you know ,

115me011: firing up an editor ,

116me018:

117me011: or ,

118me011: again , I can do .

119me011: Or someone else can do .

120fe008: And , that , you know , I'm not ,

121fe008: that that one I'm not so sure if it's into the the ,

122fe008: things that ,

123fe008: I ,

124fe008: wanted to use the hours for , because the ,

125fe008: the time that they'd be spending doing that they wouldn't be able to be putting more words on .

126me013: Mmm .

127fe008: But that's really your choice , it's your

128me018: So are these two separate tasks that can happen ?

129me018: Or do they have to happen at the same time before

130me011: No they don't have this

131me011: you have to enter the data before ,

132me011: you do the second task , but they don't have to happen at the same time .

133me011: So it's it's just I have a file whi which has this information on it ,

134me018: OK .

135me011: and then when you start using my scripts ,

136me011: for extracting the times ,

137me011: it adds the times at the bottom of the file .
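[Editor's sketch: the two-step workflow described above (enter the form header first, or leave it blank, then let the extraction scripts append times at the bottom of the same file) could look roughly like this. The transcript does not describe the actual file format, so the layout, the field names, and the function names `create_form_file` and `append_digit_times` are all invented for illustration.]

```python
import os
import tempfile


def create_form_file(path, name="", date="", time=""):
    """Write the header fields from the top of the paper form.

    Fields may be left blank and filled in later, which is why the two
    tasks can happen in either order. (Hypothetical layout.)
    """
    with open(path, "w", encoding="utf-8") as f:
        f.write(f"name: {name}\ndate: {date}\ntime: {time}\n")


def append_digit_times(path, digits, start, end):
    """Append one extracted digit string with its start/end time in seconds."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(f"{start:.2f} {end:.2f} {digits}\n")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        form = os.path.join(tmp, "digits_form.txt")
        create_form_file(form)  # header left blank for now
        append_digit_times(form, "6 9 9 2 2", 12.5, 14.0)  # made-up times
        with open(form, encoding="utf-8") as f:
            print(f.read())
```

Appending rather than rewriting keeps the header untouched, so whoever enters the form data and whoever marks the digit times can work independently.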

138me011: And so , um ,

139me011: I mean , it's easy to create the files and leave them blank , and so actually we could do it in either order .

140me018: Oh , OK .

141me011: Um ,

142me011: it's it's sort of nice to have the same person do it just as a double-check ,

143me011: to make sure you're entering for the right person .

144me011: But ,

145me011: either way .

146me013: Yeah .

147me013: Yeah just by way of uh , uh , a uh ,

148me011:

149me013: order of magnitude , uh ,

150me013: um , we've been working with this AURORA , uh data set .

151me013: And , uh , the best score ,

152me018:

153me013: on the ,

154me013: nicest part of the data , that is , where you've got training and test set that are basically the same kinds of noise and so forth ,

155me013: uh , is about , uh

156me013: I think the best score was something like five percent ,

157me013: uh , error , per digit .

158me013: So , that

159me011: Per digit .

160mn014: Per digit .

161me013: You're right . So if you were doing .. ten digit ,

162me013: uh , recognition , you would really be in trouble .
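[Editor's sketch: the arithmetic behind this remark, under the assumption (not stated in the meeting) that errors on individual digits are independent. The function name `string_error_rate` is illustrative.]

```python
def string_error_rate(per_digit_error: float, num_digits: int) -> float:
    """Probability that at least one digit in a string is misrecognized,
    assuming digit errors are independent (a simplifying assumption)."""
    if not 0.0 <= per_digit_error <= 1.0:
        raise ValueError("per_digit_error must be between 0 and 1")
    if num_digits < 0:
        raise ValueError("num_digits must be non-negative")
    return 1.0 - (1.0 - per_digit_error) ** num_digits


if __name__ == "__main__":
    # Five percent error per digit, ten-digit strings.
    print(f"{string_error_rate(0.05, 10):.1%}")  # prints 40.1%
```

Under that assumption, a five percent per-digit error means roughly four in ten ten-digit strings would contain at least one mistake.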

163me018: Mm-hmm .

164me013: So So the

165me013: The point there , and this is uh car noise uh , uh things , but but real

166me013: real situation , well , real ,

167me013: Um , the uh there's one microphone that's close , that they have as as this sort of thing , close versus distant .

168me013: Uh but in a car , instead of instead of having a projector noise it's it's car noise .

169me011:

170me013: Uh but it wasn't artificially added to get some some artificial signal to noise ratio . It was just people driving around in a car .

171me018:

172me013: So , that's that's an indication , uh that was with ,

173me013: many sites competing , and this was the very best score and so forth , so .

174me018: Although the models weren't ,

175me013: More typical numbers like

176me018: that good , right ? I mean ,

177me018: the models are pretty crappy ?

178me013: You're right . I think that we could have done better on the models , but the thing is that we got this this is the kind of typical number ,

179me013: for all of the , uh , uh ,

180me013: things in this task , all of the , um ,

181me013: languages .

182me013: And so I I think we'd probably the models would be better in some than in others . Um ,

183me018: Hmm .

184me013: so , uh .

185me013: Anyway , just an indication once you get into this kind of realm even if you're looking at connected digits it can be pretty hard .

186fe008: Hmm .

187fe008: It's gonna be fun to see how we ,

188fe008: compare at this .

189me018: How did we do on the TI digits ?

190me013: Yeah .

191fe008: Very exciting .

192me011: Well the prosodics are so much different s , it's gonna be ,

193me011: strange .

194me011: I mean the prosodics are not the same as TI digits , for example .

195me013: Yeah .

196me011: So I'm I'm not sure how much of effect that will have .

197me018: H how do

198fe016: What do you mean , the prosodics ?

199me011: Um , just what we were talking about with grouping .

200me011: That with these , the grouping ,

201me011: there's no grouping at all , and so it's just

202me011: the only sort of

203me011: discontinuity you have is at the beginning and the end .

204fe016: So what are they doing in AURORA , are they reading actual phone numbers , or ,

205me011: AURORA I don't know .

206me011: I don't know what they do in AURORA .

207fe016: a a digit at a time , or ?

208me013: Uh , I'm not sure how no , no I mean it's connected it's connected , uh ,

209fe016: 'cause it's

210fe016: Connected .

211me013: digits , yeah .

212fe016: So there's also the

213me013: But .

214me011: But

215fe016: not just the prosody but the cross

216fe016: the cross-word modeling is probably quite different .

217me011: Right .

218me011: But in TI digits ,

219me018: H

220me011: they're reading things like zip codes and phone numbers and things like that , so it's gonna be different .

221fe016: Right .

222me018: How do we do on TI digits ?

223me011: I don't remember . I mean , very good , right ?

224me013: Yeah , I mean we were in the .

225me011: One and a half percent , two percent , something like that ?

226me013: Uh , I th no I think we got under a percent , but it was but it's but I mean .

227me011: Oh really ? OK .

228fe008: s

229me013: The very best system that I saw in the literature was a point two five percent or something that somebody had at at Bell Labs , or .

230me011: Alright .

231me013: Uh , but .

232me018: Hmm .

233me013: But , uh , sort of pulling out all the stops . But I think a lot of systems sort of get half a percent , or three-quarters a percent , and we're we're in there somewhere .

234me011: Right .

235me011: But that I mean it's really it's it's close-talking mikes , no noise , clean signal ,

236fe016:

237me011: just digits , I mean ,

238me013: Yeah .

239me011: every everything is good .

240fe016: It's the beginning of time in speech recognition .

241me011: Yes , exactly .

242me013:

243me018:

244me013: Yeah .

245me011: And we've only recently got it to anywhere near human .

246fe016: It's like the ,

247fe016: single cell ,

248me018: Pre

249fe016: you know ,

250me018: prehistory .

251fe016: it's the beginning of life , yeah .

252me013:

253me011: And it's still like an order of magnitude worse than what humans do .

254fe016: Right .

255me013: Yeah .

256me011: So .

257me013: When When they're wide awake , yeah .

258me001:

259me011: Yeah .

260me013: Um , after coffee , you're right .

261me018:

262fe016:

263me011: After coffee .

264me018:

265mn014:

266me013: Not after lunch .

267me011: OK , so , um ,

268me011: what I'll do then is I'll go ahead and enter ,

269me011: this data .

270me011: And then , hand off to Jane ,

271me011: and the transcribers to do the actual extraction of the digits .

272me013: Yeah .

273me013: Yeah .

274me013: One question I have that

275me013: that I mean , we wouldn't know the answer to now but might ,

276me011: Hmm .

277me013: do some guessing , but I was talking before about doing some model modeling of arti uh ,

278me013: uh , marking of articulatory ,

279me013: features ,

280me013: with overlap and so on .

281me013: And ,

282me013: and ,

283me013: um ,

284me013: On some subset .

285me013: One thought might be to do this uh , on on the digits , or some piece of the digits . Uh , it'd be easier ,

286me013: uh , and so forth . The only thing is I'm a little concerned that maybe the kind of phenomena ,

287me013: in w i i

288me013: The reason for doing it is because the the argument is that certainly with conversational speech , the stuff that we've looked at here before ,

289me013: um , just doing the simple mapping ,

290me013: from , um ,

291me013: the phone ,

292me013: to the corresponding features that you could look up in a book ,

293me013: uh , isn't right .

294me013: It isn't actually right . In fact there's these overlapping processes where some voicing comes up and then some , you know ,

295me013: some nasality is comes in here , and so forth . And you do this gross thing saying Well I guess it's this phone starting there .
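[Editor's sketch: the "simple mapping" from phones to book features that the speaker is questioning can be pictured as a plain table lookup. The ARPAbet-style phone symbols, the two features shown, and the name `canonical_track` are assumptions chosen for illustration, not the group's actual feature inventory.]

```python
# Canonical features for a few phones, as one might look them up in a
# phonetics textbook. Only two features are shown to keep the sketch short.
CANONICAL_FEATURES = {
    "n":  {"voiced": True,  "nasal": True},
    "ay": {"voiced": True,  "nasal": False},
    "s":  {"voiced": False, "nasal": False},
    "ih": {"voiced": True,  "nasal": False},
    "k":  {"voiced": False, "nasal": False},
}


def canonical_track(phones, feature):
    """Return one feature value per phone via direct table lookup."""
    return [CANONICAL_FEATURES[p][feature] for p in phones]


if __name__ == "__main__":
    nine = ["n", "ay", "n"]  # the digit "nine"
    print(canonical_track(nine, "nasal"))  # [True, False, True]
```

The lookup switches nasality off for the whole vowel and back on exactly at the phone boundary. The overlap described here, where nasality or voicing starts or ends partway through a neighbouring phone, cannot be represented by a one-value-per-phone table, which is the motivation for marking the features directly.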

296me013: So ,

297me013: uh , that's the reasoning . But ,

298me013: It could be that when we're reading digits , because it's it's for such a limited set ,

299me013: that maybe

300me013: maybe that phenomenon doesn't occur as much . I don't know .

301me013: D e anybody ,

302fe008: .

303me013: do you have any ,..

304me013: anybody have any opinion about that ?

305fe008: It s strikes me that there are more each of them is more informative because it's so ,

306me018:

307fe008: random , and that people might articulate more , and you that might end up with more a closer correspondence .

308me013: Mm-hmm .

309me011: Yeah that's I I agree . That it's just

310me013: Yeah .

311me018: Sort of less predictability , and

312fe008: Mm-hmm .

313me013: Yeah .

314fe016: Well it's definitely true that , when people are ,

315me011: It's a

316fe016: reading ,

317fe016: even if they're rereading what ,

318fe016: they had said spontaneously , that they have very different patterns . Mitch showed that , and some ,

319me013: Right .

320fe016: dissertations have shown that .

321fe016: So the fact that they're reading , first of all , whether they're reading in a room of ,

322fe016: people , or rea you know , just the fact that they're reading will make a difference .

323me013: Yeah .

324fe016: And ,

325me011: Well

326fe016: depends what you're interested in .

327me011: Would ,

328me011: this corpus really be the right one to even try that on ?
