ICSI_20010322-1450

1me013: Unless maybe we do this , uh , um , cancellation business .

2fe016: Right , but that's I mean , that was our plan , but it's clear from Dan that this is not something you can do in a short amount of time .

3mn017:

4mn017: Yeah , right .

5mn017:

6me013: Oh , the short amount of time thing , right .

7fe016: So so we you know , we had spent a lot of time , um , writing up the HLT paper and we wanted to use that , uh , kind of analysis , but the HLT paper has ,

8me001:

9mn017:

10me011:

11me013: Yeah .

12fe016: you know , it's a very crude measure of overlap . It's not really something you could scientifically say is overlap , it's just whether or not the ,

13fe016: um ,

14me011: c High correlation .

15fe016: the segments that were all synchronized , whether there was some overlap somewhere .

16fe016: And , you know , that pointed out some differences , so he thought well if we can

17fe016: do something quick and dirty because Dan said the cross-cancellation , it's not straightforward . If it were straightforward then we would try it , but so , it's sort of

18fe016: good to hear that it was not straightforward , thinking if we can get decent forced alignments ,

19fe016: then at least we can do sort of a overall report of

20fe016: what happens with actual overlap

21fe016: in time , but ,

22fe016: um

23me018: I didn't think that his message said it wasn't straightforward .

24me011: Well if we'd just

25me013: Well ..

26me011: Umhmm .

27me018: I thought he's just saying you have to look over a longer time window when you do it .

28fe016: and the but there are some issues of this timing , um , in the recordings and

29me013: Yeah .

30me018: Right . So you just have to look over longer time when you're trying to align the things , you can't you can't just look

31me011:

32me011: Well . are you talking about the fact that the recording software doesn't do time-synchronous ? Is that what you're referring to ?

33me013:

34me011: That seems to me you can do that over the entire file and get a very accurate

35mn017: I don't thi

36mn017: I d

37mn017: I don't think that was the issue . The issue was that you have

38fe016: I yeah , that was sort of a side issue .

39me011: I didn't think so either .

40mn017: to you have have

41mn017: you first have to have a pretty good speech detection on the individual channels .

42mn017:

43fe016: And it's dynamic , so I guess it was more dynamic than some simple models

44fe016: would be able t to so so there are some things available , and I don't know too much about this area where

45fe016: if people aren't moving around much then you could apply them , and it should work pretty well if you took care of this recording time difference .

46me011: Right , which should be pretty straightforward .

47fe016: Which a at least is well defined , and

48me011: Yeah .

49fe016: um , but then if you add the dynamic aspect of adapting distances , then it wasn't

50fe016: I guess it just wasn't something that he could do quickly .. and not in time for us to be able to do something by two weeks from now , so . Well less than a week .

51fe016: So um , so I don't know what we can do if anything , that's sort of worth ,

52fe016: you know , a EUROSPEECH paper at this point .

53me018: Well , Andreas , how well did it work on the non-lapel stuff ?

54me011: Yeah . That's what I was gonna say . C .

55mn017:

56mn017: I haven't checked those yet . It's very tedious to check these .

57mn017:

58me018: Mmm .

59mn017: Um , we would really need , ideally , a transcriber

60mn017:

61mn017: to time mark the

62mn017: you know , the be at least the beginning and s ends of contiguous speech .

63mn017:

64mn017: Um ,

65mn017:

66mn017: and ,

67mn017: you know , then with the time marks , you can do an automatic

68mn017: comparison of your

69mn017: of your forced alignments .

70me011: Oh , M N C M .

71me018: Because really the the at least in terms of how we were gonna use this in our system was

72mn017: Mmhmm .

73me018: to get an ideal an idea , uh , for each channel about the start and end boundaries . We don't really care about like intermediate word boundaries , so

74mn017:

75mn017: No , that's how I've been looking at it . I mean , I don't care that the individual words are aligned correctly , but

76fe016: Right .

77me018: Yeah .

78me018: Yeah .

79mn017: you don't wanna ,

80mn017: uh , infer from the alignment that someone spoke who didn't . so , so

81me018: Right , exactly . So that's why I was wondering if it

82me018: I mean , maybe if it doesn't work for lapel stuff , we can just not use that and

83mn017:

84me001:

85mn017:

86mn017: Yeah .

87mn017:

88mn017: I haven't

89mn017: I ha just haven't had the time to , um ,

90mn017: do the same procedure on one of the

91mn017:

92mn017: so I would need a k I would need a channel that has

93mn017:

94mn017: a speaker whose

95mn017: who has a lot of overlap but s

96mn017: you know , is a

97mn017: non-lapel mike .

98mn017:

99mn017: And , um ,

100mn017:

101mn017: where preferably , also there's someone sitting next to them

102mn017: who talks a lot .

103mn017:

104me011: Hmm

105mn017: So ,

106me011: So a meeting with me in it .

107mn017: I maybe someone can help me find a good candidate and then I would

108mn017: be willing to

109mn017:

110me018: We c you know what ? Maybe the best way to find that would be to look through these .

111mn017: you know , hand

112me018: 'Cause you can see the seat numbers , and then you can see what type of mike they were using .

113me018: And so we just look for , you know , somebody sitting next to Adam

114me011:

115me018: at one of the meetings

116fe016: Actually y we can tell from the data that we have , um , yeah , there's a way to tell . It might not be a single person who's always overlapping that person but any number of people , and ,

117mn017: coffee , perhaps

118me013:

119mn017: From the insertions , maybe ?

120mn017: fr

121mn017: fr from the

122mn017: Right .

123fe016: um , if you align the two

124fe016: hypothesis files across the channels , you know , just word alignment , you'd be able to find that .

125fe016: So

126fe016: so I guess that's sort of a last

127fe016: ther there're sort of a few things we could do . One is just do like non-lapels if we can get good enough alignments . Another one was to try to get

128fe016: somehow align Thilo's energy segmentations with

129fe016: what we have . But then you have the problem of not knowing where the words are because these meetings were done before that segmentation . But maybe there's something that could be done .

130me018: What what is why do you need the , um ,

131me018: the forced alignment for the HLT I mean for the EUROSPEECH paper ?

132fe016: Well ,

133fe016: I guess I I wanted to just do something not on

134fe016: recognition experiments because that's ju way too early , but to be able to report ,

135fe016: you know , actual numbers . Like if we if we had

136fe016: hand-transcribed pe good alignments or hand-checked alignments , then we could do this paper . It's not that we need it to be automatic .

137fe016: But without knowing where the real words are , in time

138me018: So it was to get it was to get more data and better to to squeeze the boundaries in .

139fe016: To to know what an overlap really if it's really an overlap , or if it's just a

140me018: Ah , OK . Yeah .

141fe016: a a segment correlated with an overlap , and I guess that's the difference to me between like a real paper and a sort of , promissory paper .

142fe016: So ,

143fe016: um , if we d

144fe016: it might be possible to take Thilo's output

145fe016: and like if you have , um ,

146fe016: like right now these meetings are all ,

147me011: Ach

148me011: I forgot the digital camera again . Every meeting

149fe016: um ,

150fe016: you know , they're time-aligned , so if these are two different channels

151fe016: and somebody's talking here and somebody else is talking here , just that word ,

152me011: Mmhmm .

153fe016: if Thilo can tell us that there're boundaries here , we should be able to figure that out because the only thing transcribed in this channel is this word .

154fe016: But , um , you know , if there are things

155me011: Two words .

156fe016: Yeah , if you have two and they're at the edges , it's like here and here , and there's speech here , then it doesn't really help you , so , um

157me018: Thilo's won't put down two separate marks in that case

158me011: Thilo's will . But .

159fe016: Well it w it would , but , um , we don't know exactly where the words are because the transcriber gave us two words in this time bin and we don't really know ,

160fe016: I mean , yeah it's

161fe008: Well it's a merging problem . If you had a if you had a s if you had a script which would I've thought about this , um , and I've discussed I've discussed it with Thilo ,

162fe016: I mean , if you have any ideas . I would

163fe008: um , the ,

164fe008: I mean , I I in principle I could imagine writing a script which would approximate it to some degree , but there is this problem of slippage , yeah .

165me011:

166me011: Well maybe

167me011: Maybe that will get enough of the cases to be useful .

168fe016: Right . I mean , that that would be really helpful . That was sort of another possibility .

169me011: You know s 'cause it seemed like most of the cases are in fact the single word

170me011: sorts , or at least a single phrase

171mn017: Mmm .

172fe008: Well they they can be stretched . I wouldn't make that generalization 'cause sometimes people will say , And then I and there's a long pause

173me011: in most of the bins .

174fe016: Yeah .

175fe008: and finish the sentence and and sometimes it looks coherent and and the I mean it's it's not a simple problem .

176fe008: But it's really And then it's coupled with the problem that sometimes , you know , with with a fricative you might get the beginning of the word cut off and so it's coupled with the problem that Thilo's isn't

177fe008: perfect either . I mean , we've i th it's like you have a merging problem plus so merging plus this problem of , uh , not

178me011: Right .

179me011: Hmm

180fe008: y i i if the speech-nonspeech were perfect to begin with , the detector , that would already be an improvement , but that's impossible , you know , i that's too much to ask .

181me011:

182fe016: Right .

183me011: Yes .

184fe008: And so i and may you know , I mean , it's

185fe008: I think that there always th there would have to be some hand-tweaking , but it's possible that a script could be written to merge those two types of things . I've I've discussed it with Thilo and I mean

186me011:

187fe008: in terms of not him doing it , but we we discussed some of the parameters of that and how hard it would be to in principle to write something that would do that .

188mn017:

189fe016: I mean , I guess in the future it won't be as much as an issue if

190fe016: transcribers are using the tightened boundaries to start with , then we have a

191fe016: good idea of where the forced alignment is constrained to . So I'm no I don't know if this

192fe008: Well , it's just , you know , a matter of we had the revolution we had the revolution of improved , uh , interface , um , one month too late , but it's like ,

193me011: Oh .

194me011: Tools .

195fe016: Oh it's it's a yeah .

196fe008: you know , it's wonderful to have the revolution , so it's just a matter of of , you know , from now on we'll be able to have things channelized to begin with .

197me011: Right . And we'll just have to see how hard that is .

198fe008: Yeah , that's right . That's right .

199me011: So so whether the corrections take too much time . I was just thinking about the fact that if Thilo's missed these short segments ,

200fe016: Yeah .

201me011: that might be quite time-consuming for them to insert them .

202fe008: Good point .

203fe016: But he he also can adjust this minimum time duration constraint and then what you get is

204fe008: Yeah .

205me011: Spurious .

206fe016: noises mostly , but that might be OK , an

207me011: It might be easier to delete something that's wrong than to insert something that's missing . What do you think , Jane ?

208fe016: Right . And you can also see in the waveform exac yeah .

209me013: If you can feel confident that what the yeah , that there's actually something that you're not gonna miss something , yeah .

210fe008:

211fe016: Yeah .

212me011: Yeah . 'Cause then then you just delete it , and you don't have to pick a time .

213fe016: I think it's

214fe008:

215fe008: Well the problem is I you know I I it's a it's a really good question , and I really find it a pain in the neck to delete things because you have to get the mouse up there

216fe008: on the t on the text line and i and otherwise you just use an arrow to get down I mean , i it depends on how lar th there's so many

217fe008: extra things that would make it

218fe008: one of them harder than the other , or or vice versa . It's not a simple question . But , you know , I mean , in principle , like , you know , if one of them is easier then to bias it towards whichever one's easier .

219me011: Yeah , I guess the semantics aren't clear when you delete a segment , right ? Because you would say You would have to determine what the surroundings were .

220fe016: You could just say it's a noise , though , and write , you know , a postprocessor will just

221fe016: all you have to do is just

222me011: If it's really a noise .

223fe016: or just say it's just put X , you know , like not speech or something , and then you can get

224fe008: I think it's easier to add than delete , frankly , because you have to , uh , maneuver around on the on both windows then .

225fe016: Yeah , or

226me011: To add or to delete ?

227fe008: To delete .

228me011: OK .

229fe016: Anyways , so I I guess

230me011: That Maybe that's an interface issue that might be addressable . But I think it's the semantics that are that are questionable to me , that you delete something So let's say someone is talking to here ,

231fe008: It's possible .

232me011: and then you have a little segment here .

233me011: Well , is that part of the speech ? Is it part of the nonspeech ?

234me011: I mean , w what do you embed it in ?
