1me013: Unless maybe we do this , uh , um , cancellation business .
2fe016: Right , but that's I mean , that was our plan , but it's clear from Dan that this is not something you can do in a short amount of time .
4mn017: Yeah , right .
6me013: Oh , the short amount of time thing , right .
7fe016: So so we you know , we had spent a lot of time , um , writing up the HLT paper and we wanted to use that , uh , kind of analysis , but the HLT paper has ,
11me013: Yeah .
12fe016: you know , it's a very crude measure of overlap . It's not really something you could scientifically say is overlap , it's just whether or not the ,
13fe016: um ,
14me011: c High correlation .
15fe016: the segments that were all synchronized , whether there was some overlap somewhere .
16fe016: And , you know , that pointed out some differences , so he thought well if we can
17fe016: do something quick and dirty because Dan said the crosscancellation , it's not straightforward . If it were straightforward then we would try it , but so , it's sort of
18fe016: good to hear that it was not straightforward , thinking if we can get decent forced alignments ,
19fe016: then at least we can do sort of a overall report of
20fe016: what happens with actual overlap
21fe016: in time , but ,
23me018: I didn't think that his message said it wasn't straightforward .
24me011: Well if we'd just
25me013: Well ..
26me011: Umhmm .
27me018: I thought he's just saying you have to look over a longer time window when you do it .
28fe016: and the but there are some issues of this timing , um , in the recordings and
29me013: Yeah .
30me018: Right . So you just have to look over longer time when you're trying to align the things , you can't you can't just look
32me011: Well . are you talking about the fact that the recording software doesn't do timesynchronous ? Is that what you're referring to ?
34me011: That seems to me you can do that over the entire file and get a very accurate
35mn017: I don't thi
36mn017: I d
37mn017: I don't think that was the issue . The issue was that you have
38fe016: I yeah , that was sort of a side issue .
39me011: I didn't think so either .
40mn017: to you have have
41mn017: you first have to have a pretty good speech detection on the individual channels .
43fe016: And it's dynamic , so I guess it was more dynamic than some simple models
44fe016: would be able t to so so there are some things available , and I don't know too much about this area where
44fe016: if people aren't moving around much then you could apply them , and it should work pretty well if you took care of this recording time difference .
46me011: Right , which should be pretty straightforward .
47fe016: Which a at least is well defined , and
48me011: Yeah .
49fe016: um , but then if you add the dynamic aspect of adapting distances , then it wasn't
50fe016: I guess it just wasn't something that he could do quickly .. and not in time for us to be able to do something by two weeks from now , so . Well less than a week .
51fe016: So um , so I don't know what we can do if anything , that's sort of worth ,
52fe016: you know , a EUROSPEECH paper at this point .
53me018: Well , Andreas , how well did it work on the nonlapel stuff ?
54me011: Yeah . That's what I was gonna say . C .
56mn017: I haven't checked those yet . It's very tedious to check these .
58me018: Mmm .
59mn017: Um , we would really need , ideally , a transcriber
61mn017: to time mark the
62mn017: you know , the be at least the beginning and s ends of contiguous speech .
64mn017: Um ,
66mn017: and ,
67mn017: you know , then with the time marks , you can do an automatic
68mn017: comparison of your
69mn017: of your forced alignments .
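Note: the automatic comparison described here could be sketched roughly as follows. This is only an illustration under stated assumptions: intervals are (start, end) pairs in seconds, the hand time marks cover contiguous speech regions on one channel, and the names `merge_words`, `compare`, and the `max_gap` value are hypothetical, not anything from the group's actual tools.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start_sec, end_sec); assumed representation


def merge_words(words: List[Interval], max_gap: float = 0.5) -> List[Interval]:
    """Collapse word-level alignment intervals into contiguous speech regions.

    Words separated by at most max_gap seconds are treated as one region.
    The 0.5 s default is an arbitrary illustrative value.
    """
    regions: List[Interval] = []
    for start, end in sorted(words):
        if regions and start - regions[-1][1] <= max_gap:
            regions[-1] = (regions[-1][0], max(regions[-1][1], end))
        else:
            regions.append((start, end))
    return regions


def overlap(a: Interval, b: Interval) -> float:
    """Length in seconds of the intersection of two intervals (0 if disjoint)."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


def compare(reference: List[Interval], aligned: List[Interval]):
    """Compare aligned speech regions against hand time marks.

    Returns:
      offsets  - (start_error, end_error) for each reference region that was found
      missed   - reference regions with no aligned speech at all
      spurious - aligned regions with no reference speech, i.e. the alignment
                 implies someone spoke who did not
    """
    offsets, missed = [], []
    for ref in reference:
        best = max(aligned, key=lambda reg: overlap(ref, reg), default=None)
        if best is None or overlap(ref, best) == 0.0:
            missed.append(ref)
        else:
            offsets.append((best[0] - ref[0], best[1] - ref[1]))
    spurious = [reg for reg in aligned
                if all(overlap(reg, ref) == 0.0 for ref in reference)]
    return offsets, missed, spurious
```

Only region starts and ends are scored, which matches the point made in the discussion that intermediate word boundaries matter less than not crediting speech to a channel whose speaker was silent.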
70me011: Oh , M N C M .
71me018: Because really the the at least in terms of how we were gonna use this in our system was
72mn017: Mmhmm .
73me018: to get an ideal an idea , uh , for each channel about the start and end boundaries . We don't really care about like intermediate word boundaries , so
75mn017: No , that's how I've been looking at it . I mean , I don't care that the individual words are aligned correctly , but
76fe016: Right .
77me018: Yeah .
78me018: Yeah .
79mn017: you don't wanna ,
80mn017: uh , infer from the alignment that someone spoke who didn't . so , so
81me018: Right , exactly . So that's why I was wondering if it
82me018: I mean , maybe if it doesn't work for lapel stuff , we can just not use that and
86mn017: Yeah .
88mn017: I haven't
89mn017: I ha just haven't had the time to , um ,
90mn017: do the same procedure on one of the
92mn017: so I would need a k I would need a channel that has
94mn017: a speaker whose
95mn017: who has a lot of overlap but s
96mn017: you know , is a
97mn017: nonlapel mike .
99mn017: And , um ,
101mn017: where preferably , also there's someone sitting next to them
102mn017: who talks a lot .
105mn017: So ,
106me011: So a meeting with me in it .
107mn017: I maybe someone can help me find a good candidate and then I would
108mn017: be willing to
110me018: We c you know what ? Maybe the best way to find that would be to look through these .
111mn017: you know , hand
112me018: 'Cause you can see the seat numbers , and then you can see what type of mike they were using .
113me018: And so we just look for , you know , somebody sitting next to Adam
115me018: at one of the meetings
116fe016: Actually y we can tell from the data that we have , um , yeah , there's a way to tell . It might not be a single person who's always overlapping that person but any number of people , and ,
119mn017: From the insertions , maybe ?
121mn017: fr from the
122mn017: Right .
123fe016: um , if you align the two
124fe016: hypothesis files across the channels , you know , just word alignment , you'd be able to find that .
126fe016: so I guess that's sort of a last
127fe016: ther there're sort of a few things we could do . One is just do like nonlapels if we can get good enough alignments . Another one was to try to get
128fe016: somehow align Thilo's energy segmentations with
129fe016: what we have . But then you have the problem of not knowing where the words are because these meetings were done before that segmentation . But maybe there's something that could be done .
130me018: What what is why do you need the , um ,
131me018: the forced alignment for the HLT I mean for the EUROSPEECH paper ?
132fe016: Well ,
133fe016: I guess I I wanted to just do something not on
134fe016: recognition experiments because that's ju way too early , but to be able to report ,
135fe016: you know , actual numbers . Like if we if we had
136fe016: handtranscribed pe good alignments or handchecked alignments , then we could do this paper . It's not that we need it to be automatic .
137fe016: But without knowing where the real words are , in time
138me018: So it was to get it was to get more data and better to to squeeze the boundaries in .
139fe016: To to know what an overlap really if it's really an overlap , or if it's just a
140me018: Ah , OK . Yeah .
141fe016: a a segment correlated with an overlap , and I guess that's the difference to me between like a real paper and a sort of , promissory paper .
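Note: the distinction drawn here, between a segment that merely coincides with an overlap and words that really overlap in time, can be illustrated with a small sketch. It assumes word-level (start, end) times in seconds are available for each channel and that intervals within one channel do not overlap each other; the function names are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start_sec, end_sec); assumed representation


def bins_coincide(bin_a: Interval, bin_b: Interval) -> bool:
    """Crude measure: do two synchronized transcription bins share any time?"""
    return min(bin_a[1], bin_b[1]) > max(bin_a[0], bin_b[0])


def overlap_seconds(chan_a: List[Interval], chan_b: List[Interval]) -> float:
    """Total time during which both channels contain a word.

    Assumes the intervals within each channel are non-overlapping.
    Walks both sorted lists once, accumulating pairwise intersections.
    """
    a, b = sorted(chan_a), sorted(chan_b)
    total = 0.0
    i = j = 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if end > start:
            total += end - start
        # Advance whichever interval finishes first.
        if a[i][1] <= b[j][1]:
            i += 1
        else:
            j += 1
    return total
```

Two ten-second bins can coincide under the crude measure even though the words inside them, once located in time, never overlap. That is the gap hand-checked or forced alignments would close.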
142fe016: So ,
143fe016: um , if we d
144fe016: it might be possible to take Thilo's output
145fe016: and like if you have , um ,
146fe016: like right now these meetings are all ,
148me011: I forgot the digital camera again . Every meeting
149fe016: um ,
150fe016: you know , they're timealigned , so if these are two different channels
151fe016: and somebody's talking here and somebody else is talking here , just that word ,
152me011: Mmhmm .
153fe016: if Thilo can tell us that there're boundaries here , we should be able to figure that out because the only thing transcribed in this channel is this word .
154fe016: But , um , you know , if there are things
155me011: Two words .
156fe016: Yeah , if you have two and they're at the edges , it's like here and here , and there's speech here , then it doesn't really help you , so , um
157me018: Thilo's won't put down two separate marks in that case
158me011: Thilo's will . But .
159fe016: Well it w it would , but , um , we don't know exactly where the words are because the transcriber gave us two words in this time bin and we don't really know ,
160fe016: I mean , yeah it's
161fe008: Well it's a merging problem . If you had a if you had a s if you had a script which would I've thought about this , um , and I've discussed I've discussed it with Thilo ,
162fe016: I mean , if you have any ideas . I would
163fe008: um , the ,
164fe008: I mean , I I in principle I could imagine writing a script which would approximate it to some degree , but there is this problem of slippage , yeah .
166me011: Well maybe
167me011: Maybe that will get enough of the cases to be useful .
168fe016: Right . I mean , that that would be really helpful . That was sort of another possibility .
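Note: a merging script of the kind being discussed might look something like the sketch below. It is only a guess at the easy case, where a transcribed time bin contains exactly one detected speech segment. The `Bin` record, the `tighten` function, and the plain (start, end) segment format are assumptions made for illustration; they do not reflect the real transcript or speech/nonspeech detector formats, and the sketch does nothing about the slippage and cut-off problems raised in the discussion.

```python
from typing import List, NamedTuple, Tuple


class Bin(NamedTuple):
    """One transcribed time bin on one channel (hypothetical format)."""
    start: float
    end: float
    text: str


def tighten(bins: List[Bin], segments: List[Tuple[float, float]]):
    """Tighten bin boundaries using detected speech segments.

    If exactly one detected segment intersects a bin, the bin's words are
    assumed to lie inside that segment and the bin is clipped to it.
    Bins with zero or several segments are returned separately for
    hand-checking, since the words cannot be placed unambiguously.
    """
    tightened: List[Bin] = []
    ambiguous: List[Bin] = []
    for b in bins:
        inside = [(s, e) for s, e in segments if s < b.end and e > b.start]
        if len(inside) == 1:
            s, e = inside[0]
            tightened.append(Bin(max(s, b.start), min(e, b.end), b.text))
        else:
            ambiguous.append(b)
    return tightened, ambiguous
```

How many bins land in the ambiguous list would indicate whether such a script gets enough of the cases to be useful.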
169me011: You know s 'cause it seemed like most of the cases are in fact the single word
170me011: sorts , or at least a single phrase
171mn017: Mmm .
172fe008: Well they they can be stretched . I wouldn't make that generalization 'cause sometimes people will say , And then I and there's a long pause
173me011: in most of the bins .
174fe016: Yeah .
175fe008: and finish the sentence and and sometimes it looks coherent and and the I mean it's it's not a simple problem .
176fe008: But it's really And then it's coupled with the problem that sometimes , you know , with with a fricative you might get the beginning of the word cut off and so it's coupled with the problem that Thilo's isn't
177fe008: perfect either . I mean , we've i th it's like you have a merging problem plus so merging plus this problem of , uh , not
178me011: Right .
180fe008: y i i if the speech nonspeech were perfect to begin with , the detector , that would already be an improvement , but that's impossible , you know , i that's too much to ask .
182fe016: Right .
183me011: Yes .
184fe008: And so i and may you know , I mean , it's
185fe008: I think that there always th there would have to be some handtweaking , but it's possible that a script could be written to merge those two types of things . I've I've discussed it with Thilo and I mean
187fe008: in terms of not him doing it , but we we discussed some of the parameters of that and how hard it would be to in principle to write something that would do that .
189fe016: I mean , I guess in the future it won't be as much as an issue if
190fe016: transcribers are using the tightened boundaries to start with , then we have a
191fe016: good idea of where the forced alignment is constrained to . So I'm no I don't know if this
192fe008: Well , it's just , you know , a matter of we had the revolution we had the revolution of improved , uh , interface , um , one month too late , but it's like ,
193me011: Oh .
194me011: Tools .
195fe016: Oh it's it's a yeah .
196fe008: you know , it's wonderful to have the revolution , so it's just a matter of of , you know , from now on we'll be able to have things channelized to begin with .
197me011: Right . And we'll just have to see how hard that is .
198fe008: Yeah , that's right . That's right .
199me011: So so whether the corrections take too much time . I was just thinking about the fact that if Thilo's missed these short segments ,
200fe016: Yeah .
201me011: that might be quite timeconsuming for them to insert them .
202fe008: Good point .
203fe016: But he he also can adjust this minimum time duration constraint and then what you get is
204fe008: Yeah .
205me011: Spurious .
206fe016: noises mostly , but that might be OK , an
207me011: It might be easier to delete something that's wrong than to insert something that's missing . What do you think , Jane ?
208fe016: Right . And you can also see in the waveform exac yeah .
209me013: If you can feel confident that what the yeah , that there's actually something that you're not gonna miss something , yeah .
211fe016: Yeah .
212me011: Yeah . 'Cause then then you just delete it , and you don't have to pick a time .
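Note: the trade-off around the minimum time duration constraint can be shown with a toy filter. This is a sketch only: the real detector presumably applies the constraint internally, and the (start, end) segment format, the function name, and the threshold values are assumptions chosen for illustration.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start_sec, end_sec); assumed representation


def apply_min_duration(segments: List[Interval], min_dur: float) -> List[Interval]:
    """Keep only the segments that last at least min_dur seconds."""
    return [(s, e) for s, e in segments if e - s >= min_dur]


# Hypothetical detector output: a long utterance, a short backchannel-length
# segment, and a very short blip that is probably noise.
candidates = [(0.0, 1.2), (2.0, 2.15), (3.0, 3.05)]

strict = apply_min_duration(candidates, 0.3)  # drops both short segments
loose = apply_min_duration(candidates, 0.1)   # keeps the 0.15 s segment too
```

A looser threshold hands transcribers more spurious segments to delete, while a stricter one leaves short utterances for them to insert by hand, which is the choice being weighed in the discussion.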
213fe016: I think it's
215fe008: Well the problem is I you know I I it's a it's a really good question , and I really find it a pain in the neck to delete things because you have to get the mouse up there
216fe008: on the t on the text line and i and otherwise you just use an arrow to get down I mean , i it depends on how lar th there's so many
217fe008: extra things that would make it
218fe008: one of them harder than the other , or or vice versa . It's not a simple question . But , you know , I mean , in principle , like , you know , if one of them is easier then to bias it towards whichever one's easier .
219me011: Yeah , I guess the semantics aren't clear when you delete a segment , right ? Because you would say You would have to determine what the surroundings were .
220fe016: You could just say it's a noise , though , and write , you know , a postprocessor will just
221fe016: all you have to do is just
222me011: If it's really a noise .
223fe016: or just say it's just put X , you know , like not speech or something , and then you can get
224fe008: I think it's easier to add than delete , frankly , because you have to , uh , maneuver around on the on both windows then .
225fe016: Yeah , or
226me011: To add or to delete ?
227fe008: To delete .
228me011: OK .
229fe016: Anyways , so I I guess
230me011: That Maybe that's an interface issue that might be addressable . But I think it's the semantics that are that are questionable to me , that you delete something So let's say someone is talking to here ,
231fe008: It's possible .
232me011: and then you have a little segment here .
233me011: Well , is that part of the speech ? Is it part of the nonspeech ?
234me011: I mean , w what do you embed it in ?