ICSI_20010322-1450
1me013: Unless maybe we do this , uh , um ,
cancellation business .
2fe016: Right , but that's I mean , that was our plan
, but it's clear from Dan that this is not something you can
do in a short amount of time .
3mn017:
4mn017: Yeah , right .
5mn017:
6me013: Oh , the short amount of time thing , right
.
7fe016: So so we you know , we had spent a lot of time
, um , writing up the HLT paper and we wanted to use that ,
uh , kind of analysis , but the HLT paper has ,
8me001:
9mn017:
10me011:
11me013: Yeah .
12fe016: you know , it's a very crude measure of
overlap . It's not really something you could scientifically
say is overlap , it's just whether or not the ,
13fe016: um ,
14me011: c High correlation .
15fe016: the segments that were all synchronized ,
whether there was some overlap somewhere .
16fe016: And , you know , that pointed out some
differences , so he thought well if we can
17fe016: do something quick and dirty because Dan said
the cross-cancellation , it's not straightforward . If it were
straightforward then we would try it , but so , it's sort
of
18fe016: good to hear that it was not straightforward ,
thinking if we can get decent forced alignments ,
19fe016: then at least we can do sort of a overall
report of
20fe016: what happens with actual overlap
21fe016: in time , but ,
22fe016: um
23me018: I didn't think that his message said it wasn't
straightforward .
24me011: Well if we'd just
25me013: Well ..
26me011: Umhmm .
27me018: I thought he's just saying you have to look
over a longer time window when you do it .
28fe016: and the but there are some issues of this
timing , um , in the recordings and
29me013: Yeah .
30me018: Right . So you just have to look over longer
time when you're trying to align the things , you can't you
can't just look
31me011:
32me011: Well . are you talking about the fact that the
recording software doesn't do time-synchronous ? Is that what
you're referring to ?
33me013:
34me011: That seems to me you can do that over the
entire file and get a very accurate
35mn017: I don't thi
36mn017: I d
37mn017: I don't think that was the issue . The issue
was that you have
38fe016: I yeah , that was sort of a side issue .
39me011: I didn't think so either .
40mn017: to you have have
41mn017: you first have to have a pretty good speech
detection on the individual channels .
42mn017:
43fe016: And it's dynamic , so I guess it was more
dynamic than some simple models
44fe016: would be able t to so so there are some things
available , and I don't know too much about this area
where
45fe016: if people aren't moving around much then you
could apply them , and it should work pretty well if you took
care of this recording time difference .
46me011: Right , which should be pretty straightforward .
47fe016: Which a at least is well defined , and
48me011: Yeah .
49fe016: um , but then if you add the dynamic aspect of
adapting distances , then it wasn't
50fe016: I guess it just wasn't something that he could
do quickly .. and not in time for us to be able to do
something by two weeks from now , so . Well less than a week
.
51fe016: So um , so I don't know what we can do if
anything , that's sort of worth ,
52fe016: you know , a EUROSPEECH paper at this point
.
53me018: Well , Andreas , how well did it work on the
non-lapel stuff ?
54me011: Yeah . That's what I was gonna say . C .
55mn017:
56mn017: I haven't checked those yet . It's very
tedious to check these .
57mn017:
58me018: Mmm .
59mn017: Um , we would really need , ideally , a
transcriber
60mn017:
61mn017: to time mark the
62mn017: you know , the be at least the beginning and s
ends of contiguous speech .
63mn017:
64mn017: Um ,
65mn017:
66mn017: and ,
67mn017: you know , then with the time marks , you can
do an automatic
68mn017: comparison of your
69mn017: of your forced alignments .
70me011: Oh , M N C M .
71me018: Because really the the at least in terms of
how we were gonna use this in our system was
72mn017: Mmhmm .
73me018: to get an ideal an idea , uh , for each
channel about the start and end boundaries . We don't really
care about like intermediate word boundaries , so
74mn017:
75mn017: No , that's how I've been looking at it . I
mean , I don't care that the individual words are aligned
correctly , but
76fe016: Right .
77me018: Yeah .
78me018: Yeah .
79mn017: you don't wanna ,
80mn017: uh , infer from the alignment that someone
spoke who didn't . so , so
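
A minimal sketch of the automatic comparison described here, assuming the transcriber's time marks and the forced-alignment output for one channel are both available as lists of (start, end) pairs in seconds. The function names and the sample numbers are hypothetical and not taken from the meeting data. It measures how much speech the alignment places where the time marks show no speech, which is the error the speakers say they most want to avoid.

```python
def overlap(a, b):
    """Length in seconds of the intersection of two (start, end) intervals."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


def false_speech_time(aligned, reference):
    """Total aligned speech time not covered by any reference time mark.

    Assumes the reference intervals do not overlap one another.
    """
    total = 0.0
    for seg in aligned:
        covered = sum(overlap(seg, ref) for ref in reference)
        total += max(0.0, (seg[1] - seg[0]) - covered)
    return total


# Hypothetical example data for a single channel.
reference = [(1.0, 3.0), (5.0, 6.0)]             # hand time marks
aligned = [(1.2, 2.8), (4.0, 4.5), (5.5, 6.5)]   # forced-alignment speech

print(false_speech_time(aligned, reference))
```

Word-internal boundaries are ignored on purpose, since only the start and end of contiguous speech matter for this check.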
81me018: Right , exactly . So that's why I was
wondering if it
82me018: I mean , maybe if it doesn't work for lapel
stuff , we can just not use that and
83mn017:
84me001:
85mn017:
86mn017: Yeah .
87mn017:
88mn017: I haven't
89mn017: I ha just haven't had the time to , um ,
90mn017: do the same procedure on one of the
91mn017:
92mn017: so I would need a k I would need a channel
that has
93mn017:
94mn017: a speaker whose
95mn017: who has a lot of overlap but s
96mn017: you know , is a
97mn017: non-lapel mike .
98mn017:
99mn017: And , um ,
100mn017:
101mn017: where preferably , also there's someone
sitting next to them
102mn017: who talks a lot .
103mn017:
104me011: Hmm
105mn017: So ,
106me011: So a meeting with me in it .
107mn017: I maybe someone can help me find a good
candidate and then I would
108mn017: be willing to
109mn017:
110me018: We c you know what ? Maybe the best way to
find that would be to look through these .
111mn017: you know , hand
112me018: 'Cause you can see the seat numbers , and then
you can see what type of mike they were using .
113me018: And so we just look for , you know , somebody
sitting next to Adam
114me011:
115me018: at one of the meetings
116fe016: Actually y we can tell from the data that we
have , um , yeah , there's a way to tell . It might not be a
single person who's always overlapping that person but any
number of people , and ,
117mn017: coffee , perhaps
118me013:
119mn017: From the insertions , maybe ?
120mn017: fr
121mn017: fr from the
122mn017: Right .
123fe016: um , if you align the two
124fe016: hypothesis files across the channels , you
know , just word alignment , you'd be able to find that .
125fe016: So
126fe016: so I guess that's sort of a last
127fe016: ther there're sort of a few things we could do
. One is just do like non-lapels if we can get good enough
alignments . Another one was to try to get
128fe016: somehow align Thilo's energy segmentations
with
129fe016: what we have . But then you have the problem
of not knowing where the words are because these meetings
were done before that segmentation . But maybe there's
something that could be done .
130me018: What what is why do you need the , um ,
131me018: the forced alignment for the HLT I mean for
the EUROSPEECH paper ?
132fe016: Well ,
133fe016: I guess I I wanted to just do something not
on
134fe016: recognition experiments because that's ju way
too early , but to be able to report ,
135fe016: you know , actual numbers . Like if we if we
had
136fe016: hand-transcribed pe good alignments or
hand-checked alignments , then we could do this paper . It's
not that we need it to be automatic .
137fe016: But without knowing where the real words are ,
in time
138me018: So it was to get it was to get more data and
better to to squeeze the boundaries in .
139fe016: To to know what an overlap really if it's
really an overlap , or if it's just a
140me018: Ah , OK . Yeah .
141fe016: a a segment correlated with an overlap , and I
guess that's the difference to me between like a real paper
and a sort of , promissory paper .
142fe016: So ,
143fe016: um , if we d
144fe016: it might be possible to take Thilo's
output
145fe016: and like if you have , um ,
146fe016: like right now these meetings are all ,
147me011: Ach
148me011: I forgot the digital camera again . Every
meeting
149fe016: um ,
150fe016: you know , they're time-aligned , so if these
are two different channels
151fe016: and somebody's talking here and somebody else
is talking here , just that word ,
152me011: Mmhmm .
153fe016: if Thilo can tell us that there're boundaries
here , we should be able to figure that out because the only
thing transcribed in this channel is this word .
154fe016: But , um , you know , if there are things
155me011: Two words .
156fe016: Yeah , if you have two and they're at the
edges , it's like here and here , and there's speech here ,
then it doesn't really help you , so , um
157me018: Thilo's won't put down two separate marks in
that case
158me011: Thilo's will . But .
159fe016: Well it w it would , but , um , we don't know
exactly where the words are because the transcriber gave us
two words in this time bin and we don't really know ,
160fe016: I mean , yeah it's
161fe008: Well it's a merging problem . If you had a if
you had a s if you had a script which would I've thought
about this , um , and I've discussed I've discussed it with
Thilo ,
162fe016: I mean , if you have any ideas . I would
163fe008: um , the ,
164fe008: I mean , I I in principle I could imagine
writing a script which would approximate it to some degree ,
but there is this problem of slippage , yeah .
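
One way such a merging script might look, as a rough sketch only. It assumes each transcriber bin is a (start, end, words) tuple and that the speech detector's output for the same channel is a list of (start, end) pairs. The name tighten_bins and the sample data are hypothetical. It handles only the easy case raised above, one transcribed word and one detected segment inside the bin, and marks everything else as ambiguous for hand checking.

```python
def tighten_bins(bins, detected):
    """Tighten transcriber bins using detected speech segments.

    bins:     list of (start, end, words) for one channel
    detected: list of (start, end) speech segments for the same channel
    Returns a list of (start, end, words, status) tuples.
    """
    out = []
    for b_start, b_end, words in bins:
        # Keep only detected segments that lie entirely inside this bin.
        inside = [(s, e) for s, e in detected if s >= b_start and e <= b_end]
        if len(words) == 1 and len(inside) == 1:
            s, e = inside[0]
            out.append((s, e, words, "tightened"))
        else:
            # Several words or several segments: cannot tell which is which.
            out.append((b_start, b_end, words, "ambiguous"))
    return out


# Hypothetical example: one single-word bin and one two-word bin.
bins = [(0.0, 10.0, ["yeah"]), (10.0, 20.0, ["so", "right"])]
detected = [(3.0, 3.4), (10.5, 11.0), (19.0, 19.6)]

for row in tighten_bins(bins, detected):
    print(row)
```

Segments that straddle a bin edge are dropped by the containment test, so the slippage problem mentioned here is not addressed by this sketch.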
165me011:
166me011: Well maybe
167me011: Maybe that will get enough of the cases to be
useful .
168fe016: Right . I mean , that that would be really
helpful . That was sort of another possibility .
169me011: You know s 'cause it seemed like most of the
cases are in fact the single word
170me011: sorts , or at least a single phrase
171mn017: Mmm .
172fe008: Well they they can be stretched . I wouldn't
make that generalization 'cause sometimes people will say ,
And then I and there's a long pause
173me011: in most of the bins .
174fe016: Yeah .
175fe008: and finish the sentence and and sometimes it
looks coherent and and the I mean it's it's not a simple
problem .
176fe008: But it's really And then it's coupled with the
problem that sometimes , you know , with with a fricative you
might get the beginning of the word cut off and so it's
coupled with the problem that Thilo's isn't
177fe008: perfect either . I mean , we've i th it's like
you have a merging problem plus so merging plus this problem
of , uh , not
178me011: Right .
179me011: Hmm
180fe008: y i i if the speech non-speech were perfect to
begin with , the detector , that would already be an
improvement , but that's impossible , you know , i that's too
much to ask .
181me011:
182fe016: Right .
183me011: Yes .
184fe008: And so i and may you know , I mean , it's
185fe008: I think that there always th there would have
to be some hand-tweaking , but it's possible that a script
could be written to merge those two types of things . I've
I've discussed it with Thilo and I mean
186me011:
187fe008: in terms of not him doing it , but we we
discussed some of the parameters of that and how hard it
would be to in principle to write something that would do
that .
188mn017:
189fe016: I mean , I guess in the future it won't be as
much as an issue if
190fe016: transcribers are using the tightened
boundaries to start with , then we have a
191fe016: good idea of where the forced alignment is
constrained to . So I'm no I don't know if this
192fe008: Well , it's just , you know , a matter of we
had the revolution we had the revolution of improved , uh ,
interface , um , one month too late , but it's like ,
193me011: Oh .
194me011: Tools .
195fe016: Oh it's it's a yeah .
196fe008: you know , it's wonderful to have the
revolution , so it's just a matter of of , you know , from
now on we'll be able to have things channelized to begin with
.
197me011: Right . And we'll just have to see how hard
that is .
198fe008: Yeah , that's right . That's right .
199me011: So so whether the corrections take too much
time . I was just thinking about the fact that if Thilo's
missed these short segments ,
200fe016: Yeah .
201me011: that might be quite time-consuming for them to
insert them .
202fe008: Good point .
203fe016: But he he also can adjust this minimum time
duration constraint and then what you get is
204fe008: Yeah .
205me011: Spurious .
206fe016: noises mostly , but that might be OK , an
207me011: It might be easier to delete something that's
wrong than to insert something that's missing . What do you
think , Jane ?
208fe016: Right . And you can also see in the waveform
exac yeah .
209me013: If you can feel confident that what the yeah ,
that there's actually something that you're not gonna miss
something , yeah .
210fe008:
211fe016: Yeah .
212me011: Yeah . 'Cause then then you just delete it ,
and you don't have to pick a time .
213fe016: I think it's
214fe008:
215fe008: Well the problem is I you know I I it's a it's
a really good question , and I really find it a pain in the
neck to delete things because you have to get the mouse up
there
216fe008: on the t on the text line and i and otherwise
you just use an arrow to get down I mean , i it depends on
how lar th there's so many
217fe008: extra things that would make it
218fe008: one of them harder than the other , or or vice
versa . It's not a simple question . But , you know , I mean
, in principle , like , you know , if one of them is easier
then to bias it towards whichever one's easier .
219me011: Yeah , I guess the semantics aren't clear when
you delete a segment , right ? Because you would say You
would have to determine what the surroundings were .
220fe016: You could just say it's a noise , though , and
write , you know , a postprocessor will just
221fe016: all you have to do is just
222me011: If it's really a noise .
223fe016: or just say it's just put X , you know , like
not speech or something , and then you can get
224fe008: I think it's easier to add than delete ,
frankly , because you have to , uh , maneuver around on the
on both windows then .
225fe016: Yeah , or
226me011: To add or to delete ?
227fe008: To delete .
228me011: OK .
229fe016: Anyways , so I I guess
230me011: That Maybe that's an interface issue that
might be addressable . But I think it's the semantics that
are that are questionable to me , that you delete something
So let's say someone is talking to here ,
231fe008: It's possible .
232me011: and then you have a little segment here .
233me011: Well , is that part of the speech ? Is it part
of the non-speech ?
234me011: I mean , w what do you embed it in ?