We compare the CW-classifiers with other online algorithms for linear classification. With these classifiers we use the same top-1 approach as with the CW-classifiers. All classifiers were trained with 10 iterations; Table 2 shows the results.
These results confirm those obtained previously.
The training time of the CW-classifiers depends on the number of iterations used, and this of course also affects the accuracy of the parser. Figure 1 shows the labeled attachment score (LAS) as a function of the number of training iterations; the horizontal line shows the LAS obtained with the SVM.
We see that after 4 iterations the CW-classifier achieves its best performance on the data set (Danish) used in this experiment; in most experiments we therefore use 10 iterations. Table 1 compares the training time (10 iterations) and parsing time of a parser using a CW-classifier with those of a parser using an SVM. Training the CW-classifier is faster, which is to be expected given that it is trained with a fixed number of online passes over the data. Parsing is also much faster, since classification requires only a dot product rather than kernel evaluations.
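The speed difference follows from the form of the CW update: each training example triggers a single closed-form adjustment of per-feature weight means and variances. The sketch below is a simplified diagonal-covariance version of the CW "variance" update of Dredze et al. (2008), not necessarily the exact variant used here; the confidence parameter value and the sparse dict representation are assumptions for illustration.

```python
import math

PHI = 1.2816  # Phi^{-1}(0.9): confidence parameter (assumed value)

def cw_update(mu, sigma, x, y):
    """One confidence-weighted update with a diagonal covariance.

    mu, sigma: per-feature weight means and variances (dicts)
    x: sparse feature vector (dict of feature -> value)
    y: label in {-1, +1}
    """
    # Mean and variance of the margin under the Gaussian over weights.
    M = y * sum(mu.get(f, 0.0) * v for f, v in x.items())
    V = sum(sigma.get(f, 1.0) * v * v for f, v in x.items())
    # Closed-form solution of the constrained KL minimization.
    disc = (1 + 2 * PHI * M) ** 2 - 8 * PHI * (M - PHI * V)
    alpha = max(0.0, (-(1 + 2 * PHI * M) + math.sqrt(disc)) / (4 * PHI * V))
    for f, v in x.items():
        s = sigma.get(f, 1.0)
        mu[f] = mu.get(f, 0.0) + alpha * y * s * v
        # Variances only shrink: frequently seen features become "confident".
        sigma[f] = 1.0 / (1.0 / s + 2 * alpha * PHI * v * v)
    return mu, sigma
```

Because each weight keeps its own variance, rarely seen features receive large updates while confident, frequently updated weights move little.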
Because we explicitly represent feature combinations instead of relying on a kernel, the number of features grows rapidly. For some of the larger data sets, the number of features is so large that memory consumption becomes a problem.
To solve this problem we have tried to use pruning to remove the features occurring fewest times in the training data: if a feature occurs fewer times than a given cutoff limit, it is removed. This goes against the idea of confidence-weighted learning, which is designed precisely to handle rare features by keeping their variance high. Experiments also show that this pruning hurts accuracy; Figure 2 shows the effect of varying the cutoff limit.
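The cutoff pruning described above can be sketched as follows; the feature names and cutoff value are illustrative only.

```python
from collections import Counter

def prune_features(instances, cutoff):
    """Keep only features occurring at least `cutoff` times in the training data."""
    counts = Counter(f for feats in instances for f in feats)
    keep = {f for f, c in counts.items() if c >= cutoff}
    return [[f for f in feats if f in keep] for feats in instances]

# Three training instances with word-form ("w=") and POS ("p=") features.
instances = [["w=the", "p=DT"], ["w=the", "p=NN"], ["w=dog", "p=NN"]]
pruned = prune_features(instances, cutoff=2)
# "w=dog" and "p=DT" occur only once each and are removed.
```

Note that the removed features are exactly the rare ones for which the CW-classifier would otherwise maintain high-variance, aggressively updated weights.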
Instead of pruning the features we tried manually removing some of the feature combinations. We removed some of the combinations that lead to the most extra features, which is especially the case with combinations of lexical features.
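The manual reduction can be illustrated as follows: pairwise combinations are generated explicitly, but pairs of lexical (word-form) features, which generate by far the most distinct combined features, are skipped. The feature naming convention is an assumption for illustration.

```python
from itertools import combinations

def combine(feats, skip_lexical_pairs=True):
    """Explicitly add pairwise feature combinations to a feature list.

    A feature is treated as lexical if it is a word-form feature ("w=...");
    with skip_lexical_pairs=True, lexical-lexical pairs are not generated.
    """
    combined = list(feats)
    for a, b in combinations(feats, 2):
        if skip_lexical_pairs and a.startswith("w=") and b.startswith("w="):
            continue  # lexical-lexical pairs blow up the feature space
        combined.append(a + "&" + b)
    return combined

feats = ["w=saw", "w=dog", "p=VBD", "p=NN"]
full = combine(feats, skip_lexical_pairs=False)   # all 6 pairs added
reduced = combine(feats)                          # lexical-lexical pair dropped
```

Since the number of distinct values of a lexical-lexical combination can approach the square of the vocabulary size, dropping these pairs removes a disproportionate share of the feature space.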
Table 2 shows that this consistently leads to a large reduction in the number of features.
Table 2 shows the results for the 10 CoNLL-X data sets used. For comparison we have included the results from using the standard classifier in the MaltParser, i.e. an SVM with a polynomial kernel. The hyper-parameters for the SVM have not been optimized, and neither has the number of iterations for the CW-classifiers. We see that in many cases the CW-classifier does significantly better than the SVM, but that the opposite is also the case.
The results presented above are suboptimal for the SVMs because default parameters were used, and optimizing these can improve accuracy considerably. In this section we therefore compare results obtained with CW-classifiers with the results for the MaltParser from CoNLL-X. In CoNLL-X both the features and the hyper-parameters of the SVMs were optimized for each language; here we do not perform a comparable optimization for the CW-classifiers.
The only parameter we tune for the CW-classifiers is the number of iterations. Although the two systems are therefore not tuned to the same degree, the comparison is still informative. The results are presented in Table 3.
We see that even though the feature set has not been tuned for the CW-classifiers, the results are in general comparable to those from CoNLL-X.
We have shown that using confidence-weighted classifiers with transition-based dependency parsing yields results comparable to those obtained with SVMs, while both training and parsing are faster. Currently the CW-classifiers depend on an explicitly extended feature set to capture interactions between the basic features. A possible solution is to use kernels with confidence-weighted classification in the same way they are used with the SVMs.
Another possibility is to extend the feature set in a more critical way than simply combining all features. Some combined features do not convey any information that is not already present in other features; the same is the case for some word-form and word-lemma features, since the lemma is derived from the word form. We have not yet tried to use automatic feature selection.
We will also try to optimize the feature set for the CW-classifiers: the results in Table 3 are obtained with the features optimized for the SVMs, and these are not necessarily the optimal features for the CW-classifiers.
Another comparison we would like to make is with linear SVMs. Unlike the polynomial kernel SVMs used as the default in the MaltParser, linear SVMs do not model feature interactions implicitly; using the same extended feature set we use with the CW-classifiers together with a linear SVM would therefore provide an interesting comparison.