HDT: Heuristic Incrememental Induction of univariate decision trees
Very useful in Natural Langage Processing.
Clear example below:
[...] analyzing historical data about articles and blog posts, to identify features (also called metrics or variables) that are good predictors of blog popularity when combined together, to build a system that can predict the popularity of an article before it gets published. The goal is to select the right mix of relevant articles to publish, to increase web traffic, and thus advertising dollars, for a niche digital publisher.
As in any similar problem, the historical data is called training set, and it is split into test data and control data for cross-validation purposes to avoid over-fitting. The features are selected to maximize some measure of predictive power, as described here. All of this is (so far) standard practice; the reader not familiar with this can Google the keywords introduced in this paragraph. In our particular case, we use our domain expertise to identify great features. These features are pretty generic and apply to numerous NLP contexts, so you can re-use them for your own data sets.
Post a Comment