Feature Selection for Knowledge Discovery and Data Mining by Huan Liu, Hiroshi Motoda (auth.)

As computer power grows and data collection technologies advance, a plethora of data is generated in almost every field where computers are used. The computer generated data should be analyzed by computers; without the aid of computing technologies, it is certain that huge amounts of collected data will never be examined, let alone be used to our advantage. Even with today's advanced computer technologies (e.g., machine learning and data mining systems), discovering knowledge from data can still be fiendishly hard due to the characteristics of the computer generated data. In its simplest form, raw data are represented as feature-values. The size of a dataset can be measured in two dimensions: the number of features (N) and the number of instances (P). Both N and P can be enormously large. This enormity may cause serious problems for many data mining systems. Feature selection is one of the long-existing methods that deal with these problems. Its objective is to select a minimal subset of features according to some reasonable criteria so that the original task can be achieved equally well, if not better. By choosing a minimal subset of features, irrelevant and redundant features are removed according to the criterion. When N is reduced, the data space shrinks and, in a sense, the data set is now a better representative of the whole data population. If necessary, the reduction of N can also give rise to the reduction of P by eliminating duplicates.
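The last point, that reducing N can also reduce P, can be illustrated with a minimal sketch. The helper names `project` and `drop_duplicates` and the toy data are hypothetical and are not taken from the book; the sketch also ignores class labels for brevity.

```python
def project(instances, keep):
    """Keep only the feature positions listed in `keep` for every instance."""
    return [tuple(row[i] for i in keep) for row in instances]


def drop_duplicates(instances):
    """Remove repeated instances while preserving first-seen order."""
    return list(dict.fromkeys(instances))


# Hypothetical toy data: P = 4 instances, N = 3 features.
data = [
    ("sunny", "hot", "a"),
    ("sunny", "hot", "b"),
    ("rainy", "cool", "c"),
    ("rainy", "cool", "d"),
]

# Dropping the third feature (N: 3 -> 2) makes pairs of instances identical,
# so removing duplicates also shrinks P (4 -> 2).
reduced = drop_duplicates(project(data, keep=[0, 1]))
```

Here the third feature is the only thing that distinguishes the instances within each pair, so once it is removed the four instances collapse to two.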

Similar mining books

Data Mining im Personalmanagement: Eine Analyse des Einsatzpotenzials zur Entscheidungsunterstützung

Data mining methods give human resource management innovative analytical options that can provide decision makers with new and interesting information. Drawing on decision theory, Franca Piazza systematically and comprehensively examines the potential uses of data mining in human resource management.

Advances in Web Mining and Web Usage Analysis: 9th International Workshop on Knowledge Discovery on the Web, WebKDD 2007, and 1st International Workshop on Social Networks Analysis, SNA-KDD 2007, San Jose, CA, USA, August 12-15, 2007. Revised Papers

This book constitutes the thoroughly refereed post-workshop proceedings of the 9th International Workshop on Mining Web Data, WebKDD 2007, and the 1st International Workshop on Social Network Analysis, SNA-KDD 2007, jointly held in San Jose, CA, USA in August 2007 in conjunction with the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2007.

Best Practices for Dust Control in Coal Mining

Compiled by the U.S. Dept of Health and Human Services, CDC/NIOSH Office of Mine Safety and Health Research, this 2010 handbook was developed to identify available engineering controls that can help reduce worker exposure to respirable coal and silica dust. The controls discussed in this handbook range from long-utilized controls that have developed into standards to newer controls that are still being optimized.

Offshore operation facilities : equipment and procedures

Offshore Operation Facilities: Equipment and Procedures provides new engineers with the knowledge and methods that will assist them in maximizing efficiency while minimizing cost, and helps them prepare for the many operational variables involved in offshore operations. This book clearly presents the working knowledge of subsea operations and demonstrates how to optimize operations offshore.

Extra resources for Feature Selection for Knowledge Discovery and Data Mining

Example text

That is, for a large number of problems, on average, a method searching in one direction will find the optimal subset as fast as a method searching in the other direction. Search directions are closely related to feature subset generation. […], Sequential Backward Generation).
• Sequential Forward Generation (SFG): It begins with an empty set of features, S_select. As the search starts, features are added into S_select one at a time (thus, sequential). Each time, the best feature among the unselected ones is chosen based on some criterion.
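The SFG description can be sketched as follows. This is a minimal illustration under stated assumptions, not the book's implementation: the excerpt does not say when to stop, so the `max_size` limit is an assumption, and the `evaluate` callback and the toy criterion `toy_score` are hypothetical stand-ins for "some criterion".

```python
from typing import Callable, FrozenSet, List


def sequential_forward_generation(
    n_features: int,
    evaluate: Callable[[FrozenSet[int]], float],
    max_size: int,
) -> List[int]:
    """Grow a feature subset one feature at a time, starting from the empty set."""
    selected: List[int] = []          # plays the role of S_select
    remaining = set(range(n_features))
    while remaining and len(selected) < max_size:  # assumed stopping rule
        # Choose the unselected feature whose addition scores best.
        best = max(
            sorted(remaining),
            key=lambda f: evaluate(frozenset(selected + [f])),
        )
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical criterion: per-feature relevance, where features 1 and 2 are
# redundant with each other, so holding both earns no extra credit.
RELEVANCE = {0: 0.1, 1: 0.9, 2: 0.8, 3: 0.3}


def toy_score(subset: FrozenSet[int]) -> float:
    score = sum(RELEVANCE[f] for f in subset)
    if {1, 2} <= subset:
        score -= RELEVANCE[2]
    return score


chosen = sequential_forward_generation(4, toy_score, max_size=2)
```

With this toy criterion the search first picks feature 1, then prefers feature 3 over the individually stronger but redundant feature 2, because the criterion is applied to the whole candidate subset rather than to each feature in isolation.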

Basic Algorithm of Tree Induction
• Initialize by setting variable T to be the training set.
• Apply the following steps to T.
1. If all elements in T are of class Cj, create a Cj node and halt.
2. Otherwise select a feature F with values V1, V2, ..., VN. Partition T into T1, T2, ..., TN, according to their values on F. […], TN as the child nodes.
3. Apply the procedure recursively to each child node.
Information gain is used to choose feature F. Hence, building from the training data, we obtain a tree with its leaves being class labels.
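The steps above can be sketched in Python. This is a rough illustration under assumptions rather than the book's code: instances are represented as hypothetical `(feature_dict, label)` pairs, the tree is a nested dict, and the majority-vote fallback used when no features remain is an addition the excerpt does not mention.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def partition(rows, feature):
    """Split rows into groups T1..TN keyed by their value on `feature`."""
    parts = {}
    for row in rows:
        parts.setdefault(row[0][feature], []).append(row)
    return parts


def information_gain(rows, feature):
    """Entropy of T minus the size-weighted entropy of its partitions."""
    remainder = sum(
        len(part) / len(rows) * entropy([label for _, label in part])
        for part in partition(rows, feature).values()
    )
    return entropy([label for _, label in rows]) - remainder


def induce_tree(rows, features):
    labels = [label for _, label in rows]
    if len(set(labels)) == 1:          # step 1: all of one class, make a leaf
        return labels[0]
    if not features:                   # assumption: fall back to majority class
        return Counter(labels).most_common(1)[0][0]
    best = max(features, key=lambda f: information_gain(rows, f))  # step 2
    rest = [f for f in features if f != best]
    return {best: {value: induce_tree(part, rest)                  # step 3
                   for value, part in partition(rows, best).items()}}


# Hypothetical toy training set T.
T = [
    ({"outlook": "sunny", "windy": "no"}, "play"),
    ({"outlook": "sunny", "windy": "yes"}, "play"),
    ({"outlook": "sunny", "windy": "yes"}, "play"),
    ({"outlook": "rainy", "windy": "no"}, "play"),
    ({"outlook": "rainy", "windy": "yes"}, "stay"),
    ({"outlook": "rainy", "windy": "yes"}, "stay"),
]
tree = induce_tree(T, ["outlook", "windy"])
```

On this toy data `outlook` has the higher information gain, so it becomes the root; the `sunny` branch is already pure and becomes a leaf, while the `rainy` branch is split again on `windy`.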

Heuristic search is obviously much faster than exhaustive search since it only searches a particular path and finds a near-optimal subset. The above two categories complement each other in the sense that one does what the other is incapable of doing. Heuristic search strategies cannot guarantee the optimality of a subset while complete search strategies can, but with the latter you may not get a solution within a reasonably long period. This is because complete search may take an extremely long time, even with unlimited memory/disk space.
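The trade-off can be made concrete with a small sketch that counts criterion evaluations. Both functions, the stopping rule of the greedy search, and the criterion `toy_score` are hypothetical and chosen only for illustration; exhaustive enumeration is used here as the simplest instance of a complete search.

```python
from itertools import combinations


def exhaustive_search(n_features, evaluate):
    """Evaluate every non-empty subset; the best subset cannot be missed."""
    best_subset, best_score, evaluations = None, float("-inf"), 0
    for size in range(1, n_features + 1):
        for subset in combinations(range(n_features), size):
            evaluations += 1
            score = evaluate(frozenset(subset))
            if score > best_score:
                best_subset, best_score = subset, score
    return best_subset, evaluations


def greedy_forward_search(n_features, evaluate):
    """Follow a single path: add the best feature until nothing improves."""
    selected, best_score, evaluations = [], float("-inf"), 0
    remaining = list(range(n_features))
    while remaining:
        scored = []
        for f in remaining:
            evaluations += 1
            scored.append((evaluate(frozenset(selected + [f])), f))
        score, feature = max(scored, key=lambda pair: pair[0])
        if score <= best_score:       # assumed stopping rule: no improvement
            break
        selected.append(feature)
        remaining.remove(feature)
        best_score = score
    return tuple(sorted(selected)), evaluations


# Hypothetical criterion: features 1 and 2 are only useful together,
# while feature 0 is moderately useful on its own.
def toy_score(subset):
    if {1, 2} <= subset:
        return 1.0
    if 0 in subset:
        return 0.6
    return 0.0


optimal, n_exhaustive = exhaustive_search(3, toy_score)
greedy, n_greedy = greedy_forward_search(3, toy_score)
```

On this toy criterion the exhaustive search returns the optimal subset `(1, 2)` after 7 evaluations, while the greedy search stops at `(0,)` after 5. The gap in cost widens quickly: enumerating all non-empty subsets of N features takes 2^N - 1 evaluations, whereas this greedy search needs at most N(N+1)/2.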
