Semi-supervised ensemble clustering based on selected constraint projection

Abstract:

Traditional cluster ensemble approaches have several limitations. (1) Few make use of prior knowledge provided by experts. (2) They have difficulty achieving good performance on high-dimensional datasets. (3) All ensemble members receive equal weight values, which ignores the different contributions of different ensemble members. (4) Not all pairwise constraints contribute to the final result. To address these limitations, we propose double weighting semi-supervised ensemble clustering based on selected constraint projection (DCECP), which applies both constraint weighting and ensemble member weighting. Specifically, DCECP first combines the random subspace technique with the constraint projection procedure to handle high-dimensional datasets. Second, it treats the prior knowledge of experts as pairwise constraints and assigns different subsets of pairwise constraints to different ensemble members; an adaptive ensemble member weighting process then associates different weight values with different ensemble members. Third, the weighted normalized cut algorithm is adopted to summarize the clustering solutions and generate the final result. Finally, nonparametric statistical tests are used to compare multiple algorithms on real-world datasets. Experiments on 15 high-dimensional datasets show that DCECP outperforms most of the compared clustering algorithms.
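
The abstract does not name the nonparametric tests it uses. A common choice for comparing several algorithms across many datasets is the Friedman test on per-dataset ranks, so the Python sketch below assumes that test. The function name `friedman_statistic` and the example score matrix are hypothetical placeholders, not results from this work.

```python
import numpy as np

def friedman_statistic(scores):
    """Friedman chi-square statistic and average ranks (assumed test, illustrative).

    scores: array of shape (n_datasets, k_algorithms); higher is better.
    Tied scores on a dataset share the average of the ranks they span.
    """
    scores = np.asarray(scores, dtype=float)
    n, k = scores.shape
    ranks = np.empty_like(scores)
    for i, row in enumerate(scores):
        order = np.argsort(-row)            # best score gets rank 1
        r = np.empty(k)
        r[order] = np.arange(1, k + 1)
        for value in np.unique(row):        # average the ranks of tied scores
            tied = row == value
            r[tied] = r[tied].mean()
        ranks[i] = r
    avg_ranks = ranks.mean(axis=0)
    # chi2_F = 12n / (k(k+1)) * (sum_j R_j^2 - k(k+1)^2 / 4)
    chi2 = 12.0 * n / (k * (k + 1)) * (np.sum(avg_ranks ** 2) - k * (k + 1) ** 2 / 4.0)
    return chi2, avg_ranks

# Placeholder scores: 3 datasets x 3 algorithms (illustrative only).
chi2, avg_ranks = friedman_statistic([[0.90, 0.80, 0.70],
                                      [0.85, 0.75, 0.80],
                                      [0.95, 0.60, 0.70]])
print(chi2, avg_ranks)  # chi2 ≈ 4.67, avg_ranks ≈ [1.0, 2.67, 2.33]
```

The statistic is compared against a chi-square distribution with k - 1 degrees of freedom (for example with `scipy.stats.chi2.sf`); SciPy also offers `scipy.stats.friedmanchisquare`, which takes one sample per algorithm. If the test is significant, a post-hoc procedure such as the Nemenyi test can identify which pairs of algorithms differ.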

Existing System:

Conventional cluster ensemble approaches have several limitations. (1) Few consider knowledge provided by experts in specific domains. (2) Few address how to handle high-dimensional data. (3) Most treat all ensemble members equally, even though different ensemble members contribute differently to the final result. (4) Although most pairwise constraints contribute to the final result, some may negatively affect it.

Advantages:

CESCP makes full use of pairwise constraints, which encode prior knowledge provided by experts. Because different ensemble members receive different subsets of pairwise constraints and are assigned correspondingly different weight values, the diversity of the ensemble increases.
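
The text does not specify how constraint subsets are drawn or how member weights are computed. The Python sketch below is one hypothetical reading: each member receives a uniformly random subset of the expert constraints, and its weight is proportional to the fraction of its own constraints that its clustering satisfies. This is a stand-in for illustration, not the competition-based weighting described for DCECP, and all function names are hypothetical.

```python
import numpy as np

def assign_constraint_subsets(n_constraints, n_members, subset_size, seed=0):
    """Draw a different random subset of constraint indices for each ensemble member."""
    rng = np.random.default_rng(seed)
    return [rng.choice(n_constraints, size=subset_size, replace=False)
            for _ in range(n_members)]

def satisfaction_rate(labels, constraints):
    """Fraction of (i, j, kind) constraints satisfied by a label vector.

    kind is "ML" (must-link: same cluster) or "CL" (cannot-link: different clusters).
    """
    ok = 0
    for i, j, kind in constraints:
        same = labels[i] == labels[j]
        satisfied = same if kind == "ML" else not same
        ok += int(satisfied)
    return ok / len(constraints)

def member_weights(member_labels, constraints, subsets):
    """Hypothetical weighting: score each member on its own subset, then normalize to sum 1."""
    scores = np.array([
        satisfaction_rate(labels, [constraints[c] for c in subset])
        for labels, subset in zip(member_labels, subsets)
    ])
    total = scores.sum()
    if total == 0:  # no member satisfies any constraint: fall back to equal weights
        return np.full(len(scores), 1.0 / len(scores))
    return scores / total

constraints = [(0, 1, "ML"), (2, 3, "ML"), (0, 2, "CL"), (1, 3, "CL")]
subsets = assign_constraint_subsets(len(constraints), n_members=2, subset_size=3)
members = [np.array([0, 0, 1, 1]), np.array([0, 0, 1, 0])]
print(member_weights(members, constraints, subsets))
```

Giving each member its own subset means a few harmful constraints affect only some members, and the weighting then down-weights members whose clusterings disagree with their constraints.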

CESCP not only adopts the random subspace technique to handle high-dimensional data, but also uses the constraint projection technique to map high-dimensional data into a low-dimensional space.
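
The document does not give the projection objective, so the Python sketch below assumes a formulation commonly used for constraint projection: find an orthonormal W that maximizes J(W) = (1/n_C) Σ_CL ||Wᵀ(x_i - x_j)||² - β (1/n_M) Σ_ML ||Wᵀ(x_i - x_j)||², which spreads cannot-link pairs apart while keeping must-link pairs close. Its solution is the set of top eigenvectors of S_C - β S_M, the difference of the cannot-link and must-link scatter matrices. The helper names and the parameter `beta` are hypothetical.

```python
import numpy as np

def random_subspace(X, n_features, rng):
    """Random subspace technique: keep a random subset of feature columns."""
    cols = rng.choice(X.shape[1], size=n_features, replace=False)
    return X[:, cols], cols

def constraint_projection(X, must_link, cannot_link, n_components, beta=1.0):
    """Project X with the top eigenvectors of S_C - beta * S_M (assumed formulation).

    S_C and S_M average the outer products of pairwise differences over the
    cannot-link and must-link pairs, respectively.
    """
    d = X.shape[1]

    def scatter(pairs):
        S = np.zeros((d, d))
        for i, j in pairs:
            diff = (X[i] - X[j])[:, None]
            S += diff @ diff.T
        return S / max(len(pairs), 1)

    M = scatter(cannot_link) - beta * scatter(must_link)
    eigvals, eigvecs = np.linalg.eigh(M)      # eigenvalues in ascending order
    W = eigvecs[:, ::-1][:, :n_components]    # keep the largest eigenvalues
    return X @ W, W

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 500))               # toy high-dimensional data
X_sub, cols = random_subspace(X, n_features=50, rng=rng)
Z, W = constraint_projection(X_sub, must_link=[(0, 1), (2, 3)],
                             cannot_link=[(0, 2), (1, 3)], n_components=5)
print(Z.shape)  # (100, 5)
```

Each ensemble member would repeat both steps with a fresh random subspace (and, per the proposed system, its own constraint subset), so the members see different low-dimensional views of the data.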

Proposed System:

The contributions of this paper are as follows. (1) A new semi-supervised clustering ensemble approach based on selected constraint projection (CESCP) is proposed, which not only uses the constraint subset selection process to make full use of prior knowledge provided by experts but also adopts the constraint projection technique to map high-dimensional data into a low-dimensional space. (2) The double weighting semi-supervised ensemble clustering based on selected constraint projection (DCECP) framework is designed; it adopts an adaptive ensemble member weighting process that assigns weights to ensemble members through competition, which increases the diversity of the ensemble. (3) The random subspace method is combined with the constraint projection technique to handle high-dimensional data.
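
To illustrate how a weighted consensus step can summarize the ensemble, the Python sketch below builds a weighted co-association matrix from the members' labels and splits it with the classic two-way normalized cut of Shi and Malik. The paper's weighted normalized cut may be formulated differently, and the helper names are hypothetical; for more than two clusters one would instead keep several eigenvectors and run k-means on them.

```python
import numpy as np

def weighted_coassociation(member_labels, weights):
    """Entry (i, j): weighted fraction of members that put points i and j in the same cluster."""
    n = len(member_labels[0])
    A = np.zeros((n, n))
    for labels, w in zip(member_labels, weights):
        labels = np.asarray(labels)
        A += w * (labels[:, None] == labels[None, :])
    return A / np.sum(weights)

def normalized_cut_bipartition(A):
    """Two-way normalized cut: threshold the second-smallest generalized
    eigenvector of (D - A) y = lambda * D y at zero."""
    d = A.sum(axis=1)                          # degrees (> 0, since A[i, i] = 1)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    # Symmetric normalized Laplacian I - D^{-1/2} A D^{-1/2}
    L_sym = np.eye(len(d)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    eigvals, eigvecs = np.linalg.eigh(L_sym)   # ascending eigenvalues
    y = d_inv_sqrt * eigvecs[:, 1]             # map back to the generalized eigenvector
    return (y > 0).astype(int)

members = [[0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 1, 1], [1, 1, 1, 0, 0, 0]]
weights = [0.5, 0.2, 0.3]                      # e.g. output of a member weighting step
print(normalized_cut_bipartition(weighted_coassociation(members, weights)))
```

Because the co-association matrix compares only "same cluster or not", members whose label numbering differs (such as the first and third above) contribute consistently without any label alignment.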