Spatial Coordinate Trial: Converting Non-Spatial Data Dimension for DBSCAN

Eka Arriyanti, Ita Arfyanti, Pitrasacha Adytia

Abstract


In big data, noise in data mining is inevitable. Whether it appears depends on both the data and the algorithm, but that does not mean the algorithm causes the noise. Although the advantages of the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm on spatial (two-dimensional) data have been widely discussed, its performance on non-spatial data has not been convincing. Because an algorithm should perform well on any data if data mining is to be optimized, this research proposes a trial that converts the dimensions of non-spatial data into two dimensions for execution with DBSCAN, and that varies the input value of epsilon to find the minimum value at which noise begins to arise during execution. The method of analysis in the trial treats the attributes of non-spatial data as variables that represent coordinate points, rather than as cardinality. Technically, the two-dimensional coordinate axes are assumed to serve as the plotting location for coordinates with three or more dimensions, following the development of the Cartesian coordinate system, after first paying attention to the relationships among the variables (attributes). This approach is called Spatial Coordinate. The different input values are chosen by paying attention to the numbers from the non-zero minimum distance up to the fourth epsilon value, where epsilon is an integer. The results of the trial, and of testing the resulting clusters with the Silhouette Coefficient, indicate that the clusters are good, strong, and of sufficient quality. This research therefore offers a new way of preprocessing non-spatial data to improve DBSCAN performance.
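The epsilon sweep described in the abstract can be illustrated with a minimal sketch. It assumes the data have already been reduced to two-dimensional points; the Spatial Coordinate conversion itself is not reproduced here, because the abstract does not specify it. The function names, the sample points, the `min_pts` value, and the candidate epsilon values are all hypothetical, and the DBSCAN shown is a plain textbook version, not necessarily the authors' implementation.

```python
from math import dist

NOISE = -1  # label used for points that belong to no cluster


def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Textbook DBSCAN on a list of 2-D tuples; returns one label per point."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # core point: keep expanding
                queue.extend(j_neighbors)
    return labels


def noise_by_eps(points, eps_values, min_pts):
    """Run DBSCAN once per epsilon and count the noise points each time."""
    return {eps: dbscan(points, eps, min_pts).count(NOISE) for eps in eps_values}


# Hypothetical sample: two tight groups plus one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (5, 5)]
counts = noise_by_eps(points, eps_values=[1, 2, 6], min_pts=3)
for eps, n_noise in counts.items():
    print(f"eps={eps}: {n_noise} noise point(s)")
```

Reading the counts from the largest epsilon downward shows where noise first appears for this toy data; a real run would use epsilon candidates derived from the data's own pairwise distances, as the abstract indicates.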



Keywords


data; algorithm; noise; DBSCAN; spatial
