What types of data structures are widely used in clustering algorithms? Explain DATA MATRIX AND DISSIMILARITY MATRIX
DATA MATRIX AND DISSIMILARITY MATRIX
First, let us look at the types of data structures that are widely used in cluster analysis. Main memory-based clustering algorithms typically operate on one of the following two data structures.
Data Matrix: This represents n objects, such as persons, with m attributes, such as age, height, weight, gender, race, and so on. The structure is in the form of a relational table, or n-by-m matrix (n objects × m attributes). The data matrix is often called a two-mode matrix, since the rows represent objects and the columns represent attributes.
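As a rough illustration, a data matrix can be sketched in Python as a list of rows, one row per object. The attribute names and values here are made up for the example and do not come from any real dataset.

```python
# Hypothetical data matrix: n = 3 objects (persons), m = 2 numeric attributes.
attributes = ["age", "height_cm"]   # column labels (illustrative only)
data_matrix = [
    [25, 170.0],   # object 0
    [32, 165.0],   # object 1
    [47, 180.0],   # object 2
]

n = len(data_matrix)       # number of objects (rows)
m = len(data_matrix[0])    # number of attributes (columns)
print(f"{n} objects x {m} attributes")
```

Each row describes one object and each column one attribute, which is the two-mode layout described above.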
Dissimilarity Matrix: This stores a collection of proximities that are available for all pairs of n objects. It is often represented by an n-by-n matrix, where d(i, j) is the measured difference or dissimilarity between objects i and j. In general, d(i, j) is a non-negative number that is close to 0 when objects i and j are highly similar or near each other, and becomes larger the more they differ. The distance measure is symmetric, that is, d(i, j) = d(j, i), and the distance of an object from itself is zero, that is, d(i, i) = 0. This gives the matrix shown in the figure.
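The step from a data matrix to a dissimilarity matrix can be sketched as follows. This is a minimal example that assumes numeric attributes and uses Euclidean distance as d(i, j); the text above does not fix a particular distance measure, and the function name dissimilarity_matrix and the sample points are hypothetical.

```python
import math

def dissimilarity_matrix(data):
    """Return the n-by-n matrix of pairwise Euclidean distances between rows."""
    n = len(data)
    d = [[0.0] * n for _ in range(n)]   # diagonal stays 0: d(i, i) = 0
    for i in range(n):
        for j in range(i):              # compute only the lower triangle
            dist = math.dist(data[i], data[j])
            d[i][j] = dist
            d[j][i] = dist              # symmetry: d(i, j) = d(j, i)
    return d

# Illustrative data matrix: 3 objects, 2 numeric attributes.
points = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
D = dissimilarity_matrix(points)
for row in D:
    print(row)
```

Because the matrix is symmetric with a zero diagonal, only the entries below the diagonal need to be computed; the sketch mirrors them into the upper triangle for convenience.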