What types of data structures are widely used in cluster analysis? Explain the data matrix and the dissimilarity matrix.

 DATA MATRIX AND DISSIMILARITY MATRIX

First, let us look at the types of data structures that are widely used in cluster analysis. Main memory-based clustering algorithms typically operate on one of the following two data structures.

Data Matrix: This represents n objects, such as persons, with m attributes, such as age, height, weight, gender, race, and so on. The structure is in the form of a relational table, or n-by-m matrix (n objects x m attributes). The data matrix is often called a two-mode matrix, since its rows and columns represent different entities: the rows represent objects and the columns represent attributes.
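
As a rough illustration of the n-by-m layout described above, the following Python sketch stores a small data matrix as a list of rows. The attribute names, the values, and the helper attribute_column are all hypothetical and chosen only for the example.

```python
# Hypothetical example data: 4 objects (persons) described by 3 attributes.
# Each row is one object; each column is one attribute.
attributes = ["age", "height_cm", "weight_kg"]
data_matrix = [
    [25, 170.0, 65.0],
    [32, 182.0, 80.0],
    [47, 165.0, 72.0],
    [51, 175.0, 90.0],
]

n = len(data_matrix)       # number of objects (rows)
m = len(data_matrix[0])    # number of attributes (columns)


def attribute_column(matrix, names, name):
    """Return all values of one attribute, i.e. one column of the matrix."""
    j = names.index(name)
    return [row[j] for row in matrix]


ages = attribute_column(data_matrix, attributes, "age")
```

Here n is 4 and m is 3, so the table is a 4-by-3 data matrix, and ages holds the first column.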



Dissimilarity Matrix: This stores a collection of proximities that are available for all pairs of n objects. It is often represented by an n-by-n matrix, where d(i, j) is the measured difference or dissimilarity between objects i and j. In general, d(i, j) is a non-negative number that is close to 0 when objects i and j are highly similar or near each other, and that becomes larger the more they differ. The distance measure is symmetric, that is, d(i, j) = d(j, i), and the distance of an object from itself is zero, that is, d(i, i) = 0, which gives the matrix shown in the figure.
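
The properties above can be sketched in Python by building an n-by-n dissimilarity matrix from a small data matrix. This is only a minimal sketch: the text does not name a specific distance measure, so Euclidean distance is assumed here, and the function names and sample points are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two objects (an assumed choice of measure)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dissimilarity_matrix(rows, dist=euclidean):
    """Build the n-by-n matrix of pairwise dissimilarities d(i, j)."""
    n = len(rows)
    d = [[0.0] * n for _ in range(n)]    # diagonal stays 0: d(i, i) = 0
    for i in range(n):
        for j in range(i):               # compute each pair only once
            d[i][j] = d[j][i] = dist(rows[i], rows[j])   # d(i, j) = d(j, i)
    return d


# Hypothetical data matrix: 3 objects with 2 numeric attributes each.
points = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
d = dissimilarity_matrix(points)
```

Because the matrix is symmetric with a zero diagonal, only the entries below the diagonal need to be computed or stored; the sketch fills both halves purely for convenient lookup.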


