Short note on Multimedia Data Mining.

 Multimedia Data Mining

Mining Multimedia Data

  •  Multimedia data mining is the discovery of interesting patterns from multimedia databases that store and manage large collections of multimedia objects, including image data, video data, audio data, as well as sequence data and hypertext data containing text, text markups, and linkages. 
  • Multimedia data mining is an interdisciplinary field that integrates image processing and understanding, computer vision, data mining, and pattern recognition. 
  • Issues in multimedia data mining include content-based retrieval and similarity search, and generalization and multidimensional analysis. Multimedia data cubes contain additional dimensions and measures for multimedia information. Other topics in multimedia mining include classification and prediction analysis, mining associations, and video and audio data mining.
  • A multimedia database system stores and manages a large collection of multimedia data, such as audio, video, image, graphics, speech, text, document, and hypertext data (the last of which contains text, text markups, and linkages). Typical multimedia database systems include NASA’s EOS (Earth Observing System), various kinds of image and audio-video databases, and Internet databases.
  • The study of multimedia data mining normally focuses on image data mining. For similarity search in multimedia data, we consider two main families of multimedia indexing and retrieval systems:

a) Description-based retrieval systems: these build indices and perform object retrieval based on image descriptions, such as keywords, captions, size, and time of creation.

b) Content-based retrieval systems: these support retrieval based on the image content, such as color histogram, texture, pattern, image topology, and the shape of objects and their layouts and locations within the image.
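
To make content-based retrieval concrete, here is a minimal sketch that ranks images by color-histogram similarity. It is only an illustration under stated assumptions: images are represented as plain lists of (r, g, b) tuples, colors are quantized into 4 bins per channel, and histogram intersection is used as the similarity measure. The function names and the toy data are hypothetical and not taken from any particular system.

```python
from collections import Counter

def color_histogram(pixels, bins_per_channel=4):
    """Normalized histogram over quantized RGB colors.

    pixels: iterable of (r, g, b) tuples with values in 0..255 (assumed format).
    """
    step = 256 // bins_per_channel
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {color_bin: n / total for color_bin, n in counts.items()}

def histogram_intersection(h1, h2):
    """Similarity in [0, 1]: sum of per-bin minima of two normalized histograms."""
    return sum(min(v, h2.get(k, 0.0)) for k, v in h1.items())

def rank_by_color(query_pixels, database):
    """Return image ids sorted from most to least similar to the query."""
    q = color_histogram(query_pixels)
    scores = {img_id: histogram_intersection(q, color_histogram(px))
              for img_id, px in database.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Toy data: each "image" is just a short list of pixels.
database = {
    "sunset": [(250, 120, 30)] * 8 + [(40, 20, 60)] * 2,
    "forest": [(20, 140, 40)] * 9 + [(90, 60, 30)] * 1,
    "beach":  [(240, 200, 150)] * 5 + [(30, 120, 220)] * 5,
}
query = [(245, 110, 25)] * 7 + [(35, 25, 70)] * 3
ranking = rank_by_color(query, database)
```

A description-based system would instead match the query against keywords or captions; the histogram approach needs no manual annotation but only captures global color, not shape or layout.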


                             OR,

Multimedia Data Mining

Multimedia data mining is a subfield of data mining that deals with the extraction of implicit knowledge, multimedia data relationships, or other patterns not explicitly stored in multimedia databases.


Multimedia Data Types

– Any type of information medium that can be represented, processed, stored, and transmitted over a network in digital form

– Multi-lingual text, numeric, images, video, audio, graphical, temporal, relational, and categorical data.

– Relation with conventional data mining term


Generalizing Multimedia Data

a) Image data:

– Extracted by aggregation and/or approximation

– Size, color, shape, texture, orientation, and relative positions and structures of the contained objects or regions in the image


b) Music data:

– Summarize its melody: based on the approximate patterns that repeatedly occur in the segment

– Summarize its style: based on its tone, tempo, or the major musical instruments played
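
As a rough illustration of melody summarization, the sketch below looks for short patterns that repeat in a note sequence. It assumes notes are given as MIDI-style pitch numbers and compares pitch intervals rather than absolute pitches, so a motif still matches when it is transposed. This is one simple form of approximation chosen for the example, not a method prescribed by the notes above; the function name and sample melody are hypothetical.

```python
from collections import Counter

def melody_motifs(pitches, length=3, min_count=2):
    """Find interval patterns of a given length that repeat in a melody.

    pitches: list of MIDI-style note numbers (an assumed encoding).
    Returns (pattern, count) pairs, most frequent first.
    """
    # Differences between consecutive notes are unchanged by transposition.
    intervals = [b - a for a, b in zip(pitches, pitches[1:])]
    grams = Counter(tuple(intervals[i:i + length])
                    for i in range(len(intervals) - length + 1))
    return [(g, n) for g, n in grams.most_common() if n >= min_count]

# The same four-note figure appears at C, at G, and at C again.
notes = [60, 62, 64, 60, 67, 69, 71, 67, 60, 62, 64, 60]
motifs = melody_motifs(notes)
```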


c) Video:

– Provide news video annotation and indexing

– Traffic monitoring systems


Multidimensional Analysis of Multimedia Data

  • Multimedia Data Cube

– Design and construction similar to that of traditional data cubes from relational data

– Contain additional dimensions and measures for multimedia information, such as color, texture, and shape.
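
The following sketch shows how such a cube might be built with a simple count measure. It assumes each image has already been reduced to a record of categorical dimension values; the dimension names (`format`, `color`, `texture`) and the records are hypothetical. Every subset of the dimensions yields one cuboid, from the base cuboid down to the apex.

```python
from collections import Counter
from itertools import combinations

def build_cuboids(records, dimensions):
    """Compute a count measure for every subset of the given dimensions."""
    cube = {}
    for k in range(len(dimensions) + 1):
        for dims in combinations(dimensions, k):
            # Group records by their values on the chosen dimensions.
            cube[dims] = Counter(tuple(r[d] for d in dims) for r in records)
    return cube

# Hypothetical image records, already described by categorical features.
images = [
    {"format": "jpeg", "color": "red",  "texture": "smooth"},
    {"format": "jpeg", "color": "red",  "texture": "coarse"},
    {"format": "png",  "color": "blue", "texture": "smooth"},
    {"format": "jpeg", "color": "blue", "texture": "smooth"},
    {"format": "png",  "color": "red",  "texture": "coarse"},
]
cube = build_cuboids(images, ("format", "color", "texture"))

red_images = cube[("color",)][("red",)]                             # roll-up to color only
jpeg_smooth = cube[("format", "texture")][("jpeg", "smooth")]       # 2-D cuboid cell
total = cube[()][()]                                                # apex cuboid
```

With three dimensions and no concept hierarchies this gives 2^3 = 8 cuboids; real multimedia cubes can have many more dimensions, which is why the choice of dimensions matters.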


  • The database does not store images but their descriptors.

– Feature descriptor: a set of vectors for each visual characteristic

    • Color vector: contains the color histogram

    • MFC (Most Frequent Color) vector: five color centroids

    • MFO (Most Frequent Orientation) vector: five edge orientation centroids

– Layout descriptor: contains a color layout vector and an edge layout vector
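
To illustrate storing descriptors instead of images, here is a small sketch of one possible way to compute an MFC vector. The notes do not say how the five centroids are obtained, so the approach below is an assumption: colors are quantized into coarse bins, the most populated bins are selected, and the mean color of each bin is taken as its centroid. The function name and sample pixels are hypothetical.

```python
from collections import defaultdict

def mfc_vector(pixels, k=5, bins_per_channel=4):
    """Return up to k color centroids for the most frequent color bins.

    pixels: iterable of (r, g, b) tuples with values in 0..255 (assumed format).
    """
    step = 256 // bins_per_channel
    groups = defaultdict(list)
    for r, g, b in pixels:
        groups[(r // step, g // step, b // step)].append((r, g, b))
    # Keep the k bins that hold the most pixels.
    top = sorted(groups.values(), key=len, reverse=True)[:k]
    # Centroid of a bin = per-channel mean of its member pixels.
    return [tuple(sum(channel) / len(members) for channel in zip(*members))
            for members in top]

pixels = [(10, 10, 10)] * 3 + [(20, 20, 20)] + [(200, 0, 0)] * 2
descriptor = {"mfc": mfc_vector(pixels)}  # stored in place of the raw image
```

An MFO vector could be built the same way from quantized edge orientations, given an edge detector to supply them.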
