List any two challenges/issues of multimedia mining. Differentiate between web usage mining and web content mining.

Challenges/issues in multimedia data mining include content-based retrieval and similarity search, generalization, and multidimensional analysis.

Challenges/Issues in Multimedia Mining

Major issues in multimedia data mining include content-based retrieval and similarity search, multidimensional analysis, classification and prediction analysis, and mining associations in multimedia data.

1. Content-based retrieval and Similarity search

Content-based retrieval in multimedia is a challenging problem because multimedia data must be analyzed in detail, starting from pixel values. For similarity search in multimedia data, we consider two main families of multimedia retrieval systems:

(1) A description-based retrieval system builds indices and performs object retrieval based on image descriptions, for example, keywords, captions, size, and time of creation.

(2) A content-based retrieval system supports retrieval based on the image content, for example, color histogram, texture, shape, objects, and wavelet transform.

Use of content-based retrieval systems: A content-based retrieval system uses visual features to index images and promotes object retrieval based on feature similarity, which is highly desirable in many applications. These applications include medical diagnosis, weather prediction, TV production, internet search engines for pictures, and e-commerce.
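As an illustration of retrieval by feature similarity, here is a minimal sketch that ranks images by color histogram intersection. The image data, the function names, and the choice of two bins per channel are all assumptions made for this example; a real system would read pixels from image files and use richer features such as texture and shape.

```python
# Minimal sketch of content-based retrieval using color histograms.
# Images are hypothetical: each one is a list of (r, g, b) pixels, values 0-255.

def color_histogram(pixels, bins_per_channel=2):
    """Return a normalized color histogram as a flat list."""
    step = 256 // bins_per_channel
    hist = [0.0] * (bins_per_channel ** 3)
    for r, g, b in pixels:
        index = ((r // step) * bins_per_channel + (g // step)) * bins_per_channel + (b // step)
        hist[index] += 1
    total = len(pixels)
    return [count / total for count in hist]

def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; 1 means identical normalized histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))

def most_similar(query_pixels, database):
    """Return database image names ranked by similarity to the query."""
    query_hist = color_histogram(query_pixels)
    scores = {name: histogram_intersection(query_hist, color_histogram(pixels))
              for name, pixels in database.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical image database and query image.
database = {
    "sunset": [(250, 120, 30)] * 3 + [(40, 20, 10)],
    "forest": [(20, 200, 40)] * 4,
    "ocean": [(10, 60, 220)] * 3 + [(240, 240, 240)],
}
query = [(255, 100, 20)] * 2 + [(30, 30, 30)] * 2
ranking = most_similar(query, database)
```

Here the query is mostly orange and dark pixels, so the "sunset" image comes first in the ranking. Histogram intersection is only one possible similarity measure; it was picked for the sketch because it is simple to compute by hand.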


2. Multidimensional Analysis

In order to perform multidimensional analysis of large multimedia databases, multimedia data cubes can be designed and constructed in a manner similar to that for traditional data cubes built from relational data. A multimedia data cube can have additional dimensions and measures for multimedia data, such as color, texture, and shape. A multimedia data cube can have many dimensions.

Examples are the size of the image or video in bytes; the width and height of the frames, constituting two dimensions; the date on which the image or video was created or last modified; the format type of the image or video; the frame sequence duration in seconds; the Internet domain of pages referencing the image or video; the keywords; a color dimension; and an edge orientation dimension.

A prototype multimedia data mining system called MultiMediaMiner extends the DBMiner system to handle multimedia data. The Image Excavator component of MultiMediaMiner uses image contextual information, like HTML tags on Web pages, to derive keywords. By navigating online directory structures, like the Yahoo! directory, it is possible to build hierarchies of keywords mapped onto the directories in which the image was found.
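The multimedia data cube described in this section can be sketched in a few lines. The example below is an assumption-laden toy, not how MultiMediaMiner is implemented: the records, the `build_cuboid` helper, and the two measures (count and total size) are hypothetical and only show how cells are aggregated and rolled up.

```python
from collections import defaultdict

# Hypothetical image records: (format, dominant_color, size_in_bytes).
images = [
    ("jpeg", "red", 1200),
    ("jpeg", "blue", 800),
    ("png", "red", 500),
    ("jpeg", "red", 1000),
]

def build_cuboid(records, dims):
    """Aggregate count and total size over the chosen dimensions.

    dims is a tuple of column positions to group by, for example (0, 1)
    for (format, color) or () for the apex cuboid.
    """
    cells = defaultdict(lambda: {"count": 0, "total_size": 0})
    for record in records:
        key = tuple(record[d] for d in dims)
        cells[key]["count"] += 1
        cells[key]["total_size"] += record[2]
    return dict(cells)

base = build_cuboid(images, (0, 1))     # group by format and color
by_format = build_cuboid(images, (0,))  # roll up the color dimension
apex = build_cuboid(images, ())         # roll up every dimension
```

Rolling up from `base` to `by_format` merges the red and blue JPEG cells into a single JPEG cell, which is the same operation a relational data cube performs on ordinary dimensions.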


3. Classification and Prediction Analysis

Classification and predictive analysis have been used for mining multimedia data, particularly in scientific fields such as astronomy, seismology, and geoscientific analysis. Decision tree classification is an important data mining method in reported image data mining applications. For example, taking sky images that have been carefully classified by astronomers as the training set, we can build models for the recognition of galaxies, stars, and other stellar objects, based on properties like magnitudes, areas, intensity, image moments, and orientation. Image data often come in large volumes and need substantial processing power, for example, parallel and distributed processing. Image data mining classification and clustering are closely connected to image analysis and scientific data mining, and hence many image analysis techniques and scientific data analysis methods can be applied to image data mining.
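To make the decision tree idea concrete, the sketch below fits a one-level decision tree (a decision stump) rather than a full tree. The training values, the two features (magnitude and area), and all function names are made up for illustration and do not come from any real sky survey.

```python
# Hypothetical training set: (magnitude, area) features with astronomer labels.
training = [
    ((18.2, 4.0), "star"),
    ((17.5, 5.0), "star"),
    ((19.0, 6.0), "star"),
    ((18.8, 30.0), "galaxy"),
    ((20.1, 45.0), "galaxy"),
    ((19.5, 38.0), "galaxy"),
]

def gini(labels):
    """Gini impurity of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / total) ** 2 for c in set(labels))

def majority(labels):
    """Most frequent label in the list."""
    return max(set(labels), key=labels.count)

def fit_stump(samples):
    """Pick the (feature, threshold) split with the lowest weighted Gini."""
    best = None
    n_features = len(samples[0][0])
    for feature in range(n_features):
        values = sorted({x[feature] for x, _ in samples})
        # Candidate thresholds are midpoints between consecutive values.
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2
            left = [y for x, y in samples if x[feature] <= threshold]
            right = [y for x, y in samples if x[feature] > threshold]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(samples)
            if best is None or score < best[0]:
                best = (score, feature, threshold, majority(left), majority(right))
    return best[1:]

def predict(stump, features):
    """Classify one object with the fitted stump."""
    feature, threshold, left_label, right_label = stump
    return left_label if features[feature] <= threshold else right_label

stump = fit_stump(training)
```

On this toy data the stump splits on area, since small objects are stars and large ones are galaxies. A full decision tree would apply the same split search recursively to each side.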


4. Mining Associations in Multimedia Data

Association rules involving multimedia objects have been mined in image and video databases.

Three categories can be observed:

1. Associations between image content and non-image content features

2. Associations among image contents that are not related to spatial relationships

3. Associations among image contents related to spatial relationships

To mine associations among multimedia objects, we can treat every image as a transaction and find frequently occurring patterns among different images. Three points need attention. First, an image contains multiple objects, each with various features such as color, shape, texture, keyword, and spatial location, so there can be a huge number of possible associations. Second, because a picture containing multiple repeated objects is an essential feature in image analysis, the recurrence of the same object should not be ignored in association analysis. Third, the spatial relationships among objects in multimedia images can be used for discovering object associations and correlations.
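The idea of treating each image as a transaction can be sketched as follows. Everything here is hypothetical: the detected objects, the `tree>=2` style of encoding recurrence, and the helper names are assumptions for the example, and only pairs of items are counted rather than running a full frequent-itemset algorithm.

```python
from collections import Counter
from itertools import combinations

# Hypothetical images, each described by the objects detected in it.
# Repeated objects are kept as counts so that recurrence is not lost.
images = [
    Counter({"tree": 2, "house": 1}),
    Counter({"tree": 1, "house": 1, "car": 1}),
    Counter({"tree": 3, "lake": 1}),
    Counter({"house": 1, "car": 1}),
]

def to_items(image):
    """Turn object counts into items such as 'tree' and 'tree>=2'."""
    items = set()
    for obj, count in image.items():
        items.add(obj)
        if count >= 2:
            items.add(f"{obj}>=2")
    return items

def base_object(item):
    """Strip the recurrence suffix, so 'tree>=2' becomes 'tree'."""
    return item.split(">=")[0]

def frequent_pairs(transactions, min_support):
    """Count item pairs and keep those found in at least min_support images."""
    counts = Counter()
    for items in transactions:
        counts.update(
            pair for pair in combinations(sorted(items), 2)
            # Skip trivial pairs such as ('tree', 'tree>=2').
            if base_object(pair[0]) != base_object(pair[1])
        )
    return {pair: n for pair, n in counts.items() if n >= min_support}

transactions = [to_items(image) for image in images]
pairs = frequent_pairs(transactions, min_support=2)
```

With a minimum support of two images, the pairs that survive are house with tree and car with house. Spatial relationships could be encoded in the same way, as extra items such as a hypothetical `tree_left_of_house`.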


2nd part

The differences between web usage mining and web content mining are as follows:

Web Usage Mining:

Web usage mining primarily deals with understanding the behavior of users interacting with the Web or with a website. One of the aims is to obtain information that may assist website re-organization or assist site adaptation to better suit the user. The mined data often include logs of users' interactions with the Web. The logs include the Web server logs, proxy server logs, and browser logs. The logs include information about the referring pages, user identification, the time the user spends at a site, and the sequence of pages visited. Information is also collected via cookie files. While Web structure mining shows that page A has a link to page B, Web usage mining shows who or how many people took that link, which site they came from, and where they went when they left page B.
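The page A to page B example above can be sketched with a few already-parsed log records. The record layout (user, referring page, requested page), the page paths, and the helper names are assumptions for illustration; real server logs would first need parsing and session identification.

```python
from collections import Counter

# Hypothetical, already-parsed log records:
# (user_id, referring_page, requested_page).
log = [
    ("u1", "/home", "/A"),
    ("u1", "/A", "/B"),
    ("u1", "/B", "/checkout"),
    ("u2", "/search", "/A"),
    ("u2", "/A", "/B"),
    ("u2", "/B", "/home"),
    ("u3", "/home", "/B"),
]

def link_usage(records, source, target):
    """Return the set of users who followed the link source -> target."""
    return {user for user, ref, page in records if ref == source and page == target}

def next_pages(records, page):
    """Count where users went after leaving the given page."""
    return Counter(requested for _, ref, requested in records if ref == page)

users_a_to_b = link_usage(log, "/A", "/B")  # who took the link from A to B
arrived_at_b_from = Counter(ref for _, ref, page in log if page == "/B")
left_b_for = next_pages(log, "/B")          # where they went after B
```

In this toy log, two users followed the link from `/A` to `/B`, a third reached `/B` from `/home`, and visitors left `/B` for `/checkout` and `/home`.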


Web Content Mining: 

Web content mining extracts or mines useful information or knowledge from Web page content. Web content mining focuses on the content of the Web pages rather than the links. For example, we can automatically classify and cluster Web pages according to their topic. Web content is a very rich information resource consisting of many types of information, for example, unstructured free text, image, audio, video, animation, and metadata as well as hyperlinks. A variety of techniques are therefore needed to retrieve the content of interest. The content of web pages includes no machine-readable semantic information. Search engines, subject directories, intelligent agents, cluster analysis, and portals are employed to find what a user might be looking for. It has been suggested that users should be able to pose more sophisticated queries than just specifying keywords.
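Classifying Web pages by topic, as mentioned above, can be sketched with a bag-of-words model and cosine similarity. The topic profiles, the sample page texts, and the function names are hypothetical; this is one simple technique among many, chosen here because it needs only the standard library.

```python
import math
import re
from collections import Counter

def bag_of_words(text):
    """Lowercase the text and count its alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

# Hypothetical topic profiles built from a few representative words.
topics = {
    "sports": bag_of_words("football match team score league player"),
    "finance": bag_of_words("stock market bank price investor shares"),
}

def classify(page_text):
    """Assign the page to the topic whose profile it most resembles."""
    page = bag_of_words(page_text)
    return max(topics, key=lambda name: cosine(page, topics[name]))

label_1 = classify("The team won the match after a late score by the new player.")
label_2 = classify("Investor demand pushed the stock price and bank shares higher.")
```

The first page shares words only with the sports profile and the second only with the finance profile, so they are labeled accordingly. The sketch works on text content alone and ignores hyperlinks, which matches the focus of web content mining.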


