Short note on Web Structure Mining.

 Web Structure Mining

  • Web structure mining studies the model underlying the link structures of the Web. It has been used for search engine result ranking and other Web applications.
  • Web structure mining is the process of using graph and network mining theory and methods to analyze the nodes and connection structures of the Web. It extracts patterns from hyperlinks, where a hyperlink is a structural component that connects a web page to another location. It can also mine the document structure within a page (e.g., analyze the tree-like structure of a page to describe HTML or XML tag usage). Both kinds of web structure mining help us understand web content and may also help transform web content into relatively structured data sets. A small sketch of building such a link graph is given below.
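
The following is a minimal sketch (not part of the original note) of the first step in Web structure mining: extracting hyperlinks from pages and representing the result as a directed graph. The page names and HTML content here are hypothetical; it uses only Python's built-in html.parser.

from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collects the href targets of <a> tags found in one HTML document."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

# Hypothetical pages and their HTML content.
pages = {
    "pageA.html": '<html><body><a href="pageB.html">B</a> <a href="pageC.html">C</a></body></html>',
    "pageB.html": '<html><body><a href="pageC.html">C</a></body></html>',
    "pageC.html": '<html><body><a href="pageA.html">A</a></body></html>',
}

# Build the Web graph: each page is a node, each hyperlink a directed edge.
web_graph = {}
for url, html in pages.items():
    parser = LinkExtractor()
    parser.feed(html)
    web_graph[url] = parser.links

print(web_graph)
# {'pageA.html': ['pageB.html', 'pageC.html'], 'pageB.html': ['pageC.html'], 'pageC.html': ['pageA.html']}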
                                       OR,
Web Structure Mining
The aim of Web structure mining is to discover the link structure or the model that is assumed to underlie the Web. The model may be based on the topology of the hyperlinks. This can help in discovering similarities between sites, in discovering authority sites for a particular topic or discipline, or in discovering overview or survey sites that point to many authority sites (such sites are called hubs). Link structure is only one kind of information that may be used in analyzing the structure of the Web.

As noted earlier, the links on Web pages provide a useful source of information that may be harnessed in Web searches. Kleinberg developed a connectivity-analysis algorithm called Hyperlink-Induced Topic Search (HITS) based on the assumption that links represent human judgment. His algorithm also assumes that for any query topic there is a set of "authoritative" or "authority" sites that are relevant and popular pages focusing on the topic, and a set of "hub" sites that contain useful links to relevant sites on the topic, including links to many related authorities.
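
Below is a minimal sketch of the HITS hub/authority iteration on a small hypothetical link graph. The graph and iteration count are illustrative only; a real HITS implementation runs on the subgraph of pages retrieved for a particular query topic.

def hits(graph, iterations=20):
    """graph maps each page to the list of pages it links to."""
    pages = list(graph)
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}

    for _ in range(iterations):
        # Authority score: sum of hub scores of the pages that link to the page.
        auth = {p: sum(hub[q] for q in pages if p in graph[q]) for p in pages}
        # Hub score: sum of authority scores of the pages the page links to.
        hub = {p: sum(auth[q] for q in graph[p]) for p in pages}
        # Normalize so the scores do not grow without bound.
        a_norm = sum(v * v for v in auth.values()) ** 0.5 or 1.0
        h_norm = sum(v * v for v in hub.values()) ** 0.5 or 1.0
        auth = {p: v / a_norm for p, v in auth.items()}
        hub = {p: v / h_norm for p, v in hub.items()}
    return hub, auth

# Hypothetical link graph: A and B act as hubs pointing to C and D.
graph = {
    "A": ["C", "D"],
    "B": ["C", "D"],
    "C": [],
    "D": ["C"],
}
hub, auth = hits(graph)
print("hub scores:", hub)        # A and B get the highest hub scores
print("authority scores:", auth) # C gets the highest authority score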
