Short note on Web Structure Mining.

 Web Structure Mining

  • Web structure mining studies the model underlying the link structures of the Web. It has been used for search engine result ranking and other Web applications.
  • Web structure mining is the process of using graph and network mining theory and methods to analyze the nodes and connection structures of the Web. It extracts patterns from hyperlinks, where a hyperlink is a structural component that connects a web page to another location. It can also mine the document structure within a page (e.g., analyze the tree-like structure of a page to describe HTML or XML tag usage). Both kinds of web structure mining help us understand web content and may also help transform web content into relatively structured data sets. A small sketch of building such a link graph is given below.
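
The following is a minimal sketch (not part of the original note) of the first step in Web structure mining: extracting hyperlinks from pages and representing the result as a directed graph. The page names and HTML content here are hypothetical; it uses only Python's built-in html.parser.

from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collects the href targets of <a> tags found in one HTML document."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

# Hypothetical pages and their HTML content.
pages = {
    "pageA.html": '<html><body><a href="pageB.html">B</a> <a href="pageC.html">C</a></body></html>',
    "pageB.html": '<html><body><a href="pageC.html">C</a></body></html>',
    "pageC.html": '<html><body><a href="pageA.html">A</a></body></html>',
}

# Build the Web graph: each page is a node, each hyperlink a directed edge.
web_graph = {}
for url, html in pages.items():
    parser = LinkExtractor()
    parser.feed(html)
    web_graph[url] = parser.links

print(web_graph)
# {'pageA.html': ['pageB.html', 'pageC.html'], 'pageB.html': ['pageC.html'], 'pageC.html': ['pageA.html']}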
                                       OR,
Web Structure Mining
The aim of Web structure mining is to discover the link structure or the model that is assumed to underlie the Web. The model may be based on the topology of the hyperlinks. This can help in discovering similarities between sites, in discovering authority sites for a particular topic or discipline, or in discovering overview or survey sites that point to many authority sites (such sites are called hubs). Link structure is only one kind of information that may be used in analyzing the structure of the Web.

As noted earlier, the links on Web pages provide a useful source of information that may be harnessed in Web searches. Kleinberg developed a connectivity-analysis algorithm called Hyperlink-Induced Topic Search (HITS) based on the assumption that links represent human judgment. His algorithm also assumes that for any query topic there is a set of "authoritative" or "authority" sites that are relevant and popular pages focusing on the topic, and a set of "hub" sites that contain useful links to relevant sites on the topic, including links to many related authorities.
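
Below is a minimal sketch of the HITS hub/authority iteration on a small hypothetical link graph. The graph and iteration count are illustrative only; a real HITS implementation runs on the subgraph of pages retrieved for a particular query topic.

def hits(graph, iterations=20):
    """graph maps each page to the list of pages it links to."""
    pages = list(graph)
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}

    for _ in range(iterations):
        # Authority score: sum of hub scores of the pages that link to the page.
        auth = {p: sum(hub[q] for q in pages if p in graph[q]) for p in pages}
        # Hub score: sum of authority scores of the pages the page links to.
        hub = {p: sum(auth[q] for q in graph[p]) for p in pages}
        # Normalize so the scores do not grow without bound.
        a_norm = sum(v * v for v in auth.values()) ** 0.5 or 1.0
        h_norm = sum(v * v for v in hub.values()) ** 0.5 or 1.0
        auth = {p: v / a_norm for p, v in auth.items()}
        hub = {p: v / h_norm for p, v in hub.items()}
    return hub, auth

# Hypothetical link graph: A and B act as hubs pointing to C and D.
graph = {
    "A": ["C", "D"],
    "B": ["C", "D"],
    "C": [],
    "D": ["C"],
}
hub, auth = hits(graph)
print("hub scores:", hub)        # A and B get the highest hub scores
print("authority scores:", auth) # C gets the highest authority score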
