Discuss different categories of link mining tasks.
Link Mining Tasks
Link mining puts a new twist on some classic data mining tasks and also poses new problems. One way to understand the different types of learning and inference problems is to categorize them in terms of the components of the data that are being targeted. Table 8.1 gives a simple characterization. Note that for the object-related tasks, even though we are concerned with classifying, clustering, consolidating, or ranking the objects, we will be exploiting the links. Similarly, for link-related tasks, we can use information about the objects that participate in the links, their links to other objects, and so on.
1. Link-based object ranking: This task exploits the link structure of a graph to order or prioritize the objects within the graph. It focuses on graphs with a single object type and a single link type. PageRank and HITS are typical link-based object ranking approaches for Web information analysis. Object ranking is a core task in social network analysis and link mining.
2. Link-based object classification: In traditional classification methods, objects are classified based on the attributes that describe them. Link-based classification predicts the category of an object based not only on its attributes, but also on its links and on the attributes of linked objects. Web page classification is a well-recognized example of link-based classification. An example from epidemiology is the task of predicting the disease type of a patient based on characteristics (e.g., symptoms) of the patient and on characteristics of other people with whom the patient has been in contact.
3. Object type prediction: This predicts the type of an object, based on its attributes, its links, and the attributes of objects linked to it. In the communication domain, for example, the task is to predict whether a communication contact is by e-mail, phone call, or mail.
4. Link type prediction: This predicts the type or purpose of a link, based on the properties of the objects involved. Given Web page data, for example, we can try to predict whether a link on a page is an advertising link or a navigational link.
5. Predicting link existence: In link type prediction, we know that a connection exists between two objects and want to predict its type. Here, in contrast, we want to predict whether a link exists between two objects at all. Examples include predicting whether there will be a link between two Web pages and whether a paper will cite another paper.
6. Link cardinality estimation: There are two forms of link cardinality estimation. First, we may predict the number of links to an object. This is useful, for instance, in predicting the authoritativeness of a Web page based on the number of links to it (in-links). Second, we may predict the number of objects reached along a path from an object. This is important in estimating the number of objects that will be returned by a query. In the Web page domain, we may predict the number of pages that would be retrieved by crawling a site. Regarding citations, we can also use link cardinality estimation to predict the number of citations of a specific author in a given journal.
7. Object reconciliation: In object reconciliation, the task is to predict whether two objects are, in fact, the same, based on their attributes and links. This task is common in information extraction, duplicate elimination, object consolidation, and citation matching, and is also known as record linkage or identity uncertainty. Examples include predicting whether two websites are mirrors of each other, whether two citations actually refer to the same paper, and whether two apparent disease strains are really the same.
8. Group detection: Group detection is a clustering task. It predicts whether a set of objects belongs to the same group or cluster, based on their attributes as well as their link structure. An area of application is the identification of Web communities, where a Web community is a collection of Web pages that focus on a particular theme or topic. A similar example in the bibliographic domain is the identification of research communities.
9. Subgraph detection: Subgraph identification finds characteristic subgraphs within networks. This is a form of graph search. An example from biology is the discovery of subgraphs corresponding to protein structures. In chemistry, we can search for subgraphs representing chemical substructures.
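To make link-based object ranking (task 1) concrete, here is a minimal sketch of a PageRank-style power iteration in plain Python. It is an illustration under stated assumptions, not a reference implementation: the function name `pagerank`, the damping factor of 0.85, the fixed iteration count, and the toy graph are all hypothetical choices made for this example.

```python
def pagerank(out_links, damping=0.85, iterations=50):
    """Rank nodes by power iteration on a dict {node: [nodes it links to]}.

    Assumption for this sketch: a node without out-links (a dangling node)
    spreads its rank evenly over all nodes.
    """
    nodes = set(out_links)
    for targets in out_links.values():
        nodes.update(targets)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        # Every node first receives the same "teleportation" share.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = out_links.get(node, [])
            if targets:
                # Split this node's damped rank among the pages it links to.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling node: distribute its damped rank over all nodes.
                share = damping * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy Web graph: page -> pages it links to.
toy_graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
scores = pagerank(toy_graph)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

In this toy graph, page C collects in-links from A, B, and D, so it ends up ranked first, while D, which no page links to, ends up last. Only the link structure is used; no page attributes enter the computation.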
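Predicting link existence (task 5) can be illustrated in a similar spirit. The sketch below scores every currently unlinked pair of nodes by the Jaccard coefficient of their neighbor sets, a simple neighborhood-overlap heuristic chosen here for illustration; the text above does not prescribe a particular method. The function name `jaccard_link_scores` and the toy edge list are hypothetical.

```python
from itertools import combinations


def jaccard_link_scores(edges):
    """Score each unlinked node pair by the Jaccard overlap of neighbor sets.

    `edges` is an iterable of undirected (u, v) pairs. A higher score is
    read, heuristically, as a more plausible missing or future link.
    """
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    scores = {}
    for u, v in combinations(sorted(neighbors), 2):
        if v in neighbors[u]:
            continue  # the link already exists, nothing to predict
        common = neighbors[u] & neighbors[v]
        union = neighbors[u] | neighbors[v]
        scores[(u, v)] = len(common) / len(union)
    return scores


# Hypothetical toy network of Web pages with undirected links.
toy_edges = [
    ("p1", "p2"), ("p1", "p3"), ("p2", "p3"),
    ("p2", "p4"), ("p3", "p4"), ("p4", "p5"),
]
link_scores = jaccard_link_scores(toy_edges)
best_pair = max(link_scores, key=link_scores.get)
print(best_pair, link_scores[best_pair])
```

Here the pair (p1, p4) scores highest because the two pages share two of their three combined neighbors. In practice, such a score would typically be one feature among several, alongside attributes of the objects themselves.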