Short note on Web mining.

Web mining can broadly be seen as the application of adapted data mining techniques to the web, whereas data mining is defined as the application of algorithms to discover patterns in mostly structured data, embedded in a knowledge discovery process. A distinctive property of web mining is that it deals with a variety of data types. The web has multiple aspects that call for different approaches to mining: web pages consist of text, web pages are linked to each other via hyperlinks, and user activity can be monitored via web server logs. These three aspects give rise to the three areas of web mining: web content mining, web structure mining, and web usage mining.

There are three types of web mining:

1. Web Content Mining:

Web content mining is used to extract useful data, information, and knowledge from the content of web pages. In web content mining, each web page is treated as an individual document. One can take advantage of the semi-structured nature of web pages, since HTML provides information not only about the layout but also about the logical structure. The primary task of content mining is data extraction, where structured data is extracted from unstructured or semi-structured web pages. The objective is to facilitate data aggregation across various websites by using the extracted structured data. Web content mining can also be used to identify topics on the web. For example, when a user searches for a specific topic on a search engine, the user gets a list of relevant suggestions.
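The extraction step described above can be sketched with Python's standard-library html.parser. This is a minimal sketch, assuming a hypothetical product listing in which each record sits in a <div class="product"> with <span class="name"> and <span class="price"> children. Real sites use their own markup, so SAMPLE_PAGE, the class names, and extract_products are illustrative only.

```python
from html.parser import HTMLParser

# Hypothetical page markup; real sites use their own tags and class names.
SAMPLE_PAGE = """
<html><head><title>Shop A</title></head><body>
  <div class="product"><span class="name">Laptop</span><span class="price">899</span></div>
  <div class="product"><span class="name">Mouse</span><span class="price">25</span></div>
</body></html>
"""


class ProductExtractor(HTMLParser):
    """Turns the semi-structured product markup into structured records."""

    def __init__(self):
        super().__init__()
        self.records = []   # one dict per <div class="product">
        self._field = None  # field that the current text node belongs to

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if tag == "div" and cls == "product":
            self.records.append({})  # start a new record
        elif tag == "span" and cls in ("name", "price"):
            self._field = cls

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None

    def handle_data(self, data):
        # Only text inside a name/price span of an open record is kept.
        if self._field and self.records:
            value = data.strip()
            self.records[-1][self._field] = (
                float(value) if self._field == "price" else value
            )


def extract_products(html):
    """Return a list of {"name": ..., "price": ...} dicts found in the page."""
    parser = ProductExtractor()
    parser.feed(html)
    parser.close()
    return parser.records


if __name__ == "__main__":
    for row in extract_products(SAMPLE_PAGE):
        print(row)
```

Running the same kind of extractor over similarly structured pages from different sites yields rows in one common format, which is what makes aggregation across websites possible.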

2. Web Structure Mining:

Web structure mining is used to discover the link structure formed by hyperlinks. It identifies how web pages are connected, either through hyperlinks between pages or through a direct link network. In web structure mining, the web is viewed as a directed graph, with web pages as the vertices and hyperlinks as the edges that connect them. The best-known application in this regard is the Google search engine, which became known for ranking its search results primarily with the PageRank algorithm. PageRank considers a page highly relevant when it is frequently linked to by other highly relevant pages. Structure mining and content mining methods are often combined. For example, web structure mining can help organizations examine the link network between two commercial websites.
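The idea behind PageRank can be illustrated with a small power-iteration sketch over a link graph stored in a dictionary. It follows the textbook formulation with a damping factor (0.85 is the commonly cited value) and treats pages without out-links as linking to every page. The graph, the function name pagerank, and the fixed iteration count are assumptions for illustration, not a description of how any production search engine computes its rankings.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank; links maps each page to the pages it links to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution
    for _ in range(iterations):
        # Every page first receives the "random jump" share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank evenly to the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page (no out-links): spread its rank over all pages.
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
        rank = new_rank
    return rank


if __name__ == "__main__":
    # Hypothetical link graph: C is linked to by A, B and D.
    graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
    for page, score in sorted(pagerank(graph).items(), key=lambda kv: -kv[1]):
        print(page, round(score, 3))
```

In this toy graph, page C ends up with the highest score because three pages link to it, and A ranks second because its only in-link comes from the highly ranked C.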

3. Web Usage Mining:

Web usage mining is used to extract useful data, information, and knowledge from web log records, and it helps to recognize user access patterns for web pages. When mining the usage of web resources, one considers the records of requests made by visitors to a website, which are often collected as web server logs. While the content and structure of a collection of web pages reflect the intentions of the pages' authors, the individual requests show how consumers actually use these pages. Web usage mining may therefore reveal relationships that were not intended by the creators of the pages.
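A first step in web usage mining is parsing the server log records. The sketch below assumes logs in the Apache/NGINX Common Log Format and uses a hypothetical four-line SAMPLE_LOG. It counts successful page requests and, as one simple access pattern, how often visitors moved from one page to the next. Identifying a visitor by IP address alone is a simplifying assumption; the function names are illustrative.

```python
import re
from collections import Counter

# Common Log Format: host ident user [time] "METHOD path PROTOCOL" status bytes
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+'
)

# Hypothetical log excerpt for illustration.
SAMPLE_LOG = """\
10.0.0.1 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326
10.0.0.1 - - [10/Oct/2023:13:56:02 +0000] "GET /products.html HTTP/1.1" 200 1520
10.0.0.2 - - [10/Oct/2023:14:01:11 +0000] "GET /index.html HTTP/1.1" 200 2326
10.0.0.2 - - [10/Oct/2023:14:01:40 +0000] "GET /missing.html HTTP/1.1" 404 512
"""


def successful_requests(log_text):
    """Yield (visitor, path) for each well-formed 2xx request, in log order."""
    for line in log_text.splitlines():
        m = LOG_PATTERN.match(line)
        if m and m.group("status").startswith("2"):
            yield m.group("host"), m.group("path")


def page_popularity(log_text):
    """Count how many successful requests each page received."""
    return Counter(path for _, path in successful_requests(log_text))


def navigation_pairs(log_text):
    """Count consecutive (previous page, next page) moves per visitor."""
    last_page = {}
    pairs = Counter()
    for host, path in successful_requests(log_text):
        if host in last_page:
            pairs[(last_page[host], path)] += 1
        last_page[host] = path
    return pairs


if __name__ == "__main__":
    print(page_popularity(SAMPLE_LOG).most_common())
    print(navigation_pairs(SAMPLE_LOG).most_common())
```

The failed request for /missing.html is skipped, so only pages that visitors actually received contribute to the counts.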

Challenges in Web Mining:

The web poses great challenges for resource and knowledge discovery, based on the following observations:

The complexity of web pages:

Web pages do not have a unifying structure. They are far more complex than traditional text documents. The web acts as a huge digital library containing an enormous number of documents, and this library is not arranged in any particular sorted order.

The web is a dynamic data source:

Data on the web is updated rapidly, for example news, weather, shopping, financial news, sports, and so on.

Diversity of client networks:

The community of web users is expanding rapidly. These users have different interests, backgrounds, and usage purposes. There are over a hundred million workstations connected to the internet, and the number is still increasing tremendously.

Relevancy of data:

A particular person is generally interested in only a small portion of the web, while the rest of the web contains data that is irrelevant to the user and may lead to unwanted results.

The web is too broad:

The size of the web is tremendous and rapidly increasing. The web appears to be too huge for traditional data warehousing and data mining.

Mining the Web's Link Structures to recognize Authoritative Web Pages:

The web consists of pages as well as hyperlinks pointing from one page to another. When the author of a web page creates a hyperlink pointing to another web page, this can be considered as the author's endorsement of the other page. The collective endorsement of a given page by different authors on the web may indicate the importance of the page and may naturally lead to the discovery of authoritative web pages. Web linkage data therefore provide rich information about the relevance, quality, and structure of the web's content, and are thus a rich source for web mining.
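The text does not name a specific algorithm, but one well-known way to turn this endorsement idea into scores is Kleinberg's HITS (Hyperlink-Induced Topic Search). HITS gives each page an authority score (it is pointed to by good hubs) and a hub score (it points to good authorities), refining both alternately. The following is a minimal sketch of HITS on a small hypothetical link graph; the page names, the function name hits, and the iteration count are assumptions.

```python
import math


def hits(links, iterations=50):
    """Return (hub, authority) scores; links maps page -> pages it links to."""
    pages = sorted(set(links) | {t for ts in links.values() for t in ts})
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority: sum of the hub scores of pages that link to it.
        auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for t in targets:
                auth[t] += hub[src]
        norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        auth = {p: v / norm for p, v in auth.items()}
        # Hub: sum of the authority scores of the pages it links to.
        hub = {p: sum(auth[t] for t in links.get(p, [])) for p in pages}
        norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        hub = {p: v / norm for p, v in hub.items()}
    return hub, auth


# Hypothetical graph: many authors "endorse" the page named expert.
WEB = {
    "blog1": ["expert", "news"],
    "blog2": ["expert"],
    "blog3": ["expert", "news"],
    "portal": ["expert", "news", "shop"],
}

if __name__ == "__main__":
    hub, auth = hits(WEB)
    print("top authority:", max(auth, key=auth.get))
    print("top hub:", max(hub, key=hub.get))
```

Here expert becomes the top authority because every linking page endorses it, while portal becomes the top hub because it links to the most well-endorsed pages.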

Application of Web Mining:

Web mining has extensive applications because the web is used in so many different ways. Some applications of web mining are listed below:

Marketing and conversion tool
Data analysis of website and application performance
Audience behavior analysis
Advertising and campaign performance analysis
Testing and analysis of a site

OR,

Web mining

Web mining refers to the process of using data mining techniques to extract useful patterns, trends, and information from web-based documents and services, server logs, and hyperlinks. The main objective of web mining is to discover patterns in web data by collecting and analyzing it in order to gain important insights.

Web mining is further divided into three different types:

Web content mining
Web structure mining
Web usage mining

Web content mining

Web content mining refers to the process of extracting data from web pages in order to search for patterns and trends that give useful insights. There are various techniques for extracting useful data, such as web scraping.

Let's understand this concept with the help of an example.

In order to conduct an event or a conference, you first need to gather useful information about possible locations, that is, which location is best suited for the conference so that it attracts a large crowd. To perform this analysis, you need to gather information about each location, such as the state, the city, and how far the event location is from the invitees. Web content mining comes into the picture when such location-specific data is extracted from the web.

Web structure mining

Web structure mining refers to the process in which data from hyperlinks that lead to multiple pages is gathered and prepared in order to search for new patterns and trends. For example, a web page showing an individual's profile is likely to include links to that person's social media profiles. So the data is extracted not only from a single source but also from the nested pages reached through the hyperlinks on each page.
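Following hyperlinks from an individual's profile into nested pages amounts to a graph traversal. A minimal breadth-first sketch follows; the SITE dictionary stands in for downloading and parsing real pages, and get_links, crawl, max_depth, and the page names are hypothetical. Limiting the depth keeps the traversal from wandering across the whole web.

```python
from collections import deque

# Hypothetical site: each page maps to the hyperlinks found on it.
# A real crawler would download and parse each page instead.
SITE = {
    "profile/alice": ["social/alice-twitter", "social/alice-github"],
    "social/alice-github": ["repo/alice-project"],
    "social/alice-twitter": [],
    "repo/alice-project": ["profile/alice"],
}


def crawl(start, get_links, max_depth=2):
    """Breadth-first traversal; returns each reached page with its hop count."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        if depth[page] == max_depth:
            continue  # do not follow links beyond the depth limit
        for link in get_links(page):
            if link not in depth:  # skip pages already visited (avoids cycles)
                depth[link] = depth[page] + 1
                queue.append(link)
    return depth


if __name__ == "__main__":
    reached = crawl("profile/alice", lambda p: SITE.get(p, []))
    for page, hops in reached.items():
        print(hops, page)
```

Each reached page could then be handed to a content extractor, which is one way structure mining and content mining end up combined.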

Web usage mining

When a web application is hosted, web server logs are generated that record the web activity of the application's users. Web usage mining analyzes these logs to discover user access patterns.
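Before access patterns can be mined, raw log entries are usually grouped into sessions. The sketch below uses a 30-minute inactivity timeout, a commonly used heuristic rather than a fixed rule, and assumes each request has already been parsed into a (visitor, timestamp, path) tuple. The function name sessionize and the sample REQUESTS are hypothetical.

```python
from datetime import datetime, timedelta

# Common heuristic, assumed here: a gap of more than 30 minutes starts a new session.
SESSION_TIMEOUT = timedelta(minutes=30)

# Hypothetical, already-parsed requests: (visitor, timestamp, path).
REQUESTS = [
    ("10.0.0.1", datetime(2023, 10, 10, 13, 55), "/index.html"),
    ("10.0.0.1", datetime(2023, 10, 10, 13, 56), "/products.html"),
    ("10.0.0.1", datetime(2023, 10, 10, 15, 0), "/index.html"),
    ("10.0.0.2", datetime(2023, 10, 10, 14, 1), "/index.html"),
]


def sessionize(requests, timeout=SESSION_TIMEOUT):
    """Group requests into (visitor, [paths]) sessions, ordered by start time."""
    sessions = []
    last_seen = {}     # visitor -> timestamp of their previous request
    open_session = {}  # visitor -> path list of their current session
    for host, ts, path in sorted(requests, key=lambda r: r[1]):
        if host not in last_seen or ts - last_seen[host] > timeout:
            open_session[host] = []
            sessions.append((host, open_session[host]))
        open_session[host].append(path)
        last_seen[host] = ts
    return sessions


if __name__ == "__main__":
    for visitor, paths in sessionize(REQUESTS):
        print(visitor, " -> ".join(paths))
```

Visitor 10.0.0.1 gets two sessions because more than an hour passes between the visit to /products.html and the return to /index.html.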

Notes for BSc CSIT
