Difference between data mining and OLAP.

 Data mining 

 Data mining is the process of extracting knowledge from large amounts of data. It takes data gathered from sources such as data warehouses and applies mining algorithms to it, searching for trends and patterns that business organizations can use to improve customer service and, in turn, increase their profit.


Properties of data mining

These are the key properties of data mining:

  • Finding patterns automatically
  • Focus on huge datasets and databases
  • Predict the outcomes
  • Creation of actionable information
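
To make "finding patterns automatically" concrete, here is a minimal sketch in Python that counts which pairs of items are frequently bought together. The basket data, the function name frequent_pairs, and the min_support threshold are all assumptions made up for illustration. Real data mining tools use far more scalable algorithms, but the idea of letting the data surface the pattern is the same.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets; the items are invented for illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "milk"},
    {"bread", "butter", "milk"},
]

def frequent_pairs(baskets, min_support):
    """Return item pairs whose support (share of baskets containing both) is >= min_support."""
    counts = Counter()
    for basket in baskets:
        # Sort the items so each pair is counted under one canonical ordering.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    total = len(baskets)
    return {pair: n / total for pair, n in counts.items() if n / total >= min_support}

pairs = frequent_pairs(transactions, min_support=0.5)
for pair, support in sorted(pairs.items()):
    print(pair, round(support, 2))
```

Nobody told the program which items to look at. The pairs that clear the threshold come out of the data, which is what "discovery-driven" means in practice.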


OLAP

  • OLAP stands for Online Analytical Processing. It is a computing method that lets users query data and extract useful information by analyzing it from different angles. For example, OLAP business intelligence queries usually aid in financial reporting, budgeting, sales forecasting, trend analysis, and other purposes. It enables the user to analyze information from different database systems simultaneously. OLAP data is typically stored in multidimensional databases.
  • OLAP and data mining look similar since both operate on data to gain knowledge, but the major difference is how they operate on it. OLAP tools provide multidimensional analysis and a summary of the data in response to the user's queries, while data mining searches the data for patterns automatically.
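
As a rough illustration of multidimensional analysis, the sketch below treats a small list of records as a cube with three dimensions (year, region, product) and one measure (sales). The data and the helper names roll_up and slice_cube are assumptions for this example only. A real OLAP server would precompute and index these aggregates rather than scan rows.

```python
from collections import defaultdict

# Hypothetical fact table: three dimensions and one sales measure.
facts = [
    {"year": 2023, "region": "East", "product": "Laptop", "sales": 120},
    {"year": 2023, "region": "West", "product": "Laptop", "sales": 80},
    {"year": 2023, "region": "East", "product": "Phone", "sales": 200},
    {"year": 2024, "region": "East", "product": "Laptop", "sales": 150},
    {"year": 2024, "region": "West", "product": "Phone", "sales": 90},
]

def roll_up(rows, dims, measure="sales"):
    """Sum the measure grouped by the given dimensions, summing out the rest."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[d] for d in dims)
        totals[key] += row[measure]
    return dict(totals)

def slice_cube(rows, dim, value):
    """Keep only the rows where one dimension equals a fixed value."""
    return [row for row in rows if row[dim] == value]

# Roll-up: total sales per year, across all regions and products.
by_year = roll_up(facts, ["year"])

# Slice on region = "East", then summarize by product.
east_by_product = roll_up(slice_cube(facts, "region", "East"), ["product"])

print(by_year)
print(east_by_product)
```

Here the user decides which question to ask (which dimensions to keep, which value to slice on), and the tool returns a summary. That is the query-driven style that separates OLAP from data mining.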


Key features of OLAP

  • It supports complex calculations
  • Time intelligence
  • It has a multidimensional view of data
  • Business-focused calculations
  • Flexible and self-service reporting

Applications of OLAP

  • Database Marketing
  • Marketing and sales analysis

The differences between data mining and OLAP are as follows:

Data mining

  • Data mining is the field of computer science that deals with extracting trends and patterns from huge sets of data.
  • It deals with detailed transaction-level data.
  • It is discovery-driven.
  • It is used to predict future data.
  • It can involve a large number of dimensions.
  • Bottom-up approach.
  • It is an emerging field.

OLAP

  • OLAP is a technology that gives immediate access to data with the help of multidimensional structures.
  • It deals with summaries of the data.
  • It is query-driven.
  • It is used for analyzing past data.
  • It has a limited number of dimensions.
  • Top-down approach.
  • It is widely used.




