Explain Data-intensive computing in detail.


Data-intensive computing is concerned with the generation, manipulation, and analysis of massive amounts of data, ranging from hundreds of terabytes (TB) to petabytes (PB) and beyond. The term dataset refers to a collection of information pieces that are useful to one or more applications. Datasets are frequently stored in repositories, which are infrastructures that allow for the storage, retrieval, and indexing of enormous volumes of data. Relevant pieces of information, known as metadata, are tagged to datasets to aid with categorization and search.
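The repository idea above can be sketched in a few lines. This is a minimal, hypothetical illustration (the class and method names are invented for the example): datasets are stored under a name, and each metadata tag is indexed so that tagged datasets can be found by search.

```python
# Minimal sketch of a dataset repository that indexes datasets by
# metadata tags so they can be categorized and searched.
class Repository:
    def __init__(self):
        self._datasets = {}   # dataset name -> data
        self._index = {}      # metadata tag -> set of dataset names

    def store(self, name, data, metadata):
        """Store a dataset and index it under each of its metadata tags."""
        self._datasets[name] = data
        for tag in metadata:
            self._index.setdefault(tag, set()).add(name)

    def search(self, tag):
        """Return the names of all datasets tagged with the given metadata."""
        return sorted(self._index.get(tag, set()))

repo = Repository()
repo.store("sky-survey-2024", ["img001", "img002"], metadata=["astronomy", "images"])
repo.store("genome-db", ["seq001"], metadata=["bioinformatics"])
print(repo.search("astronomy"))  # ['sky-survey-2024']
```

Real repositories layer the same idea (storage plus a searchable metadata index) over distributed storage rather than in-memory dictionaries.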

Data-intensive computations are common in a wide range of application disciplines. One of the most well-known is computational science. People who undertake scientific simulations and experiments frequently need to generate, evaluate, and analyze massive amounts of data. Telescopes scanning the sky generate hundreds of terabytes of data per second; the collection of images of the sky easily reaches the scale of petabytes over a year. Bioinformatics applications mine databases that can include terabytes of information. Earthquake simulators handle vast amounts of data generated by tracking the tremors of the Earth across the whole world. Aside from scientific computing, other IT business areas demand data-intensive computation support. Any telecom company's customer data will likely exceed 10-100 terabytes. This volume of data is mined not only to create bills, but also to uncover situations, trends, and patterns that help these organizations deliver better service.


Big data is a term that refers to the vast volume of data - both structured and unstructured - that businesses face daily. However, it is not the quantity of data that is crucial; what organizations do with the data is important. Big data can be mined for insights that lead to better decisions and strategic business moves.

Volume: Data is collected by organizations from several sources, including commercial transactions, smart (IoT) devices, industrial equipment, videos, social media, and more. Previously, storing it would have been a challenge, but cheaper storage on platforms such as data lakes and Hadoop has alleviated the strain.
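The practical consequence of volume is that a dataset often cannot fit in memory. A minimal sketch of the standard workaround, processing a file in fixed-size chunks (the same idea, scaled out across machines, underlies Hadoop-style processing):

```python
# Minimal sketch: count the lines of a file too large to load at once
# by reading and processing it in fixed-size chunks.
def count_lines(path, chunk_size=1 << 20):  # 1 MiB per chunk
    total = 0
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            total += chunk.count(b"\n")
    return total
```

The chunk size bounds memory use regardless of the file's total size, which is the essential property when volumes reach terabytes.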

Velocity: As the Internet of Things expands, data enters organizations at an unprecedented rate and must be processed promptly. RFID tags, sensors, and smart meters are boosting the demand for near-real-time data processing.
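Near-real-time processing of this kind typically avoids storing the full stream; instead, a bounded summary is updated as each reading arrives. A minimal, hypothetical sketch using a sliding-window average over sensor readings:

```python
# Minimal sketch: maintain a near-real-time average over only the most
# recent readings, discarding older data as the stream flows in.
from collections import deque

class SlidingAverage:
    """Average over the last `size` readings from a data stream."""
    def __init__(self, size):
        self.window = deque(maxlen=size)  # old readings fall off automatically

    def update(self, reading):
        self.window.append(reading)
        return sum(self.window) / len(self.window)

meter = SlidingAverage(size=3)
for value in [10, 20, 30, 40]:
    avg = meter.update(value)
print(avg)  # average of the last three readings (20, 30, 40): 30.0
```

Production systems use stream-processing frameworks for this, but the core pattern is the same: bounded state, updated per event, rather than batch processing of stored data.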

Variety: Data comes in a variety of forms, ranging from structured, numeric data in traditional databases to unstructured text documents, emails, videos, audio, stock ticker data, and financial transactions.

Veracity: This refers to the degree of correctness and trustworthiness of data sets. Raw data acquired from a variety of sources might result in data quality concerns that are difficult to identify.

Variability: Big data may have various meanings or be formatted differently in different data sources.

Value: The business value of the collected data. 
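Variety and variability show up in code as records that describe the same entity but arrive in different shapes. A minimal sketch (the field names are hypothetical) of normalizing heterogeneous records onto one common schema before analysis:

```python
# Minimal sketch: map records arriving in different shapes from
# different sources onto a single common schema.
def normalize(record):
    if "customer_id" in record:      # structured CRM export
        return {"id": record["customer_id"], "text": record.get("note", "")}
    if "from" in record:             # raw email
        return {"id": record["from"], "text": record["body"]}
    raise ValueError("unknown record shape")

records = [
    {"customer_id": 42, "note": "renewal due"},
    {"from": "alice@example.com", "body": "please cancel my plan"},
]
print([normalize(r)["id"] for r in records])  # [42, 'alice@example.com']
```

This normalization step is typically where veracity problems surface too: records that match no known shape are flagged rather than silently dropped.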

The significance of big data is determined not by how much data you have, but by what you do with it. When big data from different sources is combined with powerful analytics, many benefits can be gained.

  • Big data assists oil and gas businesses in identifying possible drilling areas and monitoring pipeline operations; similarly, utilities use it to track power networks. 
  • Big data systems are used by financial services businesses for risk management and real-time market data analysis.
  • Manufacturers and transportation businesses use big data to manage supply chains and optimize delivery routes.
  • Governments apply it to emergency response, crime prevention, and smart city initiatives.

