Define distributed database. What are the benefits of using a distributed database over a centralized database? Explain the availability, reliability, and scalability features of distributed databases.

 Distributed database (DDB)

 A distributed database (DDB) is a collection of multiple, logically interrelated databases distributed over a computer network. A distributed database management system (DDBMS) is the software that manages the DDB and provides an access mechanism that makes this distribution transparent to the users. The distributed database (DDB) and the distributed database management system (DDBMS) together are called a distributed database system (DDBS).


The benefits of using a distributed database over a centralized database come from the following capabilities that a DDBMS provides:

Keeping track of data: The ability to keep track of the data distribution, fragmentation, and replication by expanding the DDBMS catalog.

Distributed query processing: The ability to access remote sites and transmit queries and data among the various sites via a communication network.

Distributed transaction management: The ability to devise execution strategies for queries and transactions that access data from more than one site, to synchronize access to distributed data, and to maintain the integrity of the overall database.

Replicated data management: The ability to decide which copy of a replicated data item to access and to maintain the consistency of copies of a replicated data item.

Distributed database recovery: The ability to recover from individual site crashes and from new types of failures such as the failure of communication links.

Security: Distributed transactions must be executed with the proper management of the security of the data and the authorization/access privileges of users.

Distributed directory (catalog) management: A directory contains information (meta-data) about data in the database. The directory may be global for the entire DDB, or local for each site. The placement and distribution of the directory are design and policy issues.
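
To make the catalog and replica-selection ideas above more concrete, here is a minimal sketch in Python. It is only an illustration, not how any particular DDBMS implements its directory: the `Catalog` class, the fragment names, and the site names are all hypothetical, and the "first reachable copy" rule is an assumed policy chosen for simplicity.

```python
from dataclasses import dataclass, field


@dataclass
class Catalog:
    """Toy global directory: maps each fragment to the sites holding a copy."""
    replicas: dict[str, list[str]] = field(default_factory=dict)
    down_sites: set[str] = field(default_factory=set)

    def register(self, fragment: str, site: str) -> None:
        # Record that `site` stores a copy of `fragment`.
        self.replicas.setdefault(fragment, []).append(site)

    def pick_site(self, fragment: str) -> str:
        # Assumed policy: return the first reachable site holding the fragment.
        for site in self.replicas.get(fragment, []):
            if site not in self.down_sites:
                return site
        raise LookupError(f"no reachable copy of {fragment!r}")


# Hypothetical fragments and sites, for illustration only.
catalog = Catalog()
catalog.register("employees_asia", "site_A")
catalog.register("employees_asia", "site_B")
catalog.register("employees_europe", "site_C")

print(catalog.pick_site("employees_asia"))  # site_A
catalog.down_sites.add("site_A")            # simulate a site crash
print(catalog.pick_site("employees_asia"))  # site_B
```

Because "employees_asia" is replicated at two sites, the crash of site_A does not make it inaccessible. A real system would also have to keep the copies consistent when they are updated, which this sketch does not attempt.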


The availability, reliability, and scalability features of distributed databases are as follows:

 Availability: 

  • Availability is the probability that the system is continuously available (usable or accessible) during a time interval.
  •  In a centralized DBMS, a computer failure terminates the operations of the DBMS. However, a failure at one site of a DDBMS or a failure of a communication link making some sites inaccessible does not make the entire system inoperable.
  • Distributed DBMSs are designed to continue to function despite such failures. If a single node fails, the system may be able to reroute the failed node's requests to another site.

OR,

Availability

  •  The fraction of the time that a system meets its specification.
  •  The probability that the system is operational at a given time t.
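
The effect of replication on availability can be estimated with a short calculation. The sketch below assumes that every site has the same availability and that sites fail independently of each other, which real systems only approximate; the function name and the 0.95 figure are made up for illustration. Under those assumptions, the data is unavailable only when all copies are down at once.

```python
def replicated_availability(site_availability: float, copies: int) -> float:
    """Probability that at least one of `copies` independent sites is up."""
    if not 0.0 <= site_availability <= 1.0:
        raise ValueError("site_availability must be between 0 and 1")
    if copies < 1:
        raise ValueError("copies must be at least 1")
    # All copies are down with probability (1 - a) ** n.
    return 1.0 - (1.0 - site_availability) ** copies


# Assumed example: each site is up 95% of the time.
for n in (1, 2, 3):
    print(n, round(replicated_availability(0.95, n), 6))
```

With these assumed numbers, availability rises from 0.95 with one copy to 0.9975 with two and 0.999875 with three.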

 Reliability: 

  • Reliability refers to system live time (uptime), that is, the system runs without failure most of the time. Because data may be replicated so that it exists at more than one site, the failure of a node or a communication link does not necessarily make the data inaccessible.
  •  Performance may also improve: because the data is located near the site of "greatest demand," and given the inherent parallelism of distributed DBMSs, the speed of database access may be better than that achievable from a remote centralized database.
  • Furthermore, since each site handles only a part of the entire database, there may not be the same contention for CPU and I/O services that characterizes a centralized DBMS.

OR,

Reliability 

  •  A measure of success with which a system conforms to some authoritative specification of its behavior. 
  •  The probability that the system has not experienced any failures within a given time period. 
  •  Typically used to describe systems that cannot be repaired or where the continuous operation of the system is critical.
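
The second definition, "no failures within a given time period," can be illustrated with a common textbook model. The sketch below assumes a constant failure rate, which gives R(t) = exp(-λt) with λ = 1 / MTBF; this model and the 10,000-hour MTBF are assumptions for illustration, not something the definitions above require.

```python
import math


def reliability(failure_rate: float, hours: float) -> float:
    """R(t) = exp(-lambda * t): chance of zero failures during `hours`."""
    if failure_rate < 0 or hours < 0:
        raise ValueError("failure_rate and hours must be non-negative")
    return math.exp(-failure_rate * hours)


# Assumed example: mean time between failures of 10,000 hours.
mtbf_hours = 10_000
rate = 1 / mtbf_hours

for t in (24, 720, 10_000):
    print(t, round(reliability(rate, t), 4))
```

Under this model, reliability falls as the time period grows and is about 0.37 when the period equals the MTBF. Unlike availability, it gives no credit for repairing the system after a failure.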

Scalability

Scalability measures the ability of a database to handle bigger loads with more resources. In distributed databases, more resources mean more nodes, and we talk about horizontal scalability. In a centralized database, more resources mean more CPUs, more memory, and more disks, and we talk about vertical scalability. A distributed database allows new nodes (computers) to be added at any time without changing the entire configuration.
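
One way to add nodes without reshuffling everything is consistent hashing. The sketch below is a minimal illustration of that technique, not a description of how any specific distributed database places data; the `HashRing` class, the node names, and the key names are hypothetical.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    # Stable hash (Python's built-in hash() is randomized per process).
    return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Minimal consistent-hash ring with virtual nodes."""

    def __init__(self, vnodes: int = 64) -> None:
        self.vnodes = vnodes
        self._points: list[int] = []      # sorted ring positions
        self._owner: dict[int, str] = {}  # ring position -> node name

    def add_node(self, node: str) -> None:
        # Place several virtual points per node to spread the load.
        for i in range(self.vnodes):
            point = _hash(f"{node}#{i}")
            bisect.insort(self._points, point)
            self._owner[point] = node

    def node_for(self, key: str) -> str:
        if not self._points:
            raise LookupError("ring has no nodes")
        # First ring position clockwise from the key's hash, wrapping around.
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


ring = HashRing()
for name in ("node1", "node2", "node3"):
    ring.add_node(name)

keys = [f"customer:{i}" for i in range(1000)]
before = {k: ring.node_for(k) for k in keys}

ring.add_node("node4")  # scale out by one node
after = {k: ring.node_for(k) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys moved")
print("all moved keys went to node4:", all(after[k] == "node4" for k in moved))
```

Only the keys that now belong to the new node change owner; the placement of every other key on the existing nodes stays as it was, which is what lets the cluster grow one node at a time.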

