Define distributed database. What are the benefits of using a distributed database over a centralized database? Explain the availability, reliability, and scalability features of distributed databases.

- August 01, 2022

Distributed database (DDB)

A distributed database (DDB) is a collection of multiple, logically interrelated databases distributed over a computer network. A distributed database management system (DDBMS) is the software that manages the DDB and provides an access mechanism that makes this distribution transparent to the users. Together, the DDB and the DDBMS are called a distributed database system (DDBS).

The benefits of using a distributed database over a centralized database are as follows:

Keeping track of data: The ability to keep track of the data distribution, fragmentation, and replication by expanding the DDBMS catalog.

Distributed query processing: The ability to access remote sites and transmit queries and data among the various sites via a communication network.

Distributed transaction management: The ability to devise execution strategies for queries and transactions that access data from more than one site, to synchronize access to distributed data, and to maintain the integrity of the overall database.

Replicated data management: The ability to decide which copy of a replicated data item to access and to maintain the consistency of copies of a replicated data item.

Distributed database recovery: The ability to recover from individual site crashes and from new types of failures such as the failure of communication links.

Security: Distributed transactions must be executed with the proper management of the security of the data and the authorization/access privileges of users.

Distributed directory (catalog) management: A directory contains information (meta-data) about data in the database. The directory may be global for the entire DDB, or local for each site. The placement and distribution of the directory are design and policy issues.
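The catalog and replica-selection ideas in the list above can be sketched in a few lines of Python. This is only an illustrative toy, not how any particular DDBMS stores its directory: the `Catalog` class, the fragment names, and the site names (`site_a`, `site_b`) are all hypothetical, and the "prefer the local copy" rule is just one simple policy assumed for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Catalog:
    """Toy global directory: fragment name -> sites that hold a replica."""
    placement: dict = field(default_factory=dict)

    def register(self, fragment: str, site: str) -> None:
        # Record that `site` stores a copy of `fragment`.
        sites = self.placement.setdefault(fragment, [])
        if site not in sites:
            sites.append(site)

    def sites_for(self, fragment: str) -> list:
        # Return every site known to hold the fragment.
        return list(self.placement.get(fragment, []))

    def pick_replica(self, fragment: str, local_site: str) -> str:
        # Assumed policy: read the local copy if there is one,
        # otherwise fall back to the first registered site.
        sites = self.sites_for(fragment)
        if not sites:
            raise KeyError(f"no site holds fragment {fragment!r}")
        return local_site if local_site in sites else sites[0]


catalog = Catalog()
catalog.register("employee_part1", "site_a")
catalog.register("employee_part1", "site_b")  # replicated fragment
catalog.register("employee_part2", "site_b")

print(catalog.sites_for("employee_part1"))
print(catalog.pick_replica("employee_part1", local_site="site_b"))
print(catalog.pick_replica("employee_part2", local_site="site_a"))
```

A real catalog would also record the fragmentation predicates and keep the copies consistent on update, which this sketch leaves out.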

The availability, reliability, and scalability features of distributed databases are as follows:

Availability:

Availability is the probability that the system is continuously available (usable or accessible) during a time interval.

In a centralized DBMS, a computer failure terminates the operations of the DBMS. However, a failure at one site of a DDBMS or a failure of a communication link making some sites inaccessible does not make the entire system inoperable.

Distributed DBMSs are designed to continue to function despite such failures. If a single node fails, the system may be able to reroute the failed node's requests to another site.
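The rerouting behaviour described above can be sketched as follows. The function name `route_request`, the site names, and the `is_up` status table are assumptions made for illustration; a real DDBMS would detect failures through timeouts or heartbeats rather than a ready-made dictionary.

```python
def route_request(preferred: str, replicas: list, is_up: dict) -> str:
    """Return the site that should serve a request, skipping failed sites."""
    # Try the preferred site first, then the remaining replicas in order.
    candidates = [preferred] + [s for s in replicas if s != preferred]
    for site in candidates:
        if is_up.get(site, False):
            return site
    raise RuntimeError("no reachable site holds the requested data")


# Hypothetical status table: site_a has crashed, the others are reachable.
status = {"site_a": False, "site_b": True, "site_c": True}
served_by = route_request("site_a", ["site_a", "site_b", "site_c"], status)
print(served_by)  # site_b
```

If every replica is down the request still fails, so the system stays usable only for data that has at least one reachable copy.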

OR,

Availability

The fraction of the time that a system meets its specification.
The probability that the system is operational at a given time t.
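As a small worked example of "the fraction of the time that a system meets its specification", availability over an observation window can be computed from measured uptime and downtime. The function name and the sample figures below (a 720-hour month with one hour of downtime) are made up for illustration.

```python
def availability(uptime_hours: float, downtime_hours: float) -> float:
    """Fraction of the observation window during which the system was usable."""
    total = uptime_hours + downtime_hours
    if total <= 0:
        raise ValueError("observation window must be positive")
    return uptime_hours / total


# Hypothetical month: 719 hours up, 1 hour down.
a = availability(uptime_hours=719.0, downtime_hours=1.0)
print(f"availability = {a:.4%}")
```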

Reliability:

Reliability refers to how long the system stays up, that is, the system is running efficiently most of the time. Because data may be replicated so that it exists at more than one site, the failure of a node or a communication link does not necessarily make the data inaccessible.
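The effect of replication can be illustrated with a little arithmetic. Assuming, purely for the sake of the example, that each site is reachable with the same probability p and that sites fail independently, a data item stored at n sites is accessible unless all n copies are unreachable, which gives 1 - (1 - p)^n. Real failures are often correlated, so this is an optimistic estimate rather than a guarantee.

```python
def data_accessible_probability(site_availability: float, replicas: int) -> float:
    """Probability that at least one of `replicas` independent copies is reachable."""
    if not 0.0 <= site_availability <= 1.0:
        raise ValueError("site_availability must be between 0 and 1")
    if replicas < 1:
        raise ValueError("need at least one replica")
    # The item is unavailable only if every copy is unreachable at once.
    return 1.0 - (1.0 - site_availability) ** replicas


# Assumed per-site availability of 95%.
for n in (1, 2, 3):
    print(n, "replica(s):", round(data_accessible_probability(0.95, n), 6))
```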

Performance also improves: because the data is located near the site of "greatest demand," and given the inherent parallelism of distributed DBMSs, the speed of database access may be better than that achievable from a remote centralized database.

Furthermore, since each site handles only a part of the entire database, there may not be the same contention for CPU and I/O services that characterizes a centralized DBMS.

OR,

Reliability

A measure of success with which a system conforms to some authoritative specification of its behavior.

The probability that the system has not experienced any failures within a given time period.

Typically used to describe systems that cannot be repaired or where the continuous operation of the system is critical.
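The second definition above, the probability of no failure within a given time period, can be made concrete under one common modelling assumption that these notes do not state themselves: a constant failure rate λ, which gives R(t) = e^(-λt) with mean time between failures 1/λ. The function name and the numbers below are illustrative only.

```python
import math


def reliability(failure_rate_per_hour: float, hours: float) -> float:
    """R(t) = exp(-lambda * t), assuming a constant failure rate."""
    if failure_rate_per_hour < 0 or hours < 0:
        raise ValueError("failure rate and time must be non-negative")
    return math.exp(-failure_rate_per_hour * hours)


# Assumed mean time between failures of 1000 hours, so lambda = 1/1000.
r = reliability(failure_rate_per_hour=1 / 1000, hours=100)
print(f"probability of no failure in 100 hours: {r:.4f}")
```

Note the contrast with availability: a system that fails often but is repaired quickly can have high availability and low reliability.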

Scalability

Scalability measures the ability of a database to handle bigger loads when given more resources. In distributed databases, more resources mean more nodes, and we talk about horizontal scalability. In a centralized database, more resources mean more CPUs, more memory, and more disks, and we talk about vertical scalability. A distributed database allows new nodes (computers) to be added at any time without changing the entire configuration.
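One way to add a node without reshuffling everything is consistent hashing. The notes above do not name a technique, so this is a swapped-in illustration and not a description of any specific product. The `HashRing` class, the node names, and the choice of 64 virtual points per node are all assumptions for the sketch.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    # Stable hash so that placement is the same on every run.
    return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Minimal consistent-hash ring with virtual nodes."""

    def __init__(self, nodes=(), vnodes: int = 64):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (hash point, node name)
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        # Each node owns several points on the ring to even out the load.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def node_for(self, key: str) -> str:
        if not self._ring:
            raise RuntimeError("ring has no nodes")
        # First ring point clockwise from the key's hash, wrapping around.
        idx = bisect.bisect_right(self._ring, (_hash(key), ""))
        return self._ring[idx % len(self._ring)][1]


keys = [f"row-{i}" for i in range(1000)]
ring = HashRing(["node1", "node2", "node3"])
before = {k: ring.node_for(k) for k in keys}

ring.add_node("node4")  # scale out by one node
after = {k: ring.node_for(k) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
print(len(moved), "of", len(keys), "keys moved, all to node4:",
      all(after[k] == "node4" for k in moved))
```

Only the keys claimed by the new node change location; the rest stay where they were, which is the sense in which the existing configuration is left alone. With plain `hash(key) % number_of_nodes` placement, most keys would move whenever the node count changes.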

Notes for BSc CSIT