Explain different techniques for distributed database design (Data Fragmentation, Data Replication, Data Allocation).

- July 29, 2022

The different techniques for distributed database design are:

1. Data Fragmentation

2. Data Replication

3. Data Allocation

1. Data Fragmentation

Fragmentation is the task of dividing a table into a set of smaller tables. The subsets of the table are called fragments. These fragments may be stored at different locations. Fragmentation increases parallelism, because queries can work on different fragments at the same time, and it can help disaster recovery, because a failure at one site affects only the fragments stored there. Fragmentation can be of three types:

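Vertical Fragmentation: each fragment holds a subset of the columns of the table, together with the primary key.
Horizontal Fragmentation: each fragment holds a subset of the rows of the table.
Hybrid Fragmentation: a combination of vertical and horizontal fragmentation.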

Fragmentation should be done in such a way that the original table can be reconstructed from the fragments whenever required. This requirement is called "reconstructiveness". Horizontal fragments are recombined by taking their union, and vertical fragments are recombined by joining them on the primary key.
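
The reconstruction requirement can be illustrated with a small sketch. This is only an in-memory illustration in Python, not how a real DBMS stores fragments; the employees table, its columns, and all function names are hypothetical and chosen just for this example.

```python
# Hypothetical EMPLOYEE table; names and values are illustrative only.
employees = [
    {"id": 1, "name": "Asha", "dept": "Sales", "salary": 4000},
    {"id": 2, "name": "Bikram", "dept": "IT", "salary": 5500},
    {"id": 3, "name": "Chandra", "dept": "Sales", "salary": 4200},
]


def horizontal_fragment(rows, predicate):
    """Horizontal fragmentation: keep whole rows that satisfy a predicate."""
    return [dict(r) for r in rows if predicate(r)]


def vertical_fragment(rows, columns, key="id"):
    """Vertical fragmentation: keep some columns, always including the key."""
    keep = [key] + [c for c in columns if c != key]
    return [{c: r[c] for c in keep} for r in rows]


def rebuild_horizontal(*fragments):
    """Reconstruct by taking the union of the row fragments."""
    rows = [r for frag in fragments for r in frag]
    return sorted(rows, key=lambda r: r["id"])


def rebuild_vertical(left, right, key="id"):
    """Reconstruct by joining two column fragments on the key."""
    right_by_key = {r[key]: r for r in right}
    return [{**l, **right_by_key[l[key]]} for l in left]


# Horizontal fragments: split the rows by department.
sales = horizontal_fragment(employees, lambda r: r["dept"] == "Sales")
others = horizontal_fragment(employees, lambda r: r["dept"] != "Sales")

# Vertical fragments: split the columns, repeating the key in both.
public = vertical_fragment(employees, ["name", "dept"])
payroll = vertical_fragment(employees, ["salary"])

rebuilt_rows = rebuild_horizontal(sales, others)
rebuilt_cols = rebuild_vertical(public, payroll)
```

Both rebuilt tables contain the same rows as the original table. The vertical fragments can be joined only because each of them carries the id key.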

2. Data Replication

Data Replication is the process of generating and maintaining multiple copies of data at one or more sites. Replication is an important mechanism because it enables organizations to provide users with access to current data where and when they need it. It also increases the fault tolerance of a system: if one database fails, another can continue to serve queries or update requests. Replication is sometimes described using the publishing industry metaphor of publishers, distributors, and subscribers.

Publisher: A DBMS that makes data available to other locations through replication. The publisher can have one or more publications (made up of one or more articles), each defining a logically related set of objects and data to replicate.

Distributor: A DBMS that stores replication data and metadata about the publication and in some cases acts as a queue for data moving from the publisher to the subscribers. A DBMS can act as both the publisher and the distributor.

Subscriber: A DBMS that receives replicated data. A subscriber can receive data from multiple publishers and publications. Depending on the type of replication chosen, the subscriber can also pass data changes back to the publisher or republish the data to other subscribers.
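
The three roles above can be sketched in a few lines of Python. This is a simplified, one-way model under stated assumptions: the class and method names are hypothetical, each "DBMS" is just a dictionary, and the distributor is an in-memory queue. It does not represent the API of any real replication product.

```python
from collections import deque


class Subscriber:
    """Receives replicated changes and applies them to its own copy."""

    def __init__(self, name):
        self.name = name
        self.data = {}

    def apply(self, change):
        key, value = change
        self.data[key] = value


class Distributor:
    """Queues changes from the publisher and forwards them to subscribers."""

    def __init__(self):
        self.queue = deque()
        self.subscribers = []

    def subscribe(self, subscriber):
        self.subscribers.append(subscriber)

    def enqueue(self, change):
        self.queue.append(change)

    def deliver(self):
        """Push every queued change to every subscriber, in order."""
        delivered = 0
        while self.queue:
            change = self.queue.popleft()
            for subscriber in self.subscribers:
                subscriber.apply(change)
            delivered += 1
        return delivered


class Publisher:
    """Owns the source data and publishes each change it makes."""

    def __init__(self, distributor):
        self.data = {}
        self.distributor = distributor

    def write(self, key, value):
        self.data[key] = value
        self.distributor.enqueue((key, value))


distributor = Distributor()
publisher = Publisher(distributor)
site_b = Subscriber("site_b")
site_c = Subscriber("site_c")
distributor.subscribe(site_b)
distributor.subscribe(site_c)

publisher.write("product:1", "Laptop")
publisher.write("product:1", "Laptop Pro")
stale_copy = dict(site_b.data)  # still empty: nothing has been delivered yet
changes_sent = distributor.deliver()
```

Until deliver() runs, the subscribers hold stale data. That gap between the publisher and its copies is the price of queue-based (asynchronous) replication in this sketch.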

3. Data Allocation

Each fragment or each copy of a fragment is stored at a particular site in the distributed system with an "optimal" distribution. This process is called data distribution (or data allocation). The choice of sites and the degree of replication depend on the performance and availability goals of the system and on the types and frequencies of transactions submitted at each site.

Example: If high availability is required, transactions can be submitted at any site, and most transactions are retrieval only, a fully replicated database is a good choice. However, if certain transactions that access particular parts of the database are mostly submitted at a particular site, the corresponding set of fragments can be allocated at that site only. Data that is accessed at multiple sites can be replicated at those sites. If many updates are performed, it may be useful to limit replication, because every update has to be applied to every copy. Finding an optimal or even a good solution to distributed data allocation is a complex optimization problem.
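
The trade-off in this example can be made concrete with a toy heuristic. The sketch below is an assumption made for illustration, not an optimal or standard allocation algorithm: it places a copy of a fragment at a site only when the reads issued at that site outnumber the updates that would have to be sent to it from other sites. The fragment names, site names, and frequencies are all made up.

```python
def allocate(reads, updates):
    """Pick sites for each fragment.

    reads and updates map fragment -> {site: frequency}.
    Returns fragment -> sorted list of sites that hold a copy.
    """
    placement = {}
    for fragment, site_reads in reads.items():
        site_updates = updates.get(fragment, {})
        total_updates = sum(site_updates.values())
        chosen = []
        for site, read_count in site_reads.items():
            # Updates issued elsewhere must be propagated to a copy kept here.
            remote_updates = total_updates - site_updates.get(site, 0)
            if read_count > remote_updates:
                chosen.append(site)
        if not chosen:
            # No site benefits from a copy: keep one copy at the busiest site.
            busiest = max(
                site_reads,
                key=lambda s: site_reads[s] + site_updates.get(s, 0),
            )
            chosen = [busiest]
        placement[fragment] = sorted(chosen)
    return placement


# Hypothetical access frequencies per site.
reads = {
    "customers": {"A": 90, "B": 80, "C": 70},  # read everywhere
    "orders": {"A": 50, "B": 5, "C": 5},       # used mostly at site A
    "audit_log": {"A": 2, "B": 3, "C": 1},     # rarely read
}
updates = {
    "customers": {"A": 1, "B": 1, "C": 1},     # rarely updated
    "orders": {"A": 40, "B": 2, "C": 2},
    "audit_log": {"A": 30, "B": 30, "C": 30},  # updated heavily everywhere
}

placement = allocate(reads, updates)
```

With these numbers, customers is replicated at all three sites, orders is stored only at site A, and audit_log is kept as a single copy at site B. This matches the reasoning in the example: data that is read at many sites is replicated, and data that is updated heavily is not.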

There are four alternative strategies regarding the placement of data: centralized, fragmented, complete replication, and selective replication.
