What is meant by data allocation in distributed database design? Explain the alternative strategies regarding the placement of data.

- July 28, 2022

Data Allocation

Each fragment or each copy of a fragment is stored at a particular site in the distributed system with an "optimal" distribution. This process is called data distribution (or data allocation). The choice of sites and the degree of replication depend on the performance and availability goals of the system and on the types and frequencies of transactions submitted at each site.

Example: If high availability is required, transactions can be submitted at any site, and most transactions are retrieval only, a fully replicated database is a good choice. However, if certain transactions that access particular parts of the database are mostly submitted at a particular site, the corresponding set of fragments can be allocated at that site only. Data that is accessed at multiple sites can be replicated at those sites. If many updates are performed, it may be useful to limit replication, because every update must be applied to every copy. Finding an optimal or even a good solution to distributed data allocation is a complex optimization problem.
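
The trade-off in the example above can be sketched with a toy cost model. This is only an illustration under simplifying assumptions, not a real allocation algorithm: the function names, the single `remote_cost` weight, and the rule that a read is free when a local copy exists while an update must reach every remote copy are all hypothetical. It tries every possible set of sites for one fragment, which is practical only for a handful of sites.

```python
from itertools import combinations


def allocation_cost(replica_sites, reads, updates, remote_cost=1.0):
    """Estimated network cost of keeping one fragment at replica_sites.

    reads / updates map a site name to how many reads / updates it issues.
    Assumed model: a read is free if the issuing site holds a copy,
    otherwise it costs one remote access; an update costs one remote
    access for every copy stored at a site other than the issuer.
    """
    cost = 0.0
    for site, count in reads.items():
        if site not in replica_sites:
            cost += count * remote_cost
    for site, count in updates.items():
        remote_copies = len(replica_sites - {site})
        cost += count * remote_copies * remote_cost
    return cost


def best_allocation(sites, reads, updates):
    """Exhaustively try every non-empty set of sites and keep the cheapest."""
    best = None
    for size in range(1, len(sites) + 1):
        for subset in combinations(sorted(sites), size):
            candidate = frozenset(subset)
            cost = allocation_cost(candidate, reads, updates)
            if best is None or cost < best[1]:
                best = (candidate, cost)
    return best


sites = {"A", "B", "C"}

# Read-mostly workload from every site: copies everywhere are cheapest.
read_mostly = best_allocation(sites, {"A": 100, "B": 100, "C": 100}, {"A": 1})

# Update-heavy workload issued at site A: a single copy at A is cheapest.
update_heavy = best_allocation(sites, {"A": 10, "B": 10, "C": 10}, {"A": 100})
```

With these made-up frequencies, the read-mostly workload ends up fully replicated, while the update-heavy workload is stored only at the site that issues the updates. The number of candidate site sets doubles with each added site, which hints at why allocation is treated as a hard optimization problem.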

There are four alternative strategies regarding the placement of data: centralized, fragmented, complete replication, and selective replication.

Centralized

This strategy consists of a single database and DBMS stored at one site with users distributed across the network. The locality of reference is at its lowest as all sites, except the central site, have to use the network for all data access. This also means that communication costs are high. Reliability and availability are low, as a failure of the central site results in the loss of the entire database system.

Fragmented (or Partitioned)

This strategy partitions the database into disjoint fragments, with each fragment assigned to one site. If data items are located at the site where they are used most frequently, the locality of reference is high. Because there is no replication, storage costs are low. Reliability and availability are also low, although they are higher than in the centralized case, because the failure of a site results in the loss of only that site's data. Performance should be good and communication costs low if the distribution is designed properly.

Complete Replication

This strategy consists of maintaining a complete copy of the database at each site. Therefore, the locality of reference, reliability, availability, and performance are maximized. However, storage costs and communication costs for updates are the highest of all the strategies. To overcome some of these problems, snapshots are sometimes used. A snapshot is a copy of the data at a given time. The copies are updated periodically (for example, hourly or weekly), so they may not always be up to date. Snapshots are also sometimes used to implement views in a distributed database to improve the time it takes to perform a database operation on a view.
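
The snapshot idea can be illustrated with a small sketch. It is a simplified model, not how any particular DBMS implements snapshots: the class name `SnapshotReplica`, the use of a plain dictionary as the primary copy, and the explicit `now` argument standing in for a clock are all assumptions made for the example.

```python
class SnapshotReplica:
    """A read-only copy of a primary data set, refreshed periodically.

    primary is a plain dict standing in for the primary copy, and
    refresh_interval is in the same (arbitrary) time units as `now`.
    """

    def __init__(self, primary, refresh_interval, now=0):
        self.primary = primary
        self.refresh_interval = refresh_interval
        self.data = dict(primary)   # copy of the data at this moment
        self.refreshed_at = now

    def read(self, key, now):
        # Re-copy from the primary only once the interval has elapsed;
        # until then, reads are served locally and may be stale.
        if now - self.refreshed_at >= self.refresh_interval:
            self.data = dict(self.primary)
            self.refreshed_at = now
        return self.data.get(key)


primary = {"stock": 10}
replica = SnapshotReplica(primary, refresh_interval=60)

primary["stock"] = 7                         # update reaches the primary only
stale_value = replica.read("stock", now=10)  # still served from the old copy
fresh_value = replica.read("stock", now=60)  # interval elapsed, copy refreshed
```

The update costs nothing at the replica site until the next refresh, at the price of reads that can return out-of-date values in between.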

Selective Replication

This strategy is a combination of fragmentation, replication, and centralization. Some data items are fragmented to achieve a high locality of reference, and others that are used at many sites and are not frequently updated are replicated; otherwise, the data items are centralized. The objective of this strategy is to have all the advantages of the other approaches but none of the disadvantages. This is the most commonly used strategy, because of its flexibility.
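
The decision rule described above can be written down as a short sketch. The thresholds (`local_share`, `max_update_ratio`), the function name, and the default central site `"HQ"` are hypothetical values chosen only for illustration; a real design would derive them from measured transaction frequencies.

```python
def choose_strategy(access_by_site, update_ratio,
                    local_share=0.8, max_update_ratio=0.1,
                    central_site="HQ"):
    """Pick a placement for one data item under selective replication.

    access_by_site maps a site name to its number of accesses, and
    update_ratio is the fraction of accesses that are updates.
    Returns a (strategy, sites) pair.
    """
    if not access_by_site:
        raise ValueError("access_by_site must not be empty")

    total = sum(access_by_site.values())
    top_site, top_count = max(access_by_site.items(), key=lambda kv: kv[1])

    # Mostly used at one site: store it there for high locality of reference.
    if total and top_count / total >= local_share:
        return ("fragment", [top_site])

    # Used at many sites and rarely updated: replicate where it is used.
    if update_ratio <= max_update_ratio:
        active = sorted(s for s, n in access_by_site.items() if n > 0)
        return ("replicate", active)

    # Otherwise keep a single copy at the central site.
    return ("centralize", [central_site])


local_item = choose_strategy({"A": 90, "B": 5, "C": 5}, update_ratio=0.5)
shared_item = choose_strategy({"A": 40, "B": 30, "C": 30}, update_ratio=0.02)
volatile_item = choose_strategy({"A": 40, "B": 30, "C": 30}, update_ratio=0.6)
```

Here the first item is fragmented to site A, the second is replicated at all three sites, and the third stays centralized because it is both widely used and frequently updated.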
