Explain the different techniques for distributed database design (Data Fragmentation, Data Replication, Data Allocation).

The different techniques for distributed database design are:

1. Data Fragmentation

2. Data Replication

3. Data Allocation

                                                       


    1. Data Fragmentation

    Fragmentation is the task of dividing a table into a set of smaller tables. The subsets of the table are called fragments, and they may be stored at different sites. Fragmentation can increase parallelism, because queries that touch different fragments can run at different sites, and it can help disaster recovery, because a site failure affects only the fragments stored there. Fragmentation can be of three types:

    • Vertical Fragmentation
    • Horizontal Fragmentation
    • Hybrid Fragmentation
    Fragmentation should be done so that the original table can be reconstructed from the fragments whenever required: horizontal fragments are recombined with a union, and vertical fragments with a join on a shared key. This requirement is called "reconstructiveness".
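
The fragmentation types and the reconstruction requirement can be illustrated with a minimal Python sketch. The `employees` table, its column names, and the helper functions are hypothetical, chosen only for illustration; a real DBMS would express the same idea with selections, projections, unions, and joins.

```python
# Hypothetical EMPLOYEE table; column names and values are illustrative only.
employees = [
    {"emp_id": 1, "name": "Asha", "dept": "Sales", "salary": 50000},
    {"emp_id": 2, "name": "Bikram", "dept": "HR", "salary": 45000},
    {"emp_id": 3, "name": "Chen", "dept": "Sales", "salary": 52000},
]


def horizontal_fragment(rows, predicate):
    """Horizontal fragmentation: keep whole rows that satisfy a predicate."""
    return [row for row in rows if predicate(row)]


def vertical_fragment(rows, key, columns):
    """Vertical fragmentation: keep the key plus a subset of the columns."""
    return [{c: row[c] for c in (key, *columns)} for row in rows]


def reconstruct_horizontal(*fragments, key):
    """Rebuild the table as the union of horizontal fragments, ordered by key."""
    merged = [row for fragment in fragments for row in fragment]
    return sorted(merged, key=lambda row: row[key])


def reconstruct_vertical(left, right, key):
    """Rebuild the table by joining two vertical fragments on the shared key."""
    right_by_key = {row[key]: row for row in right}
    return [{**row, **right_by_key[row[key]]} for row in left]


# Horizontal fragments: split the rows by department.
sales = horizontal_fragment(employees, lambda r: r["dept"] == "Sales")
others = horizontal_fragment(employees, lambda r: r["dept"] != "Sales")

# Vertical fragments: split the columns, repeating the key in both.
public = vertical_fragment(employees, "emp_id", ["name", "dept"])
payroll = vertical_fragment(employees, "emp_id", ["salary"])

# Hybrid fragmentation: a vertical fragment of a horizontal fragment.
sales_payroll = vertical_fragment(sales, "emp_id", ["salary"])
```

In this sketch both `reconstruct_horizontal(sales, others, key="emp_id")` and `reconstruct_vertical(public, payroll, "emp_id")` return the original rows. The vertical case works only because the key `emp_id` is kept in every vertical fragment.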

    2. Data Replication

    Data replication is the process of generating and reproducing multiple copies of data at one or more sites. Replication is an important mechanism because it enables organizations to provide users with access to current data where and when they need it. It is intended to increase the fault tolerance of a system, so that if one database fails, another can continue to serve queries or update requests. Replication is sometimes described using the publishing-industry metaphor of publishers, distributors, and subscribers.

    Publisher: A DBMS that makes data available to other locations through replication. The publisher can have one or more publications (made up of one or more articles), each defining a logically related set of objects and data to replicate.

    Distributor: A DBMS that stores replication data and metadata about the publication and in some cases acts as a queue for data moving from the publisher to the subscribers. A DBMS can act as both the publisher and the distributor.

    Subscriber: A DBMS that receives replicated data. A subscriber can receive data from multiple publishers and publications. Depending on the type of replication chosen, the subscriber can also pass data changes back to the publisher or republish the data to other subscribers.
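
The three roles can be pictured with a toy in-memory model. This is only a sketch under simplifying assumptions: the class and method names are hypothetical, each "DBMS" is a Python dictionary, and changes flow one way from the publisher to the subscribers. It is not the API of any real replication product.

```python
from collections import deque


class Subscriber:
    """Receives replicated changes and applies them to its local copy."""

    def __init__(self, name):
        self.name = name
        self.data = {}

    def apply(self, change):
        key, value = change
        self.data[key] = value


class Distributor:
    """Queues changes from the publisher and forwards them to subscribers."""

    def __init__(self):
        self.queue = deque()
        self.subscribers = []

    def subscribe(self, subscriber):
        self.subscribers.append(subscriber)

    def enqueue(self, change):
        self.queue.append(change)

    def deliver(self):
        # Drain the queue in order so every subscriber sees the same history.
        while self.queue:
            change = self.queue.popleft()
            for subscriber in self.subscribers:
                subscriber.apply(change)


class Publisher:
    """Owns the source data and publishes every write to the distributor."""

    def __init__(self, distributor):
        self.data = {}
        self.distributor = distributor

    def write(self, key, value):
        self.data[key] = value
        self.distributor.enqueue((key, value))


distributor = Distributor()
publisher = Publisher(distributor)
site_a, site_b = Subscriber("site_a"), Subscriber("site_b")
distributor.subscribe(site_a)
distributor.subscribe(site_b)

publisher.write("price:42", 19.99)
stale_copy = dict(site_a.data)  # still empty: the change is only queued
distributor.deliver()           # both subscribers now hold the new value
```

Between `write` and `deliver` the subscribers hold stale data, which is the usual trade-off of queue-based (asynchronous) replication: better availability and read locality in exchange for a window of inconsistency.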

    3. Data Allocation

    • Each fragment or each copy of a fragment is stored at a particular site in the distributed system with an "optimal" distribution. This process is called data distribution (or data allocation). The choice of sites and the degree of replication depend on the performance and availability goals of the system and on the types and frequencies of transactions submitted at each site.
    • Example: If high availability is required, transactions can be submitted at any site, and most transactions are retrieval only, a fully replicated database is a good choice. However, if certain transactions that access particular parts of the database are mostly submitted at a particular site, the corresponding set of fragments can be allocated at that site only. Data that is accessed at multiple sites can be replicated at those sites. If many updates are performed, it may be useful to limit replication, because every update must be applied to every copy. Finding an optimal or even a good solution to distributed data allocation is a complex optimization problem.
    • There are four alternative strategies regarding the placement of data: centralized, fragmented, complete replication, and selective replication.
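
The trade-off in the example above can be made concrete with a deliberately simple greedy heuristic. It is a sketch, not an optimal allocation algorithm: the function name, the fragment and site names, and the access counts are all hypothetical, and the rule "place a copy at a site when its local reads outnumber the updates arriving from other sites" is an assumption chosen for illustration.

```python
def allocate(reads, updates):
    """Choose replica sites per fragment with a simple greedy rule.

    reads[fragment][site] and updates[fragment][site] are access counts.
    Assumes every fragment is accessed by at least one site.
    """
    placement = {}
    for fragment, site_reads in reads.items():
        site_updates = updates.get(fragment, {})
        total_updates = sum(site_updates.values())

        # A copy at a site saves its local reads but must absorb every
        # update issued at the other sites.
        chosen = set()
        for site, read_count in site_reads.items():
            remote_updates = total_updates - site_updates.get(site, 0)
            if read_count > remote_updates:
                chosen.add(site)

        # Update-heavy fragment: keep one copy at the busiest site.
        if not chosen:
            sites = sorted(set(site_reads) | set(site_updates))
            busiest = max(
                sites,
                key=lambda s: site_reads.get(s, 0) + site_updates.get(s, 0),
            )
            chosen = {busiest}

        placement[fragment] = chosen
    return placement


# Hypothetical workload for three sites A, B, and C.
reads = {
    "catalog": {"A": 100, "B": 90, "C": 80},    # read everywhere
    "orders_east": {"A": 60, "B": 5, "C": 0},   # used mostly at site A
    "audit_log": {"A": 2, "B": 1, "C": 1},      # rarely read
}
updates = {
    "catalog": {"A": 1, "B": 0, "C": 0},
    "orders_east": {"A": 40, "B": 2, "C": 0},
    "audit_log": {"A": 30, "B": 30, "C": 35},   # update-heavy
}

placement = allocate(reads, updates)
```

With these counts, the read-mostly `catalog` is replicated at all three sites, `orders_east` is placed only at site A, and the update-heavy `audit_log` keeps a single copy at site C. A realistic allocator would also weigh storage limits, network costs, and availability targets.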


