What is the CAP theorem? Which of the three properties (consistency, availability, partition tolerance) are most important in NoSQL systems?

THE CAP THEOREM

CAP stands for Consistency, Availability, and Partition Tolerance. It is key to understanding the limitations of NoSQL databases: when the network between nodes breaks, a distributed database cannot provide both consistency and high availability. Eric Brewer first expressed this idea, which is why the CAP theorem is also known as Brewer's theorem. It states that a distributed database can provide at most two of three guarantees at the same time: Consistency, Availability, and Partition Tolerance.

Consistency

Consistency means that, within a distributed environment, every node of the database has exactly the same information at any given time. Imagine having two nodes with purchase orders from your eCommerce site. If there is no consistency between them and they're acting as a single cluster, the moment your client app queries the outdated node, the results might be missing transactions. And if the code is not only displaying that data but also making calculations based on it, the results might be disastrous.

So consistency is definitely an important characteristic of any distributed NoSQL database. However, not all of them can provide it. So what do they do instead? They go for something called "eventual consistency," meaning that while the cluster may not be consistent at a given moment, it will eventually become so once updates stop arriving and the nodes have synchronized. This limits how long the kind of problem described above can last, although a read during that window can still return stale data.
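The stale read described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Replica` class, the `sync` helper, the version numbers, and the `order:1001` key are all hypothetical and do not correspond to any particular NoSQL product.

```python
class Replica:
    """Hypothetical in-memory replica storing key -> (version, value)."""

    def __init__(self, name):
        self.name = name
        self.data = {}

    def write(self, key, value, version):
        current = self.data.get(key)
        # Keep the write only if it is newer than what we already hold.
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def read(self, key):
        entry = self.data.get(key)
        return entry[1] if entry else None


def sync(source, target):
    """Copy every entry from source to target; newer versions win."""
    for key, (version, value) in source.data.items():
        target.write(key, value, version)


node_a = Replica("a")
node_b = Replica("b")

# The client writes a purchase order to node_a only.
node_a.write("order:1001", "paid", version=1)

stale = node_b.read("order:1001")  # node_b has not seen the write yet
sync(node_a, node_b)               # background replication catches up
fresh = node_b.read("order:1001")

print(stale, fresh)  # prints: None paid
```

Between the write and the `sync` call, the two nodes disagree; after it, they converge. That window is what "eventually" refers to.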

Availability

Availability stands for "high availability," or in other words, the ability of the database to keep responding to requests no matter what happens. This is not the same as "fault tolerance" and should not be confused with it. A highly available database usually has replicas in multiple geographical zones, so that if there is a big network outage, it is still accessible through one of its other replicas. By contrast, a system that is installed and running on only one server can't be highly available, because the moment that server fails, we lose our database.
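As a rough illustration of why replicas matter, the sketch below has a client try each replica in turn until one answers. The `Server` class, the `read_with_failover` function, and the zone names are assumptions made up for this example, not a real client library.

```python
class Server:
    """Hypothetical replica that is either reachable or not."""

    def __init__(self, zone, up=True):
        self.zone = zone
        self.up = up
        self.data = {"order:1001": "paid"}

    def get(self, key):
        if not self.up:
            raise ConnectionError(f"{self.zone} is unreachable")
        return self.data[key]


def read_with_failover(servers, key):
    """Try each replica in order and return the first successful answer."""
    for server in servers:
        try:
            return server.zone, server.get(key)
        except ConnectionError:
            continue  # this replica is down, try the next one
    raise RuntimeError("no replica is reachable")


# One server only: if it fails, the database is gone.
single = [Server("eu-west", up=False)]

# Two zones: the second replica still answers when the first is down.
replicated = [Server("eu-west", up=False), Server("us-east")]

result = read_with_failover(replicated, "order:1001")
print(result)  # prints: ('us-east', 'paid')
```

Calling `read_with_failover(single, "order:1001")` raises `RuntimeError`, which is the single-server situation described above.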

Partitioning

Partitioning stands for "partition tolerance," or in other words, the ability to keep operating when links within the cluster in which the database is distributed are broken. Think about a graph representing your database cluster. You have multiple nodes sharing data and working wonderfully, and suddenly there is a problem and a section of that cluster becomes unreachable. If the database is "partition tolerant," it will still work despite the sudden loss of contact with some of its nodes.
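To see how the three properties trade off once a link breaks, here is a minimal sketch of a two-node cluster that can run in one of two modes. The `Cluster` and `Node` classes and the "CP"/"AP" mode labels are hypothetical names chosen for this example; real databases make this choice in far more nuanced ways.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.data = {}


class Cluster:
    """Two hypothetical nodes joined by a link that can be cut."""

    def __init__(self, mode):
        self.mode = mode          # "CP" or "AP" (labels assumed for this sketch)
        self.a = Node("a")
        self.b = Node("b")
        self.partitioned = False  # True means the link between a and b is broken

    def write(self, node, key, value):
        peer = self.b if node is self.a else self.a
        if self.partitioned:
            if self.mode == "CP":
                # Stay consistent: refuse the write, giving up availability.
                raise RuntimeError("write rejected: peer unreachable")
            # Stay available: accept locally and let the nodes diverge for now.
            node.data[key] = value
            return
        # Healthy link: apply the write on both nodes.
        node.data[key] = value
        peer.data[key] = value


cp = Cluster("CP")
cp.partitioned = True
try:
    cp.write(cp.a, "stock", 5)
    cp_result = "accepted"
except RuntimeError:
    cp_result = "rejected"

ap = Cluster("AP")
ap.partitioned = True
ap.write(ap.a, "stock", 5)
ap_diverged = ap.a.data != ap.b.data

print(cp_result, ap_diverged)  # prints: rejected True
```

With the link cut, the "CP" cluster rejects the write and the "AP" cluster accepts it at the cost of the two nodes disagreeing, which is the two-out-of-three choice the theorem describes.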


