What is the CAP theorem? Which of the three properties (consistency, availability, partition tolerance) are most important in NOSQL systems?
THE CAP THEOREM
CAP stands for Consistency, Availability, and Partitioning. It is very important to understand the limitations of the NoSQL database. NoSQL cannot provide consistency and high availability together. This was first expressed by Eric Brewer in CAP Theorem. CAP theorem or Eric Brewer's theorem states that we can only achieve at most two out of three guarantees for a database: Consistency, Availability, and Partition Tolerance.
Consistency
Consistency is all about data consistency, or in other words, making sure that within a distributed environment, every node of the database has exactly the same information at any given time. Imagine having two nodes with purchase orders from your eCommerce site. If there is no data and Bigdata 209 consistency amongst them and they're acting as a unique cluster, the moment your client app queries the outdated node, it might show you missing transactions. And if the code is not only showing, but also making calculations based on that data, the results might be disastrous.
So consistency is definitely an important characteristic of any distributed NoSQL database. However, not all of them can provide it. So what do they do instead? They go for something called "eventual consistency." Meaning that while at one point the cluster may not be consistent, it will eventually be so. This helps in making sure that you don't get the types of problems I mentioned before.
Availability
Availability stands for "high availability" or in other words the ability of the database to always be available, no matter what happens. This is not the same and should not be confused with "fault tolerance" however. A highly available database is usually one that has replicas in multiple geographical zones, that way if there is a big network outage, it'll still be accessible through one of its other replicas. For example, a system that's only installed and working on one of our servers can't be highly available because the moment that server fails, we'll lose our database.
Partitioning
Partitioning stands for "partitioning tolerance" or in other words, having the ability to support broken links within the cluster in which the database is distributed. Think about a graph representing your database cluster. You have multiple nodes sharing data and working wonderfully and suddenly there is a problem and a section of that cluster fails. If the database is "partition tolerant" it'll still work despite the sudden lack of some of its nodes.
Comments
Post a Comment