The core ideas of RYW (read-your-writes) consistency, as implemented in various NoSQL systems, are:
Let N = the number of copies of each record distributed across nodes of a parallel system.
Let W = the number of nodes that must successfully acknowledge a write for it to be deemed committed. By definition, W <= N.
Let R = the number of nodes that must send back the same value of a unit of data for that value to be accepted as the result of a read. By definition, R <= N.
The greater N-R and N-W are, the more node or network failures you can typically tolerate without blocking work.
As long as R + W > N, you are assured of RYW consistency.
Example: Let N = 3, W = 2, and R = 2. Suppose you write a record successfully to at least two nodes out of three, and then poll all three nodes. At most one node can hold a stale value, so any two nodes that return matching values must include at least one that received the write, and hence the value they agree on is the one that was correctly and successfully written in the first place.
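That overlap argument can be sketched in a few lines of code. This is a toy simulation under the N = 3, W = 2, R = 2 setup above; the node names and the dict-based "cluster" are illustrative, not any particular NoSQL system's API.

```python
# Toy quorum simulation: N = 3 replicas, W = 2 write acks, R = 2 agreeing reads.
from collections import Counter

N, W, R = 3, 2, 2
assert R + W > N  # the RYW condition

# Each node holds its own copy of the record; all start out stale.
cluster = {"node_a": "old", "node_b": "old", "node_c": "old"}

def write(value, reachable_nodes):
    """A write commits only if at least W nodes acknowledge it."""
    if len(reachable_nodes) < W:
        raise RuntimeError("write failed: fewer than W acks")
    for node in reachable_nodes:
        cluster[node] = value

def read():
    """Poll all nodes; accept a value only if at least R of them agree."""
    value, votes = Counter(cluster.values()).most_common(1)[0]
    if votes < R:
        raise RuntimeError("read failed: no R-node agreement")
    return value

# The write reaches only two of the three nodes...
write("new", ["node_a", "node_b"])
# ...yet the quorum read still returns the new value: at most one node
# is stale, so any two agreeing nodes include one that saw the write.
print(read())  # -> new
```

Note that the read does not repair the stale copy on `node_c`; in real systems that cleanup is part of the eventual-consistency machinery discussed below.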
In a conventional parallel DBMS, N = R = W, which is to say N-R = N-W = 0. Thus, a single hardware failure causes data operations to fail too. For some applications — e.g., highly parallel OLTP web apps — that kind of fragility is deemed unacceptable.
On the other hand, if W < N, it is possible to construct edge cases in which two or more consecutive failures cause incorrect data values to actually be returned. So you want to clean up any discrepancies quickly and bring the system back to a consistent state. That is where the idea of eventual consistency comes in, although you definitely can — and in some famous NoSQL implementations actually do — have eventual consistency in a system that is not RYW consistent.
Much technology goes into eventual consistency, as well as into the data distribution and polling in the first place. And in tunable systems, the choices of N, R, and W — perhaps on a “table” by “table” basis — can get pretty interesting. I’m ducking all those subjects for now, however, not least because of how much I still have to learn about them.
One point I will note, however, is this — RYW consistency and table joins make for awkward companions. If you want to join two tables, each of them distributed across some kind of parallel cluster, there are only two possibilities:
- The data you need to join is co-located on the same nodes, or
- You’re going to have an awful lot of network traffic.
In an R = W = N scenario, co-location may be realistic. But when R < N and W < N, a join can return incorrect results even when both of the tables being joined would have been read correctly.
In our example above, we had N = 3 and R = W = 2. Single-table RYW consistency was ensured. But suppose you join two records, each of which had been written correctly to 2 out of 3 nodes — but with only 1 node being correct about both records. Then only that 1 node out of 3 will return a correct value for the join, and badness will ensue.
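The anomaly in that example is easy to reproduce in code. This is a toy illustration with an assumed node layout (not any real system): N = 3, R = W = 2, and two records X and Y each written to a different pair of nodes, so only one node is up to date on both.

```python
# Each node's local replica; X missed node_c's copy, Y missed node_a's.
from collections import Counter

nodes = {
    "node_a": {"X": "X_new", "Y": "Y_old"},  # missed the write of Y
    "node_b": {"X": "X_new", "Y": "Y_new"},  # saw both writes
    "node_c": {"X": "X_old", "Y": "Y_new"},  # missed the write of X
}

def quorum_read(key, r=2):
    """Single-record read: accept the value at least r nodes agree on."""
    value, votes = Counter(rep[key] for rep in nodes.values()).most_common(1)[0]
    assert votes >= r, "no quorum"
    return value

# Each single-table read is RYW-consistent on its own:
assert quorum_read("X") == "X_new"
assert quorum_read("Y") == "Y_new"

# But a node-local join of X and Y is fully correct on only one node:
correct = [name for name, rep in nodes.items()
           if rep["X"] == "X_new" and rep["Y"] == "Y_new"]
print(correct)  # -> ['node_b']
```

So even though both records individually clear the R = 2 bar, two of the three node-local join results mix in a stale value, and no read quorum of agreeing join results exists.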
Any architecture I can think of to circumvent that problem results in — you guessed it — an awful lot of network traffic.