Monday, May 11, 2015

NoSQL DB - Managing Consistency, Reliability, Isolation, Concurrency & Replication in Distributed Systems

Consistency, Reliability, Isolation, Concurrency & Replication is very closely related concepts when designing a distributed system. I wanted to give a 1000 ft overview of considerations in distributed database design.
Consistency: Consistency of a transaction is its correctness. A transaction is consistent if it maps one consistent database state to another.[1] Simple ;-)
Following are the three important factors, which interplay with consistency. When you talk about consistency in distributed systems, the impact on reliability, replication and concurrency need to be discussed.

Consistency Buddies:[2][1]
Reliability
Concurrency
Replication


Isolation: Isolation specifies the level of correctness/ consistency.

SQL-92 has defined a set of isolation levels:
Read Uncommitted (Dirty Reads) – Allows a transaction to read uncommitted data by another transaction. [1]
Read Committed (Non-Repeatable Reads) – Allows a transaction to update value read by another open transaction. So when a transaction does two reads on the same data, it returns different results [1]
Repeatable Reads (Phantom reads) – Allows a transaction to add/delete record in a dataset/table while another transaction is doing query on that dataset/table. So if a transaction does two searches, the search results would be different.[1]
Serializable – All transactions are executed one after another [1]

Other Isolation Levels:
Snapshot Read  - Provides repeatable reads but not serializable read. Each transaction sees a snapshot of database when it starts and its reads and writes are performed on this snapshot. [1]
Read Atomic – New ! – Provides repeatable reads but not serializable read. All or none of transaction updates are visible.[3]

Reliability:
Ability to maintain consistency in the event of failures defines reliability of the system.
Types of failures: Machine failure, network failure, runtime failure,
CAP theorem:
C= Consistency
A = Availability
P = Partition

Partition refers to cluster failure. In case of distributed system, it is assumed that some servers will fail or network equipment will fail. System reliability is an issue. So partition is assumed. In case of partition, user has to choose between consistency and availability.
Consistency refers to the data that is read by the client. If user chooses consistency, then they will have lack of availability of data. So user will get message that the system is unavailable to process a request. Eg. In RTB, amount spend on a campaign would definitely prefer consistency over availability.
If user chooses availability, the user does not care of exactness of data. Use case of online reporting would prefer availability to consistency since the system can show data from an hour old record and its fine.

Non-blocking termination protocol -> a protocol is non-blocking if it permits a transaction to terminate at the operational sites without waiting for the recovery of the failed site.[1]

2-phase commit – It is a simple and elegant protocol to establish reliability in distributed systems. It extends the effects of local atomic commit actions to distributed transactions by insisting that all sites involved in the execution of a distributed transaction agree to commit the transaction before its effects are made permanent[1]


Concurrency:
The main concern in designing a concurrency control algorithm is to correctly process transactions that are in conflict. [3]
 The following shows classification of concurrency control approaches [4]



The most common classification criterion, however, is the synchronization prim- itive. The corresponding breakdown of the concurrency control algorithms results in two classes [Bernstein and Goodman, 1981]: those algorithms that are based on mutually exclusive access to shared data (locking), and those that attempt to order the execution of the transactions according to a set of rules (protocols). However, these primitives may be used in algorithms with two different viewpoints: the pessimistic view that many transactions will conflict with each other, or the optimistic view that not too many transactions will conflict with one another.[1]
Locking based concurrency: One common implementation to enforce lock-based consistency is using 2-phase locking. [1]
Timestamp based concurrency: This is implemented by assigning a unique timestamp to each client action. [1]
The basic method involves having time synchronized by message passing
The multi version concurrency control allows the read to take place on the record with latest timestamp. The transactions involving write operations are queued based on the timestamp of arrival. However reads are never blocked. This concurrency mechanism is very useful for read heavy transactions.[1]

Timestamp based optimistic concurrency control delay the validation of the transaction until the final commit. [1]

The disadvantage again of timestamp based validation is that more storage is required to keep all the terminated transactions which were in progress when the particular transaction started.[1]

Replication:[1]
Replication poses significant benefits but its takes effort in keeping all the copies in sync.
Mutual Consistency: It is consistency with respect to replication that can be weak or strong.

Where updates are performed: centralized if update is done on the master copy first or distributed if they allow updates over any replica. Centralized techniques can be further identified as single master or primary copy where the master copy of each data item may be different [1]
Update Propagation: once updates are performed on a replica, the next decision is how updates are propagated to the others. The alternatives are identified as eager or lazy. Eager technique updates all replicas before the transaction commits. Lazy propagation updates replicas after transaction commits. Eager techniques are further identified according to when they push each write to the other replicas. [1]
Four combinations are possible eager centralized, eager distributed, lazy centralized, lazy distributed. [1]         

Mutual Consistency vs Transactional Consistency:
Both are related but different. Mutual consistency refers to replicas converging to the same value while transaction consistency requires that the global execution history be serializable. [1]

1. Principles of Distributed systems by M. Tamer Ozsu, Patrick Valduriez