Summary: This section describes split brain in a Data-Grid cluster and the primary resolution flow.
Overview
Split brain occurs when two or more primary instances are running for the same partition. In most cases the cause is a network disruption that prevents each space instance from communicating with all running lookup services. Typically each space instance ends up communicating with a different lookup service rather than with all of them (two in most cases). During the split, clients may communicate with either primary, treating it as the master data copy and updating the data within the space. If the active GSM also loses connectivity with both lookup services, it may provision a new backup.

Once connectivity between all instances of a given partition and all lookup services is restored and a split brain is identified (more than one primary instance is found for a given partition), the system determines which instance remains the primary and which becomes a backup instance. The data within the instance that is demoted to backup is dropped.

Primary resolution may involve several steps. Each step tries to determine the most recent primary and its data consistency level. If the first step cannot determine the primary (a tie), the second step is executed (tie break). If the second step cannot determine the primary, the third step is executed (second tie break). Once a primary is elected, the other instance moves into backup mode and recovers its entire data set from the elected primary.
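The tie-break cascade can be pictured as an ordered comparison over the competing primaries. The following is a minimal sketch only, not the product's internal implementation: the class `SplitBrainResolver`, the `Candidate` type, and its fields are hypothetical names introduced for illustration, and the inconsistency rank and election time are assumed to have been collected already, as described under Primary Resolution.

```java
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch of the three-step primary resolution cascade. */
final class SplitBrainResolver {

    /** Illustrative snapshot of one competing primary; all names are assumptions. */
    static final class Candidate {
        final String instanceName;       // step three key: lowest lexical value wins
        final int inconsistencyRank;     // step one key: lowest rank wins
        final long electionTimeMillis;   // step two key: earliest election wins

        Candidate(String instanceName, int inconsistencyRank, long electionTimeMillis) {
            this.instanceName = instanceName;
            this.inconsistencyRank = inconsistencyRank;
            this.electionTimeMillis = electionTimeMillis;
        }
    }

    // Each later key is consulted only when all earlier keys tie.
    static final Comparator<Candidate> RESOLUTION_ORDER =
            Comparator.comparingInt((Candidate c) -> c.inconsistencyRank)
                      .thenComparingLong(c -> c.electionTimeMillis)
                      .thenComparing(c -> c.instanceName);

    /** Returns the candidate that stays primary; the others would become backups. */
    static Candidate electPrimary(List<Candidate> candidates) {
        if (candidates.isEmpty()) {
            throw new IllegalArgumentException("at least one candidate is required");
        }
        return Collections.min(candidates, RESOLUTION_ORDER);
    }
}
```

Expressing the steps as a chained comparator keeps each tie break explicit: a later criterion can never override an earlier one, it only decides among candidates the earlier criteria could not separate.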
Primary Resolution
Resolution - Step One
Each primary is inspected by checking multiple properties. As a result of this process an inconsistency ranking is calculated. The primary with the lowest ranking is elected as the primary. If both primaries end up with the same inconsistency level, step two is executed. The inconsistency level is calculated using the mirror's active primary identity and various replication statistics. Because the mirror does not allow multiple primaries for the same partition, it is a useful input to the inconsistency level calculation.
Resolution - Step Two
Each primary is inspected for the exact time it was elected to be a primary. The election time is stored within the lookup service, and all lookup services are inspected during this process. The instance that was elected primary first remains the primary. If both primaries were elected at the same time, step three is executed.
Resolution - Step Three
The system reviews the primary instance names and chooses the one with the lowest lexical value to be the primary.
Common Causes For a Split-Brain
Below are the most common causes for Split-Brain scenarios and ways to detect them.
Islands
In the event of network disruptions or failures, the system might exhibit unexpected behavior, also called Islands, which is extreme and needs special handling. Here are two island scenarios you might encounter:
The solution for these scenarios is to manually reconcile the cluster: terminate one GSM so that only a single managing GSM remains, restart the GSCs hosting the backup space instances, and, as a last step, start the second GSM (which will become the backup GSM).