Summary: This page describes what synchronous replication is and how to configure it.
Overview
With synchronous replication, the client receives an acknowledgement for a replicated operation only after all the space instances in the replication group have performed that operation. This replication mode is best suited to the primary-backup topology, where the application needs a guarantee that if the primary space instance fails, no operations that were already executed and no data that was already saved will be lost. The trade-off is that synchronous replication carries the highest performance penalty of the replication types, because an operation does not complete until every target space instance in the group has received and acknowledged it.

How to Turn on Synchronous Replication
To enable synchronous replication, set the cluster-config.groups.group.repl-policy.replication-mode property to sync. For example, in the pu.xml:

    <os-core:space id="space" url="/./mySpace">
        <os-core:properties>
            <props>
                <prop key="cluster-config.groups.group.repl-policy.replication-mode">sync</prop>
            </props>
        </os-core:properties>
    </os-core:space>

When to Use Synchronous Replication
Synchronous replication is most beneficial in the following scenarios:
How Synchronous Replication Works
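The acknowledgement rule from the overview can be illustrated with a deliberately simplified model. The sketch below is not GigaSpaces code: the class AckTimingModel and its methods are hypothetical, a single backup stands in for the whole replication group, and returning from write() stands in for the client receiving its acknowledgement. It contrasts synchronous replication with a mode in which operations are acknowledged before they are shipped, to show why only the synchronous mode guarantees that the backup holds every acknowledged operation when the primary fails.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/**
 * Hypothetical, simplified primary/backup model (not a GigaSpaces API) showing when
 * the client is acknowledged under synchronous vs. asynchronous replication.
 */
final class AckTimingModel {
    private final List<String> primary = new ArrayList<>();
    private final List<String> backup = new ArrayList<>();
    private final Deque<String> pending = new ArrayDeque<>(); // async ops not yet shipped
    private final boolean sync;

    AckTimingModel(boolean sync) {
        this.sync = sync;
    }

    /** Executes op on the primary; returning from this method models the client acknowledgement. */
    void write(String op) {
        primary.add(op);
        if (sync) {
            backup.add(op);      // sync: the backup applies op before the client is acknowledged
        } else {
            pending.addLast(op); // async: the client is acknowledged now, op is shipped later
        }
    }

    /** Ships all pending asynchronous operations to the backup, in order. */
    void flush() {
        while (!pending.isEmpty()) {
            backup.add(pending.pollFirst());
        }
    }

    /** The primary crashes: unshipped ops are lost; returns what the promoted backup holds. */
    List<String> failPrimary() {
        pending.clear();
        return new ArrayList<>(backup);
    }

    /** Every operation the client has been acknowledged for. */
    List<String> acknowledged() {
        return new ArrayList<>(primary);
    }
}
```

In the synchronous mode, failPrimary() always returns the same operations as acknowledged(); in the asynchronous mode, an operation still pending at the time of the failure was acknowledged but is missing from the backup. The price is that every synchronous write() waits for the backup, which is the performance penalty described in the overview.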
Handling Disconnections and Errors
When a replication target space instance is unavailable (disconnected), or an error occurs while the target processes the replication data, the synchronous replication channel between the source and that specific target switches to an asynchronous operating state. While in this state, all replicated operations are accumulated in a backlog called the replication redolog, and a dedicated worker attempts to replicate the entries from the redolog to the target space instance in batches. The worker succeeds in sending the accumulated replication data once the connection is re-established or the error at the target is resolved; once the redolog has been fully replicated, the channel returns to the synchronous operating state. During the asynchronous period, the client receives acknowledgements for its operations without waiting for them to be replicated to that target, so an unavailable target does not halt the cluster.
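The channel behavior described above can be thought of as a two-state machine per source/target pair. The following is a minimal sketch under simplifying assumptions, not the GigaSpaces implementation: the names ReplicationChannelModel, setTargetAvailable and workerTick are hypothetical, an explicit setTargetAvailable(false) call stands in for a detected disconnection or processing error, and the background worker runs only when workerTick() is called.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/**
 * Hypothetical model of one synchronous replication channel (source -> one target).
 * Not a GigaSpaces API; failure detection and the worker thread are simplified.
 */
final class ReplicationChannelModel {
    enum State { SYNC, ASYNC }

    private final Deque<String> redolog = new ArrayDeque<>();  // pending replication data
    private final List<String> targetData = new ArrayList<>(); // what the target has applied
    private final int batchSize;
    private State state = State.SYNC;
    private boolean targetAvailable = true;

    ReplicationChannelModel(int batchSize) {
        if (batchSize <= 0) throw new IllegalArgumentException("batchSize must be positive");
        this.batchSize = batchSize;
    }

    /** Replicates one operation; returning from this method models the client acknowledgement. */
    void replicate(String op) {
        if (state == State.SYNC) {
            targetData.add(op);  // sync: applied at the target before the client is acknowledged
        } else {
            redolog.addLast(op); // async: queued in the redolog, client acknowledged immediately
        }
    }

    /** Stand-in for a disconnection or error (false) and for reconnection or resolution (true). */
    void setTargetAvailable(boolean available) {
        targetAvailable = available;
        if (!available) {
            state = State.ASYNC; // the channel falls back to the asynchronous state
        }
    }

    /** One pass of the background worker: ships up to batchSize redolog entries, in order. */
    void workerTick() {
        if (state != State.ASYNC || !targetAvailable) {
            return;              // nothing to ship, or the target is still unreachable
        }
        for (int i = 0; i < batchSize && !redolog.isEmpty(); i++) {
            targetData.add(redolog.pollFirst());
        }
        if (redolog.isEmpty()) {
            state = State.SYNC;  // redolog fully replicated: back to synchronous operation
        }
    }

    State state() { return state; }
    int redologSize() { return redolog.size(); }
    List<String> targetData() { return new ArrayList<>(targetData); }
}
```

With a batch size of 2, for example, three operations queued during a disconnection are drained over two worker passes once the target is reachable again, after which the channel is back in the SYNC state. Operations issued while the channel is still ASYNC go to the end of the redolog, which keeps them in order behind the backlog.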
Behavior During Recovery
In the scenario above, a target space instance may become unavailable because it was restarted or relocated for various reasons (failure, or manual or automatic relocation). With the default settings, when that target space instance restarts, it performs a recovery from a source space instance: in a primary-backup topology the source is the primary space instance, and in an active-active topology it can be any space instance. During recovery, the replication channel operates in the asynchronous state until the redolog has been replicated, as in the scenario above. The target space instance does not become available until the channel's operating state has been restored to synchronous. This ensures that once the target space instance is available and visible, the backup (or other target space instance) is fully synchronized with its source.

Throttling
When a synchronous replication channel is operating in the asynchronous state, a dedicated throttling mechanism limits the rate of replicated operations to ensure two things:
This throttling can be configured with the following parameters:
To change the default replication settings, modify the space properties at deployment time. You can set these properties in the pu.xml or programmatically. The following example sets a replication parameter in the pu.xml:

    <os-core:space id="space" url="/./mySpace">
        <os-core:properties>
            <props>
                <prop key="cluster-config.groups.group.repl-policy.sync-replication.throttle-when-inactive">false</prop>
            </props>
        </os-core:properties>
    </os-core:space>

Asynchronous Operating State Related Configuration
While the replication channel is operating in the asynchronous state for the reasons described above, the worker that sends the data from the redolog is affected by the following configuration (which also applies to asynchronous replication):
Splitting Replication of Large Batches into Smaller Batches
When batch operations (writeMultiple, takeMultiple, clear) are performed in synchronous replication mode, the actual data (space objects or UIDs) is replicated to the target spaces in batches during the single space operation. This avoids two issues: running out of memory because of all the replication data generated in the redolog, and exceeding the redolog capacity limit. For example, when performing a clear (take) operation, you don't necessarily know how many space objects exist in the space, yet all of them need to be removed. These operations are therefore split into several chunks, which improves memory usage, stability, and scalability.

Splitting large batches into chunks is controlled by the cluster-config.groups.group.repl-policy.sync-replication.multiple-opers-chunk-size parameter. Its default value is 10000, which means that by default the operation is performed in chunks of 10000 objects each. To split the replication activity into smaller chunks, override this property, for instance in the pu.xml:

    <os-core:space id="space" url="/./mySpace">
        <os-core:properties>
            <props>
                <prop key="cluster-config.groups.group.repl-policy.sync-replication.multiple-opers-chunk-size">5000</prop>
            </props>
        </os-core:properties>
    </os-core:space>
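The chunking described above can be sketched as follows, assuming a batch operation such as clear that iterates over matching objects without knowing their count in advance. This is an illustrative sketch, not GigaSpaces code: ChunkedReplicationModel and processInChunks are hypothetical names, and the chunkSize argument plays the role of the multiple-opers-chunk-size property.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

/**
 * Hypothetical sketch (not GigaSpaces code) of chunked replication for a batch
 * operation such as clear, where the number of matching objects is not known up front.
 */
final class ChunkedReplicationModel {
    private ChunkedReplicationModel() {}

    /**
     * Drains matches and hands them to replicateChunk in chunks of at most chunkSize,
     * so this method itself buffers at most one chunk at a time.
     * Returns the number of chunks replicated.
     */
    static <T> int processInChunks(Iterator<T> matches, int chunkSize, Consumer<List<T>> replicateChunk) {
        if (chunkSize <= 0) throw new IllegalArgumentException("chunkSize must be positive");
        int chunks = 0;
        List<T> chunk = new ArrayList<>();
        while (matches.hasNext()) {
            chunk.add(matches.next());
            if (chunk.size() == chunkSize) { // chunk is full: replicate it and start a new one
                replicateChunk.accept(chunk);
                chunk = new ArrayList<>();
                chunks++;
            }
        }
        if (!chunk.isEmpty()) {              // replicate the final, partial chunk
            replicateChunk.accept(chunk);
            chunks++;
        }
        return chunks;
    }
}
```

With the default chunk size of 10000, for example, 25000 matching objects would be replicated as three chunks of 10000, 10000 and 5000 objects; lowering the chunk size to 5000 would produce five chunks of 5000.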
GigaSpaces.com - Legal Notice - 3rd Party Licenses - Site Map - API Docs - Forum - Downloads - Blog - White Papers - Contact Tech Writing - Gen. by Atlassian Confluence