Summary: This section describes how to control the recovery process, in which a newly instantiated backup space recovers all the data from its primary space.

Overview

The recovery process is initiated when a space is started. Transient spaces in a replication group can be configured to pull their content from another space in the group after they initialize. This is achieved by setting the recovery property to true in the replication group settings. Persistent spaces do not pull all of the source space's data; they pull only the changes that took place while they were offline (the source space maintains a redo log for every target space).

Recovery involves two types of data to recover:

  • Space objects
  • Notify registrations

Space object recovery involves two phases: a snapshot phase and a completion phase. During the snapshot phase, all source space objects are sent to the target space in batches. The completion phase plays back the operations that accumulated on the source space while the snapshot phase was running.
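
The two phases can be illustrated with a minimal, single-threaded sketch. It is not the product's implementation: the class TwoPhaseRecoverySketch, the Operation record, and the use of plain maps in place of spaces are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TwoPhaseRecoverySketch {

    /** Hypothetical operation logged on the source; a null value means "remove". */
    record Operation(String uid, String value) {
    }

    /** Copies the snapshot in batches, then replays the operations logged meanwhile. */
    static Map<String, String> recover(Map<String, String> snapshot,
                                       List<Operation> accumulated,
                                       int batchSize) {
        Map<String, String> target = new LinkedHashMap<>();

        // Snapshot phase: send the source objects in fixed-size batches.
        List<Map.Entry<String, String>> entries = new ArrayList<>(snapshot.entrySet());
        for (int from = 0; from < entries.size(); from += batchSize) {
            int to = Math.min(from + batchSize, entries.size());
            for (Map.Entry<String, String> e : entries.subList(from, to)) {
                target.put(e.getKey(), e.getValue());
            }
        }

        // Completion phase: play back what changed during the snapshot phase.
        for (Operation op : accumulated) {
            if (op.value() == null) {
                target.remove(op.uid());
            } else {
                target.put(op.uid(), op.value());
            }
        }
        return target;
    }

    public static void main(String[] args) {
        Map<String, String> snapshot = new LinkedHashMap<>();
        snapshot.put("uid-1", "a");
        snapshot.put("uid-2", "b");
        snapshot.put("uid-3", "c");

        List<Operation> accumulated = List.of(
                new Operation("uid-2", "b2"),  // update
                new Operation("uid-3", null),  // remove
                new Operation("uid-4", "d"));  // new write

        // Prints {uid-1=a, uid-2=b2, uid-4=d}
        System.out.println(recover(snapshot, accumulated, 2));
    }
}
```

The target ends up matching the source's latest state even though the snapshot itself was already stale when it was copied.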

During the space objects recovery phase, the space logs each batch replication event. Once the recovery process is complete, a full report is logged, including the total number of recovered space objects and notify registrations, and their class types. During the recovery phase, the source space is available, and the target space is unavailable to clients.

The recovery property can specify that the space should restore its content from one of the members in its replication group, or from a specific space (the "recovery source"). If there is no available source space to recover from, an error is logged, and recovery is not performed.

Important notes:

  • Replication input filter events are called during recovery (into the target space).
  • Space filter events are not called during recovery.
  • The restarted space locates a space to recover from using the Jini Lookup Service. Each replication group has a unique name, and the restarted space looks for a matching space in the same replication group to recover from.
  • Partial recovery - the restarted space recovers only classes with the @SpaceClass(replicate=true) annotation (turned on only when partial replication is enabled).

The Recovery Process

When a primary space identifies that its replica backup space is not available, or when a memory recovery is running, it starts to accumulate the operations in its redo log.

The redo-log entry includes:

  • 3 long data types
  • 3 int data types
  • Class name (string data type)
  • UID (string data type)
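
As a rough aid for sizing, the entry layout listed above can be modeled as follows. This is only a sketch: the source states the field types but not their names or meaning, so every field name here is a placeholder, and the byte estimate assumes one byte per string character and ignores JVM object headers, references, and padding.

```java
public class RedoLogEntrySketch {

    /**
     * Mirrors the field types listed above. All field names are placeholders
     * invented for this sketch; the source only states the types.
     */
    record RedoLogEntry(long long1, long long2, long long3,
                        int int1, int int2, int int3,
                        String className, String uid) {
    }

    /** Bytes taken by the six primitive fields alone: 3 * 8 + 3 * 4 = 36. */
    static final int PRIMITIVE_BYTES = 3 * Long.BYTES + 3 * Integer.BYTES;

    /**
     * Rough lower bound on the payload of one entry. Assumes one byte per
     * character and ignores object headers, references, and padding.
     */
    static long approxPayloadBytes(RedoLogEntry entry) {
        return PRIMITIVE_BYTES + entry.className().length() + entry.uid().length();
    }

    public static void main(String[] args) {
        RedoLogEntry entry = new RedoLogEntry(1L, 2L, 3L, 4, 5, 6,
                "com.example.Order", "uid-0001");
        System.out.println(approxPayloadBytes(entry) + " bytes (approximate)");
    }
}
```

The actual footprint per entry on a real JVM is larger, so treat the figure as a lower bound when estimating redo log memory.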

The primary space accumulates destructive operations in the redo log in a similar manner whether the cause is a replication channel disconnection, a replica space shutdown, or a total failure. This will be changed in future versions.

When the primary space identifies that its backup replica space becomes available (replication connection is re-established):

  • The primary space clears its redo log. With persistent spaces, this phase does not occur if the backup has data in its database.
  • The primary space sends a full image of its content to the backup (memory recovery). With persistent spaces, this phase does not occur if the backup has data in its database. Starting with GigaSpaces 6.5, multiple threads are responsible for sending the content of the primary space to the replica.
  • The primary space accumulates incoming operations within its redo log, and sends these to the backup once the memory recovery step is completed.
  • The replica space includes a mechanism that ensures that no duplicate operations are conducted.
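
The source does not describe how the replica avoids applying an operation twice. One common way to get that property, shown here purely as an assumption, is to tag each replicated operation with an increasing sequence number and skip anything at or below the last number applied. The names DuplicateFilterSketch, ReplicatedOp, and Replica are hypothetical, and the sketch assumes operations arrive in order.

```java
import java.util.ArrayList;
import java.util.List;

public class DuplicateFilterSketch {

    /** Hypothetical replicated operation tagged with an increasing sequence number. */
    record ReplicatedOp(long sequence, String description) {
    }

    /** Replica-side guard that applies each sequence number at most once. */
    static class Replica {
        private long lastApplied = 0;
        private final List<String> applied = new ArrayList<>();

        /** Returns true if the operation was applied, false if it was skipped as a duplicate. */
        boolean apply(ReplicatedOp op) {
            if (op.sequence() <= lastApplied) {
                return false; // already seen, for example resent after a reconnect
            }
            applied.add(op.description());
            lastApplied = op.sequence();
            return true;
        }

        List<String> applied() {
            return applied;
        }
    }

    public static void main(String[] args) {
        Replica replica = new Replica();
        replica.apply(new ReplicatedOp(1, "write uid-1"));
        replica.apply(new ReplicatedOp(2, "take uid-1"));
        replica.apply(new ReplicatedOp(2, "take uid-1")); // resent, skipped
        replica.apply(new ReplicatedOp(3, "write uid-2"));

        // Prints [write uid-1, take uid-1, write uid-2]
        System.out.println(replica.applied());
    }
}
```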

With in-memory spaces, the redo log is essentially relevant only for the duration of the recovery time.

This means that in some cases, you can safely limit the redo log capacity, since memory recovery time is relatively short, and few elements should accumulate in the redo log during this period. Over a LAN, recovery can replicate about 50,000 redo log space objects per second, each 1K in size, when using multiple replication threads (5 threads).
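
A back-of-the-envelope estimate of the redo log capacity needed during memory recovery can be sketched as below. The workload figures (5,000,000 objects in the space, 2,000 incoming operations per second) are invented for illustration, and the sketch assumes the 50,000 objects per second figure applies to the whole recovery; measure your own rates before relying on the result.

```java
public class RedoLogCapacitySketch {

    /** Seconds needed to copy the space at the given rate, rounded up. */
    static long recoverySeconds(long objectCount, long objectsPerSecond) {
        return (objectCount + objectsPerSecond - 1) / objectsPerSecond; // ceiling division
    }

    /** Redo log entries that accumulate while recovery runs. */
    static long redoLogEntries(long objectCount, long objectsPerSecond,
                               long incomingOpsPerSecond) {
        return recoverySeconds(objectCount, objectsPerSecond) * incomingOpsPerSecond;
    }

    public static void main(String[] args) {
        long objectCount = 5_000_000L;      // hypothetical space size
        long recoveryRate = 50_000L;        // objects per second, from the LAN figure above
        long incomingOpsPerSecond = 2_000L; // hypothetical write load during recovery

        // Prints 100
        System.out.println(recoverySeconds(objectCount, recoveryRate));
        // Prints 200000
        System.out.println(redoLogEntries(objectCount, recoveryRate, incomingOpsPerSecond));
    }
}
```

Under these assumptions, recovery takes about 100 seconds and the redo log needs room for roughly 200,000 entries, plus whatever safety margin you choose.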
