Summary: How a space instance recovers its data when it is first started, when it is relocated, or after an unexpected failure.
Overview
Recovery is a process that runs when a space instance starts up or is relocated. It synchronizes the space instance's data with another space instance in its replication group. When a space instance is the first to start and has no other space instance to recover from, its data is loaded from the External Data Source if one is defined; otherwise the space starts empty.

Terminology
Recovery Process
The recovery process has two phases: a snapshot phase and a completion phase.

Snapshot Phase
All space objects are copied from the source space instance to the target (recovering) space instance in batches. The copying is done concurrently by multiple threads.
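The batched, multi-threaded copy of the snapshot phase can be pictured with a small model. This is a minimal sketch, assuming plain Java maps stand in for the source and target space instances; `SnapshotPhaseSketch`, `copySnapshot`, the batch size and the thread count are hypothetical and are not part of the GigaSpaces recovery API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/**
 * Hypothetical model of the snapshot phase: plain maps stand in for space instances.
 * The source's objects are split into fixed-size batches and a pool of worker
 * threads copies the batches into the target concurrently.
 */
final class SnapshotPhaseSketch {

    static void copySnapshot(Map<String, Object> source, Map<String, Object> target,
                             int batchSize, int threads) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        // Stable key list so the batches are disjoint; the target must be thread-safe.
        List<String> keys = new ArrayList<>(source.keySet());
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (int start = 0; start < keys.size(); start += batchSize) {
                List<String> batch = keys.subList(start, Math.min(start + batchSize, keys.size()));
                // Each batch is copied by one worker thread.
                pending.add(pool.submit(() -> {
                    for (String key : batch) {
                        target.put(key, source.get(key));
                    }
                }));
            }
            // Wait for every batch; a failed batch fails the whole snapshot.
            for (Future<?> f : pending) {
                f.get();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("snapshot interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("snapshot batch failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        Map<String, Object> source = new ConcurrentHashMap<>();
        for (int i = 0; i < 10_000; i++) {
            source.put("obj-" + i, i);
        }
        Map<String, Object> target = new ConcurrentHashMap<>();
        copySnapshot(source, target, 500, 4);
        System.out.println("copied " + target.size() + " objects"); // copied 10000 objects
    }
}
```

Splitting by a fixed key list keeps the batches disjoint, so the workers never write the same key and the only shared mutable structure is the thread-safe target map.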
Completion Phase
Operations performed on the source space during the snapshot phase are not part of the recovered snapshot. They are therefore accumulated in the source space instance's redo log and, once the snapshot phase is finished, sent to the target space via replication.
The completion phase finishes according to the consistency requirements of the replication type.
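How the redo log closes the gap between the snapshot and the live source can be shown with a minimal single-threaded model. It assumes the snapshot is taken atomically and that every operation is a keyed write or removal; `CompletionPhaseSketch`, `RedoOp` and their methods are hypothetical names, not GigaSpaces classes.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

/**
 * Hypothetical single-threaded model of the completion phase.
 * Writes on the source during recovery are appended to a redo log and
 * replayed, in order, on the target once the snapshot has been delivered.
 */
final class CompletionPhaseSketch {

    /** One redo-log record: a write, or a removal when value is null. */
    static final class RedoOp {
        final String key;
        final Integer value;

        RedoOp(String key, Integer value) {
            this.key = key;
            this.value = value;
        }
    }

    final Map<String, Integer> source = new HashMap<>();
    private final Queue<RedoOp> redoLog = new ArrayDeque<>();
    private boolean recovering;

    /** Applies an operation to the source and logs it while a recovery is in progress. */
    void write(String key, Integer value) {
        apply(source, key, value);
        if (recovering) {
            redoLog.add(new RedoOp(key, value));
        }
    }

    /** Snapshot phase (simplified to an atomic copy); later writes go to the redo log. */
    Map<String, Integer> beginRecovery() {
        recovering = true;
        return new HashMap<>(source);
    }

    /** Completion phase: drain the redo log into the target in log order. */
    void complete(Map<String, Integer> target) {
        for (RedoOp op = redoLog.poll(); op != null; op = redoLog.poll()) {
            apply(target, op.key, op.value);
        }
        recovering = false;
    }

    private static void apply(Map<String, Integer> map, String key, Integer value) {
        if (value == null) {
            map.remove(key);
        } else {
            map.put(key, value);
        }
    }

    public static void main(String[] args) {
        CompletionPhaseSketch space = new CompletionPhaseSketch();
        space.write("a", 1);
        space.write("b", 2);
        Map<String, Integer> target = space.beginRecovery(); // snapshot {a=1, b=2}
        space.write("b", 20);   // these three operations miss the snapshot
        space.write("c", 3);
        space.write("a", null); // removal
        space.complete(target);
        System.out.println(target.equals(space.source)); // true
    }
}
```

In a real cluster the snapshot is copied while writes continue, so a replayed operation may already be reflected in the snapshot. Replaying a keyed write or removal in log order is idempotent, which is why the model uses only those two operation kinds.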
Once the recovery process is complete, a full report is logged, including the total number of recovered space objects and notify registrations and their class types.
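The kind of summary described above (totals plus a per-class breakdown) can be sketched as a simple grouping. This is an illustration only, assuming recovered objects and notify templates are available as lists; `RecoveryReportSketch`, `Order`, `Customer` and the output wording are hypothetical and do not reproduce the actual GigaSpaces log message.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical recovery report: totals plus per-class counts.
 * The message format is illustrative, not the product's real log output.
 */
final class RecoveryReportSketch {

    // Illustrative space classes.
    static final class Order {}
    static final class Customer {}

    /** Counts objects by simple class name; TreeMap keeps the output sorted. */
    static Map<String, Integer> countByClass(List<?> objects) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Object o : objects) {
            counts.merge(o.getClass().getSimpleName(), 1, Integer::sum);
        }
        return counts;
    }

    static String report(List<?> recoveredObjects, List<?> notifyTemplates) {
        return "Recovered " + recoveredObjects.size() + " objects: "
                + countByClass(recoveredObjects) + "\n"
                + "Recovered " + notifyTemplates.size() + " notify registrations: "
                + countByClass(notifyTemplates);
    }

    public static void main(String[] args) {
        List<Object> recovered = List.of(new Order(), new Order(), new Customer());
        List<Object> notifyTemplates = List.of(new Order());
        System.out.println(report(recovered, notifyTemplates));
    }
}
```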
Primary-Backup Topology
In primary-backup topologies the recovering space instance is always a backup; primary space instances don't recover their data from other spaces.

Primary-Backup Persistent Topology
Primary and backup space instances use the same database to store their data. The space is the system of record, and the data is usually persisted through the Mirror service.

All In Cache Policy
A backup instance recovers all its data from the primary instance; data is not loaded from the database. This ensures that any data changes made on the primary during the recovery process are consistent on the backup once recovery finishes.

LRU Cache Policy
A backup instance recovers only transient entries from the primary instance; data is not loaded from the database. Since the primary and backup use the same database instance, the data is loaded into the backup on demand.

Backup Instance Recovery Failure Handling
If a backup space instance recovery process fails, it is handled in the following way:
Any other failure (SpaceMemoryShortageException, database not available, etc.) is retried 3 times before failing.

Active-Active Topology
In active-active topologies the recovering space instance connects to one of the space instances in its replication group and recovers all the data from it.

Active-Active Persistent Topology
Replicated space instances keep and manage their data in separate databases. In this scenario:
For further information and configuration options, see Distributed Databases.

Space Instance Recovery Failure Handling
If the recovery process fails, it is retried 3 times before failing.

Notify-Recovery
When a partitioned space is started, each partition gets the list of existing notify registrations from the other partitions. This is useful when a partition fails and is restarted, because it spares the client from re-issuing notify registrations against the restarted partition.
If you are not using notifications, or are running a partitioned space with backups, you may disable this mechanism using the cluster-config.notify-recovery property (boolean). Disabling it speeds up space deployment: when a partition looks for the other partitions and they have not been started yet, it might take some time for the entire clustered space to become fully available. By default this property is enabled (true).
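The effect of cluster-config.notify-recovery can be modeled in a few lines. Only the property name and its default of true come from this page; `NotifyRecoverySketch`, `Partition` and `start` are hypothetical, and how the property is actually supplied to a space depends on your deployment configuration.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Properties;
import java.util.Set;

/**
 * Hypothetical model of notify recovery in a partitioned space.
 * Only the property name and its default (true) are taken from the documentation.
 */
final class NotifyRecoverySketch {

    static final String NOTIFY_RECOVERY = "cluster-config.notify-recovery";

    static final class Partition {
        final Set<String> notifyRegistrations = new HashSet<>();
        boolean started;
    }

    /** Enabled unless the property is explicitly set to false. */
    static boolean notifyRecoveryEnabled(Properties config) {
        return Boolean.parseBoolean(config.getProperty(NOTIFY_RECOVERY, "true"));
    }

    /**
     * On (re)start, optionally copies existing registrations from the other partitions.
     * The real mechanism may wait for peers that have not started yet (the deployment
     * delay described above); this sketch simply skips them.
     */
    static void start(Partition self, List<Partition> others, Properties config) {
        if (notifyRecoveryEnabled(config)) {
            for (Partition peer : others) {
                if (peer.started) {
                    self.notifyRegistrations.addAll(peer.notifyRegistrations);
                }
            }
        }
        self.started = true;
    }

    public static void main(String[] args) {
        Partition p1 = new Partition();
        p1.started = true;
        p1.notifyRegistrations.add("Order:write");

        // With the default setting, a restarted partition recovers p1's registrations.
        Partition p2 = new Partition();
        start(p2, List.of(p1), new Properties());
        System.out.println(p2.notifyRegistrations); // [Order:write]

        // With notify recovery disabled, the partition starts with no registrations.
        Properties disabled = new Properties();
        disabled.setProperty(NOTIFY_RECOVERY, "false");
        Partition p3 = new Partition();
        start(p3, List.of(p1), disabled);
        System.out.println(p3.notifyRegistrations); // []
    }
}
```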