Summary: Reliable Asynchronous Persistency (Mirror)

Overview

The GigaSpaces Mirror Service (also known as Persistency as a Service, or PaaS) provides reliable asynchronous persistency. It lets you asynchronously delegate the operations performed on the In-Memory Data Grid (IMDG) to a backend database, significantly reducing the performance overhead of persistence.

The Mirror Service ensures that data is not lost in the event of a failure. You can add persistency to your application just by running the Mirror Service, without touching the real-time portion of your application in either configuration or code. The service also provides fine-grained control over which objects are persisted.

Enabling the Mirror Service involves the following:

  • The Data-Grid Processing Unit Mirror Settings
  • The Mirror Service Processing Unit Settings

The above share the same External Data Source settings but have different space settings. See the Hibernate External Data Source for details on how to use the built-in HibernateExternalDataSource.

The Data-Grid Processing Unit

The cluster-config.mirror-service space settings specify the interaction between the IMDG primary spaces and the Mirror Service. The mirror="true" attribute of the space element enables replication from the IMDG primary spaces to the Mirror Service. Once mirror="true" is specified, all IMDG members are Mirror aware and delegate their activities to the Mirror Service. Each IMDG primary instance replicates the operations logged in its redo log every interval-millis milliseconds or every interval-opers operations. Both triggers are always active, and whichever threshold is reached first triggers the replication event.

The IMDG Mirror replication settings include the following options:

| Property | Description | Default |
|----------|-------------|---------|
| cluster-config.mirror-service.url | Used to locate the Mirror Service. If you change the name of the Mirror Service specified as part of the Mirror PU, modify this value so that it holds the correct Mirror Service URL. | jini://*/mirror-service_container/mirror-service |
| cluster-config.mirror-service.bulk-size | The number of operations transmitted in one bulk (a count, not a memory size) from an active IMDG primary to the Mirror Service. | 100 |
| cluster-config.mirror-service.interval-millis | The replication frequency: replication happens every interval-millis milliseconds. | 2000 |
| cluster-config.mirror-service.interval-opers | The replication buffer size: replication happens every interval-opers operations. | 100 |
| cluster-config.groups.group.repl-policy.repl-original-state | The replication reconciliation mode. Enable this setting to ensure that write/take operations or multiple updates of the same entry are all sent to the Mirror and are not discarded when they fall within the same batch. | false |

The Mirror Service may receive replication events from multiple active primary partitions. Each active partition sends its operations to the Mirror Service through a dedicated replication channel, and the Mirror handles the incoming replication requests concurrently. Every primary space sends its operations to the Mirror Service in the same order in which they were executed, which allows the Mirror to preserve the consistency of the data within the External Data Source.

The Data-Grid Space settings would look like this:

<os-core:space id="space" url="/./space" schema="persistent" mirror="true" external-data-source="hibernateDataSource">
    <os-core:properties>
        <props>
            <!-- Use ALL IN CACHE - Read Only from the database-->
            <prop key="space-config.engine.cache_policy">1</prop>
            <prop key="space-config.external-data-source.usage">read-only</prop>
            <prop key="cluster-config.cache-loader.external-data-source">true</prop>
            <prop key="cluster-config.cache-loader.central-data-source">true</prop>            
            <prop key="cluster-config.mirror-service.url">jini://*/mirror-service_container/mirror-service</prop>
            <prop key="cluster-config.mirror-service.bulk-size">100</prop>
            <prop key="cluster-config.mirror-service.interval-millis">2000</prop>
            <prop key="cluster-config.mirror-service.interval-opers">100</prop>
            <prop key="cluster-config.groups.group.repl-policy.repl-original-state">true</prop>
        </props>
    </os-core:properties>
</os-core:space>

<bean id="hibernateDataSource" class="org.openspaces.persistency.hibernate.DefaultHibernateExternalDataSource">
    <property name="sessionFactory" ref="sessionFactory"/>
</bean>

The above example:

  • Configures the Space to connect to its mirror Space. By default, it will look up a mirror Space called mirror-service.
  • Configures the Space to only read data from the external data source (see the space-config.external-data-source.usage property). This means that all destructive operations are delegated to the database via the Mirror Service.
  • Configures the Data-Grid to use an external data source that is central to the cluster. This means that both primary and backup IMDG instances will interact with the same External Data Source.

See the External Data Source Properties and the Hibernate External Data Source for full details about the EDS properties that you may configure.

You must use a Data-Grid cluster schema that includes a backup (i.e. partitioned-sync2backup) when running a Mirror Service. Without a backup in the schema, the primary IMDG spaces will not replicate their activities to the Mirror Service. For testing purposes, if you do not want to start backup spaces, you can use the partitioned-sync2backup cluster schema with 0 as the number of backups. This still allows the primary spaces to replicate their operations to the Mirror.
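
The two deployment options described above could be expressed in the Data-Grid processing unit's SLA definition roughly as follows. This is a minimal sketch: the partition count of 2 is an arbitrary example value, and only one of the two os-sla:sla elements would appear in a given pu.xml.

```xml
<!-- Regular deployment: 2 partitions (example value), each with one backup -->
<os-sla:sla cluster-schema="partitioned-sync2backup" number-of-instances="2" number-of-backups="1" />

<!-- Testing only: same cluster schema, but no backup spaces are started -->
<os-sla:sla cluster-schema="partitioned-sync2backup" number-of-instances="2" number-of-backups="0" />
```

In both cases the cluster schema stays partitioned-sync2backup, which is what keeps replication to the Mirror active.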

The Mirror Processing Unit

The Mirror Service is constructed using the Space tag; however, the Mirror Service itself is not a regular Space. It dispatches the operations that have been replicated from the IMDG primary spaces to the External Data Source (i.e. the database). The Mirror Service should be constructed as a separate processing unit that includes only its definition. The following configuration shows how to configure a processing unit to act as the Mirror Service:

<os-sla:sla number-of-instances="1" />  
<os-core:space id="mirror" url="/./mirror-service" schema="mirror" external-data-source="hibernateDataSource" />

<bean id="hibernateDataSource" class="org.openspaces.persistency.hibernate.DefaultHibernateExternalDataSource">
    <property name="sessionFactory" ref="sessionFactory"/>
</bean>
  • The above configuration constructs a Mirror Service using GigaSpaces built-in Hibernate External Data Source. The hibernateDataSource should have its sessionFactory injected.
  • The name of the Mirror Space is important. The mirror-service is the default name for a mirror Space, which is then used by the IMDG to connect to its mirror.
  • The os-sla definition ensures that there will be only one Mirror Service instance running.
  • The configuration above should exist within the mirror PU pu.xml file.
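
To illustrate how the Mirror Space name and the cluster-config.mirror-service.url property relate, here is a sketch that assumes the Mirror Service was renamed to my-mirror (a hypothetical name). It also assumes the URL keeps the same name_container/name pattern as the default jini://*/mirror-service_container/mirror-service, so verify the resulting URL against your own deployment.

```xml
<!-- Mirror PU pu.xml: the Mirror Space is named my-mirror (hypothetical name) -->
<os-core:space id="mirror" url="/./my-mirror" schema="mirror" external-data-source="hibernateDataSource" />

<!-- Data-Grid PU pu.xml: point the IMDG at the renamed Mirror Service -->
<os-core:space id="space" url="/./space" schema="persistent" mirror="true" external-data-source="hibernateDataSource">
    <os-core:properties>
        <props>
            <prop key="cluster-config.mirror-service.url">jini://*/my-mirror_container/my-mirror</prop>
        </props>
    </os-core:properties>
</os-core:space>
```

The two fragments belong to different processing units. The other Data-Grid properties shown earlier are omitted here for brevity.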

The Mirror PU Content

The Mirror processing unit structure is shown below:

-- example-mirror
------ META-INF
---------- spring
-------------- pu.xml

See the Processing Unit Structure for more information on the processing unit structure.

The relevant Hibernate JAR file and its third party dependencies should be available to the mirror processing unit. The jar files should be placed in the processing unit lib directory.

Optimizing the Mirror Activity

You might want to tune the IMDG and the Mirror activity to push data into the database faster. Here are some recommendations you should consider:

  • Optimize the Space Class structure to include fewer fields. Fewer fields mean less overhead when the IMDG replicates the data to the Mirror Service.
  • Tune the bulk-size, interval-millis and interval-opers to perform the replication in larger batches and less frequently. This means you should increase the bulk-size, interval-millis and interval-opers to values larger than the defaults. The exact values depend on the network speed, the average size of the objects, the database configuration and the machine speed. Here is an example of a configuration suited to an IMDG with relatively small objects (less than 1 KB) and a high rate of operations (more than 10,000 operations per second per partition):
    <prop key="cluster-config.mirror-service.bulk-size">10000</prop>
    <prop key="cluster-config.mirror-service.interval-millis">5000</prop>
    <prop key="cluster-config.mirror-service.interval-opers">50000</prop>

With the above configuration, the primary partition replicates its redo log activities to the Mirror Service every 5 seconds or every 50,000 operations, whichever comes first. The replication occurs in batches of 10,000 operations.

  • Tune the External Data Source to commit data into the database in batches.
  • Optimize the database transaction support.
  • Use stateless session with the Hibernate External Data Source configuration. See the StatelessHibernateExternalDataSource.
  • Implement a Mirror Service that writes the incoming data into a CSV file, which should be faster than writing into the database, and import the file into the database later (normally a very fast operation).
  • Increase the database maximum connections.
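
As one possible way to have the Hibernate External Data Source commit in batches, the sessionFactory bean injected into hibernateDataSource could enable Hibernate's JDBC batching. The sketch below is an assumption rather than a configuration taken from this page: the connection pool class, JDBC driver, URL, credentials, dialect, entity class (com.example.MyEntity) and the batch size of 50 are all placeholders to replace with your own values.

```xml
<!-- Illustrative only: driver, URL, credentials and entity class are placeholders -->
<bean id="dataSource" class="org.apache.commons.dbcp.BasicDataSource" destroy-method="close">
    <property name="driverClassName" value="org.hsqldb.jdbcDriver"/>
    <property name="url" value="jdbc:hsqldb:hsql://localhost/testDB"/>
    <property name="username" value="sa"/>
    <property name="password" value=""/>
</bean>

<bean id="sessionFactory" class="org.springframework.orm.hibernate3.annotation.AnnotationSessionFactoryBean">
    <property name="dataSource" ref="dataSource"/>
    <property name="annotatedClasses">
        <list>
            <value>com.example.MyEntity</value>
        </list>
    </property>
    <property name="hibernateProperties">
        <props>
            <prop key="hibernate.dialect">org.hibernate.dialect.HSQLDialect</prop>
            <!-- Group SQL statements into JDBC batches; 50 is an arbitrary starting point -->
            <prop key="hibernate.jdbc.batch_size">50</prop>
            <!-- Order statements by entity type so that more of them can share a batch -->
            <prop key="hibernate.order_inserts">true</prop>
            <prop key="hibernate.order_updates">true</prop>
        </props>
    </property>
</bean>
```

Whether batching actually helps depends on the JDBC driver and the database, so measure the Mirror throughput before and after changing these values.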

Advanced Information & Operations

For more advanced operations and information see Async Persistency - Mirror - Advanced.
