Summary: External Data Source advanced topics such as advanced operations, tuning, troubleshooting, and limitations.

Overview

This section covers advanced options related to the External Data Source.

Properties

Here are the External Data Source Properties:

Property | Description | Default
--- | --- | ---
space-config.external-data-source.data-source-class | Data Source interfaces require an implementing class. GigaSpaces provides an implementation that works seamlessly with Hibernate, which is defined using this property. The class name is com.gigaspaces.datasource.hibernate.HibernateDataSource | 
space-config.external-data-source.data-class | Defines the class of the objects that are passed to the data source. Usually, this attribute should not be changed unless the data source requires that data objects not be converted to their original format. Possible values: java.lang.Object, com.gigaspaces.document.SpaceDocument, com.j_spaces.core.IGSEntry | java.lang.Object
space-config.external-data-source.supports-inheritance | Indicates whether the data source supports hierarchical queries. If the underlying data source supports sub-class queries (like Hibernate), the External Data Source is invoked only once for the queried class. Otherwise, it is called for the super class and for each of its sub-classes. | true
space-config.external-data-source.usage | Specifies how the data source should be used: read operations only (read-only), or read and write (read-write). If set to read-only, destructive operations are not delegated to the data source. | read-write
space-config.external-data-source.shared-iterator.enabled | Enables shared iterator mode, which tries to optimize data source access by sharing the same iterator between identical query operations when possible. | true
space-config.external-data-source.shared-iterator.time-to-live | Specifies for how long, in milliseconds, an iterator can be shared in shared iterator mode. If two equivalent queries run concurrently, but the time elapsed between the first query and the second query exceeds the time to live, the second query opens a new iterator on the data source and does not share the first one. | 10000
space-config.external-data-source.init-properties-file | The content of this properties file is passed directly to the data source implementation in the init() method. This file can be used to pass any custom parameters to the data source. | 
cluster-config.cache-loader.external-data-source | Provides cluster-wide support. | true
cluster-config.cache-loader.central-data-source | Provides clustered database-wide support. | true
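
As a rough illustration, property overrides of this kind can be collected in a java.util.Properties object. The property names below are taken from the table above; the chosen values and the class name EdsPropertiesSketch are only examples, and how the resulting object is handed to the space depends on your deployment and is not shown here.

```java
import java.util.Properties;

final class EdsPropertiesSketch {

    /** Builds an example set of overrides for a read-only External Data Source. */
    static Properties readOnlyEdsProperties() {
        Properties props = new Properties();
        // Implementation class named in the properties table.
        props.setProperty("space-config.external-data-source.data-source-class",
                "com.gigaspaces.datasource.hibernate.HibernateDataSource");
        // read-only: destructive operations are not delegated to the data source.
        props.setProperty("space-config.external-data-source.usage", "read-only");
        // Keep shared iterators enabled, with an example 5 second time to live.
        props.setProperty("space-config.external-data-source.shared-iterator.enabled", "true");
        props.setProperty("space-config.external-data-source.shared-iterator.time-to-live", "5000");
        return props;
    }

    public static void main(String[] args) {
        readOnlyEdsProperties().forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```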

Troubleshooting

To enable logging for ExternalDataSource, edit the <GigaSpaces Root>\config\gs_logging.properties file, and set the persistent level to CONFIG or FINER.

  • CONFIG messages are intended to provide a variety of static configuration information, and to assist in debugging problems that may be associated with particular configurations.
    com.gigaspaces.persistent.level = CONFIG
  • FINER messages log calls for entering, returning, or throwing an exception to and from the cache interface implementations.
    com.gigaspaces.persistent.level = FINER
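
The same level can also be set programmatically. The sketch below assumes that GigaSpaces logging is backed by java.util.logging in your runtime (the name.level = LEVEL syntax and the CONFIG and FINER levels suggest this) and that the logger name is com.gigaspaces.persistent, derived from the property key above. The class name EdsLoggingSketch is hypothetical.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

final class EdsLoggingSketch {

    // Keep a strong reference: java.util.logging holds loggers weakly, so a level
    // set on an otherwise unreferenced logger can be lost to garbage collection.
    static final Logger PERSISTENT_LOGGER = Logger.getLogger("com.gigaspaces.persistent");

    /** Raises the persistency logger to FINER (entry, return, and throw tracing). */
    static void enableFinerPersistencyLogging() {
        PERSISTENT_LOGGER.setLevel(Level.FINER);
    }

    public static void main(String[] args) {
        enableFinerPersistencyLogging();
        System.out.println(PERSISTENT_LOGGER.getName() + " level: " + PERSISTENT_LOGGER.getLevel());
    }
}
```

A handler whose own level is stricter than FINER still filters these records out, so the handler level may need to be lowered as well.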

Initial Space Load

When the space is started, restarted, or cleaned, the system can initially fill the space with space objects that are likely to be required by the application. You can specify the data to be loaded using the ExternalDataSource.initialLoad method, which is called once the space is started. See the External Data Source Initial Load for details. The space is not available to clients until the data load process has completed.

The Initial Load is supported with the partitioned-sync2backup cluster schema. To pre-load a clustered space using the Initial Load without running backups, use the partitioned-sync2backup schema and set the number of backups to zero.

Refreshing Space Object when using EDS

To refresh a space object, first remove it from the space using a take, takeById, or clear operation with the EVICT_ONLY modifier. The operation might not succeed if the object is still involved in a transaction or has not yet been replicated; in those cases, retry. Once the operation succeeds, read the object back using the read or readById operation. This loads the latest version of the object from the external data source (i.e. the database) into the space via the EDS implementation. This is relevant only to spaces that do not use the ALL_IN_CACHE policy.
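
The evict, retry, and read-back sequence can be sketched as follows. This is a minimal illustration and not GigaSpaces code: the evict callback stands in for a take, takeById, or clear call with the EVICT_ONLY modifier, and the readBack callback stands in for read or readById. The class name RefreshSketch, the method signature, and the retry parameters are all hypothetical.

```java
import java.util.Optional;
import java.util.function.BooleanSupplier;
import java.util.function.Supplier;

final class RefreshSketch {

    /**
     * Tries to evict an object and, once eviction succeeds, reads it back.
     *
     * @param evict    returns true when the eviction succeeded (placeholder for the space call)
     * @param readBack reloads the object through the space (placeholder for read/readById)
     * @return the reloaded object, or empty if eviction never succeeded or the read found nothing
     */
    static <T> Optional<T> refresh(BooleanSupplier evict, Supplier<T> readBack,
                                   int maxAttempts, long retryDelayMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (evict.getAsBoolean()) {
                // Eviction succeeded: the read is now served from the data source.
                return Optional.ofNullable(readBack.get());
            }
            if (attempt < maxAttempts) {
                try {
                    // Object may still be in a transaction or not yet replicated.
                    Thread.sleep(retryDelayMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return Optional.empty();
                }
            }
        }
        return Optional.empty();
    }
}
```

Bounding the number of attempts keeps a caller from spinning indefinitely on an object that stays locked in a long-running transaction.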

Eliminating Resonance Effect when Using Mirror Service

When using the Mirror Service with the ExternalDataSource enabled for the space, data loaded into the space using ExternalDataSource.initialLoad during startup is not replicated back to the Mirror Service.

Count Operation

The scope of IJSpace.count() or GigaSpace.count() and IMap.Size() is the data stored within the space. These methods do not take into account the data stored within the underlying data source.

When using the Map API with a local cache, the value of IMap.Size() is the number of objects in the local cache (it might be less than the number that actually exists in the space).

Recursive Calls

The ExternalDataSource implementation should avoid performing space operations to prevent deadlocks and recursive behavior.

UID Generation

The space embeds a unique identifier into each space Object. This ID is used implicitly when performing update operations, and read/take operations based on ID.

When using the External Data Source mechanism, to ensure the consistency of the system, you should construct the object ID from a unique value, both when writing the object into the space and when loading it from the underlying External Data Source. The ID should be based on a unique value stored within the object, such as the primary key. If you write an object into the space, or load an object, whose ID already exists within the space, the operation is rejected with the exception com.j_spaces.core.client.EntryAlreadyInSpaceException. When specifying the SpaceId field, make sure the auto-generate attribute is set to false.
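
A minimal sketch of a space class whose ID is derived from the database primary key is shown below. The class CustomerEntry and its fields are hypothetical. The annotation appears only in a comment so that the block compiles without the GigaSpaces library; in a real space class the ID getter would carry it.

```java
final class CustomerEntry {

    private final String id;
    private final String name;

    /** The same constructor is used when writing to the space and when loading from the database. */
    CustomerEntry(long primaryKey, String name) {
        // Derive the space ID from the primary key so both paths yield the same ID.
        this.id = Long.toString(primaryKey);
        this.name = name;
    }

    // With the GigaSpaces library on the classpath, this getter would be
    // annotated with @SpaceId(autoGenerate = false).
    String getId() {
        return id;
    }

    String getName() {
        return name;
    }
}
```

Because the ID depends only on the primary key, an object loaded from the database and an object written by a client for the same row resolve to the same space ID.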

Hibernate ID Generation

Hibernate supports multiple ID generators, as detailed in the Hibernate documentation. Your Hibernate mapping file should use an algorithm that is appropriate for your use case.

Some generators increase the number of database operations and degrade overall performance. Watch out for generator and database combinations that transparently disable batch insert mode, as the Hibernate documentation notes:

"Hibernate disables insert batching at the JDBC level transparently if you use an identity identifier generator."

Using a sequence number increases the database reads on some databases, because Hibernate reads the next sequence number before each new INSERT in the batch. This also disables batch persistence used by GigaSpaces HibernateExternalDataSource.

A better strategy is to use a dummy generator such as "increment" in the Hibernate mapping file and, on the database side, define an INSERT trigger on the table that generates a new ID using a sequence. This simple change can improve the performance of database operations by orders of magnitude.

Considerations

  • When a space is configured using External Data Source, and a POJO is used as the Space Domain class, it must use the @SpaceId(autoGenerate=false) annotation.
  • When running in LRU Cache policy, the GigaSpace.Count operation uses only the data within the space and does not access the External Data Source (database) to return the object count (unlike the Storage Adapter).
  • When a space is configured using External Data Source, only Native serialization mode should be used.
  • RegExQuery is not supported by the DataProvider or the SQLDataProvider. Use the SQLQuery instead.
  • Objects loaded via the ManagedDataSource.initialLoad can be expired using the @SpaceLeaseExpiration annotation. See the POJO Support - Advanced page for more details.
  • When using the Map API, the key must be Serializable.
  • By default, ExternalDataSource.initialLoad() loads data into partitioned spaces by reading all the data into each space and filtering it on the space side. To tune this behavior, query the database by partition ID, so that only the relevant result set is fetched and loaded into the space. See the External Data Source Initial Load for more details.
  • Hibernate Lazy load is not supported when using the HibernateDataSource as an External Data Source implementation. See Space Object Modeling for more details.
  • When running in ALL_IN_CACHE cache policy mode, optimistic locking is supported, i.e. an update in optimistic locking mode is rejected if the client performs it with a version of the entry that is not the latest. The object loaded from the database should include the latest version or the value 1.
  • When running a local cache, the client cache is updated using optimistic locking mode, i.e. updates include the correct version of the entry.
  • Optimistic locking is not supported when running in LRU cache policy mode if the loaded object does not include data within the SpaceVersion field.
  • When running in LRU Cache policy, the engine.initial_load property should be configured with a small number, to avoid memory shortage in persistent spaces with large data.
  • The optimal number of database connections is the number of partitions in the cluster.
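
The partition-specific initial load mentioned in the list above can be sketched as a query builder. This is an illustration under several assumptions that are not stated in this page: the routing column is a non-negative integer, an object belongs to the partition given by its routing value modulo the number of partitions, partition IDs start at 1, and the database supports a MOD function. The class and method names are hypothetical.

```java
final class PartitionLoadQuerySketch {

    /**
     * Builds a query that selects only the rows belonging to one partition.
     *
     * @param partitionId    1-based ID of the partition being loaded (assumption)
     * @param partitionCount total number of partitions in the cluster
     */
    static String partitionQuery(String table, String routingColumn,
                                 int partitionId, int partitionCount) {
        if (partitionCount < 1 || partitionId < 1 || partitionId > partitionCount) {
            throw new IllegalArgumentException("partitionId must be in 1.." + partitionCount);
        }
        // Assumed routing rule: routing value modulo partition count, 0-based on the SQL side.
        return "SELECT * FROM " + table
                + " WHERE MOD(" + routingColumn + ", " + partitionCount + ") = "
                + (partitionId - 1);
    }

    public static void main(String[] args) {
        System.out.println(partitionQuery("orders", "customer_id", 2, 4));
    }
}
```

Each partition then fetches roughly its own share of the table instead of reading every row and discarding the rows that belong to other partitions.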