Summary: GigaSpaces persistency offers several paradigms for persisting data, chosen according to the application's needs. This section gives a basic overview of each paradigm.

Persisting Space Data into Permanent Storage

There are many situations where space data needs to be persisted to permanent storage and retrieved from it. For example:

  • A client process works primarily with the memory space for temporary storage of process data structures, and the permanent storage is used to extend or back up the physical memory of the process running the space. In case the data in the space becomes unavailable due to cache eviction, for example, the backup data in permanent storage can be accessed.
  • A client process works primarily with the database storage and the space is used to make read processing more efficient. Since database access is expensive, the data read from the database is cached in the space, where it is available for fast subsequent read operations. (This is using the space as a side cache.)
  • When a space is restarted, data from its persistent media files can be loaded into the space to speed up incoming query processing.
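The side-cache case in the list above can be sketched in plain Java. This is a minimal illustration under stated assumptions, not GigaSpaces code: the `SideCache` class, its `databaseLoader` function, and the `databaseReads` counter are hypothetical names, a `HashMap` stands in for the space, and a lookup function stands in for the database.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical side-cache wrapper; not part of the GigaSpaces API.
class SideCache<K, V> {
    private final Map<K, V> cache = new HashMap<>(); // stands in for the space
    private final Function<K, V> databaseLoader;     // stands in for a database read
    private int databaseReads = 0;                   // counts the expensive reads

    SideCache(Function<K, V> databaseLoader) {
        this.databaseLoader = databaseLoader;
    }

    // Return the cached value; on a miss, read the database and cache the result.
    V read(K key) {
        V value = cache.get(key);
        if (value == null) {
            databaseReads++;
            value = databaseLoader.apply(key);
            if (value != null) {
                cache.put(key, value);
            }
        }
        return value;
    }

    int databaseReads() {
        return databaseReads;
    }

    public static void main(String[] args) {
        Map<Integer, String> database = Map.of(1, "alice", 2, "bob");
        SideCache<Integer, String> sideCache = new SideCache<>(database::get);
        System.out.println(sideCache.read(1)); // first read goes to the database
        System.out.println(sideCache.read(1)); // second read is served from the cache
        System.out.println("database reads: " + sideCache.databaseReads());
    }
}
```

Only the first read of a key pays the database cost; later reads of the same key are answered from memory.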

Bridging the Gap Between Object and Relational

Object-oriented development dominates the enterprise, and most client applications today are written in the Java, C#, and C++ languages. However, the majority of business-critical data is stored in relational database management systems (RDBMS) or similar systems that use record-based (non object-oriented) storage, whose data is read by query-based search schemes.

Because of this mismatch, an intermediate object-relational mapping (ORM) step is required to perform translation of objects to records when writing data to a database, and translation of records to objects when reading data from a database. This intermediate step is implemented in middleware that is detached from and transparent to the client application. Client calls to standard API read and write methods trigger the middleware functionality without a need for the client to intervene. Advanced middleware systems permit the client API to formulate and pass a database query for use when reading from the database.
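To make the translation step concrete, here is a hand-written sketch of the object-to-record and record-to-object mapping that ORM middleware automates. Everything in it is an assumption for illustration: the `Person` class, the `ID` and `NAME` column names, and the use of a `Map` to represent a database record.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical hand-written mapper; ORM middleware generates this kind of logic.
class PersonMapper {
    static final class Person {
        final int id;
        final String name;

        Person(int id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    // Object to record: used when writing data to the database.
    static Map<String, Object> toRow(Person person) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("ID", person.id);
        row.put("NAME", person.name);
        return row;
    }

    // Record to object: used when reading data from the database.
    static Person fromRow(Map<String, Object> row) {
        return new Person((Integer) row.get("ID"), (String) row.get("NAME"));
    }

    public static void main(String[] args) {
        Map<String, Object> row = toRow(new Person(7, "alice"));
        Person restored = fromRow(row);
        System.out.println(row + " -> " + restored.id + ", " + restored.name);
    }
}
```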

The Hibernate library, an ORM persistence and query service for the Java language, can provide this service for RDBMS. Hibernate allows you to express queries in its own portable SQL extension (HQL), as well as in native SQL. However, Hibernate is restricted to run at the client level, and does not relate to read/write-through caching.

Migrating Legacy Hibernate API Applications to GigaSpaces API

To benefit from data caching and other capabilities, it is worthwhile to migrate a legacy application that uses the Hibernate API to the GigaSpace or GigaMap API. Such applications can then scale by using the GigaSpaces Data Grid. This is achieved by partitioning the data across different spaces running on different machines, and colocating the business logic with each partition. The space and the business logic then run in the same memory address space, eliminating remote calls when accessing the data.
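The partitioning and colocation idea can be modeled in a few lines of plain Java. This is a sketch under assumptions rather than the product's implementation: `PartitionedGridSketch`, `partitionFor`, and `executeOn` are hypothetical names, each `HashMap` stands in for one space partition, and the routing rule (hash of the key modulo the partition count) is assumed for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical model of a partitioned data grid; not the GigaSpaces API.
class PartitionedGridSketch {
    // Each map stands in for one space partition running on its own machine.
    private final List<Map<String, Integer>> partitions = new ArrayList<>();

    PartitionedGridSketch(int partitionCount) {
        for (int i = 0; i < partitionCount; i++) {
            partitions.add(new HashMap<>());
        }
    }

    // Assumed routing rule: hash of the routing key modulo the partition count.
    int partitionFor(String routingKey) {
        return Math.floorMod(routingKey.hashCode(), partitions.size());
    }

    void write(String key, int value) {
        partitions.get(partitionFor(key)).put(key, value);
    }

    Integer read(String key) {
        return partitions.get(partitionFor(key)).get(key);
    }

    // Colocated business logic: the function runs against one partition's local
    // data, so it needs no remote call to reach the entries it processes.
    <R> R executeOn(int partition, Function<Map<String, Integer>, R> logic) {
        return logic.apply(partitions.get(partition));
    }

    public static void main(String[] args) {
        PartitionedGridSketch grid = new PartitionedGridSketch(3);
        grid.write("order-1", 40);
        grid.write("order-2", 2);
        int total = 0;
        for (int p = 0; p < 3; p++) {
            // Sum the values held locally by each partition, then combine.
            Integer partial = grid.executeOn(p, local ->
                    local.values().stream().mapToInt(Integer::intValue).sum());
            total += partial;
        }
        System.out.println("total = " + total);
    }
}
```

Each partition computes its partial result over local data only, and the caller combines the partial results.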

The following table shows how the basic Hibernate API methods correspond to the GigaSpace API and GigaMap API methods.

org.hibernate.Session Method    GigaSpace Method                     GigaMap Method
save                            write                                put
persist                         write                                put
delete                          clear                                remove
update                          write                                put
merge                           write                                put
saveOrUpdate                    write                                put
replicate                       write                                put
get                             read, readById                       get
load                            read, readById                       get
createSQLQuery                  readByIds, readMultiple(SQLQuery)    Not supported

The Moving from Hibernate to Space best practice includes step-by-step instructions for moving a Hibernate-based application to the GigaSpaces Data Grid as the data access layer. It uses Hibernate as the space persistency layer, with a write-through approach for pushing updates into the database.

The space can be used as a Hibernate second level cache.

Caching Policies and Space Persistency

Space Persistency supports the All In Cache and LRU Cache policies.

All In Cache Policy

With the All In Cache policy, the assumption is that the space holds all the data in memory. In this case, the space communicates with the data source at startup and loads all the data. If data within the space is updated, added, or removed, the space calls the SpaceSynchronizationEndpoint implementation to update the underlying data source. All data access is served from memory.
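The flow described above (load everything at startup, serve reads from memory, push every change to the data source) can be sketched as follows. This is a plain Java model under assumptions, not the GigaSpaces API: `AllInCacheSketch` is a hypothetical class, and a `Map` stands in for the underlying data source that a real SpaceSynchronizationEndpoint implementation would update.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the All In Cache policy; not the GigaSpaces API.
class AllInCacheSketch {
    private final Map<Integer, String> dataSource;               // underlying data source
    private final Map<Integer, String> memory = new HashMap<>(); // the space's memory

    AllInCacheSketch(Map<Integer, String> dataSource) {
        this.dataSource = dataSource;
        memory.putAll(dataSource); // startup: load all the data into memory
    }

    // Reads are served from memory only and never touch the data source.
    String read(Integer id) {
        return memory.get(id);
    }

    // Writes update memory and are propagated to the data source.
    void write(Integer id, String value) {
        memory.put(id, value);
        dataSource.put(id, value);
    }

    // Removals are propagated to the data source as well.
    void remove(Integer id) {
        memory.remove(id);
        dataSource.remove(id);
    }

    public static void main(String[] args) {
        Map<Integer, String> database = new HashMap<>();
        database.put(1, "alice");
        AllInCacheSketch space = new AllInCacheSketch(database);
        space.write(2, "bob");
        space.remove(1);
        System.out.println("in memory: " + space.read(2) + ", in database: " + database);
    }
}
```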

LRU Cache Policy - Read-Ahead

The LRU persistency model is based on the eviction model: some of the data is stored in memory (subject to an automatic expiration mechanism or explicit data eviction), and all the data is stored on disk, where the preferred disk medium is a database. You may leverage Hibernate as the mapping layer when data is persisted, or implement a custom persistency mapping using the Space Data Source API.

GigaSpaces does not support the overflow model when persisting data, since it may lead to inconsistencies.
Using a database to store the data allows you to:

  • Reload the data into the space very quickly, with plenty of flexibility to customize the load activity.
  • Query the database when needed.

Database technology has proven able to store vast amounts of data very efficiently and with very good high availability. You may use RDBMS SQL databases (MySQL, Oracle, Sybase, DB2) or NoSQL databases (MongoDB, MarkLogic, AllegroGraph) as the space persistency layer.

When using NoSQL databases, you may also leverage the GigaSpaces Document API support to map complex data structures into a document data store model.

With the LRU policy, the assumption is that only some of the data (the recently used data) is stored in memory. The amount of data stored in memory is limited by the cache size parameter, the memory usage watermark threshold parameters, and the available free GSC JVM heap size. In this case, once the space is started, it loads data up to 50% (you may tune this value) of the defined maximum cache size (the total number of objects per partition).

If data within the space is updated, added, or removed, the space calls the SpaceSynchronizationEndpoint implementation to update the underlying data source. When a read operation for a single object (read/readById/readIfExists) finds no matching object in memory (a cache miss), the SpaceDataSource implementation is called to search for matching data, which is loaded back into the space and from there sent to the client application (read-ahead). If a query is executed (readMultiple) and the maximum number of objects to read exceeds the number of matching objects in memory, the SpaceDataSource is called to search for matching data elements, which are loaded back into the space and from there sent to the client application. In this case, the client may receive both objects that were already in the space and objects that were read from the data source and loaded into the space as a result of the query.
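The LRU behavior described above (partial initial load, eviction, loading on a cache miss, and topping up a query result from the data source) can be modeled in plain Java. This is a sketch under assumptions, not the GigaSpaces implementation: `LruSpaceSketch` and its parameters are hypothetical, a `Map` stands in for the data source, an access-ordered `LinkedHashMap` stands in for the space's memory, and `readMultiple` here matches every object instead of taking a query template.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical model of an LRU-policy space; not the GigaSpaces API.
class LruSpaceSketch {
    private final Map<Integer, String> dataSource;       // holds ALL the data
    private final LinkedHashMap<Integer, String> memory; // holds recently used data
    private int dataSourceReads = 0;

    LruSpaceSketch(Map<Integer, String> dataSource, int cacheSize, int initialLoadPercentage) {
        this.dataSource = dataSource;
        // Access-ordered map that evicts the least recently used entry when full.
        this.memory = new LinkedHashMap<Integer, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Integer, String> eldest) {
                return size() > cacheSize;
            }
        };
        // Initial load: fill memory up to a percentage of the maximum cache size.
        int initialLoadLimit = cacheSize * initialLoadPercentage / 100;
        for (Map.Entry<Integer, String> entry : dataSource.entrySet()) {
            if (memory.size() >= initialLoadLimit) break;
            memory.put(entry.getKey(), entry.getValue());
        }
    }

    // Single-object read: on a cache miss, load the object from the data source.
    String readById(Integer id) {
        String value = memory.get(id);
        if (value == null) {
            dataSourceReads++;
            value = dataSource.get(id);
            if (value != null) memory.put(id, value);
        }
        return value;
    }

    // Query: serve from memory first, then top up from the data source.
    List<String> readMultiple(int maxObjects) {
        List<String> result = new ArrayList<>();
        Set<Integer> alreadyReturned = new HashSet<>();
        for (Map.Entry<Integer, String> entry : memory.entrySet()) {
            if (result.size() == maxObjects) return result;
            result.add(entry.getValue());
            alreadyReturned.add(entry.getKey());
        }
        dataSourceReads++;
        for (Map.Entry<Integer, String> entry : dataSource.entrySet()) {
            if (result.size() == maxObjects) break;
            if (alreadyReturned.contains(entry.getKey())) continue;
            memory.put(entry.getKey(), entry.getValue()); // may evict older entries
            result.add(entry.getValue());
        }
        return result;
    }

    // Writes update memory and are propagated to the data source.
    void write(Integer id, String value) {
        memory.put(id, value);
        dataSource.put(id, value);
    }

    int inMemoryCount() { return memory.size(); }

    int dataSourceReads() { return dataSourceReads; }

    public static void main(String[] args) {
        Map<Integer, String> database = new TreeMap<>();
        for (int id = 1; id <= 10; id++) database.put(id, "v" + id);
        LruSpaceSketch space = new LruSpaceSketch(database, 4, 50);
        System.out.println("loaded at startup: " + space.inMemoryCount());
        System.out.println("cache miss read: " + space.readById(7));
        System.out.println("query result: " + space.readMultiple(6));
        System.out.println("in memory after query: " + space.inMemoryCount());
    }
}
```

Note that the query can return more objects than the cache can hold: the result mixes objects that were already in memory with objects loaded from the data source, while memory itself never grows past the cache size.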

The IMDG with Large Backend Database Support best practice suggests a simple approach for using an LRU space with a large database, allowing the application to execute queries against the space in an optimal manner.

In both cases (ALL_IN_CACHE and LRU cache policy), you can customize the data load phase to speed up the space initialization phase.

Space Persistency

The space can load data from data sources, store data into data sources, and persist data into a relational data source or any other medium via a custom SpaceSynchronizationEndpoint implementation. Space Persistency provides a built-in implementation using Hibernate to store data in an existing data source and in the space. Data is loaded from the data source during space initialization (via the SpaceDataSource implementation), and from then onwards the application works with the space directly. Meanwhile, the data source is constantly updated with all the changes made in the space (via the SpaceSynchronizationEndpoint implementation). This is the recommended model.

The Hibernate Space Persistency supports RDBMS. The Cassandra Space Persistency allows applications to use the NoSQL Cassandra database, with its distributed database infrastructure, as an alternative to an RDBMS.
