Summary: Explains the concepts of the GigaSpaces In-Memory Data Grid (the Space), how to access it, and how to configure advanced capabilities such as persistency and eviction.
Overview
This section describes the GigaSpaces In-Memory Data Grid (IMDG) implementation, also known as the Space. The Space enables your application to read data from it and write data to it in various ways. This section also covers various configuration aspects, such as space topologies, persistency to an external data source, and memory management facilities.
Key Use Cases for the Space
The Space as the System of Record
One of the unique concepts of GigaSpaces is that its In-Memory Data Grid serves as the system of record for your application. This means that all or major parts of your application's data are stored in the Space and your data access layer interacts with it via the various Space APIs. This allows for ultra-fast read and write performance, while still maintaining a high level of reliability and fault tolerance. Reliability and fault tolerance are maintained via data replication to peer space instances in the cluster, and eventual persistency to a relational database if needed.
The Space as a Cache
GigaSpaces IMDG supports a variety of caching scenarios. Using GigaSpaces IMDG as a cache or as a system of record provides the following benefits:
Low latency: In-Memory Data access time without any disk usage.
Data access layer elasticity: Scale out/up on demand to leverage additional machine resources.
Less load on the database layer: Since the cache will offload the database, you will have less contention generated at the database layer.
Continuous High-Availability: Zero downtime of your data access layer with the ability to survive system failures without any data loss.
The Caching Scenarios section describes the different caching options supported by GigaSpaces.
Characteristics of a Space
The Space has a number of defining characteristics that should be configured when it is created, as described below:
The Space Clustering Topology
The Space can have a single instance, in which case it runs on a single Virtual Machine (VM), or multiple instances, in which case it can run on multiple VMs. When it has multiple instances, the Space can run in a number of topologies which determine how the data is distributed across those VMs. In general, the data can be either replicated, which means it resides on all of the VMs in the cluster, or partitioned, which means that the data is distributed across all of the VMs, each containing a different subset of it. With a partitioned topology you can also assign one or more backup space instances for each partition.
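The partitioned topology described above can be sketched in a few lines of plain Java. This is a concept illustration only, not the GigaSpaces API; the class and method names are invented for the example. Each entry's routing key is hashed to pick the partition that owns it, so every partition holds a different subset of the data.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Minimal sketch of hash-based partitioning: each partition owns a
// disjoint subset of the entries, chosen by hashing the routing key.
public class PartitionedGrid {
    private final List<Map<String, Object>> partitions = new ArrayList<>();

    public PartitionedGrid(int partitionCount) {
        for (int i = 0; i < partitionCount; i++) {
            partitions.add(new HashMap<>());
        }
    }

    // Deterministic routing: the same key always lands on the same partition.
    public int partitionFor(String routingKey) {
        return Math.floorMod(routingKey.hashCode(), partitions.size());
    }

    public void write(String routingKey, Object value) {
        partitions.get(partitionFor(routingKey)).put(routingKey, value);
    }

    public Object read(String routingKey) {
        return partitions.get(partitionFor(routingKey)).get(routingKey);
    }
}
```

Because routing is deterministic, a client can always compute which partition to contact for a given key without consulting the other partitions.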
Master-Local Space
Regardless of the Space's topology, you can also define a Local Cache for space clients. The Local Cache caches space entries recently used by the client, or a predefined subset of the central space's data (often referred to as a Continuous Query). The data cached on the client side is kept up to date by the server. If Space client A changes a Space entry that resides in client B's local cache, the Space makes sure to update client B's cache.
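The master-local pattern can be sketched as follows. This is plain Java concept code, not the GigaSpaces API; `CentralSpace` and `LocalCache` are invented names. The central space pushes every change to each client cache that already holds a copy of the changed entry, which is how client B's copy stays fresh after client A's update.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Concept sketch: a central space that keeps client-side caches up to date
// by pushing updates for entries those clients already hold.
public class CentralSpace {
    private final Map<String, String> entries = new HashMap<>();
    private final List<LocalCache> clients = new ArrayList<>();

    public LocalCache newClient() {
        LocalCache cache = new LocalCache(this);
        clients.add(cache);
        return cache;
    }

    public void update(String key, String value) {
        entries.put(key, value);
        for (LocalCache c : clients) c.onServerUpdate(key, value); // push to caches
    }

    String load(String key) { return entries.get(key); }

    public static class LocalCache {
        private final CentralSpace server;
        private final Map<String, String> local = new HashMap<>();

        LocalCache(CentralSpace server) { this.server = server; }

        // Read-through: fetch from the central space on first access, then cache.
        public String read(String key) {
            return local.computeIfAbsent(key, server::load);
        }

        // The server pushes changes only for entries this client already caches.
        void onServerUpdate(String key, String value) {
            if (local.containsKey(key)) local.put(key, value);
        }
    }
}
```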
The Replication Mode
When running multiple space instances, in many cases the data is replicated from one Space instance to another. This can happen in a replicated topology (in which case every change to the data is replicated to all of the space instances that belong to the space) or in a partitioned topology (if you choose to have backups for each partition). There are two replication modes: synchronous and asynchronous. With synchronous replication, data is replicated to the target instance as it is written, so the client that writes, updates or deletes data waits until replication to the target completes. With asynchronous replication, replication is done in a separate thread, and the calling client does not wait for it to complete.
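A minimal sketch of the two modes, in illustrative Java (not the actual replication engine): the synchronous write copies the entry to the backup before returning, while the asynchronous write hands the copy to a background thread and returns immediately.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Concept sketch: a primary instance with one backup, supporting both
// synchronous and asynchronous replication of writes.
public class ReplicatedSpace {
    private final Map<String, String> primary = new HashMap<>();
    private final Map<String, String> backup = new HashMap<>();
    private final ExecutorService replicator = Executors.newSingleThreadExecutor();

    public void writeSync(String key, String value) {
        primary.put(key, value);
        synchronized (backup) { backup.put(key, value); } // client waits for this
    }

    public void writeAsync(String key, String value) {
        primary.put(key, value); // client returns here; replication happens later
        replicator.submit(() -> {
            synchronized (backup) { backup.put(key, value); }
        });
    }

    public String readBackup(String key) {
        synchronized (backup) { return backup.get(key); }
    }

    // Drain the replication queue (for demonstration only).
    public void flush() {
        replicator.shutdown();
        try {
            replicator.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

The trade-off is visible in the sketch: synchronous replication gives the client a guarantee that the backup is current when the call returns, at the cost of added write latency; asynchronous replication removes that latency but leaves a window in which the backup lags the primary.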
Persistency Configuration
The Space is an In-Memory Data Grid. As such, its capacity is limited to the sum of the memory capacity of all the VMs on which the space instances run. In many cases, you have to deal with larger portions of data, or load a subset of a larger data set that resides in an external data source, such as a relational database, into the space. The space supports many persistency options, allowing you to easily configure how it interacts with an external relational database or a more exotic source of data. It supports the following options:
Cache warm-up: load data from an external data source on startup.
Cache read through: read data from the external data source when it is not found in the space.
Cache write through: write data to the external data source when it is written to the space.
Cache write behind (also known as asynchronous persistency): write data to the external data source asynchronously (yet reliably) to avoid the performance penalty.
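Read-through and write-behind, the two options most often combined, can be sketched as follows. This is plain Java; `PersistentSpace` and its map-based "external store" are illustrative stand-ins, not the GigaSpaces persistency API. On a cache miss, the read falls back to the external store; writes are acknowledged immediately and persisted later from a queue.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Concept sketch of read-through and write-behind persistency.
// "externalStore" stands in for a relational database.
public class PersistentSpace {
    private final Map<String, String> memory = new HashMap<>();
    private final Map<String, String> externalStore;
    private final Queue<String> writeBehindQueue = new ArrayDeque<>();

    public PersistentSpace(Map<String, String> externalStore) {
        this.externalStore = externalStore;
    }

    // Read-through: fall back to the external store on a cache miss.
    public String read(String key) {
        return memory.computeIfAbsent(key, externalStore::get);
    }

    // Write-behind: acknowledge the write immediately, persist later.
    public void write(String key, String value) {
        memory.put(key, value);
        writeBehindQueue.add(key);
    }

    // Normally run by a background thread; called explicitly here for clarity.
    public void flushWriteBehind() {
        String key;
        while ((key = writeBehindQueue.poll()) != null) {
            externalStore.put(key, memory.get(key));
        }
    }
}
```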
Eviction Policy and Memory Management
Since the Space is memory-based, it is essential to ensure that it does not overflow and crash. The Space's memory can be managed, and memory overflow prevented, by the following:
Eviction policy. The space supports two eviction policies: ALL_IN_CACHE and LRU (Least Recently Used). With LRU, the Space starts to evict the least used entries when it becomes full. The ALL_IN_CACHE policy never evicts anything from the Space.
Memory manager. The memory manager allows you to define numerous thresholds that control when entries are evicted (in case you use LRU), or when the space simply blocks clients from adding data to it. Combined, these two facilities enable better control of your environment and ensure that the memory of the Space instances in your cluster does not overflow.
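The two eviction policies can be sketched with a `LinkedHashMap` in access order. This is a concept illustration; the capacity parameter and class name are invented for the example, not actual configuration properties. Under LRU, reaching capacity evicts the least recently used entry; under ALL_IN_CACHE nothing is evicted, so the sketch's "memory manager" blocks further writes instead.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Concept sketch of the two eviction policies: LRU evicts the least
// recently used entry at capacity; ALL_IN_CACHE never evicts, so writes
// beyond capacity must be blocked instead.
public class ManagedCache {
    public enum Policy { LRU, ALL_IN_CACHE }

    private final Policy policy;
    private final int capacity;
    private final LinkedHashMap<String, String> entries;

    public ManagedCache(Policy policy, int capacity) {
        this.policy = policy;
        this.capacity = capacity;
        // access-order = true makes iteration order least-recently-used first
        this.entries = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return ManagedCache.this.policy == Policy.LRU
                        && size() > ManagedCache.this.capacity;
            }
        };
    }

    public void write(String key, String value) {
        if (policy == Policy.ALL_IN_CACHE
                && entries.size() >= capacity && !entries.containsKey(key)) {
            throw new IllegalStateException("space is full; write blocked");
        }
        entries.put(key, value);
    }

    public String read(String key) { return entries.get(key); }
    public boolean contains(String key) { return entries.containsKey(key); }
}
```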
Reactive Programming
GigaSpaces and its Space-Based Architecture embrace the reactive programming approach. The Space supports a number of APIs, allowing maximum flexibility for Space clients when accessing the Space:
The core Space API, which is the recommended option, allows you to read objects from the Space based on various criteria, write objects to it, remove objects from it, and get notified about changes made to objects. This API supports transactions.
Accessing the Space from Other Languages: The core Space API is also supported in Java and C++, allowing clients to access the space from these languages. It also supports interoperability between languages, so in effect you can write an object to the space using one language, say C++, and read it with another, say Java.
The Document API allows you to develop your application in a schema-less manner. Using map-like objects, you can add attributes to data types in runtime.
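The core operations and the schema-less document style can be sketched together in plain Java. This is a concept illustration, not the GigaSpaces API; `SimpleSpace` and `DataDocument` are invented names. The sketch shows write, read (non-destructive match), take (destructive match), and notify (callback on matching writes), plus a map-like document whose attributes can be added at runtime.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import java.util.function.Predicate;

// Concept sketch of the four core space operations.
class SimpleSpace<T> {
    private final List<T> entries = new ArrayList<>();
    private final List<Consumer<T>> listeners = new ArrayList<>();

    public void write(T entry) {
        entries.add(entry);
        for (Consumer<T> l : listeners) l.accept(entry); // fire notifications
    }

    // Return the first entry matching the template, leaving it in the space.
    public T read(Predicate<T> template) {
        for (T e : entries) if (template.test(e)) return e;
        return null;
    }

    // Return and remove the first matching entry.
    public T take(Predicate<T> template) {
        for (int i = 0; i < entries.size(); i++) {
            if (template.test(entries.get(i))) return entries.remove(i);
        }
        return null;
    }

    public void notify(Consumer<T> listener) { listeners.add(listener); }
}

// Schema-less document in the spirit of the Document API: a type name plus
// a free-form property map, so attributes can be added at runtime without
// changing any class definition.
class DataDocument {
    private final String typeName;
    private final Map<String, Object> properties = new HashMap<>();

    DataDocument(String typeName) { this.typeName = typeName; }

    DataDocument setProperty(String name, Object value) {
        properties.put(name, value);
        return this; // fluent style for chained property setting
    }

    Object getProperty(String name) { return properties.get(name); }
    String getTypeName() { return typeName; }
}
```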
Services on Top of the Space
Building on top of the core API, the Space also provides higher-level services to the application. These services, along with the space's basic capabilities, provide the full stack of middleware features with which you can build your application:
The Task Execution API allows you to send your code to the space and execute it on one or more nodes in parallel, accessing the space data on each node locally.
Event containers use the core API's operations and abstract your code from the low-level details involved in handling events, such as event registration with the space, transaction initiation, etc. This allows your code to focus on your business logic and the application behavior.
Space-Based Remoting allows you to use the space's messaging and code execution capabilities to let application clients invoke space-side services transparently, using an application-specific interface. Using the space as the transport mechanism for remote calls allows for location transparency, high availability and parallel execution of the calls, without changing the client code.
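The task-execution pattern can be sketched as a map/reduce over partitions, with a thread pool standing in for the grid nodes. This is plain illustrative Java, not the Task Execution API; all names are invented. The same task runs against each partition's local data in parallel, and the partial results are reduced into a single answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.ToIntFunction;

// Concept sketch of distributed task execution: run the same task on every
// partition in parallel, then reduce the partial results.
public class TaskExecutor {
    public static int executeAndReduce(List<Map<String, Integer>> partitions,
                                       ToIntFunction<Map<String, Integer>> task) {
        ExecutorService pool = Executors.newFixedThreadPool(partitions.size());
        try {
            List<Future<Integer>> partials = new ArrayList<>();
            for (Map<String, Integer> local : partitions) {
                partials.add(pool.submit(() -> task.applyAsInt(local))); // map phase
            }
            int total = 0;
            for (Future<Integer> f : partials) {
                total += f.get();                                        // reduce phase
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Because each task touches only its own partition's data, no data moves between nodes during the map phase; only the small partial results travel back for the reduce.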
The Space as the Foundation for Space-Based Architecture
Besides its ability to function as an in-memory data grid, the Space's core features and the services on top of it form the foundation for Space-Based Architecture (SBA). By using SBA, you can gain performance and scalability benefits not available with traditional tier-based architectures, even when these include an in-memory data grid such as the Space. The basic unit of scalability in SBA is the processing unit. The Space can be embedded into the processing unit, or accessed remotely from it. When embedded into the processing unit, local services, such as event handlers and services exposed remotely over the Space, can interact with the local space instance to achieve unparalleled performance and scalability. The Space's built-in support for data partitioning is used to distribute the data and processing across the nodes, and for scaling the application.
What's Next
It is recommended that you read the following sections next:
Client Side Caching — A client application may run a local cache (near cache), which caches data in the client application's local memory. GigaSpaces provides two options for client-side caching: local cache and local view. Both allow the client application to cache specific or recently used data within the client JVM, and the cached data is updated automatically by the space when necessary. The local cache is ideal for situations where higher flexibility is required, while the local view is designed for more rigid, predefined static data.
Persistency — Using the GigaSpaces External Data Source interface to persist data stored in the space
Modeling your Data — How to model application data for in-memory data grid
Object Entries — Understanding the semantics of Space Entries and .NET Objects