Summary: Setting cache policy, memory usage and rules for exceeding physical memory capacity.

Overview

The Memory Management facility helps ensure that a space server does not run into an out-of-memory failure scenario. Based on the configured cache policy, the memory manager protects the space from consuming memory beyond the defined threshold.

The client application is expected to have its own business logic for handling a memory shortage; if the memory shortage exception is ignored, the space server may eventually exhaust all available memory.

Memory management can be achieved in three main ways:

  • Setting a cache policy - using the ALL IN CACHE management policy or the Least Recently Used (LRU) management policy.
  • Memory usage - using the space-config.engine.memory_usage properties, which provide options for controlling the space memory utilization and allow you to evict Entries from the space.
  • Exceeding Physical Memory Capacity - using an LRU Based Persistent Space or a Cluster of In-Memory Spaces with Hash-Based Load-Balancing Policy.
    The space includes a dedicated thread that is responsible for clearing expired Entries - the lease manager. For more details, refer to the Lease Manager section.

Cache Policy

The space supports two cache management policies (0 - LRU POLICY, 1 - ALL IN CACHE), defined via the following property:

space-config.engine.cache_policy
  • ALL IN CACHE (1) - the space uses only the available physical memory. In a persistent space, the memory is backed by the underlying database, but the overall capacity of the space does not exceed the capacity of the available physical memory.
    When using ALL IN CACHE, the cache size parameter is ignored, because this policy keeps all Entries in memory regardless of their total size.
  • Least Recently Used (0) - the space evicts the "oldest" Entries from its memory. "Oldest" Entries are determined by the time they were written to or last updated in the space. In a persistent space, evicting an Entry means that the Entry is removed from the space memory but remains available through the underlying RDBMS. The space reloads the Entry back into memory only when it is requested by a specific read operation. (A configuration sketch follows below.)
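
As an illustration, the cache policy is set like any other space property when the space is started. The snippet below is a minimal sketch assuming the OpenSpaces UrlSpaceConfigurer API; the space name is a hypothetical example, and the property names are the ones documented on this page.

import org.openspaces.core.GigaSpace;
import org.openspaces.core.GigaSpaceConfigurer;
import org.openspaces.core.space.UrlSpaceConfigurer;

public class CachePolicyExample {
    public static void main(String[] args) {
        // Start an embedded space with the LRU cache policy (0); ALL IN CACHE would be "1".
        UrlSpaceConfigurer configurer = new UrlSpaceConfigurer("/./mySpace")
                .addProperty("space-config.engine.cache_policy", "0")
                .addProperty("space-config.engine.cache_size", "100000");
        GigaSpace gigaSpace = new GigaSpaceConfigurer(configurer.space()).gigaSpace();
    }
}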

The space memory manager uses a dedicated thread called the Evictor. This thread handles the eviction of entries and identifies memory shortage events.

In general, eviction can be performed based on:

  • Maximum amount of entries - entries are evicted one by one, not in batches. This is a very moderate mechanism, and it is turned on by default when running in LRU mode.
  • Available memory - eviction is done in batches. This is a very sensitive mechanism, and it is optional when using LRU.

Evicting an entry from the space requires the space engine to lock the LRU chain while removing the entry and to update the relevant indexes. This means that eviction based on available memory, which is done in batches, might impact the space responsiveness to client requests. Still, you might need to use it if you cannot estimate the amount of entries within the space.

Monitoring the Memory Manager

You can monitor the memory manager activity by setting the com.gigaspaces.core.memorymanager logging entry to ALL.
This displays log entries when evicting entries (at the start of, during, and when completing the eviction cycle) and when waiting for incoming activities.
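
For example, GigaSpaces logging is based on java.util.logging, so the level can be raised programmatically as sketched below (adjusting the logging configuration file achieves the same effect); the logger name is taken from the text above.

import java.util.logging.Level;
import java.util.logging.Logger;

public class MemoryManagerLogging {
    public static void enableVerboseLogging() {
        // Raise the memory manager logger to ALL so that eviction cycles
        // (start, progress, completion) and wait states are logged.
        Logger.getLogger("com.gigaspaces.core.memorymanager").setLevel(Level.ALL);
    }
}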

Defining Cache Size

When a persistent space (using the JDBC SA or an indexed file) uses the LRU cache policy and has been restarted, it loads data from the underlying durable data source (RDBMS, indexed file) before becoming available for clients to access.

The default behavior is to load data up to 50% of the space-config.engine.cache_size value.

When space-config.engine.memory_usage.enabled is true (evicting data from the space based on free heap size), it is recommended to use a large value for the space-config.engine.cache_size property. This instructs the space engine to ignore the amount of Entries inside the space when launching the eviction mechanism, ensuring that eviction is based only on free heap memory.

The combination of the above (a large space-config.engine.cache_size and a space restart) may lead to out-of-memory problems. To avoid this, configure space-config.engine.initial_load to a low value (5 below means 5% of the space-config.engine.cache_size; the default is 50%):

space-config.engine.initial_load=5

The space-config.engine.initial_load_class property can be used to specify which class(es) should have their data loaded.
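
One way to apply these settings (among others) is to pass them as custom properties when the space is started. A minimal sketch using plain java.util.Properties follows; the class name given to initial_load_class is a hypothetical example.

import java.util.Properties;

public class InitialLoadSettings {
    public static Properties initialLoadProperties() {
        Properties props = new Properties();
        // Load only 5% of the cache_size amount of entries at startup, instead of the default 50%.
        props.setProperty("space-config.engine.initial_load", "5");
        // Restrict the initial load to a specific class (hypothetical class name).
        props.setProperty("space-config.engine.initial_load_class", "com.mycompany.model.Trade");
        return props;
    }
}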

How Does the LRU Eviction Work?

The LRU eviction has two eviction strategies:
1. Based on the maximum amount of entries within the space - provides very deterministic behavior in terms of garbage collection, memory usage and space responsiveness. With a reasonable client request rate, this provides very constant behavior without client hiccups when memory is reclaimed. This strategy runs by default when the LRU cache policy is used. To turn it off, set a very large number for the cache size property.

This activity checks the amount of entries within the space and evicts the relevant entry. One entry is evicted when the maximum amount of entries is reached. This check is performed when:

  • Writing a new entry to the space
  • A transaction is committed or rolled back.

2. Based on the amount of available memory the JVM hosting the space has - needs some tuning to provide deterministic behavior. This is turned on when the space-config.engine.memory_usage.enabled value is true. It is very complex to use when multiple spaces run within the same JVM (GSC).

The LRU eviction based on the amount of available memory performs the following:
Check the used memory. If the space-config.engine.memory_usage.high_watermark_percentage has not been breached, exit; if it has been breached, start the eviction cycle:

Start loop

    1. Evict a batch - this releases entries from the space.
    2. Were any entries evicted? If not - exit loop.
    3. Wait for the GC to reclaim memory. The more entries evicted in one batch, the longer it takes to reclaim the memory. This phase was added starting with GigaSpaces 6.5 and is configured using the <retry_yield_time> parameter.
    4. Check the used memory. See the calculation below.
    5. If the amount of used memory has dropped below the low watermark percentage - exit loop.

End loop

The Wait for the GC to reclaim memory step was added to avoid the problem of evicting too many entries. The problem manifests itself when the Check used memory phase is called before the memory of the evicted objects has been reclaimed, causing the JVM to return a misleading value for the used memory. The used memory rate is calculated via:

Used_memory_rate = ((Runtime.totalMemory() - Runtime.freeMemory()) * 100.0) / Runtime.maxMemory()
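
In Java terms, the calculation above maps directly to the Runtime API. The sketch below also outlines the eviction cycle described above; it is a conceptual illustration only, not the actual engine code, and the threshold and batch parameters stand for the configuration values documented on this page.

public class EvictionLoopSketch {

    // Used memory as a percentage of the maximum heap, as in the formula above.
    static double usedMemoryRate() {
        Runtime rt = Runtime.getRuntime();
        return ((rt.totalMemory() - rt.freeMemory()) * 100.0) / rt.maxMemory();
    }

    // Conceptual outline of the memory-based eviction cycle (not the engine code).
    static void evictionCycle(double highWatermark, double lowWatermark,
                              int batchSize, long retryYieldTimeMillis) throws InterruptedException {
        if (usedMemoryRate() < highWatermark) {
            return;                                    // high watermark not breached - nothing to do
        }
        while (true) {
            int evicted = evictBatch(batchSize);       // 1. evict a batch of LRU entries
            if (evicted == 0) {
                break;                                 // 2. nothing was evicted - exit
            }
            Thread.sleep(retryYieldTimeMillis);        // 3. give the GC time to reclaim memory (retry_yield_time)
            if (usedMemoryRate() <= lowWatermark) {    // 4 + 5. recheck used memory against the low watermark
                break;
            }
        }
    }

    // Hypothetical stand-in for the engine's batch eviction; returns the number of entries evicted.
    static int evictBatch(int batchSize) {
        return 0;
    }
}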

Memory Usage

The space-config.engine.memory_usage properties provide options for controlling the space memory utilization and allow you to evict Entries from the space. Entries are evicted when the number of cached Entries reaches its maximum size or when memory use reaches its limit.
These are the default memory usage parameters (the cache_size here is set to a large value, as recommended above). The watermark values should preserve the following order:
high_watermark_percentage >= write_only_block_percentage >= write_only_check_percentage >= low_watermark_percentage

space-config.engine.cache_policy=0
space-config.engine.cache_size=5000000
space-config.engine.memory_usage.enabled=true
space-config.engine.memory_usage.high_watermark_percentage=95
space-config.engine.memory_usage.write_only_block_percentage=85
space-config.engine.memory_usage.write_only_check_percentage=76
space-config.engine.memory_usage.low_watermark_percentage=75
space-config.engine.memory_usage.eviction_batch_size=500
space-config.engine.memory_usage.retry_count=5
space-config.engine.memory_usage.explicit-gc=false
space-config.engine.memory_usage.retry_yield_time=2000

The space-config.engine.memory_usage.enabled default value is true in GigaSpaces version 6.5 and onwards.

MemoryShortageException

The com.j_spaces.core.MemoryShortageException is thrown when:

  • There are no more retries to evict and the used amount of memory is above the high_watermark_percentage for non write-type operations.
  • There are no more retries to evict and the used amount of memory is between write_only_block_percentage and high_watermark_percentage for a write-type operation.

The com.j_spaces.core.MemoryShortageException includes information about:

  • Space host name
  • Space container name
  • Space name
  • Total available memory
  • Total used memory

Here is an example of a MemoryShortageException message:

Memory shortage at: host: pc-lab38, container: mySpace_container1_1, space mySpace, total memory: 1820 mb, used memory: 1283 mb
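
As noted in the overview, the client application should handle this exception in its own business logic rather than ignore it. Below is a minimal sketch, assuming the OpenSpaces GigaSpace API for the write call; depending on the client API layer the exception may arrive wrapped, so the cause is inspected as well.

import org.openspaces.core.GigaSpace;
import com.j_spaces.core.MemoryShortageException;

public class WriteWithShortageHandling {

    public void safeWrite(GigaSpace gigaSpace, Object entry) {
        try {
            gigaSpace.write(entry);
        } catch (RuntimeException e) {
            Throwable cause = (e.getCause() != null) ? e.getCause() : e;
            if (cause instanceof MemoryShortageException) {
                // Business-specific handling - e.g. back off, raise an alert, or shed load
                // instead of retrying blindly and exhausting the remaining memory.
                System.err.println("Space is low on memory: " + cause.getMessage());
            } else {
                throw e;
            }
        }
    }
}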

explicit-gc

The memory manager has a very delicate feature - <explicit-gc>. When enabled, it performs an explicit GC call before checking how much memory is used. When turned on, this blocks clients from accessing the space during the GC activity. This can cause a domino effect, resulting in unneeded failover or a total client hang. The problem is more severe in a clustered environment, where both the primary and backup space JVMs call GC explicitly at the same time, holding back the primary from both serving clients and sending operations to the backup.

With a small value for retry_yield_time, or when explicit-gc is turned off (value false), the space might evict most of its data once the write_only_block_percentage or the high_watermark_percentage is breached.

Since the space does not perform garbage collection between each eviction batch, the memory of the evicted objects is not reclaimed immediately, and the space continues to evict Entries.

When using the explicit-gc option:

  • Make sure -XX:+DisableExplicitGC isn't set.
  • Adding -XX:+ExplicitGCInvokesConcurrent might help to reduce the impact of the System.gc() calls.
  • System.gc() is called before calculating available memory.
  • Calculating available memory is performed when the following operations are called:
    • abort
    • changeReplicationState
    • clear
    • commit
    • count
    • getReplicationStatus
    • getRuntimeInfo
    • getSpacePump
    • getTemplatesInfo
    • joinReplicationGroup
    • leaveReplicationGroup
    • notify
    • prepare
    • prepareAndCommit
    • read
    • readMultiple
    • replace
    • spaceCopy
    • update
    • updateMultiple
    • write

GC Behavior and Space Response Time Tango

In general, when the GC is called, there is a chance that clients accessing the space will be affected.
If the JVM is not using incremental GC mode (i.e. regular behavior), the GC will show the famous chainsaw behavior: rapid memory reclamation of the recently evicted (de-referenced) objects. This means a quick garbage collection, potentially with delays at the client side or a phantom OOME in case the JVM has not managed to evict fast enough.

See below the regular GC behavior when eviction is going on (based on available memory) and new entries are written into the space:

The incremental GC behavior shows more moderate activity, with ongoing garbage collection and without the risk of missing a garbage collection and getting an OOME - see below the behavior when eviction is going on (based on available memory) and new entries are written into the space:

When the LRU eviction is based on the maximum amount of entries, the memory utilization graph has a very small amplitude.

This behavior is achieved because the memory manager evicts entries from the space one by one and not in batches, so the amount of work the JVM garbage collector needs to perform is relatively small. This also does not affect the clients communicating with the space and provides a very deterministic response time - i.e. a very small chance of a client hiccup.

If you can estimate the amount of entries the space will hold, using eviction based on the maximum amount of entries within the space allows you to eliminate hiccups and provide very deterministic and constant response times.

Memory Manager Activity when initializing the space

In this phase of the space life cycle, the space checks the amount of available memory. This is relevant when the space performs a warm start, such as ExternalDataSource.initialLoad(), or when a persistent space uses an SA with an RDBMS or an embedded H2 database.

Memory Manager and Transient Entries

When using transient entries:

  • Transient Entries are included in the free heap size calculation.
  • Transient Entries are included in the count of total objects (for max cache size).
  • Transient Entries are not evicted when running in LRU cache policy mode (see the sketch after this list).
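
For reference, a POJO class can be marked so that its instances are written as transient entries. The sketch below assumes the GigaSpaces POJO annotations; the class itself is a hypothetical example.

import com.gigaspaces.annotation.pojo.SpaceClass;
import com.gigaspaces.annotation.pojo.SpaceId;

// persist = false marks instances of this class as transient entries: they still count
// toward the free heap calculation and the max cache size, but they are not persisted
// and are not evicted under the LRU cache policy (see the list above).
@SpaceClass(persist = false)
public class SessionToken {

    private String id;

    @SpaceId(autoGenerate = true)
    public String getId() { return id; }
    public void setId(String id) { this.id = id; }
}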

How can I Get Deterministic Behavior During Eviction?

To get deterministic behavior from the memory manager when evicting entries based on the amount of free memory, so that it:

  • Does not evict too many entries
  • Does not consume too much time when reclaiming the memory of released entries
  • Has minimal impact on client response time

You should:

  • Use a small eviction batch size - a good rule of thumb is: the amount of new objects added to the space per second, per partition, times 2. For example, if clients add 1000 new objects per second to a space with 2 partitions, the batch size for each partition should be 1000 (see the sketch after this list).
  • Allow the GC sensible time to reclaim the evicted entries - a good rule of thumb is 2 seconds per 1000 evicted entries of 5 KB size. Needless to say, the CPU speed has an effect here; this recommendation assumes a 2 GHz Intel CPU.
  • Limit the amount of entries within the space using the space-config.engine.cache_size parameter - this makes sure the space does not miss a garbage collection. Use a reasonable number here as a protection mechanism.
  • Keep a small amplitude between the high and low watermark percentages - remember that with a 2G Xmx, every 1% means 20 MB of memory. Reclaiming such an amount of memory takes 1-2 seconds.
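
To make the first rule of thumb concrete, here is the arithmetic from the batch size example expressed as code; the write rate and partition count are illustrative values.

public class EvictionBatchSizeRule {
    public static void main(String[] args) {
        int newObjectsPerSecond = 1000;  // total write rate across all clients (example value)
        int partitions = 2;              // number of space partitions (example value)

        // Rule of thumb: eviction batch size per partition = (write rate per partition) * 2
        int batchSizePerPartition = (newObjectsPerSecond / partitions) * 2;
        System.out.println("eviction_batch_size per partition: " + batchSizePerPartition); // prints 1000
    }
}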

Here are good settings for a JVM with a 2G heap and a 5 KB object size. With the following settings, eviction happens once the JVM consumes more than 1.4 G (70% of 2G).

space-config.engine.cache_policy=0
space-config.engine.cache_size=200000
space-config.engine.memory_usage.enabled=true
space-config.engine.memory_usage.high_watermark_percentage=70
space-config.engine.memory_usage.write_only_block_percentage=68
space-config.engine.memory_usage.write_only_check_percentage=65
space-config.engine.memory_usage.low_watermark_percentage=60
space-config.engine.memory_usage.eviction_batch_size=2000
space-config.engine.memory_usage.retry_count=100
space-config.engine.memory_usage.explicit-gc=false
space-config.engine.memory_usage.retry_yield_time=4000

Here are the JAVA_OPTIONS (using incremental GC) to use for the JVM running the Space/GSC:

-Xmx2g -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -XX:+CMSIncrementalPacing 
-XX:CMSIncrementalDutyCycleMin=10 -XX:CMSIncrementalDutyCycle=50 -XX:ParallelGCThreads=8
-XX:+UseParNewGC -Xmn5m -XX:MaxGCPauseMillis=1000 -XX:GCTimeRatio=4 -XX:+DisableExplicitGC

Memory Manager Parameters

  • space-config.engine.cache_size - Defines the maximum size of the cache. Default: 100000.
  • space-config.engine.memory_usage.high_watermark_percentage - Specifies the maximum threshold for memory use. If the space container's memory usage exceeds this threshold, a com.j_spaces.core.MemoryShortageException is thrown. Default: 95.
  • space-config.engine.memory_usage.low_watermark_percentage - Specifies the recommended lower threshold for the JVM heap size that should be occupied by the space container. When the system reaches the high_watermark_percentage, it evicts entries on an LRU basis and attempts to reach this low_watermark_percentage. This process continues until there are no more Entries to evict, or memory use reaches the low_watermark_percentage. Default: 75.
  • space-config.engine.memory_usage.eviction_batch_size - Specifies the amount of Entries to evict each time. This option is relevant only with the LRU cache management policy. Default: 500.
  • space-config.engine.memory_usage.write_only_block_percentage - Specifies the lower threshold for blocking write-type operations. Above this level, only read/take operations are allowed. Default: 85.
  • space-config.engine.memory_usage.write_only_check_percentage - Specifies the upper threshold for checking only write-type operations. Above this level, all operations are checked. Default: 76.
  • space-config.engine.memory_usage.retry_count - The number of retries to lower the memory level below the low_watermark_percentage. If after all retries the memory level is still above the write_only_block_percentage, a com.j_spaces.core.MemoryShortageException is thrown for that write request. Default: 5.
  • space-config.engine.memory_usage.explicit-gc - If true, the garbage collector is called explicitly before trying to evict. When using the LRU cache policy, explicit-gc=false means that eviction might remove fewer Entries than needed to reach the low_watermark_percentage. This option is false by default, because calling the garbage collector explicitly consumes a large amount of CPU and thus affects performance. Set it to true only if you want to ensure that the minimum amount of Entries is evicted from the space (and not less than the minimum). Default: false.

A MemoryShortageException is thrown only when the JVM garbage collection and the eviction mechanism do not free enough memory. This can happen if the low_watermark_percentage value is set too high.


Exceeding Physical Memory Capacity

The overall space capacity is not necessarily limited to the capacity of its physical memory.
Currently there are two options for exceeding this limit, detailed below.

  • Using an LRU Based Persistent Space - in this mode, all the space data is kept in the RDBMS and therefore the space capacity is dependent on the RDBMS capacity rather than the memory capacity. The space would maintain in memory a partial image of the persistent view on an LRU basis.
  • Using Cluster of In-Memory Spaces with Hash-Based Load-Balancing Policy - in this mode, the space utilizes the physical memory of multiple machines using the space clustering mechanism.
    This means the application using the space would be able to access all the space instances transparently as if they were a single space with higher memory capacity.
    The clustered space provides means for virtualizing several physical space instances.
    With the hash-based load-balancing policy, the space proxy would multiplex the space operations (write, read, take, etc.) between those physical space entities. This means that each space entity would store only part of the information, based on the Entry hash code. The space proxy would be able to fetch the Entry from the appropriate space instance through the same hash-based mechanism, transparently to the application (see the illustrative sketch below).
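
Conceptually, the hash-based policy maps each Entry to a partition according to its routing value's hash code, along the following lines. This is an illustration only, not the actual proxy implementation.

public class HashBasedRouting {

    // Illustration of hash-based partition selection: the same routing value always maps
    // to the same space instance, so a read can locate the Entry that was written earlier
    // without broadcasting the request to all partitions.
    static int partitionFor(Object routingValue, int numberOfPartitions) {
        return Math.abs(routingValue.hashCode() % numberOfPartitions);
    }

    public static void main(String[] args) {
        System.out.println(partitionFor("ORDER-12345", 4)); // always routed to the same partition
    }
}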