Summary: Instructions and best practices for tuning large-scale deployments.

Large Cluster Considerations

When designing a large cluster, several factors must be taken into account to ensure that the cluster can handle heavy loads and perform quickly and stably.

When speaking of a large cluster, we are referring to one with more than a few hundred members. If this is the size of cluster you intend to build, the following considerations are relevant for you.

Unregistered Spaces "Disappear" from the LUS

This occurs when a large amount of memory is consumed in the process, causing extensive JVM GC spikes. The result is high CPU usage that starves the LeaseRenewalManager: a long GC pause or CPU spike causes it to miss the default 4-second window in which it attempts to renew the lease, which fires a space service un-registration event. When the LUS fires an event to unregister a space, the UI spaces tree node represents it with a specific icon, and specific logging is printed in the UI.

To avoid spaces being unregistered, add resources (memory, CPU) or spaces, or tune the lease renewal maxLeaseDuration and roundTripTime values. These can be configured using the following system properties:

// Default value for roundTripTime: 4 seconds
-Dcom.gs.jini.config.roundTripTime=4000

// Default value for maxLeaseDuration: 8 seconds
-Dcom.gs.jini.config.maxLeaseDuration=8000

It is recommended to increase these values to 40000 and 80000 respectively when running a large cluster.
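As a sketch, the recommended large-cluster values (10x the defaults) could be passed to the JVM via a single options variable; the variable name is illustrative, not part of the product:

```shell
# Recommended LeaseRenewalManager settings for large clusters
# (10x the defaults; GS_JAVA_OPTIONS is an illustrative variable name)
GS_JAVA_OPTIONS="-Dcom.gs.jini.config.roundTripTime=40000 \
 -Dcom.gs.jini.config.maxLeaseDuration=80000"
```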

Note that increasing these values delays failover detection, since the active-election infrastructure is based on space un-registration.

Minimize RMIRegistry Overhead

Every space container starts an embedded RMIRegistry service, which creates a set of threads that consume a significant amount of resources.

If the RMIRegistry service is not used, or if a fully replicated or large cluster is used, it is recommended to disable the RMIRegistry service in the space container and in the GSC/GSM.

For details on how to disable the space container RMIRegistry, refer to the Overwriting Default Configuration section.

Service Grid CPU, Memory, and Disk Measurement Overhead

Avoid or disable the Service Grid measurements for CPU, memory, and disk, since these perform an intensive update of Jini lookup attributes across all lookup services every 15 seconds.

For more details, see the following services.config file elements:

com.gigaspaces.management.system.memory
com.gigaspaces.management.system.cpu
com.gigaspaces.management.system.disk

Setting Redundant total_members for Clusters

Avoid setting total_members=100 if you need only 50 spaces. The overhead in such a case is high, especially in fully replicated clusters.
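For example, using the space URL syntax, a partitioned cluster that actually needs 50 primaries (with one backup each) would be sized exactly rather than over-provisioned; the space name and cluster schema below are illustrative:

```
/./mySpace?cluster_schema=partitioned-sync2backup&total_members=50,1
```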

Lookup and Mahalo Service Settings

Set a maximum heap of at least 600 MB for your LUS/GSM/Mahalo. Do not start more than 2 instances per cluster, and preferably start them on your strongest machine.
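A minimal sketch of the heap setting, using the standard JVM -Xmx flag; the variable name is an assumption and would be passed to the service launch script:

```shell
# At least 600 MB max heap for the LUS/GSM/Mahalo JVMs
# (GSM_JAVA_OPTIONS is an illustrative variable name)
GSM_JAVA_OPTIONS="-Xmx600m"
```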

Setting the List of JARs Accessible through rmi.server.codebase or HTTPD

When a very long list of files (or very large files) is added, all clients attempt to download these files over the wire. This causes dozens of MBs to go over the wire repeatedly for each type of client, exhausting the HTTPD and the network.

Recommendations

  • Do not add third party JAR files which are not used by the space/service directly.
  • Pay special attention to the size of files.
  • Avoid using rmi.server.codebase wherever possible.
  • Use the file:// protocol instead of the http:// protocol wherever possible.
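The last two recommendations can be sketched as a single JVM option using the standard java.rmi.server.codebase property; the installation path and JAR name are illustrative:

```shell
# Prefer a local file:// codebase over http:// to avoid repeated
# downloads through the HTTPD (path and JAR name are illustrative)
CODEBASE_OPT="-Djava.rmi.server.codebase=file:///opt/gigaspaces/lib/my-space.jar"
```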

This Example of a Service Grid Deployment Descriptor contains a long list of files. Note, for example, the 50 MB weblogic.jar.

Cluster Availability Monitoring

When a large cluster is monitored for availability, it is recommended to increase the Monitor thread value to its maximum. Usually, when there is no failover, or when there are no backup-only spaces, the Monitor thread can safely be set to its maximum value, since clients interact directly with the space members. If a member is detected as unavailable, the Detector thread is responsible for detecting its re-availability.

For more details, refer to the Viewing Clustered Space Status section.

Many Clients Accessing Space

When attempting to run hundreds of clients that need to find a space and perform operations, a few considerations should be taken into account.

Calling SpaceFinder

If hundreds of clients call SpaceFinder at the same time, RMI calls should be used. If this is not possible, use Jini unicast lookup only (without multicast and without groups set). Use Jini multicast discovery only as a last resort, since it has the most overhead.
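The two Jini lookup variants can be sketched as space URLs passed to SpaceFinder; the host, port, and space name below are illustrative:

```
// Unicast lookup: name the lookup host explicitly, no multicast discovery
jini://lookup-host:4160/*/mySpace

// Multicast lookup: heaviest discovery option, use only as a last resort
jini://*/*/mySpace
```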

Cluster Availability Monitoring

When many clients monitor the availability of a cluster, it is also recommended to increase the Monitor thread value to its maximum. Usually, when there is no failover, or when there are no backup-only spaces, the Monitor thread can safely be set to its maximum value, since clients interact directly with the space members. If a member is detected as unavailable, the Detector thread is responsible for detecting its re-availability.

For more details, refer to the Viewing Clustered Space Status section.
