Summary: OpenSpaces uses the GigaSpaces Service Grid as an SLA-driven container for deploying Processing Units over a dynamic pool of machines, based on the Processing Unit SLA definitions.

Overview

OpenSpaces uses the GigaSpaces Service Grid as an SLA-driven container for deploying Processing Units over a dynamic pool of machines, based on the Processing Unit SLA definitions. The Service Grid is composed of two main entities: the GSM (Grid Service Manager) and the GSC (Grid Service Container). The GSC (also known as an SLA-driven container) is responsible for running one or more Processing Units, while the GSM is responsible for the coordination, orchestration, and deployment of Processing Units onto GSCs.

The SLA definition can be embedded within the Processing Unit Spring XML configuration, or provided during Processing Unit deployment. It defines the number of PU instances to run; the policies (scaling, failover) based on CPU, memory, or application-level measurements; and deploy-time requirements for specific GSCs. The GSM reads the SLA definition, and deploys the requested Processing Unit topology onto the available pool of GSCs.
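For example, the SLA can be kept in a standalone XML file and referenced when deploying. A sketch, assuming the pudeploy CLI's -sla option (verify against the pudeploy reference for your version):

gs.sh pudeploy -sla file://sla/sla.xml data-processor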

Starting the Service Grid

In order to start the Service Grid, a GSM and several GSCs should be started. Additional GSMs can be started to act as backups in case a GSM fails (there is no need to run more than two GSMs per Jini group; see Lookup Service Configuration). Starting a GSM is done using the following command (found in the <GigaSpaces Root>\bin directory):

Unix
gsm.sh

Windows
gsm.bat

Starting a GSC is done using the following command (also found in the <GigaSpaces Root>\bin directory):

Unix
gsc.sh

Windows
gsc.bat

The GSM automatically connects to all GSCs in its Jini lookup group, and uses them as the available pool of deployment containers. The GigaSpaces installation comes with a default group name, gigaspaces-6.5XAPga. It is recommended to change the default lookup group name to a unique value (such as your user name), in order to avoid deploying Processing Units to unwanted locations.
The lookup group name can be changed in several places; most commonly, the -Dcom.gs.jini_lus.groups system property is set in the setenv script in the <GigaSpaces Root>\bin directory.
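For example, using your user name as the lookup group (a sketch; both the property and the EXT_JAVA_OPTIONS variable are set in the setenv script):

EXT_JAVA_OPTIONS=-Dcom.gs.jini_lus.groups=myusername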

If you want to use unicast discovery of services instead of (or in addition to) multicast, refer to the How do I Use/Set Unicast (Jini Locators) Discovery? section.
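Unicast lookup locators are typically provided through the com.gs.jini_lus.locators system property in the same setenv script (a sketch; see the referenced section for the authoritative syntax):

-Dcom.gs.jini_lus.locators=lookup-host1,lookup-host2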

The following shows a single GSM and two GSCs using the GigaSpaces Management Center (started using the gs-ui.sh script), and the GigaSpaces CLI (started using the gs.sh script):

GigaSpaces Management Center:

GigaSpaces CLI:

The above shows a single GSM and two GSCs in the Service Grid Infrastructure section. We can also see two GSCs available for Processing Unit deployments in the Details panel.

SLA Element - Cluster Info

The SLA element defines the Processing Unit deployment nature in terms of topology, scaling policies, deploy time requirements, and different monitors. A sample SLA definition is shown below:

Namespace
<os-sla:sla cluster-schema="partitioned-sync2backup" number-of-instances="2" number-of-backups="1"
            max-instances-per-vm="1"/>

Plain
<bean id="SLA" class="org.openspaces.pu.sla.SLA">
    <property name="clusterSchema" value="partitioned-sync2backup" />
    <property name="numberOfInstances" value="2" />
    <property name="numberOfBackups" value="1" />
    <property name="maxInstancesPerVM" value="1" />
</bean>

The SLA definition above creates 4 instances of a Processing Unit, using the partitioned-sync2backup cluster schema (mostly used when the Processing Unit starts an embedded space). It has 2 partitions (number-of-instances="2"), each with one backup (number-of-backups="1"). It also ensures that a primary and a backup of the same partition are never deployed to the same GSC (max-instances-per-vm="1").

The data example's processor Processing Unit uses the SLA definition above; here is its deployment:

GigaSpaces Management Center:

GigaSpaces CLI:

Here we can see 4 instances of the data-processor Processing Unit: two partitions (data-processor.PU.1 and data-processor.PU.2), each with two instances (data-processor.PU.1 [1] and data-processor.PU.1 [2]), i.e., one primary and a single backup per partition.

The names of the different Processing Unit instances depend on the numberOfBackups parameter. If it is not set (i.e., a cluster schema without backups is used, such as partitioned), the Processing Unit name is data-processor.PU, and it has numberOfInstances instances. If numberOfBackups is set, there are several Processing Unit names: data-processor.PU.N, where N ranges from 1 to numberOfInstances, and each data-processor.PU.N has 1 + numberOfBackups instances (the primary and all of its backups).

SLA Cluster Info and the Space

OpenSpaces creates an abstraction between the Processing Unit itself and the container that runs it. The cluster-schema, number-of-instances and number-of-backups map to the OpenSpaces ClusterInfo parameters.

A Processing Unit that includes an embedded Space definition makes use of the three ClusterInfo parameters defined in the SLA, and creates the space with it. In other cases, the parameters simply control the Processing Unit's number of instances (primary and backup concepts only apply when an embedded space is used).

If we take a closer look at the ClusterInfo parameters, we can see that it holds the instanceId and backupId of specific instances. When working with the Service Grid as the deployment container, the SLA container automatically assigns the instanceId and backupId for each Processing Unit instance it creates, based on the SLA definitions.
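For illustration, here is a minimal sketch of a bean that receives the injected ClusterInfo by implementing the ClusterInfoAware interface (the bean name and printout format are arbitrary):

Code
import org.openspaces.core.cluster.ClusterInfo;
import org.openspaces.core.cluster.ClusterInfoAware;

public class InstanceIdReporter implements ClusterInfoAware {

    // Invoked by the container with this instance's ClusterInfo
    public void setClusterInfo(ClusterInfo clusterInfo) {
        System.out.println("instanceId=" + clusterInfo.getInstanceId()
                + " backupId=" + clusterInfo.getBackupId()
                + " numberOfInstances=" + clusterInfo.getNumberOfInstances()
                + " numberOfBackups=" + clusterInfo.getNumberOfBackups());
    }
}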

It is up to the deployer to configure the SLA correctly. Trying to deploy a Processing Unit with a cluster schema that requires backups without specifying numberOfBackups causes the deployment to fail.

Distribution and Provisioning

Distribution of services takes into account the SLA definitions. These definitions are a set of constraints and requirements that are met when a service is provisioned on a specific container (GSC). All SLAs are considered during initial deployment, relocation, and failover of a service.

Default SLA Definition

If no SLA definition is provided either within the Processing Unit XML configuration or during deploy-time, a default SLA is used. The following is the default SLA definition:

Namespace
<os-sla:sla number-of-instances="1" />

Plain
<bean id="SLA" class="org.openspaces.pu.sla.SLA">
    <property name="numberOfInstances" value="1" />
</bean>

SLA - Max Instances per VM/Machine

The max instances per machine parameter is supported in GigaSpaces version 6.0.2 and onwards.

The SLA definition allows you to define the maximum number of instances for a certain service, either per VM (GSC) or per machine (host, regardless of the number of VMs/GSCs running on it).

The max instances parameter behaves differently in a topology without backups and a topology with backups (i.e., with the number-of-backups parameter set to a value other than zero). When working without backups, the max instances parameter defines the total number of instances that can be deployed on a single VM or on a single machine. When working with backups, it defines the total number of instances from the same group of a single primary and its backups that can be deployed on a single VM or on a single machine.

The most common usage of the max instances feature is when using a topology that includes backups, by setting its value to 1. This defines that a primary and its backup(s) cannot be deployed on the same VM (GSC) or on the same machine.

Here is an example of setting the max instances per VM parameter:

Namespace
<os-sla:sla max-instances-per-vm="1" />

Plain
<bean id="SLA" class="org.openspaces.pu.sla.SLA">
    <property name="maxInstancesOfVM" value="1" />
</bean>

Here is an example of setting the max instances per machine parameter:

Namespace
<os-sla:sla max-instances-per-machine="1" />

Plain
<bean id="SLA" class="org.openspaces.pu.sla.SLA">
    <property name="maxInstancesOfMachine" value="1" />
</bean>

SLA - Even Distribution of Primaries

In addition to the two definitions above (max-instances-per-vm and max-instances-per-machine), there is another, implicit definition, "max primaries per machine", which is supported in GigaSpaces version 6.6 and onwards.

This constraint defines the maximum number of primary space instances that can be deployed on each machine. By applying the "max primaries per machine" SLA constraint, the GSM tries to deploy primary spaces evenly across the different physical machines. The distribution of backup spaces also aims to be even.

The maximum number of primaries per machine is calculated by dividing the number of primaries by the number of physical machines, rounded up to the next integer. For example, a deployment of 4 primaries (with or without backups) on 2 physical machines results in at most 2 primaries on each machine; the same maximum applies to 3 machines (ceil(4/3) = 2). If there are more than 3 machines, only 1 primary is provisioned on each machine.
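Expressed in code, this is simply a rounded-up integer division (a sketch of the documented formula, not an actual GigaSpaces API):

Code
public class MaxPrimariesCalc {

    // ceil(numberOfPrimaries / numberOfMachines) using integer arithmetic
    public static int maxPrimariesPerMachine(int numberOfPrimaries, int numberOfMachines) {
        return (numberOfPrimaries + numberOfMachines - 1) / numberOfMachines;
    }
}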

The definition is implicit and there is no need to configure any os-sla parameter. The maximum number of primaries is calculated according to the current state of available machines. This is considered during failover and relocation of the service.

Evenness is a best effort, not a hard constraint. For example, if you have 2 machines and deploy 3 primaries, you end up with 2 primaries on the 1st machine and the 3rd primary on the 2nd machine.
The system does guarantee that if you have more primaries than physical machines, there will be at least one primary per machine.

Even distribution is enabled by default from version 6.6.5 onwards. In earlier versions it was disabled by default, due to its effect on deployment times (an issue that was resolved in 6.6.5). If you would like to use this feature, it is highly recommended that you upgrade to 6.6.5 or to the 7.0.x branch.
To enable this behavior in earlier versions, add the -DmaxPrimariesPerPhysicalMachine=999 system property to EXT_JAVA_OPTIONS in the setenv script. The property is applied to the GSM. Any integer value other than -1 acts as a 'true' flag.

SLA - Monitors

In the SLA, several monitors can be defined in order to provide runtime feedback on the current Processing Unit instance, as well as to control an optional policy definition (explained in the next section). The default monitor used with OpenSpaces and Spring allows you to periodically monitor a Spring bean property. Here is an example of how it can be defined:

Namespace
<os-sla:sla>
    <os-sla:monitors>
        <os-sla:bean-property-monitor name="Processed Data"
                                      bean-ref="dataProcessedCounter"
                                      property-name="processedDataCount"
                                      period="2000" />
    </os-sla:monitors>
</os-sla:sla>

<bean id="dataProcessedCounter" class="org.openspaces.example.data.processor.DataProcessedCounter"/>

Plain
<bean id="SLA" class="org.openspaces.pu.sla.SLA">
    <property name="monitors">
        <list>
            <bean class="org.openspaces.pu.sla.monitor.BeanPropertyMonitor">
                <property name="ref" value="dataProcessedCounter" />
                <property name="propertyName" value="processedDataCount" />
                <property name="period" value="2000" />
            </bean>
        </list>
    </property>
</bean>

<bean id="dataProcessedCounter" class="org.openspaces.example.data.processor.DataProcessedCounter"/>

The above definition monitors the following Java bean:

import java.util.concurrent.atomic.AtomicInteger;

import org.openspaces.events.adapter.SpaceDataEvent;

public class DataProcessedCounter {

    AtomicInteger processedDataCount = new AtomicInteger(0);

    @SpaceDataEvent
    public void dataProcessed(Data data) {
        processedDataCount.incrementAndGet();
        System.out.println("*** PROCESSED DATA COUNT [" + processedDataCount + "] DATA [" + data + "]");
    }

    public int getProcessedDataCount() {
        return processedDataCount.intValue();
    }
}

The SLA definition defines a monitor on the bean with the ID dataProcessedCounter, and on its property processedDataCount (exposed through a Java bean getter). The value is checked periodically (by invoking the getter), with a period of 2 seconds.

By defining the monitor, we can then see its value at runtime using the GigaSpaces Management Center:

The monitor screen can be opened by double-clicking the Processing Unit instance in the Grid Container view. In the above example, data-processor.PU.1 [1] was double-clicked. In the monitor screen, we can see a list of all the monitors/watches defined, and their values over time.

SLA Policy

A policy defined in the SLA element defines the action to be taken when a certain monitor breaches its upper or lower threshold. OpenSpaces supports two policy types: relocation and scale up.

SLA Relocation Policy

The relocation policy causes a Processing Unit instance to be relocated when one of its monitors breaches a threshold value. It can be defined in the following manner:

Namespace
<os-sla:sla>
    <os-sla:relocation-policy monitor="Processed Data" high="500" />
    <os-sla:monitors>
        <os-sla:bean-property-monitor name="Processed Data"
                                      bean-ref="dataProcessedCounter"
                                      property-name="processedDataCount"
                                      period="2000" />
    </os-sla:monitors>
</os-sla:sla>

<bean id="dataProcessedCounter" class="org.openspaces.example.data.processor.DataProcessedCounter"/>

Plain
<bean id="SLA" class="org.openspaces.pu.sla.SLA">
    <property name="policy">
        <bean class="org.openspaces.pu.sla.RelocationPolicy">
            <property name="monitor" value="Processed Data" />
            <property name="high" value="500" />
        </bean>
    </property>
    <property name="monitors">
        <list>
            <bean class="org.openspaces.pu.sla.monitor.BeanPropertyMonitor">
                <property name="ref" value="dataProcessedCounter" />
                <property name="propertyName" value="processedDataCount" />
                <property name="period" value="2000" />
            </bean>
        </list>
    </property>
</bean>

<bean id="dataProcessedCounter" class="org.openspaces.example.data.processor.DataProcessedCounter"/>

The above example causes a Processing Unit instance to be relocated to a different GSC if it processed more than 500 Data objects.

When using a Processing Unit with an embedded space, the relocation policy can be used to scale out an application when using a partitioned space. The deployment can start, for example, with 20 partitions working against 3 GSCs. In order to scale out the application, another GSC can be started, and several of the Processing Unit instances can be relocated to it.

SLA Scale Up Policy

The scale up policy causes a new Processing Unit instance to be created when one of its monitors breaches its upper threshold value, up to a defined maximum number of instances. A Processing Unit instance is destroyed when a monitor breaches its lower threshold value, down to the default number of instances.

Namespace
<os-sla:sla number-of-instances="2">
    <os-sla:scale-up-policy monitor="Processed Data" lower-dampener="30000" upper-dampener="2000" low="1" high="500" max-instances="4" />
    <os-sla:monitors>
        <os-sla:bean-property-monitor name="Processed Data"
                                      bean-ref="dataProcessedCounter"
                                      property-name="processedDataCount"
                                      period="2000" />
    </os-sla:monitors>
</os-sla:sla>

<bean id="dataProcessedCounter" class="org.openspaces.example.data.processor.DataProcessedCounter"/>

Plain
<bean id="SLA" class="org.openspaces.pu.sla.SLA">
    <property name="policy">
        <bean class="org.openspaces.pu.sla.ScaleUpPolicy">
            <property name="monitor" value="Processed Data" />
            <property name="maxInstances" value="4" />
            <property name="low" value="1" />
            <property name="high" value="500" />
            <property name="upper-dampener" value="2000" />
            <property name="lower-dampener" value="30000" />
        </bean>
    </property>
    <property name="monitors">
        <list>
            <bean class="org.openspaces.pu.sla.monitor.BeanPropertyMonitor">
                <property name="ref" value="dataProcessedCounter" />
                <property name="propertyName" value="processedDataCount" />
                <property name="period" value="2000" />
            </bean>
        </list>
    </property>
</bean>

<bean id="dataProcessedCounter" class="org.openspaces.example.data.processor.DataProcessedCounter"/>

The above example creates another Processing Unit instance once the processed data count passes 500 for a certain Processing Unit instance, up to 4 total Processing Unit instances (the max-instances parameter). If the processed data count drops below 1, Processing Unit instances are removed, down to 2 instances (the number-of-instances parameter).

When using the scale up policy, it is important to understand what is passed to an embedded space when it is constructed: the space's total members (number of instances) is the max-instances value. It is therefore important to use space cluster schemas that support the creation of new space cluster members (such as replicated); schemas such as partitioned require some investigation of their business applicability.

The upper and lower 'dampeners' allow the scaling policy to tolerate values that oscillate across an upper or lower boundary. They behave as a smoothing filter, reflecting the fact that the action triggered by a threshold breach may need to be delayed before it is actually executed.
In this case, scaling up (incrementing the number of instances) only takes place if the threshold condition still holds 2 seconds after the threshold has been breached. If the threshold has been cleared (the value falls back within the acceptable range), the increment does not take place. In the same way, before scaling down, the system waits 30 seconds after the lower threshold has been breached; if the threshold has been cleared, the decrement does not take place.

SLA Requirements

A set of one or more requirements can be defined in the SLA, controlling which GSCs the Processing Unit can be deployed to. The requirements are based on machine statistics and GSC capabilities. Here is an example showing all the different requirements supported:

Namespace
<os-sla:sla>
    <os-sla:requirements>
        <os-sla:host ip="127.0.0.1" />
        <os-sla:system name="test2">
            <os-sla:attributes>
                <entry key="entry1" value="value1" />
            </os-sla:attributes>
        </os-sla:system>
        <os-sla:cpu high=".9" />
        <os-sla:memory high=".8" />
    </os-sla:requirements>
</os-sla:sla>

Plain
<bean id="SLA" class="org.openspaces.pu.sla.SLA">
    <property name="requirements">
        <list>
            <bean class="org.openspaces.pu.sla.requirement.HostRequirement">
                <property name="id" value="127.0.0.1" />
            </bean>
            <bean class="org.openspaces.pu.sla.requirement.SystemRequirement">
                <property name="name" value="test2" />
                <property name="attributes">
                    <map>
                        <entry key="entry1" value="value1" />
                    </map>
                </property>
            </bean>
            <bean class="org.openspaces.pu.sla.requirement.CpuRequirement">
                <property name="high" value=".9" />
            </bean>
            <bean class="org.openspaces.pu.sla.requirement.MemoryRequirement">
                <property name="high" value=".8" />
            </bean>
        </list>
    </property>
</bean>

When using the host or system requirements, more than one requirement can be defined (for example, to define a set of machines this Processing Unit can be deployed to, together with the machines' CPU utilization limit and the GSCs' memory usage limit).
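For example, a sketch restricting deployment to two specific machines while also limiting CPU and memory (the IP addresses are placeholders):

Namespace
<os-sla:sla>
    <os-sla:requirements>
        <os-sla:host ip="192.168.0.1" />
        <os-sla:host ip="192.168.0.2" />
        <os-sla:cpu high=".9" />
        <os-sla:memory high=".8" />
    </os-sla:requirements>
</os-sla:sla>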

Defining system requirements allows you to control the logical mapping of Processing Units to SLA-driven containers. When using system requirements, the GSC is started with a set of capabilities that either match or don't match a given system requirement. Here is an example of an XML override file used when starting a GSC:

<overrides>
    <Component Name="org.jini.rio.qos">
        <Parameter Name="addPlatformCapabilities">
        <![CDATA[
        new org.jini.rio.qos.capability.PlatformCapability[] {
            new org.jini.rio.qos.capability.software.SoftwareSupport(
                new Object[]{"Name", "X"})
            }
        ]]>
        </Parameter>
    </Component>
</overrides>

The above override adds "Software Support" for a feature called X. We can then start the GSC using the override file (assuming it is named feature-override.xml):

Unix
gsc.sh feature-override.xml

Windows
gsc.bat feature-override.xml

Within the SLA, we can then define that the Processing Unit should be deployed only to GSCs that expose feature X:

Namespace
<os-sla:sla>
    <os-sla:requirements>
        <os-sla:system name="SoftwareSupport">
            <os-sla:attributes>
                <entry key="Name" value="X" />
            </os-sla:attributes>
        </os-sla:system>
    </os-sla:requirements>
</os-sla:sla>

Plain
<bean id="SLA" class="org.openspaces.pu.sla.SLA">
    <property name="requirements">
        <list>
            <bean class="org.openspaces.pu.sla.requirement.SystemRequirement">
                <property name="name" value="SoftwareSupport" />
                <property name="attributes">
                    <map>
                        <entry key="Name" value="X" />
                    </map>
                </property>
            </bean>
        </list>
    </property>
</bean>

Instance Level Requirements

OpenSpaces allows you to define requirements per Processing Unit instance. Here is an example:

Namespace
<os-sla:sla>
    <os-sla:requirements>
        <os-sla:cpu high=".9" />
        <os-sla:memory high=".8" />
    </os-sla:requirements>
    <os-sla:instance-SLAs>
        <os-sla:instance-SLA instance-id="1">
             <os-sla:requirements>
                <os-sla:host ip="100.0.0.1" />
             </os-sla:requirements>
        </os-sla:instance-SLA>
        <os-sla:instance-SLA instance-id="1" backup-id="2">
             <os-sla:requirements>
                <os-sla:host ip="100.0.0.2" />
             </os-sla:requirements>
        </os-sla:instance-SLA>
    </os-sla:instance-SLAs>
</os-sla:sla>

Plain
<bean id="SLA" class="org.openspaces.pu.sla.SLA">
    <property name="requirements">
        <list>
            <bean class="org.openspaces.pu.sla.requirement.CpuRequirement">
                <property name="high" value=".9" />
            </bean>
            <bean class="org.openspaces.pu.sla.requirement.MemoryRequirement">
                <property name="high" value=".8" />
            </bean>
        </list>
    </property>
    <property name="instanceSLAs">
        <list>
            <bean class="org.openspaces.pu.sla.InstanceSLA">
                <property name="instanceId" value="1" />
                <property name="requirements">
                    <list>
                        <bean class="org.openspaces.pu.sla.requirement.HostRequirement">
                            <property name="id" value="100.0.0.1" />
                        </bean>
                    </list>
                </property>
            </bean>
            <bean class="org.openspaces.pu.sla.InstanceSLA">
                <property name="instanceId" value="1" />
                <property name="backupId" value="1" />
                <property name="requirements">
                    <list>
                        <bean class="org.openspaces.pu.sla.requirement.HostRequirement">
                            <property name="id" value="100.0.0.2" />
                        </bean>
                    </list>
                </property>
            </bean>
        </list>
    </property>
</bean>

The above example verifies that the first instance is deployed to a specific machine (identified by its IP address), and that its backup is deployed to a different machine. All instances share the "general" CPU and memory requirements.

When specifying instance level SLA requirements, a scaling policy is not supported.

See the Primary Backup Processing Unit SLA Example section for more details.

SLA - Member Alive Indicator

The member alive indicator allows you to configure how often a Processing Unit instance is monitored to verify that it is alive, and, in case of failure, how many times to retry and at what interval.

Property | Description | Default
invocation-delay | How often (in milliseconds) an instance is monitored and verified to be alive. | 5000 (5 seconds)
retry-count | Once a member has been indicated as not alive, how many times to check it before giving up on it. | 3
retry-timeout | Once a member has been indicated as not alive, the timeout interval between retries (in milliseconds). | 500

When a Processing Unit instance is determined as not alive, the active GSM tries to re-deploy it (according to the SLA definitions).

Namespace
<os-sla:sla>
    <os-sla:member-alive-indicator invocation-delay="5000" retry-count="3" retry-timeout="500" />
</os-sla:sla>

Plain
<bean id="SLA" class="org.openspaces.pu.sla.SLA">
    <property name="member-alive-indicator">
        <bean class="org.openspaces.pu.sla.MemberAliveIndicator">
            <property name="invocationDelay" value="5000" />
            <property name="retryCount" value="3" />
            <property name="retryTimeout" value="500" />
        </bean>
    </property>
</bean>

Logging

The config\gs_logging.properties file includes:
org.openspaces.pu.container.servicegrid.PUFaultDetectionHandler.level = INFO

Level | Description
CONFIG | Logs the configurations applied
FINE | Logs when a member is determined as not alive
FINER | Logs when a member is indicated as not alive (on each retry)
FINEST | Logs verify attempts and successes

For service-failure troubleshooting, Level.FINE should suffice.
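For example, to log the point at which a member is determined as not alive, raise the level in config\gs_logging.properties:

org.openspaces.pu.container.servicegrid.PUFaultDetectionHandler.level = FINE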

Deploying a Processing Unit

A Processing Unit can be easily deployed onto the Service Grid, as long as it follows the Processing Unit directory structure.

There are several ways to package and deploy a Processing Unit. The first option is to copy the Processing Unit (with the correct structure) to the machine the GSM runs on, under the <GigaSpaces Root>\deploy directory (the deploy directory can be configured using the com.gs.deploy system property). The second option is to JAR up the Processing Unit and deploy it remotely. Another option is to point at the Processing Unit directory and deploy it (it is JARred up automatically). The latter two options upload the packaged JAR file to all the running GSMs, and extract it there. Once extracted, it is deployed as usual.

By default, when a GSC is provisioned to run a Processing Unit instance, it automatically downloads the Processing Unit from the GSM into <GigaSpaces Root>\work, under the deployed-processing-unit directory.
The work directory can be configured using the com.gs.work system property.
Downloading the Processing Unit is the preferred option, but it can be disabled by setting the Processing Unit property pu.download to false.
The pu.download property can be set within the Processing Unit's META-INF/spring/pu.properties file. It can also be set during deployment, either using the CLI (by adding -properties embed://pu.download=false to the command), or the UI (in the second wizard screen, adding pu.download as the key and false as the value).
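For example, disabling the download from within the Processing Unit itself, in META-INF/spring/pu.properties:

pu.download=false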

OpenSpaces provides several options for deploying a Processing Unit onto the Service Grid. Here is a simple deployment example (assuming that a hello-world Processing Unit directory structure exists under the GSM deploy directory):

Code
Deploy deploy = new Deploy();
// Deploy also has a main method that accepts the same arguments
deploy.deploy(new String[] {"-groups", "kimchy", "hello-world"});

Ant
<deploy name="hello-world" />

<macrodef name="deploy">
    <attribute name="name"/>
    <sequential>
        <java classname="org.openspaces.pu.container.servicegrid.deploy.Deploy" fork="false">
            <classpath refid="all-libs"/>
            <arg value="-groups" />
            <arg value="kimchy" />
            <arg value="@{name}"/>
        </java>
    </sequential>
</macrodef>

GigaSpaces CLI


GigaSpaces Management Center


In GigaSpaces version 6.5 and onwards, it is possible to select an existing Processing Unit from the Processing Unit name drop-down menu (as seen above).
See how to navigate from your deployed Processing Unit to the newly created space.

All the different deploy options are built on top of the Deploy class. For the pudeploy CLI command parameters, refer to the pudeploy section.
