Summary: Lowering the Space Object Footprint - The Binary Serialization Pattern

Overview

By default, when you use the Java API with POJO or Entry-based classes, the space stores space object fields as is. No data compaction or compression is performed while the object is transported across the network or while it is stored within the space.

  • The compressed serialization mode compresses non-primitive fields using the zip utilities.
  • C++ and .NET object data does go through some compaction when sent across the network.
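
To give a feel for what zip-style compression does to a field's bytes, here is a minimal sketch using the gzip stream classes from java.util.zip. ZipFieldCodec is a hypothetical helper; this is not GigaSpaces' implementation of the compressed serialization mode, and the exact codec that mode uses is not specified here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical helper: compresses and restores a field's serialized bytes with java.util.zip. */
final class ZipFieldCodec {
    private ZipFieldCodec() {}

    /** Compresses the given bytes into the gzip format (DEFLATE plus a small header and trailer). */
    static byte[] compress(byte[] raw) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Closing the gzip stream (end of the try block) writes the trailer, so read the bytes afterwards
        return bos.toByteArray();
    }

    /** Restores the original bytes from the output of compress(). */
    static byte[] decompress(byte[] compressed) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = gz.read(buf)) != -1) {
                bos.write(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e); // e.g. corrupted or truncated input
        }
        return bos.toByteArray();
    }
}
```

Compression pays off on repetitive data such as text, at the cost of CPU time on every write and read. The binary serialization pattern takes a different route: it controls the layout of the data directly.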

The binary serialization technique can greatly reduce the footprint associated with storing space objects in memory, which means you can store more space objects per memory unit.

The basic idea of binary serialization is simple: take total control of the format of the space object data, both while it is transported over the network and while it is stored within the space. This technique avoids the extra de-serialization that occurs when a space object is written to the space from a remote client (for non-primitive fields such as user-defined classes or collection fields) and when it is replicated to backup space(s), as well as the serialization that occurs when an object is read back from the space into the client process.

Future versions will generate the serialization methods in real time.

With the binary serialization pattern, all fields used for matching and queries should be indexed fields, while all the rest (the payload) are stored in a single byte array field. Before the object is written to the space, the payload fields are packed into the byte array field; once the object is read from the space, the payload fields are unpacked.

When the object is written to the space:
– The fields that are not required for matching are compressed together into a single field (a byte array).
– All remaining fields are serialized on the client side.
– The object with the compressed fields arrives in the space and is de-serialized. Fields used for matching are stored separately within the space, while all fields that are not required for matching are stored within one byte array field.

When the object is read from the space:
– The read template undergoes the same actions as an object written to the space.
– The matching object's fields are compressed (the byte array data is already in compressed form) and serialized.
– When the object arrives at the client side, it is de-serialized; the required fields are uncompressed, and all others are de-serialized and uncompressed on demand, in a lazy manner.
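
The last step, decoding the remaining fields on demand, can be sketched in plain Java. This is a minimal illustration under stated assumptions: LazyEntry and its fields are hypothetical, java.io.DataOutputStream and DataInputStream stand in for the real encoding, and the whole payload is decoded at once on the first payload getter call.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical entry whose payload is decoded only when a payload getter is first called. */
class LazyEntry {
    private final Integer queryField; // regular field: usable for matching without decoding
    private final byte[] binary;      // packed payload, as received from the space
    private String name;              // payload field, filled in lazily
    private long amount;              // payload field, filled in lazily
    private boolean decoded;          // true once the payload has been decoded

    /** Models the state right after a read: only the query field and the raw bytes are set. */
    LazyEntry(Integer queryField, byte[] binary) {
        this.queryField = queryField;
        this.binary = binary;
    }

    /** Builds an entry from field values by packing the payload up front. */
    static LazyEntry of(int queryField, String name, long amount) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bos)) {
            out.writeUTF(name);
            out.writeLong(amount);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new LazyEntry(queryField, bos.toByteArray());
    }

    Integer getQueryField() { return queryField; } // never touches the payload
    byte[] getBinary() { return binary; }
    boolean isDecoded() { return decoded; }

    String getName() { ensureDecoded(); return name; }
    long getAmount() { ensureDecoded(); return amount; }

    /** Decodes the whole payload on the first call; later calls return immediately. */
    private void ensureDecoded() {
        if (decoded) return;
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(binary))) {
            name = in.readUTF();
            amount = in.readLong();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        decoded = true;
    }
}
```

Reading only getQueryField(), as a matching step would, never touches the byte array. The sketch is not thread-safe; an object shared between threads would need synchronization around the decoding step.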

Using the binary serialization pattern can drastically reduce the object footprint within the space. The more fields of the space object you serialize using the binary serialization pattern, the smaller its memory footprint becomes compared to a regular space class.

Example

The attached example uses a space class with 37 fields:

  • 1 Integer query field
  • 12 String fields
  • 12 Long fields
  • 12 Integer fields

The regular Space POJO class has roughly three times the footprint of the binary format Space POJO class (about 3.2 times with a 64-bit JVM and 2.8 times with a 32-bit JVM):

  • With a 64-bit JVM, the regular class consumes 2253 bytes and the binary format class consumes 707 bytes.
  • With a 32-bit JVM, the regular class consumes 1434 bytes and the binary format class consumes 505 bytes.
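
The ratios follow directly from the measured sizes. This small sketch (FootprintMath is a hypothetical name) recomputes them and estimates the total saving for a given number of objects; it only does arithmetic on the numbers in this section and does not measure anything itself.

```java
/** Reproduces the arithmetic behind the footprint comparison above. */
final class FootprintMath {
    private FootprintMath() {}

    /** How many times larger the regular object is than the binary format object. */
    static double ratio(long regularBytes, long binaryBytes) {
        return (double) regularBytes / binaryBytes;
    }

    /** Total bytes saved by storing objectCount objects in binary format instead of the regular format. */
    static long savedBytes(long regularBytes, long binaryBytes, long objectCount) {
        return (regularBytes - binaryBytes) * objectCount;
    }
}
```

With the 64-bit figures, one million objects would take 1,546,000,000 fewer bytes (roughly 1.44 GiB), assuming the per-object sizes measured above stay constant.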

To run this example, copy the example package zip into \GigaSpaces Root\examples\Advanced\DataGrid, extract the zip file, and follow the instructions in the readme file.

The Regular Space class

Our example involves a space class that will be modified to follow the binary serialization pattern.

The original class includes:

  • One indexed Integer field, declared as a space class field, that also acts as the routing field
  • 12 non-indexed String fields declared as space class fields
  • 12 non-indexed Long fields declared as space class fields
  • 12 non-indexed Integer fields declared as space class fields
  • Getter and setter methods for the above fields

The original class would look like this:

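import com.gigaspaces.annotation.pojo.SpaceClass;
import com.gigaspaces.annotation.pojo.SpaceProperty;
import com.gigaspaces.annotation.pojo.SpaceProperty.IndexType;
import com.gigaspaces.annotation.pojo.SpaceRouting;

@SpaceClass(replicate=true)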
public class SimpleEntry {

	public SimpleEntry() {
	}
	private Integer _queryField;
	private Long _longFieldA1;

	// ...

	@SpaceRouting
	@SpaceProperty(index=IndexType.BASIC)
	public Integer get_queryField() {
		return _queryField;
	}

	public void set_queryField(Integer field) {
		_queryField = field;
	}

	public Long get_longFieldA1() {
		return _longFieldA1;
	}

	public void set_longFieldA1(Long fieldA1) {
		_longFieldA1 = fieldA1;
	}

	// ...
}

The BinaryFormatEntry class

The modified class includes:

  • One indexed Integer field, declared as a space class field, that also acts as the routing field
  • One byte array declared as a space class field
  • 12 non-indexed String fields that are not declared as space class fields
  • 12 non-indexed Long fields that are not declared as space class fields
  • 12 non-indexed Integer fields that are not declared as space class fields
  • Getter and setter methods for the above fields
  • pack and unpack methods for the data, plus a few helper methods
  • An Externalizable implementation with the writeExternal and readExternal methods

The modified class would look like this:

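import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectOutput;

import com.gigaspaces.annotation.pojo.SpaceClass;
import com.gigaspaces.annotation.pojo.SpaceClass.IncludeProperties;
import com.gigaspaces.annotation.pojo.SpaceProperty;
import com.gigaspaces.annotation.pojo.SpaceProperty.IndexType;
import com.gigaspaces.annotation.pojo.SpaceRouting;

@SpaceClass(includeProperties=IncludeProperties.EXPLICIT, replicate=true)
public class BinaryFormatEntry implements Externalizable {

    // Externalizable requires a public no-argument constructor
    public BinaryFormatEntry() {}

    // Space class fields: the indexed query/routing field and the packed payload
    private Integer _queryField;
    private byte[]  _binary;

    // Payload fields: not space class fields; their data is stored inside _binary by pack()
    private Long    _longFieldA1;
    // ...

    @SpaceRouting
    @SpaceProperty(index=IndexType.BASIC)
    public Integer get_queryField() {
        return _queryField;
    }

    public void set_queryField(Integer queryField) {
        _queryField = queryField;
    }

    @SpaceProperty
    public byte[] get_binary() {
        return _binary;
    }

    public void set_binary(byte[] binary) {
        _binary = binary;
    }

    // Not annotated, so with IncludeProperties.EXPLICIT this is not a space property
    public Long get_longFieldA1() {
        return _longFieldA1;
    }

    public void set_longFieldA1(Long fieldA1) {
        _longFieldA1 = fieldA1;
    }

    // ...

    // Method bodies are shown in the sections below
    public void pack() { /* ... */ }
    public void unpack() { /* ... */ }
    public void writeExternal(ObjectOutput out) throws IOException { /* ... */ }
    public void readExternal(ObjectInput in) throws IOException, ClassNotFoundException { /* ... */ }
    private long getnulls() { /* ... */ }
    private short checkNulls() { /* ... */ }
}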

The pack method

The pack method serializes the object data. It is called before calling the space write operation.
This method serializes the object data by placing it into the byte array field. The null indications for all payload fields are stored within one field.
The PbsFormatter utility class is used to write the binary data into the byte array. PbsFormatter compacts the data before writing it into the byte array.

public void pack()
{
    try {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();

        // First write the bitmap that flags which payload fields are null
        long nulls = getnulls();
        PbsFormatter.writeLong(baos, nulls);

        // Then write only the non-null fields, in the same order used by getnulls()
        if (_longFieldA1 != null)
            PbsFormatter.writeLong(baos, _longFieldA1);
        // ...

        _binary = baos.toByteArray();
        baos.close();
    } catch (Exception e) {
        e.printStackTrace();
    }
}
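
Because PbsFormatter's API is not documented in this section beyond writeLong and readLong, the following sketch uses java.io.DataOutputStream instead to show the same layout: a null bitmap followed by only the non-null fields. PayloadPacker and its fields are hypothetical, and unlike PbsFormatter, DataOutputStream does no compaction, so every value keeps its fixed-width encoding.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical payload with one field of each type used in the example class. */
class PayloadPacker {
    Long longField;      // payload field 0
    Integer intField;    // payload field 1
    String stringField;  // payload field 2

    /** Bit i is set when payload field i is null (the same convention as getnulls()). */
    long nullBitmap() {
        long nulls = 0;
        if (longField == null) nulls |= 1L << 0;
        if (intField == null) nulls |= 1L << 1;
        if (stringField == null) nulls |= 1L << 2;
        return nulls;
    }

    /** Writes the null bitmap, then only the non-null fields, in a fixed order. */
    byte[] pack() {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bos)) {
            out.writeLong(nullBitmap());                          // 8 bytes
            if (longField != null) out.writeLong(longField);      // 8 bytes
            if (intField != null) out.writeInt(intField);         // 4 bytes
            if (stringField != null) out.writeUTF(stringField);   // 2-byte length + modified UTF-8
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected with an in-memory stream
        }
        return bos.toByteArray();
    }
}
```

Here a fully populated payload with the string "abc" takes 25 bytes (8 + 8 + 4 + 2 + 3), while setting the int and String fields to null shrinks it to 16 bytes: a null field costs one bit in the bitmap instead of a slot in the stream.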

The unpack method

This method de-serializes the object data by extracting it from the byte array field and populating each field with its corresponding value. Fields flagged as null are left unpopulated. This method is called after calling the space read operation. The PbsFormatter utility class is used to read the binary data and place it into the relevant field.

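public void unpack() {
    try {
        int i = 0;
        ByteArrayInputStream bais = new ByteArrayInputStream(_binary);
        // Read the null bitmap written by pack()
        long nulls = PbsFormatter.readLong(bais);

        // A clear bit i means field i was written; read fields in the same order as pack()
        if ((nulls & (1L << i)) == 0)
            _longFieldA1 = PbsFormatter.readLong(bais);
        i++;
        // ...
        bais.close();
        // The data now lives in the fields; pack() must be called again before the next write
        _binary = null;
    } catch (Exception e) {
        e.printStackTrace();
    }
}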
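
The essential property of unpack is that it consumes the bitmap and the fields in exactly the order pack produced them. The sketch below checks that with a round trip; PackedRecord is hypothetical, and java.io.DataInputStream and DataOutputStream again stand in for PbsFormatter.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical two-field payload showing that unpack() must mirror pack() exactly. */
class PackedRecord {
    Long count;     // payload field 0
    String label;   // payload field 1
    byte[] binary;  // packed form of the payload

    /** Writes a null bitmap followed by the non-null fields in a fixed order. */
    void pack() {
        long nulls = 0;
        if (count == null) nulls |= 1L << 0;
        if (label == null) nulls |= 1L << 1;
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bos)) {
            out.writeLong(nulls);
            if (count != null) out.writeLong(count);
            if (label != null) out.writeUTF(label);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        binary = bos.toByteArray();
    }

    /** Reads the bitmap, then each field whose bit is clear; null fields are left untouched. */
    void unpack() {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(binary))) {
            long nulls = in.readLong();
            int i = 0;
            if ((nulls & (1L << i)) == 0) count = in.readLong();
            i++;
            if ((nulls & (1L << i)) == 0) label = in.readUTF();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        binary = null; // mirrors the unpack method above: the bytes are released after decoding
    }
}
```

If pack and unpack ever disagree on field order, the reader silently misinterprets bytes rather than failing, so a round-trip check like this one is a cheap safeguard.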

The writeExternal method

The writeExternal method retrieves the object data and writes it into the output stream.
The object data consists of a bitmap that flags null fields (calculated by the checkNulls method), the query field, and the byte array field, created by the pack method, that holds the data of all non-indexed fields. The pack method must be called explicitly before the space write method is called.

public void writeExternal(ObjectOutput out) throws IOException {
	// Bitmap flagging which of _queryField and _binary are null
	short nulls = checkNulls();
	out.writeShort(nulls);
	if (_queryField != null) {
	    out.writeInt(_queryField);
	}
	if (_binary != null) {
	    // Write the length first so readExternal knows exactly how many bytes follow
	    out.writeInt(_binary.length);
	    out.write(_binary);
	}
}

The readExternal method

The readExternal method essentially performs the opposite of what the writeExternal method does.
This method populates the query field and the byte array field. The remaining fields are populated later, once the unpack method is called. Because writeExternal writes the byte array length before the data, readExternal can allocate an array of exactly the right size and fill it completely with readFully, regardless of how large the payload is.

public void readExternal(ObjectInput in) throws IOException, ClassNotFoundException {
	int i = 0;
	short nulls = in.readShort();
	if ((nulls & (1 << i)) == 0)
	    _queryField = in.readInt();
	i++;
	if ((nulls & (1 << i)) == 0) {
	    // Read the length prefix, then exactly that many bytes
	    _binary = new byte[in.readInt()];
	    in.readFully(_binary);
	}
}
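
A robust way to frame the byte array in writeExternal and readExternal is to write its length before its contents, so the reader can allocate an exact-size array and fill it with readFully. The following standalone sketch checks that framing, including a 5000-byte payload and a template-like object whose byte array is null. WireEntry is a hypothetical stand-in for BinaryFormatEntry that keeps only the two space class fields; it calls writeExternal and readExternal directly on an ObjectOutputStream and ObjectInputStream rather than going through a space.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;

/** Hypothetical stand-in for BinaryFormatEntry with a length-prefixed byte array. */
class WireEntry implements Externalizable {
    Integer queryField;
    byte[] binary;

    public WireEntry() {} // Externalizable requires a public no-arg constructor

    /** Bit 0 flags a null query field, bit 1 a null byte array. */
    private short checkNulls() {
        short nulls = 0;
        if (queryField == null) nulls |= 1;
        if (binary == null) nulls |= 1 << 1;
        return nulls;
    }

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        out.writeShort(checkNulls());
        if (queryField != null) out.writeInt(queryField);
        if (binary != null) {
            out.writeInt(binary.length); // length prefix
            out.write(binary);
        }
    }

    @Override
    public void readExternal(ObjectInput in) throws IOException {
        short nulls = in.readShort();
        queryField = ((nulls & 1) == 0) ? Integer.valueOf(in.readInt()) : null;
        if ((nulls & (1 << 1)) == 0) {
            binary = new byte[in.readInt()];
            in.readFully(binary); // blocks until the whole array has been read
        } else {
            binary = null;
        }
    }

    /** Writes this entry through writeExternal and reads it back into a fresh instance. */
    WireEntry roundTrip() {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
                writeExternal(oos);
            }
            WireEntry copy = new WireEntry();
            try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()))) {
                copy.readExternal(ois);
            }
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A plain read(byte[]) call may return fewer bytes than requested, which is why readFully is used for the payload.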

The checkNulls method

This method checks the query field and the byte array field and records which of them are null as bits in a short bitmap.

private short checkNulls() {
	short nulls = 0;
	int i = 0;
	
	nulls = (short) ((_queryField == null) ? nulls | 1 << i : nulls);
	i++;
	nulls = (short) ((_binary == null) ? nulls | 1 << i : nulls);
	i++;
	return nulls;
}

The getnulls method

This method goes through all of the class's non-indexed fields (those whose data is stored within the byte array) and records which of them are null as bits in a long bitmap.

private long getnulls()
{
    long nulls = 0;
    int i=0;
    nulls = ((_longFieldA1 == null)  ? nulls | 1L << i : nulls ) ;
    i++;
    nulls = ((_longFieldB1 == null)  ? nulls | 1L << i : nulls ) ;
    i++;
    // ...
    return nulls;
}
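
getnulls and checkNulls both assign bit i to field i. The same idea can be written once as a loop over an ordered array of field values; the hypothetical NullBitmap helper below does that and makes the capacity limit explicit: a long bitmap can flag at most 64 fields, which covers the 36 payload fields in this example (a short, as used by checkNulls, covers 16).

```java
/** Hypothetical helper: builds a getnulls()-style bitmap from field values in a fixed order. */
final class NullBitmap {
    private NullBitmap() {}

    /** Sets bit i when fields[i] is null; rejects more fields than a long can flag. */
    static long of(Object... fields) {
        if (fields.length > Long.SIZE) {
            throw new IllegalArgumentException("a long bitmap can flag at most 64 fields");
        }
        long nulls = 0;
        for (int i = 0; i < fields.length; i++) {
            if (fields[i] == null) nulls |= 1L << i;
        }
        return nulls;
    }

    /** True when the field at the given position was flagged as null. */
    static boolean isNull(long nulls, int index) {
        return (nulls & (1L << index)) != 0;
    }
}
```

The order of the arguments is the contract: pack, unpack, and the bitmap must all list the fields in the same sequence.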

The Factory method

The example uses a factory method called generateBinaryFormatEntry to create the space object. Once the object has been populated, its pack method is called.

private BinaryFormatEntry generateBinaryFormatEntry(int id){
	BinaryFormatEntry bfe = new BinaryFormatEntry(id, value1, value2 /* ... */);
	bfe.pack();     // pack() is called as part of the factory method, so callers receive an already packed object
	return bfe;
}

Writing and Reading the Object from the space

The following code snippet illustrates how the binary serialized object is written into the space and read from the space:

BinaryFormatEntry testBFE = generateBinaryFormatEntry(500);
_space.write(testBFE, null, Lease.FOREVER);
BinaryFormatEntry templateBFE = new BinaryFormatEntry();
templateBFE.set_queryField(500);   // _queryField is a private Integer, so use its setter
BinaryFormatEntry resBFE = (BinaryFormatEntry)_space.read(templateBFE, null, 0);
resBFE.unpack();   // deserializes the binary data into the object fields