Summary: A Cassandra Space Persistency Solution
Overview

The Apache Cassandra Project™ is a scalable multi-master database with no single points of failure. The Apache Cassandra Project develops a highly scalable second-generation distributed database, bringing together Dynamo's fully distributed design and Bigtable's ColumnFamily-based data model. Cassandra is in use at Digg, Facebook, Twitter, Reddit, Rackspace, Cloudkick, Cisco, SimpleGeo, Ooyala, OpenX, and more companies that have large, active data sets. The largest production cluster has over 100 TB of data in over 150 machines. Data is automatically replicated to multiple nodes for fault-tolerance. Replication across multiple data centers is supported. Failed nodes can be replaced with no downtime. Every node in the cluster is identical. There are no network bottlenecks. There are no single points of failure.

Cassandra Space Data Source and Space Synchronization Endpoint

GigaSpaces comes with built-in implementations of Space Data Source and Space Synchronization Endpoint for Cassandra, called CassandraSpaceDataSource and CassandraSpaceSynchronizationEndpoint, respectively. For further details about the persistency APIs used see Space Persistency.

Cassandra Space Data Source

Configuration

A Cassandra-based implementation of the Space Data Source.

Library dependencies

The Cassandra Space Data Source uses the Cassandra JDBC Driver and the Hector Library for communicating with the Cassandra cluster.
<!-- currently the cassandra-jdbc library is not in the central Maven repository -->
<repository>
    <id>org.openspaces</id>
    <name>OpenSpaces</name>
    <url>http://maven-repository.openspaces.org</url>
</repository>

<dependency>
    <groupId>org.apache.cassandra</groupId>
    <artifactId>cassandra-clientutil</artifactId>
    <version>1.1.6</version>
</dependency>
<dependency>
    <groupId>org.apache.cassandra</groupId>
    <artifactId>cassandra-thrift</artifactId>
    <version>1.1.6</version>
</dependency>
<dependency>
    <groupId>org.apache.cassandra</groupId>
    <artifactId>cassandra-jdbc</artifactId>
    <version>1.1.2</version>
</dependency>
<dependency>
    <groupId>org.hectorclient</groupId>
    <artifactId>hector-core</artifactId>
    <version>1.1-2</version>
</dependency>

Setup

An example of how the Cassandra Space Data Source can be configured for a space that loads data back from Cassandra once initialized and
Spring
<?xml version="1.0"?>
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:os-core="http://www.openspaces.org/schema/core"
       xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans-3.1.xsd
                           http://www.openspaces.org/schema/core http://www.openspaces.org/schema/9.5/core/openspaces-core.xsd">

    <bean id="propertiesConfigurer" class="org.springframework.beans.factory.config.PropertyPlaceholderConfigurer"/>

    <bean id="cassandraDataSource" class="org.apache.cassandra.cql.jdbc.CassandraDataSource">
        <constructor-arg value="${cassandra.host}" />
        <constructor-arg value="${cassandra.port}" />
        <constructor-arg value="${cassandra.keyspace}" />
        <constructor-arg value="${cassandra.user}" />
        <constructor-arg value="${cassandra.password}" />
        <constructor-arg value="2.0.0" />
    </bean>

    <bean id="hectorClient" class="org.openspaces.persistency.cassandra.HectorCassandraClientFactoryBean">
        <!-- comma separated seed list -->
        <property name="hosts" value="${cassandra.host}" />
        <!-- cassandra rpc communication port -->
        <property name="port" value="${cassandra.port}" />
        <!-- keyspace name to work with -->
        <property name="keyspaceName" value="${cassandra.keyspace}" />
    </bean>

    <bean id="cassandraSpaceDataSource" class="org.openspaces.persistency.cassandra.CassandraSpaceDataSourceFactoryBean">
        <!-- configured above -->
        <property name="cassandraDataSource" ref="cassandraDataSource" />
        <!-- configured above -->
        <property name="hectorClient" ref="hectorClient" />
    </bean>

    <os-core:space id="space" url="/./dataSourceSpace" space-data-source="cassandraSpaceDataSource" schema="persistent" mirror="true">
        <os-core:properties>
            <props>
                <!-- Use ALL IN CACHE, put 0 for LRU -->
                <prop key="space-config.engine.cache_policy">1</prop>
                <prop key="cluster-config.cache-loader.central-data-source">true</prop>
                <prop key="cluster-config.mirror-service.supports-partial-update">true</prop>
            </props>
        </os-core:properties>
    </os-core:space>

    <os-core:giga-space id="gigaSpace" space="space" />
</beans>

Code

HectorCassandraClient hectorClient = new HectorCassandraClientConfigurer()
    .clusterName(cluster)
    .hosts(cassandraHosts)
    .port(cassandraPort)
    .keyspaceName(cassandraKeyspaceName)
    .create();

CassandraDataSource ds = new CassandraDataSource(
    cassandraHosts,
    cassandraPort,
    cassandraKeyspaceName,
    cassandraUser,
    cassandraPassword,
    "2.0.0");

CassandraSpaceDataSource spaceDataSource = new CassandraSpaceDataSourceConfigurer()
    .cassandraDataSource(ds)
    .hectorClient(hectorClient)
    .create();

GigaSpace gigaSpace = new GigaSpaceConfigurer(new UrlSpaceConfigurer("/./space")
    .schema("persistent")
    .mirror(true)
    .cachePolicy(new LruCachePolicy())
    .addProperty("cluster-config.cache-loader.central-data-source", "true")
    .addProperty("cluster-config.mirror-service.supports-partial-update", "true")
    .spaceDataSource(spaceDataSource)
    .space()).gigaSpace();

For more details about different configurations see Space Persistency.

CassandraSpaceDataSource Properties
Considerations

General limitations

Cache miss

Query limitations

Supported queries:
Unsupported queries:
Unsupported queries and queries on unindexed properties will result in a runtime exception.

Cassandra Space Synchronization Endpoint

Configuration

A Cassandra-based implementation of the Space Synchronization Endpoint.

Library dependencies

The Cassandra Space Synchronization Endpoint uses the Hector Library for communicating with the Cassandra cluster.
hector using log4j
<dependency>
    <groupId>org.apache.cassandra</groupId>
    <artifactId>cassandra-clientutil</artifactId>
    <version>1.1.6</version>
</dependency>
<dependency>
    <groupId>org.apache.cassandra</groupId>
    <artifactId>cassandra-thrift</artifactId>
    <version>1.1.6</version>
</dependency>
<dependency>
    <groupId>org.hectorclient</groupId>
    <artifactId>hector-core</artifactId>
    <version>1.1-2</version>
</dependency>

hector using java.util.logging

<dependency>
    <groupId>org.apache.cassandra</groupId>
    <artifactId>cassandra-clientutil</artifactId>
    <version>1.1.6</version>
</dependency>
<dependency>
    <groupId>org.apache.cassandra</groupId>
    <artifactId>cassandra-thrift</artifactId>
    <version>1.1.6</version>
</dependency>
<dependency>
    <groupId>org.hectorclient</groupId>
    <artifactId>hector-core</artifactId>
    <version>1.1-2</version>
    <exclusions>
        <exclusion>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-log4j12</artifactId>
        </exclusion>
    </exclusions>
</dependency>
<dependency>
    <groupId>org.slf4j</groupId>
    <artifactId>slf4j-api</artifactId>
    <version>1.6.6</version>
</dependency>
<dependency>
    <groupId>org.slf4j</groupId>
    <artifactId>slf4j-jdk14</artifactId>
    <version>1.6.6</version>
</dependency>

Setup

An example of how the Cassandra Space Synchronization Endpoint can be configured within a mirror.
Spring
<?xml version="1.0"?>
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:os-core="http://www.openspaces.org/schema/core"
       xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans-3.1.xsd
                           http://www.openspaces.org/schema/core http://www.openspaces.org/schema/9.5/core/openspaces-core.xsd">

    <bean id="propertiesConfigurer" class="org.springframework.beans.factory.config.PropertyPlaceholderConfigurer"/>

    <bean id="hectorClient" class="org.openspaces.persistency.cassandra.HectorCassandraClientFactoryBean">
        <!-- comma separated seed list -->
        <property name="hosts" value="${cassandra.host}" />
        <!-- cassandra rpc communication port -->
        <property name="port" value="${cassandra.port}" />
        <!-- keyspace name to work with -->
        <property name="keyspaceName" value="${cassandra.keyspace}" />
    </bean>

    <bean id="cassandraSpaceSyncEndpoint" class="org.openspaces.persistency.cassandra.CassandraSpaceSynchronizationEndpointFactoryBean">
        <!-- configured above -->
        <property name="hectorClient" ref="hectorClient" />
    </bean>

    <os-core:mirror id="mirror" url="/./mirror-service" space-sync-endpoint="cassandraSpaceSyncEndpoint">
        <os-core:source-space name="space" partitions="${numOfPartitiones}" backups="${numOfBackups}"/>
    </os-core:mirror>
</beans>

Code

HectorCassandraClient hectorClient = new HectorCassandraClientConfigurer()
    .clusterName(cluster)
    .hosts(cassandraHosts)
    .port(cassandraPort)
    .keyspaceName(cassandraKeyspaceName)
    .create();

SpaceSynchronizationEndpoint syncEndpoint = new CassandraSpaceSynchronizationEndpointConfigurer()
    .hectorClient(hectorClient)
    .create();

IJSpace mirror = new UrlSpaceConfigurer("/./mirror-service")
    .schema("mirror")
    .spaceSynchronizationEndpoint(syncEndpoint)
    .addProperty("space-config.mirror-service.cluster.name", "space")
    .addProperty("space-config.mirror-service.cluster.partitions", String.valueOf(numOfPartitiones))
    .addProperty("space-config.mirror-service.cluster.backups-per-partition", String.valueOf(numOfBackups))
    .create();

For more details about different configurations see Space Persistency.

CassandraSpaceSynchronizationEndpoint Properties
Property Value Serializer

By default, when serializing object/document properties to column values, the following serialization logic is applied:

For fixed properties:
For dynamic properties:
It is possible to override this default behavior by providing a custom implementation of PropertyValueSerializer.

ByteBuffer toByteBuffer(Object value);
Object fromByteBuffer(ByteBuffer byteBuffer);

The behavior of overriding the serialization logic is different for fixed properties and dynamic properties:
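As a rough illustration of what a custom serializer could look like, the sketch below stores every value as the UTF-8 bytes of its string form. The PropertyValueSerializer interface declared here is a local stand-in that only mirrors the two method signatures quoted above, so that the example compiles on its own; the class names Utf8SerializerSketch and Utf8StringSerializer are hypothetical. A real implementation would implement the product's own interface and would normally need to preserve the value's type, which this sketch does not.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class Utf8SerializerSketch {

    // Local stand-in mirroring the two methods quoted in the text (assumption:
    // the real interface has exactly these signatures and nothing else).
    interface PropertyValueSerializer {
        ByteBuffer toByteBuffer(Object value);
        Object fromByteBuffer(ByteBuffer byteBuffer);
    }

    // Hypothetical serializer: writes the string form of the value as UTF-8.
    // Type information is lost, so every value is read back as a String.
    static class Utf8StringSerializer implements PropertyValueSerializer {
        @Override
        public ByteBuffer toByteBuffer(Object value) {
            byte[] bytes = String.valueOf(value).getBytes(StandardCharsets.UTF_8);
            return ByteBuffer.wrap(bytes);
        }

        @Override
        public Object fromByteBuffer(ByteBuffer byteBuffer) {
            // Decode a duplicate so the caller's buffer position is left untouched.
            return StandardCharsets.UTF_8.decode(byteBuffer.duplicate()).toString();
        }
    }

    public static void main(String[] args) {
        PropertyValueSerializer serializer = new Utf8StringSerializer();
        ByteBuffer column = serializer.toByteBuffer("Main Street");
        System.out.println(serializer.fromByteBuffer(column));
    }
}
```

The round trip through toByteBuffer and fromByteBuffer is the property worth testing for any custom serializer: whatever is written to a column must be readable back when the space loads data.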
Flattened Properties Filter

Introduction

When a type is introduced to the Cassandra Space Synchronization Endpoint, the type's fixed properties will be introspected and the final result will be a mapping from this type's nested properties to column family columns.

// implementation omitted for brevity
@SpaceClass
public class Person {
    @SpaceId
    public Long getId() ...
    public String getName() ...
    public Address getAddress() ...
    ...
}

public class Address {
    public String getStreetName() ...
    public Long getStreetNumber() ...
}

By default, the fixed properties will be mapped to the Person column family in Cassandra like this:
Notice how the address property was flattened and its nested properties are stored as separate columns. Now suppose that a Person is written to the space as a SpaceDocument which also includes these dynamic properties:
By default, dynamic properties are not flattened and are written as is to Cassandra. Moreover, their static type is not updated in the Column Family metadata and they are serialized using a custom serializer (see Property Value Serializer). This is how they will be written to Cassandra:
Customization

It is possible to override the above behavior by providing a FlattenedPropertiesFilter implementation. The interface is defined by a single method:

boolean shouldFlatten(PropertyContext propertyContext);
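As a rough illustration of such a filter, the sketch below flattens fixed properties only and stops once a configurable nesting level is exceeded. Both interfaces are declared locally as stand-ins that mirror the method signatures given in this section, so the example compiles on its own; the class FixedOnlyDepthLimitedFilter, the helper context(...), and the level at which nesting starts are assumptions made for illustration, not part of the product API.

```java
public class FlattenFilterSketch {

    // Local stand-in mirroring the PropertyContext accessors listed in this section.
    interface PropertyContext {
        String getPath();
        String getName();
        Class<?> getType();
        boolean isDynamic();
        int getCurrentNestingLevel();
    }

    // Local stand-in mirroring the single-method filter interface.
    interface FlattenedPropertiesFilter {
        boolean shouldFlatten(PropertyContext propertyContext);
    }

    // Hypothetical policy: never flatten dynamic properties, and flatten fixed
    // properties only while the nesting level is at or below the given limit.
    static class FixedOnlyDepthLimitedFilter implements FlattenedPropertiesFilter {
        private final int maxNestingLevel;

        FixedOnlyDepthLimitedFilter(int maxNestingLevel) {
            this.maxNestingLevel = maxNestingLevel;
        }

        @Override
        public boolean shouldFlatten(PropertyContext ctx) {
            return !ctx.isDynamic() && ctx.getCurrentNestingLevel() <= maxNestingLevel;
        }
    }

    // Hypothetical helper that builds an immutable context, used only to exercise the filter.
    static PropertyContext context(final String path, final Class<?> type,
                                   final boolean dynamic, final int level) {
        return new PropertyContext() {
            public String getPath() { return path; }
            // The name is taken to be the last segment of the dotted path.
            public String getName() { return path.substring(path.lastIndexOf('.') + 1); }
            public Class<?> getType() { return type; }
            public boolean isDynamic() { return dynamic; }
            public int getCurrentNestingLevel() { return level; }
        };
    }

    public static void main(String[] args) {
        FlattenedPropertiesFilter filter = new FixedOnlyDepthLimitedFilter(2);
        System.out.println(filter.shouldFlatten(context("address", Object.class, false, 1)));
        System.out.println(filter.shouldFlatten(context("extraInfo", Object.class, true, 1)));
    }
}
```

Keeping the decision a pure function of the PropertyContext makes the filter easy to unit test without a running Cassandra cluster.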
The return value indicates whether the currently introspected property should be serialized as is, or whether its nested properties should be introspected as well. The PropertyContext contains the following details about the currently introspected property:

String getPath();
String getName();
Class<?> getType();
boolean isDynamic();
int getCurrentNestingLevel();

Column Family Name Converter

Due to implementation details of Cassandra regarding Column Families, there are certain limitations when converting a type name (e.g. com.example.data.Person) to a column family name. Among these limitations are a maximum length of 48 characters and characters that are not valid in a name (such as '.').

String toColumnFamilyName(String typeName);

The default implementation is DefaultColumnFamilyNameConverter.

Considerations