Summary: Understanding how space operations behave in a partitioned environment.
OverviewA partitioned space provides the ability to perform space operations against multiple spaces from a single proxy transparently. The primary goal of the partitioned space is to provide unlimited In-Memory space storage size and group objects into the same partition to speed up performance. The initial intention is to write data into the partitioned space, and route query operations based on the template data. Note that in such a configuration, the different spaces defined as partitions are not aware of each other ("Share Nothing") - the client proxy is the one that is aware of the partitioned space configuration. Defining a Routing PropertyPartitioning is used when the total data set is too big to be stored in a single space, and we want to divide the data into two or more groups (partitions). In order to do that, the proxy needs to know how to partition the data, i.e. which entry belongs in which partition. In order to accomplish that a routing property can be defined on the entry type. When the proxy is asked to write an entry, it uses the entry's routing property's hash code to determine the relevant partition for it: Target partition space ID = safeABS(routing field hashcode) % (number of partitions) int safeABS( int value) { return value == Integer.MIN_VALUE ? Integer.MAX_VALUE : Math.abs(value); } The routing property can be explicitly set using the @SpaceRouting annotation for POJO entries or via the SpaceTypeDescriptorBuilder for document entries. If the routing property is not explicitly set, the space id property is used for routing. If the space id property is not defined, the first indexed property (alphabetically) is used for routing, otherwise the first property (alphabetically) is used for routing.
In some scenarios, the data model does not require sophisticated partitioning, and simple space-id-based partitioning is all that's needed (hence the default). In other scenarios this is not enough. For example, suppose we have a Customer class with a customerId, and an Order class with an orderId and customerId, and we want all the orders of each customer to be co-located with it in the same partition. In that case we'd explicitly set the routing property of Order to customerId to ensure they're co-located. However, note that all objects with a given routing value will be stored on the same partition. This means that a given partition must be able to hold all similarly-routed objects. If the routing value is weighted too far in a given direction (i.e., the distribution is very uneven, with one value having 90% of the data set, for example) the partitioning will be uneven. It's best to use a derived routing field that gives a flat distribution across all nodes, if possible. Writing To a Partitioned SpaceAs explained above, in Defining a Routing Property, when a proxy is asked to write an entry it extracts the routing property value to determine the relevant partition id and forwards the entry to that partition. If the routing property value is null, an exception will be thrown indicating the proxy cannot write the entry because it does not know the target partition. When an entry is being updated, the proxy uses the routing property to route the update request to the relevant partition. On a batch write/update via writeMultiple, the proxy iterates over the batch of entries and divides them into groups according to the routing property, and each group is sent to the relevant partition in parallel. Starting with 8.0.1, an autogenerated id property can be used for routing as well - the proxy detects that routing is autogenerated, selects a partition using a round-robin approach and forwards the entry to that partition. The space embeds the partition id in the autogenerated id value, so if the entry is later updated the proxy will extract the partition id from the id value and route the entry to the correct partition. Note that autogenerated routing is only valid in conjunction with autogenerated id, and cannot be used to route other entries. Querying a Partitioned SpaceWhen a proxy is asked to execute a query (template, SQL or id) on a partitioned space, it first checks if the query contains a routing value. If it does, the query is executed only on the relevant partition, otherwise it is broadcast to the entire cluster. This is relevant to all the read / take methods (if exists, by id, async, multiple, etc.), as well as the count and clear methods and space notifications.
Initializing a Partitioned SpaceWhen initializing a partitioned space, it is possible to load only the data you need for a specific partition. For more details, see External Data Source Initial Load. Non-Partitioned OperationsThe clean and snapshot operations are not partition-aware, i.e. they are executed on a default partition that the proxy selects implicitly, and not broadcast to the entire cluster. |
![]() |
GigaSpaces.com - Legal Notice - 3rd Party Licenses - Site Map - API Docs - Forum - Downloads - Blog - White Papers - Contact Tech Writing - Gen. by Atlassian Confluence |