Summary: How to model application data for an in-memory data grid
Moving from Centralized to Distributed Data Model

When moving from a centralized to a distributed data store, your data needs to be partitioned across multiple nodes (also known as partitions). Implementing the partitioning mechanism is not technically hard; however, planning the distribution of your data for scalability and performance requires some thought. There are several questions that need to be answered when planning for data partitioning. We recommend using the following table for this process:
Once you have identified the size and expected growth of your data, you can start thinking about partitioning it; however, there is more to consider before doing that.

2. What are my application's use cases?

While you might be used to modeling your data by the logical relationships between your data items, distributed data requires you to think differently. The rule of thumb is to avoid cross-cluster relationships as much as possible, since they lead to cross-cluster queries and updates, which are usually much slower and less scalable than their local counterparts.

Thinking in terms of traditional relationships ("one to one", "one to many" and "many to many") is deceptive with distributed data. If an entity is associated with several containers (parent entities), it can't be embedded within the containing entity. It might also be impossible to store it with all of its containers on the same partition. Here's an example:

We have mentioned the concept of embedded relationships above; let us now explain its implications for your application.

Embedded vs. Non Embedded Relationships

Embedded Relationships mean that one object physically contains the associated objects and there is a strong lifecycle dependency between them: once you delete the containing object, you also delete all of its contained objects. With this type of object association, you always ensure a local transaction, since the entire object graph is stored in the same entry within the Space.

Here are examples of embedded relationship data access:

    SQLQuery<Person> query = new SQLQuery<Person>(Person.class, "info.socialSecurity < ? and info.socialSecurity >= ?");

Embedded Map Query - The info property is a Map within the Person class:

    SQLQuery<Person> query = new SQLQuery<Person>(Person.class, "info.salary < 15000 and info.salary >= 8000");

Embedded Collection Query - The employees property is a collection within the Company class:

    SQLQuery<Company> query = new SQLQuery<Company>(Company.class, "employees[*].children[*].name = 'Junior Doe'");
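To make the embedded collection query above more concrete, here is a minimal plain-Java sketch of an object graph it could run against. The class layout, field names, and the `hasEmployeeWithChildNamed` helper are assumptions made for illustration; they are not taken from the GigaSpaces API, and the sketch does not show how the Space evaluates the query. It only shows that the whole graph lives inside a single `Company` object, so the predicate can be checked without leaving that object.

```java
import java.util.List;

// Hypothetical model for illustration only; not part of the GigaSpaces API.
final class EmbeddedModel {

    static final class Child {
        final String name;
        Child(String name) { this.name = name; }
    }

    static final class Employee {
        final String name;
        final List<Child> children;   // embedded: lives inside the Employee
        Employee(String name, List<Child> children) {
            this.name = name;
            this.children = children;
        }
    }

    static final class Company {
        final String name;
        final List<Employee> employees;   // embedded: lives inside the Company
        Company(String name, List<Employee> employees) {
            this.name = name;
            this.employees = employees;
        }

        // Plain-Java counterpart of: employees[*].children[*].name = childName
        boolean hasEmployeeWithChildNamed(String childName) {
            return employees.stream()
                    .flatMap(e -> e.children.stream())
                    .anyMatch(c -> c.name.equals(childName));
        }
    }

    public static void main(String[] args) {
        Company acme = new Company("Acme", List.of(
                new Employee("John Doe", List.of(new Child("Junior Doe"))),
                new Employee("Jane Roe", List.of())));
        System.out.println(acme.hasEmployeeWithChildNamed("Junior Doe"));
    }
}
```

Because the employees and their children are fields of the `Company` instance, discarding that instance discards them as well, which is the lifecycle dependency described for embedded relationships.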
See the SQLQuery section for details about querying and indexing embedded entities.

Non Embedded Relationships mean that one object is associated with a number of other objects, so you can navigate from one object to another. However, there is no lifecycle dependency between them, so if you delete the referencing object, you don't automatically delete the referenced object(s). The association is therefore expressed by storing IDs rather than the associated objects themselves. With this type of relationship you don't duplicate data, but you are more likely to access more than one node in the cluster when querying or updating your data.
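The cost of ID-based references can be sketched in plain Java. The example below assumes a simple scheme in which an object's partition is its routing key's hash code modulo the number of partitions; this scheme, the `Order` class, and the `partitionsTouched` helper are hypothetical and are used only to illustrate the idea, not to describe the product's actual routing. It counts how many partitions must be visited to load a customer together with the orders that reference it by ID.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; not part of the GigaSpaces API.
final class ReferenceRouting {

    // Maps a routing key to a partition index in [0, partitionCount).
    static int partitionFor(Object routingKey, int partitionCount) {
        // floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(routingKey.hashCode(), partitionCount);
    }

    // An order refers to its customer by ID only, so deleting one
    // does not delete the other.
    static final class Order {
        final int orderId;
        final int customerId;
        Order(int orderId, int customerId) {
            this.orderId = orderId;
            this.customerId = customerId;
        }
    }

    // Counts the distinct partitions touched when loading a customer and its orders.
    static int partitionsTouched(int customerId, List<Order> orders,
                                 boolean routeOrdersByCustomer, int partitionCount) {
        Set<Integer> touched = new HashSet<>();
        touched.add(partitionFor(customerId, partitionCount));
        for (Order o : orders) {
            int key = routeOrdersByCustomer ? o.customerId : o.orderId;
            touched.add(partitionFor(key, partitionCount));
        }
        return touched.size();
    }

    public static void main(String[] args) {
        List<Order> orders = List.of(new Order(1, 8), new Order(2, 8), new Order(3, 8));
        System.out.println(partitionsTouched(8, orders, false, 4)); // orders routed by their own ID
        System.out.println(partitionsTouched(8, orders, true, 4));  // orders routed by customer ID
    }
}
```

Under these assumptions, routing the orders by their own IDs spreads them over several partitions, while routing them by the customer ID keeps them on the customer's partition. This is one way to keep a non-embedded relationship local, at the price of tying the orders' placement to a single parent.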
When Should Objects be Embedded?

You already know it's not a good practice to embed related objects. But even when there's a good case for embedding related objects (sometimes at the cost of data duplication), you should still be aware of the following:
Thumb Rules for Choosing Embedded Relationships