The Jonajo Blog

Maximizing Performance: A Comparison of spatial and non-spatial queries

Here at Jonajo we are interested in being as efficient as possible in order to maximize potential both for our clients and for ourselves. Because we are a company with many interests maximizing performance often comes in different forms. For instance, maximizing performance on the marketing team might involve collaboration and an ability to work with others to generate new ideas. On the other hand, maximizing performance for software engineers might mean finding and utilizing the appropriate tools for each job that they come across. This was the case recently here at Jonajo. One of the projects that we have been developing recently is called Rentalios. Rentalios is a room rental estimation tool. Essentially, landlords and tenants are able to utilize Rentalios in order to discover how much to charge for a room, or to find rooms that they can afford. Obviously, this application requires the use of a large amount of data. The issue that our programers were running into was that the algorithm they were originally using was too slow for this much data. How could we go about maximizing performance for Rentalios? Luckily, one of our software engineers, Ichwan Yasir, was on the case. It was time to find the appropriate tool and start maximizing performance.

BACKGROUND

During the development of the Rentalios application we ran into an issue. This issue was that the KNN algorithm was not an efficient way to analyse the data. In short, it was taking too long.   After further investigation into the function to understand the long processing time we found that the issue was with the geospatial filters. It requires a long time for the geospatial filters to pull the necessary data. . From that discovery we tried to speed up the process by finding other solutions. One of the solutions is to replace database technology that has the ability to execute geospatial queries, with good performance and high scalability for future improvement. To find the best database technology for our Rentalios application we needed to know the details of performance in spatial and non-spatial functions. 

Currently there are many types of technology databases with various advantages. To determine what would be best for our application, we looked for a database technology with the lowest execution time to run spatial queries and non-spatial queries. We compared several databases in order to find the best fit for Rentalios and other future applications. 

Several techniques have been proposed to improve the performance of spatial indexes. One of these techniques is R-tree. R-tree is a spatial extension of B-trees. They are height balanced and store the object information so that spatial queries, such as range queries, point queries, etc. can be executed efficiently. Spatial objects are stored in such a way that queries such as  “Find the rent of all houses within 2 miles from the current position” are executed within a fraction of a second. In the NEELABH’s research they concluded that R-tree is the best indexing structure for spatial measurement queries[1]. So in this comparison we only chose databases that utilize R-tree indexes.

 

SPATIAL DATABASE

  1. MySQL : is the world’s most popular open source database, enabling the cost-effective delivery of reliable information. Additionally, MySQL  is categorized as the second most popular Relational DBMS, according to DBEngine. In the official documentation, mysql v8.0 mysql already uses an R-tree index for innoDB and MyIssam Engine.
  2. Neo4j : Described by its developers as an ACID-compliant transactional database with native graph storage and processing. It is the most popular graph database according to DB-Engines, and the 22nd most popular database overall. Neo4j utilizes another library for R-tree indexes.  With this database, we need to setup spatial index library first to get the best possible performance in spatial query.
  3. Redis : is an in-memory data structure project implementing a distributed, in-memory key-value database with optional durability. Redis supports different kinds of abstract data structures, such as strings, lists, maps, sets, sorted sets, HyperLogLogs, bitmaps, streams, and spatial indexes. It has a variety of indexes which can be optimized as needed, including R-tree indexes in their spatial indexes.

SPATIAL QUERY

We found the spatial execution time by making several queries. In each query we would  increase the radius. From these multiple queries we were able to ascertain an average for the execution time.

  1. MySQL : average execution time : 0.696777 ms. Details of the execution times are shown in Table 1 and Figure 1.
Execution times spatial query mysql

Table 1 : Execution times spatial query mysql

 Execution times spatial query mysql

Figure 1 : Execution times spatial query mysql

 

  1. Neo4j : average execution time : 0.035488 ms. Details of execution times are shown in Table 2 and Figure 2.
Execution times spatial query neo4j

Table 2 : Execution times spatial query neo4j

 

 Execution times spatial query neo4j

Figure 2 : Execution times spatial query neo4j

  1. Redis : average execution time : 0.014239 ms. Details of execution times are shown in Table 3 and Figure 3.

 

Execution times spatial query Redis

Table 3 : Execution times spatial query Redis

Execution times spatial query Redis

Figure 3 : Execution times spatial query Redis
Note: in the Redis experiment we only used the geospatial key. edis is key oriented, for that reason, detailed attributes are not included. If the attributes are included, the execution time will decrease 0.8763 ms on average (depending on the number of attributes).

NON-SPATIAL QUERY

For this experiment we attempted to execute CRUD query and compare the functions to each other. Details of execution times are shown in Table 4 and Figure 4.

Executions time non-spatial query

Table 4 : Executions time non-spatial query

Executions time non-spatial query

Figure 4 : Executions time non-spatial query

 

 

CONCLUSIONS

Based on the results of our tests, Redis performs the best in spatial query.However,  Neo4j is the best option of the three (MySQL and Redis) for implementation in the Rentalios application. This is based on the structure of the data which requires excellent performance in both non-spatial and spatial queries.

References

[1] Pant, N. (2015). Performance comparison of spatial indexing structures for different query types.

About author View all posts

Ichwan Yasir

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.