Here at Jonajo we are interested in being as efficient as possible in order to maximize potential both for our clients and for ourselves. Because we are a company with many interests maximizing performance often comes in different forms. For instance, maximizing performance on the marketing team might involve collaboration and an ability to work with others to generate new ideas. On the other hand, maximizing performance for software engineers might mean finding and utilizing the appropriate tools for each job that they come across. This was the case recently here at Jonajo. One of the projects that we have been developing recently is called Rentalios. Rentalios is a room rental estimation tool. Essentially, landlords and tenants are able to utilize Rentalios in order to discover how much to charge for a room, or to find rooms that they can afford. Obviously, this application requires the use of a large amount of data. The issue that our programers were running into was that the algorithm they were originally using was too slow for this much data. How could we go about maximizing performance for Rentalios? Luckily, one of our software engineers, Ichwan Yasir, was on the case. It was time to find the appropriate tool and start maximizing performance.
BACKGROUND
During the development of the Rentalios application we ran into an issue. This issue was that the KNN algorithm was not an efficient way to analyse the data. In short, it was taking too long. After further investigation into the function to understand the long processing time we found that the issue was with the geospatial filters. It requires a long time for the geospatial filters to pull the necessary data. . From that discovery we tried to speed up the process by finding other solutions. One of the solutions is to replace database technology that has the ability to execute geospatial queries, with good performance and high scalability for future improvement. To find the best database technology for our Rentalios application we needed to know the details of performance in spatial and non-spatial functions.
Currently there are many types of technology databases with various advantages. To determine what would be best for our application, we looked for a database technology with the lowest execution time to run spatial queries and non-spatial queries. We compared several databases in order to find the best fit for Rentalios and other future applications.
Several techniques have been proposed to improve the performance of spatial indexes. One of these techniques is R-tree. R-tree is a spatial extension of B-trees. They are height balanced and store the object information so that spatial queries, such as range queries, point queries, etc. can be executed efficiently. Spatial objects are stored in such a way that queries such as “Find the rent of all houses within 2 miles from the current position” are executed within a fraction of a second. In the NEELABH’s research they concluded that R-tree is the best indexing structure for spatial measurement queries[1]. So in this comparison we only chose databases that utilize R-tree indexes.
SPATIAL DATABASE
- MySQL : is the world’s most popular open source database, enabling the cost-effective delivery of reliable information. Additionally, MySQL is categorized as the second most popular Relational DBMS, according to DBEngine. In the official documentation, mysql v8.0 mysql already uses an R-tree index for innoDB and MyIssam Engine.
- Neo4j : Described by its developers as an ACID-compliant transactional database with native graph storage and processing. It is the most popular graph database according to DB-Engines, and the 22nd most popular database overall. Neo4j utilizes another library for R-tree indexes. With this database, we need to setup spatial index library first to get the best possible performance in spatial query.
- Redis : is an in-memory data structure project implementing a distributed, in-memory key-value database with optional durability. Redis supports different kinds of abstract data structures, such as strings, lists, maps, sets, sorted sets, HyperLogLogs, bitmaps, streams, and spatial indexes. It has a variety of indexes which can be optimized as needed, including R-tree indexes in their spatial indexes.
SPATIAL QUERY
We found the spatial execution time by making several queries. In each query we would increase the radius. From these multiple queries we were able to ascertain an average for the execution time.
- MySQL : average execution time : 0.696777 ms. Details of the execution times are shown in Table 1 and Figure 1.
- Neo4j : average execution time : 0.035488 ms. Details of execution times are shown in Table 2 and Figure 2.
- Redis : average execution time : 0.014239 ms. Details of execution times are shown in Table 3 and Figure 3.
NON-SPATIAL QUERY
For this experiment we attempted to execute CRUD query and compare the functions to each other. Details of execution times are shown in Table 4 and Figure 4.
CONCLUSIONS
Based on the results of our tests, Redis performs the best in spatial query.However, Neo4j is the best option of the three (MySQL and Redis) for implementation in the Rentalios application. This is based on the structure of the data which requires excellent performance in both non-spatial and spatial queries.
References
[1] Pant, N. (2015). Performance comparison of spatial indexing structures for different query types.