
Arctic Logsearch developed based on Elastic Stack components

Customer: I:FAO Group GmbH, Frankfurt am Main. Project name: Arctic Logsearch, developed based on Elastic Stack components.

High-level concept of the project

The booking engine is integrated with more than 50 external content providers, such as Booking.com, Sabre, Amadeus and HRS. The system generates application logs for each executed event, such as a reservation or a cancellation. Logs and events are connected by unique identifiers and together constitute an event record spanning the entire booking engine. The logs are a source of information for analyzing system stability issues; they are also indicators of system performance and of the health of the connectivity layer between the core system and the external providers.
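
For illustration, two log events tied together by such an identifier might look as follows. The field names and values are assumptions for this sketch, not the booking engine's actual schema.

```python
# Two application log events belonging to the same booking workflow. The shared
# correlation_id is what stitches individual logs into one event record; all
# field names and values here are illustrative assumptions.
events = [
    {
        "@timestamp": "2024-05-21T09:14:03Z",
        "correlation_id": "bkg-7f3a9c",
        "event_type": "reservation",
        "provider": "Booking.com",
        "status": "OK",
        "duration_ms": 412,
    },
    {
        "@timestamp": "2024-05-21T09:14:05Z",
        "correlation_id": "bkg-7f3a9c",
        "event_type": "provider_response",
        "provider": "Booking.com",
        "status": "OK",
        "duration_ms": 1840,
    },
]
```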

Deliverables of the project and business expectations

Creating a platform that allows searching and processing the events generated by the booking engine. The platform is to support full-text search as well as search based on defined attributes, and it should allow building complex queries with logical operators and aggregations. It is necessary that it stores data for a minimum of 36 months. Access to data from the last month has to be quick and efficient, as any delay in storing or retrieving data has a negative impact on the user and developer experience (DX). The system and its backend have to scale horizontally with the growth of the data generated by the booking engine (the core of the system).
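
To make the query requirement concrete, here is a sketch of the kind of request the platform has to answer: a full-text search combined with attribute filters, logical operators and an aggregation. The index name and field names are assumptions for illustration.

```python
import requests

ES = "http://localhost:9200"  # assumed cluster address

# Full-text search plus attribute filters, logical operators and an aggregation.
query = {
    "query": {
        "bool": {
            "must": [{"match": {"message": "timeout"}}],          # full-text part
            "filter": [
                {"term": {"provider": "Sabre"}},                  # attribute search
                {"range": {"@timestamp": {"gte": "now-30d"}}},    # recent data only
            ],
            "must_not": [{"term": {"event_type": "heartbeat"}}],  # logical operators
        }
    },
    "aggs": {"errors_per_provider": {"terms": {"field": "provider"}}},
}
resp = requests.post(f"{ES}/booking-logs-*/_search", json=query)
resp.raise_for_status()
print(resp.json()["aggregations"]["errors_per_provider"]["buckets"])
```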

The initial version of the project (MVP)

We decided to use all the components of the Elastic Stack. The Elasticsearch cluster became the heart of the system. We used Logstash as the tool to transform the logs and prepare the appropriate data structure, and we selected Kibana to visualize the data. Due to the great number of attributes, we suggested using dedicated indices in which only the data needed to create dashboards and diagrams are kept.
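
A minimal sketch of such a dedicated index, with assumed index and field names: setting "dynamic" to false means only the explicitly mapped dashboard fields are indexed, so the index stays lean no matter how many attributes the raw events carry.

```python
import requests

ES = "http://localhost:9200"  # assumed cluster address

# Dedicated index for Kibana dashboards: only the handful of fields the
# dashboards actually aggregate on are mapped; everything else is ignored.
requests.put(
    f"{ES}/booking-dashboard",
    json={
        "mappings": {
            "dynamic": False,
            "properties": {
                "@timestamp": {"type": "date"},
                "event_type": {"type": "keyword"},
                "provider":   {"type": "keyword"},
                "status":     {"type": "keyword"},
            },
        }
    },
).raise_for_status()
```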

The final solution

We deployed and configured the Elasticsearch cluster, distinguishing the appropriate node roles, such as dedicated master nodes and dedicated data nodes that process CRUD operations. The high-level diagram of the entire system presents the architecture with its key components. The appropriate data mapping was prepared; it indexes only the selected fields that are used in the custom-developed application and by Kibana. This approach significantly improved the overall performance of the Elasticsearch cluster.

As logs have a dynamic structure, it can happen that the data type already defined on the Elasticsearch side differs from the type received in an event; in that case the log is rejected. Therefore, we implemented the Dead Letter Queue feature: a special queue where all logs that could not be stored in Elasticsearch are placed. The team responsible for the logs can review them and decide how to index such events.

We configured a farm of multiple Logstash instances, which are responsible for accepting data from the client's applications and carrying out the defined data transformations. The indices created by Logstash are weekly. Considering the amount of data and the available resources, we designed the indices with an appropriate number of primary shards and replicas; please note that the shard is the scaling unit in Elasticsearch.
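
As a sketch of how such weekly indices can be set up, assuming Elasticsearch 7.8+ (older clusters would use the legacy _template API) and illustrative names, shard counts and fields:

```python
import requests

ES = "http://localhost:9200"  # assumed cluster address

# Index template for the weekly indices Logstash creates (e.g. booking-logs-2024.21).
# Shard and replica counts are illustrative; in practice they are sized against
# the real data volume, since the shard is the scaling unit in Elasticsearch.
requests.put(
    f"{ES}/_index_template/booking-logs",
    json={
        "index_patterns": ["booking-logs-*"],
        "template": {
            "settings": {"number_of_shards": 3, "number_of_replicas": 1},
            "mappings": {
                "properties": {
                    "@timestamp":     {"type": "date"},
                    "correlation_id": {"type": "keyword"},
                    "duration_ms":    {"type": "long"},
                }
            },
        },
    },
).raise_for_status()

# An event whose field type conflicts with the mapping is rejected with HTTP 400
# (mapper_parsing_exception) -- exactly the case the Dead Letter Queue catches.
bad = requests.post(
    f"{ES}/booking-logs-2024.21/_doc",
    json={"@timestamp": "2024-05-21T09:14:03Z", "duration_ms": "not-a-number"},
)
print(bad.status_code)  # 400
```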

In order to meet the requirements on how long data must be kept, we introduced a hot / warm / cold architecture. You can learn more about this approach in our separate article on building hot / warm architectures for Elasticsearch and in one of our webinars, [Effective log management with Elasticsearch](https://www.cometari.com/video/effective-log-management-with-elasticsearch).
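
In a hot / warm / cold setup, each data node carries a tier attribute (for example node.attr.box_type: hot), and shard allocation settings move indices between tiers as they age. The sketch below uses an assumed index name and box_type attribute and shows the moves by hand; in the project such transitions are automated rather than run manually.

```python
import requests

ES = "http://localhost:9200"  # assumed cluster address

# A fresh weekly index is pinned to hot nodes (fast storage) while it is
# heavily queried; this assumes data nodes were started with a box_type attribute.
requests.put(
    f"{ES}/booking-logs-2024.21/_settings",
    json={"index.routing.allocation.require.box_type": "hot"},
).raise_for_status()

# Once the index ages out of the "last month" window, changing the allocation
# requirement makes Elasticsearch move its shards to warm nodes; repeating the
# step with "cold" covers the rest of the 36-month retention period.
requests.put(
    f"{ES}/booking-logs-2024.21/_settings",
    json={"index.routing.allocation.require.box_type": "warm"},
).raise_for_status()
```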

All maintenance actions, such as force merges and shrinks, are managed by Curator, which is part of the Elastic Stack. The application, called Logsearch, has become the main place to review logs from the entire booking system. Additionally, it delivers a module for calculating statistics such as the "look to book" ratio.
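
The look-to-book ratio, i.e. how many availability searches occur per confirmed booking, can be computed with a single aggregation query. The index pattern and the event_type values below are assumptions for illustration.

```python
import requests

ES = "http://localhost:9200"  # assumed cluster address

# Counts search and booking events in one request using a filters aggregation.
query = {
    "size": 0,
    "query": {"range": {"@timestamp": {"gte": "now-7d"}}},
    "aggs": {
        "events": {
            "filters": {
                "filters": {
                    "looks": {"term": {"event_type": "availability_search"}},
                    "books": {"term": {"event_type": "reservation"}},
                }
            }
        }
    },
}
resp = requests.post(f"{ES}/booking-logs-*/_search", json=query)
resp.raise_for_status()
buckets = resp.json()["aggregations"]["events"]["buckets"]
looks, books = buckets["looks"]["doc_count"], buckets["books"]["doc_count"]
print(f"look-to-book ratio: {looks / max(books, 1):.1f}:1")
```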

Additionally, we created a separate system that displays live booking data in a beautiful UI, with the bookings broken down into air, hotels, rail and cars. Owing to our professional approach and technical experience, we were able to deliver a comprehensive solution from the software development perspective, as well as design and maintain production and pre-production environments for the entire stack.

Following good DevOps practices, each component has been launched as a container, and the environment is orchestrated by Kubernetes. This approach allows us to deploy and release a new version, or even a minor patch of a single component, much more quickly. We use Jenkins as the tool to build and deploy new versions of the components, and all the components are deployed and released using an approach called canary deployment. You can learn more about that approach in our webinar Canary Deployment with Traefik and K3s.
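
Conceptually, a canary release boils down to a weighted traffic split between the stable and the new version; in this setup Traefik performs the weighting between Kubernetes services. The sketch below only illustrates the idea; the backend names and the 5% canary share are assumptions.

```python
import random

# Conceptual sketch of the traffic split behind a canary release. In the real
# setup Traefik performs this weighting between Kubernetes services.
BACKENDS = {
    "logsearch-stable": 95,  # current production version
    "logsearch-canary": 5,   # newly released version under observation
}

def pick_backend() -> str:
    """Route one request: stable gets 95% of traffic, the canary 5%."""
    names, weights = zip(*BACKENDS.items())
    return random.choices(names, weights=weights, k=1)[0]

# If the canary misbehaves, its weight is set back to 0; if it stays healthy,
# the weight is raised step by step until it serves 100% of the traffic.
counts = {name: 0 for name in BACKENDS}
for _ in range(10_000):
    counts[pick_backend()] += 1
print(counts)  # roughly {'logsearch-stable': 9500, 'logsearch-canary': 500}
```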

Results of the project

The application is widely used as the source of truth when investigating a specific case related to the stability of the booking tool. It allows reviewing logs from the entire system in an easy way, without having to write complex queries or regular expressions, or to use command-line text-processing tools. The application allows seeing the full workflow from the application perspective.

Future development plans

The application is constantly being developed and new features are being added; the new features are strictly related to statistics.
In parallel, the client's core system is being developed by adding new business features related to the booking workflow, creating itineraries and introducing new payment formats. It is therefore our responsibility to maintain the system and implement the changes developed by our customer. This mainly comes down to updating mappings and making changes to Logstash's pipelines.

The data processed by the system are not only error statuses but also performance data, which indicate whether the system performs well enough. Based on these data we can ascertain whether the system works fine and whether its condition is acceptable, since basic operating system metrics such as CPU and memory usage do not provide information about the condition of a system; they only show the amount of available resources.

That is why we analyze specific metrics based on defined KPIs such as service level objectives (SLOs). In this particular case it is very important to define a metric called the 'error budget'. Owing to such comprehensive metrics, we can assess whether our system works fine and whether it is good enough. This approach allows us to see the system through the user's eyes and to see how satisfied customers are with the application.
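
As a minimal sketch of the arithmetic behind an error budget, assuming a made-up 99.9% SLO and illustrative traffic numbers: the budget is simply the share of requests that may fail before the SLO is violated.

```python
# Error-budget calculation; the SLO value and the request counts are
# illustrative assumptions, not the project's real targets.
SLO = 0.999                      # target share of successful requests
total_requests = 2_000_000       # requests observed in the 30-day window
failed_requests = 1_400          # requests that violated the SLO

error_budget = (1 - SLO) * total_requests   # failures we may "spend": 2000
remaining = error_budget - failed_requests  # budget left this window: 600

print(f"error budget: {error_budget:.0f} requests")
print(f"consumed:     {failed_requests} ({failed_requests / error_budget:.0%})")
print(f"remaining:    {remaining:.0f}")
```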