- Created on 11 January 2017
ITware is currently in the first phase of building a new solution for one of our clients, and we have prepared this case study to share with our readers.
The customer’s inquiry: The customer currently runs an ELK (Elasticsearch + Logstash + Kibana) stack for log collecting and sorting, and they would like a better solution due to the following issues:
- They want more zero-configuration behaviour (convention over configuration)
- Multi-line log processing is currently a major pain point
- They would like a standard, non-custom protocol between the log senders and aggregators, because they want to proxy the communication through Apache instances
- They have legacy applications which can only write logs to files and have no syslog/syslog-ng support
- The customer would like microservice-like components in the solution, without needing major new components in their infrastructure (e.g. a service registry)
- Every service must have a health-check/statistics interface over HTTP (a minimal sketch of such an endpoint follows this list)
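For reference, this is roughly what such an interface could look like. This is only a minimal sketch using Node.js's built-in http module; the port and the statistics fields are illustrative assumptions, not the final design.

```javascript
// Minimal health-check / statistics endpoint sketch (Node.js built-in http).
// The port and the statistics fields are illustrative, not the final design.
const http = require('http');

const stats = { startedAt: Date.now(), linesProcessed: 0 };

http.createServer((req, res) => {
  if (req.url === '/health') {
    res.writeHead(200, { 'Content-Type': 'application/json' });
    res.end(JSON.stringify({ status: 'UP' }));
  } else if (req.url === '/stats') {
    res.writeHead(200, { 'Content-Type': 'application/json' });
    res.end(JSON.stringify({
      uptimeSeconds: Math.floor((Date.now() - stats.startedAt) / 1000),
      linesProcessed: stats.linesProcessed
    }));
  } else {
    res.writeHead(404);
    res.end();
  }
}).listen(8080); // hypothetical port
```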
Our solution: At ITware we use top technologies and methodologies, including microservices, containerisation, cloud services and more. For this particular project we will deliver a solution made up of three components, divided as follows:
Log Collector:
- Listens for filesystem actions under a directory and starts reading newly created files
- Continuously reads the logfiles in this folder tree and passes the read lines to the log aggregator
- Source logfile paths are passed along with their contents; using this path, the aggregator can decide which processing method and pattern should be used
- Is aware of inode numbers, therefore application-level log rotation will not cause a rotated file to be read as a new logfile
- Communicates with the log aggregator over WebSocket (proxyable by Apache)
- Can handle file truncate events
- Will have the ability to save its state, so if it is restarted it will resume reading from the correct file positions (a minimal sketch of the collector follows this list)
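A minimal sketch of the collector's core loop, assuming Node.js and the 'ws' WebSocket package. The watched directory, aggregator URL, state file and polling interval are illustrative assumptions, and a production version would react to filesystem events rather than poll:

```javascript
// Minimal collector sketch: tails logfiles in one directory and ships new
// lines to the aggregator over WebSocket (using the 'ws' npm package).
const fs = require('fs');
const path = require('path');
const WebSocket = require('ws');

const LOG_DIR = '/var/log/myapp';             // hypothetical watched directory
const STATE_FILE = '/var/lib/collector.json'; // hypothetical state location
const ws = new WebSocket('ws://aggregator.example.com/logs'); // proxyable by Apache

// state[inode] = { path, offset } -- keyed by inode so that an
// application-level rename/rotate does not look like a brand-new file.
let state = {};
try { state = JSON.parse(fs.readFileSync(STATE_FILE, 'utf8')); } catch (e) {}

function poll() {
  for (const name of fs.readdirSync(LOG_DIR)) {
    const filePath = path.join(LOG_DIR, name);
    const st = fs.statSync(filePath);
    if (!st.isFile()) continue;
    const entry = state[st.ino] || (state[st.ino] = { path: filePath, offset: 0 });
    if (st.size < entry.offset) entry.offset = 0;   // file was truncated
    if (st.size > entry.offset) {
      const buf = Buffer.alloc(st.size - entry.offset);
      const fd = fs.openSync(filePath, 'r');
      fs.readSync(fd, buf, 0, buf.length, entry.offset);
      fs.closeSync(fd);
      entry.offset = st.size;
      // Ship each line together with its source path, so the aggregator
      // can pick the right processing method and pattern.
      // (A production version would also buffer a trailing partial line.)
      for (const line of buf.toString('utf8').split('\n').filter(Boolean)) {
        ws.send(JSON.stringify({ path: filePath, line }));
      }
    }
  }
  fs.writeFileSync(STATE_FILE, JSON.stringify(state)); // persist read positions
}

ws.on('open', () => setInterval(poll, 1000));
```

Keying the state by inode is what makes rotation and truncation detectable here: a renamed file keeps its inode and offset, while a size smaller than the saved offset signals a truncate.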
Log Aggregator:
- Listens on a WebSocket endpoint for collector connections
- Includes a parser database, which supports Grok and JavaScript parsers
- The processing method and pattern are chosen based on the source host and the logfile’s path
- Parses the log messages and stores them in Elasticsearch (see the sketch after the component list)
Elasticsearch:
- The common component shared with the predecessor ELK stack
- Stores the parsed log messages
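A minimal sketch of the aggregator side, again assuming Node.js and the 'ws' package. The ports, the index/type names and the trivial parser are illustrative assumptions; parser selection by path is sketched in the zero-configuration section below:

```javascript
// Minimal aggregator sketch: accepts collector connections over WebSocket
// and indexes parsed events into Elasticsearch over its REST API.
const http = require('http');
const WebSocket = require('ws');

// Placeholder: a real implementation would pick a Grok or JavaScript parser
// based on the source host and path (see the zero-configuration section).
function parse(path, line) {
  return { path, message: line, '@timestamp': new Date().toISOString() };
}

// Index one document via Elasticsearch's HTTP API (index/type are assumed).
function store(doc) {
  const req = http.request({
    host: 'localhost', port: 9200,
    path: '/logs/entry', method: 'POST',
    headers: { 'Content-Type': 'application/json' }
  });
  req.on('error', err => console.error('ES indexing failed:', err.message));
  req.end(JSON.stringify(doc));
}

const server = new WebSocket.Server({ port: 9400 }); // hypothetical port
server.on('connection', socket => {
  socket.on('message', raw => {
    // Frame format matches what the collector sketch sends.
    const { path, line } = JSON.parse(raw);
    store(parse(path, line));
  });
});
```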
About the log parsing/processing:
- Grok is supported because the predecessor system already stores its logs parsed with this pattern format
- A JavaScript engine is embedded in the aggregator, which makes it possible to write finite-state machines for log parsing; every logfile source gets its own JavaScript context. With that, very complex rules can be implemented for multi-line log parsing, as the sketch below shows.
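A sketch of such a finite-state machine. The format handled here (a Java-style stack trace), the class name and the continuation heuristic are all illustrative assumptions:

```javascript
// Multi-line parsing as a small finite-state machine. Each logfile source
// would run one instance of this in its own JavaScript context inside the
// aggregator. This example groups a Java-style stack trace with the log
// line that started it.
class StackTraceParser {
  constructor(emit) {
    this.emit = emit;      // callback that receives one complete log event
    this.current = null;   // event being assembled, or null
  }

  feed(line) {
    const isContinuation = /^\s+at\s|^\s+\.\.\.|^Caused by:/.test(line);
    if (isContinuation && this.current) {
      // Stay in the "collecting" state: attach the line to the open event.
      this.current.message += '\n' + line;
      return;
    }
    // A non-continuation line closes the previous event and opens a new one.
    if (this.current) this.emit(this.current);
    this.current = { message: line };
  }

  flush() {
    if (this.current) this.emit(this.current);
    this.current = null;
  }
}

// Usage: feed lines as they arrive; a multi-line trace comes out as one event.
const parser = new StackTraceParser(evt => console.log('event:', evt.message));
parser.feed('2017-01-11 10:00:00 ERROR Request failed');
parser.feed('    at com.example.Handler.run(Handler.java:42)');
parser.feed('    at java.lang.Thread.run(Thread.java:745)');
parser.feed('2017-01-11 10:00:01 INFO Recovered');
parser.flush();
```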
About zero configuration: The aggregator decides the log-processing parameters based on the source log path. For example, if every nginx log is placed in a /var/log/nginx folder and a rule is defined for that path, logs from any newly added source matching the rule are processed automatically, as sketched below.
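A sketch of this convention-over-configuration routing; the rule contents, pattern strings and script name are illustrative assumptions:

```javascript
// Path-based rule table: the source path alone selects the processing
// method and pattern, so new sources matching an existing rule need no
// extra configuration.
const rules = [
  // Everything under /var/log/nginx is parsed with an nginx access-log
  // Grok pattern; a newly added nginx host needs no further setup.
  { match: /^\/var\/log\/nginx\//, method: 'grok',
    pattern: '%{IPORHOST:client} .* "%{WORD:verb} %{URIPATHPARAM:request}"' },
  // Application logs under /var/log/myapp use a JavaScript FSM parser.
  { match: /^\/var\/log\/myapp\//, method: 'javascript', script: 'stacktrace.js' },
];

// First matching rule wins; unknown paths fall back to a plain parser.
function ruleFor(path) {
  return rules.find(r => r.match.test(path)) || { method: 'plain' };
}

console.log(ruleFor('/var/log/nginx/access.log').method);  // "grok"
console.log(ruleFor('/var/log/myapp/server.log').method);  // "javascript"
console.log(ruleFor('/tmp/other.log').method);             // "plain"
```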
Conclusion: This solution will not only solve all the issues the customer is facing with their current log collection stack, but will also provide them with a more efficient, easier-to-use log collector that has many added benefits for their company.