This story starts with an issue reported by one of my customers, a car manufacturer. Several times per day, its IT team runs batch jobs to migrate a large volume of data (several TB) from version n to version n+1 of a 3D CAD (Computer-Aided Design) application. A classic data migration scenario, except that it is not done once but several times per day, all year round: the team decided to run both application versions in parallel for a while, until they are confident everything works with the new version.
For this, the 3D CAD application vendor provided a set of dedicated scripts, which generate a large amount of logs (30 GB per week). But most of these logs are unstructured, which makes it very hard to exploit the log data to understand the recurring data migration issues. The team can neither understand them nor fix them.
They asked us to have a look. First conclusions from data exploration and data mining: reading the logs and trying to make sense of them costs a lot of time. The logs are unstructured, and retrieving a single piece of information requires opening ten log files. Building datasets manually for data mining took 8 hours for 50 observations, and findings obtained from so little data are far from convincing. The initial idea was to explain the data processing issues using other variables found in the logs.
We needed a way to structure these logs, and we could not wait for the CAD application vendor to provide one.
Use of Logstash/Elasticsearch/Kibana

Elasticsearch proved powerful in this case. Logstash patterns and filters make it possible to extract and structure data from these logs. As the ELK stack is scalable, large volumes are not a problem, and Kibana makes it quick to design dynamic monitoring dashboards.
And as it is open source, there are no licence costs: just a few Linux servers, and we get control over our data.
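To give an idea of what this looks like in practice, here is a minimal Logstash pipeline sketch. The actual migration log format is not shown in this post, so the sample line, file paths, field names, and grok pattern below are hypothetical assumptions, purely for illustration:

```
# Minimal Logstash pipeline sketch (hypothetical).
# Assumed log line format (not the real one from the CAD scripts):
#   2016-03-14 02:17:45 ERROR part=BRK-1234 step=convert duration=532ms
input {
  file {
    path => "/var/log/cad-migration/*.log"   # assumed log location
    start_position => "beginning"
  }
}
filter {
  # grok extracts structured fields from the raw line
  grok {
    match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} part=%{NOTSPACE:part} step=%{WORD:step} duration=%{INT:duration_ms}ms" }
  }
  # use the timestamp from the log line as the event time
  date {
    match => [ "timestamp", "yyyy-MM-dd HH:mm:ss" ]
  }
  # store the duration as a number so Kibana can aggregate on it
  mutate {
    convert => { "duration_ms" => "integer" }
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]               # assumed local cluster
    index => "cad-migration-%{+YYYY.MM.dd}"   # one index per day
  }
}
```

Once events are indexed this way, each migration step becomes a structured document with typed fields (level, part, step, duration), which is exactly what Kibana needs to build dashboards and what the data mining work was missing.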