You would be hard-pressed to find anyone in the IT field who has never heard of parallel computing, parallel data processing, or at least the terms cluster and grid. Nevertheless, we would like to make our own contribution to this growing area of the IT industry.
At the beginning of 2013, a project called Flamenco was launched. It was a semi-educational project with a practical focus. Two goals were set initially:
At first, we chose the popular Apache ZooKeeper as the node coordinator. As the practical task for the future computational cluster, we chose compiling projects that are built with Maven.
Building a proof of concept (POC) let us verify the idea quite quickly: implementing distributed project compilation took about 4-5 days (we had to program from morning till night, even on weekends), and the first results inspired us to continue the work. Projects really were compiled in a distributed manner, without errors, and the resulting JARs were usable.
Many interesting tasks still remained at that point: some had been set aside while building the POC, and others had been sidestepped by imposing special conditions. Solving them took a lot of time and effort. As a result, we mastered a fairly wide range of frameworks, gained a deep understanding of how Maven analyzes a project, used HDFS (the Apache Hadoop distributed file system), deepened our knowledge of Java IO, and eventually decided to migrate part of the project to Java 7.
This took far more time than creating the POC, and about half a year later we released the first version, Flamenco 0.1, based on Apache ZooKeeper, Apache Hadoop HDFS and Spring. It was also a success: a full cycle of distributed compilation, delivered as a packaged solution, with no time-consuming preparation of the cluster nodes. However, problems with the architecture and the chosen frameworks became apparent, such as a bottleneck on the node controller and the need to run and maintain an external ZooKeeper server, among others.
Having achieved a solid result and released the first version, we decided to redesign the architecture in light of that experience. In effect, a new Flamenco computational grid, v0.2, was developed on the basis of the earlier solutions. OSGi, JGroups and Spring were chosen as the core frameworks.
To be continued…