The emergence of standards in any technical field signals that the era of unique start-up solutions is ending and that an industrial approach is gradually taking its place. New players are stepping in; they have little appetite for terminology controversy, but they want to build rapport with partners and open their business to new customers with whom they can speak a common language. That was the case with Big Data, a set of emerging technologies that for several years was just a point on Gartner's Hype Cycle and then suddenly dropped off it in 2015. Considerable speculation on this theme fostered the popular opinion that we had witnessed the rise of a new marketing term that disappeared together with the invested money. Professionals, however, explained it differently: Big Data moved rapidly beyond Gartner's Hype Cycle and settled firmly in the industry. To support this view, we will analyze the recent standardization documents, that is, the standards and recommendations that regulate current relations in the Big Data business.
The first Big Data standardization documents were issued in 2015: NIST Special Publication 1500, the NIST Big Data Interoperability Framework, in seven volumes: Definitions, Big Data Taxonomies, Use Cases and General Requirements, Security and Privacy, Architectures White Paper Survey, Reference Architecture, and Standards Roadmap. ISO/IEC (the International Organization for Standardization and the International Electrotechnical Commission) began working on its standards in 2016. As a result, Joint Technical Committee ISO/IEC JTC 1, Information Technology, developed ISO/IEC CD 20546, Information Technology – Big Data – Overview and Vocabulary, and outlined ISO/IEC TR 20547, Information Technology – Big Data Reference Architecture, with Part 1: Framework and Application Process and Part 2: Use Cases and Derived Requirements. The ITU (International Telecommunication Union) first issued an ITU-T Technology Watch Report, Big Data: Big Today, Normal Tomorrow, in 2013, and in 2015 published the first recommendation, ITU-T Y.3600, Big Data – Cloud Computing Based Requirements and Capabilities.
In all these documents, Big Data is described from an engineering perspective, in contrast to the frequently cited publication by Danah Boyd and Kate Crawford, who defined Big Data as a socio-technical phenomenon. Conversations between customers and developers can now rely on a general definition based on the multiple V's, a definition that is semantically identical across the standardization documents: Big Data consists of extensive datasets – characterized primarily by volume, variety, velocity, and/or variability – that require a scalable architecture for efficient storage, manipulation, and analysis. The definition is deliberately self-referencing: data is big because it requires scalable systems to handle it. The ITU recommendation defines Big Data as a paradigm for enabling the collection, storage, management, analysis, and visualization, potentially under real-time constraints, of extensive datasets with heterogeneous characteristics (for example, high volume, high velocity, and high variety).
The major achievement of standardization in the field of Big Data is the introduction of a Big Data reference architecture. It comprises a number of actors who perform specific roles and carry specific responsibilities in Big Data projects: a Data Provider, a Data Consumer, a System Orchestrator, a Big Data Application Provider (data mining, analytics, preparation, visualization, and access control), and a Big Data Framework Provider (infrastructure, platform, processing, messaging and communication systems, and resource management).
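To make the role-based view concrete, here is a minimal sketch, in Python, of how the actors of the reference architecture could be modeled when planning a project. The role names follow the standard, but the `Actor` structure, the example actor names, and the `assign_work` helper are our own illustrative assumptions, not part of any NIST or ISO/IEC document.

```python
from dataclasses import dataclass, field
from enum import Enum, auto

class Role(Enum):
    """Actor roles named in the Big Data reference architecture."""
    DATA_PROVIDER = auto()
    DATA_CONSUMER = auto()
    SYSTEM_ORCHESTRATOR = auto()
    APPLICATION_PROVIDER = auto()  # mining, analytics, preparation, visualization, access control
    FRAMEWORK_PROVIDER = auto()    # infrastructure, platform, processing, resource management

@dataclass
class Actor:
    """A person or partner organization assigned a role in a Big Data project (illustrative)."""
    name: str
    role: Role
    responsibilities: list[str] = field(default_factory=list)

def assign_work(actors: list[Actor], role: Role) -> list[str]:
    """Return the names of all actors performing the given role."""
    return [a.name for a in actors if a.role == role]

# Hypothetical project staffing, for illustration only.
actors = [
    Actor("sensor-network", Role.DATA_PROVIDER, ["ingest raw events"]),
    Actor("analytics-team", Role.APPLICATION_PROVIDER, ["preparation", "analytics"]),
    Actor("compute-cluster", Role.FRAMEWORK_PROVIDER, ["processing", "resource management"]),
]
print(assign_work(actors, Role.APPLICATION_PROVIDER))  # → ['analytics-team']
```

A project manager could extend such a model with per-role checklists drawn from the standard's descriptions when distributing work across partner organizations.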
With such a detailed description of each role, a project manager can distribute work effectively between people and partner organizations. The base model has a graphic form, so it can serve as a template whenever high-level Big Data system design is required. In the regulatory documents, NIST and ISO/IEC provide a highly detailed overview of popular use cases in which Big Data is key to implementation. This overview highlights the basic requirements for Big Data processing systems, which is very useful when positioning specific projects. Looking at the main application fields of Big Data today, we can group the following use cases:
It is clear that Big Data standards correlate with a number of other IT standards and recommendations adopted by various international organizations, which points to the future directions of standardization in this field.
Obviously, the purpose of developing standards is their future application, and quite often the advantage of using standards is not evident in the early stages of their emergence. Recall, however, the publication of the ISO/OSI Reference Model (ISO 7498, ITU-T X.200), which gave the developer community a complete understanding of network systems and has for many years served as the basis for a competitive market in software and network equipment. A set of more detailed standards for the individual model components was developed; adherence to these standards allowed vendors to make large networks interoperate, and software manufacturers to modernize networks without replacing physical equipment. Thus, the Big Data industry has great potential, and the emergence of standards will boost its development in the coming years.