Nowadays, big data is being integrated into systems that must process vast amounts of information from geographically distributed data sources while fulfilling the non-functional properties inherited from the domain in which the analytics are applied, for example smart cities or smart manufacturing.

ELASTIC has designed a novel software architecture that addresses the challenge of efficiently distributing extreme-scale big-data analytics across the compute continuum, from edge to cloud, while providing guarantees on the non-functional requirements of real-time performance, energy, communications and security stemming from the smart mobility domain.

To that end, the ELASTIC software ecosystem incorporates different components from multiple information and communications technologies (ICTs), including distributed data analytics, embedded computing, internet of things (IoT), cyber-physical systems (CPS), software engineering, high-performance computing (HPC), and edge and cloud technologies.

The integration of all these components into a single development framework enables the design, implementation and efficient execution of extreme-scale big-data analytics. To achieve this goal, ELASTIC incorporates a new elasticity concept across the compute continuum, with the objective of providing the level of performance needed to process the envisioned volume and velocity of data from geographically dispersed sources at an affordable development cost, whilst guaranteeing the fulfilment of the non-functional properties inherited from the system domain.

The ELASTIC software architecture (SA), validated in three smart mobility use cases in the city of Florence, has achieved:

  • Integration and optimization of advanced data analytics methods into a complex workflow for both real-time and offline analytics, executed across the edge/cloud continuum and collecting extreme data from multiple sources from both the tramway network and the city infrastructure.
  • Up to 50% reduction in SW development costs, bringing down the development time for the smart city use case from 2 months to 2 weeks.
  • Up to 38% reduction of the analytics response time through advanced scheduling for distributed execution, taking into account data dependencies, the quality of communication links and real-time requirements.


The figure below shows a schematic view of the compute continuum considered in the ELASTIC use cases and an overview of the ELASTIC software architecture stack, showing the key software components.


ELASTIC Software Architecture (SA)


Overall, the ELASTIC SA consists of four layers, each tackled by a dedicated Work Package (WP) of the project, further described below. All software components have been added to the dedicated ELASTIC GitLab repository.

At the highest layer, the Distributed Data Analytics Platform (DDAP) deals with the management, storage and retrieval of data when and where it is needed. It also offers the necessary Application Programming Interfaces (APIs) to the programmer, enabling the development of extreme data analytics workflows able to leverage the distributed nature of the ELASTIC architecture.

The figure below shows an overview of the DDAP deployed for the ELASTIC smart mobility use cases.

The two key software components of DDAP are:

  • dataClay, employed for data-in-motion generated at the edge and consumed by real-time data analytics methods, and
  • Druid, which aggregates all data for offline, data-at-rest analysis taking place in the cloud.

Other software components, such as Kafka, were employed as an intermediate step for data transition from the edge to the cloud.
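As a rough illustration of this data path, the sketch below publishes an edge-generated record to a Kafka topic and then queries the aggregated data through Druid's SQL endpoint; the broker address, topic name, Druid URL and datasource are placeholders rather than the actual ELASTIC configuration.

```python
# Minimal sketch of the edge-to-cloud data path (all names/addresses are placeholders).
import json
import requests                      # used for Druid's SQL-over-HTTP endpoint
from kafka import KafkaProducer      # kafka-python client

# Edge side: publish a data-in-motion record to an intermediate Kafka topic.
producer = KafkaProducer(
    bootstrap_servers="edge-broker:9092",   # hypothetical broker address
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send("tram-telemetry", {"tram_id": 12, "speed_kmh": 31.5, "ts": 1650000000})
producer.flush()

# Cloud side: query the data-at-rest aggregated in Druid via its SQL API.
resp = requests.post(
    "http://druid-router:8888/druid/v2/sql",   # hypothetical Druid router URL
    json={"query": "SELECT tram_id, AVG(speed_kmh) AS avg_speed "
                   "FROM tram_telemetry GROUP BY tram_id"},
)
print(resp.json())
```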

Distributed Data Analytics Platform


Two data analytics platforms, composed of microservices for data extraction, data aggregation, data cleaning, data filtering, and geo-triangulation, were also developed as part of the DDAP (a minimal sketch of one such processing step follows this list):

  • The ICE Knowledge Discovery tool is employed for the offline analysis of datasets coming from the connected trams and the city infrastructure as part of the public-private transport interaction use case.
  • The Predator tool implements learning algorithms for the detection and potential prediction of defective track segments as part of the predictive maintenance use case.
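To give a concrete, if simplified, idea of what such a microservice step does, the sketch below cleans and filters a batch of position records before aggregation; the field names, value ranges and coordinates are purely hypothetical and do not reflect the actual ICE or Predator implementations.

```python
# Hypothetical data-cleaning/filtering step of a DDAP microservice (illustrative only).
from typing import Iterable, List


def clean_positions(records: Iterable[dict]) -> List[dict]:
    """Drop malformed or implausible GPS records before aggregation."""
    cleaned = []
    for rec in records:
        lat, lon = rec.get("lat"), rec.get("lon")
        if lat is None or lon is None:
            continue                                      # discard incomplete records
        if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
            continue                                      # discard out-of-range coordinates
        cleaned.append(rec)
    return cleaned


# Example batch: only the first record is kept, the other two are dropped.
batch = [{"lat": 43.77, "lon": 11.25}, {"lat": 999.0, "lon": 11.3}, {"lon": 11.2}]
print(clean_positions(batch))   # -> [{'lat': 43.77, 'lon': 11.25}]
```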


Check all the details in the ELASTIC GitLab sub-group repository: https://gitlab.bsc.es/elastic-h2020/elastic-sa/ddap


The Non-Functional Requirements (NFR) tool layer implements the set of tools and components needed to continuously monitor the behaviour of the ELASTIC system regarding the fulfilment of non-functional requirements (time, energy, communication quality and security). The NFR tool architecture is shown in the figure below and consists of the following key components:

  • At the lowest layer, a probing mechanism has been implemented, consisting of several application or system tools that extract relevant metrics from the underlying hybrid fog computing platform; the metrics are published by a Data Router.
  • Three distributed NFR tool components (NFR monitors) for time and energy, communication quality, and security, which monitor the corresponding properties and detect violations of a predefined set of thresholds derived from the application and system requirements. The NFR components are deployed on each available edge node where the analytics workflows are executed.

A Global Resource Manager (GRM) entity has a holistic view of the ELASTIC system over the cluster of edge nodes monitored by the distributed NFR tools, where multiple analytics applications may be executed. The GRM receives the metrics and violations reported by the NFR monitors and elaborates a list of available resources (i.e., computing nodes and the number of available CPUs) recommended per application, also considering each application’s requirements and priority level. These recommendations are passed as input to the orchestrator layer, which is then responsible for enforcing the allocation of computational tasks onto the available resources according to the implemented scheduling policies.
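As a simplified illustration of the monitor-and-report pattern described above, the sketch below checks probed metrics against predefined thresholds and reports any violation; the metric names, threshold values and reporting function are hypothetical and do not correspond to the actual NFR tool interfaces.

```python
# Simplified sketch of an NFR monitor's threshold check (hypothetical interfaces).
import random
import time

# Assumed per-application limits; the real thresholds come from the requirements.
THRESHOLDS = {"response_time_ms": 200.0, "energy_w": 15.0}


def read_probe(metric: str) -> float:
    """Stand-in for the probing mechanism; here it simply simulates a reading."""
    return random.uniform(0.5, 1.5) * THRESHOLDS[metric]


def report_violation(metric: str, value: float) -> None:
    """Stand-in for publishing a violation towards the Global Resource Manager."""
    print(f"violation: {metric}={value:.1f} exceeds {THRESHOLDS[metric]}")


def monitor_loop(period_s: float = 1.0, iterations: int = 5) -> None:
    """Periodically compare the probed metrics against their thresholds."""
    for _ in range(iterations):
        for metric, limit in THRESHOLDS.items():
            value = read_probe(metric)
            if value > limit:
                report_violation(metric, value)
        time.sleep(period_s)


if __name__ == "__main__":
    monitor_loop()
```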


NFR tool


Check all the details in the ELASTIC GitLab sub-group repository: https://gitlab.bsc.es/elastic-h2020/elastic-sa/nfr-tool


At the heart of the ELASTIC SA, the orchestrator layer employs the COMP Superscalar (COMPSs) framework, which handles the deployment and scheduling of the computation (i.e., the data analytics tasks provided by the DDAP) for the efficient distributed execution of analytics across the compute continuum, from edge to cloud. The orchestrator layer is key to implementing the elasticity concept, in which data analytics workflows are distributed taking into account not only performance metrics but also non-functional requirements. This has been achieved with the following features:

  • New deployment mechanisms that facilitate interoperability between edge and cloud computing resources in the fog platform, fully leveraging container technologies to abstract the deployment and execution needs of workflows. To that end, COMPSs has been integrated with Nuvla, an edge-cloud management platform with advanced capabilities that facilitate deployment across heterogeneous infrastructures.
  • Integration with the NFR tool to receive real-time information on relevant system metrics and recommendations on the available computing and communication resources, adapting to any changes in the compute continuum and ensuring the fulfilment of the non-functional properties of time, energy, communications and security.


Orchestration layer


  • New COMPSs workflow scheduling strategies leveraging the NFR tool information and implementing different heuristics that aim to minimize the execution time of the workflow. The trade-off between exploiting parallelism in the execution of the analytics workflow and keeping down the response time of real-time analytics methods has also been explored. Overall, the proposed scheduling heuristics yield a performance improvement of up to 38% in execution time with respect to baseline strategies.
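To give a flavour of how analytics tasks are expressed for the orchestrator, the sketch below defines a two-step PyCOMPSs workflow whose data dependency is tracked and scheduled by the COMPSs runtime; the task names and their contents are illustrative and are not the actual ELASTIC analytics methods.

```python
# Minimal PyCOMPSs sketch: annotated tasks are scheduled by the COMPSs runtime
# across the available edge/cloud resources (illustrative workflow only).
from pycompss.api.task import task
from pycompss.api.api import compss_wait_on


@task(returns=1)
def detect_objects(frame_id):
    """Placeholder analytics step, e.g. object detection on one camera frame."""
    return {"frame": frame_id, "objects": ["tram"]}


@task(returns=1)
def count_objects(detection):
    """Placeholder post-processing step that depends on the previous task."""
    return len(detection["objects"])


if __name__ == "__main__":
    detection = detect_objects(0)       # submitted to the COMPSs runtime
    count = count_objects(detection)    # data dependency tracked automatically
    print(compss_wait_on(count))        # synchronize and retrieve the result
```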


At the lowest level of the ELASTIC software architecture, the hybrid fog computing platform contains all the software components forming the ELASTIC compute continuum, including the cloud, edge, distributed data and communications infrastructure. Docker Swarm and Kubernetes infrastructures are supported at the cloud level, while two commercial edge solutions owned by ELASTIC partners, namely the NuvlaBox (SixSq) and the KonnektBox (IKERLAN), are considered at the edge level. The distributed storage component of the fog platform is implemented by dataClay, which is responsible for collecting data from edge sensors and preparing it for analysis by the DDAP. Finally, a data broker component has been implemented, in charge of exchanging information between the hybrid fog platform components and external services.

The dependencies between the components running on the fog are reduced to a minimum to allow hot-plugging and dynamic reconfiguration of the services offered by the platform. All the corresponding software and hardware components of this architecture reside either in the cloud, at the edge, or in any intermediate layer in between (i.e., the middleware).

The hybrid architecture allows applications to be deployed either as dynamic microservices (containers) or as native (monolithic) applications, combining the predictability of native execution with the flexibility of microservices. The platform is compatible with both orchestration systems, offering a high level of flexibility for the use cases.
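As a small illustration of the microservice option, the sketch below launches a containerized analytics service on a fog node using the Docker SDK for Python; the image name, port mapping and restart policy are placeholders rather than actual ELASTIC artifacts.

```python
# Illustrative launch of a containerized analytics microservice on a fog node
# (image name, port and restart policy are placeholders).
import docker

client = docker.from_env()                                  # connect to the local Docker daemon
container = client.containers.run(
    "registry.example.org/analytics-service:latest",        # hypothetical image
    detach=True,                                            # run in the background
    ports={"8080/tcp": 8080},                               # expose the service port
    restart_policy={"Name": "on-failure"},                  # restart on failure
)
print(container.id)
```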


Hybrid Fog Computing Platform


Check all the details in the ELASTIC GitLab sub-group repository: https://gitlab.bsc.es/elastic-h2020/elastic-sa/fog-platform.