A Testbed Setup for Edge Data Management Strategies

A Testbed Setup for Edge Data Management Strategies

Virtual and Real Environments (15.02.2020)

Förderjahr 2018 / Stipendien Call #13 / ProjektID: 3793 / Projekt: Data Management Strategies for Near Real-Time Edge Analytics

This blog introduces a testbed for experimentally evaluating edge data management strategies, aiming to have a better overview of needed technologies and physical resources.

Following the blog post from December 2019, in this blog post, I give an overview of a testbed setup for experimentation of proposed new concepts as well as for future real-world demonstrations of edge data management strategies. Currently, we are installing a lab where we plan to apply different evaluation strategies and simulations of proposed approaches.

Edge computing should enable adaptive placement of applications across different infrastructures

The edge computing paradigm has been proposed to meet the strict latency and accuracy requirements of modern applications by extending cloud functionalities closer to the source of data. It is possible nowadays to process data closer to its origination thanks to technological and scientific advances that allowed placing compute, network and storage capabilities to edge nodes, micro data centers or resource-constrained devices such as Raspberry Pis. In order to cope with the ever-growing application requirements and user needs, in the near future, we can expect many edge-deployed and distributed clusters managed by different providers as in the multi-cloud concept. The multi-cloud concept has been introduced to run user applications across several hosting environments, eliminating the reliance on specific architecture providers. Similarly, the new edge computing paradigm should ensure adaptive placement of data analytics tasks and application instances across different infrastructures to keep overall system performance under control. Edge-deployed clusters can be heterogeneous containing different initial capacities as well as different availability of resources over a certain period of time.

A set of technologies, tools, and languages should be used to set up a virtual environment for testing proposed strategies, including:

Kubernetes is a platform, developed by Google, representing one of the widely used open-source orchestrators that automates deployment and management of multiple containerized applications across multiple machines. Kubernetes can be installed with minikube as a single-node cluster running for testing purposes or creating a multi-node cluster on the localhost. Kubernetes is implemented in the Go programming language. Although there are many options, the Kubernetes cluster can be easily created using Vagrant and Ansible.
Docker is a container platform used to build and isolate applications and corresponding stack of services on containers, that is, standalone execution environments.
Vagrant represents a tool for managing virtual machine environments. One of the typical provider to set up the virtual machines is VirtualBox. Additionally, Ansible playbooks are used in combination with Vagrant to install needed packages and tools (e.g., Kubernetes, Docker). Ansible playbooks, as later Kubernetes deployment manifest files, are written in YAML (Yet Another Markup Language) as it is often used for configuration files.

A great tutorial on how to set up Kubernetes using Ansible and Vagrant on virtual machines can be found here. Although some EDM strategies are proposed and can be used in this environment, more important challenge is to test them across real infrastructures such as Raspberry PIs or server nodes.

Figures 1 and 2 show the physical architecture on which we plan to simulate novel approaches using 12 Raspberry Pis 3B+ separate into 3 stackable cases, each containing 1 master and 3 worker nodes. Each RPi is equipped with 1 GB RAM memory and a Quad-Core ARM processor running at 1.4 GHz. All RPis are connected to the network with Netgear 24-Port 10-Gigabit Switch and an Ethernet router. Further, we plan to utilize two additional in-house servers (rll-mdc01 and rll-mdc02) equipped with 256 GB RAM memory and 24-core Intel Xeon E5 processor running at 2.2 GHz. Figure 3 shows the installed equipment.

Figure 2: LiveLab – Raspberry Pi clusters

Since the Kubernetes cluster is based on master-worker architecture, Figure 3 illustrates the previous setup in which every cluster consists of three worker nodes and one master node.

Finally, this testbed will be used for testing adaptive placements of application instances considering data locality and resource requirements. Further, sensor data can be migrated to different locations (e.g., Persistent Volumes – PV) as well as to reside on two or more locations at the same time. The problem is how to automatically place analytics tasks and jobs to different clusters while accessing the right or best data location. In the next steps, the plan is to utilize current equipment to evaluate some of the edge data management strategies as well as for future real-world demonstrations.

Ivan Lujic

Skills:

cloud computing

edge computing

data analytics

Internet of Things

time series forecasting

statistical analysis

Weitere Blogbeiträge

Self-adaptive Analytics Placement – Architecture Overview

This blog introduces a high-level architecture model overview of the SAPLaw system, aiming to illustrate the main components and steps for achieving self-adaptive analytics placement at the edge.

Self-adaptive Analytics Placement - Use Case

This blog introduces a use case for self-adaptive placements of data analytics in the edge-cloud environment based on data locality, aiming to efficiently place the analysis where data is located.

Contributions accepted for publication in IEEE Transactions on Services Computing

Paper "Resilient Edge Data Management Framework" has been accepted for publication in IEEE Transactions on Services Computing. This blog post gives an overview of what the most significant outcomes of the evaluation were.

Förderjahr 2018 / Stipendien Call #13 / ProjektID: 3793 / Projekt: Data Management Strategies for Near Real-Time Edge Analytics

Edge computing should enable adaptive placement of applications across different infrastructures

Tags:

Ivan Lujic

Skills:

Weitere Blogbeiträge

Self-adaptive Analytics Placement – Architecture Overview

Self-adaptive Analytics Placement - Use Case

Contributions accepted for publication in IEEE Transactions on Services Computing

Practical applicability of EDM strategies

Edge Data Management Solutions and Limitations

Intuitive and Proactive IoT Systems

Impact on Architectural Design of Elastic Edge Storage Services

From the Raw Time Series to a Symbolic Data Representation

Dealing with Approximate Data Representation and Analytics

Engineering Principles for Edge Data Services

Investigating Elasticity for Edge Storage Services

Need for Elastic Edge Data Services

Towards Self-adaptive Technique Selection for Edge Data Recovery

From raw sensor data to smart actuator decisions

State-of-the-art

Finding methodologies for overcoming emerging challenges

Moving to the edge...