Kubernetes-based HTC framework for secure big data processing

There are several approaches for processing big data in the cloud. For data analyses that
require a global view of the data, Apache Spark, described in a previous section, is the most
popular. Nevertheless, some big data processing objectives require only that individual pieces
of data are processed independently. One example is the execution of simulations that evaluate
the status of the power grid, which is one of the SecureCloud use cases. Other examples include
the transformation of data to compute user-specific outputs (e.g., recommendations) or to load
data into other systems (e.g., ETL). In such cases, the elasticity of the cloud can be fully
exploited: the number of machines can be chosen based on cost, or automatically adjusted in
order to guarantee that the processing finishes within the desired deadlines. If the data
processing tasks can be mapped to this model, resources can be managed much more efficiently
than with Spark, reducing costs and improving throughput.
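
To make the deadline-driven elasticity concrete, the following is a minimal sketch of how the
number of parallel workers could be derived from a deadline and applied to a Kubernetes Job
through the official Python client. The image name, task count, and per-task duration are
illustrative assumptions rather than part of the asset.

```python
import math
from kubernetes import client, config

# Illustrative assumptions: 600 independent tasks of ~30 s each,
# with a 15-minute (900 s) deadline for the whole batch.
NUM_TASKS = 600
SECONDS_PER_TASK = 30
DEADLINE_SECONDS = 900

# Workers needed so that total work / parallelism fits in the deadline:
# ceil(600 * 30 / 900) = 20 parallel pods.
parallelism = math.ceil(NUM_TASKS * SECONDS_PER_TASK / DEADLINE_SECONDS)

config.load_kube_config()  # use load_incluster_config() when running in-cluster

job = client.V1Job(
    api_version="batch/v1",
    kind="Job",
    metadata=client.V1ObjectMeta(name="htc-batch"),
    spec=client.V1JobSpec(
        completions=NUM_TASKS,    # one completion per independent task
        parallelism=parallelism,  # sized so the batch meets its deadline
        template=client.V1PodTemplateSpec(
            spec=client.V1PodSpec(
                restart_policy="Never",
                containers=[
                    client.V1Container(
                        name="worker",
                        # Hypothetical image; each run would fetch and
                        # process one item, e.g., from a shared work queue.
                        image="example.org/htc-worker:latest",
                    )
                ],
            )
        ),
    ),
)

client.BatchV1Api().create_namespaced_job(namespace="default", body=job)
```

Since Kubernetes restarts failed Job pods up to a back-off limit, the sizing should in practice
leave some slack for retries.
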
This asset consists of a set of tools (available through Docker repositories, e.g., SCONE and
CAS), application templates (which will be available on public GitHub repositories), and
configurations for existing open-source tools (Kubernetes and Asperathos; these configurations
will also be available on public GitHub repositories). These tools and configurations enable
users to build applications that process large amounts of data in a deadline-aware, scalable
fashion while preserving the confidentiality guarantees provided by the SecureCloud project.
The asset is open-source.
By downloading images and tools from the Docker registries and GitHub, a user can build secure
big data processing applications based on Kubernetes Jobs. Nevertheless, to put applications
into production, users may need to license other project technologies, such as SCONE and/or
the SGX-enabled OpenStack.
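
As an illustration of this last step, below is a minimal sketch assuming a hypothetical
SCONE-enabled worker image that attests itself against a CAS instance, and, as a further
assumption, an SGX driver exposed to pods through the legacy /dev/isgx host device. The image
name, CAS address, and device path are placeholders, not the asset's actual artifacts.

```python
from kubernetes import client, config

config.load_kube_config()

# Hypothetical SCONE-enabled worker image; the real image and configuration
# names come from the asset's Docker and GitHub repositories.
container = client.V1Container(
    name="secure-worker",
    image="example.org/scone-htc-worker:latest",
    env=[
        # SCONE reads the CAS address from SCONE_CAS_ADDR; address assumed.
        client.V1EnvVar(name="SCONE_CAS_ADDR", value="cas.example.org"),
    ],
    # Assumption: SGX is exposed via the legacy /dev/isgx host device,
    # which requires a privileged container in this simple setup.
    volume_mounts=[client.V1VolumeMount(name="sgx", mount_path="/dev/isgx")],
    security_context=client.V1SecurityContext(privileged=True),
)

job = client.V1Job(
    api_version="batch/v1",
    kind="Job",
    metadata=client.V1ObjectMeta(name="secure-htc-batch"),
    spec=client.V1JobSpec(
        completions=600,   # illustrative task count, as in the earlier sketch
        parallelism=20,    # illustrative deadline-derived parallelism
        template=client.V1PodTemplateSpec(
            spec=client.V1PodSpec(
                restart_policy="Never",
                containers=[container],
                volumes=[
                    client.V1Volume(
                        name="sgx",
                        host_path=client.V1HostPathVolumeSource(path="/dev/isgx"),
                    )
                ],
            )
        ),
    ),
)

client.BatchV1Api().create_namespaced_job(namespace="default", body=job)
```

From Kubernetes' point of view this is an ordinary Job; the confidentiality comes from running
each worker inside an SGX enclave and letting CAS release secrets only to attested enclaves.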