Evaluating Caching Strategies for Cloud Data Access using an Enterprise Service Bus

Nowadays, different Cloud services enable enterprises to migrate applications to the Cloud. An application can be partially migrated by replacing some of its components with Cloud services, or by migrating one or more of its layers to the Cloud. As a result, accessing application data stored off-premise requires mechanisms to mitigate the negative impact on Quality of Service (QoS), e.g., due to network latency. In this work, we propose and realize an approach for transparently accessing data migrated to the Cloud, using a multi-tenant open source Enterprise Service Bus (ESB) as the basis. Furthermore, we enhance the ESB with QoS awareness by integrating it with an open source caching solution. For evaluation purposes, we generate a representative application workload using data from the TPC-H benchmark. Based on this workload, we then identify the best-performing caching strategy among multiple eviction algorithms when accessing relational databases located at different Cloud providers.

University of Stuttgart
Universitätsstr. 38
70569 Stuttgart
Germany
Phone +49-711-685 88337
Fax +49-711-685 88472
Research
Santiago Gómez Sáez, Vasilios Andrikopoulos, Frank Leymann, and Steve Strauch
Institute of Architecture of Application Systems
{gomez-saez, andrikopoulos, leymann, strauch}@iaas.uni-stuttgart.de
Evaluating Caching Strategies for Cloud Data Access using an Enterprise Service Bus
IEEE IC2E 2014

© Santiago Gómez Sáez
Agenda
• Motivating Scenario
• CDASMix Architecture & Realization
• Evaluation
• Conclusion and Future Work

Motivating Scenario
[Figure: traditional application layers compared with deployment models. Labels: Application; Presentation Layer; Business Layer; Data Access Layer; Cloud-Enabled Data Access Layer; Registry; SQL; Public Cloud]
Assumptions
• Database layer has already been migrated
• Focus on relational databases

CDASMix - Architecture
[Figure: CDASMix architecture. Layers: Presentation; Business Logic; Resources. Components: Web UI; Web Service API; Access Layer; Configuration Registry Manager; Tenant Registry Manager; Service Registry Manager; JBI Container Manager; Service Assembly Manager; Service Registry Database Cluster; Configuration Registry Database; Tenant Registry Database; JBI Container Instance Cluster; Message Broker]
(1) Strauch et al.: Transparent Access to Relational Databases in the Cloud Using a Multi-tenant ESB. CLOSER’14
(2) ESBMT Project: www.iaas.uni-stuttgart.de/esbmt/

CDASMix – Cloud Data Access ESB Instance
[Figure: CDASMix Cloud Data Access ESB instance. Labels: OSGi Environment; JBI Environment; Standardized Interfaces for Service Engines; Standardized Interfaces for Binding Components; Normalized Message Router; External Application; MySQL Proxy; SMX-Camel-mt; SMX-Camel; Camel; cdasmixJDBC; Backend Cloud Data Store Provider; Cache Cluster (Instance 1). Legend: Message Flow; OSGi Component; JBI Component; NMR API]
Cache Cluster
• Ehcache 2.6.0
• LRU, LFU & FIFO
• Multi-tenancy Awareness
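
The three eviction policies named for the cache cluster can be illustrated with a minimal sketch. This is not the Ehcache implementation: the class `EvictingCache`, its methods, and the helper `survivors` are hypothetical names introduced here, and capacity is counted in entries for simplicity.

```python
class EvictingCache:
    """Fixed-capacity cache with a pluggable eviction policy (illustrative only)."""

    def __init__(self, capacity, policy="LRU"):
        if policy not in ("LRU", "LFU", "FIFO"):
            raise ValueError(f"unknown policy: {policy}")
        self.capacity = capacity   # assumption: capacity counted in entries
        self.policy = policy
        self.entries = {}          # dicts keep insertion order (Python 3.7+)
        self.uses = {}             # access counts, consulted by LFU only

    def get(self, key):
        """Return the cached value, or None on a miss."""
        if key not in self.entries:
            return None
        self.uses[key] += 1
        if self.policy == "LRU":
            # Re-insert so the entry becomes the most recently used one.
            self.entries[key] = self.entries.pop(key)
        return self.entries[key]

    def put(self, key, value):
        """Insert or update an entry, evicting one victim if the cache is full."""
        if key in self.entries:
            if self.policy == "LRU":
                del self.entries[key]  # an update also refreshes recency
            self.entries[key] = value
            return
        if len(self.entries) >= self.capacity:
            self._evict()
        self.entries[key] = value
        self.uses[key] = 0

    def _evict(self):
        if self.policy == "LFU":
            # Least frequently used; ties go to the oldest inserted entry.
            victim = min(self.entries, key=lambda k: self.uses[k])
        else:
            # LRU and FIFO both evict the head of the ordering; they differ
            # only in whether an access reorders the entries.
            victim = next(iter(self.entries))
        del self.entries[victim]
        del self.uses[victim]


def survivors(policy):
    """Replay one fixed access pattern and report which keys remain cached."""
    cache = EvictingCache(capacity=3, policy=policy)
    for key in ("a", "b", "c"):
        cache.put(key, key.upper())
    for key in ("b", "b", "c", "a", "a"):
        cache.get(key)
    cache.put("d", "D")  # the cache is full, so one entry must go
    return sorted(cache.entries)


print(survivors("LRU"))   # ['a', 'c', 'd']  (b was touched least recently)
print(survivors("LFU"))   # ['a', 'b', 'd']  (c was read only once)
print(survivors("FIFO"))  # ['b', 'c', 'd']  (a was inserted first)
```

The same access pattern leaves a different set of entries under each policy, which is why the choice of eviction algorithm can change the hit rate for a given workload.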

Evaluation – Methodology & Data Set
• Measure how caching mitigates the performance degradation when accessing data through CDASMix
• Analyze the optimal cache eviction algorithm (in tandem with the MySQL instances)
• Metrics: cache hit rate in % and throughput in req./s
• Data set: TPC-H, 1 GB of data distributed across 8 tables
• Workload: 100 queries generated with a discrete uniform distribution (1/N) from 5 adapted TPC-H queries (from an initial sample of 9 queries); read-intensive (2.5 MB per query)
• Generated load publicly available at https://santiago.studiforge.informatik.uni-stuttgart.de/svn/publications/IC2E14/queries4Load/generatedLoad 5-100.csv
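
The workload generation and the two metrics above can be sketched as follows. This is a minimal illustration, not the authors' generator: the template names in `TEMPLATES` are placeholders for the 5 adapted TPC-H queries (which the slides do not list), `fake_execute` stands in for a real database call, and the cache is an unbounded dictionary.

```python
import random
import time

# Hypothetical placeholders for the 5 adapted TPC-H queries.
TEMPLATES = ["query_1", "query_2", "query_3", "query_4", "query_5"]


def generate_load(templates, size, seed=None):
    """Draw `size` queries; each template is picked with probability 1/N."""
    rng = random.Random(seed)
    return [rng.choice(templates) for _ in range(size)]


def fake_execute(query):
    """Stand-in for sending the query to a database (assumption)."""
    return f"result of {query}"


def replay(load, execute, cache=None):
    """Run the load, optionally through a cache.

    Returns (cache hit rate in %, throughput in requests per second).
    """
    hits = 0
    start = time.perf_counter()
    for query in load:
        if cache is not None and query in cache:
            hits += 1
            _ = cache[query]        # served from the cache
        else:
            result = execute(query)
            if cache is not None:
                cache[query] = result
    elapsed = time.perf_counter() - start
    hit_rate = 100.0 * hits / len(load)
    throughput = len(load) / elapsed if elapsed > 0 else float("inf")
    return hit_rate, throughput


load = generate_load(TEMPLATES, size=100, seed=1)
hit_rate, throughput = replay(load, fake_execute, cache={})
print(f"hit rate: {hit_rate:.1f} %, throughput: {throughput:.0f} req./s")
```

With an unbounded cache, every template misses exactly once, so a load of 100 queries over 5 templates yields a hit rate of at least 95 %. A bounded cache with an eviction policy can only do as well or worse, which is the effect the evaluation measures.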

Evaluation Setup
[Figure: evaluation setup. Hosts: VM0 (Flexiscale); VM1 (Amazon EC2); Amazon RDS MySQL 5.1 instance. Components: Apache JMeter 2.9; CDASMix; MySQL 5.1 with TPC-H data; QueryGen.sh; load.csv. Measurement points: D1, D2, D3, E1, E2, E3. Legend: Message Flow; Measurement Point; Throughput and Transfer Rate; Built-in Cache]
MySQL and Ehcache cache size: 16 MB
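
To give a feel for what a 16 MB budget means, the sketch below bounds a cache by bytes rather than by entry count. It assumes that each cached result occupies 2.5 MB, which is one reading of the "2.5 MB per query" figure and not something the slides confirm. `ByteBoundedLRU` is a hypothetical class and does not reflect how Ehcache or MySQL account for memory.

```python
from collections import OrderedDict

MB = 1024 * 1024


class ByteBoundedLRU:
    """LRU cache whose capacity is a byte budget (illustrative only)."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.used_bytes = 0
        self.entries = OrderedDict()  # key -> (value, size in bytes)

    def get(self, key):
        """Return the cached value, or None on a miss."""
        if key not in self.entries:
            return None
        self.entries.move_to_end(key)  # mark as most recently used
        return self.entries[key][0]

    def put(self, key, value, size):
        """Store a value of `size` bytes; return False if it can never fit."""
        if size > self.max_bytes:
            return False
        if key in self.entries:
            self.used_bytes -= self.entries.pop(key)[1]
        # Evict least recently used entries until the new one fits.
        while self.used_bytes + size > self.max_bytes:
            _, (_, old_size) = self.entries.popitem(last=False)
            self.used_bytes -= old_size
        self.entries[key] = (value, size)
        self.used_bytes += size
        return True


RESULT_SIZE = int(2.5 * MB)  # assumed size of one cached result
cache = ByteBoundedLRU(16 * MB)
for i in range(7):
    cache.put(f"r{i}", f"rows for r{i}", RESULT_SIZE)

print(len(cache.entries))     # 6, since 6 x 2.5 MB = 15 MB fits but 7 x 2.5 MB does not
print("r0" in cache.entries)  # False: the oldest result was evicted
```

Under this assumption the budget holds at most six results at a time, so a workload with more than six distinct large results would start to trigger evictions.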

Evaluation – MySQL in IaaS Flexiscale & AWS EC2
[Chart: panels for Flexiscale and AWS EC2. Data labels in extraction order: -51%, -14%, -10%, +17%, -39%, +21%, +30%, +16%]

Evaluation – MySQL in IaaS Flexiscale & AWS EC2
[Chart: panels for Flexiscale and AWS EC2. Data labels in extraction order: -51%, +43%, +46%, +42%, -39%, +50%, +53%, +48%]

Evaluation – MySQL in AWS RDS
[Chart: AWS RDS. Data labels in extraction order: -93%, -89%, -89%, -89%]

Evaluation – MySQL in AWS RDS
[Chart: AWS RDS. Data labels in extraction order: -93%, +37%, +37%, +37%]

Evaluation – CDASMix Cache Hit Ratio
[Chart: cache hit ratio panels for Flexiscale, AWS EC2, and AWS RDS]

Conclusion & Future Work
• Design and realization of CDASMix, a multi-tenant aware ESB solution that enables transparent data access
• Caching support for improving performance
• Evaluation based on
  • different database deployment scenarios
  • the use of different cache eviction algorithms
• Extend CDASMix towards supporting PostgreSQL
• CDASMix horizontal scalability and distributed caching
• Further evaluation
  • Additional cache eviction algorithms
  • Different workloads

Machine learning, big data, and simulation challenges have led to a proliferation of computing hardware and software solutions. Hyperscale data centers, accelerators, and programmable logic can deliver enormous performance via a wide range of analytic environments and data storage technologies. Apache Accumulo is a unique technology with the potential to enable all of these fields. Effectively exploiting Accumulo in these fields requires mathematically rigorous interfaces that allow users to focus on their domains. Mathematically rigorous interfaces are at the core MIT Lincoln Laboratory Supercomputing Center (LLSC) and enable the LLSC to deliver Apache Accumulo o thousands of scientists and engineers. This talk discusses the rapidly evolving computing landscape and how mathematically rigorous interfaces are the key to exploiting Apache Accumulo's advanced capabilities. – Speaker – Jeremy Kepner Fellow, MIT Dr. Jeremy Kepner is a MIT Lincoln Laboratory Fellow. He founded the Lincoln Laboratory Supercomputing Center and pioneered the establishment of the Massachusetts Green High Performance Computing Center. He has developed novel big data and parallel computing software used by thousands of scientists and engineers worldwide. He has led several embedded computing efforts, which earned him a 2011 R&D 100 Award. Dr. Kepner has chaired SIAM Data Mining, the IEEE Big Data conference, and the IEEE High Performance Extreme Computing conference. Dr. Kepner is the author of two bestselling books, Parallel MATLAB and Graph Algorithms in the Language of Linear Algebra. His peer-reviewed publications include works on abstract algebra, astronomy, cloud computing, cybersecurity, data mining, databases, graph algorithms, health sciences, signal processing, and visualization. Dr. Kepner holds a BA degree in astrophysics from Pomona College and a PhD degree in astrophysics from Princeton University. — More Information — For more information see http://www.accumulosummit.com/

NASA_EPSCoR_poster_2015Longyin Cui

ieeeprojectsvadapalani

Reliable and confidential cloud storage with efficient data forwarding functi...

ieeepondy

BeeGFS Enterprise Deployment

Dirk Petersen

BeeGFS - Dealing with Extreme Requirements in HPC

inside-BigData.com

In this deck from the HPC User Forum, Frank Herold from ThinkParQ presents: BeeGFS - Dealing with Extreme Requirements in HPC. BeeGFS is a pure software solution for scale-out parallel network-accessible storage, developed with a strong focus on performance and designed for very easy installation and management. "ThinkParQ is the company behind the popular parallel cluster file system BeeGFS. We address the complex requirement of managing large amounts of data with BeeGFS, an easy to use and highly scalable software-defined storage solution, on-premise and in the cloud. Next to helping our world-wide BeeGFS customers get the best out of their systems and consulting, we continuously drive the development and optimization of BeeGFS for today’s and tomorrow’s performance-critical systems to its next level. Together with our partners around the globe, system integrators and solution providers, we work hard to create the fastest, most stable and most flexible turn-key solutions for every performance-oriented environment. Therefore, you can find BeeGFS powered solutions not only in HPC, but also in all other markets where performance matters: Life Sciences, Artificial Intelligence, Finance, Oil & Gas, Media & Entertainment, and Automotive." Watch the video: https://wp.me/p3RLHQ-kaF Learn more: https://www.beegfs.io/content/ and http://hpcuserforum.com Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter

CloudLightning and the OPM-based Use Case

CloudLightning

Applying Cloud Techniques to Address Complexity in HPC System Integrations

inside-BigData.com

In this video from the HPC User Forum at Argonne, Arno Kolster from Providentia Worldwide presents: Applying Cloud Techniques to Address Complexity in HPC System Integrations. "The Oak Ridge Leadership Computing Facility (OLCF) and technology consulting company Providentia Worldwide recently collaborated to develop an intelligence system that combines real-time updates from the IBM AC922 Summit supercomputer with local weather and operational data from its adjacent cooling plant, with the goal of optimizing Summit’s energy efficiency. The OLCF proposed the idea and provided facility data, and Providentia developed a scalable platform to integrate and analyze the data." Watch the video: https://wp.me/p3RLHQ-kOg Learn more: http://www.providentiaworldwide.com/ and http://hpcuserforum.com Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter

Motivations of this work: • Efficient exploitation of heterogeneous hardware for deep learning and higher order physics applications. • To achieve high efficiency in real time. • Intelligent resource management with the goals of reducing inter-application interference and intra-application contention. • Enhance task scheduling with the knowledge of workload and resource requirement. • Adapting to resource availability and the underlying hardware topology.

Hpc Cloud project Overview

Floris Sluiter

With the HPC Cloud facility, SURFsara offers self-service, dynamically scalable and fully configurable HPC systems to the Dutch academic community. Users have, for example, a free choice of operating system and software. The HPC Cloud offers full control over a HPC cluster, with fast CPUs and high memory nodes and it is possible to attach terabytes of local storage to a compute node. Because of this flexibility, users can fully tailor the system for a particular application. Long-running and small compute jobs are equally welcome. Additionally, the system facilitates collaboration: users can share control over their virtual private HPC cluster with other users and share processing time, data and results. A portal with wiki, fora, repositories, issue system, etc. is offered for collaboration projects as well.

HybridAzureCloudChris Condo

An introduction to Workload Modelling for Cloud Applications

Ravi Yogesh

An optimized scientific workflow scheduling in cloud computing

DIGVIJAY SHINDE

Analyse de sécurité de bout en bout avec la Suite Elastic

Elasticsearch

Compose hardware resources on the fly with openstack valence

Shuquan Huang

In the face of ever-growing data processing requirements, existing data center infrastructures struggle to deliver on flexibility and TCO expectations. Intel Rack Scale Design Technology is to evolve the data center design methodology from physical resource aggregation to resource pools and eventually target for future’s service aware orchestration. Meanwhile, as a ubiquitous cloud OS that able to control/orchestrate large pools of compute, storage and network resources, OpenStack can take advantages of it to deliver more optimized flexibility and performance against cost in areas such as deep learning, big data, telecom, etc. Valence is a new OpenStack project announced in Barcelona Summit which is a collection of all things about RSD for OpenStack. In this session, we’ll elaborate valence’s mission, user cases and status. And then, we’ll discuss how other projects can leverage RSD by integration with valence. Finally, we'll show some live demos about how valence works right now.

Enabling Efficient and Geometric Range Query with Access Control over Encrypt...

JAYAPRAKASH JPINFOTECH

Presentation fyp1automationreplicationinopenstack

athiqah

Ultra Fast Deep Learning in Hybrid Cloud using Intel Analytics Zoo & Alluxio

Alluxio, Inc.

Build bare metal kubernetes cluster for hpc on open stack in translational me...

Shuquan Huang

With the medicine technology advances, it is possible to create a patient profile by a full genomic data along with vast amount of patient data from personal fitness devices, medical record, etc. Translational medicine is aim at moving faster from research to patient care by integrating the data across traditional silos which requires a cloud to support multi-tenancy, self-service, big data analysis and HPC workload. OpenStack are good at virtual machine management, while we use VM to serve as a container host which brings virtualization overhead in overlay networking, I/O operations, etc. and it won’t get benefit from some advanced features. In this session, we'll share: How to build an OpenStack cloud for HPC with multi-tenancy and self-services; How to build a bare metal kubernetes cluster with OpenStack Magnum and Ironic; The issues we are facing and how we overcome them; The gaps remaining and what does the community need to improve in the future; More thinking and new ideas.

IoT Event Processing and Analytics with InfluxDB in Google Cloud | Christoph ...

InfluxData

The presentation introduces a Google Cloud native architecture for collecting, processing, analyzing and archiving of events from IoT devices, vehicles as well as upstream software systems. InfluxDB and its connection to global native Google Cloud services like BigQuery or Cloud Machine Learning Engine as well as Kubernetes is at the center of the architecture. The architecture demonstrates how access to global scaling cloud services addresses use cases from the Energy Sector.

Distributed, concurrent, and independent access to encrypted cloud databases

Papitha Velumani

Deep Learning and Gene Computing Acceleration with Alluxio in Kubernetes

Alluxio, Inc.

DATACUBES: Conquering Space & Time

plan4all

distributed, concurrent, and independent access to encrypted cloud databases

swathi78

The Past, Present, and Future of OpenACC

inside-BigData.com

In this deck from the University of Houston CACDS HPC Workshop, Jeff Larkin from Nvidia presents: The Past, Present, and Future of OpenACC. "OpenACC is an open specification for programming accelerators with compiler directives. It aims to provide a simple path for accelerating existing applications for a wide range of devices in a performance portable way. This talk with discuss the history and goals of OpenACC, how it is being used today, and what challenges it will address in the future." Watch the video presentation: http://wp.me/p3RLHQ-dTm

Towards Enabling Mid-Scale Geo-Science Experiments Through Microsoft Trident ...

Eran Chinthaka Withana

Performance_and_Cost_Evaluation

Santiago Gómez Sáez

The success of the Cloud computing paradigm, together with the increase of Cloud providers and optimized Infrastructure-as-a-Service (IaaS) offerings have contributed to a raise in the number of research and industry communities that are strong supporters of migrating and running their applications in the Cloud. Focusing on eScience simulation-based applications, scientific workflows have been widely adopted in the last years, and the scientific workflow management systems have become strong candidates for being migrated to the Cloud. In this research work we aim at empirically evaluating multiple Cloud providers and their corresponding optimized and non-optimized IaaS offerings with respect to their offered performance, and its impact on the incurred monetary costs when migrating and executing a workflow-based simulation environment. The experiments show significant performance improvements and reduced monetary costs when executing the simulation environment in off-premise Clouds.

Den Datenschatz heben und Zeit- und Energieeffizienz steigern: Mathematik und...

Joachim Schlosser

In einer Gesellschaft, in der das Sammeln von personenbezogenen Daten mittlerweile alltäglich geworden ist, ist es nicht weiter verwunderlich, dass auch der innovative Maschinenbauer Daten sammelt, wo er nur kann. Produktdaten, Maschinendaten, Statistikdaten – in einer durchschnittlichen Produktionsanlage fallen bereits heute jeden Tag Gigabytes an Daten an. „Big Data“ wurde eines der Schlagworte der Industrie 4.0. Doch was verspricht man sich davon? Welche Information steckt in den aufgezeichneten Maschinen- und Produktdaten? Und wie erfolgt die Auswertung? Im Rahmen des Vortrags wird aufgezeigt, wie Unternehmen auf Basis einer etablierten Plattform wie MATLAB® ihre Auswertealgorithmen entwickeln, testen und ausrollen können. Die kontinuierliche Auswertung selbst erfolgt dann wahlweise auf einem Anlagenserver oder aber auch in Echtzeit direkt an der Maschine. Veranschaulicht wird dies anhand von Beispielen aus der Praxis. Doch neben der gesammelten Daten kommt auch den Steuerungseinheiten in der Produktion in der Industrie 4.0 eine größere Bedeutung zu. Wenn Werkstücke demnächst selbst wissen, wo sie im Produktionsablauf hin möchten und welcher Verarbeitungsschritt ihnen angedeihen soll, dann bedeutet das auch für die einzelnen Komponenten und Module in Produktion und Logistik ein mehr an Funktionalität, da sie auf diese Eingaben ebenfalls reagieren sollen. Wie stellen Sie sicher, dass diese zusätzliche Funktionalität nicht zu Lasten der Energiebilanz gehen? Wie fahren Sie die Motoren und anderen aktiven Komponenten Ihrer Fertigung so, dass sie flexibel auf veränderte Routen der Werkstücke reagieren und dennoch im optimalen Bereich fahren? Mehr denn je brauchen Sie gesteuerte und geregelte Komponenten und Module. Das sollte schon seit Industrie 3.0 vorhanden sein, jedoch ist auch hier noch viel ganz konkretes Potential zur Steigerung von Produktivität und Einsparung von Energie und Produktionszeit vorhanden. 
Sie sehen im Vortrag, wie Sie ihre Komponenten besser beschalten, dass die vernetzten dynamischen Anforderungen von Industrie 4.0 lokal effizient umgesetzt werden können.

Tool-Driven Technology Transfer in Software Engineering

Heiko Koziolek

This talk presentst the tool-driven technology transfer process ABB Corporate Research applies in selected software engineering University collaborations. As an example, we have created an add-in to a popular UML tool and developed the tooling in close interaction with the target users. Centering the technology transfer around tool implementations brings many benefits such as the need to make conceptual contributions applicable and the ability to quickly benefit from the new concepts. A challenge to this form of technology transfer is the long-term commitment to the maintenance of the tooling, which we try to address by creating an open developer community. Tool-driven technology transfer projects have proven to be valuable a instrument of bringing advanced software engineering technologies into our organization.

Webinar: Cutting Time, Complexity and Cost from Data Science to Production

iguazio

Imagine a system where one collects real-time data, develops a machine learning model… Runs analysis and training on powerful GPUs… Clicks on a magic button and then deploys code and ML models to production… All without any heavy lifting from data and DevOps engineers. Today, data scientists work on laptops with just a subset of data and time is wasted while waiting for data and compute. It’s about efficient use of time! Join Iguazio and NVIDIA so that you can get home early today! Learn how to speed up data science from development to production: - Access to large scale, real-time and operational data without waiting for ETL - Run high performance analytics and ML on NVIDIA GPUs (Rapids) - Work on a shared, pre-integrated Kubernetes cluster with - - Jupyter notebook and leading data science tools - One-click (really!) deployment to production Speakers: Yaron Haviv, CTO at Iguazio, Or Zilberman, Data Scientist at Iguazio and Jacci Cenci, Sr. Technical Marketing Engineer at NVIDIA

What's hot

ACACES 2019: Towards Energy Efficient Deep Learning

LEGATO project

Hpc Cloud project Overview

Floris Sluiter

HybridAzureCloudChris Condo

An introduction to Workload Modelling for Cloud Applications

Ravi Yogesh

An optimized scientific workflow scheduling in cloud computing

DIGVIJAY SHINDE

Analyse de sécurité de bout en bout avec la Suite Elastic

Elasticsearch

Compose hardware resources on the fly with openstack valence

Shuquan Huang

Enabling Efficient and Geometric Range Query with Access Control over Encrypt...

JAYAPRAKASH JPINFOTECH

Presentation fyp1automationreplicationinopenstack

athiqah

Ultra Fast Deep Learning in Hybrid Cloud using Intel Analytics Zoo & Alluxio

Alluxio, Inc.

Build bare metal kubernetes cluster for hpc on open stack in translational me...

Shuquan Huang

IoT Event Processing and Analytics with InfluxDB in Google Cloud | Christoph ...

InfluxData

Distributed, concurrent, and independent access to encrypted cloud databases

Papitha Velumani

Deep Learning and Gene Computing Acceleration with Alluxio in Kubernetes

Alluxio, Inc.

DATACUBES: Conquering Space & Time

plan4all

distributed, concurrent, and independent access to encrypted cloud databases

swathi78

The Past, Present, and Future of OpenACC

inside-BigData.com

Towards Enabling Mid-Scale Geo-Science Experiments Through Microsoft Trident ...

Eran Chinthaka Withana

What's hot (18)

ACACES 2019: Towards Energy Efficient Deep Learning

Hpc Cloud project Overview

HybridAzureCloud

An introduction to Workload Modelling for Cloud Applications

An optimized scientific workflow scheduling in cloud computing

Analyse de sécurité de bout en bout avec la Suite Elastic

Compose hardware resources on the fly with openstack valence

Enabling Efficient and Geometric Range Query with Access Control over Encrypt...

Presentation fyp1automationreplicationinopenstack

Ultra Fast Deep Learning in Hybrid Cloud using Intel Analytics Zoo & Alluxio

Build bare metal kubernetes cluster for hpc on open stack in translational me...

IoT Event Processing and Analytics with InfluxDB in Google Cloud | Christoph ...

Distributed, concurrent, and independent access to encrypted cloud databases

Deep Learning and Gene Computing Acceleration with Alluxio in Kubernetes

DATACUBES: Conquering Space & Time

distributed, concurrent, and independent access to encrypted cloud databases

The Past, Present, and Future of OpenACC

Towards Enabling Mid-Scale Geo-Science Experiments Through Microsoft Trident ...

Similar to Evaluating Caching Strategies for Cloud Data Access using an Enterprise Service Bus

Performance_and_Cost_Evaluation

Santiago Gómez Sáez

Den Datenschatz heben und Zeit- und Energieeffizienz steigern: Mathematik und...

Joachim Schlosser

Tool-Driven Technology Transfer in Software Engineering

Heiko Koziolek

Webinar: Cutting Time, Complexity and Cost from Data Science to Production

iguazio

Processing Large Datasets for ADAS Applications using Apache Spark

Databricks

Semantic segmentation is the classification of every pixel in an image/video. The segmentation partitions a digital image into multiple objects to simplify/change the representation of the image into something that is more meaningful and easier to analyze [1][2]. The technique has a wide variety of applications ranging from perception in autonomous driving scenarios to cancer cell segmentation for medical diagnosis. Exponential growth in the datasets that require such segmentation is driven by improvements in the accuracy and quality of the sensors generating the data extending to 3D point cloud data. This growth is further compounded by exponential advances in cloud technologies enabling the storage and compute available for such applications. The need for semantically segmented datasets is a key requirement to improve the accuracy of inference engines that are built upon them. Streamlining the accuracy and efficiency of these systems directly affects the value of the business outcome for organizations that are developing such functionalities as a part of their AI strategy. This presentation details workflows for labeling, preprocessing, modeling, and evaluating performance/accuracy. Scientists and engineers leverage domain-specific features/tools that support the entire workflow from labeling the ground truth, handling data from a wide variety of sources/formats, developing models and finally deploying these models. Users can scale their deployments optimally on GPU-based cloud infrastructure to build accelerated training and inference pipelines while working with big datasets. These environments are optimized for engineers to develop such functionality with ease and then scale against large datasets with Spark-based clusters on the cloud.

IEEE 2014 JAVA CLOUD COMPUTING PROJECTS Adaptive algorithm for minimizing clo...

IEEEGLOBALSOFTSTUDENTPROJECTS

2014 IEEE JAVA CLOUD COMPUTING PROJECT Adaptive algorithm for minimizing clou...

IEEEFINALSEMSTUDENTPROJECTS

Design_Support_Cloud_Application_Redistribution

Santiago Gómez Sáez

The Cloud computing paradigm emerged by establishing innovative resources provisioning and consumption models. Together with the improvement of resource management techniques, these models have contributed to an increase in the number of application developers that are strong supporters of partially or completely migrating their application to a highly scalable and pay-per-use infrastructure. However, due to the continuous growth of Cloud providers and Cloud offerings, Cloud application developers nowadays must face additional application design challenges related to the efficient selection of such offerings to optimally distribute the application in a Cloud infrastructure. Focusing on the performance aspects of the application, additional challenges arise, as application workloads fluctuate over time, and therefore produce a variation of the infrastructure resources demands. In this research work we aim to define and realize the underpinning concepts towards supporting the optimal (re-)distribution of an application in the Cloud in order to handle fluctuating over time workloads.

Dynamic_Cloud_Application_Redistribution_Performance_Optimization

Santiago Gómez Sáez

The Cloud computing paradigm emerged by establishing new resources provisioning and consumption models. Together with the improvement of resource management techniques, these models have contributed to an increase in the number of application developers that are strong supporters of partially or completely migrating their application to a highly scalable and pay-per-use infrastructure. In this paper we derive a set of functional and non-functional requirements and propose a process-based approach to support the optimal distribution of an application in the Cloud in order to handle fluctuating over time workloads. Using the TPC-H workload as the basis, and by means of empirical workload analysis and characterization, we evaluate the application persistence layer's performance under different deployment scenarios using generated workloads with particular behavior characteristics.

Privacy preserving public auditing for regenerating code based cloud storage

kitechsolutions

AIST Super Green Cloud: lessons learned from the operation and the performanc...

Ryousei Takano

Managing and Deploying High Performance Computing Clusters using Windows HPC ...

Saptak Sen

The new management features built into Windows HPC Server 2008 R2 are the foundation for deploying and managing HPC clusters of scale up to 1000 nodes. Join us for a deep dive in monitoring and diagnostic tools, a review of the updated heat-map and template-based deployment. We also cover the new PowerShell-based scripting capabilities: the basics of management shell, as well as the underlying design and key concepts, new Reporting Capabilities, and a discussion on network boot.

OS for AI: Elastic Microservices & the Next Gen of ML

Nordic APIs

AI has been a hot topic lately, with advances being made constantly in what is possible, there has not been as much discussion of the infrastructure and scaling challenges that come with it. How do you support dozens of different languages and frameworks, and make them interoperate invisibly? How do you scale to run abstract code from thousands of different developers, simultaneously and elastically, while maintaining less than 15ms of overhead? At Algorithmia, we’ve built, deployed, and scaled thousands of algorithms and machine learning models, using every kind of framework (from scikit-learn to tensorflow). We’ve seen many of the challenges faced in this area, and in this talk I’ll share some insights into the problems you’re likely to face, and how to approach solving them. In brief, we’ll examine the need for, and implementations of, a complete “Operating System for AI” – a common interface for different algorithms to be used and combined, and a general architecture for serverless machine learning which is discoverable, versioned, scalable and sharable.

Providing user security guarantees in public infrastructure clouds

Finalyearprojects Toall

BDW16 London - William Vambenepe, Google - 3rd Generation Data Platform

Big Data Week

Path to continuous delivery

Anirudh Bhatnagar

Why AIOps Matters For Kubernetes

Timothy Chen

MongoDB World 2018: MongoDB for High Volume Time Series Data Streams

MongoDB

Automated ML Workflow for Distributed Big Data Using Analytics Zoo (CVPR2020 ...

Jason Dai

Foundstone scq cypherpath

Learn24x7

Similar to Evaluating Caching Strategies for Cloud Data Access using an Enterprise Service Bus (20)

Performance_and_Cost_Evaluation

Den Datenschatz heben und Zeit- und Energieeffizienz steigern: Mathematik und...

Tool-Driven Technology Transfer in Software Engineering

Webinar: Cutting Time, Complexity and Cost from Data Science to Production

Processing Large Datasets for ADAS Applications using Apache Spark

IEEE 2014 JAVA CLOUD COMPUTING PROJECTS Adaptive algorithm for minimizing clo...

2014 IEEE JAVA CLOUD COMPUTING PROJECT Adaptive algorithm for minimizing clou...

Design_Support_Cloud_Application_Redistribution

Dynamic_Cloud_Application_Redistribution_Performance_Optimization

Privacy preserving public auditing for regenerating code based cloud storage

AIST Super Green Cloud: lessons learned from the operation and the performanc...

Managing and Deploying High Performance Computing Clusters using Windows HPC ...

OS for AI: Elastic Microservices & the Next Gen of ML

Providing user security guarantees in public infrastructure clouds

BDW16 London - William Vambenepe, Google - 3rd Generation Data Platform

Path to continuous delivery

Why AIOps Matters For Kubernetes

MongoDB World 2018: MongoDB for High Volume Time Series Data Streams

Automated ML Workflow for Distributed Big Data Using Analytics Zoo (CVPR2020 ...

Foundstone scq cypherpath



Evaluating Caching Strategies for Cloud Data Access using an Enterprise Service Bus

1. University of Stuttgart Universitätsstr. 38 70569 Stuttgart Germany Phone +49-711-685 88337 Fax +49-711-685 88472 Research Santiago Gómez Sáez, Vasilios Andrikopoulos, Frank Leymann, and Steve Strauch Institute of Architecture of Application Systems {gomez-saez, andrikopoulos, leymann, strauch}@iaas.uni-stuttgart.de Evaluating Caching Strategies for Cloud Data Access using an Enterprise Service Bus IEEE IC2E 2014

2. Agenda • Motivating Scenario • CDASMix Architecture & Realization • Evaluation • Conclusion and Future Work

3. Motivating Scenario [Figure labels: Traditional Application Layers (Presentation Layer, Business Layer, Data Access Layer), Deployment Models, Cloud-Enabled Data Access Layer, Registry, SQL, Public Cloud] Assumptions • Database layer has already been migrated • Focus on Relational Databases

4. CDASMix - Architecture [Figure labels: layers Presentation, Business Logic, Resources; components Web UI, Web Service API, Access Layer, Tenant Registry Manager, Configuration Registry Manager, Service Registry Manager, JBI Container Manager, Service Assembly Manager, Tenant Registry Database, Configuration Registry Database, Service Registry Database Cluster, JBI Container Instance Cluster, Message Broker] (1) Strauch et al.: Transparent Access to Relational Databases in the Cloud Using a Multi-tenant ESB. CLOSER’14 (2) ESBMT Project: www.iaas.uni-stuttgart.de/esbmt/

5. CDASMix – Cloud Data Access [Figure labels: ESB Instance, OSGi Environment, JBI Environment, Standardized Interfaces for Service Engines, Standardized Interfaces for Binding Components, Normalized Message Router, NMR API, External Application, MySQL Proxy, Cache, SMX-Camel-mt, SMX-Camel, CamelcdasmixJDBC, Backend Cloud Data Store Provider, Cluster (Instance 1); legend: Message Flow, OSGi Component, JBI Component] • Ehcache 2.6.0 • LRU, LFU & FIFO • Multi-tenancy Awareness
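The three eviction policies named on this slide can be compared on a request trace with a small simulation. The following is a minimal textbook sketch, not Ehcache's implementation; the function name `hit_rate`, its parameters, and the example trace are assumptions introduced for illustration.

```python
from collections import Counter, OrderedDict


def hit_rate(trace, capacity, policy):
    """Replay `trace` through a cache holding `capacity` keys; return the hit rate in %.

    `policy` is one of "LRU", "LFU" or "FIFO" (hypothetical labels for this sketch).
    """
    cache = OrderedDict()  # order = insertion order (FIFO, LFU) or recency order (LRU)
    freq = Counter()       # access counts of cached keys, consulted by LFU only
    hits = 0
    for key in trace:
        if key in cache:
            hits += 1
            freq[key] += 1
            if policy == "LRU":
                cache.move_to_end(key)  # mark as most recently used
            continue
        if len(cache) >= capacity:
            if policy == "LFU":
                # Least frequently used key; ties go to the oldest inserted key.
                victim = min(cache, key=lambda k: freq[k])
            else:
                # FIFO: oldest inserted key. LRU: least recently used key.
                victim = next(iter(cache))
            del cache[victim]
            del freq[victim]
        cache[key] = None
        freq[key] = 1
    return 100.0 * hits / len(trace)


# Hypothetical trace of query identifiers, replayed against a two-entry cache.
example_trace = ["a", "b", "a", "c", "a", "b"]
rates = {p: hit_rate(example_trace, 2, p) for p in ("LRU", "LFU", "FIFO")}
```

On a trace that keeps returning to the same key, FIFO evicts that key purely by age, while LRU and LFU retain it; which policy wins in general depends on the workload.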

6. Evaluation – Methodology & Data Set • Measure how caching mitigates the performance degradation when accessing data through CDASMix • Analyze the optimal cache eviction algorithm (in tandem with the MySQL instances) • Cache hit rate in % and throughput in Req./s • TPC-H 1 GB data distributed in 8 tables • Discrete uniform (1/N) generated workload from 5 adapted TPC-H queries -> read intensive (2.5 MB per query), constituted by 100 queries from an initial sample of 9 queries • Generated load publicly available at https://santiago.studiforge.informatik.uni-stuttgart.de/svn/publications/IC2E14/queries4Load/generatedLoad5-100.csv
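The discrete uniform workload described on this slide can be sketched as independent draws in which each of the N queries is picked with probability 1/N. This is an illustrative sketch only, not the authors' generator script; `generate_workload`, the seed, and the placeholder query names are assumptions.

```python
import random


def generate_workload(queries, length=100, seed=None):
    """Draw `length` queries independently, each with probability 1/len(queries)."""
    rng = random.Random(seed)  # a fixed seed makes the generated load reproducible
    return [rng.choice(queries) for _ in range(length)]


# Hypothetical placeholders standing in for the adapted TPC-H query texts.
sample_queries = ["Q1", "Q2", "Q3", "Q4", "Q5"]
workload = generate_workload(sample_queries, length=100, seed=42)
```

Because the draws are independent, individual queries will not appear exactly `length / N` times in a single generated load; only their expected frequencies are equal.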

7. Evaluation Setup [Figure labels: VM0 (Flexiscale), Apache JMeter 2.9, CDASMix, MySQL 5.1, Amazon RDS MySQL 5.1 instance, VM1 (Amazon EC2), MySQL 5.1, TPC-H data, QueryGen.sh, load.csv; measurement points D1, D2, D3, E1, E2, E3; legend: Message Flow, Measurement Point (Throughput and Transfer Rate), Built-in Cache] MySQL & Ehcache cache size 16MB

8. Evaluation – MySQL in IaaS Flexiscale & AWS EC2 [Chart with panels Flexiscale and AWS EC2; value labels in extraction order: -51%, -14%, -10%, +17%, -39%, +21%, +30%, +16%]

9. Evaluation – MySQL in IaaS Flexiscale & AWS EC2 [Chart with panels Flexiscale and AWS EC2; value labels in extraction order: -51%, +43%, +46%, +42%, -39%, +50%, +53%, +48%]

13. Conclusion & Future Work • Design and realization of CDASMix, a multi-tenant aware ESB solution that enables transparent data access • Caching support for ameliorating the performance • Evaluation based on different database deployment scenarios and on the utilization of different cache eviction algorithms • Extend CDASMix towards supporting PostgreSQL • CDASMix horizontal scalability & distributed caching • Further evaluation: + cache eviction algorithms, different workloads

14. Thanks for your attention!!

Editor's Notes

1- In recent years, Cloud computing has become popular among IT organizations aiming to reduce their operational costs. 2- Applications can be designed to run in the Cloud, or can be partially or completely migrated to the Cloud. 3- Focusing on the three-layered application pattern, in other works we have concentrated on migrating the application data to the Cloud. Migrating the application data to the Cloud requires adaptations, e.g. rewiring the application to access the databases migrated to the Cloud. 4- In this work we target how to mitigate the performance degradation caused by accessing the data migrated to the Cloud through CDASMix.
Contributions of this work: The design and realization of CDASMix, a multi-tenant aware ESB solution with caching support that enables transparent access to databases hosted both on-premise and off-premise. A performance evaluation of our proposal, with the dual purpose of showing the impact of introducing CDASMix on the performance of the application, and identifying the optimal caching strategy for CDASMix for different deployment options across Cloud service providers. A set of initial findings stemming from this evaluation that can be valuable for related efforts.
1- Focus on the three-layered application pattern proposed by Fowler. 2- The data layer is subdivided into the data access layer and the database layer. 3.1- Consider an application whose stack is completely hosted on-premise. 3.2- Data is partially or completely migrated to the Cloud, e.g. to Amazon RDS. 3.3- The data access layer must be adapted and rewired in order to access the database migrated to the Cloud. 3.4- If we assume 3 different scenarios, e.g. data hosted partially on-premise and partially in DBaaS or IaaS solutions, the data access layer must be aware of these locations in order to retrieve the data from the different backend data sources. 3.5- Therefore, there is a need for a Cloud-enabled data access layer able to redirect the data storage and retrieval requests to the different databases. 3.6- For example, being able to redirect SQL requests to data migrated to AWS RDS.
Presentation layer: extended to provide a larger set of operations, not only for multi-tenant aware administration and management, but also to register the information needed for routing requests between multiple backend data sources.
Business Logic layer: encapsulates the business logic of the ESBmt administration and management. We have extended it to incorporate Cloud data access awareness.
Access layer: based on role-based access control. Tenants and users access the system with a unique tenant id and user id.
Tenant Registry Manager, Configuration Registry Manager, and Service Registry Manager: wrap the interaction with the persistent resources, i.e. the tenant registry, configuration registry, and service registry.
JBI Container Manager and Service Assembly Manager: contain the functionality needed to interact with the JBI Container Cluster for the deployment and undeployment of message adapters and transformers.
Resources layer: encapsulates the persistence resources and the resources that are managed and administered through the upper layers.
ESB Instance Cluster: multiple ESB instances that perform the tasks associated with ESB solutions, e.g. message routing and transformation. Each ESB instance can be seen as three main components: message adapters, message processors, and a normalized message router.
Tenant Registry: contains information related to the tenants and users, e.g. id, email, etc.
Configuration Registry: contains information related to the configuration of each tenant, e.g. tenant operator permissions, used JBI clusters, quota for message adapters, etc.
Service Registry: contains the tenant’s services in the ESB cluster, as well as the configuration of each message adapter deployed to the ESB instance.
Message Broker: an intermediate component for communicating with the ESB instances based on topic subscription.
- MySQL Proxy: OSGi- and JBI-compliant version of the Java MySQL Proxy, implementing the native MySQL communication protocol and providing one endpoint
- Caching: Ehcache realizing the Least Recently Used (LRU) caching policy and deleting cache records when SQL statements involve data modifications
- NMR: enables the integration of the OSGi proxy and the NMR
- SMX-Camel-mt: multi-tenancy, integration between JBI and the Enterprise Integration Patterns provided by Apache Camel
- CamelcdasmixJDBC: dynamically connects to backend data stores via the corresponding database communication protocol
- JNDI: used to register database connections in order to reduce the latency of creating a database connection per user
- SMX-Camel: enables loading CamelcdasmixJDBC packages at runtime, e.g. updates for supporting a new backend data store or data service
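The caching behaviour in these notes (LRU eviction, cache records deleted when a statement modifies data, tenant awareness) can be illustrated with a minimal sketch. It is not the CDASMix or Ehcache code. The class `QueryResultCache`, the `run_on_backend` callback, the prefix-based write detection, and the choice to drop all of a tenant's entries on any write are assumptions made for illustration.

```python
from collections import OrderedDict

# Statement prefixes treated as data modifications (an assumption of this sketch).
WRITE_PREFIXES = ("INSERT", "UPDATE", "DELETE", "REPLACE", "ALTER", "DROP", "TRUNCATE")


class QueryResultCache:
    """LRU cache of query results, keyed per tenant (hypothetical sketch)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # (tenant_id, sql) -> result, oldest first

    def execute(self, tenant_id, sql, run_on_backend):
        statement = sql.strip()
        if statement.upper().startswith(WRITE_PREFIXES):
            # Data modification: drop this tenant's cached results, then forward.
            for key in [k for k in self.entries if k[0] == tenant_id]:
                del self.entries[key]
            return run_on_backend(statement)
        key = (tenant_id, statement)
        if key in self.entries:
            self.entries.move_to_end(key)  # cache hit: mark as most recently used
            return self.entries[key]
        result = run_on_backend(statement)  # cache miss: query the backend store
        self.entries[key] = result
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used entry
        return result
```

Keying entries by `(tenant_id, sql)` keeps one tenant's results from being served to another. Dropping every entry of the writing tenant is the coarsest safe invalidation; a finer scheme would track which tables each cached query reads.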
The MySQL query cache uses an improved LRU eviction algorithm incorporating a midpoint insertion strategy. Cached values are kept in a list that is divided into the most recently accessed values and the older, less recently used values. With this approach, the front part of the list contains the blocks that are most recently used. The TPC-H benchmark is a database decision support benchmark that comprises a set of queries with a high degree of complexity that run over a large volume of data. 9 adapted queries are distributed over a workload of 100 queries, each drawn with probability 1/9. Results are the average of 10 rounds per scenario.
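The midpoint insertion idea mentioned in this note can be sketched generically: new entries enter at the head of the "old" part of the list instead of at the very front, and are promoted to the "young" part only when they are accessed again. This is a simplified illustration, not MySQL's code. The class `MidpointLRU`, the `old_fraction` parameter, and its default value are assumptions.

```python
from collections import OrderedDict


class MidpointLRU:
    """LRU list split into a young and an old sublist (hypothetical sketch)."""

    def __init__(self, capacity, old_fraction=0.375):
        self.old_capacity = max(1, int(capacity * old_fraction))
        self.young_capacity = capacity - self.old_capacity
        self.young = OrderedDict()  # last item = most recently used
        self.old = OrderedDict()    # last item = head of the old sublist (the midpoint)

    def get(self, key):
        if key in self.young:
            self.young.move_to_end(key)
            return self.young[key]
        if key in self.old:
            value = self.old.pop(key)  # second access: promote to the young sublist
            self._insert_young(key, value)
            return value
        return None  # miss

    def put(self, key, value):
        if key in self.young:
            self.young[key] = value
            self.young.move_to_end(key)
            return
        self.old.pop(key, None)
        self.old[key] = value  # new entries are inserted at the midpoint
        self._trim_old()

    def _insert_young(self, key, value):
        self.young[key] = value
        if len(self.young) > self.young_capacity:
            # Demote the least recently used young entry to the head of the old sublist.
            demoted_key, demoted_value = self.young.popitem(last=False)
            self.old[demoted_key] = demoted_value
        self._trim_old()

    def _trim_old(self):
        while len(self.old) > self.old_capacity:
            self.old.popitem(last=False)  # evict from the tail of the old sublist
```

The design choice is that a burst of entries read only once (for example a large scan) cycles through the old sublist and cannot push frequently reused entries out of the young sublist.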
RDS and EC2: an m1.xlarge EC2 instance and a db.m1.xlarge RDS instance, both Amazon instances in the EU zone.
Refer to the RDS results in the paper. The first comparison is based on the performance degradation. The second comparison is based on how the degraded performance is mitigated by introducing the cache. In previous papers we identified the network latency as approx. 3% of the total throughput.
Refer to the RDS results in the paper. The first comparison is based on the performance degradation. The second comparison is based on how the degraded performance is mitigated by introducing the cache.
It performs better in tandem with the optimized LRU caching strategy implemented in MySQL, as it relies on LRU. The combination of both provides better performance results.
