A presentation from an MSc seminar course at the University of Cyprus on cloud computing, elasticity, and the research interests of the CELAR project (http://www.celarcloud.eu).
CELAR Components Featured
=======================
- Cloud Application Management Framework (CAMF)
- JCatascopia Cloud Monitoring Framework
- ADVISE Cloud Elasticity Evaluation Framework
The document discusses various data reduction strategies including attribute subset selection, numerosity reduction, and dimensionality reduction. Attribute subset selection aims to select a minimal set of important attributes. Numerosity reduction techniques like regression, log-linear models, histograms, clustering, and sampling can reduce data volume by finding alternative representations like model parameters or cluster centroids. Dimensionality reduction techniques include discrete wavelet transformation and principal component analysis, which transform high-dimensional data into a lower-dimensional representation.
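To make the dimensionality-reduction idea above concrete, here is a minimal, hypothetical sketch using scikit-learn's PCA to project synthetic 10-dimensional data onto two principal components; the dataset and the choice of two components are assumptions made only for illustration.

```python
# Minimal PCA sketch (illustrative only): project 10-dimensional data onto 2 components.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))          # assumed synthetic dataset: 500 samples, 10 attributes

pca = PCA(n_components=2)               # keep only the 2 directions of highest variance
X_reduced = pca.fit_transform(X)        # lower-dimensional representation of the data

print(X_reduced.shape)                  # (500, 2)
print(pca.explained_variance_ratio_)    # fraction of variance kept by each component
```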
This document provides an overview of parametric and non-parametric supervised machine learning. Parametric learning uses a fixed number of parameters and makes strong assumptions about the data, while non-parametric learning uses a flexible number of parameters that grows with more data, making fewer assumptions. Common examples of parametric models include linear regression and logistic regression, while non-parametric examples include K-nearest neighbors, decision trees, and neural networks. The document also briefly discusses calculating parameters using ordinary least squares for parametric models and the limitations when the data does not follow the predefined assumptions.
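As a rough illustration of the contrast above (not taken from the document), the sketch below fits a parametric linear regression by ordinary least squares alongside a non-parametric k-nearest-neighbors regressor on assumed synthetic data; the fixed-parameter model learns just a slope and intercept, while the k-NN model keeps the training points themselves.

```python
# Parametric (linear regression via OLS) vs non-parametric (k-NN) on synthetic data.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(200, 1))
y = 3.0 * X[:, 0] + 2.0 + rng.normal(scale=1.0, size=200)    # assumed linear ground truth

ols = LinearRegression().fit(X, y)                   # learns a fixed set of parameters (slope, intercept)
knn = KNeighborsRegressor(n_neighbors=5).fit(X, y)   # stores the training data itself

print(ols.coef_, ols.intercept_)                     # parameters close to 3.0 and 2.0
print(knn.predict([[5.0]]))                          # prediction driven by the 5 nearest training points
```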
The document provides an overview of seamless MLOps using Seldon and MLflow. It discusses how MLOps is challenging due to the wide range of requirements across the ML lifecycle. MLflow helps with training by allowing experiment tracking and model versioning. Seldon Core helps with deployment by providing servers to containerize models and infrastructure for monitoring, A/B testing, and feedback. The demo shows training models with MLflow, deploying them to Seldon for A/B testing, and collecting feedback to optimize models.
MLOps - Getting Machine Learning Into Production (Michael Pearce)
Creating autonomy and self-sufficiency by giving people what they need in order to do the things they need to do! What gets in the way, and how can we overcome those barriers? How do we get started quickly, effectively and safely? We'll come together to look at what MLOps entails, some of the tools available and what common MLOps pipelines look like.
Independent of the source of data, the integration of event streams into an Enterprise Architecture is becoming more and more important in the world of sensors, social media streams and the Internet of Things. Events have to be accepted quickly and reliably, and they have to be distributed and analysed, often with many consumers or systems interested in all or part of the events. Storing such huge event streams into HDFS or a NoSQL datastore is feasible and no longer much of a challenge. But if you want to be able to react fast, with minimal latency, you cannot afford to first store the data and do the analysis/analytics later; you have to be able to include part of your analytics right after you consume the event streams. Products for event processing, such as Oracle Event Processing or Esper, have been available for quite a long time and used to be called Complex Event Processing (CEP). In the last 3 years, another family of products has appeared, mostly out of the Big Data technology space, called Stream Processing or Streaming Analytics. These are mostly open source products/frameworks such as Apache Storm, Spark Streaming and Apache Samza, as well as supporting infrastructures such as Apache Kafka. In this talk I will present the theoretical foundations of Event and Stream Processing, discuss the differences you might find between the more traditional CEP and the more modern Stream Processing solutions, and show that a combination of both will bring the most value.
Discover the origins of big data, discuss existing and new projects, share common use cases for those projects, and explain how you can modernize your architecture using data analytics, data operations, data engineering and data science.
Big Data Fundamentals is your prerequisite to building a modern platform for machine learning and analytics optimized for the cloud.
We’ll close out with a live Q&A with some of our technical experts as well.
Stretch your brain with a packed agenda:
Open source software
Data storage
Data ingestion
Data analytics
Data engineering
IoT and life after Lambda architectures
Data science
Cybersecurity
Cluster management
Big data in the cloud
Success stories
This document summarizes different methods for time series analysis and prediction in the deep learning era. It discusses classical autoregressive and Bayesian models, general machine learning approaches, and various deep learning techniques including DeepAR, Deep Ensembles, Deep State Space models, and combinations of deep neural networks with Gaussian processes. The document compares the pros and cons of each approach in terms of scalability, ability to share information across time series, handling cold starts with limited data, estimating predictive uncertainty, and dealing with unevenly spaced time series data.
A very high-level overview of Graph Analytics concepts and techniques, including Structural Analytics, Connectivity Analytics, Community Analytics, Path Analytics, and Pattern Matching.
This document provides an overview of an ontology engineering tutorial presented by Dr. Elena Simperl and Dr. Christoph Tempich. It introduces the presenters and their backgrounds. The tutorial will cover ontology engineering methodologies, development processes, and useful management and support methods. It aims to teach how to implement a successful ontology engineering initiative in a company and convince leadership to support the project.
1) Advanced analytics uses predictive, proactive, and forecasting capabilities to gain insights from large amounts of structured and unstructured data from various sources.
2) By 2014, 30% of analytic applications will use advanced analytic techniques and the global market for analytics software is expected to reach $34 billion.
3) Enablers of advanced analytics include in-memory databases, data mining, real-time data warehouses, and analytics-as-a-service to process large volumes of data and provide faster results.
This document discusses MLOps, which aims to standardize and streamline machine learning model development and deployment through continuous delivery. MLOps applies agile principles to machine learning projects and treats models and datasets as first-class citizens within CI/CD systems. The document outlines three levels of MLOps implementation from manual to fully automated pipelines. It also describes common MLOps platform tools for data management, modeling, and operationalization, including tools for data labeling, versioning, experiment tracking, hyperparameter optimization, model deployment, and monitoring.
Slides for my Associate Professor (oavlönad docent) lecture.
The lecture is about Data Streaming (its evolution and basic concepts) and also contains an overview of my research.
Build Intelligent Fraud Prevention with Machine Learning and Graphs (Neo4j)
See how financial services, banking and retail are using graph-enhanced machine learning to thwart fraud. Fraudsters are becoming increasingly sophisticated, organized and adaptive; traditional, rule-based solutions are not broad or nimble enough to deal with this reality. This session will cover several demonstrations and real-world technical examples including preventing credit card fraud, identifying money laundering and reducing false positives.
This document provides an overview of data streaming fundamentals and tools. It discusses how data streaming processes unbounded, continuous data streams in real-time as opposed to static datasets. The key aspects covered include data streaming architecture, specifically the lambda architecture, and popular open source data streaming tools like Apache Spark, Apache Flink, Apache Samza, Apache Storm, Apache Kafka, Apache Flume, Apache NiFi, Apache Ignite and Apache Apex.
A non-technical overview of Large Language Models, exploring their potential, limitations, and customization for specific challenges. While this deck was created with a financial-industry audience in mind, its content remains broadly applicable.
(Note: Discover a slightly updated version of this deck at slideshare.net/LoicMerckel/introduction-to-llms.)
For a long time, relational database management systems were the only solution for persistent data storage. However, with the phenomenal growth of data, this conventional way of storing data has become problematic.
To manage the exponentially growing data traffic, the largest information technology companies, such as Google, Amazon and Yahoo, have developed alternative solutions that store data in what have come to be known as NoSQL databases.
Some of the NoSQL features are a flexible schema, horizontal scaling and no ACID support. NoSQL databases store and replicate data in distributed systems, often across datacenters, to achieve scalability and reliability.
The CAP theorem states that any networked shared-data system (e.g. NoSQL) can have at most two of three desirable properties:
• consistency (C) - equivalent to having a single up-to-date copy of the data
• availability (A) of that data (for reads and writes)
• tolerance to network partitions (P)
Because of this inherent tradeoff, it is necessary to sacrifice one of these properties. The general belief is that designers cannot sacrifice P and therefore have a difficult choice between C and A.
In this seminar two NoSQL databases are presented: Amazon's Dynamo, which sacrifices consistency, thereby achieving very high availability, and Google's BigTable, which guarantees strong consistency while providing only best-effort availability.
The financial industry is witnessing an emerging trend of applying Large Language Models (LLMs) to improve operational efficiency. This article, based on a round table discussion hosted by TruEra and QuantUniversity in New York in May 2023, explores the potential use cases of LLMs in financial institutions (FIs), the risks to consider, approaches to manage these risks, and the implications for people, skills, and ways of working. Frontline personnel from Data and Analytics/AI teams, Model Risk, Data Management, and other roles from fifteen financial institutions devoted over two hours to discussing the LLM opportunities within their industry, as well as strategies for mitigating associated risks.
The discussions revealed a preference for discriminative use cases over generative ones, with a focus on information retrieval and operational automation. The necessity for a human-in-the-loop was emphasized, along with a detailed discourse on risks and their mitigation.
Spatial data mining involves discovering patterns from large spatial datasets. It differs from traditional data mining due to properties of spatial data like spatial autocorrelation and heterogeneity. Key spatial data mining tasks include clustering, classification, trend analysis and association rule mining. Clustering algorithms like PAM and CLARA are useful for grouping spatial data objects. Trend analysis can identify global or local trends by analyzing attributes of spatially related objects. Future areas of research include spatial data mining in object oriented databases and using parallel processing to improve computational efficiency for large spatial datasets.
Horizontal sharding separates database rows across multiple servers to improve performance and scalability. Vertical sharding separates columns. Horizontal sharding is best for queries that return subsets of rows grouped by fields like date ranges, while vertical sharding is best when queries return subsets of columns. To shard data, a shard key is chosen to partition data across servers. Key-based, range-based, and directory-based sharding techniques partition data differently. While sharding improves performance and availability, it can cause unbalanced shards if data is unevenly distributed across key ranges. Other scaling options before sharding include remote databases, caching, read replicas, and upgrading servers.
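As a rough sketch of the key-based sharding mentioned above, the snippet below routes rows to shards by hashing an assumed shard key; the key name, shard count, and routing function are hypothetical and only illustrate the idea.

```python
# Key-based (hash) sharding sketch: route each row to a shard by hashing the shard key.
import hashlib

NUM_SHARDS = 4  # assumed shard count

def shard_for(shard_key: str) -> int:
    """Map a shard key (e.g. a user_id) to a shard index deterministically."""
    digest = hashlib.sha256(shard_key.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_SHARDS

rows = [{"user_id": "alice"}, {"user_id": "bob"}, {"user_id": "carol"}]
for row in rows:
    print(row["user_id"], "-> shard", shard_for(row["user_id"]))
```

A range-based scheme would instead compare the key against boundary values, which preserves range queries but is more prone to the unbalanced shards noted above.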
Course "Machine Learning and Data Mining" for the degree of Computer Engineering at the Politecnico di Milano. In in this lecture we overview the mining of data streams
A short presentation introducing Machine Learning for beginners: what it is, how it works, the most popular Machine Learning techniques and learning models (supervised, unsupervised, semi-supervised, reinforcement learning), and how they work, with various industry use cases and popular examples.
Presented at All Things Open RTP Meetup
Presented by Karthik Uppuluri, Fidelity
Title: Generative AI
Abstract: In this session, let us embark on a journey into the fascinating world of generative artificial intelligence. As an emergent and captivating branch of machine learning, generative AI has become instrumental in a myriad of sectors, ranging from the visual arts to software for technological solutions. This session requires no prior expertise in machine learning or AI. It aims to instill a robust understanding of the fundamental concepts and principles of generative AI and its diverse applications. Join us as we delve into the mechanics of this transformative technology and unpack its potential.
Machine learning can be distributed across multiple machines to allow for processing of large datasets and complex models. There are three main approaches to distributed machine learning: data parallel, where the data is partitioned across machines and models are replicated; model parallel, where different parts of large models are distributed; and graph parallel, where graphs and algorithms are partitioned. Distributed frameworks use these approaches to efficiently and scalably train machine learning models on big data in parallel.
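The data-parallel approach described above can be sketched as: each worker computes a gradient on its own data partition, the gradients are averaged, and every replica applies the same update. The toy example below mimics this on a single machine with NumPy; the partitioning, learning rate, and single update step are simplifying assumptions.

```python
# Toy data-parallel sketch: average per-partition gradients for one linear-model update.
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(1000, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=1000)

w = np.zeros(3)                                    # model replicated on every "worker"
partitions = np.array_split(np.arange(1000), 4)    # data partitioned across 4 workers

def gradient(w, Xp, yp):
    # Gradient of mean squared error on one worker's partition.
    return 2.0 * Xp.T @ (Xp @ w - yp) / len(yp)

grads = [gradient(w, X[idx], y[idx]) for idx in partitions]  # computed in parallel in practice
w -= 0.1 * np.mean(grads, axis=0)                  # average gradients, update all replicas identically
print(w)
```

In a real distributed framework the averaging step would be an all-reduce across machines rather than a local mean.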
This document provides an outline and summary of a thesis submitted for a Master's degree in Computer Science. The thesis explores security and privacy issues related to implementing a private cloud for a university. The objectives are to reduce costs, improve resource sharing, and address security through authentication, authorization, and encryption techniques. The research methodology involves developing applications on the Windows Azure platform using .NET and SQL Server.
Cloud computing refers to delivering computing resources and services over the Internet. It allows on-demand access to shared computing power, storage, databases, and applications without requiring local servers or hardware. Key benefits include scalability, accessibility, pay-as-you-go pricing, efficient resource utilization, and self-service capabilities. Common applications include SaaS, IaaS, PaaS, big data analytics, IoT, and disaster recovery. While cloud computing provides advantages, businesses must also address risks such as security, vendor lock-in, downtime, and regulatory compliance.
The document discusses the key requirements and challenges faced by several educational institutions and how Citrix cloud solutions helped address them. The University of Sao Paulo needed to deliver virtual resources to its 100,000 students and 6,000 professors across 11 campuses and built a private cloud with Citrix solutions to provide a self-service portal for desktops, applications, and infrastructure. A large community college in the US used Citrix to become a cloud provider for other colleges and gain showback of department usage. The Royal Melbourne Institute of Technology deployed Citrix clouds to enable self-service access control and integrate with its Cisco and NetApp infrastructure.
This document discusses cloud computing and its applications in education. It defines cloud computing as the delivery of computing resources like data storage, servers, and software over the Internet. The document outlines the main types of cloud computing (public, private, hybrid), cloud services, and advantages like cost savings, scalability, and accessibility. Challenges of cloud computing in education like connectivity and cost issues are discussed along with solutions. Examples of cloud-based learning management systems and productivity tools are provided. The future of cloud computing in education is predicted to include more efficient and cost-effective remote learning and collaboration solutions.
Introduction of Cloud Computing & Historical Background
Cloud Service Models & Cloud Deployment Models
Benefits of Cloud Computing
Risks and Challenges
Future Trends in Cloud Computing
Edge Computing, Serverless Computing, AI & Machine Learning in Cloud, Security and Compliance
Needs and Obstacles for Cloud Deployment
Conclusion
The document discusses cloud testing and how it can be leveraged. It defines cloud computing and its services like SaaS, PaaS, and IaaS. It then discusses different types of testing that can be done in the cloud like load testing, performance testing, functional testing, etc. It provides an example of a media company that was able to use Amazon cloud services to reduce costs and latency issues. In conclusion, it states that cloud testing is well-suited for the fast-moving digital world as it allows for quick deployment and is cost-effective.
Distance Learning Education with Cloud Computing (amanyosama12)
Cloud computing delivers various computing resources and services over the Internet. It offers benefits for education like increased accessibility and collaboration between students and teachers regardless of location. While connectivity and costs pose challenges, cloud computing allows learning management systems and productivity tools to be accessed from anywhere, enabling distance learning education with tools for file sharing, communication, and tracking student progress.
Elucidating the impact of cloud computing in education sector Benefits and Ch... (Dr. Trilok Kumar Jain)
Cloud computing provides numerous benefits to the education sector by allowing on-demand access to applications and storage of data over the internet from any device. It enables students, teachers, and staff to access software, files, and computing resources through web-based tools rather than relying on local servers or software loaded onto individual computers. While cloud computing increases access to educational resources, there are also challenges to address regarding data security, management of instructional software, adequate IT support for schools, and equipping students with devices to access digital materials in the cloud.
Using the Technology Organization Environment Framework for Adoption and Impl... (theijes)
This document discusses adoption and implementation of cloud computing in institutions of higher learning in Kenya using the Technology Organization Environment (TOE) framework. It identifies key issues that need to be addressed for successful adoption, including the need for cloud computing expertise, adequate internet bandwidth and infrastructure, financial resources for metered cloud payments, information security concerns, and cloud service/infrastructure readiness. The TOE framework examines technological, organizational, and environmental factors influencing technological adoption. This study analyzes these contexts and suggests strategies for effective utilization of cloud resources in learning institutions in developing countries.
The document discusses how universities are facing increasing demands on their IT infrastructure due to exponential growth in data and usage of personal devices. This is straining university budgets and capacity. Many universities are adopting cloud computing to gain efficiencies and flexibility without sacrificing performance. Key benefits include cost savings of around 21% on average, increased efficiency and rapid provisioning, and the ability to innovate more easily. However, universities face unique security and compliance challenges that require a customized cloud strategy and transition approach. With expert guidance and proven methodologies, universities can develop a comprehensive cloud strategy and make a smooth transition to the cloud.
This document summarizes a presentation on cloud computing given by Dr. M. Prasad and Dr. Rajarao P B V. It defines cloud computing as the third wave of the digital revolution and outlines the main types of cloud services: infrastructure as a service, platform as a service, and software as a service. It compares cloud computing models to traditional on-premises computing and discusses some of the opportunities and concerns around cloud adoption.
This document summarizes a presentation on cloud and security challenges given by Dr. Tonny K. Omwansa at the ISACA Kenya conference in May 2014. The presentation covered an overview of cloud computing, the results of a study on cloud penetration in Kenya, and security challenges and solutions related to cloud computing. Some key findings from the study included that 69% of organizations in Kenya use some form of cloud, with private cloud being more common than public cloud. The top security concerns related to cloud computing were around traditional security issues, availability concerns, and lack of control and transparency with third-party data in the cloud. Recommendations focused on developing cloud strategies, policies, skills and awareness to better facilitate cloud adoption in Kenya
This document proposes a cloud streaming service for the University of Bedfordshire. It discusses the objectives of cloud streaming, including cost savings and flexibility. The document outlines different cloud deployment models like private, public, and hybrid clouds. It explains that cloud streaming allows sharing of resources like video and software from anywhere. The architecture and major providers of cloud streaming are also summarized. Finally, the document discusses challenges of reliability, governance, security and vendor lock-in for cloud streaming services.
ACHIEVING SEAMLESS MIGRATION TO PRIVATE CLOUD INFRASTRUCTURE FOR MULTI-CAMPUS ... (ijccsa)
This document discusses the challenges faced by multi-campus universities in managing their IT infrastructure and explores how private cloud migration can help address these challenges. It presents a case study of the University of the Aegean's migration to a private cloud. The university faced issues with an inflexible, inefficient and outdated infrastructure across its six island campuses. It evaluated its infrastructure and network connections over time. It then migrated its infrastructure to a private cloud with a new data center and high-speed network connections between campuses, improving flexibility, scalability, reliability and security while reducing costs and redundancy. The document identifies critical success factors for universities considering private cloud migration.
Role and Service of Cloud Computing for Higher Education System (IRJET Journal)
1) The document discusses how cloud computing can help address limitations in traditional higher education systems by providing flexible IT infrastructure and resources on an as-needed basis.
2) Traditional systems are limited by high costs of maintaining dedicated infrastructure, inability to adapt to different learning styles, and lack of digital content and tools.
3) Cloud computing allows educational institutions to access computing power, storage, applications, and other resources through the internet without having to build and maintain their own expensive infrastructure. Resources can be scaled up or down as needed.
4) This makes new technologies and digital learning materials more accessible for students and faculty while reducing costs for educational institutions compared to maintaining dedicated systems.
This presentation gives a detailed overview about Cloud Computing, its features and challenges faced by it in the market. It gives an insight into cloud security and privacy issues and its measures.
Similar to Cloud Elasticity and the CELAR Project
Rapidly Testing ML-Driven Drone Applications - The FlockAI Framework (Demetris Trihinas)
As drone technology penetrates even more application domains, Machine Learning (ML) is becoming a key driver enabling intelligence in the sky. However, ML Practitioners and Drone Application Operators are often faced with several challenges when wanting to test ML-driven drone applications early in the design phase. These include the development and configuration of experiment use-cases over a robotics simulator along with the collection and assessment of desired KPIs which can range from ML algorithm accuracy to drone resource utilization and the impact of “intelligence” to the drone’s energy footprint. This talk will introduce FlockAI, an open and modular by design framework supporting users with the rapid deployment and repeatable testing during the design phase of ML-driven drone applications over the Webots robotics simulator. With FlockAI users can design drone testbeds with “ready-to-go” drone templates, deploy ML models, configure on-board/remote inference, monitor and export drone resource utilization, network overhead and energy consumption to pinpoint performance inefficiencies and understand if various trade-offs can be exploited.
Towards Energy and Carbon Footprint and Testing for AI-driven IoT Services (Demetris Trihinas)
This document discusses the need for energy and carbon footprint testing of AI-driven IoT services. It notes that increased computational power for AI/ML results in greater energy usage and potential carbon emissions. While current IoT testing tools can evaluate service quality, they lack benchmarks for energy/carbon impacts. The document considers scenarios for object detection using edge computing and proposes observations about how training/inference locations, times, models and device heterogeneity can significantly affect energy use and carbon footprint. It argues that future testing tools need power models, carbon emission data and insights into tradeoffs to help optimize resource efficiency and sustainability of edge AI services.
StreamSight: A Query-Driven Framework Extending Streaming IoT Analytics to th... (Demetris Trihinas)
This document describes StreamSight, a query-driven framework for extending streaming IoT analytics to fog computing environments. StreamSight uses an SQL-like query language to analyze streaming data in a way that is optimized for fog architectures. It supports optimizations such as operator placement hints, caching of intermediate results, and query prioritization. The framework was evaluated using a real-world smart city bus dataset, and StreamSight showed speedups of up to 4x compared to a baseline Spark implementation, demonstrating its ability to efficiently process queries in distributed fog environments.
Low-Cost Approximate and Adaptive Techniques for the Internet of Things (Demetris Trihinas)
This document outlines Demetris Trihinas' upcoming seminar talk at the University of Pittsburgh. The talk will discuss low-cost approximate and adaptive monitoring techniques for IoT devices. It will introduce the AdaM framework, which allows monitoring sources to dynamically adjust their sampling and data dissemination rates based on estimation models of monitoring stream evolution. This helps reduce resource usage while still providing sufficient accuracy. The talk will also cover a model-based adaptive dissemination plugin called ADMin that transmits estimation models instead of raw data points to approximate future values within given accuracy thresholds.
Telling a Story – or Even Propaganda – Through Data Visualization (Demetris Trihinas)
A Chinese proverb states that "a picture is worth 1000 words"... it may even be worth more. Expanding on this point, this talk goes beyond aesthetics by introducing data visualization as a powerful tool for data exploration and knowledge communication. However, although data visualizations can be used to make story narratives easier to grasp and statistics easier to digest, they can also be used for deceit, misinformation and even propaganda. The negative impact of storytelling through data will be a prominent part of this talk, where we will cover how misinformation can prevail unintentionally by misinterpreting the knowledge extracted from data, and intentionally by "fitting" the visualization to the message that must be conveyed.
The Data Science Process: From Mining Raw Data to Story Visualization (Demetris Trihinas)
Data is now marketed and labelled as "the new oil". Towards this, data is now being extracted from all aspects of our everyday lives with the hope that by analyzing these large volumes of data, useful insights and knowledge will be derived. This talk provides a broad overview of the Data Science process, starting from "tapping" into online data sources, to the analysis, and then the importance of data visualization. Through the discussion of each stage of the Data Science process we will outline tasks that should be followed along with practical challenges and strategies to overcome these challenges.
This document contains a presentation on storytelling through data and data mining. It discusses collecting raw data from sources like social media and news outlets using techniques like web crawling and APIs. It then discusses storing large amounts of data and analyzing it using data mining techniques like classification, clustering, pattern discovery, and association to convert raw data into structured information and knowledge. The goal is to take large and complex datasets and identify patterns and stories within the data to communicate insights to others.
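As a minimal, hedged illustration of the API-based collection step mentioned above, the sketch below downloads JSON records over HTTP before any storage or mining happens; the endpoint URL and record format are placeholders, not a real API.

```python
# Illustrative data-collection step: fetch JSON records from a (hypothetical) REST endpoint.
import json
import urllib.request

URL = "https://example.com/api/articles?limit=10"   # placeholder endpoint, not a real API

def fetch_records(url: str) -> list:
    """Download and decode a JSON list of records from the given endpoint."""
    with urllib.request.urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))

# In a real pipeline the records would then be stored (e.g. in a database or data lake)
# and passed on to mining steps such as classification or clustering.
for record in fetch_records(URL):
    print(record)
```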
Designing Scalable and Secure Microservices by Embracing DevOps-as-a-Service ... (Demetris Trihinas)
This document provides an overview of a tutorial on designing scalable and secure microservices. It discusses the evolution of software architectures from monolithic applications to microservices and the benefits of microservices for scalability, innovation, and continuous delivery. It also covers challenges such as coordination and deployment complexities that microservices aim to address through practices like containerization, automation, and decentralized governance.
Low-Cost Approximate and Adaptive Monitoring Techniques for the Internet of T... (Demetris Trihinas)
An overview of monitoring techniques used on the edge to lower big data and energy efficiency barriers for IoT. To achieve this we introduce the AdaM and ADMin frameworks. This presentation is from a talk given at the University of Cyprus (March 2017). If used, please cite one of the following:
- "Adam: An adaptive monitoring framework for sampling and filtering on IoT devices", D. Trihinas et al., IEEE BigData 2015, 10.1109/BigData.2015.7363816
- "ADMin: Adaptive Monitoring Dissemination for the Internet of Things", D. Trihinas et al., IEEE INFOCOM 2017, to appear
The AdaM framework aims to reduce the volume of data generated by IoT devices while preserving battery life. It does this by dynamically adapting the rate of data collection and filtering out redundant metric values to balance efficiency and accuracy. The framework adjusts the sampling rate and filter range based on the evolution of metric streams. This allows IoT devices running AdaM to use less processing, network bandwidth, and energy while still achieving high accuracy levels compared to static monitoring approaches.
Low-Cost Adaptive Monitoring Techniques for the Internet of Things (Demetris Trihinas)
AdaM is an adaptive monitoring framework that uses adaptive sampling and filtering techniques to reduce the processing, network traffic, and energy consumption of IoT devices while monitoring data streams. It dynamically adjusts the sampling period and filter range based on the variability and evolution of the metric stream to balance efficiency and accuracy. Evaluation shows AdaM achieves significant reductions in overhead compared to state-of-the-art techniques while maintaining high estimation accuracy.
AdaM: an Adaptive Monitoring Framework for Sampling and Filtering on IoT Devices (Demetris Trihinas)
This document summarizes an adaptive monitoring framework called AdaM that dynamically adjusts sampling periods and filtering ranges for data collected from IoT devices. AdaM aims to reduce data volumes, network traffic, and energy consumption on IoT devices while maintaining accuracy. It uses probabilistic exponential weighted moving averages and adaptive algorithms to adjust sampling periods and filtering ranges based on the variability and evolution of collected metric streams. An evaluation shows AdaM significantly reduces processing, network traffic, and energy use compared to other techniques while achieving high estimation accuracy above 89% on various real-world datasets.
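The adaptive-sampling idea summarized above can be illustrated with a simplified sketch: an exponentially weighted moving average tracks the metric stream, and the sampling period grows while recent values are stable and snaps back when they fluctuate. This is not the actual AdaM algorithm, only an approximation of the concept; the thresholds, smoothing factor, and period bounds are assumptions.

```python
# Simplified adaptive-sampling sketch (not the actual AdaM algorithm).
def adaptive_periods(stream, t_min=1.0, t_max=30.0, alpha=0.3, tolerance=0.05):
    """Yield a sampling period per value: grow it when the stream is stable, shrink otherwise."""
    ewma, period = stream[0], t_min
    for value in stream[1:]:
        ewma = alpha * value + (1 - alpha) * ewma          # smoothed estimate of the stream
        deviation = abs(value - ewma) / (abs(ewma) + 1e-9) # relative estimation error
        if deviation < tolerance:
            period = min(period * 2, t_max)                # stable: sample less often
        else:
            period = t_min                                 # volatile: fall back to fast sampling
        yield period

cpu = [20, 21, 20, 22, 21, 60, 62, 61, 21, 20, 20, 21]    # assumed CPU-usage metric stream
print(list(adaptive_periods(cpu)))
```

AdaM itself adapts both the sampling period and a filter range using probabilistic EWMA estimates, so its behaviour differs from this toy version.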
Andreas is a history and archaeology graduate who has an idea for a nonprofit mobile platform to bring together Cypriot youth from both communities to share knowledge about cultural heritage. However, Andreas lacks the knowledge and resources to find the right team of developers, marketers, and others to implement his idea. Findaproject.eu aims to match people with project ideas to potential team members with relevant skills to help ideas become reality. It uses algorithms to match profiles on skills and interests and integrates with LinkedIn. The initial team includes Demetris who focuses on backend development, Athanasios on frontend development, and Pavlos who provides financial expertise.
[ccgrid2014] JCatascopia: Monitoring Elastically Adaptive Applications in the... (Demetris Trihinas)
Demetris Trihinas presented JCatascopia, an open-source cloud monitoring system capable of supporting elastic applications. JCatascopia uses lightweight monitoring agents that collect metrics from probes and send them to monitoring servers. It can monitor applications deployed across multiple cloud platforms and dynamically adapt to changes in application topology or resource allocation. Experiments showed JCatascopia has low runtime overhead and can effectively monitor elastic applications in public and private clouds.
Demetris Trihinas presented JCatascopia, an open-source cloud monitoring system. JCatascopia can monitor elastic cloud applications across multiple cloud platforms. It uses lightweight monitoring agents that collect metrics and dynamically adapt to changes in application topology. JCatascopia's pub/sub architecture allows monitoring servers to operate independently of agent locations. Experiments showed JCatascopia has low overhead and can effectively monitor applications with elastic scaling.
The Microsoft 365 Migration Tutorial For Beginner.pptx (operationspcvita)
This presentation will help you understand the power of Microsoft 365. We have covered every productivity app included in Office 365. Additionally, we have outlined the migration scenarios related to Office 365 and how we can help you.
You can also read: https://www.systoolsgroup.com/updates/office-365-tenant-to-tenant-migration-step-by-step-complete-guide/
Dandelion Hashtable: beyond billion requests per second on a commodity server (Antonios Katsarakis)
This slide deck presents DLHT, a concurrent in-memory hashtable. Despite efforts to optimize hashtables, that go as far as sacrificing core functionality, state-of-the-art designs still incur multiple memory accesses per request and block request processing in three cases. First, most hashtables block while waiting for data to be retrieved from memory. Second, open-addressing designs, which represent the current state-of-the-art, either cannot free index slots on deletes or must block all requests to do so. Third, index resizes block every request until all objects are copied to the new index. Defying folklore wisdom, DLHT forgoes open-addressing and adopts a fully-featured and memory-aware closed-addressing design based on bounded cache-line-chaining. This design (1) offers lock-free index operations and deletes that free slots instantly, (2) completes most requests with a single memory access, (3) utilizes software prefetching to hide memory latencies, and (4) employs a novel non-blocking and parallel resizing. In a commodity server and a memory-resident workload, DLHT surpasses 1.6B requests per second and provides 3.5x (12x) the throughput of the state-of-the-art closed-addressing (open-addressing) resizable hashtable on Gets (Deletes).
Must Know Postgres Extension for DBA and Developer during Migration (Mydbops)
Mydbops Opensource Database Meetup 16
Topic: Must-Know PostgreSQL Extensions for Developers and DBAs During Migration
Speaker: Deepak Mahto, Founder of DataCloudGaze Consulting
Date & Time: 8th June | 10 AM - 1 PM IST
Venue: Bangalore International Centre, Bangalore
Abstract: Discover how PostgreSQL extensions can be your secret weapon! This talk explores how key extensions enhance database capabilities and streamline the migration process for users moving from other relational databases like Oracle.
Key Takeaways:
* Learn about crucial extensions like oracle_fdw, pgtt, and pg_audit that ease migration complexities.
* Gain valuable strategies for implementing these extensions in PostgreSQL to achieve license freedom.
* Discover how these key extensions can empower both developers and DBAs during the migration process.
* Don't miss this chance to gain practical knowledge from an industry expert and stay updated on the latest open-source database trends.
Mydbops Managed Services specializes in taking the pain out of database management while optimizing performance. Since 2015, we have been providing top-notch support and assistance for the top three open-source databases: MySQL, MongoDB, and PostgreSQL.
Our team offers a wide range of services, including assistance, support, consulting, 24/7 operations, and expertise in all relevant technologies. We help organizations improve their database's performance, scalability, efficiency, and availability.
Contact us: info@mydbops.com
Visit: https://www.mydbops.com/
Follow us on LinkedIn: https://in.linkedin.com/company/mydbops
For more details and updates, please follow the links below.
Meetup Page : https://www.meetup.com/mydbops-databa...
Twitter: https://twitter.com/mydbopsofficial
Blogs: https://www.mydbops.com/blog/
Facebook(Meta): https://www.facebook.com/mydbops/
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors (DianaGray10)
Join us to learn how UiPath Apps can directly and easily interact with prebuilt connectors via Integration Service--including Salesforce, ServiceNow, Open GenAI, and more.
The best part is you can achieve this without building a custom workflow! Say goodbye to the hassle of using separate automations to call APIs. By seamlessly integrating within App Studio, you can now easily streamline your workflow, while gaining direct access to our Connector Catalog of popular applications.
We’ll discuss and demo the benefits of UiPath Apps and connectors including:
Creating a compelling user experience for any software, without the limitations of APIs.
Accelerating the app creation process, saving time and effort
Enjoying high-performance CRUD (create, read, update, delete) operations for seamless data management.
Speakers:
Russell Alfeche, Technology Leader, RPA at qBotic and UiPath MVP
Charlie Greenberg, host
Conversational agents, or chatbots, are increasingly used to access all sorts of services using natural language. While open-domain chatbots - like ChatGPT - can converse on any topic, task-oriented chatbots - the focus of this paper - are designed for specific tasks, like booking a flight, obtaining customer support, or setting an appointment. Like any other software, task-oriented chatbots need to be properly tested, usually by defining and executing test scenarios (i.e., sequences of user-chatbot interactions). However, there is currently a lack of methods to quantify the completeness and strength of such test scenarios, which can lead to low-quality tests, and hence to buggy chatbots.
To fill this gap, we propose adapting mutation testing (MuT) for task-oriented chatbots. To this end, we introduce a set of mutation operators that emulate faults in chatbot designs, an architecture that enables MuT on chatbots built using heterogeneous technologies, and a practical realisation as an Eclipse plugin. Moreover, we evaluate the applicability, effectiveness and efficiency of our approach on open-source chatbots, with promising results.
"$10 thousand per minute of downtime: architecture, queues, streaming and fin...Fwdays
Direct losses from one minute of downtime = $5-$10 thousand. Reputation is priceless.
As part of the talk, we will consider the architectural strategies necessary for the development of highly loaded fintech solutions. We will focus on using queues and streaming to efficiently process and manage large amounts of data in real time and to minimize latency.
We will focus special attention on the architectural patterns used in the design of the fintech system, microservices and event-driven architecture, which ensure scalability, fault tolerance, and consistency of the entire system.
In the realm of cybersecurity, offensive security practices act as a critical shield. By simulating real-world attacks in a controlled environment, these techniques expose vulnerabilities before malicious actors can exploit them. This proactive approach allows manufacturers to identify and fix weaknesses, significantly enhancing system security.
This presentation delves into the development of a system designed to mimic Galileo's Open Service signal using software-defined radio (SDR) technology. We'll begin with a foundational overview of both Global Navigation Satellite Systems (GNSS) and the intricacies of digital signal processing.
The presentation culminates in a live demonstration. We'll showcase the manipulation of Galileo's Open Service pilot signal, simulating an attack on various software and hardware systems. This practical demonstration serves to highlight the potential consequences of unaddressed vulnerabilities, emphasizing the importance of offensive security practices in safeguarding critical infrastructure.
Fueling AI with Great Data with Airbyte WebinarZilliz
This talk will focus on how to collect data from a variety of sources, leveraging this data for RAG and other GenAI use cases, and finally charting your course to productionalization.
Introduction of Cybersecurity with OSS at Code Europe 2024Hiroshi SHIBATA
I develop the Ruby programming language, RubyGems, and Bundler, which are package managers for Ruby. Today, I will introduce how to enhance the security of your application using open-source software (OSS) examples from Ruby and RubyGems.
The first topic is CVE (Common Vulnerabilities and Exposures). I have published CVEs many times. But what exactly is a CVE? I'll provide a basic understanding of CVEs and explain how to detect and handle vulnerabilities in OSS.
Next, let's discuss package managers. Package managers play a critical role in the OSS ecosystem. I'll explain how to manage library dependencies in your application.
I'll share insights into how the Ruby and RubyGems core team works to keep our ecosystem safe. By the end of this talk, you'll have a better understanding of how to safeguard your code.
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...Alex Pruden
Folding is a recent technique for building efficient recursive SNARKs. Several elegant folding protocols have been proposed, such as Nova, Supernova, Hypernova, Protostar, and others. However, all of them rely on an additively homomorphic commitment scheme based on discrete log, and are therefore not post-quantum secure. In this work we present LatticeFold, the first lattice-based folding protocol based on the Module SIS problem. This folding protocol naturally leads to an efficient recursive lattice-based SNARK and an efficient PCD scheme. LatticeFold supports folding low-degree relations, such as R1CS, as well as high-degree relations, such as CCS. The key challenge is to construct a secure folding protocol that works with the Ajtai commitment scheme. The difficulty, is ensuring that extracted witnesses are low norm through many rounds of folding. We present a novel technique using the sumcheck protocol to ensure that extracted witnesses are always low norm no matter how many rounds of folding are used. Our evaluation of the final proof system suggests that it is as performant as Hypernova, while providing post-quantum security.
Paper Link: https://eprint.iacr.org/2024/257
High performance Serverless Java on AWS- GoTo Amsterdam 2024Vadym Kazulkin
Java is for many years one of the most popular programming languages, but it used to have hard times in the Serverless community. Java is known for its high cold start times and high memory footprint, comparing to other programming languages like Node.js and Python. In this talk I'll look at the general best practices and techniques we can use to decrease memory consumption, cold start times for Java Serverless development on AWS including GraalVM (Native Image) and AWS own offering SnapStart based on Firecracker microVM snapshot and restore and CRaC (Coordinated Restore at Checkpoint) runtime hooks. I'll also provide a lot of benchmarking on Lambda functions trying out various deployment package sizes, Lambda memory settings, Java compilation options and HTTP (a)synchronous clients and measure their impact on cold and warm start times.
"Scaling RAG Applications to serve millions of users", Kevin GoedeckeFwdays
How we managed to grow and scale a RAG application from zero to thousands of users in 7 months. Lessons from technical challenges around managing high load for LLMs, RAGs and Vector databases.
Have you ever been confused by the myriad of choices offered by AWS for hosting a website or an API?
Lambda, Elastic Beanstalk, Lightsail, Amplify, S3 (and more!) can each host websites + APIs. But which one should we choose?
Which one is cheapest? Which one is fastest? Which one will scale to meet our needs?
Join me in this session as we dive into each AWS hosting service to determine which one is best for your scenario and explain why!
Northern Engraving | Nameplate Manufacturing Process - 2024Northern Engraving
Manufacturing custom quality metal nameplates and badges involves several standard operations. Processes include sheet prep, lithography, screening, coating, punch press and inspection. All decoration is completed in the flat sheet with adhesive and tooling operations following. The possibilities for creating unique durable nameplates are endless. How will you create your brand identity? We can help!
Taking AI to the Next Level in Manufacturing.pdfssuserfac0301
Read Taking AI to the Next Level in Manufacturing to gain insights on AI adoption in the manufacturing industry, such as:
1. How quickly AI is being implemented in manufacturing.
2. Which barriers stand in the way of AI adoption.
3. How data quality and governance form the backbone of AI.
4. Organizational processes and structures that may inhibit effective AI adoption.
6. Ideas and approaches to help build your organization's AI strategy.
The Department of Veteran Affairs (VA) invited Taylor Paschal, Knowledge & Information Management Consultant at Enterprise Knowledge, to speak at a Knowledge Management Lunch and Learn hosted on June 12, 2024. All Office of Administration staff were invited to attend and received professional development credit for participating in the voluntary event.
The objectives of the Lunch and Learn presentation were to:
- Review what KM ‘is’ and ‘isn’t’
- Understand the value of KM and the benefits of engaging
- Define and reflect on your “what’s in it for me?”
- Share actionable ways you can participate in Knowledge - - Capture & Transfer
2. Demetris Trihinas
Back in the old days…
10 March 2015, University of Cyprus
you had an idea… but no money… it was difficult to start an online business…
“In the early days (20 years ago), most new e-commerce sites, for example, cost a million dollars to set up. Now the price is closer to $100” [M. Zwilling, NY Times, Jul 2014]
Why was it so difficult in the past?
3. Demetris Trihinas
Motivation
Meet John!
• John is an ambitious CS student
• He wants to develop a “YouTube” alternative
• John has no money; he must use his knowledge and open-source tools to develop his system
10 March 2015, University of Cyprus
4. Demetris Trihinas
Online Video Streaming Service #1
• From CS courses John learns about web service development
• Client-Server model
10 March 2015, University of Cyprus
[Diagram: clients (John’s family) send upload/download video requests to the server (John’s computer) and receive the response]
• The server hosts the database (e.g. MySQL); a desktop client (e.g. written in Java) connects to the server
• Processing is done on the client side (thick clients); see the sketch below
• Updating the application logic code is not easy
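As a rough illustration of this first design (hypothetical class names, port and protocol; not John’s actual code), the thick client opens a socket to the single server and drives the application logic itself:

```java
import java.io.*;
import java.net.*;

// Minimal sketch of the client-server model: one server process on John's
// computer, and a thick desktop client that talks to it directly over a socket.
public class VideoServer {
    public static void main(String[] args) throws IOException {
        try (ServerSocket server = new ServerSocket(8080)) {
            while (true) {
                try (Socket client = server.accept();
                     BufferedReader in = new BufferedReader(
                             new InputStreamReader(client.getInputStream()));
                     PrintWriter out = new PrintWriter(client.getOutputStream(), true)) {
                    String request = in.readLine();   // e.g. "DOWNLOAD cat.mp4"
                    out.println("OK " + request);     // send back a response
                }
            }
        }
    }
}

class DesktopClient {
    public static void main(String[] args) throws IOException {
        try (Socket socket = new Socket("johns-computer.local", 8080);
             PrintWriter out = new PrintWriter(socket.getOutputStream(), true);
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(socket.getInputStream()))) {
            out.println("DOWNLOAD cat.mp4");          // the thick client drives the logic
            System.out.println(in.readLine());
        }
    }
}
```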
5. Demetris Trihinas
Online Video Streaming Service #2
• Video streaming service is attracting more users (family & friends)
• John’s software development skills are getting better
• 3-tier web application
10 March 2015, University of Cyprus
[Diagram: clients (John’s friends and family) upload/download video through the application server, which stores/extracts video in the database]
• Presentation layer: CMS website (e.g. Joomla), HTML/CSS
• Application logic layer: RESTful API, Apache Tomcat (see the sketch below)
• Data storage backend: e.g. MySQL
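As a rough illustration of the application-logic layer (hypothetical servlet, database schema and credentials; not John’s actual code), a servlet deployed on Apache Tomcat can expose a RESTful endpoint to the presentation layer and read video metadata from MySQL via JDBC:

```java
import java.io.IOException;
import java.sql.*;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.*;

// Illustrative application-logic layer: an HTTP endpoint on Tomcat that the
// presentation layer (CMS website) calls, backed by a MySQL storage tier.
@WebServlet("/videos")
public class VideoApiServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws IOException {
        String id = req.getParameter("id");
        try (Connection con = DriverManager.getConnection(
                     "jdbc:mysql://localhost:3306/videodb", "john", "secret");
             PreparedStatement ps = con.prepareStatement(
                     "SELECT title, url FROM videos WHERE id = ?")) {
            ps.setString(1, id);
            try (ResultSet rs = ps.executeQuery()) {
                resp.setContentType("application/json");
                if (rs.next()) {
                    resp.getWriter().printf("{\"title\":\"%s\",\"url\":\"%s\"}",
                            rs.getString("title"), rs.getString("url"));
                } else {
                    resp.sendError(HttpServletResponse.SC_NOT_FOUND);
                }
            }
        } catch (SQLException e) {
            throw new IOException(e);
        }
    }
}
```

Splitting the logic out of the client like this is what later lets each tier be scaled independently.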
6. Demetris Trihinas
Online Video Streaming Service #2
• John’s family and friends like the video service. They are telling
their friends about it.
• Scalability
• John’s system cannot sustain the increasing number of users
• Replace application server, database server with “stronger” ones
• Buying new servers is expensive
• Maintenance
• Software/hardware updates/upgrades
• Cooling, Security, Backups, etc.
10 March 2015, University of Cyprus
7. Demetris Trihinas
Back in the old days…
10 March 2015, University of Cyprus
you had an idea… but no money… it was difficult to start an online business…
“In the early days (20 years ago), most new e-commerce sites, for example, cost a million dollars to set up. Now the price is closer to $100” [M. Zwilling, NY Times, Jul 2014]
• Infrastructure
• Hardware
• Software licences
• Maintenance
• Software updates
• Hardware upgrades
• Cooling
• Security
• Backups
Why was it so difficult in the past?
10. Demetris Trihinas
Cloud Computing
A model for enabling ubiquitous, convenient, on-demand
network access to a shared pool of configurable computing
resources (e.g., networks, servers, storage, applications, and
services) that can be rapidly provisioned and released with
minimal management effort or service provider interaction.
NIST definition, 2011
10 March 2015, University of Cyprus
source: http://csrc.nist.gov/publications/nistpubs/800-145/SP800-145.pdf
11. Demetris Trihinas 10 March 2015, University of Cyprus
http://cloudtweaks.com/2012/09/true-facts-to-help-you-talk-about-cloud-computing-in-the-social-scene/
14. Demetris Trihinas
Online Video Streaming Service #3
• John moves video service to the Cloud!
• He has learnt about Cloud application development
• Video service is now scalable
10 March 2015, University of Cyprus
[Diagram: clients upload/download video; a Load Balancer distributes client requests across the Application Server Tier, which stores/extracts video in the Database Backend; all resources are rented from a Cloud Provider]
15. Demetris Trihinas
Elasticity in Cloud Computing
• Ability of a system to expand or contract its dedicated resources
to meet the current demand
10 March 2015, University of Cyprus
[Plot: workload (req/s) over time, annotated: allocate resources to increase throughput, de-allocate unused resources to reduce cost, provision only the required resources]
Stakeholders state that elasticity (54%) and cost reduction (48%) are driving cloud adoption [FOC Survey 2013]
16. Demetris Trihinas
Elasticity Control
• MAPE-K control loop (Monitoring, Analysing, Planning, Executing using Knowledge); a minimal sketch follows below
10 March 2015, University of Cyprus
“…automatic resource provisioning is challenging due to the fact that monitoring and managing elastic cloud services is not a trivial task…”
[Diagram: the MAPE-K loop (Monitor, Analyse, Plan, Execute around shared Knowledge), driven by the resource utilization and application behaviour of the Elastic Cloud Service]
"Managing and Monitoring Elastic Cloud Applications", D. Trihinas and C. Sofokleous and N. Loulloudes and A. Foudoulis
and G. Pallis and M. D. Dikaiakos, 14th International Conference on Web Engineering (ICWE 2014), Toulouse, France 2014
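A minimal sketch of the MAPE-K loop in Java, with hypothetical interfaces (not CELAR’s actual controller):

```java
import java.util.Map;

// Hypothetical interfaces sketching the MAPE-K control loop; a real elasticity
// controller is considerably more sophisticated.
interface Monitor   { Map<String, Double> collectMetrics(); }          // M
interface Analyser  { String detectSymptom(Map<String, Double> m); }   // A
interface Planner   { Runnable planAction(String symptom); }           // P (action = Execute step)
interface Knowledge { void record(Map<String, Double> m, String symptom); }

public class MapeKLoop {
    private final Monitor monitor;
    private final Analyser analyser;
    private final Planner planner;
    private final Knowledge knowledge;

    MapeKLoop(Monitor m, Analyser a, Planner p, Knowledge k) {
        this.monitor = m; this.analyser = a; this.planner = p; this.knowledge = k;
    }

    public void runOnce() {
        Map<String, Double> metrics = monitor.collectMetrics();  // Monitor
        String symptom = analyser.detectSymptom(metrics);        // Analyse
        knowledge.record(metrics, symptom);                      // update Knowledge
        if (symptom != null) {
            planner.planAction(symptom).run();                   // Plan + Execute
        }
    }
}
```

The Knowledge component is exactly what ADVISE (later in the talk) builds on: decisions informed by history rather than by the latest sample alone.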
17. Demetris Trihinas
Online Video Streaming Service #5
• John decides to use an elasticity controller to scale his application
10 March 2015, University of Cyprus
[Diagram: clients upload/download video; a Load Balancer distributes client requests across the Application Server Tier and Database Backend on the Cloud Provider; an Elasticity Controller adds/removes resources based on directives such as:]
if (metricA > X) then add VM
else if (metricA < X) then remove VM
else if (metricB > Y) then increase RAM
…
Elasticity constraints are too complex for users and are based on low-level metrics.
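Coded naively, directives like the ones above amount to little more than the following (thresholds and actions are illustrative placeholders):

```java
// A naive, rule-based elasticity controller: scale purely on metric thresholds.
public class SimpleElasticityRules {
    static final double X = 80.0;   // e.g. CPU usage (%)
    static final double Y = 90.0;   // e.g. memory usage (%)

    static String decide(double metricA, double metricB) {
        if (metricA > X) return "ADD_VM";
        if (metricA < X) return "REMOVE_VM";   // no hysteresis: decisions flap around X
        if (metricB > Y) return "INCREASE_RAM";
        return "NO_ACTION";
    }
}
```

The user has to pick X and Y, reason about low-level metrics, and still gets the “ping-pong” behaviour discussed later in the talk.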
18. Demetris Trihinas
Current Elasticity Controllers
10 March 2015, University of Cyprus
• Manual or semi-automated elasticity control
• Vendor-specific autoscaling
• Elasticity modelled as a one-dimensional property
• No control over cost, performance and quality
• Only fine-grained elasticity control
• e.g. add/remove virtual instances
19. Demetris Trihinas 10 March 2015, University of Cyprus
• Fully Automated Intelligent Decision-Making Algorithms
• Application Management
• Vendor Neutrality
• Multi-layer Scalable Monitoring
• Multi-Dimensional Control
• Open-Source
• Multi-Grain Elasticity Control
www.celarcloud.eu
22. Demetris Trihinas
CAMF
• A Cloud Application Management Framework providing developers with a complete set of graphical tools for:
• Describing cloud application topologies
• Defining elasticity requirements and scaling actions
• Deploying cloud application description(s) on any cloud platform, by adopting the open OASIS TOSCA standard
• Managing the complete lifecycle of a cloud application
• Open-source (built on top of the Eclipse Rich Client Platform)
10 March 2015, University of Cyprus
23. Demetris Trihinas
Cloud Application Management
• Emerging technology
• CSC acquired ServiceMesh for $350M
10 March 2015, University of Cyprus
• Current frameworks (e.g. AWS CloudFormation) fall short in:
• Application portability: “describe once, deploy anywhere”
• “vendor neutrality (interoperability) is one of the main challenges in cloud application management” [Gartner, CMP Landscape 2012]
24. Demetris Trihinas
CAMF
10 March 2015, University of Cyprus
“c-Eclipse: An Open-Source Management Framework for Cloud Applications", C. Sofokleous, N. Loulloudes, D.Trihinas, G.Pallis and
M. D. Dikaiakos, 20th International Conference on Parallel Processing (Euro-Par 2014), Porto, Portugal 2014
27. Demetris Trihinas
Elasticity Policy Definition
• Multi-grain elasticity policy definition
• Application’s constraints related to cost, performance and quality metrics
• Express specific strategies to be enforced when constraints are violated
• Based on the powerful and flexible SYBL definition language (an illustrative, non-SYBL sketch follows below)
10 March 2015, University of Cyprus
Elasticity Policy View -> No knowledge of SYBL is required!
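SYBL itself is a dedicated policy language; purely as an illustration of what a multi-grain policy expresses (plain Java, not SYBL syntax, and all names are hypothetical):

```java
// Illustrative only: constraints on cost/performance plus strategies to enforce
// when they are violated. This is NOT SYBL syntax.
public class ElasticityPolicyExample {
    record Constraint(String id, String metric, String op, double threshold) {}
    record Strategy(String onViolationOf, String action) {}

    public static void main(String[] args) {
        Constraint c1 = new Constraint("c1", "responseTime", "<", 200);   // ms  (performance)
        Constraint c2 = new Constraint("c2", "monthlyCost", "<", 400);    // EUR (cost)
        Strategy s1 = new Strategy("c1", "scaleOutApplicationServerTier");
        Strategy s2 = new Strategy("c2", "scaleInDatabaseTier");
        System.out.println(c1 + " " + s1 + "; " + c2 + " " + s2);
    }
}
```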
28. Demetris Trihinas
Cloud Provider Selection
• Users can:
– Select a Cloud provider to deploy their application(s)
– Add a new provider to the list by providing their CELAR endpoint and authentication
credentials
10 March 2015, University of Cyprus
30. Demetris Trihinas 10 March 2015, University of Cyprus
The status of the deployments is shown in the Application Deployments View
John’s Video Service Description via CAMF
32. Demetris Trihinas
Cloud Monitoring Challenges
• Monitor heterogeneous types of information and resources
• Extract metrics from multiple levels of the cloud
• Low-level metrics (e.g. CPU usage, network traffic)
• High-level metrics (e.g. application throughput, latency, availability)
• Metrics collected at different time granularities
• Non-intrusiveness
10 March 2015, University of Cyprus
33. Demetris Trihinas
Cloud Monitoring Challenges
• Cloud Platform Independence
• If a cloud service is portable then it can be moved to another
platform due to better pricing schemes, availability, QoS, etc.
• Monitoring System?
• Portable
• Easily configurable on new platform
10 March 2015, University of Cyprus
[Diagram: a portable cloud service and its monitoring system moved from Provider A to Provider B]
Vendor lock-in concerns have dropped 45% [GIGAOM 2014]
34. Demetris Trihinas
Cloud Monitoring Challenges
• Interoperability
• Distribute a cloud service across multiple providers due to
better resource locality, availability or security concerns
• Monitoring System?
• Operate and collect metrics seamlessly across multiple providers
10 March 2015, University of Cyprus
[Diagram: a cloud service distributed across Provider A, Provider B and Provider C, with monitoring operating across all of them]
42% are interested in adopting hybrid cloud, estimated to rise to 55% by 2016 [GIGAOM 2014]
35. Demetris Trihinas
Cloud Monitoring Challenges
• Elasticity Support
• Detect configuration changes in a cloud service
• Monitoring System?
• Detect configuration changes automatically without restarting
monitoring process or part of it and without any human intervention
10 March 2015, University of Cyprus
[Diagram: a cloud service scaling from N to N+1 VMs]
• Application topology changes (e.g. new VM added)
• Allocated resource changes (e.g. new disk attached to a VM)
36. Demetris Trihinas
Cloud Specific Monitoring Tools
• Public and Private cloud providers offer
monitoring capabilities
• Fully documented
• Well integrated with underlying platform
10 March 2015, University of Cyprus
• REST APIs and graphical web interfaces
• Automated notification and alerting mechanisms
• Commercial and proprietary -> limited portability and
interoperability
37. Demetris Trihinas
JCatascopia Monitoring System
Open-source
Multi-Layer Cloud Monitoring
• Customizable and Extensible by Users
• Metric Subscription Rule Language and Mechanism
Platform Independent
• Operates on any cloud platform, since metric collection, distribution and storage do not depend on the underlying infrastructure
10 March 2015, University of Cyprus
38. Demetris Trihinas
JCatascopia Monitoring System
Interoperable
• Support for application distributed across multiple cloud platforms
Capable of Supporting Elastic Cloud Services
• JCatascopia Pub/Sub Message Communication Protocol
Scalable
10 March 2015, University of Cyprus
"JCatascopia: Monitoring Elastically Adaptive Applications in the Cloud", D. Trihinas and G. Pallis and M. D.
Dikaiakos, 14th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid 2014), 2014
39. Demetris Trihinas
JCatascopia Pub/Sub Message
Communication Protocol
• Elasticity support
• Automatic monitoring instance discovery and removal
• Dynamic resource configuration (e.g. new disk is attached at runtime)
• Dynamic network interface change at runtime (e.g. elastic IP); a generic discovery sketch follows below
10 March 2015, University of Cyprus
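A generic sketch of how such elasticity support can be realised over pub/sub (illustrative only; this is not JCatascopia’s actual protocol or API): agents publish control messages when they join, leave, or are reconfigured, and the server updates its registry without restarting.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Generic pub/sub-style discovery sketch -- NOT JCatascopia's real protocol.
public class DiscoveryServer {
    private final Map<String, String> agents = new ConcurrentHashMap<>(); // agentId -> endpoint

    // Invoked by the messaging layer for every message on a control topic.
    public void onControlMessage(String type, String agentId, String endpoint) {
        switch (type) {
            case "AGENT_UP"   -> agents.put(agentId, endpoint);   // e.g. new VM added
            case "AGENT_DOWN" -> agents.remove(agentId);          // e.g. VM released
            case "RECONFIG"   -> agents.put(agentId, endpoint);   // e.g. elastic IP changed
            default           -> { /* ignore unknown control messages */ }
        }
    }

    public int activeAgents() { return agents.size(); }
}
```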
40. Demetris Trihinas
Multi-Tier Monitoring
10 March 2015, University of Cyprus
JCatascopia Metric Rule Language and Mechanism, example rules:
avgActiveConnections = AVG(busyThreads), MEMBERS = [id1, ... ,idN], ACTION = NOTIFY(<70, >=140)
avgCPUUsage = AVG(1-cpuIdle), MEMBERS = [id1, ... ,idN], ACTION = NOTIFY(<30, >=85)
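Roughly, the second rule above aggregates a metric over its member instances and notifies subscribers when the aggregate crosses the configured bounds. A hedged sketch of that evaluation (not the actual rule engine):

```java
import java.util.List;

// Sketch of evaluating a rule such as:
//   avgCPUUsage = AVG(1-cpuIdle), MEMBERS = [id1, ... ,idN], ACTION = NOTIFY(<30, >=85)
public class MetricRuleSketch {
    static void evaluateAvgCpuRule(List<Double> cpuIdlePerMember) {
        double avgCpuUsage = cpuIdlePerMember.stream()
                .mapToDouble(idle -> 1.0 - idle)   // AVG(1 - cpuIdle)
                .average().orElse(0.0) * 100;      // as a percentage

        if (avgCpuUsage < 30)  notifySubscriber("under-utilised", avgCpuUsage);
        if (avgCpuUsage >= 85) notifySubscriber("over-utilised", avgCpuUsage);
    }

    static void notifySubscriber(String state, double value) {
        System.out.printf("NOTIFY %s (avgCPUUsage=%.1f%%)%n", state, value);
    }
}
```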
41. Demetris Trihinas
JCatascopia: Portability and Interoperability
Use cases: XDB In-Memory Data Analytics, SCAN Genome Pipeline, Multi-Graph Clustering in the Cloud, Online Gaming, Multi-Tier Video Streaming
10 March 2015, University of Cyprus
42. Demetris Trihinas
JCatascopia: Advance over State-of-the-Art
[Charts: Monitoring Agent runtime footprint for a 3-tier Video Streaming Service, per node: HAProxy Load Balancer, Cassandra DB Node, Tomcat Application Server, Online Directory Node]
As the metric count increases, Ganglia doubles its runtime footprint, since custom application-specific metrics run as external processes; in JCatascopia, Probes are loaded as lightweight Java threads (see the sketch below).
10 March 2015, University of Cyprus
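To illustrate why in-process probes stay lightweight (an illustrative sketch; the real JCatascopia Probe API differs), an application-specific probe can simply be a periodic task scheduled on a thread inside the agent’s JVM, rather than a separate external process:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Illustrative probe running as a lightweight thread inside the monitoring
// agent's JVM. An external-process collector would instead pay fork/exec and
// inter-process communication costs for every custom metric.
public class BusyThreadsProbe implements Runnable {
    @Override
    public void run() {
        double busyThreads = sampleTomcatBusyThreads();   // hypothetical collection call
        System.out.println("busyThreads=" + busyThreads);
    }

    private double sampleTomcatBusyThreads() {
        return Math.random() * 200;   // placeholder: a real probe would query Tomcat (e.g. via JMX)
    }

    public static void main(String[] args) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(new BusyThreadsProbe(), 0, 10, TimeUnit.SECONDS);
    }
}
```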
43. Demetris Trihinas
JCatascopia: Advance over State-of-the-Art
When application-level monitoring is needed, JCatascopia can, for a small runtime overhead, reduce monitoring network traffic and consequently monitoring cost.
[Chart: network utilization for the 3-tier Video Streaming Service]
10 March 2015, University of Cyprus
48. Demetris Trihinas
JCatascopia: Scalability Evaluation
When archiving time is high, we can direct monitoring metric traffic through multiple Monitoring Servers, allowing the monitoring system to scale (a simple partitioning sketch follows below).
[Diagram: Monitoring Agents (A) on web-service nodes #1..#K+1 forward metrics to multiple Monitoring Servers (MS); additional Monitoring Servers can be added to the cluster to elastically control JCatascopia itself]
10 March 2015, University of Cyprus
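One simple way to realise this (a hedged sketch with assumed names, not the CELAR implementation; a production system would likely prefer consistent hashing) is to partition agents across Monitoring Servers by hashing the agent id:

```java
import java.util.List;

// Illustrative partitioning of Monitoring Agents across Monitoring Servers.
// A new server can be appended to the list when archiving time grows too high.
public class AgentPartitioner {
    static String serverFor(String agentId, List<String> monitoringServers) {
        int idx = Math.floorMod(agentId.hashCode(), monitoringServers.size());
        return monitoringServers.get(idx);
    }

    public static void main(String[] args) {
        List<String> servers = List.of("ms-1:4242", "ms-2:4242");  // scale out by adding entries
        System.out.println(serverFor("agent-node-17", servers));
    }
}
```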
49. Demetris Trihinas
JCatascopia: Release and Exploitation
• Open-source under Apache 2.0 Licence
• JCatascopia Website (docs, examples, videos, publications, etc.)
• Packaging (JARs, tarballs, RPMs and Chef recipes) available in CELAR
repo
• JCatascopia Probe Library and Java Probe API
• System-level monitoring probes (for both Linux and Windows)
• Application-specific probes (Tomcat, Cassandra DB, HAProxy, Postgres DB,
RabbitMQ)
• Supporting 2 Different Database Backends (MySQL, Cassandra DB)
https://github.com/CELAR/cloud-ms
http://linc.ucy.ac.cy/CELAR/jcatascopia
https://github.com/dtrihinas/JCatascopia-Probe-Library
10 March 2015, University of Cyprus
50. Demetris Trihinas
So is simple elasticity control based on user-defined directives enough?
10 March 2015, University of Cyprus
51. Demetris Trihinas
Elasticity Control Estimation and Evaluation
10 March 2015, University of Cyprus
• How should we interpret a sudden drop in request throughput
at the business tier of a 3-tier cloud service?
• There are fewer clients, which makes the business tier inefficiently utilized
• Right Decision: Remove an Application Server
• Video storage backend is under-provisioned,
requests are getting queued at business tier
• Right Decision: Add another Database Node
An Elasticity Controller with simple IF-THEN-ELSE policies based on metric violations cannot determine the right ECP (Elasticity Control Process) to improve QoS or cost.
52. Demetris Trihinas
ADVISE Framework
Input
• Cloud Service topology description (CAMF)
• Multi-layer monitoring metric evolution (JCatascopia)
• Elasticity Control Processes (rSYBL)
• Cloud-specific info (Info Service)
10 March 2015, University of Cyprus
Processing
• Project metric evolution onto an n-dimensional space
• Cluster metrics and discover (or better, learn) metric correlations (see the sketch below)
• Create an execution plan based on historic info to improve resource utilization and QoS and to reduce cost
Knowledge Base
• Metric evolution
• Metric correlations
• ECPs and possible plans
-> Collect more metrics
-> Refine clusters and discover new correlations
-> Increase our knowledge base
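As a small, self-contained example of the “learn metric correlations” step (illustrative only; ADVISE’s actual algorithms are described in the cited papers), the Pearson correlation between two monitored metric series can be computed like this:

```java
// Pearson correlation between two metric evolutions (e.g. request throughput
// vs. database I/O wait), illustrating the "discover metric correlations" step.
public class MetricCorrelation {
    static double pearson(double[] x, double[] y) {
        int n = x.length;
        double sx = 0, sy = 0, sxx = 0, syy = 0, sxy = 0;
        for (int i = 0; i < n; i++) {
            sx += x[i]; sy += y[i];
            sxx += x[i] * x[i]; syy += y[i] * y[i]; sxy += x[i] * y[i];
        }
        double cov  = sxy - sx * sy / n;
        double varX = sxx - sx * sx / n;
        double varY = syy - sy * sy / n;
        return cov / Math.sqrt(varX * varY);
    }

    public static void main(String[] args) {
        double[] throughput = {120, 150, 90, 60, 40};   // hypothetical metric evolutions
        double[] ioWait     = { 10,  12, 35, 55, 70};
        System.out.printf("corr = %.2f%n", pearson(throughput, ioWait));   // strongly negative
    }
}
```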
53. Demetris Trihinas
Elasticity Control Estimation and Evaluation
with ADVISE
10 March 2015, University of Cyprus
"ADVISE: a Framework for Evaluating Cloud Service Elasticity Behavior [Best Paper]", G. Copil, D. Trihinas, H.L Truong, D. Moldovan, G.
Pallis, S. Dustdar, M. D. Dikaiakos, 12th International Conference on Service Oriented Computing (ICSOC 2014), Paris, France 2014.
54. Demetris Trihinas
ADVISE-based Multi-Dimensional Control
A single peak causes a “ping-pong” effect, which bills users for resources they aren’t really consuming (note that AWS uses an hourly charge rate).
10 March 2015, University of Cyprus
[Chart: ADVISE-based Control]
“Evaluating Cloud Service Elasticity Behavior", G. Copil, D. Trihinas, H.L Truong, D. Moldovan, G. Pallis, S. Dustdar, M. D. Dikaiakos,
International Journal of Cooperative Information Systems (IJCIS), 2015.
55. Demetris Trihinas
So is CELAR applicable anywhere else other than video streaming?
10 March 2015, University of Cyprus
56. Demetris Trihinas
Use Case: Cancer Genome Detection
• Process large amounts of genomic and proteomic data
10 March 2015, University of Cyprus
[Diagram: pipeline stages that are CPU and disk I/O intensive, memory intensive, disk I/O and memory intensive, and disk I/O, CPU and network intensive]
• Old approach
• Provision an HPC cluster with max capacity
57. Demetris Trihinas
Acknowledgements
10 March 2015, University of Cyprus
www.celarcloud.eu
co-funded by the
European Commission
source code: https://github.com/CELAR/
website: http://linc.ucy.ac.cy/CELAR/