TUW-ASE-Summer 2014: Advanced service-based data analytics: concepts and designs

  1. Advanced service-based data analytics: concepts and designs
     Hong-Linh Truong, Distributed Systems Group, Vienna University of Technology
     truong@dsg.tuwien.ac.at, dsg.tuwien.ac.at/staff/truong
     Advanced Services Engineering (ASE), Summer 2014, Lecture 7
  2. Outline
     • Principles of elasticity for advanced service-based data analytics
     • Data analytics within a single system
     • Data analytics across multiple systems
     • Composable cost evaluation
  3. PRINCIPLES OF ELASTICITY FOR DATA ANALYTICS
  4. Advanced service-based data analytics (1)
     [Figure, labels only] Analytics services: near-real-time analytics, predictive data analytics, visual analytics, enterprise resource planning, emergency management, tracking/logistics, infrastructure monitoring.
     Data sources: infrastructure/Internet of Things, e.g. cities including 10000+ buildings and 1000000+ sensors.
     Boundaries shown: Internet/public cloud boundary, organization-specific boundary.
  5. Advanced service-based data analytics (2)
     Soil moisture analysis for Sentinel-1
     • A lot of input data (L0): ~2.7 TB per day
     • A lot of results (L1, L2): e.g., L1 has ~140 MB per day for a grid of 1 km x 1 km
     • Data-as-a-Service and Platform-as-a-Service in clouds
     Source: Michael Hornacek, Wolfgang Wagner, Daniel Sabel, Hong-Linh Truong, Paul Snoeij, Thomas Hahmann, Erhard Diedrich, Marcela Doubkova, Potential for High Resolution Systematic Global Surface Soil Moisture Retrieval Via Change Detection Using Sentinel-1, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, April 2012
  6. Advanced service-based data analytics: fundamental concepts
     [Figure, labels only] Applications: Part A, Part B, ..., Part N.
     System infrastructures: Cluster, Grid, Local Cloud, Public cloud/Sky.
     Domains: Domain 1, Domain 2, ..., Domain n.
  7. Design questions
     Part = a (composite) service unit
     • Which system infrastructures are used?
     • What are the functions of the units?
     • Which interfaces are suitable for the units?
     • Which programming models are used within the units?
     • Which fundamental units are to be used?
     • How do different units interact?
     • Which non-functional parameters are important, and how can they be measured?
  8. Fundamental concepts: system infrastructure unit
     System infrastructures:
     • Cloud: software-based cloud system, human-based cloud system
     • Cluster
     • Grid
     • High Performance Server
  9. Fundamental concepts: unit functions
     Function:
     • Front-end/Presentation
     • Data Analytics Service
     • Visualization Service
     • Middleware: Enterprise Service Bus, Publish/Subscribe, Messaging/Queuing, Data Transfer
  10. Fundamental concepts: programming model within units
     Programming model:
     • MapReduce
     • MPI
     • Parallel Database
     • Workflow
     • Other solutions
  11. Fundamental concepts: interfaces between units
     Interface:
     • Standard: REST, SOAP
     • APIs: specific APIs, standard APIs (e.g. OpenStack)
     • Interaction: pull, push
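The pull and push interaction styles named on slide 11 can be contrasted with a small sketch. It is only an illustration under stated assumptions: the `DataUnit` class and its `subscribe`, `publish`, and `fetch` methods are hypothetical names introduced here, not an API from the lecture, and everything runs in one process instead of across service units.

```python
from typing import Callable, List


class DataUnit:
    """Hypothetical in-process data unit that supports both interaction styles."""

    def __init__(self) -> None:
        self._items: List[str] = []
        self._subscribers: List[Callable[[str], None]] = []

    def subscribe(self, callback: Callable[[str], None]) -> None:
        # Push style: the consumer registers once and is notified on every new item.
        self._subscribers.append(callback)

    def publish(self, item: str) -> None:
        # Store the item, then push it to every registered subscriber.
        self._items.append(item)
        for callback in self._subscribers:
            callback(item)

    def fetch(self, offset: int = 0) -> List[str]:
        # Pull style: the consumer decides when to ask and from which offset.
        return self._items[offset:]


unit = DataUnit()
pushed: List[str] = []
unit.subscribe(pushed.append)   # push consumer
unit.publish("reading-1")
unit.publish("reading-2")
pulled = unit.fetch(offset=1)   # pull consumer asks for items after the first
```

With push, the data unit drives delivery; with pull, the analytics unit controls timing and volume, which matters when the two units sit in different systems.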
  12. Fundamental concepts: services and data concerns
     Service and data concerns:
     • Data concerns: quality of data, pricing, data rights, ...
     • Service concerns: QoS, pricing, ...
  13. Complex dependencies in (big) data analytics
     • More data → more computational resources (e.g. more VMs)
     • More types of data → more computational models → more analytics processes
     • Change quality of results
       • Change quality of data
       • Change response time
       • Change cost
       • Change types of result (form of the data output, e.g. tree, visual, story, etc.)
     [Figure, labels only] Data (Data x, Data y, Data z), Computational Model, Analytics Process, Analytics Result, Quality of Result.
     Source: Hong-Linh Truong, Schahram Dustdar, "Principles of Software-defined Elastic Systems for Big Data Analytics", (c) IEEE Computer Society, IEEE International Workshop on Software Defined Systems, 2014 IEEE International Conference on Cloud Engineering (IC2E 2014), Boston, Massachusetts, USA, 10-14 March 2014
  14. Complex dependencies in (big) data analytics
     Elasticity principles can be used to support this!
  15. Elasticity Principles: Elasticity of data and computational models
     • Multiple types of objects from different sources with complex dependencies, relevancies, and quality
     • Different data and computational models for the same analytics subject
     • New analytics subjects can be defined and analytics goals can be changed
     • Decide/select/define/compose not only computational models for analytics subjects but also data models based on existing ones
     Key point: management and modeling of the elasticity of data and computational models during the analytics
  16. Elasticity Principles: Elasticity of data resources
     • Data provided, managed and shared by different providers
     • Data associated with different concerns (cost, quality of data, privacy, contract, etc.)
     • Static data, open data, data-as-a-service, opportunistic data (from sensors and human sensing)
     • Not just centralized big data and total data ownership
     Key point: data resources can be taken into account in an elastic manner, similar to VMs, based on their quality, relevancy, pricing, etc.
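One way to read "data resources can be taken into account in an elastic manner" is as a selection problem over quality, relevancy, and price. The sketch below is an assumption made for illustration, not the method from the lecture: the `DataResource` fields, the greedy strategy in `select_resources`, and all numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DataResource:
    name: str
    quality: float    # assumed score in [0, 1]
    relevancy: float  # assumed score in [0, 1]
    price: float      # assumed cost of using the resource


def select_resources(resources: List[DataResource],
                     budget: float,
                     min_quality: float) -> List[DataResource]:
    """Greedy pick: drop low-quality resources, then add by relevancy while the budget allows."""
    eligible = [r for r in resources if r.quality >= min_quality]
    eligible.sort(key=lambda r: r.relevancy, reverse=True)
    chosen: List[DataResource] = []
    spent = 0.0
    for resource in eligible:
        if spent + resource.price <= budget:
            chosen.append(resource)
            spent += resource.price
    return chosen


resources = [
    DataResource("open_data", quality=0.6, relevancy=0.5, price=0.0),
    DataResource("daas_feed", quality=0.9, relevancy=0.9, price=5.0),
    DataResource("sensor_feed", quality=0.4, relevancy=0.8, price=1.0),
    DataResource("static_archive", quality=0.8, relevancy=0.7, price=3.0),
]
selected = [r.name for r in select_resources(resources, budget=6.0, min_quality=0.5)]
```

Rerunning the selection when the budget or the quality threshold changes gives the scale-in and scale-out behaviour for data resources that the slide compares to VMs.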
  17. Elasticity Principles: Elasticity of humans and software as computing units
     • Human in the loop to solve analytics tasks that software cannot solve
     • Human-based compute units can be scaled up/down with different cost, availability, and performance models
     • Human-based compute units + software-based compute units for executing computational models
     • Elasticity controls can also be performed by humans
     Key point: provisioning hybrid compute units in an elastic way for computational/data/network tasks as well as for monitoring/control tasks in the analytics process
  18. Elasticity Principles: Elasticity of quality of results
     • Definition of quality of results
       • Trade-offs of time, cost, quality of data, forms of output
     • Using quality of results to select suitable computational models, data resources, computing units
     • Multi-level control for the elasticity based on quality of results
     Key point: able to cope with changes in quality of data, performance, cost and types of results at runtime
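The use of quality of results to select a computational model can be sketched as a constrained choice over time, cost, and quality. This is a minimal illustration under assumptions: `ModelOption`, `pick_model`, and the three options with their numbers are made up for the example and do not come from the lecture.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ModelOption:
    name: str
    time_s: float   # assumed expected response time in seconds
    cost: float     # assumed expected cost per run
    quality: float  # assumed expected quality of result in [0, 1]


def pick_model(options: List[ModelOption],
               max_time_s: float,
               max_cost: float) -> Optional[ModelOption]:
    """Return the highest-quality option that meets the time and cost limits, if any."""
    feasible = [o for o in options if o.time_s <= max_time_s and o.cost <= max_cost]
    return max(feasible, key=lambda o: o.quality) if feasible else None


options = [
    ModelOption("sampled", time_s=60, cost=1.0, quality=0.70),
    ModelOption("incremental", time_s=180, cost=3.0, quality=0.85),
    ModelOption("full", time_s=600, cost=8.0, quality=0.95),
]
relaxed = pick_model(options, max_time_s=300, max_cost=5.0)
tight = pick_model(options, max_time_s=100, max_cost=5.0)  # limits tightened at runtime
```

Calling `pick_model` again whenever the limits change is a simple stand-in for the runtime adaptation the slide describes.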
  19. WE NEED TO START FROM DATA ANALYTICS WITHIN A SINGLE SYSTEM
  20. Data analytics within a single system
     • They are complex enough but do not meet all requirements
     • In a single domain
     • Tightly coupled computing infrastructures
       • E.g., in the same cloud
     • Computation and data are close
     • Several concerns can be bypassed
     [Figure, labels only] Domain A containing a Data Service Unit and a Data Analytics Unit; not always provisioned under the "Service Unit" model.
  21. Data analytics within a single system
     Some papers:
     1. Andrew Pavlo, Erik Paulson, Alexander Rasin, Daniel J. Abadi, David J. DeWitt, Samuel Madden, and Michael Stonebraker. 2009. A comparison of approaches to large-scale data analysis. In Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data (SIGMOD '09), Carsten Binnig and Benoit Dageville (Eds.). ACM, New York, NY, USA, 165-178. DOI=10.1145/1559845.1559865 http://doi.acm.org/10.1145/1559845.1559865
     2. Leonardo Neumeyer, Bruce Robbins, Anish Nair, Anand Kesari: S4: Distributed Stream Computing Platform. ICDM Workshops 2010: 170-177
     3. Jerry Chou, Mark Howison, Brian Austin, Kesheng Wu, Ji Qiang, E. Wes Bethel, Arie Shoshani, Oliver Rübel, Prabhat, and Rob D. Ryne. 2011. Parallel index and query for large scale data analysis. In Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC '11). ACM, New York, NY, USA, Article 30, 11 pages. DOI=10.1145/2063384.2063424 http://doi.acm.org/10.1145/2063384.2063424
     4. Boduo Li, Edward Mazur, Yanlei Diao, Andrew McGregor, Prashant J. Shenoy: A platform for scalable one-pass analytics using MapReduce. SIGMOD Conference 2011: 985-996
     5. Fabrizio Marozzo, Domenico Talia, Paolo Trunfio: A Cloud Framework for Parameter Sweeping Data Mining Applications. CloudCom 2011: 367-374
     6. Yingyi Bu, Bill Howe, Magdalena Balazinska, Michael D. Ernst: HaLoop: Efficient Iterative Data Processing on Large Clusters. PVLDB 3(1): 285-296 (2010)
  22. Data analytics within a single system: some examples
     • Message Passing Interface (MPI) + cluster-based file system
     • MapReduce + Google File System
     • Hadoop + HDFS
     • Dryad + LINQ
     • Parallel Database (SQL/NoSQL)
     • Yahoo S4
     • Workflow
     A short, good overview is in Chapter 6, "Cloud Programming and Software Environments", of the book Distributed and Cloud Computing: From Parallel Processing to the Internet of Things, Kai Hwang, Geoffrey C. Fox and Jack J. Dongarra, Morgan Kaufmann, 2012
  23. Discussion time
     • WHY SHOULD ANALYTICS UNITS BE CLOSE TO DATA UNITS?
     • WHICH CONCERNS COULD BE IGNORED IN SINGLE-SYSTEM DATA ANALYTICS?
     • WHICH ISSUES DO WE NEED TO CONSIDER WHEN OUR DATA UNITS ARE IN DIFFERENT SYSTEMS?
  24. Data analytics across multiple systems: design choices
     • Programming models for the data analytics service
     • Data service units
     • Supporting middleware units
     Dimensions considered: programming model, system infrastructure, interface.
  25. Data analytics across multiple systems: programming models (1)
     Static data
     [Figure, labels only] Local input data (input data) → Analytics → Results (output data).
     Analytics options: MapReduce/Hadoop, Workflow, MPI, other solutions, running on Servers/Cloud/Cluster.
     What are our design concerns?
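For the static-data case, the MapReduce programming model named on slide 25 can be shown with an in-memory word count. This is a teaching sketch that only mimics the map, shuffle, and reduce phases in plain Python; it is not Hadoop code, and the function names and sample records are assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(record: str) -> List[Tuple[str, int]]:
    # Map: emit a (word, 1) pair for every word in one input record.
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Shuffle: group all emitted values by key.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    # Reduce: combine the values of one key into a single result.
    return key, sum(values)


def run_job(records: Iterable[str]) -> Dict[str, int]:
    pairs = [pair for record in records for pair in map_phase(record)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


counts = run_job(["sensor a ok", "sensor b fail", "sensor a ok"])
```

In a real deployment the map and reduce calls run on different nodes next to the data, which is where the design concerns about input and output data placement come in.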
  26. Data analytics across multiple systems: programming models (2)
     Near-realtime data
     [Figure, labels only] Input data (stock market, social media, M2M) → Analytics → Results (output data).
     Analytics options: complex event processing, stream data analysis, other solutions, running on Servers/Cloud/Cluster.
     What are our design concerns?
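Stream data analysis, one of the options listed for near-realtime data, typically processes items one at a time over a bounded window. The sketch below shows a sliding-window average as one simple instance; the function name `sliding_average` and the sample values are assumptions for illustration only.

```python
from collections import deque
from typing import Iterable, Iterator


def sliding_average(stream: Iterable[float], window: int) -> Iterator[float]:
    """Yield the average of the most recent `window` values after each new item."""
    recent = deque(maxlen=window)  # older values fall out automatically
    for value in stream:
        recent.append(value)
        yield sum(recent) / len(recent)


averages = list(sliding_average([10, 20, 30, 40], window=2))
```

Because the function is a generator, results are produced as data arrives and memory use stays bounded by the window size, unlike the batch style used for static data.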
  27. Data analytics across multiple systems: programming models (3)
     Near-realtime data
     [Figure, labels only] Input data (big data, e.g., satellite images) → Analytics → Results (output data).
     Analytics options: MPI, Workflow, other solutions, running on Servers/Cloud/Cluster.
     What are our design concerns?
  28. Data analytics across multiple systems: data service units
     Cluster file systems: Lustre, NFS, Hadoop File System, Google File System
     Data Analytics Unit ↔ cluster file system: read/write data
     Interface:
     • Read/write data directly through low-level IO
     System:
     • Cluster or cluster of clusters
     • Can be very large
     Programming model:
     • Usually parallel processing
  29. Data analytics across multiple systems: data service units
     Storage-as-a-Service: Google Storage Service (REST API), Amazon S3 (SOAP/REST API)
     Data Analytics Unit ↔ storage service: commands, data
     Interface:
     • Direct data transfer via REST/SOAP APIs
     System:
     • Analytics and storage are decoupled
     Programming model:
     • May require middleware for data transfer
       • Request via SOAP/REST
       • Real data transfer done by external middleware
     • A rich set of programming models can be used
  30. Data analytics across multiple systems: data service units
     Database-as-a-Service technology: SkySQL, Amazon RDS, Microsoft SQL Azure, Clustrix DBaaS; MongoDB/MongoLab, Amazon DynamoDB, Amazon SimpleDB, Cloudant Data
     Data Analytics Unit ↔ database service: queries, data
     Interface:
     • REST/SOAP APIs
     • Mainly for commands and results
     System:
     • Analytics unit and database are decoupled
     • Database-as-a-service can be very large
     Programming model:
     • Analytics can be done on both sides
       • Analytics units can use any programming model
       • Database-as-a-service can perform a lot of analytics
     • Parallel database operations
  31. Data analytics across multiple systems: data service units
     DaaS technology: Infochimps, Microsoft Azure, Xively, GNIP
     Interface:
     • Data transfer can be uni- or bi-directional
     • REST/SOAP APIs
     System:
     • Both the DaaS systems and the systems for analytics units can be very large
     Programming model:
     • Can be any
  32. Middleware service unit for transferring large data: GlobusOnline
     Source: Bryce Allen, John Bresnahan, Lisa Childers, Ian Foster, Gopi Kandaswamy, Raj Kettimuthu, Jack Kordas, Mike Link, Stuart Martin, Karl Pickett, and Steven Tuecke. 2012. Software as a service for data scientists. Commun. ACM 55, 2 (February 2012), 81-88. DOI=10.1145/2076450.2076468 http://doi.acm.org/10.1145/2076450.2076468
  33. Middleware service unit for transferring large data: ProxyWS
     Source: Spiros Koulouzis, Reginald Cushing, K. A. Karasavvas, Adam Belloum, Marian Bubak: Enabling Web Services to Consume and Produce Large Datasets. IEEE Internet Computing 16(1): 52-60 (2012)
  34. Middleware service units for messages/queuing
     • Advanced Message Queuing Protocol (AMQP)
     • Simple (or Streaming) Text Orientated Messaging Protocol (STOMP)
     • Specific protocols/APIs
     Examples: Amazon SQS, StormMQ, RabbitMQ
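The role of a messaging/queuing middleware unit is to decouple the unit that produces data from the unit that analyzes it. The sketch below illustrates that decoupling with Python's standard `queue.Queue` and a worker thread. It is a stand-in chosen so the example runs without a broker; it does not speak AMQP or STOMP, and the message strings and function names are assumptions.

```python
import queue
import threading
from typing import List

SENTINEL = None  # assumed end-of-stream marker for this sketch


def producer(q: "queue.Queue", messages: List[str]) -> None:
    # The producing unit only knows the queue, not who consumes the messages.
    for message in messages:
        q.put(message)
    q.put(SENTINEL)


def consumer(q: "queue.Queue", results: List[str]) -> None:
    # The analytics unit processes messages at its own pace.
    while True:
        message = q.get()
        if message is SENTINEL:
            break
        results.append(message.upper())


q: "queue.Queue" = queue.Queue()
results: List[str] = []
worker = threading.Thread(target=consumer, args=(q, results))
worker.start()
producer(q, ["img-001 ready", "img-002 ready"])
worker.join()
```

With a real broker such as RabbitMQ or Amazon SQS, the queue lives in a separate service, so producer and consumer can also sit in different systems and fail independently.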
  35. SOME EXAMPLES OF COMPLEX DATA ANALYTICS SERVICES
  36. The SMAD distributed processing architecture
  37. Different possibilities: Grids and/or clouds
     • Raw images stored in an archive: iRODS, HTTP server, or Amazon S3
     • Notification
       • Any queuing system: on-premise or cloud-based service
     • Reference images:
       • Local/pre-deployed or deployed on demand
     • Computation: set of workstations, cluster, EC2, etc.
     • Sentinel-1 images and SSM storage:
       • Local files, cloud storage, iRODS, etc.
     • Result notification and sharing: to whom? At which scale?
     The choices also depend strongly on "collaboration needs" and money! But how easy is data sharing?
  38. Prototype
     PBS on the Vienna Scientific Cluster (vsc.ac.at), in total ~4000 cores
  39. Illustrative experiment (1)
  40. Sustainability governance analysis
     [Figure, labels only] Analytics services: near-real-time analytics, predictive data analytics, visual analytics, enterprise resource planning, emergency management, tracking/logistics, infrastructure monitoring.
     Data sources: infrastructure/Internet of Things, e.g. cities including 10000+ buildings and 1000000+ sensors.
     Boundaries shown: Internet/public cloud boundary, organization-specific boundary.
  41. DaaS for sustainability governance
     • Monitoring data DaaS
     • Domain-specific knowledge DaaS
     Source: Hong-Linh Truong, Schahram Dustdar, Sustainability Data and Analytics in Cloud-Based M2M Systems, Big Data and Internet of Things: A Roadmap for Smart Environments, Studies in Computational Intelligence, Volume 546, 2014, pp 343-365
  42. Platform-as-a-Service for Sustainability Governance
     • Different types of analytics application models, such as batch, workflow and stream applications and intelligent bots
     • Different programming models and languages
     • For analytics of large-scale data but also bot-as-a-service
  43. Cloud-based sustainability governance analysis framework
  44. Cloud-based sustainability governance analysis framework
  45. Discussion time
     HOW TO DEAL WITH COST AND QUALITY OF COMPLEX SERVICES?
  46. Examples of our complex data analytics
     • Bio-mechanic applications
       • Simulate the stiffness of human bones
     • Data- and computation-intensive applications
       • Sequential and parallel programs (e.g., Parfe and ParaView)
     • Complex software installation: ParMETIS, Trilinos, Parfe, ParaView, and HDF5
     • Run under batch and interactive modes
  47. Composable evaluation approach
     • We test with "cost"
     [Figure, labels only] Part A, Part B, ..., Part N.
  48. Dealing with performance and cost of complex applications in clouds
     • Application complexity
       • Elastic high performance applications on multiple clouds: libraries, software services, virtual machines, etc.
       • Cost and performance are needed to determine which parts of the application should be executed in the clouds, and when
     • Cost/performance model complexity
       • Coarse- and fine-grained cost models of clouds at different layers: too coarse-grained (networks, storage, machines) or too fine-grained (IO calls)
       • Software-, data-, and human-specific cost/performance models
       • Cost models for individual parts (workflow, MPI, OpenMP, etc.)
     Sources:
     • Tran Vu Pham, Hong-Linh Truong, Schahram Dustdar, "Elastic High Performance Applications - A Composition Framework", The 2011 Asia-Pacific Services Computing Conference (IEEE APSCC 2011), (c) IEEE Computer Society, December 12-15, 2011, Jeju, Korea
     • Hong Linh Truong, Schahram Dustdar: Composable cost estimation and monitoring for computational applications in cloud computing environments. Procedia CS 1(1): 2175-2184 (2010)
  49. Composable cost evaluation
     [Figure, labels only] Part A, Part B, Part C, each with its own cost/performance model (model i, model j, model k); runtime: elastic processes.
     • Elastic high performance applications on multiple clouds: libraries, software services, virtual machines, etc.
     • Utilize different performance and dependency models for sequential, parallel, workflow, etc. parts
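The idea that each part carries its own cost/performance model, and that the application cost is composed from them, can be sketched as follows. This is a simplified illustration, not the cost models from the cited papers: the three model functions, the usage fields, and all prices are hypothetical, and the composition is a plain sum that ignores dependencies between parts.

```python
from typing import Callable, Dict, Tuple

Usage = Dict[str, float]
CostModel = Callable[[Usage], float]


def vm_cost(usage: Usage) -> float:
    # Hypothetical model for a compute-bound part: pay per VM hour.
    return usage["vm_hours"] * usage["price_per_vm_hour"]


def transfer_cost(usage: Usage) -> float:
    # Hypothetical model for a data-transfer part: pay per GB moved out.
    return usage["gb_out"] * usage["price_per_gb"]


def storage_cost(usage: Usage) -> float:
    # Hypothetical model for a storage part: pay per GB stored per month.
    return usage["gb_stored"] * usage["price_per_gb_month"]


def compose(parts: Dict[str, Tuple[CostModel, Usage]]) -> Tuple[Dict[str, float], float]:
    """Evaluate each part with its own model and sum the results."""
    per_part = {name: model(usage) for name, (model, usage) in parts.items()}
    return per_part, sum(per_part.values())


parts = {
    "Part A": (vm_cost, {"vm_hours": 10, "price_per_vm_hour": 0.5}),
    "Part B": (transfer_cost, {"gb_out": 20, "price_per_gb": 0.25}),
    "Part C": (storage_cost, {"gb_stored": 16, "price_per_gb_month": 0.125}),
}
per_part, total = compose(parts)
```

Keeping one model per part means a part can be moved to a different cloud by swapping only its model and usage figures, while the composition step stays the same.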
  50. Composable cost evaluation: Estimation and Monitoring
     • Leverage our previous knowledge on event representations, application monitoring, performance analysis, dependability analysis
     • Employ a service-oriented approach
       • RESTful service, JSON and XML event data
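Monitoring events exchanged as JSON could look roughly like the sketch below. The event schema is an assumption: the field names `part`, `metric`, `value`, and `unit` are invented for this example and are not the specification used by the authors; the RESTful transport is left out so the code runs offline.

```python
import json
from typing import List


def make_event(part: str, metric: str, value: float, unit: str) -> str:
    # Serialize one monitoring event; the schema here is hypothetical.
    return json.dumps(
        {"part": part, "metric": metric, "value": value, "unit": unit},
        sort_keys=True,
    )


def total_by_metric(events: List[str], metric: str) -> float:
    # Aggregate a metric over all received events, e.g. as input to a cost model.
    decoded = (json.loads(event) for event in events)
    return sum(event["value"] for event in decoded if event["metric"] == metric)


events = [
    make_event("Part A", "cpu_hours", 2.0, "h"),
    make_event("Part B", "cpu_hours", 1.5, "h"),
    make_event("Part A", "data_out", 4.0, "GB"),
]
cpu_hours = total_by_metric(events, "cpu_hours")
```

A monitoring service would receive such strings in HTTP request bodies and feed the aggregated values into the per-part cost models.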
  51. Event Representations and Instrumentation
     • Captured monitoring events are based on a well-defined specification
       • Well-known instrumentation techniques can be reused
     • Consider different application execution models (e.g., MPI, workflows, etc.)
  52. Composable cost evaluation: Fine-grained composition cost models
  53. Composable cost evaluation: Illustrative experiments
     Examples with the Bones application
  54. Simple Cost Estimation: Examples
  55. Online Cost Monitoring: Examples
     • One experiment of a bioinformatics workflow in EC2
     • Support runtime cost-based composition and execution
  56. Exercises
     • Read the mentioned papers
     • Analyze the relationships between programming models and system infrastructures for data analytics across multiple domains
     • Examine http://cloudcomputingpatterns.org and see how it supports data analytics patterns
       • Develop some patterns for data analytics across multiple systems
     • Work on composable cost evaluation for complex data analytics
  57. Thanks for your attention
     Hong-Linh Truong, Distributed Systems Group, Vienna University of Technology
     truong@dsg.tuwien.ac.at, dsg.tuwien.ac.at/staff/truong
