Overview of Cloud Computing and Workflow
Research in NGSP Group
Dr. Dong YUAN
Research Fellow
Swinburne University of Technology
Melbourne, Australia
Outline
> SUCCESS Centre and NGSP Group
> Background: Big Data, Cloud Computing and Workflow
> Research Topics
– Data Management in Cloud Computing
– Performance Management in Scientific Workflows
– Security and Privacy Protection in the Cloud
– SwinDeW-C Cloud Workflow System
The Centre of SUCCESS
> SUCCESS: Swinburne University Centre for Computing
and Engineering Software Systems
– SUCCESS is the “No. 1” Software Engineering Centre in
Australia
– SUCCESS is one of the 7 Tier 1 Centres at Swinburne
University of Technology (Times World Ranking: 351-400,
Academic Ranking of World Universities: 301-400)
> The ambition of the Centre is to become the top centre
for software research in the Southern Hemisphere
within the next five years.
SUCCESS
> Research Focus Areas
– Knowledge and Data Intensive Systems
– Nature of Software
– Next Generation Software Platforms
– SE Education and IBL/RBL
– Software Analysis and Testing
– Software R&D Group
> http://www.swinburne.edu.au/ict/success/research-expertise/
NGSP (Small) Group Overview
> We conduct research into cloud computing and workflow
technologies for complex software systems and services.
> Members:
Leader:
Prof Yun Yang
(PC Member for ICSE 07/08, FSE 09, ICSE 10/11/12)
Researchers:
Dr Xiao Liu (Postdoc, China)
Dr Dong Yuan (Postdoc)
Gaofeng Zhang
Wenhao Li
Dahai Cao
Jofry Hadi SUTANTO
Antonio Giardina
Others:
Prof John Grundy
Prof Chengfei Liu
Visitors:
Prof Lee Osterweil
Prof Lori Clarke
Prof Ivan Stojmenovic
Prof Paola Inverardi
Prof Amit Sheth
Prof Wil van der Aalst
Prof Hai Jin
Prof Hai Zhuge
R&D Projects – Grants
> Primary projects:
– (Cloud) workflow technology: Scheduling and temporal analysis in cloud
workflows
• ARC LP0990393 (Y Yang, R Kotagiri, J Chen, C Liu)
– Cloud computing: Intermediate data management in cloud computing
• ARC DP110101340 (Y Yang, J Chen, J Grundy)
> Secondary project:
– Management control systems for effective information sharing and
security in government organisations
• ARC LP110100228 (S Cuganesan, Y Yang)
R&D Projects – Overview
> SwinDeW workflow family including SwinDeW-C
– Architectures / Models (D Cao)
– Scheduling / Data and service management (D Yuan, X Liu)
– Verification / Exception handling (X Liu)
> Cloud computing:
– Data management (D Yuan, X Liu, W Li)
– Privacy and Security (G Zhang, X Zhang, C Liu)
Some Recent ERA A* Ranked Publications
> J. Chen and Y. Yang, Temporal Dependency based Checkpoint Selection for Dynamic
Verification of Temporal Constraints in Scientific Workflow Systems. ACM Transactions on
Software Engineering and Methodology, 20(3), 2011.
> X. Liu, Y. Yang, Y. Jiang and J. Chen, Preventing Temporal Violations in Scientific
Workflows: Where and How. IEEE Transactions on Software Engineering, 37(6):805-825,
Nov./Dec. 2011.
> D. Yuan, Y. Yang, X. Liu and J. Chen, On-demand Minimum Cost Benchmarking for
Intermediate Datasets Storage in Scientific Cloud Workflow Systems. Journal of Parallel
and Distributed Computing, 71(2):316-332, 2011.
> J. Chen and Y. Yang, Localising Temporal Constraints in Scientific Workflows. Journal of
Computer and System Sciences, Elsevier, 76(6):464-474, Sept. 2010.
> G. Zhang, Y. Yang and J. Chen, A Historical Probability based Noise Generation Strategy for
Privacy Protection in Cloud Computing. Journal of Computer and System Sciences,
Elsevier, published online, Dec. 2011.
> Another 8 A* papers are currently under review…
Big Data
> Data explosion
– TB (10^12), PB (10^15), exabyte (EB, 10^18), zettabyte (ZB, 10^21), yottabyte (YB, 10^24)
– The total amount of global data in 2010: 1.2 ZB
– Data processed by Google every day in 2009: 24 PB
– Every day: Facebook 10 TB, Twitter 7 TB, YouTube 4.5 TB
> Moore's law vs. data explosion speed
– Application data will double every year over the next decade and beyond
[Szalay et al., Nature, 2006]
> Buzzwords: data storage, data processing, parallel, distributed,
virtualisation, commodity machines, energy consumption, data
centres, utility computing, software (everything) as a service
Example: Pulsar Searching
> Astrophysics: pulsar searching
> Pulsars: the collapsed cores of stars that were once more massive than 6-10 times
the mass of the Sun
> http://astronomy.swin.edu.au/cosmos/P/Pulsar
> Parkes Radio Telescope (http://www.parkes.atnf.csiro.au/)
> Swinburne Astrophysics group (http://astronomy.swinburne.edu.au/) has been
conducting pulsar searching surveys (http://astronomy.swin.edu.au/pulsar/) based
on the observation data from Parkes Radio Telescope.
> A typical scientific workflow involving a large number of data- and computation-
intensive activities. For a single searching process, the average data volume (not
including the raw stream data from the telescope) is over 4 terabytes and the
average execution time is about 23 hours on the Swinburne high-performance
supercomputing facility (http://astronomy.swinburne.edu.au/supercomputing/).
left: Image of the Crab Nebula taken with
the Palomar telescope
right: A close up of the Crab Pulsar from
the Hubble Space Telescope
Credit: Jeff Hester and Paul Scowen
(Arizona State University) and NASA
Pulsar Searching Workflow
Dr. Willem
van Straten
Benefits of Clouds
> No upfront infrastructure investment
– No procuring hardware, setup, hosting, power, etc.
> On demand access
– Lease what you need, when you need it
> Efficient Resource Allocation
– Globally shared infrastructure …
> Nice Pricing
– Based on Usage, QoS, Supply and Demand, Loyalty, …
> Application Acceleration
– Parallelism for large-scale data analysis…
> Highly Available, Scalable, and Energy Efficient
> Supports Creation of 3rd Party Services & Seamless offering
– Builds on infrastructure and follows a similar business model to the Cloud
SwinDeW Workflow Series
SwinDeW – Swinburne Decentralised Workflow
- foundation prototype based on p2p
– SwinDeW – past
– SwinDeW-S (for Services) – past
– SwinDeW-B (for BPEL4WS) – past
– SwinDeW-G (for Grid) – past
– SwinDeW-A (for Agents) – past
– SwinDeW-V (for Verification) – current
– SwinDeW-C (for Cloud) – current
Dr. Dong Yuan
http://www.ict.swin.edu.au/personal/dyuan/
Data Management in Cloud
Computing
Research Topics
Data Management in Cloud Computing
> Scientific applications in cloud computing
– Computation and data intensive applications
– Excessive computation and storage resources
– Pay-as-you-go model
> Three aspects of data management in the cloud
– Data storage
– Data placement
– Data replication
Data Storage
> Developing smart data storage strategies for reducing
the cost of storing big data in the cloud
– Data regeneration (computation and storage
trade-off)
– Data de-duplication
– Data compression
> Researcher: Dong Yuan
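The computation and storage trade-off named above can be illustrated with a minimal sketch. It assumes a flat storage price per GB per month and a fixed regeneration cost per dataset, and it ignores dependencies between datasets, so it is a simplification for illustration rather than the strategy from the publications. All names and numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    name: str
    size_gb: float          # size of the generated dataset
    regen_cost: float       # assumed cost ($) of recomputing it once
    uses_per_month: float   # assumed average number of accesses per month


def monthly_cost_if_stored(d: Dataset, price_per_gb_month: float) -> float:
    """Storage bill if the dataset is kept in the cloud."""
    return d.size_gb * price_per_gb_month


def monthly_cost_if_deleted(d: Dataset) -> float:
    """Expected computation bill if the dataset is regenerated on every use."""
    return d.regen_cost * d.uses_per_month


def should_store(d: Dataset, price_per_gb_month: float) -> bool:
    """Keep the dataset only when storing is no dearer than regenerating."""
    return monthly_cost_if_stored(d, price_per_gb_month) <= monthly_cost_if_deleted(d)


# Hypothetical datasets and price, for illustration only.
big_rarely_used = Dataset("raw_candidates", size_gb=100, regen_cost=2.0, uses_per_month=0.5)
small_often_used = Dataset("folded_profiles", size_gb=10, regen_cost=5.0, uses_per_month=1.0)
decisions = {d.name: should_store(d, 0.1) for d in (big_rarely_used, small_often_used)}
```

Under these assumed numbers the large, rarely used dataset is cheaper to delete and regenerate, while the small, frequently used one is cheaper to keep.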
Publications
> D. Yuan, Y. Yang, X. Liu, J. Chen, On-demand Minimum Cost Benchmarking for
Intermediate Datasets Storage in Scientific Cloud Workflow Systems, Journal of
Parallel and Distributed Computing, Elsevier, vol. 71(2), pp. 316-332, 2011.
> D. Yuan, Y. Yang, X. Liu, G. Zhang, J. Chen, A Data Dependency Based Strategy
for Intermediate Data Storage in Scientific Cloud Workflow Systems, Concurrency
and Computation: Practice and Experience, Wiley, 24(9), pp. 956-976, Jun. 2012.
> D. Yuan, Y. Yang, X. Liu, J. Chen, A Cost-Effective Strategy for Intermediate Data
Storage in Scientific Cloud Workflow Systems, Proc. of 24th IEEE International
Parallel & Distributed Processing Symposium (IPDPS10), Atlanta, USA, Apr. 2010.
> D. Yuan, Y. Yang, X. Liu and J. Chen, A Local-Optimisation based Strategy for
Cost-Effective Datasets Storage of Scientific Applications in the Cloud, Proc. of 4th
IEEE International Conference on Cloud Computing (Cloud2011), Washington DC,
USA, July 4-9, 2011.
Data Placement
> Smart data placement strategies to reduce
application cost
– Data correlation based strategy to reduce
bandwidth cost
– Data usage based strategy to reduce storage cost
> Researchers: Dong Yuan, Jofry Hadi SUTANTO,
Antonio Giardina
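One simple way to picture data correlation is to count how many tasks use two datasets together; pairs with a high count are candidates for the same data centre, so that fewer transfers are needed. The sketch below only computes those counts. The task and dataset names are hypothetical, and the counting rule is an assumption for illustration, not a description of the published strategy.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Set


def co_usage(tasks: Dict[str, Set[str]]) -> Counter:
    """Count, for every pair of datasets, how many tasks read both of them."""
    counts: Counter = Counter()
    for datasets in tasks.values():
        # sorted() gives each pair a canonical order, e.g. ("d1", "d2")
        for pair in combinations(sorted(datasets), 2):
            counts[pair] += 1
    return counts


# Hypothetical workflow: which datasets each task needs.
tasks = {"t1": {"d1", "d2"}, "t2": {"d1", "d2", "d3"}, "t3": {"d3"}}
counts = co_usage(tasks)
closest_pair, shared_tasks = counts.most_common(1)[0]
```

Here d1 and d2 are used together by two tasks, so placing them together would avoid the most cross-centre transfers in this toy case.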
Publications
> D. Yuan, Y. Yang, X. Liu, J. Chen, A Data Placement Strategy in
Scientific Cloud Workflows, Future Generation Computer Systems,
Elsevier, vol. 26(8), pp. 1200-1214, 2010.
Data Replication
> To cost-effectively assure data reliability in the cloud
– Dynamic replication strategy
– Proactive replica checking based replication strategy
> Researchers: Wenhao Li, Dong Yuan
Publications
> W. Li, Y. Yang and D. Yuan, A Novel Cost-effective Dynamic Data
Replication Strategy for Reliability in Cloud Data Centres. Proc. of
International Conference on Cloud and Green Computing (CGC2011),
pages 496-502, Sydney, Australia, Dec. 2011.
> W. Li, Y. Yang, J. Chen and D. Yuan, A Cost-Effective Mechanism for
Cloud Data Reliability Management based on Proactive Replica
Checking. Proc. of 12th IEEE/ACM International Symposium on Cluster,
Cloud and Grid Computing (CCGrid2012), pages 564-571, Ottawa,
Canada, May 2012.
Dr. Xiao Liu
http://www.ict.swin.edu.au/personal/xliu/
Performance Management in
Scientific Workflows
Research Topics
Workflow QoS
> QoS dimensions
– time, cost, fidelity, reliability, security …
> QoS of Cloud Services
> Workflow QoS
– the overall QoS for a collection of cloud services
– but the QoS of the individual services does not simply add up!
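A small sketch of why workflow QoS is not a plain sum: in a workflow A -> (B in parallel with C) -> D, time adds along a sequence but takes the maximum across parallel branches, and reliability multiplies. The aggregation rules below are the usual textbook ones and assume independent failures; the numbers are hypothetical.

```python
from math import prod


def sequence_time(times):
    """Activities run one after another: durations add."""
    return sum(times)


def parallel_time(times):
    """Branches run side by side: the slowest branch decides."""
    return max(times)


def all_succeed_reliability(reliabilities):
    """Every service must succeed; failures are assumed independent."""
    return prod(reliabilities)


# Hypothetical per-service QoS values for the workflow A -> (B || C) -> D.
time = {"A": 2.0, "B": 5.0, "C": 3.0, "D": 1.0}
reliability = {"A": 0.99, "B": 0.95, "C": 0.90, "D": 0.99}

makespan = sequence_time([time["A"], parallel_time([time["B"], time["C"]]), time["D"]])
naive_sum = sum(time.values())
workflow_reliability = all_succeed_reliability(reliability.values())
```

With these values the makespan is 8 time units while the naive sum is 11, and the workflow reliability is lower than that of any single service.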
Temporal QoS
> System performance
– Response time
– Throughput
> Temporal constraints
– Global constraints: deadlines
– Local constraints: milestones, individual activity durations
> Satisfactory temporal QoS
– High performance: fast response, high throughput
– On-time completion: low temporal violation rate
Problem Analysis
> Setting temporal constraints
– Prerequisite: effective forecasting of activity durations
> Monitoring temporal consistency state
– Monitor workflow execution state
– Detect potential temporal violations
> Temporal violation handling
– Where to conduct violation handling
– What strategies to use
Temporal Framework
Forecasting Activity Durations
> Statistical time-series pattern based forecasting strategies
> Selected Publications:
– X. Liu, Z. Ni, D. Yuan, Y. Jiang, Z. Wu, J. Chen, Y. Yang, A Novel
Statistical Time-Series Pattern based Interval Forecasting Strategy
for Activity Durations in Workflow Systems, Journal of Systems and
Software (JSS), vol. 84, no. 3, Pages 354-376, March 2011.
– X. Liu, J. Chen, K. Liu and Y. Yang, Forecasting Duration Intervals of
Scientific Workflow Activities based on Time-Series Patterns, Proc.
of 4th IEEE International Conference on e-Science (e-Science08),
pages 23-30, Indianapolis, USA, Dec. 2008.
Setting Temporal Constraints
> Probability based temporal consistency model
> Time analysis based on Stochastic Petri Nets
> Selected Publications:
– X. Liu, Z. Ni, J. Chen, Y. Yang, A Probabilistic Strategy for Temporal
Constraint Management in Scientific Workflow Systems,
Concurrency and Computation: Practice and Experience (CCPE),
Wiley, 23(16):1893-1919, Nov. 2011.
– X. Liu, J. Chen and Y. Yang, A Probabilistic Strategy for Setting
Temporal Constraints in Scientific Workflows, Proc. 6th International
Conference on Business Process Management (BPM2008), Lecture
Notes in Computer Science, Vol. 5240, pages 180-195, Milan, Italy,
Sept. 2008.
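To make the idea of a probability based temporal constraint concrete, here is a minimal sketch. It assumes activity durations on a sequential path are independent and normally distributed; both assumptions are introduced here for illustration. Under them, the path duration is again normal, so one can ask how likely a deadline is to be met, or which deadline gives a target probability. The function names and numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def path_duration(activities):
    """Distribution of the total duration of a sequential path.

    activities: list of (mean, standard deviation) pairs, assumed independent.
    """
    mean = sum(mu for mu, _ in activities)
    sigma = sqrt(sum(s * s for _, s in activities))
    return NormalDist(mean, sigma)


def on_time_probability(activities, deadline):
    """Probability that the path finishes no later than the deadline."""
    return path_duration(activities).cdf(deadline)


def deadline_for(activities, probability):
    """Smallest deadline that is met with the requested probability."""
    return path_duration(activities).inv_cdf(probability)


# Hypothetical activities: (mean, standard deviation) in minutes.
activities = [(10.0, 3.0), (20.0, 4.0)]
p_at_mean = on_time_probability(activities, 30.0)
p_with_slack = on_time_probability(activities, 35.0)
deadline_90 = deadline_for(activities, 0.90)
```

A deadline equal to the mean path duration is met only half of the time under these assumptions, which is why a constraint is better negotiated together with a target probability.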
Temporal Consistency Monitoring
> Minimum (Probability) Time Redundancy based Checkpoint Selection
Strategy
> Temporal Dependency based Checkpoint Selection Strategy
> Selected Publications:
– X. Liu, Y. Yang, Y. Jiang and J. Chen, Preventing Temporal
Violations in Scientific Workflows: Where and How. IEEE
Transactions on Software Engineering, 37(6):805-825, Nov./Dec.
2011.
– J. Chen and Y. Yang, Temporal Dependency based Checkpoint
Selection for Dynamic Verification of Temporal Constraints in
Scientific Workflow Systems. ACM Transactions on Software
Engineering and Methodology, 20(3), 2011
Violation Handling
> Violation Handling Point Selection
> (Probability) Time deficit allocation
> Workflow local rescheduling strategy – ACO, GA, PSO
> Selected Publications:
– X. Liu, Z. Ni, Z. Wu, D. Yuan, J. Chen and Y. Yang, A Novel General Framework
for Automatic and Cost-Effective Handling of Recoverable Temporal Violations in
Scientific Workflow Systems, Journal of Systems and Software, vol. 84, no. 3, pp.
492-509, 2011
Gaofeng Zhang
gzhang@swin.edu.au
Security and Privacy Protection
in the Cloud
Research Topics
Background
> Data Security vs. Data Privacy
> Privacy in cloud computing
– Massive data is stored and processed in open cloud environments
– Customers cannot control what happens inside the cloud
> The severity of privacy risk in cloud computing
> One specific privacy risk in cloud computing
– Indirectly private information (revealed collectively)
– Normal service processes and functions (no disruption)
> The approach: noise obfuscation for privacy protection
Privacy Protection in Cloud
> Roles in the view of privacy in regular IT system
– Privacy owner, Privacy user and Privacy theft
Privacy owner
Privacy theft
Privacy user
Keep safe between Privacy owner and Privacy user!
Privacy Protection in Cloud
> Roles in the view of privacy in Cloud
– Privacy owner, privacy user and privacy theft
Privacy owner
Privacy theft
Privacy user
Virtualisation disables the “keeping safe between Privacy owner and Privacy user”!
Noise Obfuscation(1)
> Background
– Massive data is stored and processed in open cloud environments.
– Customers cannot control what happens inside the cloud.
> Main idea: “Dilute” real private information with noise information
– Not noise signal!
Noise Obfuscation(2)
> A Motivating example:
– A customer who often travels to one city in Australia, say ‘Sydney’, regularly checks the
weather report from a weather service in the cloud before departure. The frequent
appearance of service requests for the ‘Sydney’ weather report can reveal that the
customer usually goes to ‘Sydney’. But if a system helps the customer inject other
requests, such as ‘Perth’ or ‘Darwin’, into the ‘Sydney’ queue, the service provider
cannot distinguish the real requests from the ‘noise’ ones, because it only sees service
requests of a similar style. All of these requests are answered as normal, but together
they no longer reveal the customer’s location privacy. In such cases, privacy can be
protected by noise obfuscation.
From ‘data’ privacy to ‘process’ privacy!
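The weather example can be sketched in a few lines. The version below is deliberately simplistic: it tops up every candidate city to the count of the most requested one, so that the provider sees a flat distribution. This levelling rule, the function name and the numbers are assumptions for illustration; the noise generation strategies listed in this talk are probability based and are not reproduced here.

```python
import random
from collections import Counter


def noise_requests(history, candidates, rng):
    """Return noise requests that level every candidate up to the top real count."""
    target = max(history.values())
    noise = []
    for city in candidates:
        # Cities already at the target count need no noise.
        noise.extend([city] * (target - history.get(city, 0)))
    rng.shuffle(noise)  # interleave so noise is not sent in recognisable batches
    return noise


# Hypothetical request history: the customer really cares about Sydney.
history = Counter({"Sydney": 8, "Perth": 1})
candidates = ["Sydney", "Perth", "Darwin"]
noise = noise_requests(history, candidates, random.Random(0))
observed = history + Counter(noise)  # what the service provider sees
```

The price of this protection is extra traffic: in this toy case 15 noise requests accompany 9 real ones, which is why cost-aware noise generation matters.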
Research Topics
> Noise Generation
– Historical probability based noise generation strategy
– Time-series pattern based noise generation strategy
– Association probability based noise generation strategy
– …
> Noise Utilisation
– Trust model and injection strategy for noise obfuscation
– …
> Noise Cooperation Mechanism
– Privacy protection framework under noise obfuscation
Publications
> G. Zhang, Y. Yang and J. Chen, A Historical Probability based Noise Generation
Strategy for Privacy Protection in Cloud Computing. Journal of Computer and
System Sciences, Elsevier, 78(5):1374-1381, Sept. 2012.
> G. Zhang, Y. Yang, D. Yuan and J. Chen, A Trust-based Noise Injection
Strategy for Privacy Protection in Cloud Computing. Software: Practice and
Experience, Wiley, 42(4):431-445, Apr. 2012.
> G. Zhang, Y. Yang, X. Liu and J. Chen, A Time-series Pattern based Noise
Generation Strategy for Privacy Protection in Cloud Computing. Proc. of 12th
IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing
(CCGrid2012), pages 458-465, Ottawa, Canada, May 2012.
> G. Zhang, X. Zhang, Y. Yang, C. Liu and J. Chen, An Association Probability
based Noise Generation Strategy for Privacy Protection in Cloud Computing.
Proc. 10th International Conference on Service Oriented Computing
(ICSoC2012), pages 639-647, Shanghai, China, Nov. 2012. (accepted on
13/7/2012)
Dahai Cao
dcao@swin.edu.au
Cloud Workflow System
Design and Development
Research Topics
SwinCloud – Cloud Computing Testbed
> SwinCloud
General cloud workflow reference model
Prototype Ⅰ: SwinDeW-C (Peer-to-Peer)
> SwinDeW-C
Prototype Ⅱ: SwinFlow-Cloud (Centralised)
Cloud workflow implementation
> Client system
– Process definition tools
– Rule editor
– Organisation modelling tools
– Office calendar management tools
– Authority group tools
– User management tools
– Form designing tools
– Tool agent definition tools
– Simulation tools
New Progress
> Successfully deployed on the Amazon Cloud
> Eucalyptus: the cloud infrastructure platform
A Book
End
> Questions?

 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysismanisha194592
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFxolyaivanovalion
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Delhi Call girls
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...amitlee9823
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...amitlee9823
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxolyaivanovalion
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Delhi Call girls
 

Recently uploaded (20)

Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptx
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFx
 
Zuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxZuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptx
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptx
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
 
Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptx
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
Sampling (random) method and Non random.ppt
Sampling (random) method and Non random.pptSampling (random) method and Non random.ppt
Sampling (random) method and Non random.ppt
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
 

Data storage in Cloud computing

  • 1. Overview of Cloud Computing and Workflow Research in NGSP Group Dr. Dong YUAN Research Fellow Swinburne University of Technology Melbourne, Australia
  • 2. Outline > SUCCESS Centre and NGSP Group > Background: Big Data, Cloud Computing and Workflow > Research Topics – Data Management in Cloud Computing – Performance Management in Scientific Workflows – Security and Privacy Protection in the Cloud – SwinDeW-C Cloud Workflow System
  • 3. The Centre of SUCCESS > SUCCESS: Swinburne University Centre for Computing and Engineering Software Systems – SUCCESS is the “NO.1” Software Engineering Centre in Australia – SUCCESS is one of the 7 Tier 1 Centres at Swinburne University of Technology (Times World Ranking: 351-400, Academic Ranking of World Universities: 301-400) > The ambition of the Centre is to become the top centre for software research in the Southern Hemisphere within the next five years.
  • 4. SUCCESS > Research Focus Areas – Knowledge and Data Intensive Systems – Nature of Software – Next Generation Software Platforms – SE Education and IBL/RBL – Software Analysis and Testing – Software R&D Group > http://www.swinburne.edu.au/ict/success/research-expertise/
  • 5. NGSP (Small) Group Overview > We conduct research into cloud computing and workflow technologies for complex software systems and services. > Members: – Leader: Prof Yun Yang (PC Member for ICSE 07/08, FSE 09, ICSE 10/11/12) – Researchers: Dr Xiao Liu (Postdoc, China), Dr Dong Yuan (Postdoc), Gaofeng Zhang, Wenhao Li, Dahai Cao, Jofry Hadi SUTANTO, Antonio Giardina – Others: Prof John Grundy, Prof Chengfei Liu – Visitors: Prof Lee Osterweil, Prof Lori Clarke, Prof Ivan Stojmenovic, Prof Paola Inverardi, Prof Amit Sheth, Prof Wil van der Aalst, Prof Hai Jin, Prof Hai Zhuge
  • 6. R&D Projects – Grants > Primary projects: – (Cloud) workflow technology: Scheduling and temporal analysis in cloud workflows • ARC LP0990393 (Y Yang, R Kotagiri, J Chen, C Liu) – Cloud computing: Intermediate data management in cloud computing • ARC DP110101340 (Y Yang, J Chen, J Grundy) > Secondary project: – Management control systems for effective information sharing and security in government organisations • ARC LP110100228 (S Cugenasen, Y Yang)
  • 7. R&D Projects – Overview > SwinDeW workflow family including SwinDeW-C – Architectures / Models (D Cao) – Scheduling / Data and service management (D Yuan, X Liu) – Verification / Exception handling (X Liu) > Cloud computing: – Data management (D Yuan, X Liu, W Li) – Privacy and Security (G Zhang, X Zhang, C Liu)
  • 8. Some Recent ERA A* Ranked Publications > J. Chen and Y. Yang, Temporal Dependency based Checkpoint Selection for Dynamic Verification of Temporal Constraints in Scientific Workflow Systems. ACM Transactions on Software Engineering and Methodology, 20(3), 2011. > X. Liu, Y. Yang, Y. Jiang and J. Chen, Preventing Temporal Violations in Scientific Workflows: Where and How. IEEE Transactions on Software Engineering, 37(6):805-825, Nov./Dec. 2011. > D. Yuan, Y. Yang, X. Liu and J. Chen, On-demand Minimum Cost Benchmarking for Intermediate Datasets Storage in Scientific Cloud Workflow Systems. Journal of Parallel and Distributed Computing, 71(2):316-332, 2011. > J. Chen and Y. Yang, Localising Temporal Constraints in Scientific Workflows. Journal of Computer and System Sciences, Elsevier, 76(6):464-474, Sept. 2010. > G. Zhang, Y. Yang and J. Chen, A Historical Probability based Noise Generation Strategy for Privacy Protection in Cloud Computing. Journal of Computer and System Sciences, Elsevier, published online, Dec. 2011. > Another 8 A* papers are currently under review…
  • 9. Part 1: Outline > SUCCESS Centre and NGSP Group > Background: Big Data, Cloud Computing and Workflow > Research Topics – Data Management in Cloud Computing – Performance Management in Scientific Workflows – Security and Privacy Protection in the Cloud – SwinDeW-C Cloud Workflow System
  • 10. Big Data > Data explosion – TB (10^12), PB (10^15), exabyte (EB, 10^18), zettabyte (ZB, 10^21), yottabyte (YB, 10^24) – The total amount of global data in 2010: 1.2 ZB – Data processed by Google every day in 2009: 24 PB – Every day: Facebook 10 TB, Twitter 7 TB, YouTube 4.5 TB > Moore's law vs. data explosion speed – Application data will double every year over the next decade and beyond [Szalay et al., Nature, 2006] > Buzzwords: data storage, data processing, parallel, distributed, virtualisation, commodity machines, energy consumption, data centres, utility computing, software (everything) as a service
  • 11. Example: Pulsar Searching > Astrophysics: pulsar searching > Pulsars: the collapsed cores of stars that were once more than 6-10 times the mass of the Sun > http://astronomy.swin.edu.au/cosmos/P/Pulsar > Parkes Radio Telescope (http://www.parkes.atnf.csiro.au/) > Swinburne Astrophysics group (http://astronomy.swinburne.edu.au/) has been conducting pulsar searching surveys (http://astronomy.swin.edu.au/pulsar/) based on the observation data from Parkes Radio Telescope. > A typical scientific workflow that involves a large number of data- and computation-intensive activities. For a single searching process, the average data volume (not including the raw stream data from the telescope) is over 4 terabytes and the average execution time is about 23 hours on the Swinburne high performance supercomputing facility (http://astronomy.swinburne.edu.au/supercomputing/). > Figure: left, image of the Crab Nebula taken with the Palomar telescope; right, a close-up of the Crab Pulsar from the Hubble Space Telescope. Credit: Jeff Hester and Paul Scowen (Arizona State University) and NASA
  • 12. Pulsar Searching Workflow > Dr. Willem van Straten
  • 13. Benefits of Clouds > No upfront infrastructure investment – No procuring hardware, setup, hosting, power, etc. > On-demand access – Lease what you need, when you need it > Efficient Resource Allocation – Globally shared infrastructure … > Nice Pricing – Based on Usage, QoS, Supply and Demand, Loyalty, … > Application Acceleration – Parallelism for large-scale data analysis … > Highly Available, Scalable, and Energy Efficient > Supports Creation of 3rd Party Services & Seamless Offering – Builds on infrastructure and follows a similar business model to the Cloud
  • 14. SwinDeW Workflow Series SwinDeW – Swinburne Decentralised Workflow - foundation prototype based on p2p – SwinDeW – past – SwinDeW-S (for Services) – past – SwinDeW-B (for BPEL4WS) – past – SwinDeW-G (for Grid) – past – SwinDeW-A (for Agents) – past – SwinDeW-V (for Verification) – current – SwinDeW-C (for Cloud) – current
  • 15. Part 1: Outline > SUCCESS Centre and NGSP Group > Background: Big Data, Cloud Computing and Workflow > Research Topics – Data Management in Cloud Computing – Performance Management in Scientific Workflows – Security and Privacy Protection in the Cloud – SwinDeW-C Cloud Workflow System
  • 16. Research Topics: Data Management in Cloud Computing > Dr. Dong Yuan > http://www.ict.swin.edu.au/personal/dyuan/
  • 17. Data Management in Cloud Computing > Scientific applications in cloud computing – Computation and data intensive applications – Excessive computation and storage resources – Pay-as-you-go model > Three aspects of data management in the cloud – Data storage – Data placement – Data replication
  • 18. Data Storage > Developing smart data storage strategies for reducing the cost of storing big data in the cloud – Data regeneration (computation and storage trade-off) – Data de-duplication – Data compression > Researcher: Dong Yuan
  • 19. Publications > D. Yuan, Y. Yang, X. Liu, J. Chen, On-demand Minimum Cost Benchmarking for Intermediate Datasets Storage in Scientific Cloud Workflow Systems, Journal of Parallel and Distributed Computing, Elsevier, vol. 71(2), pp. 316-332, 2011. > D. Yuan, Y. Yang, X. Liu, G. Zhang, J. Chen, A Data Dependency Based Strategy for Intermediate Data Storage in Scientific Cloud Workflow Systems, Concurrency and Computation: Practice and Experience, Wiley, 24(9), pp. 956-976, Jun. 2012. > D. Yuan, Y. Yang, X. Liu, J. Chen, A Cost-Effective Strategy for Intermediate Data Storage in Scientific Cloud Workflow Systems, Proc. of 24th IEEE International Parallel & Distributed Processing Symposium (IPDPS10), Atlanta, USA, Apr. 2010. > D. Yuan, Y. Yang, X. Liu and J. Chen, A Local-Optimisation based Strategy for Cost-Effective Datasets Storage of Scientific Applications in the Cloud, Proc. of 4th IEEE International Conference on Cloud Computing (Cloud2011), Washington DC, USA, July 4-9, 2011.
  • 20. Data Placement > Smart data placement strategies to reduce application cost – Data correlation based strategy to reduce bandwidth cost – Data usage based strategy to reduce storage cost > Researchers: Dong Yuan, Jofry Hadi SUTANTO, Antonio Giardina
  • 21. Publications > D. Yuan, Y. Yang, X. Liu, J. Chen, A Data Placement Strategy in Scientific Cloud Workflows, Future Generation Computer Systems, Elsevier, vol. 26(8), pp. 1200-1214, 2010.
  • 22. Data Replication > To cost-effectively assure data reliability in the cloud – Dynamic replication strategy – Proactively checking based replication strategy > Researchers: Wenhao Li, Dong Yuan
  • 23. Publications > W. Li, Y. Yang and D. Yuan, A Novel Cost-effective Dynamic Data Replication Strategy for Reliability in Cloud Data Centres. Proc. of International Conference on Cloud and Green Computing (CGC2011), pages 496-502, Sydney, Australia, Dec. 2011. > W. Li, Y. Yang, J. Chen and D. Yuan, A Cost-Effective Mechanism for Cloud Data Reliability Management based on Proactive Replica Checking. Proc. of 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid2012), pages 564-571, Ottawa, Canada, May 2012.
  • 24. Research Topics: Performance Management in Scientific Workflows > Dr. Xiao Liu > http://www.ict.swin.edu.au/personal/xliu/
  • 25. Workflow QoS > QoS dimensions – time, cost, fidelity, reliability, security … > QoS of Cloud Services > Workflow QoS – the overall QoS for a collection of cloud services – but not simply the sum of the individual services' QoS!
  • 26. Temporal QoS > System performance – Response time – Throughput > Temporal constraints – Global constraints: deadlines – Local constraints: milestones, individual activity durations > Satisfactory temporal QoS – High performance: fast response, high throughput – On-time completion: low temporal violation rate
  • 27. Problem Analysis > Setting temporal constraints – Prerequisite: effective forecasting of activity durations > Monitoring temporal consistency state – Monitor workflow execution state – Detect potential temporal violations > Temporal violation handling – Where to conduct violation handling – Which strategies to use
  • 29. Forecasting Activity Durations > Statistical time-series pattern based forecasting strategies > Selected Publications: – X. Liu, Z. Ni, D. Yuan, Y. Jiang, Z. Wu, J. Chen, Y. Yang, A Novel Statistical Time-Series Pattern based Interval Forecasting Strategy for Activity Durations in Workflow Systems, Journal of Systems and Software (JSS), vol. 84, no. 3, pages 354-376, March 2011. – X. Liu, J. Chen, K. Liu and Y. Yang, Forecasting Duration Intervals of Scientific Workflow Activities based on Time-Series Patterns, Proc. of 4th IEEE International Conference on e-Science (e-Science08), pages 23-30, Indianapolis, USA, Dec. 2008.
  • 30. Setting Temporal Constraints > Probability based temporal consistency model > Time analysis based on Stochastic Petri Nets > Selected Publications: – X. Liu, Z. Ni, J. Chen, Y. Yang, A Probabilistic Strategy for Temporal Constraint Management in Scientific Workflow Systems, Concurrency and Computation: Practice and Experience (CCPE), Wiley, 23(16):1893-1919, Nov. 2011. – X. Liu, J. Chen and Y. Yang, A Probabilistic Strategy for Setting Temporal Constraints in Scientific Workflows, Proc. 6th International Conference on Business Process Management (BPM2008), Lecture Notes in Computer Science, Vol. 5240, pages 180-195, Milan, Italy, Sept. 2008.
  • 31. Temporal Consistency Monitoring > Minimum (Probability) Time Redundancy based Checkpoint Selection Strategy > Temporal Dependency based Checkpoint Selection Strategy > Selected Publications: – X. Liu, Y. Yang, Y. Jiang and J. Chen, Preventing Temporal Violations in Scientific Workflows: Where and How. IEEE Transactions on Software Engineering, 37(6):805-825, Nov./Dec. 2011. – J. Chen and Y. Yang, Temporal Dependency based Checkpoint Selection for Dynamic Verification of Temporal Constraints in Scientific Workflow Systems. ACM Transactions on Software Engineering and Methodology, 20(3), 2011
  • 32. Violation Handling > Violation Handling Point Selection > (Probability) Time deficit allocation > Workflow local rescheduling strategy – ACO, GA, PSO > Selected Publications: – X. Liu, Z. Ni, Z. Wu, D. Yuan, J. Chen and Y. Yang, A Novel General Framework for Automatic and Cost-Effective Handling of Recoverable Temporal Violations in Scientific Workflow Systems, Journal of Systems and Software, vol. 84, no. 3, pp. 492-509, 2011.
  • 33. Research Topics: Security and Privacy Protection in the Cloud > Gaofeng Zhang > gzhang@swin.edu.au
  • 34. Background > Data Security vs. Data Privacy > Privacy in cloud computing – Massive data is stored and processed in an open cloud environment – Customers cannot control what happens inside the cloud > The severity of privacy risk in cloud computing > One specific privacy risk in cloud computing – Indirectly private information (information revealed collectively) – Arises from normal service processes and functions (not from disruption) > The approach: noise obfuscation for privacy protection
  • 35. Privacy Protection in Cloud > Roles from the privacy viewpoint in a regular IT system – Privacy owner, privacy user and privacy theft > Keep the link between privacy owner and privacy user safe!
  • 36. Privacy Protection in Cloud > Roles from the privacy viewpoint in the Cloud – Privacy owner, privacy user and privacy theft > Virtualisation disables “keeping the link between privacy owner and privacy user safe”!
  • 37. Noise Obfuscation (1) > Background – Massive data is stored and processed in open cloud environments. – Customers cannot control what happens inside the cloud. > Main idea: “dilute” real private information with noise information – Not a noise signal!
  • 38. Noise Obfuscation (2) > A motivating example: – A customer who often travels to one city in Australia, say ‘Sydney’, regularly checks the weather report from a weather service in a cloud environment before departure. The frequent appearance of service requests for the ‘Sydney’ weather report can reveal the private fact that the customer usually goes to ‘Sydney’. But if a system helps the customer inject other requests, such as ‘Perth’ or ‘Darwin’, into the ‘Sydney’ queue, the service provider cannot distinguish which requests are real and which are ‘noise’, as it just sees service requests of a similar style. All of these requests are answered, yet they do not reveal the customer's location privacy. In such cases, privacy can be protected by noise obfuscation. > From ‘data’ privacy to ‘process’ privacy!
  • 39. Research Topics > Noise Generation – Historical probability based noise generation strategy – Time-series pattern based noise generation strategy – Association probability based noise generation strategy – … > Noise Utilisation – Trust model and injection strategy for noise obfuscation – … > Noise Cooperation Mechanism – Privacy protection framework under noise obfuscation
  • 40. Publications > G. Zhang, Y. Yang and J. Chen, A Historical Probability based Noise Generation Strategy for Privacy Protection in Cloud Computing. Journal of Computer and System Sciences, Elsevier, 78(5):1374-1381, Sept. 2012. > G. Zhang, Y. Yang, D. Yuan and J. Chen, A Trust-based Noise Injection Strategy for Privacy Protection in Cloud Computing. Software: Practice and Experience, Wiley, 42(4):431-445, Apr. 2012. > G. Zhang, Y. Yang, X. Liu and J. Chen, A Time-series Pattern based Noise Generation Strategy for Privacy Protection in Cloud Computing. Proc. of 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid2012), pages 458-465, Ottawa, Canada, May 2012. > G. Zhang, X. Zhang, Y. Yang, C. Liu and J. Chen, An Association Probability based Noise Generation Strategy for Privacy Protection in Cloud Computing. Proc. 10th International Conference on Service Oriented Computing (ICSoC2012), pages 639-647, Shanghai, China, Nov. 2012. (accepted on 13/7/2012)
  • 41. Research Topics: Cloud Workflow System Design and Development > Dahai Cao > dcao@swin.edu.au
  • 42. SwinCloud – Cloud Computing Testbed > SwinCloud
  • 43. General cloud workflow reference model
  • 44. Prototype I: SwinDeW-C (Peer-to-Peer) > SwinDeW-C
  • 45. Prototype II: SwinFlow-Cloud (Centralised)
  • 46. Cloud workflow implementation > Client system – Process definition tools – Rule editor – Organisation modelling tools – Office calendar management tools – Authority group tools – User management tools – Form designing tools – Tool agent definition tools – Simulation tools
  • 47. New Progress > Successfully deployed on the Amazon Cloud > Eucalyptus: the cloud infrastructure platform
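
The motivating example on slide 38 (weather requests for ‘Sydney’ hidden among requests for other cities) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the group's implementation: the function names, the list of candidate cities, the stand-in weather service and the uniform random choice of noise are all invented here. The strategies listed on slide 39 select noise from historical probabilities, time-series patterns or association probabilities rather than uniformly at random.

```python
import random


def obfuscate_request(real_city, noise_cities, noise_per_real=2, rng=None):
    """Return a shuffled batch holding the real request plus noise requests.

    Assumption: noise is drawn uniformly at random from noise_cities,
    which must not contain real_city.
    """
    rng = rng or random.Random()
    noise = rng.sample(noise_cities, k=noise_per_real)
    batch = [real_city] + noise
    rng.shuffle(batch)  # hide the position of the real request
    return batch


def keep_real_response(responses, real_city):
    """Discard the noise responses on the customer side."""
    return responses[real_city]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
batch = obfuscate_request("Sydney", ["Perth", "Darwin", "Hobart", "Adelaide"], rng=rng)

# Hypothetical stand-in for the cloud weather service: it answers every
# request in the batch and cannot tell which one is real.
responses = {city: f"forecast for {city}" for city in batch}

print(keep_real_response(responses, "Sydney"))
```

The service provider sees three requests of the same style, while the customer keeps only the answer for the real one. The cost of the extra requests is the price paid for the protection.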

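The computation and storage trade-off behind data regeneration (slide 18) comes down to comparing two recurring costs for each intermediate dataset: paying to keep it stored, or paying to recompute it whenever it is needed. The sketch below is a simplified illustration, assuming a single dataset, hypothetical prices and a known usage frequency. It ignores the dependencies between datasets that the cited storage strategies take into account, and all names and numbers are invented for the example.

```python
def monthly_costs(size_gb, storage_price, regen_cpu_hours, cpu_price, uses_per_month):
    """Return (storage cost, regeneration cost) per month for one dataset.

    storage_price is per GB per month; cpu_price is per CPU hour.
    """
    storage = size_gb * storage_price
    regeneration = regen_cpu_hours * cpu_price * uses_per_month
    return storage, regeneration


def should_store(size_gb, storage_price, regen_cpu_hours, cpu_price, uses_per_month):
    """Store the dataset only if storing costs no more than regenerating it."""
    storage, regeneration = monthly_costs(
        size_gb, storage_price, regen_cpu_hours, cpu_price, uses_per_month
    )
    return storage <= regeneration


# Hypothetical dataset: 100 GB, 20 CPU hours to regenerate,
# at assumed prices of 0.10 per GB-month and 0.10 per CPU hour.
rarely_used = should_store(100, 0.10, 20, 0.10, uses_per_month=3)
often_used = should_store(100, 0.10, 20, 0.10, uses_per_month=10)

print(rarely_used, often_used)
```

With these assumed prices, the rarely used dataset is cheaper to delete and regenerate on demand, while the frequently used one is cheaper to keep in storage.
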
Editor's Notes

  1. Location example
  2. Location example
  3. Location example
  4. Road map of research