EEDC 34330
Execution Environments for Distributed Computing
European Master in Distributed Computing - EMDC

Self-Adapting, Energy-Conserving Distributed File Systems

EEDC Presentation
Mário Almeida - 4knahs[@]gmail.com
www.marioalmeida.eu
Outline
●   Introduction
     ○ Green Computing
     ○ Distributed File Systems
     ○ DFS issues
●   Hadoop Distributed File System
     ○ Overview
     ○ Evaluation
●   Green HDFS
     ○ Overview
     ○ Design
     ○ Goal
     ○ Energy-management policies
     ○ Machine learning
     ○ Evaluation
●   Conclusions
●   References
                             *
Introduction - Green Computing
●   Environmentally sustainable computing: computing with
    minimal impact on the environment.
●   Aims to reduce energy consumption, greenhouse gas
    emissions and operational costs.




                            *
Introduction - Distributed FS
●   A Distributed File System (DFS) is any file system that
    allows files to be accessed from multiple hosts over a
    computer network.
●   May include facilities for transparent replication and
    fault tolerance.




                               *
Introduction - DFS Issues
●   Distributed File Systems are often built to run on a large
    number of commodity servers.
●   As a result:
     ○ the cluster generates heat and consumes large
       amounts of energy.
     ○ costs depend not only on the initial acquisition of
       the hardware but also on power, cooling, etc.




                                *
Introduction - DFS Issues
●   Common approach:
     ○ Scale-down: transitioning servers into low-power
       states.
     ○ Other approaches, not exclusive to DFS, include
       renewable energy, free cooling, etc.




                             *
HDFS Overview
●   Hadoop Distributed File System (HDFS) is the primary
    storage system used by Hadoop applications.

●   HDFS creates multiple replicas of data blocks and
    distributes them on compute nodes throughout a cluster
    to enable reliable, extremely rapid computations.
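Where those replicas go is decided by a block placement policy. The sketch below is a simplified model of HDFS's default rack-aware placement for a replication factor of three: first replica on the writer's node, second on a node in a different rack, third on another node in that same remote rack. The function `place_replicas` and the cluster layout are hypothetical, and the real policy also weighs free space, load and node health.

```python
import random
from typing import Dict, List, Optional

def place_replicas(nodes_by_rack: Dict[str, List[str]], writer: Optional[str],
                   rng: random.Random) -> List[str]:
    """Pick 3 target nodes for one block, loosely following HDFS's default
    rack-aware policy. Simplified: assumes at least two racks, each with at
    least two nodes, and ignores space, load and failure checks."""
    rack_of = {n: r for r, ns in nodes_by_rack.items() for n in ns}
    # Replica 1: the writer's own node if it is a DataNode, else a random node.
    first = writer if writer in rack_of else rng.choice(sorted(rack_of))
    # Replica 2: a node on a different rack, so one rack failure cannot
    # destroy every copy of the block.
    remote_racks = [r for r in nodes_by_rack if r != rack_of[first]]
    second_rack = rng.choice(sorted(remote_racks))
    second = rng.choice(nodes_by_rack[second_rack])
    # Replica 3: a different node on the same rack as replica 2, so only one
    # copy of the data crosses racks during the write pipeline.
    third = rng.choice([n for n in nodes_by_rack[second_rack] if n != second])
    return [first, second, third]
```

For example, `place_replicas({"rack1": ["n1", "n2"], "rack2": ["n3", "n4"]}, "n1", random.Random(0))` keeps the first copy on `n1` and puts the other two on distinct nodes of `rack2`.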




                              *
HDFS Evaluation

In 2010, a detailed analysis of files was done in a
production Yahoo! Hadoop cluster with the following
characteristics:

●   2600 servers
●   34 million files
●   Over 5 PB of data
●   3 months of observation




                              *
HDFS Evaluation

Key observations:
●   Files are heterogeneous in access and lifespan patterns.
●   60% of data is "cold" or dormant.
●   95-98% of files have a very short "hotness" lifespan of
    less than 3 days.
●   90% of files were dormant or "cold" for more than 18
    days.
●   The majority of the data had a news-server-like access
    pattern: most accesses happen soon after the data is created.
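These observations rest on two per-file measures. As a minimal sketch, assuming a file's "hotness lifespan" is the time from its creation to its last read and its "dormancy" is the time since its last access (simplified definitions, not necessarily the study's exact ones), both can be computed from access timestamps. `FileStats` and the helper names are hypothetical.

```python
from dataclasses import dataclass
from typing import List

DAY = 86_400  # seconds in a day

@dataclass
class FileStats:
    path: str
    created: float         # creation time, epoch seconds
    accesses: List[float]  # read timestamps, epoch seconds

def hotness_lifespan_days(f: FileStats) -> float:
    """Days between creation and the last read (0 if never read)."""
    if not f.accesses:
        return 0.0
    return (max(f.accesses) - f.created) / DAY

def dormancy_days(f: FileStats, now: float) -> float:
    """Days since the file was last touched; creation counts as a touch."""
    last = max(f.accesses, default=f.created)
    return (now - last) / DAY

def cold_fraction(files: List[FileStats], now: float, threshold_days: float) -> float:
    """Fraction of files that have been dormant for longer than threshold_days."""
    if not files:
        return 0.0
    cold = sum(1 for f in files if dormancy_days(f, now) > threshold_days)
    return cold / len(files)
```

Calling `cold_fraction(files, now, 18)` yields the share of files dormant for more than 18 days, the same cut-off used in the observation above.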




                              *
GHDFS Overview

●   Self-Adaptive - depends only on HDFS and file access
    patterns
●   Applies Data-Classification techniques
●   Energy-Aware placement of data
●   Trades off cost, performance and power by separating the
    cluster into logical zones.




                             *
GHDFS Design


●   Hot Zone
     ○ Files currently being accessed and newly created files
     ○ High energy usage and high performance
●   Cold Zone
     ○ Files with low to rare access
     ○ Low energy use and sleeping mode


                    *
GHDFS - Management Policies

GreenHDFS uses three different management
policies:

●   FMP - File Migration Policy (runs in the Hot Zone)

●   SCP - Server Power Conserver Policy (runs in the Cold Zone)

●   FRP - File Reversal Policy (runs in the Cold Zone)




                              *
GHDFS - File Migration Policy
●   FMP monitors the dormancy of files.
●   Runs in the Hot Zone.

●   Improves storage efficiency in the Hot Zone, since rarely
    accessed files are moved to the Cold Zone.

    Hot Zone → Cold Zone when Coldness > Threshold

    Cold Zone → Hot Zone when Hotness > Threshold
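A single FMP pass can be sketched as a scan over the Hot Zone that moves every file whose dormancy exceeds the coldness threshold. This is a minimal in-memory model under stated assumptions: the `Zones` class, `run_fmp` and the threshold parameter are hypothetical, and a real implementation would move HDFS blocks between DataNodes rather than dictionary entries.

```python
from dataclasses import dataclass, field
from typing import Dict, List

DAY = 86_400  # seconds in a day

@dataclass
class Zones:
    # Hypothetical model: file path -> last-access time (epoch seconds).
    hot: Dict[str, float] = field(default_factory=dict)
    cold: Dict[str, float] = field(default_factory=dict)

def run_fmp(zones: Zones, now: float, coldness_threshold_days: float) -> List[str]:
    """One File Migration Policy pass: move Hot Zone files that have been
    dormant longer than the threshold into the Cold Zone."""
    # Collect candidates first so the dict is not mutated while iterating.
    migrated = [p for p, last in zones.hot.items()
                if (now - last) / DAY > coldness_threshold_days]
    for p in migrated:
        zones.cold[p] = zones.hot.pop(p)
    return sorted(migrated)
```

Running the pass periodically (for instance once a day) keeps the Hot Zone holding only recently used data.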



                                *
GHDFS - Power Conserver Policy
 ●   SCP runs in the Cold Zone.
 ●   Determines which servers can go into standby/sleep mode.

 ●   Uses hardware techniques to transition the CPU, disks and
     DRAM into a low-power state.

 ●   Wakes a server up only if:
     ○ data on that server is accessed, or
     ○ new data needs to be placed on that server.
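The sleep/wake rules above can be modelled as a small state machine per Cold Zone server. This sketch assumes an idle-time threshold decides when a server may sleep; `ColdServer`, `PowerConserver` and `idle_threshold_s` are hypothetical names, and setting a flag stands in for the hardware power-state transitions.

```python
from dataclasses import dataclass, field
from typing import Dict, Set

@dataclass
class ColdServer:
    name: str
    last_activity: float  # epoch seconds of the last read or write
    asleep: bool = False

@dataclass
class PowerConserver:
    """Hypothetical model of the Server Power Conserver Policy."""
    idle_threshold_s: float
    servers: Dict[str, ColdServer] = field(default_factory=dict)

    def tick(self, now: float) -> Set[str]:
        """Put servers idle longer than the threshold to sleep; return their names."""
        slept: Set[str] = set()
        for s in self.servers.values():
            if not s.asleep and now - s.last_activity > self.idle_threshold_s:
                # Stand-in for moving CPU, disks and memory into a low-power state.
                s.asleep = True
                slept.add(s.name)
        return slept

    def on_access(self, name: str, now: float) -> bool:
        """Call when data on the server is read or new data is placed on it.
        Wakes the server if needed; returns True if a wake-up happened."""
        s = self.servers[name]
        woke = s.asleep
        s.asleep = False
        s.last_activity = now
        return woke
```

Only the two events listed in the slide (a read or a new placement) go through `on_access`, so an idle server stays asleep until it is actually needed.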

                             *
GHDFS - File Reversal Policy
 ●   FRP runs in the Cold Zone.
 ●   Ensures that QoS, bandwidth and response time are well
     managed if a file becomes popular again.

     Cold Zone → Hot Zone when #accesses > Threshold
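A minimal sketch of one FRP pass, assuming the policy counts reads of Cold Zone files over a recent window of the access log and moves a file back to the Hot Zone once its count exceeds the threshold. `run_frp` and its parameters are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set

def run_frp(cold_files: Set[str], hot_files: Set[str],
            access_log: List[str], access_threshold: int) -> List[str]:
    """One File Reversal Policy pass over a window of accessed paths.
    Cold Zone files read more than access_threshold times move back to the Hot Zone."""
    counts: Dict[str, int] = defaultdict(int)
    for path in access_log:
        if path in cold_files:  # Hot Zone files need no reversal
            counts[path] += 1
    reverted = sorted(p for p, c in counts.items() if c > access_threshold)
    for p in reverted:
        cold_files.remove(p)
        hot_files.add(p)
    return reverted
```

Counting within a bounded window, rather than over all time, keeps a file that was popular long ago from being reverted.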



                               *
GHDFS - Machine Learning
●   Designing and developing algorithms that allow
    computers to evolve behaviors based on empirical
    data.
●   Recognize patterns and make decisions based on data.




                             *
GHDFS - Machine Learning
GHDFS uses:
 ● Supervised machine learning.
 ● A variant of Multiple Linear Regression to find the
   statistical correlation between directory and file
   attributes.
 ● Training data prepared from audit logs and metadata.
 ● Upon file creation, a prediction of the file's lifespan,
   size and heat.

It works because, in a well-laid-out and partitioned
namespace, the directory hierarchy is highly correlated
with file attributes.
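The exact regression variant used by GHDFS is not described here, so the sketch below swaps in plain ordinary least squares, solved through the normal equations with Gauss-Jordan elimination. The directory features (an intercept, whether the path lies under a hypothetical `/logs` tree, and the path depth) and the helper names are illustrative assumptions; in GHDFS the training targets would come from audit logs and metadata.

```python
from typing import List, Sequence

def fit_ols(X: List[List[float]], y: List[float]) -> List[float]:
    """Ordinary least squares via the normal equations (X^T X) b = X^T y,
    solved with Gauss-Jordan elimination and partial pivoting."""
    n = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(n)]
         + [sum(r[i] * t for r, t in zip(X, y))] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(A[k][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("features are linearly dependent")
        A[col], A[pivot] = A[pivot], A[col]
        for k in range(n):
            if k != col:
                f = A[k][col] / A[col][col]
                A[k] = [a - f * b for a, b in zip(A[k], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]

def features(path: str) -> List[float]:
    """Hypothetical directory features: intercept, 'under /logs', depth."""
    parts = [p for p in path.split("/") if p]
    return [1.0, 1.0 if parts and parts[0] == "logs" else 0.0, float(len(parts))]

def predict(coef: Sequence[float], path: str) -> float:
    """Predicted attribute (e.g. lifespan in days) for a newly created file."""
    return sum(c * x for c, x in zip(coef, features(path)))
```

Training on observed lifespans, e.g. `coef = fit_ols([features(p) for p in paths], lifespans)`, lets `predict(coef, new_path)` estimate a lifespan the moment a file is created, before any accesses have been seen.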



                                *
GHDFS - Evaluation

●   Energy consumption reduced by 24%, saving $2.1 million
    in energy costs per annum (for 38,000 servers).

●   Makes better use of the power budget: the savings allow
    the infrastructure to expand, and more Hot Zone servers
    offer more availability and performance.
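To put the savings figure in perspective, it can be broken down per server. The second line additionally assumes, as a simplification, that the 24% reduction in consumption maps proportionally onto energy cost.

```latex
\begin{align*}
\text{savings per server} &= \frac{\$2.1 \times 10^{6}}{38\,000} \approx \$55 \text{ per year} \\
\text{implied baseline energy cost} &\approx \frac{\$2.1 \times 10^{6}}{0.24} = \$8.75 \times 10^{6} \text{ per year}
\end{align*}
```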




                             *
Conclusions
●   Machine learning can be used to build a predictive,
    self-managed energy-control system that achieves better
    results than reactive approaches.

●   Good energy-management policies can yield large savings
    in energy consumption.

●   Data-classification techniques can help achieve a better
    energy-aware placement of data in Distributed File
    Systems.

●   Applied in conjunction with other, more common green
    computing technologies, the presented techniques can
    significantly reduce the maintenance costs of the cluster.
                               *
References

●   GreenHDFS: Towards an Energy-Conserving, Storage-Efficient,
    Hybrid Hadoop Compute Cluster
●   Evaluation and Analysis of GreenHDFS: A Self-
    Adaptive, Energy-Conserving Variant of the Hadoop
    Distributed File System
●   Predictive Data and Energy Management in
    GreenHDFS
●   The Hadoop Distributed File System
●   Introduction to Machine Learning (Adaptive Computation
    and Machine Learning)




                             *
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 

Recently uploaded (20)

Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 

Self-Adapting, Energy-Conserving Distributed File Systems

  • 1. EEDC 34330: Execution Environments for Distributed Computing
    Self-Adapting, Energy-Conserving Distributed File Systems
    European Master in Distributed Computing - EMDC
    EEDC Presentation
    Mário Almeida – 4knahs[@]gmail.com
    www.marioalmeida.eu
  • 2. Outline
    ● Introduction
      ○ Green Computing
      ○ Distributed File Systems
      ○ DFS issues
    ● Hadoop Distributed File System
      ○ Overview
      ○ Evaluation
    ● Green HDFS
      ○ Overview
      ○ Design
      ○ Goal
      ○ Energy-management policies
      ○ Machine learning
      ○ Evaluation
    ● Conclusions
    ● References
  • 3. Introduction - Green Computing
    ● Environmentally sustainable computing with minimal impact on the environment.
    ● Reduction of energy consumption, greenhouse gas (GHG) emissions and operational costs.
  • 4. Introduction - Distributed FS
    ● A Distributed File System (DFS) is any file system that allows files to be accessed and shared by multiple hosts over a computer network.
    ● May include facilities for transparent replication and fault tolerance.
  • 5. Introduction - DFS Issues
    ● Distributed File Systems are often built to run on a large number of commodity servers.
    ● As a result:
      ○ they generate heat and consume large amounts of energy;
      ○ their costs depend not only on the initial acquisition but also on power, cooling, etc.
  • 6. Introduction - DFS Issues
    ● Common approach:
      ○ Scale-down: transitioning servers into low-power states.
      ○ Other approaches, not exclusive to DFS, include renewable energy, free cooling, etc.
  • 7. HDFS Overview
    ● The Hadoop Distributed File System (HDFS) is the primary storage system used by Hadoop applications.
    ● HDFS creates multiple replicas of data blocks and distributes them on compute nodes throughout a cluster to enable reliable, extremely rapid computations.
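A minimal Python sketch of the replication idea above: a file is split into fixed-size blocks and each block gets `replication` copies on distinct nodes. The names `Block` and `place_blocks` are hypothetical, and the round-robin placement is a deliberate simplification; it is not HDFS's actual rack-aware replica placement. The defaults (128 MiB blocks, 3 replicas) mirror common HDFS defaults in recent Hadoop versions but are only parameters here.

```python
from dataclasses import dataclass

MiB = 1024 * 1024


@dataclass
class Block:
    index: int
    size: int             # bytes in this block (the last block may be shorter)
    replicas: list[str]   # nodes that hold a copy of this block


def place_blocks(file_size: int, nodes: list[str],
                 block_size: int = 128 * MiB, replication: int = 3) -> list[Block]:
    """Split a file into fixed-size blocks and place each block's replicas on
    distinct nodes using simple round-robin (NOT HDFS's rack-aware policy)."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    blocks: list[Block] = []
    offset = 0
    while offset < file_size:
        size = min(block_size, file_size - offset)
        i = len(blocks)
        # Replicas of block i go to nodes i, i+1, ..., wrapping around the list.
        replicas = [nodes[(i + k) % len(nodes)] for k in range(replication)]
        blocks.append(Block(i, size, replicas))
        offset += size
    return blocks


blocks = place_blocks(300 * MiB, ["dn1", "dn2", "dn3", "dn4"])
for b in blocks:
    print(b.index, b.size // MiB, b.replicas)
# 0 128 ['dn1', 'dn2', 'dn3']
# 1 128 ['dn2', 'dn3', 'dn4']
# 2 44 ['dn3', 'dn4', 'dn1']
```

Because every block has several copies on different nodes, losing one node never loses a block, and a computation can be scheduled on any node that holds a replica.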
  • 8. HDFS Evaluation
    In 2010, a detailed analysis of files was carried out on a production Yahoo! Hadoop cluster with the following characteristics:
    ● 2600 servers
    ● 34 million files
    ● Over 5 PB of data
    ● 3 months of observation
  • 9. HDFS Evaluation
    Key observations:
    ● Files are heterogeneous in their access and lifespan patterns.
    ● 60% of the data is "cold" (dormant).
    ● 95-98% of files have a very short "hotness" lifespan of less than 3 days.
    ● 90% of files were dormant ("cold") for more than 18 days.
    ● The majority of the data had a news-server-like access pattern, with most accesses happening soon after the data is created.
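Statistics like these can be derived from per-file access logs. The sketch below assumes two simplified definitions: hot lifespan as the days between a file's creation and its last read, and dormancy as the days since its last read (or since creation if it was never read). The names `FileStats`, `hot_lifespan`, `dormancy` and `summarize`, and the toy records, are hypothetical; they are not the Yahoo! data or the exact metrics of the original study. Only the 3-day and 18-day thresholds come from the observations above.

```python
from dataclasses import dataclass, field


@dataclass
class FileStats:
    path: str
    created: float                                     # day the file was created
    reads: list[float] = field(default_factory=list)   # days on which it was read


def hot_lifespan(f: FileStats) -> float:
    """Assumed definition: days from creation to the last read (0 if never read)."""
    return max(f.reads) - f.created if f.reads else 0.0


def dormancy(f: FileStats, now: float) -> float:
    """Assumed definition: days since the last read, or since creation if never read."""
    last = max(f.reads) if f.reads else f.created
    return now - last


def summarize(files: list[FileStats], now: float,
              hot_days: float = 3.0, cold_days: float = 18.0) -> tuple[float, float]:
    """Return (fraction with hot lifespan < hot_days, fraction dormant > cold_days)."""
    n = len(files)
    short_hot = sum(hot_lifespan(f) < hot_days for f in files) / n
    long_cold = sum(dormancy(f, now) > cold_days for f in files) / n
    return short_hot, long_cold


# Toy records, invented for illustration only.
files = [
    FileStats("/logs/a", created=0, reads=[0.5, 1]),
    FileStats("/logs/b", created=0, reads=[15]),
    FileStats("/tmp/c", created=5),
    FileStats("/tmp/d", created=20, reads=[21]),
]
print(summarize(files, now=30))  # (0.75, 0.5)
```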
  • 10. GHDFS Overview
    ● Self-adaptive: depends only on HDFS and file access patterns.
    ● Applies data-classification techniques.
    ● Energy-aware placement of data.
    ● Trades off cost, performance and power by separating the cluster into logical zones.
  • 11. GHDFS Design
    ● Hot Zone: files that are currently being accessed and newly created files; high energy usage and high performance.
    ● Cold Zone: files with low to rare access; low energy use and sleeping mode.
  • 12. GHDFS - Management Policies
    GreenHDFS uses three different management policies:
    ● FMP - File Migration Policy (runs in the Hot Zone)
    ● SCP - Server Power Conserver Policy (runs in the Cold Zone)
    ● FRP - File Reversal Policy (runs in the Cold Zone)
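  • 13. GHDFS - File Migration Policy
    ● FMP monitors the dormancy of files.
    ● Runs in the Hot Zone.
    ● Gives higher storage efficiency in the Hot Zone, since less frequently accessed files are moved to the Cold Zone.
    ● A file moves from the Hot Zone to the Cold Zone when its coldness exceeds a threshold, and from the Cold Zone back to the Hot Zone when its hotness exceeds a threshold.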
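The File Migration Policy can be pictured as a periodic scan over the Hot Zone. This is a minimal Python illustration, assuming coldness is measured as days since the last access; `Zone`, `ManagedFile`, `file_migration_policy` and the 10-day threshold used in the example are hypothetical. A real FMP also has to move the file's blocks to Cold Zone servers, which this sketch reduces to changing a zone label.

```python
from dataclasses import dataclass
from enum import Enum


class Zone(Enum):
    HOT = "hot"
    COLD = "cold"


@dataclass
class ManagedFile:
    path: str
    last_access: float        # day of the most recent access
    zone: Zone = Zone.HOT     # new files start in the Hot Zone


def file_migration_policy(files: list[ManagedFile], now: float,
                          coldness_threshold: float) -> list[str]:
    """Move Hot Zone files whose coldness (days since last access)
    exceeds the threshold to the Cold Zone; return the moved paths."""
    moved = []
    for f in files:
        coldness = now - f.last_access
        if f.zone is Zone.HOT and coldness > coldness_threshold:
            f.zone = Zone.COLD
            moved.append(f.path)
    return moved


files = [ManagedFile("/logs/old", last_access=1.0),
         ManagedFile("/logs/new", last_access=25.0)]
print(file_migration_policy(files, now=30.0, coldness_threshold=10.0))  # ['/logs/old']
```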
  • 14. GHDFS - Power Conserver Policy
    ● SCP runs in the Cold Zone.
    ● Determines which servers can go into standby/sleep mode.
    ● Uses hardware techniques to transition the CPU, disks and DRAM into low-power states.
    ● Wakes a server up only if:
      ○ data on that server is accessed, or
      ○ new data needs to be placed on that server.
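A minimal sketch of the SCP decision logic, assuming a Cold Zone server is put to sleep once it has been idle for longer than a threshold. `Power`, `ColdServer`, `server_power_conserver_policy` and the threshold value are hypothetical, and the state change stands in for the hardware power-state transitions described above. The two wake-up conditions from the slide are modelled as a single event handler.

```python
from dataclasses import dataclass
from enum import Enum


class Power(Enum):
    ACTIVE = "active"
    SLEEPING = "sleeping"


@dataclass
class ColdServer:
    name: str
    last_activity: float          # day of the last read or placement on this server
    state: Power = Power.ACTIVE
    wakeups: int = 0

    def on_read_or_placement(self, now: float) -> None:
        """The only two events that wake a sleeping server."""
        if self.state is Power.SLEEPING:
            self.state = Power.ACTIVE
            self.wakeups += 1
        self.last_activity = now


def server_power_conserver_policy(servers: list[ColdServer], now: float,
                                  idle_threshold: float) -> list[str]:
    """Put Cold Zone servers idle for longer than idle_threshold to sleep."""
    slept = []
    for s in servers:
        if s.state is Power.ACTIVE and now - s.last_activity > idle_threshold:
            s.state = Power.SLEEPING
            slept.append(s.name)
    return slept


servers = [ColdServer("cold-1", last_activity=0.0),
           ColdServer("cold-2", last_activity=9.0)]
print(server_power_conserver_policy(servers, now=10.0, idle_threshold=5.0))  # ['cold-1']
servers[0].on_read_or_placement(now=11.0)    # a read wakes cold-1 again
print(servers[0].state, servers[0].wakeups)  # Power.ACTIVE 1
```

Counting wake-ups matters in practice: each one costs energy and latency, which is why the FMP should only send genuinely cold files to this zone.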
  • 15. GHDFS - File Reversal Policy
    ● FRP runs in the Cold Zone.
    ● Ensures that QoS, bandwidth and response time are well managed when a file becomes popular: once its number of accesses exceeds a threshold, the file is moved back from the Cold Zone to the Hot Zone.
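The reverse direction can be sketched the same way, assuming accesses are counted per monitoring window and a Cold Zone file is reverted once its count exceeds a threshold. `Zone`, `TrackedFile`, `file_reversal_policy` and the threshold of 5 accesses in the example are hypothetical; resetting the counter after the move is a design choice of this sketch, not something the slides specify.

```python
from dataclasses import dataclass
from enum import Enum


class Zone(Enum):
    HOT = "hot"
    COLD = "cold"


@dataclass
class TrackedFile:
    path: str
    zone: Zone
    recent_accesses: int = 0   # accesses counted in the current monitoring window


def file_reversal_policy(files: list[TrackedFile], access_threshold: int) -> list[str]:
    """Move Cold Zone files whose access count exceeds the threshold back to the Hot Zone."""
    reverted = []
    for f in files:
        if f.zone is Zone.COLD and f.recent_accesses > access_threshold:
            f.zone = Zone.HOT
            f.recent_accesses = 0   # start a fresh window in the new zone
            reverted.append(f.path)
    return reverted


files = [TrackedFile("/archive/report", Zone.COLD, recent_accesses=12),
         TrackedFile("/archive/backup", Zone.COLD, recent_accesses=1)]
print(file_reversal_policy(files, access_threshold=5))  # ['/archive/report']
```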
  • 16. GHDFS - Machine Learning
    ● Designing and developing algorithms that allow computers to evolve behaviors based on empirical data.
    ● Recognizing patterns and making decisions based on data.
  • 17. GHDFS - Machine Learning
    GHDFS uses:
    ● supervised machine learning;
    ● a variant of multiple linear regression to find the statistical correlation between directory and file attributes;
    ● training data prepared from audit logs and metadata.
    The model predicts a file's lifespan, size and heat when the file is created. This works because, in a well-laid-out and partitioned namespace, the directory hierarchy is highly correlated with file attributes.
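To make the idea concrete, here is a plain ordinary-least-squares multiple linear regression that predicts a file's lifespan from attributes of its directory path. It is not the GreenHDFS variant, whose details are not given here. The features (an intercept, a flag for a hypothetical `logs` directory, and the directory depth) and the toy training set are invented for illustration; the solver uses the normal equations so the block needs only the standard library.

```python
def features(path: str) -> list[float]:
    """Hypothetical path-derived features: intercept, 'under a logs/ dir', depth."""
    parts = path.strip("/").split("/")
    dirs = parts[:-1]                       # directory components only
    return [1.0, 1.0 if "logs" in dirs else 0.0, float(len(dirs))]


def solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Gauss-Jordan elimination with partial pivoting for a square system a x = b."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit(rows: list[list[float]], targets: list[float]) -> list[float]:
    """Least-squares coefficients via the normal equations (X^T X) beta = X^T y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    return solve(xtx, xty)


def predict(beta: list[float], path: str) -> float:
    return sum(b * x for b, x in zip(beta, features(path)))


# Toy training data: observed lifespans in days (invented for illustration).
training = {
    "/data/logs/a": 9.0,
    "/data/logs/day1/b": 10.0,
    "/data/tmp/c": 4.0,
    "/data/tmp/x/y/d": 6.0,
    "/e": 2.0,
}
beta = fit([features(p) for p in training], list(training.values()))
print([round(b, 6) for b in beta])                       # [2.0, 5.0, 1.0]
print(round(predict(beta, "/data/logs/day2/h3/f"), 6))   # 11.0
```

The prediction is available as soon as the file's path is known, which is what allows placement decisions at creation time rather than after observing accesses.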
  • 23. GHDFS - Evaluation
    ● Energy consumption was reduced by 24%, saving $2.1 million in energy costs per annum (for 38,000 servers).
    ● Makes better use of the power budget, allowing the infrastructure to expand; more Hot Zone servers offer more availability and performance.
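As a rough sense of scale, dividing the reported annual saving by the number of servers gives an average saving per server. This assumes the saving is spread evenly across all 38,000 servers, which the slide does not state.

```latex
\frac{\$2.1 \times 10^{6}\ \text{per year}}{38\,000\ \text{servers}} \approx \$55.3\ \text{per server per year}
```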
  • 24. Conclusions
    ● Machine learning can be applied to build a predictive, self-managed energy-control system that achieves better results than reactive approaches.
    ● Good energy-management policies can result in large savings in energy consumption.
    ● Data-classification techniques can help achieve better energy-aware placement of data in Distributed File Systems.
    ● Applied in conjunction with other, more common green computing technologies, the presented techniques can significantly reduce the maintenance costs of the cluster.
  • 25. References
    ● GreenHDFS: Towards an Energy-Conserving, Storage-Efficient, Hybrid Hadoop Compute Cluster
    ● Evaluation and Analysis of GreenHDFS: A Self-Adaptive, Energy-Conserving Variant of the Hadoop Distributed File System
    ● Predictive Data and Energy Management in GreenHDFS
    ● The Hadoop Distributed File System
    ● Introduction to Machine Learning (Adaptive Computation and Machine Learning)