SlideShare a Scribd company logo
1
2
3
4
5
6
Starting our complex problem
Awesome
Distribute file system
Lightning-Fast Cluster Computing
2
3
About goal
This workshop will help in understanding compute grid, it’s model and it’s
implementation. We have tried my best to explain the concepts in detail. The
programming language used for demo in here is Scala. And we then apply
this model to settle some problems we can see where this is simple and
useful for us.
1
• Lost of Data (Terabytes or Petabytes)
• Big data is the term of collection of data sets so large and complex that it
becomes difficult to process using on-hand database management tools or
traditional data processing application. The challenges include capture, curation,
storage, search, sharing, transfer, analysis and visualization.
• Systems/ Enterprises generate huge amount of data form Terabytes to and even
Petabytes of information.
NYSE generates about one terabyte of new trade
data/day to perform stock trading analytics to
determine trends for optimal trades.
7
• 2,500 exabytes of new information in 2013
with internet as primary driver
• Digital universe grew by 62% last year to
800K petabytes and will grow to 1.2
zettabytes this years
8
9
Enterprises are awash with ever-growing data of all types, easily amassing terabytes—even
petabytes—of information.
Sometimes 2 minutes is too late. For time-sensitive processes such as catching fraud, big
data must be used as it streams into your enterprise in order to maximize its value.
Big data is any type of data - structured and unstructured data such as text, sensor data,
audio, video, click streams, log files and more. New insights are found when analyzing these
data types together.
1
2
3
4
10
Recommendation engines
Ad targeting
Search quality
Abuse and Click fraud detection
1
2
3
4
11
Customer churn prevention
Network performance optimization
Calling data record analysis
Analyzing network to predict failure
1
2
3
4
12
Health information exchange
Gene sequencing
Healthcare service quality improvements
Drug safety
1
2
3
4
13
Modeling true risk
Threat analysis
Fraud detection
Credit scoring and analysis
2
Apache Hadoop is a framework that allows for
distributed processing of large data sets across
clusters of commodity computers using a simple
programming model.
It is an Open-source Data Management with
scale-out storage & distributed processing.
19
1 2 3 4
20
21
22
• Splits a task across processors
• “Near” the data & assembles results
• Self-healing, high bandwidth
• Clustered storage
• JobTracker manages the TaskTrackers
• Distributed across nodes
• Natively redundant
• NameNode tracks locations.
23
3
Apache Spark is an open-source cluster-computing
framework for real time processing developed by the Apache
Software Foundation.
Spark provides an interface for programming entire clusters
with implicit data parallelism and fault-tolerance.
It was built on top of Hadoop MapReduce and it extends the
MapReduce model to efficiently use more types of
computations.
25
27
28
29
Map, flatMap, Filter, … Collect, Take, count, …
Resilient Distributed Datasets (RDD): Collection that can be operated in parallel.
4
31
Hadoop Mapreduce Spark
Time to sort 100TB data
32
Most active open source community in big data
200+ developers, 50+ companies contributing
33
34
4
Big data and computing grid

More Related Content

What's hot

Great Expectations Presentation
Great Expectations PresentationGreat Expectations Presentation
Great Expectations Presentation
Adam Doyle
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big Data
Haluan Irsad
 
Exploring Big Data Analytics Tools
Exploring Big Data Analytics ToolsExploring Big Data Analytics Tools
Exploring Big Data Analytics Tools
Multisoft Virtual Academy
 
BigData Analysis
BigData AnalysisBigData Analysis
Big data analytics with Apache Hadoop
Big data analytics with Apache  HadoopBig data analytics with Apache  Hadoop
Big data analytics with Apache Hadoop
Suman Saurabh
 
Big Data Analysis Patterns - TriHUG 6/27/2013
Big Data Analysis Patterns - TriHUG 6/27/2013Big Data Analysis Patterns - TriHUG 6/27/2013
Big Data Analysis Patterns - TriHUG 6/27/2013
boorad
 
Open source stak of big data techs open suse asia
Open source stak of big data techs   open suse asiaOpen source stak of big data techs   open suse asia
Open source stak of big data techs open suse asia
Muhammad Rifqi
 
Big Data Analytics
Big Data AnalyticsBig Data Analytics
Big Data Analytics
Tyrone Systems
 
Big Data
Big DataBig Data
Big Data
Neha Mehta
 
5 Factors Impacting Your Big Data Project's Performance
5 Factors Impacting Your Big Data Project's Performance 5 Factors Impacting Your Big Data Project's Performance
5 Factors Impacting Your Big Data Project's Performance
Qubole
 
Top Big data Analytics tools: Emerging trends and Best practices
Top Big data Analytics tools: Emerging trends and Best practicesTop Big data Analytics tools: Emerging trends and Best practices
Top Big data Analytics tools: Emerging trends and Best practices
SpringPeople
 
Building a Logical Data Fabric using Data Virtualization (ASEAN)
Building a Logical Data Fabric using Data Virtualization (ASEAN)Building a Logical Data Fabric using Data Virtualization (ASEAN)
Building a Logical Data Fabric using Data Virtualization (ASEAN)
Denodo
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big Data
Kristof Jozsa
 
Bigdata
BigdataBigdata
BigData Analytics with Hadoop and BIRT
BigData Analytics with Hadoop and BIRTBigData Analytics with Hadoop and BIRT
BigData Analytics with Hadoop and BIRT
Amrit Chhetri
 
Introduction to big data
Introduction to big dataIntroduction to big data
Introduction to big data
Sitaram Kotnis
 
Thilga
ThilgaThilga
Big data unit 2
Big data unit 2Big data unit 2
Big data unit 2
RojaT4
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big Data
Karan Desai
 
Big Data Analytics for Real Time Systems
Big Data Analytics for Real Time SystemsBig Data Analytics for Real Time Systems
Big Data Analytics for Real Time Systems
Kamalika Dutta
 

What's hot (20)

Great Expectations Presentation
Great Expectations PresentationGreat Expectations Presentation
Great Expectations Presentation
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big Data
 
Exploring Big Data Analytics Tools
Exploring Big Data Analytics ToolsExploring Big Data Analytics Tools
Exploring Big Data Analytics Tools
 
BigData Analysis
BigData AnalysisBigData Analysis
BigData Analysis
 
Big data analytics with Apache Hadoop
Big data analytics with Apache  HadoopBig data analytics with Apache  Hadoop
Big data analytics with Apache Hadoop
 
Big Data Analysis Patterns - TriHUG 6/27/2013
Big Data Analysis Patterns - TriHUG 6/27/2013Big Data Analysis Patterns - TriHUG 6/27/2013
Big Data Analysis Patterns - TriHUG 6/27/2013
 
Open source stak of big data techs open suse asia
Open source stak of big data techs   open suse asiaOpen source stak of big data techs   open suse asia
Open source stak of big data techs open suse asia
 
Big Data Analytics
Big Data AnalyticsBig Data Analytics
Big Data Analytics
 
Big Data
Big DataBig Data
Big Data
 
5 Factors Impacting Your Big Data Project's Performance
5 Factors Impacting Your Big Data Project's Performance 5 Factors Impacting Your Big Data Project's Performance
5 Factors Impacting Your Big Data Project's Performance
 
Top Big data Analytics tools: Emerging trends and Best practices
Top Big data Analytics tools: Emerging trends and Best practicesTop Big data Analytics tools: Emerging trends and Best practices
Top Big data Analytics tools: Emerging trends and Best practices
 
Building a Logical Data Fabric using Data Virtualization (ASEAN)
Building a Logical Data Fabric using Data Virtualization (ASEAN)Building a Logical Data Fabric using Data Virtualization (ASEAN)
Building a Logical Data Fabric using Data Virtualization (ASEAN)
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big Data
 
Bigdata
BigdataBigdata
Bigdata
 
BigData Analytics with Hadoop and BIRT
BigData Analytics with Hadoop and BIRTBigData Analytics with Hadoop and BIRT
BigData Analytics with Hadoop and BIRT
 
Introduction to big data
Introduction to big dataIntroduction to big data
Introduction to big data
 
Thilga
ThilgaThilga
Thilga
 
Big data unit 2
Big data unit 2Big data unit 2
Big data unit 2
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big Data
 
Big Data Analytics for Real Time Systems
Big Data Analytics for Real Time SystemsBig Data Analytics for Real Time Systems
Big Data Analytics for Real Time Systems
 

Similar to Big data and computing grid

Lesson 1 introduction to_big_data_and_hadoop.pptx
Lesson 1 introduction to_big_data_and_hadoop.pptxLesson 1 introduction to_big_data_and_hadoop.pptx
Lesson 1 introduction to_big_data_and_hadoop.pptx
Pankajkumar496281
 
Lecture 5 - Big Data and Hadoop Intro.ppt
Lecture 5 - Big Data and Hadoop Intro.pptLecture 5 - Big Data and Hadoop Intro.ppt
Lecture 5 - Big Data and Hadoop Intro.ppt
almaraniabwmalk
 
BDA ( haoop ).pptx
BDA ( haoop ).pptxBDA ( haoop ).pptx
BDA ( haoop ).pptx
MangeshShukla3
 
Data Mesh using Microsoft Fabric
Data Mesh using Microsoft FabricData Mesh using Microsoft Fabric
Data Mesh using Microsoft Fabric
Nathan Bijnens
 
Apache Kafka and the Data Mesh | Ben Stopford and Michael Noll, Confluent
Apache Kafka and the Data Mesh | Ben Stopford and Michael Noll, ConfluentApache Kafka and the Data Mesh | Ben Stopford and Michael Noll, Confluent
Apache Kafka and the Data Mesh | Ben Stopford and Michael Noll, Confluent
HostedbyConfluent
 
Big Data Boom
Big Data BoomBig Data Boom
Big Data Companies and Apache Software
Big Data Companies and Apache SoftwareBig Data Companies and Apache Software
Big Data Companies and Apache Software
Bob Marcus
 
Essential Data Engineering for Data Scientist
Essential Data Engineering for Data Scientist Essential Data Engineering for Data Scientist
Essential Data Engineering for Data Scientist
SoftServe
 
Big Data Fabric: A Necessity For Any Successful Big Data Initiative
Big Data Fabric: A Necessity For Any Successful Big Data InitiativeBig Data Fabric: A Necessity For Any Successful Big Data Initiative
Big Data Fabric: A Necessity For Any Successful Big Data Initiative
Denodo
 
Evolution from EDA to Data Mesh: Data in Motion
Evolution from EDA to Data Mesh: Data in MotionEvolution from EDA to Data Mesh: Data in Motion
Evolution from EDA to Data Mesh: Data in Motion
confluent
 
ADV Slides: The Evolution of the Data Platform and What It Means to Enterpris...
ADV Slides: The Evolution of the Data Platform and What It Means to Enterpris...ADV Slides: The Evolution of the Data Platform and What It Means to Enterpris...
ADV Slides: The Evolution of the Data Platform and What It Means to Enterpris...
DATAVERSITY
 
AWS Summit Berlin 2013 - Big Data Analytics
AWS Summit Berlin 2013 - Big Data AnalyticsAWS Summit Berlin 2013 - Big Data Analytics
AWS Summit Berlin 2013 - Big Data Analytics
AWS Germany
 
CouchBase The Complete NoSql Solution for Big Data
CouchBase The Complete NoSql Solution for Big DataCouchBase The Complete NoSql Solution for Big Data
CouchBase The Complete NoSql Solution for Big Data
Debajani Mohanty
 
Keynote Address at 2013 CloudCon: Future of Big Data by Richard McDougall (In...
Keynote Address at 2013 CloudCon: Future of Big Data by Richard McDougall (In...Keynote Address at 2013 CloudCon: Future of Big Data by Richard McDougall (In...
Keynote Address at 2013 CloudCon: Future of Big Data by Richard McDougall (In...
exponential-inc
 
DX2000 from NEC lets you put big data to work
DX2000 from NEC lets you put big data to workDX2000 from NEC lets you put big data to work
DX2000 from NEC lets you put big data to work
Principled Technologies
 
Real time analytics
Real time analyticsReal time analytics
Real time analytics
Leandro Totino Pereira
 
Big Data Technologies.pdf
Big Data Technologies.pdfBig Data Technologies.pdf
Big Data Technologies.pdf
RAHULRAHU8
 
White Paper: Advanced Cyber Analytics with Greenplum Database
White Paper: Advanced Cyber Analytics with Greenplum DatabaseWhite Paper: Advanced Cyber Analytics with Greenplum Database
White Paper: Advanced Cyber Analytics with Greenplum Database
EMC
 
Privacy-Preserving Multi-keyword Top-k Similarity Search Over Encrypted Data
Privacy-Preserving Multi-keyword Top-k Similarity Search Over Encrypted Data Privacy-Preserving Multi-keyword Top-k Similarity Search Over Encrypted Data
Privacy-Preserving Multi-keyword Top-k Similarity Search Over Encrypted Data
JAYAPRAKASH JPINFOTECH
 
Big data4businessusers
Big data4businessusersBig data4businessusers
Big data4businessusers
Bob Hardaway
 

Similar to Big data and computing grid (20)

Lesson 1 introduction to_big_data_and_hadoop.pptx
Lesson 1 introduction to_big_data_and_hadoop.pptxLesson 1 introduction to_big_data_and_hadoop.pptx
Lesson 1 introduction to_big_data_and_hadoop.pptx
 
Lecture 5 - Big Data and Hadoop Intro.ppt
Lecture 5 - Big Data and Hadoop Intro.pptLecture 5 - Big Data and Hadoop Intro.ppt
Lecture 5 - Big Data and Hadoop Intro.ppt
 
BDA ( haoop ).pptx
BDA ( haoop ).pptxBDA ( haoop ).pptx
BDA ( haoop ).pptx
 
Data Mesh using Microsoft Fabric
Data Mesh using Microsoft FabricData Mesh using Microsoft Fabric
Data Mesh using Microsoft Fabric
 
Apache Kafka and the Data Mesh | Ben Stopford and Michael Noll, Confluent
Apache Kafka and the Data Mesh | Ben Stopford and Michael Noll, ConfluentApache Kafka and the Data Mesh | Ben Stopford and Michael Noll, Confluent
Apache Kafka and the Data Mesh | Ben Stopford and Michael Noll, Confluent
 
Big Data Boom
Big Data BoomBig Data Boom
Big Data Boom
 
Big Data Companies and Apache Software
Big Data Companies and Apache SoftwareBig Data Companies and Apache Software
Big Data Companies and Apache Software
 
Essential Data Engineering for Data Scientist
Essential Data Engineering for Data Scientist Essential Data Engineering for Data Scientist
Essential Data Engineering for Data Scientist
 
Big Data Fabric: A Necessity For Any Successful Big Data Initiative
Big Data Fabric: A Necessity For Any Successful Big Data InitiativeBig Data Fabric: A Necessity For Any Successful Big Data Initiative
Big Data Fabric: A Necessity For Any Successful Big Data Initiative
 
Evolution from EDA to Data Mesh: Data in Motion
Evolution from EDA to Data Mesh: Data in MotionEvolution from EDA to Data Mesh: Data in Motion
Evolution from EDA to Data Mesh: Data in Motion
 
ADV Slides: The Evolution of the Data Platform and What It Means to Enterpris...
ADV Slides: The Evolution of the Data Platform and What It Means to Enterpris...ADV Slides: The Evolution of the Data Platform and What It Means to Enterpris...
ADV Slides: The Evolution of the Data Platform and What It Means to Enterpris...
 
AWS Summit Berlin 2013 - Big Data Analytics
AWS Summit Berlin 2013 - Big Data AnalyticsAWS Summit Berlin 2013 - Big Data Analytics
AWS Summit Berlin 2013 - Big Data Analytics
 
CouchBase The Complete NoSql Solution for Big Data
CouchBase The Complete NoSql Solution for Big DataCouchBase The Complete NoSql Solution for Big Data
CouchBase The Complete NoSql Solution for Big Data
 
Keynote Address at 2013 CloudCon: Future of Big Data by Richard McDougall (In...
Keynote Address at 2013 CloudCon: Future of Big Data by Richard McDougall (In...Keynote Address at 2013 CloudCon: Future of Big Data by Richard McDougall (In...
Keynote Address at 2013 CloudCon: Future of Big Data by Richard McDougall (In...
 
DX2000 from NEC lets you put big data to work
DX2000 from NEC lets you put big data to workDX2000 from NEC lets you put big data to work
DX2000 from NEC lets you put big data to work
 
Real time analytics
Real time analyticsReal time analytics
Real time analytics
 
Big Data Technologies.pdf
Big Data Technologies.pdfBig Data Technologies.pdf
Big Data Technologies.pdf
 
White Paper: Advanced Cyber Analytics with Greenplum Database
White Paper: Advanced Cyber Analytics with Greenplum DatabaseWhite Paper: Advanced Cyber Analytics with Greenplum Database
White Paper: Advanced Cyber Analytics with Greenplum Database
 
Privacy-Preserving Multi-keyword Top-k Similarity Search Over Encrypted Data
Privacy-Preserving Multi-keyword Top-k Similarity Search Over Encrypted Data Privacy-Preserving Multi-keyword Top-k Similarity Search Over Encrypted Data
Privacy-Preserving Multi-keyword Top-k Similarity Search Over Encrypted Data
 
Big data4businessusers
Big data4businessusersBig data4businessusers
Big data4businessusers
 

Recently uploaded

DSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelinesDSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelines
Timothy Spann
 
The Ipsos - AI - Monitor 2024 Report.pdf
The  Ipsos - AI - Monitor 2024 Report.pdfThe  Ipsos - AI - Monitor 2024 Report.pdf
The Ipsos - AI - Monitor 2024 Report.pdf
Social Samosa
 
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data LakeViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
Walaa Eldin Moustafa
 
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Aggregage
 
Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...
Bill641377
 
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
ihavuls
 
End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024
Lars Albertsson
 
writing report business partner b1+ .pdf
writing report business partner b1+ .pdfwriting report business partner b1+ .pdf
writing report business partner b1+ .pdf
VyNguyen709676
 
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
nyfuhyz
 
Challenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more importantChallenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more important
Sm321
 
Udemy_2024_Global_Learning_Skills_Trends_Report (1).pdf
Udemy_2024_Global_Learning_Skills_Trends_Report (1).pdfUdemy_2024_Global_Learning_Skills_Trends_Report (1).pdf
Udemy_2024_Global_Learning_Skills_Trends_Report (1).pdf
Fernanda Palhano
 
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
apvysm8
 
一比一原版巴斯大学毕业证(Bath毕业证书)学历如何办理
一比一原版巴斯大学毕业证(Bath毕业证书)学历如何办理一比一原版巴斯大学毕业证(Bath毕业证书)学历如何办理
一比一原版巴斯大学毕业证(Bath毕业证书)学历如何办理
y3i0qsdzb
 
"Financial Odyssey: Navigating Past Performance Through Diverse Analytical Lens"
"Financial Odyssey: Navigating Past Performance Through Diverse Analytical Lens""Financial Odyssey: Navigating Past Performance Through Diverse Analytical Lens"
"Financial Odyssey: Navigating Past Performance Through Diverse Analytical Lens"
sameer shah
 
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
nuttdpt
 
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
sameer shah
 
A presentation that explain the Power BI Licensing
A presentation that explain the Power BI LicensingA presentation that explain the Power BI Licensing
A presentation that explain the Power BI Licensing
AlessioFois2
 
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
bopyb
 
原版一比一弗林德斯大学毕业证(Flinders毕业证书)如何办理
原版一比一弗林德斯大学毕业证(Flinders毕业证书)如何办理原版一比一弗林德斯大学毕业证(Flinders毕业证书)如何办理
原版一比一弗林德斯大学毕业证(Flinders毕业证书)如何办理
a9qfiubqu
 
原版一比一多伦多大学毕业证(UofT毕业证书)如何办理
原版一比一多伦多大学毕业证(UofT毕业证书)如何办理原版一比一多伦多大学毕业证(UofT毕业证书)如何办理
原版一比一多伦多大学毕业证(UofT毕业证书)如何办理
mkkikqvo
 

Recently uploaded (20)

DSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelinesDSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelines
 
The Ipsos - AI - Monitor 2024 Report.pdf
The  Ipsos - AI - Monitor 2024 Report.pdfThe  Ipsos - AI - Monitor 2024 Report.pdf
The Ipsos - AI - Monitor 2024 Report.pdf
 
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data LakeViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
 
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
 
Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...
 
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
原版制作(unimelb毕业证书)墨尔本大学毕业证Offer一模一样
 
End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024
 
writing report business partner b1+ .pdf
writing report business partner b1+ .pdfwriting report business partner b1+ .pdf
writing report business partner b1+ .pdf
 
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
 
Challenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more importantChallenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more important
 
Udemy_2024_Global_Learning_Skills_Trends_Report (1).pdf
Udemy_2024_Global_Learning_Skills_Trends_Report (1).pdfUdemy_2024_Global_Learning_Skills_Trends_Report (1).pdf
Udemy_2024_Global_Learning_Skills_Trends_Report (1).pdf
 
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
 
一比一原版巴斯大学毕业证(Bath毕业证书)学历如何办理
一比一原版巴斯大学毕业证(Bath毕业证书)学历如何办理一比一原版巴斯大学毕业证(Bath毕业证书)学历如何办理
一比一原版巴斯大学毕业证(Bath毕业证书)学历如何办理
 
"Financial Odyssey: Navigating Past Performance Through Diverse Analytical Lens"
"Financial Odyssey: Navigating Past Performance Through Diverse Analytical Lens""Financial Odyssey: Navigating Past Performance Through Diverse Analytical Lens"
"Financial Odyssey: Navigating Past Performance Through Diverse Analytical Lens"
 
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
一比一原版(UCSF文凭证书)旧金山分校毕业证如何办理
 
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
 
A presentation that explain the Power BI Licensing
A presentation that explain the Power BI LicensingA presentation that explain the Power BI Licensing
A presentation that explain the Power BI Licensing
 
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
 
原版一比一弗林德斯大学毕业证(Flinders毕业证书)如何办理
原版一比一弗林德斯大学毕业证(Flinders毕业证书)如何办理原版一比一弗林德斯大学毕业证(Flinders毕业证书)如何办理
原版一比一弗林德斯大学毕业证(Flinders毕业证书)如何办理
 
原版一比一多伦多大学毕业证(UofT毕业证书)如何办理
原版一比一多伦多大学毕业证(UofT毕业证书)如何办理原版一比一多伦多大学毕业证(UofT毕业证书)如何办理
原版一比一多伦多大学毕业证(UofT毕业证书)如何办理
 

Big data and computing grid

  • 1.
  • 2. 1 2 3 4 5 6 Starting our complex problem Awesome Distribute file system Lightning-Fast Cluster Computing 2
  • 3. 3 About goal This workshop will help in understanding compute grid, it’s model and it’s implementation. We have tried my best to explain the concepts in detail. The programming language used for demo in here is Scala. And we then apply this model to settle some problems we can see where this is simple and useful for us.
  • 4. 1
  • 5. • Lost of Data (Terabytes or Petabytes) • Big data is the term of collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing application. The challenges include capture, curation, storage, search, sharing, transfer, analysis and visualization. • Systems/ Enterprises generate huge amount of data form Terabytes to and even Petabytes of information. NYSE generates about one terabyte of new trade data/day to perform stock trading analytics to determine trends for optimal trades.
  • 6.
  • 7. 7 • 2,500 exabytes of new information in 2013 with internet as primary driver • Digital universe grew by 62% last year to 800K petabytes and will grow to 1.2 zettabytes this years
  • 8. 8
  • 9. 9 Enterprises are awash with ever-growing data of all types, easily amassing terabytes—even petabytes—of information. Sometimes 2 minutes is too late. For time-sensitive processes such as catching fraud, big data must be used as it streams into your enterprise in order to maximize its value. Big data is any type of data - structured and unstructured data such as text, sensor data, audio, video, click streams, log files and more. New insights are found when analyzing these data types together.
  • 10. 1 2 3 4 10 Recommendation engines Ad targeting Search quality Abuse and Click fraud detection
  • 11. 1 2 3 4 11 Customer churn prevention Network performance optimization Calling data record analysis Analyzing network to predict failure
  • 12. 1 2 3 4 12 Health information exchange Gene sequencing Healthcare service quality improvements Drug safety
  • 13. 1 2 3 4 13 Modeling true risk Threat analysis Fraud detection Credit scoring and analysis
  • 14.
  • 15.
  • 16.
  • 17. 2
  • 18.
  • 19. Apache Hadoop is a framework that allows for distributed processing of large data sets across clusters of commodity computers using a simple programming model. It is an Open-source Data Management with scale-out storage & distributed processing. 19
  • 20. 1 2 3 4 20
  • 21. 21
  • 22. 22 • Splits a task across processors • “Near” the data & assembles results • Self-healing, high bandwidth • Clustered storage • JobTracker manages the TaskTrackers • Distributed across nodes • Natively redundant • NameNode tracks locations.
  • 23. 23
  • 24. 3
  • 25. Apache Spark is an open-source cluster-computing framework for real time processing developed by the Apache Software Foundation. Spark provides an interface for programming entire clusters with implicit data parallelism and fault-tolerance. It was built on top of Hadoop MapReduce and it extends the MapReduce model to efficiently use more types of computations. 25
  • 26.
  • 27. 27
  • 28. 28
  • 29. 29 Map, flatMap, Filter, … Collect, Take, count, … Resilient Distributed Datasets (RDD): Collection that can be operated in parallel.
  • 30. 4
  • 31. 31 Hadoop Mapreduce Spark Time to sort 100TB data
  • 32. 32 Most active open source community in big data 200+ developers, 50+ companies contributing
  • 33. 33
  • 34. 34
  • 35. 4