SlideShare a Scribd company logo
1 of 22
MapReduce In The Cloud
          Infinispan Distributed Task Execution Framework




Manik Surtani
May 3rd 2011, JUDCon - Boston
Welcome to the Data Deluge
Background

• Emergence of data beyond human scale
• Outgrows current platforms in scale, structure, processing time
• Abundance of unstructured, machine generated data
• Does not fit into current software paradigms
• Not confined to Twitter, Facebook and Google only
• Need new platforms built for Big Data from ground up
Big Data

• How did Big Data happen?
• Exponential changes in storage, bandwidth and data
  creation
• Data exhaust
• Drowning in data and not knowing what to do with it
• Leverage, not discard Big Data
• We are not even aware of data revolution around us
Hard disk cost & capacity history
         1000000                                                                                     1000000

           100000                                                                                        100000

             10000
                                                                                                          10000
              1000
                                                                                                           1000
                100
                                                                                                            100
                  10

                   1                                                                                         10


                 0.1                                                                                          1
                    1980               1985               1990     1995   2000       2005         2010
                                                Cost per GB in US$               Capacity in MB

Source: http://en.wikipedia.org/wiki/History_of_hard_disk_drives
“There was 5 exabytes of
                information created between the
               dawn of civilization through 2003....




                                                                                                      Eric Schmidt, August 4th, 2010

Source: http://www.readwriteweb.com/archives/google_ceo_schmidt_people_arent_ready_for_the_tech.php
“There was 5 exabytes of
                information created between the
               dawn of civilization through 2003....

                                                                 ... that much information is
                                                                now created every two days”


                                                                                                      Eric Schmidt, August 4th, 2010

Source: http://www.readwriteweb.com/archives/google_ceo_schmidt_people_arent_ready_for_the_tech.php
Big Data challenges

• Powerful platforms available today, however...
• Need to be genius to build systems on top of them
• Equivalent to programming client/server apps in assembly
  language
• Is there a need for a new language?
• Confusion is as big as Big Data
• Simpler yet equally powerful solutions needed
“Infinispan without
MapReduce is like owning a
  Ferrari without a
Driving License”
           - Vladimir Blagojevic
Infinispan Big Data goals

• Why waste a Ferrari?
• Humble beginnings
• Simplicity without sacrificing power
• Capitalize on existing Infinispan infrastructure
• Reuse of current programming abstractions
• Two frameworks: distributed executors and MapReduce
Infinispan Distributed Execution
           Framework

• Leverage familiar ExecutorService, Callable abstractions
• Expand it to distributed, parallel computing paradigm
• Looks like a regular ExecutorService
• Feels like a regular ExecutorService
• The magic that goes on within Infinispan is completely
  transparent to users
• Nevertheless, users can experience it :-)
So simple it fits on one slide
public interface DistributedExecutorService extends ExecutorService {

      <T, K> Future<T> submit(Callable<T> task, K... input);

      <T> List<Future<T>> submitEverywhere(Callable<T> task);

      <T, K > List<Future<T>> submitEverywhere(Callable<T> task, K... input);
 }


 public interface DistributedCallable<K, V, T> extends Callable<T> {

     void setEnvironment(Cache<K, V> cache, Set<K> inputKeys);

}
However,behind the scenes..
Do not forget Gene Amdahl

                                                                   Speedup = 1/(p/n)+(1-p)


                                                              However, problems that increase the percentage of
                                                              parallel time with their size are more scalable than
                                                              problems with fixed percentage of parallel time




                                                                                                 p = parallel fraction
                                                                                                 n = number of processors

Source: https://computing.llnl.gov/tutorials/parallel_comp/
π approximation
Infinispan MapReduce

• We already have a data grid!
• Leverages Infinispan’s DIST mode
• Cache data is input for MapReduce tasks
• Task components: Mapper, Reducer, Collator
• MapReduceTask cohering them together
MapReduce model




Source: http://labs.google.com/papers/mapreduce.html
Mapper, Reducer, Collator
public interface Mapper<KIn, VIn, KOut, VOut> extends Serializable {

     void map(KIn key, VIn value, Collector<KOut, VOut> collector);

}


public interface Reducer<KOut, VOut> extends Serializable {

     VOut reduce(KOut reducedKey, Iterator<VOut> iter);

}

public interface Collator<KOut, VOut, R> {

     R collate(Map<KOut, VOut> reducedResults);

}
Roadmap

• Improve task execution container
• Failover, execution policies
• Make sure it scales to terabytes and petabytes
• Integration with Hibernate OGM
• Analytics and BI tools
• Do we need data analysis language?
Parting thoughts

• Data revolution is here, today!
• Profound socio-economic impact
• Do not sleep through it
• Infinispan as a platform for Big Data
• Join us in these exciting endeavours
Questions?
    http://www.infinispan.org
    http://blog.infinispan.org
http://www.twitter.com/infinispan
    #infinispan on FreeNode

         Rate this talk!
http://spkr8.com/t/7384

More Related Content

Similar to MapReduce In The Cloud Infinispan Distributed Task Execution Framework

Big Data Story - From An Engineer's Perspective
Big Data Story - From An Engineer's PerspectiveBig Data Story - From An Engineer's Perspective
Big Data Story - From An Engineer's PerspectiveHien Luu
 
The Yin and Yang of Software
The Yin and Yang of SoftwareThe Yin and Yang of Software
The Yin and Yang of Softwareelliando dias
 
Iftf reconfiguring reality 20171010 v3
Iftf reconfiguring reality 20171010 v3Iftf reconfiguring reality 20171010 v3
Iftf reconfiguring reality 20171010 v3ISSIP
 
Model-based Testing: Today And Tomorrow
Model-based Testing: Today And TomorrowModel-based Testing: Today And Tomorrow
Model-based Testing: Today And TomorrowBob Binder
 
Deep Learning: Changing the Playing Field of Artificial Intelligence - MaRS G...
Deep Learning: Changing the Playing Field of Artificial Intelligence - MaRS G...Deep Learning: Changing the Playing Field of Artificial Intelligence - MaRS G...
Deep Learning: Changing the Playing Field of Artificial Intelligence - MaRS G...MaRS Discovery District
 
Small, Medium and Big Data
Small, Medium and Big DataSmall, Medium and Big Data
Small, Medium and Big DataPierre De Wilde
 
BDI- The Beginning (Big data training in Coimbatore)
BDI- The Beginning (Big data training in Coimbatore)BDI- The Beginning (Big data training in Coimbatore)
BDI- The Beginning (Big data training in Coimbatore)Ashok Rangaswamy
 
Is the Browser a Transitional Technology?
Is the Browser a Transitional Technology?Is the Browser a Transitional Technology?
Is the Browser a Transitional Technology?Allen Wirfs-Brock
 
T-Mobile and Elastic
T-Mobile and ElasticT-Mobile and Elastic
T-Mobile and ElasticElasticsearch
 
Identifying Relevant Messages in a Twitter-based Citizen Channel for Natural ...
Identifying Relevant Messages in a Twitter-based Citizen Channel for Natural ...Identifying Relevant Messages in a Twitter-based Citizen Channel for Natural ...
Identifying Relevant Messages in a Twitter-based Citizen Channel for Natural ...Denis Parra Santander
 
Microsoft Dynamics Academic Alliance: How to win future of business
Microsoft Dynamics Academic Alliance: How to win future of businessMicrosoft Dynamics Academic Alliance: How to win future of business
Microsoft Dynamics Academic Alliance: How to win future of businessFrederik De Bruyne
 
jstein.cassandra.nyc.2011
jstein.cassandra.nyc.2011jstein.cassandra.nyc.2011
jstein.cassandra.nyc.2011Joe Stein
 
Video AI for Media and Entertainment Industry
Video AI for Media and Entertainment IndustryVideo AI for Media and Entertainment Industry
Video AI for Media and Entertainment IndustryAlbert Y. C. Chen
 
BCO 117 IT Software for Business Lecture Reference Notes.docx
BCO 117 IT Software for Business Lecture Reference Notes.docxBCO 117 IT Software for Business Lecture Reference Notes.docx
BCO 117 IT Software for Business Lecture Reference Notes.docxjesuslightbody
 
(Practical) Beyond Responsive Web Design (WordCamp Miami 2011)
(Practical) Beyond Responsive Web Design (WordCamp Miami 2011)(Practical) Beyond Responsive Web Design (WordCamp Miami 2011)
(Practical) Beyond Responsive Web Design (WordCamp Miami 2011)arborwebsolutions
 
Gala Webminar September 2013
Gala Webminar September 2013Gala Webminar September 2013
Gala Webminar September 2013pangeanic
 
Entertainment Architectures 2011
Entertainment Architectures 2011Entertainment Architectures 2011
Entertainment Architectures 2011George Dolbier
 
Let the dinosaurs finally die out with a little help of a service worker
Let the dinosaurs finally die out with a little help of a service workerLet the dinosaurs finally die out with a little help of a service worker
Let the dinosaurs finally die out with a little help of a service workerKacper Sokołowski
 

Similar to MapReduce In The Cloud Infinispan Distributed Task Execution Framework (20)

Big Data Story - From An Engineer's Perspective
Big Data Story - From An Engineer's PerspectiveBig Data Story - From An Engineer's Perspective
Big Data Story - From An Engineer's Perspective
 
The Yin and Yang of Software
The Yin and Yang of SoftwareThe Yin and Yang of Software
The Yin and Yang of Software
 
Iftf reconfiguring reality 20171010 v3
Iftf reconfiguring reality 20171010 v3Iftf reconfiguring reality 20171010 v3
Iftf reconfiguring reality 20171010 v3
 
Model-based Testing: Today And Tomorrow
Model-based Testing: Today And TomorrowModel-based Testing: Today And Tomorrow
Model-based Testing: Today And Tomorrow
 
Deep Learning: Changing the Playing Field of Artificial Intelligence - MaRS G...
Deep Learning: Changing the Playing Field of Artificial Intelligence - MaRS G...Deep Learning: Changing the Playing Field of Artificial Intelligence - MaRS G...
Deep Learning: Changing the Playing Field of Artificial Intelligence - MaRS G...
 
14 turing wics
14 turing wics14 turing wics
14 turing wics
 
Small, Medium and Big Data
Small, Medium and Big DataSmall, Medium and Big Data
Small, Medium and Big Data
 
BDI- The Beginning (Big data training in Coimbatore)
BDI- The Beginning (Big data training in Coimbatore)BDI- The Beginning (Big data training in Coimbatore)
BDI- The Beginning (Big data training in Coimbatore)
 
Is the Browser a Transitional Technology?
Is the Browser a Transitional Technology?Is the Browser a Transitional Technology?
Is the Browser a Transitional Technology?
 
T-Mobile and Elastic
T-Mobile and ElasticT-Mobile and Elastic
T-Mobile and Elastic
 
Identifying Relevant Messages in a Twitter-based Citizen Channel for Natural ...
Identifying Relevant Messages in a Twitter-based Citizen Channel for Natural ...Identifying Relevant Messages in a Twitter-based Citizen Channel for Natural ...
Identifying Relevant Messages in a Twitter-based Citizen Channel for Natural ...
 
Microsoft Dynamics Academic Alliance: How to win future of business
Microsoft Dynamics Academic Alliance: How to win future of businessMicrosoft Dynamics Academic Alliance: How to win future of business
Microsoft Dynamics Academic Alliance: How to win future of business
 
jstein.cassandra.nyc.2011
jstein.cassandra.nyc.2011jstein.cassandra.nyc.2011
jstein.cassandra.nyc.2011
 
Video AI for Media and Entertainment Industry
Video AI for Media and Entertainment IndustryVideo AI for Media and Entertainment Industry
Video AI for Media and Entertainment Industry
 
BCO 117 IT Software for Business Lecture Reference Notes.docx
BCO 117 IT Software for Business Lecture Reference Notes.docxBCO 117 IT Software for Business Lecture Reference Notes.docx
BCO 117 IT Software for Business Lecture Reference Notes.docx
 
(Practical) Beyond Responsive Web Design (WordCamp Miami 2011)
(Practical) Beyond Responsive Web Design (WordCamp Miami 2011)(Practical) Beyond Responsive Web Design (WordCamp Miami 2011)
(Practical) Beyond Responsive Web Design (WordCamp Miami 2011)
 
160921 exponentiality wtf
160921 exponentiality wtf160921 exponentiality wtf
160921 exponentiality wtf
 
Gala Webminar September 2013
Gala Webminar September 2013Gala Webminar September 2013
Gala Webminar September 2013
 
Entertainment Architectures 2011
Entertainment Architectures 2011Entertainment Architectures 2011
Entertainment Architectures 2011
 
Let the dinosaurs finally die out with a little help of a service worker
Let the dinosaurs finally die out with a little help of a service workerLet the dinosaurs finally die out with a little help of a service worker
Let the dinosaurs finally die out with a little help of a service worker
 

Recently uploaded

Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 

Recently uploaded (20)

Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 

MapReduce In The Cloud Infinispan Distributed Task Execution Framework

  • 1.
  • 2. MapReduce In The Cloud Infinispan Distributed Task Execution Framework Manik Surtani May 3rd 2011, JUDCon - Boston
  • 3. Welcome to the Data Deluge
  • 4. Background • Emergence of data beyond human scale • Outgrows current platforms in scale, structure, processing time • Abundance of unstructured, machine generated data • Does not fit into current software paradigms • Not confined to Twitter, Facebook and Google only • Need new platforms built for Big Data from ground up
  • 5. Big Data • How did Big Data happen? • Exponential changes in storage, bandwidth and data creation • Data exhaust • Drowning in data and not knowing what to do with it • Leverage, not discard Big Data • We are not even aware of data revolution around us
  • 6. Hard disk cost & capacity history 1000000 1000000 100000 100000 10000 10000 1000 1000 100 100 10 1 10 0.1 1 1980 1985 1990 1995 2000 2005 2010 Cost per GB in US$ Capacity in MB Source: http://en.wikipedia.org/wiki/History_of_hard_disk_drives
  • 7. “There was 5 exabytes of information created between the dawn of civilization through 2003.... Eric Schmidt, August 4th, 2010 Source: http://www.readwriteweb.com/archives/google_ceo_schmidt_people_arent_ready_for_the_tech.php
  • 8. “There was 5 exabytes of information created between the dawn of civilization through 2003.... ... that much information is now created every two days” Eric Schmidt, August 4th, 2010 Source: http://www.readwriteweb.com/archives/google_ceo_schmidt_people_arent_ready_for_the_tech.php
  • 9. Big Data challenges • Powerful platforms available today, however... • Need to be genius to build systems on top of them • Equivalent to programming client/server apps in assembly language • Is there a need for a new language? • Confusion is as big as Big Data • Simpler yet equally powerful solutions needed
  • 10. “Infinispan without MapReduce is like owning a Ferrari without a Driving License” - Vladimir Blagojevic
  • 11. Infinispan Big Data goals • Why waste a Ferrari? • Humble beginnings • Simplicity without sacrificing power • Capitalize on existing Infinispan infrastructure • Reuse of current programming abstractions • Two frameworks: distributed executors and MapReduce
  • 12. Infinispan Distributed Execution Framework • Leverage familiar ExecutorService, Callable abstractions • Expand it to distributed, parallel computing paradigm • Looks like a regular ExecutorService • Feels like a regular ExecutorService • The magic that goes on within Infinispan is completely transparent to users • Nevertheless, users can experience it :-)
  • 13. So simple it fits on one slide public interface DistributedExecutorService extends ExecutorService { <T, K> Future<T> submit(Callable<T> task, K... input); <T> List<Future<T>> submitEverywhere(Callable<T> task); <T, K > List<Future<T>> submitEverywhere(Callable<T> task, K... input); } public interface DistributedCallable<K, V, T> extends Callable<T> { void setEnvironment(Cache<K, V> cache, Set<K> inputKeys); }
  • 15. Do not forget Gene Amdahl Speedup = 1/(p/n)+(1-p) However, problems that increase the percentage of parallel time with their size are more scalable than problems with fixed percentage of parallel time p = parallel fraction n = number of processors Source: https://computing.llnl.gov/tutorials/parallel_comp/
  • 17. Infinispan MapReduce • We already have a data grid! • Leverages Infinispan’s DIST mode • Cache data is input for MapReduce tasks • Task components: Mapper, Reducer, Collator • MapReduceTask cohering them together
  • 19. Mapper, Reducer, Collator public interface Mapper<KIn, VIn, KOut, VOut> extends Serializable { void map(KIn key, VIn value, Collector<KOut, VOut> collector); } public interface Reducer<KOut, VOut> extends Serializable { VOut reduce(KOut reducedKey, Iterator<VOut> iter); } public interface Collator<KOut, VOut, R> { R collate(Map<KOut, VOut> reducedResults); }
  • 20. Roadmap • Improve task execution container • Failover, execution policies • Make sure it scales to terabytes and petabytes • Integration with Hibernate OGM • Analytics and BI tools • Do we need data analysis language?
  • 21. Parting thoughts • Data revolution is here, today! • Profound socio-economic impact • Do not sleep through it • Infinispan as a platform for Big Data • Join us in these exciting endeavours
  • 22. Questions? http://www.infinispan.org http://blog.infinispan.org http://www.twitter.com/infinispan #infinispan on FreeNode Rate this talk! http://spkr8.com/t/7384

Editor's Notes

  1. \n
  2. Why Vladimir can&amp;#x2019;t make it\n\n
  3. Not into surfing, although I believe this picture captures a feeling about the data deluge quite well.\n* Big and scary\n* Inevitable\n* Kinda interesting, challenging\n* And full of opportunity\n\n\n
  4. We are in the era of big data, data beyond human scale\nBeyond the scale of most simple systems\nCloud helps, with cheap, on-demand storage and processing\nAnd while social networks are a good example of this, it isn&amp;#x2019;t just tied to social networks. We see aspects of big data crop up everywhere.\n \n\n\n
  5. Even prior to cloud and cheap,on-demand storage/processing, the cost of traditional storage has been dropping so fast that you may as well store everything you find out. Just in case. Like GMail. :)\n\n
  6. Wikipedia sums this up well, on this exponential scale of how the price of storage has been dropping.\n
  7. \n
  8. And this quote by Eric Schmidt gives you an idea of the scale and volume of data we&amp;#x2019;re dealing with today.\n
  9. We have things that can deal with big data.\nWe have the tools, but they are non-trivial.\nHadoop, for example, is tough to set up, hard to maintain, use, etc.\nM/R: is that enough? Do we need a higher level language to express this? IMO, yes. But for now, this is what we have.\n
  10. \n
  11. Infinispan and big data\nwrapped around 2 separate but related APIs:\nDistributed Executors and Map/Reduce\n(Map/Reduce implemented over DistExec)\n
  12. Start with DistExec. This is a simplistic API, extending familiar JDK abstractions.\nIt encapsulates the use case of wanting to distribute workloads on the grid, potentially with some data affinity. \nLets take a look at the API.\n\n
  13. \n
  14. This allows you to parallelise tasks across the grid. Essentially it takes JDK7&amp;#x2019;s Fork/Join concepts and distributes it across multi-nodes.\n
  15. But this suffers from the same issues that Fork/Join suffers from - at some point it is just too expensive to break up tasks, distribute it, etc.\nSo care must always be taken when considering distributing task execution.\n
  16. Example: Pi approximation.\nThis is an old brute-force algorithm using random points on the circle. The more &amp;#x201C;tasks&amp;#x201D; that get kicked off, the more accurate our approximation.\n\nLook at Impl, try with 1 node and try with 6 nodes\n\n
  17. \n
  18. \n
  19. \n
  20. That was distexec. So that&amp;#x2019;s cool to some degree, lets now look at Map/Reduce.\nM/R based on Google&amp;#x2019;s paper, allows you to perform a task over a data set in parallel, and pull back &amp;#x201C;reduced&amp;#x201D; results.\nIn the case of Infinispan, data is DIST anyway. So your global data set is disjoint.\n\n
  21. Google&amp;#x2019;s Map/Reduce model, from Google&amp;#x2019;s paper\nInput -&gt; Mapped -&gt; Reduced -&gt; Collated.\n
  22. 3 interfaces in Infinispan&amp;#x2019;s map/reduce API\n\nExplain word-count demo\nDemo on 1 node\nDemo on 6 nodes\n
  23. \n
  24. \n
  25. * Currently in an experimental phase\n* We need to do failover - behaviour as cluster size changes\n* Test on very large volumes\n* OGM - could make use of this for distributed queries\n* Integration with analysis tools\n* Data language? LINQ?\n
  26. \n
  27. \n