Transforming Mobile Marketing & Advertising™




                        Harnessing Hadoop for Big Data
                        Analytics

                                                                   Jobin Wilson
                                                                   jobin.wilson@flytxt.com




                                                                                             Confidential
               Copyright © 2010 Flytxt B.V. All rights reserved.
Who am I?

   • Architect @ Flytxt (Big Data Analytics & Automation)

   • Passionate about data, distributed computing and machine learning

   • Previously

        • Virtualization & Cloud Lifecycle Management (BMC)

               • Designed and implemented a Cloud Lifecycle Management interface for BMC

        • Large-Scale Data Centre Automation (AOL)

               • Implemented a centralized data centre management framework for AOL

        • Workflow Systems & Automation (Accenture)

               • Implemented a Service Management Suite for various customers




Session Agenda!

• Data – What's the big deal?

• What is Hadoop( & What it is not  )

• Map-Reduce Model & HDFS

• Hadoop Ecosystem & Tools

• Lets get started!

• Q&A




Five computers & a 640k ;-)


       "I think there is a world market for about five computers"
       - Thomas Watson, 1943, Chairman of the Board of IBM

       "640k ought to be enough for anybody"
       - Attributed to Bill Gates, 1981

       (Figure: Moore’s Law)




Data Explosion!




Do I also know what you might do next summer?


                                        •     Does your travel company know you visited Goa &
                                              Cochin twice in the last two years?

                                        •     Collaborative Filtering




                                        •     Lots of Data + Statistics = WOW!!!

                                        •     BTW, don’t worry about the equation
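The collaborative-filtering equation on this slide did not survive extraction, so what follows is not the slide's own formula. As a minimal sketch of one common formulation, user-based collaborative filtering with cosine similarity, a user's predicted rating for an item is the similarity-weighted average of other users' ratings for that item. The users, destinations and ratings below are hypothetical.

```python
import math

# Hypothetical ratings: user -> {destination: rating on a 1-5 scale}
ratings = {
    "asha":   {"goa": 5, "cochin": 4, "munnar": 1},
    "ben":    {"goa": 4, "cochin": 5, "munnar": 2, "kovalam": 5},
    "chitra": {"goa": 1, "cochin": 2, "munnar": 5, "kovalam": 1},
}


def cosine(a, b):
    """Cosine similarity computed over the items both users rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in common))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in common))
    return dot / (norm_a * norm_b)


def predict(user, item, ratings):
    """Similarity-weighted average of other users' ratings for `item`."""
    num = den = 0.0
    for other, theirs in ratings.items():
        if other == user or item not in theirs:
            continue
        sim = cosine(ratings[user], theirs)
        num += sim * theirs[item]
        den += abs(sim)
    return num / den if den else None  # None: nobody else rated the item


print(round(predict("asha", "kovalam", ratings), 2))
```

Because asha's tastes are closer to ben's, the prediction leans towards ben's rating. Plain cosine on raw ratings still treats any two all-positive rating vectors as fairly similar; mean-centering each user's ratings (Pearson correlation) is a common refinement.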




Don’t throw away data just because it doesn’t ‘fit’


 •   Relational tuples, log files, semi-structured textual data (e.g., e-mail), pictures,
     videos

 •   User-generated data & system-generated data

 •   Applications need more than structured data

 •   My application is not “Dumb” any more!!

 •   “I keep saying that the sexy job in the next 10 years will be
      statisticians, and I’m not kidding.” - Hal Varian (Google’s chief economist)




Let's get to business!

What is Apache Hadoop?

•   Apache Hadoop is an open-source system to
    reliably store and process extremely large data sets
    across many commodity computers.

•   Originally developed to support the Apache Nutch
    search engine project.

•   Scales roughly linearly with data size or analysis
    complexity as machines are added

•   Scale-out, shared-nothing architecture

•   Inspired by Google's MapReduce and Google File
    System (GFS) papers




Basics of Hadoop


 •   Two Core Components – HDFS & Map-Reduce

 •   Machines are unreliable: the framework expects failures and handles them

 •   Separates distributed, fault-tolerant computing code from application
     logic

 •   No need to worry about the identity of individual machines

 •   Lets you interact with a cluster, not a bunch of machines

 •   Analysis workloads are spread across multiple machines

 •   Runs as a cloud (cluster) and possibly on a cloud (e.g., Amazon EC2)
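A toy illustration of separating fault-tolerant plumbing from application logic, assuming simulated workers rather than real machines: the application supplies only a function, and the hypothetical `run_tasks` framework reruns a failed task on another worker. Hadoop's real scheduler is far more involved (heartbeats, speculative execution, data locality); `make_worker`, `MachineFailure` and `run_tasks` are invented for this sketch.

```python
from typing import Callable, Dict, List


class MachineFailure(Exception):
    """Raised when a (simulated) worker machine dies mid-task."""


def make_worker(name: str, healthy: bool) -> Callable:
    """Return a simulated worker; an unhealthy worker fails every task."""
    def run(fn: Callable, arg):
        if not healthy:
            raise MachineFailure(name)
        return fn(arg)
    return run


def run_tasks(tasks: List, fn: Callable, workers: List[Callable],
              max_attempts: int = 4) -> Dict[int, object]:
    """Framework side: run fn on every task, retrying failures on other workers.

    The application supplies only `fn`; it never learns which worker ran it.
    """
    results = {}
    for task_id, arg in enumerate(tasks):
        for attempt in range(max_attempts):
            # Rotate through the workers so a retry goes to the next machine in the list.
            worker = workers[(task_id + attempt) % len(workers)]
            try:
                results[task_id] = worker(fn, arg)
                break
            except MachineFailure:
                continue
        else:
            raise RuntimeError(f"task {task_id} failed {max_attempts} attempts")
    return results


workers = [make_worker("node1", True), make_worker("node2", False),
           make_worker("node3", True)]
print(run_tasks(["a", "bb", "ccc"], len, workers))
```

Here task 1 first lands on the unhealthy `node2`, fails, and is retried on `node3`; the application function (`len`) never sees the failure.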




Lead Actors


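•   NameNode – Bookkeeping (metadata) server for HDFS

•   Secondary NameNode – Periodically checkpoints the NameNode’s
    metadata (merges the edit log into the filesystem image); not a hot standby

•   JobTracker – Schedules MapReduce jobs and their tasks

•   TaskTracker – Executes map and reduce tasks

•   DataNode – Stores HDFS blocks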
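The NameNode's bookkeeping can be pictured as two maps: the namespace (file path to block list), which it persists, and block locations (block to DataNodes), which it rebuilds from DataNode block reports and keeps in memory. The class below is a hypothetical, in-memory sketch of that idea, not Hadoop's API; it shows how losing a DataNode leaves blocks under-replicated.

```python
from collections import defaultdict


class NameNodeSketch:
    """Toy model of NameNode bookkeeping: metadata only, never file data."""

    def __init__(self, replication: int = 3):
        self.replication = replication
        self.namespace = {}                      # path -> [block ids]
        self.block_locations = defaultdict(set)  # block id -> {DataNode names}

    def add_file(self, path, block_ids):
        self.namespace[path] = list(block_ids)

    def block_received(self, block_id, datanode):
        """A DataNode reports that it stores a replica of block_id."""
        self.block_locations[block_id].add(datanode)

    def datanode_dead(self, datanode):
        """Drop every replica held by a DataNode whose heartbeats stopped."""
        for holders in self.block_locations.values():
            holders.discard(datanode)

    def under_replicated(self):
        """Block ids with fewer live replicas than the replication target."""
        blocks = {b for ids in self.namespace.values() for b in ids}
        return sorted(b for b in blocks
                      if len(self.block_locations.get(b, ())) < self.replication)


nn = NameNodeSketch()
nn.add_file("/logs/day1", ["blk_1", "blk_2"])
for dn in ("dn1", "dn2", "dn3"):
    nn.block_received("blk_1", dn)
for dn in ("dn2", "dn3", "dn4"):
    nn.block_received("blk_2", dn)
nn.datanode_dead("dn1")
print(nn.under_replicated())
```

In real HDFS the NameNode notices a dead DataNode when its heartbeats stop and then schedules new replicas of the affected blocks on other DataNodes.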




HDFS Write Model
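The write diagram from this slide is not recoverable. As a rough sketch, assuming the 64 MB default block size of Hadoop 1.x and a replication factor of 3, a client's file is cut into blocks and each block is streamed down a pipeline of three DataNodes. The placement below is a simplified version of the default rack-aware policy described in the HDFS architecture guide (the writer's local node, then a node on a remote rack, then a different node on that same remote rack); the rack topology and node names are hypothetical.

```python
import random

BLOCK_SIZE = 64 * 1024 * 1024  # 64 MB, the default block size in Hadoop 1.x


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return (offset, length) pairs; the last block may be shorter."""
    return [(off, min(block_size, file_size - off))
            for off in range(0, file_size, block_size)]


def choose_targets(writer, topology, rng):
    """Simplified default placement for replication factor 3.

    topology maps rack -> list of DataNodes (a hypothetical cluster layout).
    Assumes the writer is itself a DataNode and every other rack has >= 2 nodes.
    """
    rack_of = {dn: rack for rack, dns in topology.items() for dn in dns}
    remote_rack = rng.choice([r for r in topology if r != rack_of[writer]])
    second = rng.choice(topology[remote_rack])
    third = rng.choice([dn for dn in topology[remote_rack] if dn != second])
    # List order is also the write pipeline: client -> writer -> second -> third.
    return [writer, second, third]


topology = {"rack1": ["dn1", "dn2"], "rack2": ["dn3", "dn4"]}
rng = random.Random(42)
for offset, length in split_into_blocks(150 * 1024 * 1024):
    print(offset, length, choose_targets("dn1", topology, rng))
```

Keeping two of the three replicas on one remote rack limits cross-rack write traffic while still surviving the loss of an entire rack.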




Map-Reduce Model
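The diagram for this slide was lost in extraction. A minimal single-process sketch of the Map-Reduce model using the classic word-count example: the user writes only `map_fn` and `reduce_fn`, and `run_local` (a hypothetical stand-in for the framework) groups map output by key before reducing. Hadoop's TextInputFormat passes each line's byte offset as the map key; the line number is used here instead.

```python
from collections import defaultdict


def map_fn(offset, line):
    """Map: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def reduce_fn(word, counts):
    """Reduce: sum every count emitted for one word."""
    yield word, sum(counts)


def run_local(lines):
    """Map, group values by key (the shuffle), then reduce, all in one process."""
    groups = defaultdict(list)
    for offset, line in enumerate(lines):  # line number stands in for byte offset
        for key, value in map_fn(offset, line):
            groups[key].append(value)
    output = {}
    for key in sorted(groups):             # reducers see keys in sorted order
        for word, total in reduce_fn(key, groups[key]):
            output[word] = total
    return output


print(run_local(["the cat sat", "the dog sat down"]))
```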




Map-Reduce Execution Flow
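In the execution flow, map and reduce work is spread across separate tasks. As a sketch, assuming two input splits and two reducers: each map task buckets its output by partition, each reducer fetches its bucket from every map task, sorts it, and reduces consecutive runs of equal keys. Hadoop's default HashPartitioner takes the key's Java `hashCode()` modulo the number of reduce tasks; `zlib.crc32` stands in for it here because Python's built-in string hash changes between runs. `partition` and `run_wordcount` are hypothetical names.

```python
import zlib
from collections import defaultdict
from itertools import groupby
from operator import itemgetter


def partition(key, num_reducers):
    """Pick the reducer for a key; crc32 stands in for Java's hashCode()."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def run_wordcount(splits, num_reducers=2):
    # Map phase: one map task per input split, output bucketed by partition.
    map_outputs = []
    for split in splits:
        buckets = defaultdict(list)
        for line in split:
            for word in line.split():
                buckets[partition(word, num_reducers)].append((word, 1))
        map_outputs.append(buckets)

    # Shuffle + sort: reducer r pulls bucket r from every map task.
    results = {}
    for r in range(num_reducers):
        fetched = sorted(pair for buckets in map_outputs for pair in buckets.get(r, []))
        # Reduce phase: group consecutive pairs with the same key and sum them.
        for word, pairs in groupby(fetched, key=itemgetter(0)):
            results[word] = sum(count for _, count in pairs)
    return results


splits = [["a b a"], ["b c"]]  # two input splits -> two map tasks
print(run_wordcount(splits, num_reducers=2))
```

Because every occurrence of a key hashes to the same partition, one reducer sees all of that key's values, which is what makes the per-key sum correct.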




Hadoop Ecosystem
•   Oozie – Open-source workflow/coordination
    service to manage data processing jobs for Apache
    Hadoop™; developed at Yahoo!

•   HBase – Column-oriented database modeled on
    Google’s Bigtable, designed to hold extremely
    large data sets (petabytes)

•   Hive – SQL-based data warehousing application
    for analyzing very large data sets; developed
    at Facebook

•   ZooKeeper – Distributed coordination service built on
    a consensus protocol, providing leader election, service
    discovery and distributed locking / mutual exclusion

•   Pig – Platform for analyzing large data sets that
    consists of a high-level language for expressing
    data analysis steps

•   Ganglia – Scalable distributed monitoring system
    for high-performance computing systems such as
    clusters and grids
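ZooKeeper's leader election is usually built from a documented recipe: every candidate creates an ephemeral, sequential znode under a common parent, the lowest sequence number leads, and each other candidate watches only its predecessor. The sketch below simulates that recipe in memory with a hypothetical `FakeEnsemble` class rather than a real ZooKeeper client; with the Java client the node would be created through `ZooKeeper.create(...)` with `CreateMode.EPHEMERAL_SEQUENTIAL`.

```python
class FakeEnsemble:
    """In-memory stand-in for one ZooKeeper parent znode (hypothetical)."""

    def __init__(self):
        self.counter = 0
        self.children = {}  # znode path -> owning client session

    def create_ephemeral_sequential(self, prefix, session):
        # ZooKeeper appends a zero-padded, 10-digit, increasing counter.
        path = f"{prefix}{self.counter:010d}"
        self.counter += 1
        self.children[path] = session
        return path

    def expire_session(self, session):
        """Ephemeral znodes disappear when the session that created them ends."""
        for path in [p for p, s in self.children.items() if s == session]:
            del self.children[path]


def current_leader(zk):
    """Election recipe: the session owning the lowest sequence number leads."""
    return zk.children[min(zk.children)] if zk.children else None


def znode_to_watch(zk, my_path):
    """Each candidate watches only its immediate predecessor (avoids a herd effect)."""
    lower = [p for p in zk.children if p < my_path]
    return max(lower) if lower else None


zk = FakeEnsemble()
paths = {name: zk.create_ephemeral_sequential("/election/n_", name) for name in "ABC"}
print(current_leader(zk))  # A holds the lowest sequence number
zk.expire_session("A")
print(current_leader(zk))
```

When the leader's session ends, its ephemeral znode disappears, so leadership passes to the next-lowest sequence number without an explicit hand-off.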
Hadoop is not a “Holy Grail”

•   Not a substitute for a database

•   MapReduce is not always the best algorithm

•   HDFS is not a substitute for a
    High Availability SAN-hosted FS

•   HDFS is not a POSIX file system

•   Not a place to learn Java programming

•   Not a place to learn Unix/Linux system administration

•   Not a place to learn basics of networking




Notable Users of Hadoop
(Source: http://en.wikipedia.org/wiki/Hadoop)



     • A9.com                               • Meebo
     • AOL                                  • Metaweb
     • eHarmony                             • The New York Times
     • eBay                                 • Rackspace
     • Facebook                             • StumbleUpon
     • Fox Interactive Media                • Twitter
     • IBM                                  • Yahoo
     • Last.fm                              • Amazon
     • LinkedIn




Q&A




                                                    www.flytxt.com
THANK YOU
      Contact us: dev2dev@flytxt.com / jobin.wilson@flytxt.com




                                                                 www.flytxt.com
                                                                 Confidential   18
Copyright © 2010 Flytxt B.V. All rights reserved.

More Related Content

Viewers also liked

20130412 brand management chapter 5 iba 45 e
20130412 brand management chapter 5 iba 45 e20130412 brand management chapter 5 iba 45 e
20130412 brand management chapter 5 iba 45 eZeeshan Huq
 
2011 p5_and_p6_principal's_dialogue_collated_for_uploading
2011  p5_and_p6_principal's_dialogue_collated_for_uploading2011  p5_and_p6_principal's_dialogue_collated_for_uploading
2011 p5_and_p6_principal's_dialogue_collated_for_uploadingalanpillay79
 
Cl introduction of p1_&_p2
Cl introduction of p1_&_p2Cl introduction of p1_&_p2
Cl introduction of p1_&_p2alanpillay79
 
Recommendation engines : Matching items to users
Recommendation engines : Matching items to usersRecommendation engines : Matching items to users
Recommendation engines : Matching items to usersjobinwilson
 
P1 & p2_cl_powerpoint_slides_2011
P1 & p2_cl_powerpoint_slides_2011P1 & p2_cl_powerpoint_slides_2011
P1 & p2_cl_powerpoint_slides_2011alanpillay79
 
20140128 buyer behavior iba mba48 d
20140128 buyer behavior iba mba48 d20140128 buyer behavior iba mba48 d
20140128 buyer behavior iba mba48 dZeeshan Huq
 
TL P1 & P2 parent's briefing 2011
TL P1 & P2 parent's briefing 2011TL P1 & P2 parent's briefing 2011
TL P1 & P2 parent's briefing 2011alanpillay79
 
Building apps with HBase - Big Data TechCon Boston
Building apps with HBase - Big Data TechCon BostonBuilding apps with HBase - Big Data TechCon Boston
Building apps with HBase - Big Data TechCon Bostonamansk
 
Brightwater Engineering General Presentation
Brightwater Engineering General PresentationBrightwater Engineering General Presentation
Brightwater Engineering General Presentationfletcher_mat
 
Pptpollution 111024083127-phpapp01
Pptpollution 111024083127-phpapp01Pptpollution 111024083127-phpapp01
Pptpollution 111024083127-phpapp01Mukesh Thakur
 
Pharmapack 2012 Competitive Intelligence Report
Pharmapack 2012 Competitive Intelligence ReportPharmapack 2012 Competitive Intelligence Report
Pharmapack 2012 Competitive Intelligence ReportViedoc
 
Program Komuniti Tone Plus
Program Komuniti Tone PlusProgram Komuniti Tone Plus
Program Komuniti Tone PlusVun Chee Vui
 
Rapport de veille_salon_texworld_paris_2010
Rapport de veille_salon_texworld_paris_2010Rapport de veille_salon_texworld_paris_2010
Rapport de veille_salon_texworld_paris_2010Viedoc
 
IT & Big Data 2012 Report
IT & Big Data 2012 ReportIT & Big Data 2012 Report
IT & Big Data 2012 ReportViedoc
 
Mauricio Escalante Tarea Decalogo
Mauricio Escalante Tarea DecalogoMauricio Escalante Tarea Decalogo
Mauricio Escalante Tarea DecalogoMauricio Escalante
 
CFIA 2012 Food Industry ingredients Competitive Intelligence Report
CFIA 2012 Food Industry ingredients Competitive Intelligence ReportCFIA 2012 Food Industry ingredients Competitive Intelligence Report
CFIA 2012 Food Industry ingredients Competitive Intelligence ReportViedoc
 
20140117 buyer behavior iba mba48 d
20140117 buyer behavior iba mba48 d20140117 buyer behavior iba mba48 d
20140117 buyer behavior iba mba48 dZeeshan Huq
 

Viewers also liked (20)

20130412 brand management chapter 5 iba 45 e
20130412 brand management chapter 5 iba 45 e20130412 brand management chapter 5 iba 45 e
20130412 brand management chapter 5 iba 45 e
 
2011 p5_and_p6_principal's_dialogue_collated_for_uploading
2011  p5_and_p6_principal's_dialogue_collated_for_uploading2011  p5_and_p6_principal's_dialogue_collated_for_uploading
2011 p5_and_p6_principal's_dialogue_collated_for_uploading
 
Monavie Presentation
Monavie PresentationMonavie Presentation
Monavie Presentation
 
Cl introduction of p1_&_p2
Cl introduction of p1_&_p2Cl introduction of p1_&_p2
Cl introduction of p1_&_p2
 
Recommendation engines : Matching items to users
Recommendation engines : Matching items to usersRecommendation engines : Matching items to users
Recommendation engines : Matching items to users
 
P1 & p2_cl_powerpoint_slides_2011
P1 & p2_cl_powerpoint_slides_2011P1 & p2_cl_powerpoint_slides_2011
P1 & p2_cl_powerpoint_slides_2011
 
Viral marketing
Viral marketingViral marketing
Viral marketing
 
20140128 buyer behavior iba mba48 d
20140128 buyer behavior iba mba48 d20140128 buyer behavior iba mba48 d
20140128 buyer behavior iba mba48 d
 
TL P1 & P2 parent's briefing 2011
TL P1 & P2 parent's briefing 2011TL P1 & P2 parent's briefing 2011
TL P1 & P2 parent's briefing 2011
 
Building apps with HBase - Big Data TechCon Boston
Building apps with HBase - Big Data TechCon BostonBuilding apps with HBase - Big Data TechCon Boston
Building apps with HBase - Big Data TechCon Boston
 
Brightwater Engineering General Presentation
Brightwater Engineering General PresentationBrightwater Engineering General Presentation
Brightwater Engineering General Presentation
 
Pptpollution 111024083127-phpapp01
Pptpollution 111024083127-phpapp01Pptpollution 111024083127-phpapp01
Pptpollution 111024083127-phpapp01
 
Budjettikone
BudjettikoneBudjettikone
Budjettikone
 
Pharmapack 2012 Competitive Intelligence Report
Pharmapack 2012 Competitive Intelligence ReportPharmapack 2012 Competitive Intelligence Report
Pharmapack 2012 Competitive Intelligence Report
 
Program Komuniti Tone Plus
Program Komuniti Tone PlusProgram Komuniti Tone Plus
Program Komuniti Tone Plus
 
Rapport de veille_salon_texworld_paris_2010
Rapport de veille_salon_texworld_paris_2010Rapport de veille_salon_texworld_paris_2010
Rapport de veille_salon_texworld_paris_2010
 
IT & Big Data 2012 Report
IT & Big Data 2012 ReportIT & Big Data 2012 Report
IT & Big Data 2012 Report
 
Mauricio Escalante Tarea Decalogo
Mauricio Escalante Tarea DecalogoMauricio Escalante Tarea Decalogo
Mauricio Escalante Tarea Decalogo
 
CFIA 2012 Food Industry ingredients Competitive Intelligence Report
CFIA 2012 Food Industry ingredients Competitive Intelligence ReportCFIA 2012 Food Industry ingredients Competitive Intelligence Report
CFIA 2012 Food Industry ingredients Competitive Intelligence Report
 
20140117 buyer behavior iba mba48 d
20140117 buyer behavior iba mba48 d20140117 buyer behavior iba mba48 d
20140117 buyer behavior iba mba48 d
 

Similar to Harnessing hadoop for big data analytics v0.1

Leveraging open source for big data stack
Leveraging open source for big data stackLeveraging open source for big data stack
Leveraging open source for big data stackFlytxt
 
HTML5--The 30,000' View (A fast-paced overview of HTML5)
HTML5--The 30,000' View (A fast-paced overview of HTML5)HTML5--The 30,000' View (A fast-paced overview of HTML5)
HTML5--The 30,000' View (A fast-paced overview of HTML5)Peter Lubbers
 
Mobile Backend Apps and APIs meetup London overview of BaaS APIs and discussi...
Mobile Backend Apps and APIs meetup London overview of BaaS APIs and discussi...Mobile Backend Apps and APIs meetup London overview of BaaS APIs and discussi...
Mobile Backend Apps and APIs meetup London overview of BaaS APIs and discussi...Taras Filatov
 
Putting Business Intelligence to Work on Hadoop Data Stores
Putting Business Intelligence to Work on Hadoop Data StoresPutting Business Intelligence to Work on Hadoop Data Stores
Putting Business Intelligence to Work on Hadoop Data StoresDATAVERSITY
 
Moving to a data-centric architecture: Toronto Data Unconference 2015
Moving to a data-centric architecture: Toronto Data Unconference 2015Moving to a data-centric architecture: Toronto Data Unconference 2015
Moving to a data-centric architecture: Toronto Data Unconference 2015Adam Muise
 
S2DS London 2015 - Hadoop Real World
S2DS London 2015 - Hadoop Real WorldS2DS London 2015 - Hadoop Real World
S2DS London 2015 - Hadoop Real WorldSean Roberts
 
Open web platform talk by daniel hladky at rif 2012 (19 april 2012 moscow)
Open web platform talk by daniel hladky at rif 2012 (19 april 2012   moscow)Open web platform talk by daniel hladky at rif 2012 (19 april 2012   moscow)
Open web platform talk by daniel hladky at rif 2012 (19 april 2012 moscow)AI4BD GmbH
 
SharePoint from the Forms-Eye View
SharePoint from the Forms-Eye ViewSharePoint from the Forms-Eye View
SharePoint from the Forms-Eye ViewSteve Weissman
 
Cloud Austin Meetup - Hadoop like a champion
Cloud Austin Meetup - Hadoop like a championCloud Austin Meetup - Hadoop like a champion
Cloud Austin Meetup - Hadoop like a championAmeet Paranjape
 
Building a modern data platform on AWS. Utrecht AWS Dev Day
Building a modern data platform on AWS. Utrecht AWS Dev DayBuilding a modern data platform on AWS. Utrecht AWS Dev Day
Building a modern data platform on AWS. Utrecht AWS Dev Dayjavier ramirez
 
Hadoop World 2011: Advancing Disney’s Data Infrastructure with Hadoop - Matt ...
Hadoop World 2011: Advancing Disney’s Data Infrastructure with Hadoop - Matt ...Hadoop World 2011: Advancing Disney’s Data Infrastructure with Hadoop - Matt ...
Hadoop World 2011: Advancing Disney’s Data Infrastructure with Hadoop - Matt ...Cloudera, Inc.
 
Tw Technology Radar Qtb Sep11
Tw Technology Radar Qtb Sep11Tw Technology Radar Qtb Sep11
Tw Technology Radar Qtb Sep11Adrian Treacy
 
Intro to hadoop tutorial
Intro to hadoop tutorialIntro to hadoop tutorial
Intro to hadoop tutorialmarkgrover
 
Alex Wade, Digital Library Interoperability
Alex Wade, Digital Library InteroperabilityAlex Wade, Digital Library Interoperability
Alex Wade, Digital Library Interoperabilityparker01
 
Introduction to Big Data & Hadoop Architecture - Module 1
Introduction to Big Data & Hadoop Architecture - Module 1Introduction to Big Data & Hadoop Architecture - Module 1
Introduction to Big Data & Hadoop Architecture - Module 1Rohit Agrawal
 
Visualizing IoT: Rapid Business Data Discovery for the Internet of Things
Visualizing IoT: Rapid Business Data Discovery for the Internet of ThingsVisualizing IoT: Rapid Business Data Discovery for the Internet of Things
Visualizing IoT: Rapid Business Data Discovery for the Internet of ThingsMia Yuan Cao
 
PyData Texas 2015 Keynote
PyData Texas 2015 KeynotePyData Texas 2015 Keynote
PyData Texas 2015 KeynotePeter Wang
 

Similar to Harnessing hadoop for big data analytics v0.1 (20)

Leveraging open source for big data stack
Leveraging open source for big data stackLeveraging open source for big data stack
Leveraging open source for big data stack
 
HTML5--The 30,000' View (A fast-paced overview of HTML5)
HTML5--The 30,000' View (A fast-paced overview of HTML5)HTML5--The 30,000' View (A fast-paced overview of HTML5)
HTML5--The 30,000' View (A fast-paced overview of HTML5)
 
Mobile Backend Apps and APIs meetup London overview of BaaS APIs and discussi...
Mobile Backend Apps and APIs meetup London overview of BaaS APIs and discussi...Mobile Backend Apps and APIs meetup London overview of BaaS APIs and discussi...
Mobile Backend Apps and APIs meetup London overview of BaaS APIs and discussi...
 
Html5 Flyover
Html5 FlyoverHtml5 Flyover
Html5 Flyover
 
Putting Business Intelligence to Work on Hadoop Data Stores
Putting Business Intelligence to Work on Hadoop Data StoresPutting Business Intelligence to Work on Hadoop Data Stores
Putting Business Intelligence to Work on Hadoop Data Stores
 
Moving to a data-centric architecture: Toronto Data Unconference 2015
Moving to a data-centric architecture: Toronto Data Unconference 2015Moving to a data-centric architecture: Toronto Data Unconference 2015
Moving to a data-centric architecture: Toronto Data Unconference 2015
 
S2DS London 2015 - Hadoop Real World
S2DS London 2015 - Hadoop Real WorldS2DS London 2015 - Hadoop Real World
S2DS London 2015 - Hadoop Real World
 
Open web platform talk by daniel hladky at rif 2012 (19 april 2012 moscow)
Open web platform talk by daniel hladky at rif 2012 (19 april 2012   moscow)Open web platform talk by daniel hladky at rif 2012 (19 april 2012   moscow)
Open web platform talk by daniel hladky at rif 2012 (19 april 2012 moscow)
 
SharePoint from the Forms-Eye View
SharePoint from the Forms-Eye ViewSharePoint from the Forms-Eye View
SharePoint from the Forms-Eye View
 
Cloud Austin Meetup - Hadoop like a champion
Cloud Austin Meetup - Hadoop like a championCloud Austin Meetup - Hadoop like a champion
Cloud Austin Meetup - Hadoop like a champion
 
Building a modern data platform on AWS. Utrecht AWS Dev Day
Building a modern data platform on AWS. Utrecht AWS Dev DayBuilding a modern data platform on AWS. Utrecht AWS Dev Day
Building a modern data platform on AWS. Utrecht AWS Dev Day
 
Hadoop World 2011: Advancing Disney’s Data Infrastructure with Hadoop - Matt ...
Hadoop World 2011: Advancing Disney’s Data Infrastructure with Hadoop - Matt ...Hadoop World 2011: Advancing Disney’s Data Infrastructure with Hadoop - Matt ...
Hadoop World 2011: Advancing Disney’s Data Infrastructure with Hadoop - Matt ...
 
Tw Technology Radar Qtb Sep11
Tw Technology Radar Qtb Sep11Tw Technology Radar Qtb Sep11
Tw Technology Radar Qtb Sep11
 
Intro to hadoop tutorial
Intro to hadoop tutorialIntro to hadoop tutorial
Intro to hadoop tutorial
 
IBM Watson
IBM WatsonIBM Watson
IBM Watson
 
Alex Wade, Digital Library Interoperability
Alex Wade, Digital Library InteroperabilityAlex Wade, Digital Library Interoperability
Alex Wade, Digital Library Interoperability
 
Introduction to Big Data & Hadoop Architecture - Module 1
Introduction to Big Data & Hadoop Architecture - Module 1Introduction to Big Data & Hadoop Architecture - Module 1
Introduction to Big Data & Hadoop Architecture - Module 1
 
Plug 20110217
Plug   20110217Plug   20110217
Plug 20110217
 
Visualizing IoT: Rapid Business Data Discovery for the Internet of Things
Visualizing IoT: Rapid Business Data Discovery for the Internet of ThingsVisualizing IoT: Rapid Business Data Discovery for the Internet of Things
Visualizing IoT: Rapid Business Data Discovery for the Internet of Things
 
PyData Texas 2015 Keynote
PyData Texas 2015 KeynotePyData Texas 2015 Keynote
PyData Texas 2015 Keynote
 

Recently uploaded

The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
What is Artificial Intelligence?????????
What is Artificial Intelligence?????????What is Artificial Intelligence?????????
What is Artificial Intelligence?????????blackmambaettijean
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rick Flair
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESmohitsingh558521
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 

Recently uploaded (20)

The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
What is Artificial Intelligence?????????
What is Artificial Intelligence?????????What is Artificial Intelligence?????????
What is Artificial Intelligence?????????
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 

Harnessing hadoop for big data analytics v0.1

  • 1. Transforming Mobile Marketing & Advertising™ Harnessing s for Big Data Analytics Jobin Wilson jobin.wilson@flytxt.com Confidential Copyright © 2010 Flytxt B.V. All rights reserved.
  • 2. Who am I ? • Architect @ Flytxt (Big Data Analytics & Automation) • Passionate about data, distributed computing , machine learning • Previously •Virtualization & Cloud Lifecycle Management(BMC) • Designed and Implemented Cloud Life Cycle Management Interface for BMC • Large Scale Data Centre Automation(AOL) • Implemented Centralized Data Center Management Framework for AOL •Workflow Systems & Automation (Accenture) • Implemented Service Management Suit for various customers Confidential Copyright © 2010 Flytxt B.V. All rights reserved.
  • 3. Session Agenda
      • Data: what's the big deal?
      • What Hadoop is (and what it is not)
      • The Map-Reduce model & HDFS
      • The Hadoop ecosystem & tools
      • Let's get started!
      • Q&A
  • 4. Five computers & a 640K ;-)
      • "I think there is a world market for about five computers." (Thomas Watson, 1943, Chairman of the Board of IBM)
      • "640K ought to be enough for anybody." (attributed to Bill Gates, 1981)
      • [Chart: Moore’s Law]
  • 5. Data Explosion!
  • 6. Do I also know what you might do next summer?
      • Does your travel company know you visited Goa & Cochin twice in the last two years?
      • Collaborative filtering
      • Lots of data + statistics = WOW!!!
      • BTW, don’t worry about the equation
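The slide names collaborative filtering but does not spell out the method, and the equation it mentions is not reproduced here. The following is a minimal sketch of one common variant, user-based filtering with cosine similarity. It was chosen for illustration and is not necessarily the formula on the slide. It runs on a tiny, invented table of how often each traveller visited each destination. All names and counts are hypothetical.

```python
import math

# Hypothetical data: traveller -> {destination: number of visits}.
visits = {
    "asha": {"Goa": 2, "Cochin": 2, "Munnar": 1},
    "ravi": {"Goa": 2, "Cochin": 1, "Munnar": 2},
    "meera": {"Goa": 1, "Cochin": 2},
    "john": {"Shimla": 3, "Manali": 2},
}


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(user, data, top_n=1):
    """Rank destinations `user` has not visited, weighting each other
    traveller's visit counts by how similar that traveller is to `user`."""
    scores = {}
    for other, their_visits in data.items():
        if other == user:
            continue
        sim = cosine(data[user], their_visits)
        for place, count in their_visits.items():
            if place not in data[user]:
                scores[place] = scores.get(place, 0.0) + sim * count
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [place for place, _ in ranked[:top_n]]


print(recommend("meera", visits))  # ['Munnar']
```

Meera has visited Goa and Cochin, and the sketch points her to Munnar because the travellers most similar to her went there. John's trips add nothing because his similarity to her is zero. On real data sets, comparing every pair of users is the expensive step, and that is the kind of work one would spread across a cluster.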
  • 7. Don’t throw away data just because it doesn’t “fit”
      • Relational tuples, log files, semi-structured textual data (e.g., e-mail), pictures, videos
      • User-generated data & system-generated data
      • Applications need more than structured data
      • My application is not “dumb” any more!!
      • “I keep saying that the sexy job in the next 10 years will be statisticians, and I’m not kidding.” (Hal Varian, Google’s chief economist)
  • 8. Let’s get to business!! What is Apache Hadoop?
      • Apache Hadoop is an open-source system for reliably storing and processing extremely large data sets across many commodity computers.
      • Originally developed to support the Nutch search engine project
      • Designed to scale roughly linearly: as data size or analysis complexity grows, you add more commodity machines
      • Scale-out, shared-nothing architecture
      • Inspired by Google's MapReduce and Google File System (GFS) papers
  • 9. Basics of Hadoop
      • Two core components: HDFS & Map-Reduce
      • Machines are unreliable
      • Separates distributed, fault-tolerant computing code from application logic
      • No need to worry about the identity of a machine
      • Lets you interact with a cluster, not a bunch of machines
      • Analysis workloads span multiple machines
      • Runs as a cloud (a cluster), and possibly on a cloud (e.g., Amazon EC2)
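This toy, single-process sketch illustrates separating fault-tolerance code from application logic. The application supplies only a pure function. A hypothetical framework helper re-runs a failed task attempt on another node. The node names, the rigged failure, and all function names are assumptions. The limit of 4 attempts mirrors the default of `mapred.map.max.attempts` in Hadoop 0.20/1.x.

```python
class NodeFailure(Exception):
    """Raised when a simulated machine dies while running a task."""


def word_lengths(record):
    """Application logic: a pure function that knows nothing about machines."""
    return [len(word) for word in record.split()]


def execute_on(node, task_fn, record, failing_nodes):
    """Simulate one task attempt on `node`; nodes in `failing_nodes` always fail."""
    if node in failing_nodes:
        raise NodeFailure(node)
    return task_fn(record)


def run_with_retries(task_fn, record, nodes, failing_nodes, max_attempts=4):
    """Framework logic: retry the same task on other nodes until it succeeds.

    Returns (result, nodes_tried). A real framework detects failures through
    heartbeats and timeouts, not a pre-set `failing_nodes` set.
    """
    tried = []
    for node in nodes[:max_attempts]:
        tried.append(node)
        try:
            return execute_on(node, task_fn, record, failing_nodes), tried
        except NodeFailure:
            continue  # reschedule the task on the next available node
    raise RuntimeError(f"task failed after {len(tried)} attempts on {tried}")


result, tried = run_with_retries(
    word_lengths, "big data on commodity boxes",
    nodes=["node-a", "node-b", "node-c"], failing_nodes={"node-a"},
)
print(result, tried)  # [3, 4, 2, 9, 5] ['node-a', 'node-b']
```

`word_lengths` never mentions nodes or retries. Swapping in different application logic leaves the retry machinery untouched, which is the division of labour the slide describes.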
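  • 10. Lead Actors
      • Name Node: bookkeeping (metadata) server; tracks the file system namespace and which Data Nodes hold each block
      • Secondary Name Node: assistant to the Name Node; periodically checkpoints its metadata by merging the edit log into the fsimage (it is not a hot standby)
      • Job Tracker: scheduler for Map-Reduce jobs
      • Task Tracker: executes map and reduce tasks
      • Data Node: block storage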
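This toy, in-memory sketch makes the Name Node / Data Node split concrete by modelling the Name Node's bookkeeping. It records which blocks make up each file and which Data Nodes hold each block. The bytes themselves live on the Data Nodes. The class and helper names are hypothetical. The round-robin placement is a simplification, since HDFS's real policy is rack-aware. The 64 MB block size assumes the default of Hadoop releases from this period; later releases default to 128 MB.

```python
import itertools

BLOCK_SIZE = 64 * 1024 * 1024  # assumed: 64 MB default block size of this era


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Byte lengths of the blocks a file of `file_size` bytes is split into."""
    full, rest = divmod(file_size, block_size)
    return [block_size] * full + ([rest] if rest else [])


def round_robin_placement(datanodes, replication=3):
    """Hypothetical placement policy: rotate through the Data Nodes."""
    counter = itertools.count()

    def place(_block_id):
        i = next(counter)
        return [datanodes[(i + k) % len(datanodes)] for k in range(replication)]

    return place


class ToyNameNode:
    """In-memory stand-in for Name Node metadata: no file contents, only
    file -> block IDs, block ID -> length, and block ID -> Data Nodes."""

    def __init__(self):
        self.files = {}
        self.block_sizes = {}
        self.block_locations = {}
        self._next_id = 0

    def add_file(self, path, file_size, placement):
        block_ids = []
        for length in split_into_blocks(file_size):
            block_id = f"blk_{self._next_id}"
            self._next_id += 1
            self.block_sizes[block_id] = length
            self.block_locations[block_id] = placement(block_id)
            block_ids.append(block_id)
        self.files[path] = block_ids
        return block_ids

    def locate(self, path):
        """What a client asks before reading: which Data Nodes hold each block."""
        return [(b, self.block_locations[b]) for b in self.files[path]]


MB = 1024 * 1024
nn = ToyNameNode()
nn.add_file("/logs/day1.log", 200 * MB, round_robin_placement(["dn1", "dn2", "dn3", "dn4"]))
for block_id, nodes in nn.locate("/logs/day1.log"):
    print(block_id, nn.block_sizes[block_id] // MB, "MB on", nodes)
```

Under these assumptions, a 200 MB file occupies four blocks: three full 64 MB blocks plus an 8 MB tail. A client reading it first asks the Name Node for these locations, then fetches each block directly from a Data Node.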
  • 11. HDFS Write Model
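The slide's diagram is not reproduced in this transcript. This is a hedged, single-process sketch of the write path described in the HDFS architecture documentation. With a replication factor of 3, the default policy places replicas as follows:
- the first on the writer's node, when the writer runs on a Data Node;
- the second on a node in a different rack;
- the third on a different node in that same remote rack.

The client then streams data only to the first Data Node, which forwards it down a pipeline to the others. The rack topology, node names, function names, and tiny packet size are invented for illustration.

```python
import random


def choose_replica_targets(writer_node, topology, rng):
    """Pick 3 Data Nodes per HDFS's default policy for replication factor 3.

    Assumes the writer is itself a Data Node and `topology` maps
    rack name -> list of node names (hypothetical layout).
    """
    rack_of = {n: r for r, nodes in topology.items() for n in nodes}
    local_rack = rack_of[writer_node]
    # Remote rack must have at least two nodes to host replicas 2 and 3.
    remote_rack = rng.choice(
        [r for r in topology if r != local_rack and len(topology[r]) >= 2])
    second, third = rng.sample(topology[remote_rack], 2)
    return [writer_node, second, third]


def pipeline_write(block, targets, packet_size=4):
    """Simulate pipelined replication: the client sends each packet only to
    the first Data Node, which stores it and forwards it to the next, etc."""
    stored = {node: b"" for node in targets}
    hops = []
    for start in range(0, len(block), packet_size):
        packet = block[start:start + packet_size]
        sender = "client"
        for node in targets:
            stored[node] += packet
            hops.append((sender, node))
            sender = node
    return stored, hops


topology = {"rack1": ["dn1", "dn2"], "rack2": ["dn3", "dn4"], "rack3": ["dn5", "dn6"]}
targets = choose_replica_targets("dn1", topology, random.Random(7))
stored, hops = pipeline_write(b"hello hdfs!!", targets)
print(targets)
for sender, receiver in hops[:3]:
    print(f"{sender} -> {receiver}")
```

Placing two replicas on one remote rack limits cross-rack write traffic while still surviving the loss of a whole rack. The pipeline means the client uploads each packet only once, however many replicas are written.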
  • 12. Map-Reduce Model
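The slide's diagram is not reproduced here, so the classic word-count example stands in for it. This is a minimal, single-process sketch of the model, not Hadoop's Java API:
- a map function emits (word, 1) pairs;
- the framework groups values by key;
- a reduce function sums each group.

The function names are hypothetical. The input key here is a line index, whereas Hadoop's default TextInputFormat supplies the byte offset of each line.

```python
from collections import defaultdict


def map_fn(_offset, line):
    """Mapper: emit (word, 1) for every word in an input line."""
    for word in line.lower().split():
        yield word, 1


def reduce_fn(word, counts):
    """Reducer: sum all counts emitted for one word."""
    yield word, sum(counts)


def run_mapreduce(records, mapper, reducer):
    # Map phase, with a simplified "shuffle": group every emitted value by key.
    intermediate = defaultdict(list)
    for key, value in records:
        for k2, v2 in mapper(key, value):
            intermediate[k2].append(v2)
    # Reduce phase: keys visited in sorted order, as Hadoop sorts map output by key.
    output = []
    for k2 in sorted(intermediate):
        output.extend(reducer(k2, intermediate[k2]))
    return output


lines = ["big data big deal", "data beats opinions"]
print(run_mapreduce(enumerate(lines), map_fn, reduce_fn))
# [('beats', 1), ('big', 2), ('data', 2), ('deal', 1), ('opinions', 1)]
```

Only `map_fn` and `reduce_fn` are application code. In Hadoop, the grouping, sorting, distribution and retrying done here by `run_mapreduce` are the framework's job.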
  • 13. Map-Reduce Execution Flow
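One step of the execution flow worth making concrete is how map output reaches the reducers. Hadoop's default HashPartitioner assigns each key to reducer `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`. Every value for a given key therefore lands on the same reducer, whose input is then sorted by key. The sketch below mirrors that formula in Python, assuming `zlib.crc32` as a stand-in hash because Python's built-in string hash changes between runs. The function names are hypothetical.

```python
import zlib
from collections import defaultdict


def partition(key, num_reducers):
    """Same shape as HashPartitioner: (hash & Integer.MAX_VALUE) % numReduceTasks.
    zlib.crc32 is an assumed stand-in for the key's hashCode()."""
    return (zlib.crc32(key.encode("utf-8")) & 0x7FFFFFFF) % num_reducers


def shuffle(map_outputs, num_reducers):
    """Route every (key, value) pair from all map tasks to one reducer, then
    sort each reducer's input by key, as the framework does before reduce."""
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for pairs in map_outputs:  # one list of pairs per map task
        for key, value in pairs:
            buckets[partition(key, num_reducers)][key].append(value)
    return [sorted(bucket.items()) for bucket in buckets]


map_outputs = [
    [("big", 1), ("data", 1), ("big", 1)],  # output of map task 0
    [("data", 1), ("deal", 1)],             # output of map task 1
]
for r, reducer_input in enumerate(shuffle(map_outputs, num_reducers=2)):
    print(f"reducer {r}: {reducer_input}")
```

Which reducer a given word lands on depends on the hash. The invariant holds regardless: both `big` values and both `data` values end up together in a single reducer's input, even though they came from different map tasks.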
  • 14. Hadoop Ecosystem
      • Oozie: open-source workflow/coordination service for managing data processing jobs on Apache Hadoop™; developed at Yahoo!
      • HBase: column-oriented database modeled on Google’s Bigtable; holds extremely large data sets (petabytes)
      • Hive: SQL-based data warehousing application with features for analyzing very large data sets; developed at Facebook
      • ZooKeeper: distributed coordination (consensus) service providing leader election, service discovery, and distributed locking / mutual exclusion
      • Pig: platform for analyzing large data sets, consisting of a high-level language for expressing data analysis steps
      • Ganglia: scalable distributed monitoring system for high-performance computing systems such as clusters and grids
  • 15. Hadoop is not a “Holy Grail”
      • Not a substitute for a database
      • MapReduce is not always the best algorithm
      • HDFS is not a substitute for a highly available, SAN-hosted file system
      • HDFS is not a POSIX file system
      • Not a place to learn Java programming
      • Not a place to learn Unix/Linux system administration
      • Not a place to learn the basics of networking
  • 16. Notable Users of Hadoop (source: http://en.wikipedia.org/wiki/Hadoop)
      • A9.com, AOL, eHarmony, eBay, Facebook, Fox Interactive Media, IBM, Last.fm, LinkedIn
      • Meebo, Metaweb, The New York Times, Rackspace, StumbleUpon, Twitter, Yahoo, Amazon
  • 17. Q&A (www.flytxt.com)
  • 18. Thank you!
      • Contact us: dev2dev@flytxt.com / jobin.wilson@flytxt.com
      • www.flytxt.com