Big data & Hadoop & How we use it at Alchetron
Brief
BIG DATA & HADOOP
Alchetron.com
Free Social Encyclopedia
BIG DATA
HADOOP
HDFS
MAP-REDUCE
ALCHETRON
FEEDBACK
Q/A
BIG DATA & HADOOP
To understand BIG DATA, we first have to understand data!
THIS DRAWING WAS CREATED 40,000 YEARS AGO. THIS WAS ONE OF THE FIRST TIMES HUMANS RECORDED DATA.
AS TIME PASSED, WE STARTED CREATING MORE DATA, AS YOU CAN SEE IN THESE STONE TABLETS, WHICH ARE 3,000-10,000 YEARS OLD.
This man, Johannes Gutenberg, invented the printing press in 1439, which meant far more data could be recorded than ever before.
100 crore (1 billion) books had been printed by the 18th century, and, my dear friends, you were not even born yet…
THIS MAN INVENTED THE WORLD WIDE WEB
Sir Tim Berners-Lee invented the World Wide Web, which went public in 1991. With the web, the amount of data generated by mankind exploded!
30 years of mobile technology
In the next 20 years, computing will move to the microscopic level.
Computers won't be in our pockets but inside our bodies and minds.
This is where technology and biology will merge, multiplying and enhancing our capabilities a thousand times.
Technological change will be rapid and exponential.
With the invention of the internet and small, inexpensive storage devices, data creation explodes!
Data generation statistics
• 2.7 zettabytes of data exist in the digital universe today.
• Facebook stores, accesses, and analyzes 50+ petabytes of user-generated data.
• Walmart handles more than 1 million customer transactions every hour, which are imported into databases estimated to contain more than 2.5 petabytes of data.
• More than 5 billion people are calling, texting, tweeting and browsing on mobile phones worldwide.
• YouTube users upload 48 hours of new video every minute of the day.
• In 2008, Google was processing 20,000 terabytes (20 petabytes) of data a day.
With the invention of the internet, data creation explodes.
SO WHAT IS BIG DATA?
Every day, we create 2.5 quintillion bytes of data, so much that 90% of the data in the world today has been created in the last two years alone. This data comes from everywhere: sensors used to gather climate information, posts to social media sites, digital pictures and videos, purchase transaction records, and cell phone GPS signals, to name a few.
This data is big data.
Who will manage BIG DATA?
HADOOP
Open-source Apache project
Written in Java
Runs on Linux, Mac OS X, Windows, and Solaris
Runs on commodity hardware
Contents
• History of Hadoop
• The current applications of Hadoop
• Hadoop HDFS + MAP-REDUCE
• Other Hadoop projects
Fun Fact of Hadoop
"The name my kid gave a stuffed yellow elephant. Short, relatively easy to spell and pronounce, meaningless, and not used elsewhere: those are my naming criteria."
- Doug Cutting, Hadoop project creator
History of Hadoop
• Doug Cutting is working on Apache Nutch.
• 2004: Google publishes "Map-reduce". Cutting: "It is an important technique!"
• Nutch is extended with MapReduce.
The great journey begins…
• Yahoo! became the primary contributor in 2006.
• Yahoo! deployed large-scale science clusters in 2007.
• Many Yahoo! research papers emerged, at venues such as:
– WWW
– CIKM
– SIGIR
• Yahoo! began running major production jobs in Q1 2008.
Hadoop (in its 1.x releases) consists of two core parts: HDFS and MapReduce.
HDFS
The NameNode and the DataNodes are the machines that store the client's data.
The NameNode stores the metadata (the file-system namespace, which blocks make up each file, and where those blocks are); the DataNodes store the actual data blocks.
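To make the NameNode/DataNode split concrete, here is a minimal toy model in plain Java (no Hadoop dependency). It assumes a 128 MB block size (the HDFS 2.x default; Hadoop 1.x used 64 MB) and a replication factor of 3 (the HDFS default). The class name `ToyNameNode`, the host names and the round-robin placement are hypothetical simplifications; real HDFS uses a rack-aware placement policy.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of HDFS metadata: the NameNode tracks where blocks live but never stores file bytes. */
final class ToyNameNode {
    private final long blockSize;          // bytes per block (assumed 128 MB in the usage below)
    private final int replication;         // copies kept of every block
    private final List<String> dataNodes;  // hypothetical DataNode host names
    private final Map<String, List<Long>> fileToBlocks = new LinkedHashMap<>();
    private final Map<Long, List<String>> blockToNodes = new LinkedHashMap<>();
    private long nextBlockId = 0;
    private int nextNode = 0;

    ToyNameNode(long blockSize, int replication, List<String> dataNodes) {
        if (replication > dataNodes.size()) {
            throw new IllegalArgumentException("not enough DataNodes for the replication factor");
        }
        this.blockSize = blockSize;
        this.replication = replication;
        this.dataNodes = List.copyOf(dataNodes);
    }

    /** Records metadata for a new file of the given size and returns its block IDs. */
    List<Long> addFile(String path, long sizeBytes) {
        long numBlocks = (sizeBytes + blockSize - 1) / blockSize; // ceiling division
        List<Long> blocks = new ArrayList<>();
        for (long i = 0; i < numBlocks; i++) {
            long id = nextBlockId++;
            // Simplified round-robin placement on distinct nodes (NOT HDFS's rack-aware policy).
            List<String> replicas = new ArrayList<>();
            for (int r = 0; r < replication; r++) {
                replicas.add(dataNodes.get((nextNode + r) % dataNodes.size()));
            }
            nextNode = (nextNode + 1) % dataNodes.size();
            blockToNodes.put(id, replicas);
            blocks.add(id);
        }
        fileToBlocks.put(path, blocks);
        return blocks;
    }

    List<Long> blocksOf(String path) { return fileToBlocks.get(path); }

    List<String> locationsOf(long blockId) { return blockToNodes.get(blockId); }
}
```

For a 300 MB file this yields three blocks (128 MB + 128 MB + 44 MB), each recorded on three different DataNodes, while the NameNode itself holds only the two lookup tables.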
A TaskTracker is a daemon that runs on each DataNode (worker node) and accepts tasks (Map, Reduce and Shuffle operations) from the JobTracker.
The JobTracker is a daemon that runs on a master node (in small clusters often the same machine as the NameNode) and farms out MapReduce tasks to specific nodes in the cluster, ideally the nodes that hold the data, or at least nodes in the same rack.
(JobTracker and TaskTracker belong to Hadoop 1.x; Hadoop 2 replaced them with YARN.)
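The JobTracker's locality preference (a node that holds the data first, then a node in the same rack, then any free node) can be sketched as follows. This is a hypothetical, simplified illustration of the idea, not the JobTracker's actual scheduling code; `LocalityScheduler`, the host names and the host-to-rack map are assumptions.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

/** Toy illustration of data-locality-aware assignment of a single map task. */
final class LocalityScheduler {
    enum Locality { NODE_LOCAL, RACK_LOCAL, OFF_RACK }

    /** Which TaskTracker got the task and how close it is to the input data. */
    static final class Assignment {
        final String tracker;
        final Locality locality;

        Assignment(String tracker, Locality locality) {
            this.tracker = tracker;
            this.locality = locality;
        }
    }

    private final Map<String, String> rackOf; // host name -> rack id (assumed topology)

    LocalityScheduler(Map<String, String> rackOf) {
        this.rackOf = Map.copyOf(rackOf);
    }

    /** Picks a free TaskTracker for a map task whose input block is stored on replicaHosts. */
    Optional<Assignment> assign(List<String> replicaHosts, List<String> freeTrackers) {
        // 1. Node-local: a free TaskTracker on a DataNode that already stores the block.
        for (String t : freeTrackers) {
            if (replicaHosts.contains(t)) {
                return Optional.of(new Assignment(t, Locality.NODE_LOCAL));
            }
        }
        // 2. Rack-local: a free TaskTracker in the same rack as one of the replicas.
        Set<String> dataRacks = new HashSet<>();
        for (String h : replicaHosts) {
            String rack = rackOf.get(h);
            if (rack != null) {
                dataRacks.add(rack);
            }
        }
        for (String t : freeTrackers) {
            if (dataRacks.contains(rackOf.get(t))) {
                return Optional.of(new Assignment(t, Locality.RACK_LOCAL));
            }
        }
        // 3. Off-rack: any free TaskTracker; the block must then be copied across racks.
        if (!freeTrackers.isEmpty()) {
            return Optional.of(new Assignment(freeTrackers.get(0), Locality.OFF_RACK));
        }
        return Optional.empty(); // no free slot, so the task waits
    }
}
```

Moving the computation to the data is cheaper than moving terabytes of data to the computation, which is why the off-rack case is only the last resort.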
Map-Reduce Architecture
MapReduce is basically a data processing engine.
To understand it deeply, you should have experience with Java programming.
Let's try to learn the architecture of MapReduce.
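A plain-Java simulation of the classic word-count job may help show the three phases before looking at real Hadoop code: map emits (word, 1) pairs, shuffle groups the pairs by word, and reduce sums each group. It runs in a single JVM without Hadoop; `WordCountSimulation` and its methods are hypothetical names, and in a real cluster the map and reduce tasks are spread across TaskTrackers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Single-process simulation of the map, shuffle and reduce phases for word count. */
final class WordCountSimulation {
    /** Map phase: one input line -> list of (word, 1) pairs. */
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String token : line.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) {
                out.add(Map.entry(token, 1));
            }
        }
        return out;
    }

    /** Shuffle phase: group values by key; TreeMap keeps keys sorted, as Hadoop sorts keys before reduce. */
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return grouped;
    }

    /** Reduce phase: sum all the counts emitted for one word. */
    static int reduce(String word, List<Integer> counts) {
        int sum = 0;
        for (int c : counts) {
            sum += c;
        }
        return sum;
    }

    /** Runs map over every line, then shuffle, then reduce per key. */
    static Map<String, Integer> run(List<String> lines) {
        List<Map.Entry<String, Integer>> mapped = new ArrayList<>();
        for (String line : lines) {
            mapped.addAll(map(line)); // on a cluster, map tasks run in parallel
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : shuffle(mapped).entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }
}
```

For example, `WordCountSimulation.run(List.of("Big data is big", "Hadoop processes big data"))` returns `{big=3, data=2, hadoop=1, is=1, processes=1}`.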
An example
BORED   ALMOST THERE
BORED   ALMOST THERE
JUST ONE MORE CODE
Another code example
Nowadays (as per the current job market)…
• Software Developer Intern - IBM - Somers, NY +3 locations - Agile development - Big data / Hadoop / data analytics a plus
• Software Developer - IBM - San Jose, CA +4 locations - include Hadoop-powered distributed parallel data processing system, big data analytics ... multiple technologies, including Hadoop
Other Hadoop Ecosystem Projects
• Hadoop Core
– Distributed File System
– MapReduce Framework
• Pig (initiated by Yahoo!)
– Parallel Programming Language and Runtime
• HBase (initiated by Powerset)
– Table storage for semi-structured data
• ZooKeeper (initiated by Yahoo!)
– Coordinating distributed systems
• Hive (initiated by Facebook)
– SQL-like query language and metastore
A TYPICAL HADOOP CLUSTER HANDLING & PROCESSING PETABYTES OF DATA
1,000 TB ≈ 1 PETABYTE
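The petabyte-to-terabyte ratio depends on the unit convention: it is exactly 1,000 with decimal (SI) prefixes and 1,024 with binary (IEC) prefixes:

```latex
1\,\text{PB} = 10^{15}\,\text{bytes} = 1000 \times 10^{12}\,\text{bytes} = 1000\,\text{TB}
\qquad
1\,\text{PiB} = 2^{50}\,\text{bytes} = 1024 \times 2^{40}\,\text{bytes} = 1024\,\text{TiB}
```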
Nowadays…
Who uses Hadoop?
• Amazon/A9
• Alchetron
• Fox Interactive Media
• Google
• IBM
• Facebook
• Quantcast
• Rackspace/Mailtrust
• Veoh
• Yahoo!
• More at http://wiki.apache.org/hadoop/PoweredBy
Let's see how we implemented this at Alchetron
When you visit Alchetron.com
you are interacting
with data processed
with Hadoop
Search Index
Organizing Data
Content Filtering
References
• For more information:
– http://hadoop.apache.org/
– http://developer.yahoo.com/hadoop/
– http://alchetron.com/What-is-Big-data-1530-W
– http://alchetron.com/Big-Data-Hadoop-260-W