Big Data
Valeri Kopaleishvili
Outline
◦ What is Big Data?
◦ Where does this Big Data come from?
◦ 4 Vs analysis
◦ When is dealing with Big Data hard?
◦ Example: Google
What is big data?
“Every day, we create 2.5 quintillion bytes of data — so
much that 90% of the data in the world today has been
created in the last two years alone. This data comes from
everywhere: sensors used to gather climate information,
posts to social media sites, digital pictures and videos,
purchase transaction records, and cell phone GPS signals to
name a few.
This data is ‘big data.’”
Where Is This “Big Data” Coming From?
◦ 12+ TB of tweet data every day
◦ 25+ TB of log data every day
◦ ? TB of data every day
◦ 2+ billion people on the Web by the end of 2011
◦ 30 billion RFID tags today (1.3 billion in 2005)
◦ 4.6 billion camera phones worldwide
◦ 100s of millions of GPS-enabled devices sold annually
◦ 76 million smart meters in 2009, 200 million by 2014
With Big Data, We’ve Moved to 4 Vs Analytics
◦ Volume: 12+ terabytes of tweets created daily
◦ Variety: 100s of different types of data
◦ Velocity: 5+ million trade events per second
◦ Value
Volume (Scale)
Data Volume
◦ 44x increase from 2009 to 2020
◦ From 0.8 zettabytes to 35 zettabytes
Data volume is increasing exponentially
Volume refers to the vast amounts of data generated every second. We are no longer
talking about terabytes but petabytes. If we take all the data generated in the world
between the beginning of time and 2008, the same amount of data will soon be generated
every minute. This makes most data sets too large to store and analyze using traditional
database technology. New big data tools use distributed systems so that we can store
and analyze data across databases located anywhere in the world.
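The two figures on this slide can be checked against each other with a little arithmetic. The sketch below multiplies the 2009 volume by the quoted 44x factor and, assuming a constant year-over-year rate (an assumption; the slide only gives the two endpoints), works out what yearly growth that would imply.

```python
# Figures quoted on the slide: 0.8 ZB in 2009 and a 44x increase by 2020.
start_zb = 0.8
growth_factor = 44
years = 2020 - 2009  # 11 years between the two endpoints

# Volume implied for 2020 by the 44x factor.
end_zb = start_zb * growth_factor

# Constant yearly growth rate that compounds to 44x over 11 years.
# Assumption: growth is evenly spread; the slide does not say so.
annual_rate = growth_factor ** (1 / years) - 1

print(f"Implied 2020 volume: {end_zb:.1f} ZB")
print(f"Implied annual growth: {annual_rate:.1%}")
```

The product comes to 35.2 ZB, which matches the rounded 35 ZB on the slide, and the implied growth is about 41% per year.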
Variety (Complexity)
Variety refers to the different types of data we can now use. In the past, we
focused only on structured data that fitted neatly into tables or relational
databases, such as financial data.
In fact, 80% of the world’s data is unstructured (text, images, video,
voice, etc.). With big data technology we can now analyze and bring
together data of different types such as messages, social media
conversations, photos, sensor data, video or voice recordings.
To extract knowledge, all these types of data need to be linked together.
Velocity (Speed)
Velocity refers to the speed at which new data is generated and the speed at
which data moves around. Just think of social media messages going viral in
seconds. Technology now allows us to analyze the data while it is being
generated (sometimes referred to as in-memory analytics), without ever
putting it into databases.
Examples
◦ E-Promotions: based on your current location, your purchase history, and what
you like → send promotions right now for the store next to you
◦ Healthcare monitoring: sensors monitoring your activities and body → any
abnormal measurements require immediate reaction
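The healthcare example can be sketched as a small streaming check that looks at each reading as it arrives and keeps only a short window of recent values in memory. This is a minimal illustration, not a production monitoring method: the function name, the window size, the threshold, and the sample heart-rate values are all hypothetical.

```python
from collections import deque
from statistics import mean, stdev


def detect_anomalies(readings, window=5, threshold=3.0):
    """Yield (index, value) for readings far from the recent average.

    Only the last `window` normal readings are kept in memory, so the
    full stream is never stored.
    """
    recent = deque(maxlen=window)
    for i, value in enumerate(readings):
        if len(recent) == window:
            mu, sigma = mean(recent), stdev(recent)
            if sigma > 0 and abs(value - mu) > threshold * sigma:
                yield i, value
                continue  # keep the outlier out of the baseline window
        recent.append(value)


# Hypothetical heart-rate samples (beats per minute), made up for illustration.
heart_rate = [72, 74, 71, 73, 75, 72, 140, 74, 73]
alerts = list(detect_anomalies(heart_rate))
print(alerts)  # [(6, 140)]
```

Because `detect_anomalies` is a generator, it can consume an unbounded stream and raise an alert as soon as an abnormal value shows up.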
Real-time/Fast Data
Progress and innovation are no longer hindered by the ability to collect data,
but by the ability to manage, analyze, summarize, visualize, and discover
knowledge from the collected data in a timely manner and in a scalable fashion.
◦ Social media and networks (all of us are generating data)
◦ Scientific instruments (collecting all sorts of data)
◦ Mobile devices (tracking all objects all the time)
◦ Sensor technology and networks (measuring all kinds of data)
Value
Then there is another V to take into account when looking at Big
Data: Value! Having access to big data is no good unless we can turn it
into value. Companies are starting to generate amazing value from their
big data.
We currently only see the beginnings of a transformation into a big data
economy. Any business that doesn’t seriously consider the implications
of Big Data runs the risk of being left behind.
Big Data Exploration: Value & Diagram
[Diagram: File Systems, Relational Data, Content Management, Email, CRM,
Supply Chain, ERP, RSS Feeds, Cloud, Custom Sources; Data Explorer;
Application/Users]
Find, visualize, and understand all big data to improve business knowledge
• Greater efficiencies in business processes
• New insights from combining and analyzing data types in new ways
• Develop new business models with resulting increased market
presence and revenue
Applications for Big Data Analytics
Homeland Security
Finance
Smarter Healthcare
Telecom
Manufacturing
Traffic Control
Trading Analytics
Log Analysis
Search Quality
When is dealing with Big Data hard?
When the operations on data are complex:
◦ Simple counting, for example, is not a complex problem.
◦ Modeling and reasoning with data of different kinds can
get extremely complex.
Good news with big data:
◦ Often, because of the vast amount of data, modeling
techniques can get simpler (e.g., smart counting can
replace complex model-based analytics)…
◦ …as long as we deal with the scale.
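One way to picture "smart counting" replacing a model is next-word suggestion: instead of a language model, count which word most often follows another in the data. The sketch below is only an illustration of that idea. The helper names and the three-sentence corpus are hypothetical, and a real system would count over a far larger data set.

```python
from collections import Counter, defaultdict


def build_bigram_counts(sentences):
    """Count, for every word, how often each other word follows it."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for current, following in zip(words, words[1:]):
            counts[current][following] += 1
    return counts


def suggest_next(counts, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Tiny made-up corpus, for illustration only.
corpus = [
    "big data is big",
    "big data is everywhere",
    "big data needs big clusters",
]
counts = build_bigram_counts(corpus)
print(suggest_next(counts, "big"))   # data
print(suggest_next(counts, "data"))  # is
```

The logic is nothing more than counting pairs. The quality of the suggestions depends on how much data is counted, which is why scale is the remaining difficulty.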
Hadoop is an open-source software framework for storing and processing big data in a
distributed fashion on large clusters of commodity hardware.
It is suitable for extremely large databases (billions of rows, millions of columns),
distributed across thousands of nodes.
Hadoop Distributed File System (HDFS) is a Java-based file system that provides
scalable and reliable data storage and is designed to span large clusters of
commodity servers.
MapReduce is a programming model and an associated implementation for processing and generating
large data sets with a parallel, distributed algorithm on a cluster.
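The programming model can be illustrated with the classic word count, written here as three plain Python functions that run in a single process. This is a sketch of the map, shuffle, and reduce steps only; it does not use Hadoop's API, and the function names and sample documents are hypothetical. On a real cluster, the map and reduce calls would run in parallel on different nodes.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values of one key into a single result."""
    return key, sum(values)


# Made-up input documents, for illustration only.
documents = ["big data big clusters", "data moves fast"]

mapped = [pair for doc in documents for pair in map_phase(doc)]
grouped = shuffle_phase(mapped)
word_counts = dict(reduce_phase(k, v) for k, v in grouped.items())
print(word_counts)
# {'big': 2, 'data': 2, 'clusters': 1, 'moves': 1, 'fast': 1}
```

Each document can be mapped independently and each key can be reduced independently, which is what lets the work be spread across a cluster.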
We first wrote the data into HDFS, then created a table and loaded the data from the
HDFS files into a Hive table.
Thanks!
