SlideShare a Scribd company logo
Module 1
Introduction of Big Data
What is Big Data?
Large
datasets, complex and changeable
Cannot be stored, managed, and processed
adequately by traditional software management
and software tools.
Solutions?
• Relying on the Technology that handle , process
and analyze large quantities of data.
Types of Solutions:
• Structured
• Multi-Structured
• Unstructured
The need of Big Data…
• In social networking, media and commerce sites, data
growth increasing exponentially and so on for the future.
Analyzing and managing these data properly is the key to
business expansion and growth.
Data Creation
• Every Year, the
world created
• more and more Zeta
bytes of data
• Managing this data
became crucial to
extract more value
in retail, finance,
media and
publishing.
Challenges
• Hadoop later on came as the
answer of software framework
that would support technologies,
tools and infrastructure and also
more effective and efficient
management and analysis of
large datasets.
• Hadoop tools and services
working together in completing
its tasks, forming a Hadoop
ecosystem.
• Primary component:
• HDFS – Distrubuted file system for storage
• MapReduce –Parallel processing
framework
Typical Hadoop eco-system project might use
• ZooKeeper –Coordination framework
• Hive – SQL-like inferface for querying
• Pig – High level scripting language for
batch processing and ETL
• Hbase – Tabular/column storage (non-
relational, distribution database)
• Flume – Distribution, reliable, service for
collecting,aggregating,
and moving large data
Ecosystem Component
Big Data’s Characteristic
Volume
Variety
Velocity
Complexity
Validity
Stored Data
Processing
- Batch-based stored
- Real-Time Data-stream processing
Batch Based Stored Data Processing
• Process large volumes of data
• Can be periodic or one-time processing
• Batch results are produced after data is
collected, entered and processed
• Separate techniques or programs for input,
processing and output
Real Time Data Processing
•Data captured, processed, and
acted 24/7
•Computing data real-time
• Advantages of Real-time:
System scales elastically based on need
Instant Results
Parallel processing is available
Tools and Techniques for analyzing
big Data
The choice of tools mostly driven by:
Who is going to use the data
+
the business requirement for a particular
scenario
While traditional RDBMSs has many limitations, Big Data provide
alternatives address some of these limitations.
HDFS
Sqoop
Database alternatives:
• Document Stores – Apache CouchDB, MongoDB
• Graph stores –Neo4j
• Key-Value Stores –Apache Cassandra, Riak
• Tabular stores –Apache Hbase
Accessing the Big Data
• SQL like connectivity initiates for big data: Impala, Hive and Stinger
Other NoSQL BI Tools: Datameer, Karmasphere and Platfora
• These big data later can be used in:
-Sandboxes for data science projects
-Analytics development using map reduce
-Analytics-query performance enablers
-Search indexes on multi structured data in Hadoop
- Analyzing multi structured Big data Using search
- Data visualization and in memory data in big data enviroment
Considering Using big Data
Is your scenario really require a Big Data
Solution?
Does the solution need to handle data
variety?
Does the solution require ability to deal
with high data velocity?
Does the solution handle high data
volume?
Can the solution handle complexity?
How can you optimize the solution?
Case
Study:
Social
Media
Analytics
Using people’s history on internet, what they buy, what they search giving a rough view
of attitude on a product.
More, these output can be used to study:
customer satisfaction, churn prediction, financial performance, stock performance.
Hourglass
Is a framework that it much easier to write
incremental Hadoop jobs to perform
computation efficiently
Sekian
dan terima kasih

More Related Content

Similar to Modul_1_Introduction_to_Big_Data.pptx

Big Data Practice_Planning_steps_RK
Big Data Practice_Planning_steps_RKBig Data Practice_Planning_steps_RK
Big Data Practice_Planning_steps_RK
Rajesh Jayarman
 

Similar to Modul_1_Introduction_to_Big_Data.pptx (20)

What is Hadoop & its Use cases-PromtpCloud
What is Hadoop & its Use cases-PromtpCloudWhat is Hadoop & its Use cases-PromtpCloud
What is Hadoop & its Use cases-PromtpCloud
 
Big data unit 2
Big data unit 2Big data unit 2
Big data unit 2
 
Gilbane Boston 2012 Big Data 101
Gilbane Boston 2012 Big Data 101Gilbane Boston 2012 Big Data 101
Gilbane Boston 2012 Big Data 101
 
Big Data Analytics Using Hadoop
Big Data Analytics Using HadoopBig Data Analytics Using Hadoop
Big Data Analytics Using Hadoop
 
Big Data Practice_Planning_steps_RK
Big Data Practice_Planning_steps_RKBig Data Practice_Planning_steps_RK
Big Data Practice_Planning_steps_RK
 
Introduction to Big Data and Hadoop
Introduction to Big Data and HadoopIntroduction to Big Data and Hadoop
Introduction to Big Data and Hadoop
 
Big data and hadoop overvew
Big data and hadoop overvewBig data and hadoop overvew
Big data and hadoop overvew
 
Big Data Analytics with Hadoop
Big Data Analytics with HadoopBig Data Analytics with Hadoop
Big Data Analytics with Hadoop
 
Introduction to BIG DATA
Introduction to BIG DATA Introduction to BIG DATA
Introduction to BIG DATA
 
Big Data
Big DataBig Data
Big Data
 
Hadoop Master Class : A concise overview
Hadoop Master Class : A concise overviewHadoop Master Class : A concise overview
Hadoop Master Class : A concise overview
 
Simple, Modular and Extensible Big Data Platform Concept
Simple, Modular and Extensible Big Data Platform ConceptSimple, Modular and Extensible Big Data Platform Concept
Simple, Modular and Extensible Big Data Platform Concept
 
Big Data Introduction
Big Data IntroductionBig Data Introduction
Big Data Introduction
 
Foxvalley bigdata
Foxvalley bigdataFoxvalley bigdata
Foxvalley bigdata
 
50 Shades of SQL
50 Shades of SQL50 Shades of SQL
50 Shades of SQL
 
Apache-Hadoop-Slides.pptx
Apache-Hadoop-Slides.pptxApache-Hadoop-Slides.pptx
Apache-Hadoop-Slides.pptx
 
Data analytics & its Trends
Data analytics & its TrendsData analytics & its Trends
Data analytics & its Trends
 
Big Data and Hadoop
Big Data and HadoopBig Data and Hadoop
Big Data and Hadoop
 
Big data by Mithlesh sadh
Big data by Mithlesh sadhBig data by Mithlesh sadh
Big data by Mithlesh sadh
 
Big Data Use Cases
Big Data Use CasesBig Data Use Cases
Big Data Use Cases
 

Recently uploaded

Standard Reomte Control Interface - Neometrix
Standard Reomte Control Interface - NeometrixStandard Reomte Control Interface - Neometrix
Standard Reomte Control Interface - Neometrix
Neometrix_Engineering_Pvt_Ltd
 
Digital Signal Processing Lecture notes n.pdf
Digital Signal Processing Lecture notes n.pdfDigital Signal Processing Lecture notes n.pdf
Digital Signal Processing Lecture notes n.pdf
AbrahamGadissa
 

Recently uploaded (20)

The Ultimate Guide to External Floating Roofs for Oil Storage Tanks.docx
The Ultimate Guide to External Floating Roofs for Oil Storage Tanks.docxThe Ultimate Guide to External Floating Roofs for Oil Storage Tanks.docx
The Ultimate Guide to External Floating Roofs for Oil Storage Tanks.docx
 
Courier management system project report.pdf
Courier management system project report.pdfCourier management system project report.pdf
Courier management system project report.pdf
 
Standard Reomte Control Interface - Neometrix
Standard Reomte Control Interface - NeometrixStandard Reomte Control Interface - Neometrix
Standard Reomte Control Interface - Neometrix
 
A CASE STUDY ON ONLINE TICKET BOOKING SYSTEM PROJECT.pdf
A CASE STUDY ON ONLINE TICKET BOOKING SYSTEM PROJECT.pdfA CASE STUDY ON ONLINE TICKET BOOKING SYSTEM PROJECT.pdf
A CASE STUDY ON ONLINE TICKET BOOKING SYSTEM PROJECT.pdf
 
Quality defects in TMT Bars, Possible causes and Potential Solutions.
Quality defects in TMT Bars, Possible causes and Potential Solutions.Quality defects in TMT Bars, Possible causes and Potential Solutions.
Quality defects in TMT Bars, Possible causes and Potential Solutions.
 
Introduction to Machine Learning Unit-4 Notes for II-II Mechanical Engineering
Introduction to Machine Learning Unit-4 Notes for II-II Mechanical EngineeringIntroduction to Machine Learning Unit-4 Notes for II-II Mechanical Engineering
Introduction to Machine Learning Unit-4 Notes for II-II Mechanical Engineering
 
Halogenation process of chemical process industries
Halogenation process of chemical process industriesHalogenation process of chemical process industries
Halogenation process of chemical process industries
 
BRAKING SYSTEM IN INDIAN RAILWAY AutoCAD DRAWING
BRAKING SYSTEM IN INDIAN RAILWAY AutoCAD DRAWINGBRAKING SYSTEM IN INDIAN RAILWAY AutoCAD DRAWING
BRAKING SYSTEM IN INDIAN RAILWAY AutoCAD DRAWING
 
Cloud-Computing_CSE311_Computer-Networking CSE GUB BD - Shahidul.pptx
Cloud-Computing_CSE311_Computer-Networking CSE GUB BD - Shahidul.pptxCloud-Computing_CSE311_Computer-Networking CSE GUB BD - Shahidul.pptx
Cloud-Computing_CSE311_Computer-Networking CSE GUB BD - Shahidul.pptx
 
Digital Signal Processing Lecture notes n.pdf
Digital Signal Processing Lecture notes n.pdfDigital Signal Processing Lecture notes n.pdf
Digital Signal Processing Lecture notes n.pdf
 
Toll tax management system project report..pdf
Toll tax management system project report..pdfToll tax management system project report..pdf
Toll tax management system project report..pdf
 
The Benefits and Techniques of Trenchless Pipe Repair.pdf
The Benefits and Techniques of Trenchless Pipe Repair.pdfThe Benefits and Techniques of Trenchless Pipe Repair.pdf
The Benefits and Techniques of Trenchless Pipe Repair.pdf
 
Event Management System Vb Net Project Report.pdf
Event Management System Vb Net  Project Report.pdfEvent Management System Vb Net  Project Report.pdf
Event Management System Vb Net Project Report.pdf
 
Peek implant persentation - Copy (1).pdf
Peek implant persentation - Copy (1).pdfPeek implant persentation - Copy (1).pdf
Peek implant persentation - Copy (1).pdf
 
WATER CRISIS and its solutions-pptx 1234
WATER CRISIS and its solutions-pptx 1234WATER CRISIS and its solutions-pptx 1234
WATER CRISIS and its solutions-pptx 1234
 
Natalia Rutkowska - BIM School Course in Kraków
Natalia Rutkowska - BIM School Course in KrakówNatalia Rutkowska - BIM School Course in Kraków
Natalia Rutkowska - BIM School Course in Kraków
 
ENERGY STORAGE DEVICES INTRODUCTION UNIT-I
ENERGY STORAGE DEVICES  INTRODUCTION UNIT-IENERGY STORAGE DEVICES  INTRODUCTION UNIT-I
ENERGY STORAGE DEVICES INTRODUCTION UNIT-I
 
fluid mechanics gate notes . gate all pyqs answer
fluid mechanics gate notes . gate all pyqs answerfluid mechanics gate notes . gate all pyqs answer
fluid mechanics gate notes . gate all pyqs answer
 
İTÜ CAD and Reverse Engineering Workshop
İTÜ CAD and Reverse Engineering WorkshopİTÜ CAD and Reverse Engineering Workshop
İTÜ CAD and Reverse Engineering Workshop
 
Top 13 Famous Civil Engineering Scientist
Top 13 Famous Civil Engineering ScientistTop 13 Famous Civil Engineering Scientist
Top 13 Famous Civil Engineering Scientist
 

Modul_1_Introduction_to_Big_Data.pptx

  • 2. What is Big Data? Large datasets, complex and changeable Cannot be stored, managed, and processed adequately by traditional software management and software tools.
  • 3. Solutions? • Relying on the Technology that handle , process and analyze large quantities of data. Types of Solutions: • Structured • Multi-Structured • Unstructured
  • 4. The need of Big Data… • In social networking, media and commerce sites, data growth increasing exponentially and so on for the future. Analyzing and managing these data properly is the key to business expansion and growth.
  • 5. Data Creation • Every Year, the world created • more and more Zeta bytes of data • Managing this data became crucial to extract more value in retail, finance, media and publishing.
  • 7. • Hadoop later on came as the answer of software framework that would support technologies, tools and infrastructure and also more effective and efficient management and analysis of large datasets. • Hadoop tools and services working together in completing its tasks, forming a Hadoop ecosystem.
  • 8. • Primary component: • HDFS – Distrubuted file system for storage • MapReduce –Parallel processing framework Typical Hadoop eco-system project might use • ZooKeeper –Coordination framework • Hive – SQL-like inferface for querying • Pig – High level scripting language for batch processing and ETL • Hbase – Tabular/column storage (non- relational, distribution database) • Flume – Distribution, reliable, service for collecting,aggregating, and moving large data Ecosystem Component
  • 10. Stored Data Processing - Batch-based stored - Real-Time Data-stream processing
  • 11. Batch Based Stored Data Processing • Process large volumes of data • Can be periodic or one-time processing • Batch results are produced after data is collected, entered and processed • Separate techniques or programs for input, processing and output
  • 12. Real Time Data Processing •Data captured, processed, and acted 24/7 •Computing data real-time • Advantages of Real-time: System scales elastically based on need Instant Results Parallel processing is available
  • 13. Tools and Techniques for analyzing big Data The choice of tools mostly driven by: Who is going to use the data + the business requirement for a particular scenario
  • 14. While traditional RDBMSs has many limitations, Big Data provide alternatives address some of these limitations. HDFS Sqoop Database alternatives: • Document Stores – Apache CouchDB, MongoDB • Graph stores –Neo4j • Key-Value Stores –Apache Cassandra, Riak • Tabular stores –Apache Hbase
  • 15. Accessing the Big Data • SQL like connectivity initiates for big data: Impala, Hive and Stinger Other NoSQL BI Tools: Datameer, Karmasphere and Platfora • These big data later can be used in: -Sandboxes for data science projects -Analytics development using map reduce -Analytics-query performance enablers -Search indexes on multi structured data in Hadoop - Analyzing multi structured Big data Using search - Data visualization and in memory data in big data enviroment
  • 16. Considering Using big Data Is your scenario really require a Big Data Solution? Does the solution need to handle data variety? Does the solution require ability to deal with high data velocity? Does the solution handle high data volume? Can the solution handle complexity? How can you optimize the solution?
  • 17. Case Study: Social Media Analytics Using people’s history on internet, what they buy, what they search giving a rough view of attitude on a product. More, these output can be used to study: customer satisfaction, churn prediction, financial performance, stock performance.
  • 18. Hourglass Is a framework that it much easier to write incremental Hadoop jobs to perform computation efficiently