WHAT IS BIG DATA?
BY AKHMAD ZAKI ALSAFI
(C-left) 2020
WHY BIG DATA
(FROM BUSINESS AND TECHNICAL PERSPECTIVE)
1. Cost Saving
• Maximizing Big Data Storage
• Using Cluster File System Mechanisms
• Optimizing Hadoop
• Using Network Filesystem
2. Time Reduction
• Make Decisions Based on Real-Time Data
• Using Real-Time Streaming and Real-Time Visual Analytics Tools
• Such as Kafka, Storm, Splunk, etc.
3. New Product Development
• Knowing What Customers Want from Their Digital Behavior
• Using Machine Learning Tools Such as Apache Spark, TensorFlow,
Torch, and GPU Computing
• Models Such as Clustering, Logistic Regression, and Deep Learning
4. Understand Market Conditions
• What to promote? Which strategy to maintain? Which customer
loyalty program to offer, and to whom?
• Models Such as Collaborative Filtering
5. Control Online Reputation
• Using Natural Language Processing to Run Sentiment
Analysis and Make Decisions That Increase Customer
Engagement
• Tools Such as Chatbots
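Item 4 mentions collaborative filtering. Below is a minimal user-based sketch, assuming a tiny hypothetical ratings table (the users, items, and scores are invented for illustration). It scores each unseen item by summing other users' ratings, weighted by how similar those users are to the target user (cosine similarity, with unrated items treated as zero).

```python
from math import sqrt

# Hypothetical user -> {item: rating} data, invented for illustration.
RATINGS = {
    "alice": {"phone": 5, "laptop": 3, "tablet": 4},
    "bob": {"phone": 5, "laptop": 3, "tablet": 4, "watch": 5},
    "carol": {"phone": 1, "camera": 5},
}

def cosine(a: dict, b: dict) -> float:
    """Cosine similarity of two rating vectors; missing items count as 0."""
    dot = sum(a[i] * b[i] for i in set(a) & set(b))
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def recommend(user: str, ratings: dict = RATINGS) -> list:
    """Rank items the user has not rated by similarity-weighted ratings of others."""
    target = ratings[user]
    scores = {}
    for other, their in ratings.items():
        if other == user:
            continue
        sim = cosine(target, their)
        for item, rating in their.items():
            if item not in target:
                scores[item] = scores.get(item, 0.0) + sim * rating
    return sorted(scores, key=scores.get, reverse=True)

print(recommend("alice"))  # ['watch', 'camera']
```

Bob rated the same items as Alice almost identically, so his "watch" outranks Carol's "camera". Production systems usually use matrix factorization (for example, Spark MLlib's ALS) rather than this brute-force neighbor scan.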
We Do It Statistically and With Heavyweight Computation!
One More!
We Need a BIG DATA CONVERGED INFRASTRUCTURE Platform for
Unifying All Tools and Mechanisms.
And We Have to Present All Those Tools in That Platform Through an
End-User-Friendly Interface, Either as Part of the Platform (If the
Platform Has One) or by Using Third-Party Products.
JUST CHECK GARTNER FOR PRODUCTS
STRATEGIES FOR IMPLEMENTING BIG DATA
FOR END USER / CUSTOMER
1. Identify What You Want!
• Decide whether you want to:
1. increase the efficiency of customer reps
2. improve operational efficiency
3. increase revenues
4. provide better customer experience
5. improve marketing
2. Leverage Proven Big Data Strategy!
• Performance Management
1. Using business intelligence tools
2. Get historical data from databases and store it on Hadoop
3. Analyze it by grouping, aggregating, and counting volumes and other grouped information
• Data Exploration
1. Gather information about customers' behavior
2. Generate new products and revenue streams
• Social Analytics
1. Sentiment analysis
3. Identify Infrastructural Changes!
• Create Infrastructure that makes integration of data easy
4. Establish Talent Pool!
5. Obsess Over Customer Satisfaction!
6. Ensure Usability!
• The output of the Big Data process should be consumable in a format that all staff and
each department's person in charge can understand.
7. Be Agile! (Embrace All Circumstances)
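The Performance Management strategy in step 2 boils down to grouping, aggregating, and counting historical records. Here is a minimal plain-Python sketch, assuming hypothetical sales records that have already been pulled from a database. On a real cluster this step would typically run as a Hive or Spark query instead.

```python
from collections import defaultdict

# Hypothetical historical records: (region, product, amount).
SALES = [
    ("north", "phone", 300),
    ("north", "laptop", 900),
    ("south", "phone", 250),
    ("north", "phone", 320),
    ("south", "tablet", 400),
]

def summarize_by_region(records):
    """Group records by region, then count transactions and sum revenue per group."""
    summary = defaultdict(lambda: {"count": 0, "revenue": 0})
    for region, _product, amount in records:
        summary[region]["count"] += 1
        summary[region]["revenue"] += amount
    return dict(summary)

print(summarize_by_region(SALES))
# {'north': {'count': 3, 'revenue': 1520}, 'south': {'count': 2, 'revenue': 650}}
```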
WHAT IS BIG DATA?
HISTORY OF BIG DATA
HOW IS BIG DATA STORED?
3 MAIN COMPONENTS
KEY TERMS OF BIG DATA
SOURCES OF BIG DATA AND THEIR FORMATS
HADOOP
A GLANCE AT THE LOCAL FILE SYSTEM
Format
Failure
Theft
Deletion
Space Management
HADOOP MECHANISMS
Hadoop FS management
MAPREDUCE
ALGORITHMS
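MapReduce splits a job into a map phase that emits key/value pairs, a shuffle that groups those pairs by key, and a reduce phase that combines each group. The sketch below is a single-process version of the classic word count. It assumes the input is already split into text chunks standing in for HDFS blocks. Real Hadoop runs each phase in parallel across many nodes.

```python
from collections import defaultdict
from itertools import chain

def map_phase(chunk: str):
    """Emit (word, 1) for every word in one input chunk."""
    return [(word.lower(), 1) for word in chunk.split()]

def shuffle_phase(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Combine the values of each key; for word count, simply sum them."""
    return {key: sum(values) for key, values in groups.items()}

def word_count(chunks):
    mapped = chain.from_iterable(map_phase(c) for c in chunks)
    return reduce_phase(shuffle_phase(mapped))

print(word_count(["big data is big", "data needs hadoop"]))
# {'big': 2, 'data': 2, 'is': 1, 'needs': 1, 'hadoop': 1}
```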
NATURE OF BIG DATA
DISTRIBUTED DATA
FAULT TOLERANCE
DISTRIBUTED PROCESSING
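HDFS distributes data by splitting each file into fixed-size blocks and storing every block on several DataNodes. The default replication factor is 3, so losing a single node does not lose data. The toy sketch below illustrates that idea. It assumes a tiny block size and a simple round-robin placement with hypothetical node names. Real HDFS uses 128 MB blocks by default and a rack-aware placement policy.

```python
def split_into_blocks(data: bytes, block_size: int):
    """Cut a file into fixed-size blocks; the last block may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def place_replicas(num_blocks: int, nodes: list, replication: int = 3):
    """Assign each block to `replication` distinct nodes (toy round-robin policy)."""
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    return {
        b: [nodes[(b + r) % len(nodes)] for r in range(replication)]
        for b in range(num_blocks)
    }

def readable_after_failure(placement: dict, failed: set) -> bool:
    """A file stays readable if every block still has at least one live replica."""
    return all(any(n not in failed for n in replicas) for replicas in placement.values())

blocks = split_into_blocks(b"0123456789", block_size=4)  # 3 blocks: 4 + 4 + 2 bytes
placement = place_replicas(len(blocks), ["dn1", "dn2", "dn3", "dn4"])
print(placement)
print(readable_after_failure(placement, failed={"dn2"}))  # True
```

Losing one node leaves every block readable. Losing three of the four nodes here would make block 0 unavailable, which is why real deployments spread replicas across racks.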
EXAMPLE OF HADOOP BIG DATA STORAGE
APACHE HIVE
FEATURES OF HIVE
1. HDFS Storage
2. Designed for OLAP
3. SQL Interface
4. Fast, Scalable, Extensible
5. Supported by Big Data Execution Engines Such as
Apache Spark, MapReduce, and Apache Tez
ARCHITECTURE OF HIVE
ARCHITECTURE EXPLAINED
1. User Interface
2. Metastore (Stores schemas, metadata, etc.)
3. HiveQL (The SQL-like query language; queries are compiled using schema information from the Metastore)
4. Big Data Execution Engine (Distributed computing tools spread across the
Hadoop cluster)
5. HDFS/HBase (Hadoop's basic storage layers)
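To make the flow concrete: Hive keeps table schemas in the Metastore and applies them when it reads plain files stored in HDFS (schema-on-read). The following is a toy, single-process model of that flow. The `METASTORE` and `HDFS` dictionaries and the `select_sum` helper are hypothetical stand-ins, not Hive APIs.

```python
# Hypothetical Metastore: table name -> column names, field delimiter, data location.
METASTORE = {
    "sales": {
        "columns": ["region", "product", "amount"],
        "delimiter": ",",
        "location": "/warehouse/sales",
    },
}

# Hypothetical HDFS directory: location -> files, each file a list of text lines.
HDFS = {
    "/warehouse/sales": [
        ["north,phone,300", "south,phone,250"],
        ["north,laptop,900"],
    ],
}

def select_sum(table: str, value_col: str, group_col: str) -> dict:
    """Rough model of: SELECT group_col, SUM(value_col) FROM table GROUP BY group_col."""
    meta = METASTORE[table]                    # 1. look up the schema in the Metastore
    g = meta["columns"].index(group_col)       # 2. resolve column names to field positions
    v = meta["columns"].index(value_col)
    result = {}
    for file_lines in HDFS[meta["location"]]:  # 3. the execution engine scans every file
        for line in file_lines:
            fields = line.split(meta["delimiter"])  # schema applied at read time
            result[fields[g]] = result.get(fields[g], 0) + int(fields[v])  # assumes numeric
    return result

print(select_sum("sales", "amount", "region"))  # {'north': 1200, 'south': 250}
```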
EXAMPLE OF HADOOP BIG DATA EXECUTION
ENGINE
WHAT IS APACHE SPARK?
• Open Source Software
• Data Processing Engine
• Has Tools for Streaming, Transforming, and Preprocessing Data
• In-Memory Processing Engine
• Supports Programming in Java, Scala, Python, and R
APACHE SPARK ECOSYSTEM
BETTER THAN MAPREDUCE
• MapReduce is an I/O-heavy process: it writes intermediate results out to disk and reads them back in between steps,
whereas Spark is an in-memory processing engine that can keep intermediate data in memory.
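The difference can be illustrated with a two-step pipeline. This simplified sketch assumes that a local temporary file stands in for the disk writes MapReduce performs between stages. The in-memory version hands the intermediate list directly to the next step, which is roughly what Spark does when it keeps data in memory. Actual Spark behavior depends on caching and available memory.

```python
import json
import os
import tempfile

def step1(values):
    """First transformation: square every value."""
    return [v * v for v in values]

def step2(values):
    """Second transformation: keep only even results."""
    return [v for v in values if v % 2 == 0]

def disk_pipeline(values):
    """MapReduce-style: persist the intermediate result to disk, then read it back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "intermediate.json")
        with open(path, "w") as f:
            json.dump(step1(values), f)   # write-out after step 1
        with open(path) as f:
            intermediate = json.load(f)   # read-in before step 2
    return step2(intermediate)

def memory_pipeline(values):
    """Spark-style: pass the intermediate result straight to the next step."""
    return step2(step1(values))

data = list(range(6))
print(disk_pipeline(data), memory_pipeline(data))  # [0, 4, 16] [0, 4, 16]
```

Both versions return the same result. The disk version pays for serialization and I/O at every step, a cost that accumulates in iterative workloads such as machine learning training loops.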
DISTRIBUTED COMPUTING USING SPARK
SPARK STREAMING
SPARK USE CASE (1) - STREAMING
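Spark Streaming processes a live stream as a sequence of small micro-batches. The plain-Python sketch below models that approach. It assumes a hypothetical clickstream and a batch size of three events, whereas Spark batches by time interval. Each batch updates running counts, similar in spirit to a stateful streaming aggregation.

```python
from collections import Counter

def micro_batches(events, batch_size):
    """Cut an event stream into consecutive micro-batches."""
    batch = []
    for event in events:
        batch.append(event)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # flush the final partial batch

def running_page_counts(events, batch_size=3):
    """Update running totals after each micro-batch and record a snapshot."""
    totals = Counter()
    snapshots = []
    for batch in micro_batches(events, batch_size):
        totals.update(batch)
        snapshots.append(dict(totals))
    return snapshots

# Hypothetical clickstream of page names.
clicks = ["home", "cart", "home", "checkout", "home", "cart", "home"]
for snapshot in running_page_counts(clicks):
    print(snapshot)
```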
SPARK USE CASE (2) - ETL
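An ETL job extracts raw records, transforms them into a clean shape, and loads them into a queryable store. This minimal sketch uses hypothetical CSV lines as the source and an in-memory SQLite database as a stand-in for the target warehouse. In a Spark job, the load target would more likely be a Hive table or a set of Parquet files.

```python
import csv
import io
import sqlite3

# Hypothetical raw export with messy values.
RAW = """user_id,country,amount
1, id ,10.5
2,SG,
3,id,7
"""

def extract(text):
    """Extract: parse raw CSV into dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: trim and normalize fields, drop rows with a missing amount."""
    clean = []
    for row in rows:
        if not row["amount"].strip():
            continue
        clean.append((int(row["user_id"]), row["country"].strip().upper(), float(row["amount"])))
    return clean

def load(rows):
    """Load: write the cleaned rows into a table and return the connection."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE purchases (user_id INTEGER, country TEXT, amount REAL)")
    conn.executemany("INSERT INTO purchases VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn

conn = load(transform(extract(RAW)))
print(conn.execute("SELECT country, SUM(amount) FROM purchases GROUP BY country").fetchall())
# [('ID', 17.5)]
```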
MACHINE LEARNING
WHY MACHINE LEARNING
• Find Patterns in Vast Amounts of Data. What Patterns? Behavior Patterns!
• Mine Data for Hidden Information and Hidden Patterns
• Mimic Humans in Handling Communication and Semantics.
• Develop complex systems.
ML APPLICATIONS
MOTIVATING EXAMPLE
LEARNING TO FILTER SPAM
THE LEARNING PROCESS
LEARNING ALGORITHMS
MACHINE LEARNING EXAMPLE: EMAIL SPAM
FILTER
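One common way to learn a spam filter is a naive Bayes classifier over word counts. The slides do not say which algorithm their example uses, so this choice is only illustrative. The sketch applies Laplace smoothing and trains on a handful of hypothetical messages.

```python
import math
from collections import Counter

# Hypothetical labeled training messages.
TRAIN = [
    ("win money now", "spam"),
    ("cheap money offer", "spam"),
    ("meeting schedule today", "ham"),
    ("project meeting notes", "ham"),
]

def train(examples):
    """Count words per class and how many messages each class has."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    class_counts = Counter()
    for text, label in examples:
        class_counts[label] += 1
        word_counts[label].update(text.split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, class_counts, vocab

def classify(text, model):
    """Pick the class with the highest log P(class) + sum of log P(word | class)."""
    word_counts, class_counts, vocab = model
    total_msgs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        score = math.log(class_counts[label] / total_msgs)
        total_words = sum(word_counts[label].values())
        for word in text.split():
            # Laplace (add-one) smoothing so unseen words do not zero out the score.
            score += math.log((word_counts[label][word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

model = train(TRAIN)
print(classify("free money", model))        # spam
print(classify("meeting tomorrow", model))  # ham
```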
DECISION TREE
DECISION TREE EXAMPLE
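A decision tree is grown by repeatedly choosing the attribute whose split best separates the classes. The sketch below makes that choice using entropy and information gain, which is the ID3 criterion. The slides do not say which criterion their example uses. The data is a hypothetical table of emails.

```python
import math
from collections import Counter

# Hypothetical training rows.
EMAILS = [
    {"has_link": "yes", "known_sender": "no", "label": "spam"},
    {"has_link": "yes", "known_sender": "no", "label": "spam"},
    {"has_link": "no", "known_sender": "yes", "label": "ham"},
    {"has_link": "yes", "known_sender": "yes", "label": "ham"},
    {"has_link": "no", "known_sender": "no", "label": "spam"},
]

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(rows, attribute, target="label"):
    """Entropy before the split minus the weighted entropy of each branch."""
    before = entropy([r[target] for r in rows])
    after = 0.0
    for value in {r[attribute] for r in rows}:
        branch = [r[target] for r in rows if r[attribute] == value]
        after += len(branch) / len(rows) * entropy(branch)
    return before - after

def best_attribute(rows, attributes):
    """Choose the attribute with the highest information gain as the next split."""
    return max(attributes, key=lambda a: information_gain(rows, a))

print(best_attribute(EMAILS, ["has_link", "known_sender"]))  # known_sender
```

A full tree applies this recursively: split on the best attribute, then repeat on each branch until the branches are pure or no attributes remain.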
THANK YOU
(TERIMA KASIH)
(C-Left) 2020
BY AKHMAD ZAKI ALSAFI
