Big-Data Processing utilizing 
Open-Source Technology Stack 
By 
Amir Sedighi 
http://www.linkedin.com/in/amirsedighi 
@amirsedighi 
Linux and Ubuntu 14.10 Release Conf 1
References 
● http://www.slideshare.net/BernardMarr/140228-big-data-slide-share?qid=017848e2-9e2a-4dc3-963c-52b6a90fba2a&v=default&b=&from_search=1 
● http://www.forbes.com/fdc/welcome_mjx.shtml 
● ZYMR, "Spark Your Real-Time Big Data Analytics" 
● http://dataconomy.com 
● https://datakulfi.wordpress.com/2013/03/27/big-data-open-source-technology-landscape/ 
● http://www.slideshare.net/andrefaria/big-data-abc?qid=1ac97e4a-4acc-460a-b3f8-9122f7210440&v=qf1&b=&from_search=12 
● https://wiki.apache.org/hadoop/PoweredBy
Data Explosion 
● Big Data: everything we do increasingly leaves a 
digital trace that we (or others) can gather, 
use, and analyze. 
– Data Providers 
● Companies 
● People 
Volume, Velocity, Variety 
● “There were 5 exabytes of 
information created between 
the dawn of civilization 
through 2003, but that much 
information is now created 
every 2 days, and the pace is 
increasing.” (Eric Schmidt) 
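To get a feel for the rate the quote above implies, here is a back-of-the-envelope calculation. It takes the quoted figures at face value and assumes decimal units (1 exabyte = 10^18 bytes); the function and constant names are illustrative only.

```python
# Rough data rate implied by "5 exabytes every 2 days".
# Assumption: decimal units, 1 EB = 10**18 bytes.
EXABYTE = 10**18
SECONDS_PER_DAY = 24 * 60 * 60


def bytes_per_second(exabytes: float, days: float) -> float:
    """Average rate in bytes per second for a volume created over a period."""
    return exabytes * EXABYTE / (days * SECONDS_PER_DAY)


rate = bytes_per_second(5, 2)
print(f"{rate / 10**12:.1f} TB/s")  # prints 28.9 TB/s
```

At roughly 29 terabytes per second on average, no single machine can ingest or store the stream, which motivates the scaling question that follows.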
Big-Data Processing 
How to provide a 
Big-Data processing platform 
using commodity machines? 
Vertical or Horizontal? 
Scale Up vs Scale Out 
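Scaling up (vertical) means moving to a bigger machine; scaling out (horizontal) means spreading the work over more commodity machines. The sketch below illustrates the scale-out idea with simple hash partitioning of keys across nodes. It is an illustration under stated assumptions, not how any particular framework in this deck assigns data: the `node_for` and `partition` helpers and the `user-N` keys are hypothetical.

```python
import hashlib
from collections import defaultdict


def node_for(key: str, num_nodes: int) -> int:
    """Map a key to a node index using a stable hash (hypothetical helper)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_nodes


def partition(keys, num_nodes: int) -> dict:
    """Group keys by the node that would own them."""
    shards = defaultdict(list)
    for key in keys:
        shards[node_for(key, num_nodes)].append(key)
    return dict(shards)


keys = [f"user-{i}" for i in range(1000)]
for n in (2, 4, 8):
    shards = partition(keys, n)
    # Adding nodes shrinks each node's share of the same data set.
    print(n, "nodes ->", sorted(len(v) for v in shards.values()))
```

Plain modulo hashing remaps most keys whenever the node count changes, which is why distributed stores often use consistent hashing or a similar scheme instead.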
Big-Data Processing 
Open-Source Technology Stack 
Map-Reduce 
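The Map-Reduce model can be sketched with the classic word-count example. This is a single-process illustration in plain Python, assuming whitespace tokenization; it does not use Hadoop's API, and the function names are hypothetical. In a real cluster the map and reduce phases run in parallel on many machines and the shuffle moves data across the network.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document: str):
    """Map: emit a (word, 1) pair for every word in one input record."""
    for word in document.lower().split():
        yield (word, 1)


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    mapped = chain.from_iterable(map_phase(d) for d in documents)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


docs = ["big data big stack", "open source stack"]
counts = word_count(docs)
print(counts)
```

Because each map call sees only one record and each reduce call sees only one key, both phases can be distributed across commodity machines without coordination beyond the shuffle.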
Hadoop Framework 
Apache Hadoop Main Projects 
Data Stores 
● Data Stores 
– Key-Value 
– Graph 
– Columnar 
– Document Store 
– In-Memory
Data Transfer 
● Apache Flume 
● Apache Sqoop
Search 
● Elasticsearch 
● Apache Solr
Messaging and Queuing 
● Apache Kafka 
● ZeroMQ
Log Management 
● ELK (Elasticsearch, Logstash, Kibana) 
● Logstash 
● Fluentd
Stream Processing 
● Apache Storm 
● Apache Samza 
● Apache Spark
Machine Learning 
● Apache Mahout 
● MLlib 
● GraphX
Questions? 

Opensource Frameworks and BigData Processing
