SlideShare a Scribd company logo
1 of 18
Hadoop for DBA’s



Michael Naumov
DB Expert
michaeln@liveperson.com
LP Facts




           8500+ customers

           450M unique visitors
           per month

           1.3B visits per month

           60TB of new data
           every month
LP Facts

                    Databases in Liveperson

     Oracle – B2B                 SQL Server – B2C



     Hadoop – Raw Data            Vertica – Forecasting / BI



     Cassandra – Application HA   MySQL - Segmentation


     MySQL NDB - ETL              MongoDB – Predictive Targeting
Hadoop in Liveperson



 • 2TB Of data streamed into Hadoop each day

 • 100+ Data Nodes serving our data needs

 • DataNodes are 36GB RAM, 1TBx12 SATA disks, 2
   quad CPU

 • Dozens (and growing) of daily MR jobs

 • 5 different project (and growing) based on Hadoop
   eco-system
Hadoop

What is Hadoop?

•   Open Source project from Apache
•   Able to store and process large amounts of data. Including not only structured
    data but also complex, unstructured data as well.

•   Hadoop is not actually a single product but instead
    a collection of several components

•   Commodity hardware. Low software and hardware costs

•   Shared nothing machines - Scalability
Example Comparison: RDBMS vs. Hadoop


                  Typical Traditional RDBMS   Hadoop

 Data Size        Gigabytes                   Petabytes

 Access           Interactive and Batch       Batch – NOT Interactive

 Updates          Read / Write many times     Write once, Read many times

 Structure        Static Schema               Dynamic Schema

 Scaling          Nonlinear                   Linear

 Query Response   Can be near immediate       Has latency (due to batch processing)
 Time
Hadoop Distributed File System - HDFS


•    HDFS has three types of Nodes

•    Namenode (MasterNode)
      • Distribute files in the cluster
      • Responsible for the replication
        between the datanodes and for file
        blocks location

 •   Datanodes
     • Responsible for actual file store
     • Serving data from files(data) to client

•    BackupNode (version 0.23 and up)
      • It‟s a backup of the NameNode
Hadoop Ecosystem




     •   MapReduce - Framework for writing/executing algorithms
     •   Hbase – Distributed, versioned, column-oriented database
     •   Hive - SQL like interface for large datasets stored in HDFS.
     •   Pig - Pig consists on a high-level language (Pig Latin) for
         expressing data analysis programs
     •   Sqoop - (“SQL-to-Hadoop”) is a straightforward tool. Able to
         Imports/export tables or databases in HDFS/Hive/Hbase
MapReduce

 What is MapReduce?

 • Runs programs (jobs) across many computers
 • Protects against single server failure by re-run failed steps.
 • MR jobs can be written in Java, C, Phyton, Ruby and etc
 • Users only write Map and Reduce functions
    • MAP - Takes a large problem and divides into sub problems.
              Performs the same function on all subsystems
    • REDUCE - Combine the output from all sub-problems


 Example:   $HADOOP_HOME/bin/hadoop jar @HADOOP_HOME/hadoop-streaming.jar 
                    - input myInputDirs 
                    - output myOutputDir 
                    - mapper /bin/cat 
                    - reducer /bin/wc
Hbase

What is Hbase and why should you use HBase?

• Huge volumes of randomly accessed data.
• There is no restrictions on column numbers for rows it‟s dynamic.
• Consider HBase when you‟re loading data by key, searching data by key (or
  range), serving data by key, querying data by key or when storing data by
  row that doesn‟t conform well to a schema.

Hbase dont’s?
• It doesn‟t talk SQL, have an optimizer, support in transactions or joins. If
  you don‟t use any of these in your database application then HBase could
  very well be the perfect fit.

Example:     create ‘blogposts’, ‘post’, ‘image’                          ---create table
             put „blogposts‟, „id1′, „post:title‟, „Hello World‟          ---insert value
             put „blogposts‟, „id1′, „post:body‟, „This is a blog post‟   ---insert value
             put „blogposts‟, „id1′, „image:header‟, „image1.jpg‟         ---insert value
             get „blogposts‟, „id1′                                       ---select records
Hive

What is Hive?

• Built on the MapReduce framework so it generates M/R jobs behind
• Hive is a data warehouse that enables easy data summarization and
  ad-hoc queries via an SQL-like interface for large datasets stored in
  HDFS/Hbase.
• Have partitioning and partition swapping
• Good for random sampling
Example:    CREATE EXTERNAL TABLE vs_hdfs (           select session_id,
            site_id string,                                  get_json_object(concat(tttt, "}"), '$.BY'),
            session_id string,                               get_json_object(concat(tttt, "}"), '$.TEXT') from
            time_stamp bigint,                        (
            visitor_id bigint,                              select session_id,concat("{",
            row_unit string,                                regexp_replace(event, "[{|}]", ""), "}") tttt
            evts string,                               from (
            biz string,                                      select session_id,get_json_object(plne,
            plne string,                                             '$.PLine.evts[*]') pln
            dims string)                                           from vs_hdfs_v1 where site='6964264'
            partitioned by (site string,day string)         and day='20120201' and plne!='{}' limit 10 ) t
            ROW FORMAT DELIMITED FIELDS TERMINATED         LATERAL VIEW explode(split(pln, "},{"))
            BY '001'                                 adTable AS event )t2
            STORED AS SEQUENCEFILE LOCATION
            '/home/data/';
Pig
 What is Pig?
 • Data flow processing
 • Uses Pig Latin query language
 • Highly parallel in order to distribute data processing across many
   servers
 • Combining multiple data sources (Files, Hbase, Hive)

 Example:
Sqoop

 What is Sqoop?

 • It‟s a command line tool for moving data between HDFS and
   relational database systems.
 • You can download drivers for Sqoop from Microsoft and
      •   Import Data/Query results from SQL Server to Hadoop.
      •   Export Data from Hadoop to SQL Server.
 • It‟s like BCP


 Example:
 $bin/sqoop import --connect
    'jdbc:sqlserver://10.80.181.127;username=dbuser;password=dbpasswd;database=tpch' 
     --table lineitem --hive-import

 $bin/sqoop export --connect
     'jdbc:sqlserver://10.80.181.127;username=dbuser;password=dbpasswd;database=tpch' --table
     lineitem --export-dir /data/lineitemData
Other projects in Hadoop


 There is many other projects we didn‟t talk about

 •   Chukwa
 •   Mahout
 •   Avro
 •   Zookeeper
 •   Fuse
 •   Flume
 •   Oozie
 •   Hue
 •   Hiho
 •   ………………..
Hadoop and Microsoft




 • Bring Hive data directly to Excel through the Microsoft Hive Add-in for Excel
 • Build a PowerPivot/PowerView on top of Hive
 • Instead of manually refreshing a PowerPivot workbook based on Hive on
   their desktop, users can use PowerPivot for SharePoint to schedule a data
   refresh feature to refresh a central copy shared with others, without worrying
   about the time or resources it takes.
 • BI Professionals can build BI Semantic Model or Reporting Services
    Reports on Hive in SQL Server Data tools
HOW DO I GET STARTED

• Microsoft
    •   https://www.hadooponazure.com/
    •   http://www.microsoft.com/download/en/details.aspx?id=27584 (sqoop driver)


• Open Source: http://hadoop.apacha.org
• Vendors
    •   http://www.cloudera.com
    •   http://hortonworks.com
    •   http://mapr.com
@




Demo
Questions?

More Related Content

What's hot

Big Data and Hadoop Ecosystem
Big Data and Hadoop EcosystemBig Data and Hadoop Ecosystem
Big Data and Hadoop EcosystemRajkumar Singh
 
A Basic Introduction to the Hadoop eco system - no animation
A Basic Introduction to the Hadoop eco system - no animationA Basic Introduction to the Hadoop eco system - no animation
A Basic Introduction to the Hadoop eco system - no animationSameer Tiwari
 
HADOOP TECHNOLOGY ppt
HADOOP  TECHNOLOGY pptHADOOP  TECHNOLOGY ppt
HADOOP TECHNOLOGY pptsravya raju
 
Facebooks Petabyte Scale Data Warehouse using Hive and Hadoop
Facebooks Petabyte Scale Data Warehouse using Hive and HadoopFacebooks Petabyte Scale Data Warehouse using Hive and Hadoop
Facebooks Petabyte Scale Data Warehouse using Hive and Hadooproyans
 
Apache Hadoop and HBase
Apache Hadoop and HBaseApache Hadoop and HBase
Apache Hadoop and HBaseCloudera, Inc.
 
Hadoop And Their Ecosystem
 Hadoop And Their Ecosystem Hadoop And Their Ecosystem
Hadoop And Their Ecosystemsunera pathan
 
Practical Problem Solving with Apache Hadoop & Pig
Practical Problem Solving with Apache Hadoop & PigPractical Problem Solving with Apache Hadoop & Pig
Practical Problem Solving with Apache Hadoop & PigMilind Bhandarkar
 
July 2010 Triangle Hadoop Users Group - Chad Vawter Slides
July 2010 Triangle Hadoop Users Group - Chad Vawter SlidesJuly 2010 Triangle Hadoop Users Group - Chad Vawter Slides
July 2010 Triangle Hadoop Users Group - Chad Vawter Slidesryancox
 
Getting started with Hadoop, Hive, and Elastic MapReduce
Getting started with Hadoop, Hive, and Elastic MapReduceGetting started with Hadoop, Hive, and Elastic MapReduce
Getting started with Hadoop, Hive, and Elastic MapReduceobdit
 
Hadoop Tutorial
Hadoop TutorialHadoop Tutorial
Hadoop Tutorialawesomesos
 

What's hot (20)

Hadoop
HadoopHadoop
Hadoop
 
Big Data and Hadoop Ecosystem
Big Data and Hadoop EcosystemBig Data and Hadoop Ecosystem
Big Data and Hadoop Ecosystem
 
HDFS
HDFSHDFS
HDFS
 
A Basic Introduction to the Hadoop eco system - no animation
A Basic Introduction to the Hadoop eco system - no animationA Basic Introduction to the Hadoop eco system - no animation
A Basic Introduction to the Hadoop eco system - no animation
 
HADOOP TECHNOLOGY ppt
HADOOP  TECHNOLOGY pptHADOOP  TECHNOLOGY ppt
HADOOP TECHNOLOGY ppt
 
Facebooks Petabyte Scale Data Warehouse using Hive and Hadoop
Facebooks Petabyte Scale Data Warehouse using Hive and HadoopFacebooks Petabyte Scale Data Warehouse using Hive and Hadoop
Facebooks Petabyte Scale Data Warehouse using Hive and Hadoop
 
Hadoop - Introduction to Hadoop
Hadoop - Introduction to HadoopHadoop - Introduction to Hadoop
Hadoop - Introduction to Hadoop
 
Apache Hadoop and HBase
Apache Hadoop and HBaseApache Hadoop and HBase
Apache Hadoop and HBase
 
Hive sq lfor-hadoop
Hive sq lfor-hadoopHive sq lfor-hadoop
Hive sq lfor-hadoop
 
Nextag talk
Nextag talkNextag talk
Nextag talk
 
Hadoop
HadoopHadoop
Hadoop
 
Hadoop And Their Ecosystem
 Hadoop And Their Ecosystem Hadoop And Their Ecosystem
Hadoop And Their Ecosystem
 
Hadoop Family and Ecosystem
Hadoop Family and EcosystemHadoop Family and Ecosystem
Hadoop Family and Ecosystem
 
Hadoop Ecosystem
Hadoop EcosystemHadoop Ecosystem
Hadoop Ecosystem
 
Practical Problem Solving with Apache Hadoop & Pig
Practical Problem Solving with Apache Hadoop & PigPractical Problem Solving with Apache Hadoop & Pig
Practical Problem Solving with Apache Hadoop & Pig
 
Hadoop Technologies
Hadoop TechnologiesHadoop Technologies
Hadoop Technologies
 
July 2010 Triangle Hadoop Users Group - Chad Vawter Slides
July 2010 Triangle Hadoop Users Group - Chad Vawter SlidesJuly 2010 Triangle Hadoop Users Group - Chad Vawter Slides
July 2010 Triangle Hadoop Users Group - Chad Vawter Slides
 
Getting started with Hadoop, Hive, and Elastic MapReduce
Getting started with Hadoop, Hive, and Elastic MapReduceGetting started with Hadoop, Hive, and Elastic MapReduce
Getting started with Hadoop, Hive, and Elastic MapReduce
 
Hadoop Tutorial
Hadoop TutorialHadoop Tutorial
Hadoop Tutorial
 
Hadoop overview
Hadoop overviewHadoop overview
Hadoop overview
 

Viewers also liked

Adi Sapir ISUG 123 11/10/2012
Adi Sapir ISUG 123 11/10/2012Adi Sapir ISUG 123 11/10/2012
Adi Sapir ISUG 123 11/10/2012sqlserver.co.il
 
Products.intro.forum version
Products.intro.forum versionProducts.intro.forum version
Products.intro.forum versionsqlserver.co.il
 
Query handlingbytheserver
Query handlingbytheserverQuery handlingbytheserver
Query handlingbytheserversqlserver.co.il
 
Sql server user group news january 2013
Sql server user group news   january 2013Sql server user group news   january 2013
Sql server user group news january 2013sqlserver.co.il
 
Things you can find in the plan cache
Things you can find in the plan cacheThings you can find in the plan cache
Things you can find in the plan cachesqlserver.co.il
 
Windows azure sql_database_security_isug012013
Windows azure sql_database_security_isug012013Windows azure sql_database_security_isug012013
Windows azure sql_database_security_isug012013sqlserver.co.il
 
Advanced Hadoop Tuning and Optimization - Hadoop Consulting
Advanced Hadoop Tuning and Optimization - Hadoop ConsultingAdvanced Hadoop Tuning and Optimization - Hadoop Consulting
Advanced Hadoop Tuning and Optimization - Hadoop ConsultingImpetus Technologies
 

Viewers also liked (8)

Adi Sapir ISUG 123 11/10/2012
Adi Sapir ISUG 123 11/10/2012Adi Sapir ISUG 123 11/10/2012
Adi Sapir ISUG 123 11/10/2012
 
Products.intro.forum version
Products.intro.forum versionProducts.intro.forum version
Products.intro.forum version
 
Query handlingbytheserver
Query handlingbytheserverQuery handlingbytheserver
Query handlingbytheserver
 
DAC 2012
DAC 2012DAC 2012
DAC 2012
 
Sql server user group news january 2013
Sql server user group news   january 2013Sql server user group news   january 2013
Sql server user group news january 2013
 
Things you can find in the plan cache
Things you can find in the plan cacheThings you can find in the plan cache
Things you can find in the plan cache
 
Windows azure sql_database_security_isug012013
Windows azure sql_database_security_isug012013Windows azure sql_database_security_isug012013
Windows azure sql_database_security_isug012013
 
Advanced Hadoop Tuning and Optimization - Hadoop Consulting
Advanced Hadoop Tuning and Optimization - Hadoop ConsultingAdvanced Hadoop Tuning and Optimization - Hadoop Consulting
Advanced Hadoop Tuning and Optimization - Hadoop Consulting
 

Similar to מיכאל

Data Analysis with Hadoop and Hive, ChicagoDB 2/21/2011
Data Analysis with Hadoop and Hive, ChicagoDB 2/21/2011Data Analysis with Hadoop and Hive, ChicagoDB 2/21/2011
Data Analysis with Hadoop and Hive, ChicagoDB 2/21/2011Jonathan Seidman
 
Big Data Developers Moscow Meetup 1 - sql on hadoop
Big Data Developers Moscow Meetup 1  - sql on hadoopBig Data Developers Moscow Meetup 1  - sql on hadoop
Big Data Developers Moscow Meetup 1 - sql on hadoopbddmoscow
 
Hadoop and mysql by Chris Schneider
Hadoop and mysql by Chris SchneiderHadoop and mysql by Chris Schneider
Hadoop and mysql by Chris SchneiderDmitry Makarchuk
 
Windows Azure HDInsight Service
Windows Azure HDInsight ServiceWindows Azure HDInsight Service
Windows Azure HDInsight ServiceNeil Mackenzie
 
Hadoop and object stores can we do it better
Hadoop and object stores  can we do it betterHadoop and object stores  can we do it better
Hadoop and object stores can we do it bettergvernik
 
Hadoop and object stores: Can we do it better?
Hadoop and object stores: Can we do it better?Hadoop and object stores: Can we do it better?
Hadoop and object stores: Can we do it better?gvernik
 
Yahoo! Hack Europe Workshop
Yahoo! Hack Europe WorkshopYahoo! Hack Europe Workshop
Yahoo! Hack Europe WorkshopHortonworks
 
Hadoop Demystified + MapReduce (Java and C#), Pig, and Hive Demos
Hadoop Demystified + MapReduce (Java and C#), Pig, and Hive DemosHadoop Demystified + MapReduce (Java and C#), Pig, and Hive Demos
Hadoop Demystified + MapReduce (Java and C#), Pig, and Hive DemosLester Martin
 
Apache Hadoop 1.1
Apache Hadoop 1.1Apache Hadoop 1.1
Apache Hadoop 1.1Sperasoft
 
Introduction to Sqoop Aaron Kimball Cloudera Hadoop User Group UK
Introduction to Sqoop Aaron Kimball Cloudera Hadoop User Group UKIntroduction to Sqoop Aaron Kimball Cloudera Hadoop User Group UK
Introduction to Sqoop Aaron Kimball Cloudera Hadoop User Group UKSkills Matter
 
Hadoop-Quick introduction
Hadoop-Quick introductionHadoop-Quick introduction
Hadoop-Quick introductionSandeep Singh
 
Big Data Analytics with Hadoop, MongoDB and SQL Server
Big Data Analytics with Hadoop, MongoDB and SQL ServerBig Data Analytics with Hadoop, MongoDB and SQL Server
Big Data Analytics with Hadoop, MongoDB and SQL ServerMark Kromer
 
hive_slides_Webinar_Session_1.pptx
hive_slides_Webinar_Session_1.pptxhive_slides_Webinar_Session_1.pptx
hive_slides_Webinar_Session_1.pptxvishwasgarade1
 
Hive @ Hadoop day seattle_2010
Hive @ Hadoop day seattle_2010Hive @ Hadoop day seattle_2010
Hive @ Hadoop day seattle_2010nzhang
 
Hands on Hadoop and pig
Hands on Hadoop and pigHands on Hadoop and pig
Hands on Hadoop and pigSudar Muthu
 

Similar to מיכאל (20)

Data Analysis with Hadoop and Hive, ChicagoDB 2/21/2011
Data Analysis with Hadoop and Hive, ChicagoDB 2/21/2011Data Analysis with Hadoop and Hive, ChicagoDB 2/21/2011
Data Analysis with Hadoop and Hive, ChicagoDB 2/21/2011
 
Big Data Developers Moscow Meetup 1 - sql on hadoop
Big Data Developers Moscow Meetup 1  - sql on hadoopBig Data Developers Moscow Meetup 1  - sql on hadoop
Big Data Developers Moscow Meetup 1 - sql on hadoop
 
Hadoop and mysql by Chris Schneider
Hadoop and mysql by Chris SchneiderHadoop and mysql by Chris Schneider
Hadoop and mysql by Chris Schneider
 
Hive
HiveHive
Hive
 
Windows Azure HDInsight Service
Windows Azure HDInsight ServiceWindows Azure HDInsight Service
Windows Azure HDInsight Service
 
Hadoop and object stores can we do it better
Hadoop and object stores  can we do it betterHadoop and object stores  can we do it better
Hadoop and object stores can we do it better
 
Hadoop and object stores: Can we do it better?
Hadoop and object stores: Can we do it better?Hadoop and object stores: Can we do it better?
Hadoop and object stores: Can we do it better?
 
Yahoo! Hack Europe Workshop
Yahoo! Hack Europe WorkshopYahoo! Hack Europe Workshop
Yahoo! Hack Europe Workshop
 
Hadoop Demystified + MapReduce (Java and C#), Pig, and Hive Demos
Hadoop Demystified + MapReduce (Java and C#), Pig, and Hive DemosHadoop Demystified + MapReduce (Java and C#), Pig, and Hive Demos
Hadoop Demystified + MapReduce (Java and C#), Pig, and Hive Demos
 
Apache Hadoop 1.1
Apache Hadoop 1.1Apache Hadoop 1.1
Apache Hadoop 1.1
 
Introduction to Sqoop Aaron Kimball Cloudera Hadoop User Group UK
Introduction to Sqoop Aaron Kimball Cloudera Hadoop User Group UKIntroduction to Sqoop Aaron Kimball Cloudera Hadoop User Group UK
Introduction to Sqoop Aaron Kimball Cloudera Hadoop User Group UK
 
Hadoop_arunam_ppt
Hadoop_arunam_pptHadoop_arunam_ppt
Hadoop_arunam_ppt
 
Hadoop-Quick introduction
Hadoop-Quick introductionHadoop-Quick introduction
Hadoop-Quick introduction
 
Horizon for Big Data
Horizon for Big DataHorizon for Big Data
Horizon for Big Data
 
Big Data Analytics with Hadoop, MongoDB and SQL Server
Big Data Analytics with Hadoop, MongoDB and SQL ServerBig Data Analytics with Hadoop, MongoDB and SQL Server
Big Data Analytics with Hadoop, MongoDB and SQL Server
 
hive_slides_Webinar_Session_1.pptx
hive_slides_Webinar_Session_1.pptxhive_slides_Webinar_Session_1.pptx
hive_slides_Webinar_Session_1.pptx
 
Hive @ Hadoop day seattle_2010
Hive @ Hadoop day seattle_2010Hive @ Hadoop day seattle_2010
Hive @ Hadoop day seattle_2010
 
Hadoop workshop
Hadoop workshopHadoop workshop
Hadoop workshop
 
Hands on Hadoop and pig
Hands on Hadoop and pigHands on Hadoop and pig
Hands on Hadoop and pig
 
6.hive
6.hive6.hive
6.hive
 

More from sqlserver.co.il

SQL Explore 2012: P&T Part 3
SQL Explore 2012: P&T Part 3SQL Explore 2012: P&T Part 3
SQL Explore 2012: P&T Part 3sqlserver.co.il
 
SQL Explore 2012: P&T Part 2
SQL Explore 2012: P&T Part 2SQL Explore 2012: P&T Part 2
SQL Explore 2012: P&T Part 2sqlserver.co.il
 
SQL Explore 2012: P&T Part 1
SQL Explore 2012: P&T Part 1SQL Explore 2012: P&T Part 1
SQL Explore 2012: P&T Part 1sqlserver.co.il
 
SQL Explore 2012 - Tzahi Hakikat and Keren Bartal: Extended Events
SQL Explore 2012 - Tzahi Hakikat and Keren Bartal: Extended EventsSQL Explore 2012 - Tzahi Hakikat and Keren Bartal: Extended Events
SQL Explore 2012 - Tzahi Hakikat and Keren Bartal: Extended Eventssqlserver.co.il
 
SQL Explore 2012 - Michael Zilberstein: ColumnStore
SQL Explore 2012 - Michael Zilberstein: ColumnStoreSQL Explore 2012 - Michael Zilberstein: ColumnStore
SQL Explore 2012 - Michael Zilberstein: ColumnStoresqlserver.co.il
 
SQL Explore 2012 - Meir Dudai: DAC
SQL Explore 2012 - Meir Dudai: DACSQL Explore 2012 - Meir Dudai: DAC
SQL Explore 2012 - Meir Dudai: DACsqlserver.co.il
 
SQL Explore 2012 - Aviad Deri: Spatial
SQL Explore 2012 - Aviad Deri: SpatialSQL Explore 2012 - Aviad Deri: Spatial
SQL Explore 2012 - Aviad Deri: Spatialsqlserver.co.il
 
Bi303 data warehousing with fast track and pdw - Assaf Fraenkel
Bi303 data warehousing with fast track and pdw - Assaf FraenkelBi303 data warehousing with fast track and pdw - Assaf Fraenkel
Bi303 data warehousing with fast track and pdw - Assaf Fraenkelsqlserver.co.il
 
Fast transition to sql server 2012 from mssql 2005 2008 for developers - Dav...
Fast transition to sql server 2012 from mssql 2005 2008 for  developers - Dav...Fast transition to sql server 2012 from mssql 2005 2008 for  developers - Dav...
Fast transition to sql server 2012 from mssql 2005 2008 for developers - Dav...sqlserver.co.il
 
Extreme performance - IDF UG
Extreme performance - IDF UGExtreme performance - IDF UG
Extreme performance - IDF UGsqlserver.co.il
 
3 extreme performance - databases acceleration using ssd
3   extreme performance - databases acceleration using ssd 3   extreme performance - databases acceleration using ssd
3 extreme performance - databases acceleration using ssd sqlserver.co.il
 
3 extreme performance - databases acceleration using ssd
3   extreme performance - databases acceleration using ssd 3   extreme performance - databases acceleration using ssd
3 extreme performance - databases acceleration using ssd sqlserver.co.il
 
4 extreme performance - part ii
4   extreme performance - part ii4   extreme performance - part ii
4 extreme performance - part iisqlserver.co.il
 
2 extreme performance - smaller is better
2   extreme performance - smaller is better2   extreme performance - smaller is better
2 extreme performance - smaller is bettersqlserver.co.il
 
1 extreme performance - part i
1   extreme performance - part i1   extreme performance - part i
1 extreme performance - part isqlserver.co.il
 

More from sqlserver.co.il (20)

SQL Explore 2012: P&T Part 3
SQL Explore 2012: P&T Part 3SQL Explore 2012: P&T Part 3
SQL Explore 2012: P&T Part 3
 
SQL Explore 2012: P&T Part 2
SQL Explore 2012: P&T Part 2SQL Explore 2012: P&T Part 2
SQL Explore 2012: P&T Part 2
 
SQL Explore 2012: P&T Part 1
SQL Explore 2012: P&T Part 1SQL Explore 2012: P&T Part 1
SQL Explore 2012: P&T Part 1
 
SQL Explore 2012 - Tzahi Hakikat and Keren Bartal: Extended Events
SQL Explore 2012 - Tzahi Hakikat and Keren Bartal: Extended EventsSQL Explore 2012 - Tzahi Hakikat and Keren Bartal: Extended Events
SQL Explore 2012 - Tzahi Hakikat and Keren Bartal: Extended Events
 
SQL Explore 2012 - Michael Zilberstein: ColumnStore
SQL Explore 2012 - Michael Zilberstein: ColumnStoreSQL Explore 2012 - Michael Zilberstein: ColumnStore
SQL Explore 2012 - Michael Zilberstein: ColumnStore
 
SQL Explore 2012 - Meir Dudai: DAC
SQL Explore 2012 - Meir Dudai: DACSQL Explore 2012 - Meir Dudai: DAC
SQL Explore 2012 - Meir Dudai: DAC
 
SQL Explore 2012 - Aviad Deri: Spatial
SQL Explore 2012 - Aviad Deri: SpatialSQL Explore 2012 - Aviad Deri: Spatial
SQL Explore 2012 - Aviad Deri: Spatial
 
נועם
נועםנועם
נועם
 
עדי
עדיעדי
עדי
 
מיכאל
מיכאלמיכאל
מיכאל
 
Bi303 data warehousing with fast track and pdw - Assaf Fraenkel
Bi303 data warehousing with fast track and pdw - Assaf FraenkelBi303 data warehousing with fast track and pdw - Assaf Fraenkel
Bi303 data warehousing with fast track and pdw - Assaf Fraenkel
 
DBCC - Dubi Lebel
DBCC - Dubi LebelDBCC - Dubi Lebel
DBCC - Dubi Lebel
 
Fast transition to sql server 2012 from mssql 2005 2008 for developers - Dav...
Fast transition to sql server 2012 from mssql 2005 2008 for  developers - Dav...Fast transition to sql server 2012 from mssql 2005 2008 for  developers - Dav...
Fast transition to sql server 2012 from mssql 2005 2008 for developers - Dav...
 
ISUG 113: File stream
ISUG 113: File streamISUG 113: File stream
ISUG 113: File stream
 
Extreme performance - IDF UG
Extreme performance - IDF UGExtreme performance - IDF UG
Extreme performance - IDF UG
 
3 extreme performance - databases acceleration using ssd
3   extreme performance - databases acceleration using ssd 3   extreme performance - databases acceleration using ssd
3 extreme performance - databases acceleration using ssd
 
3 extreme performance - databases acceleration using ssd
3   extreme performance - databases acceleration using ssd 3   extreme performance - databases acceleration using ssd
3 extreme performance - databases acceleration using ssd
 
4 extreme performance - part ii
4   extreme performance - part ii4   extreme performance - part ii
4 extreme performance - part ii
 
2 extreme performance - smaller is better
2   extreme performance - smaller is better2   extreme performance - smaller is better
2 extreme performance - smaller is better
 
1 extreme performance - part i
1   extreme performance - part i1   extreme performance - part i
1 extreme performance - part i
 

Recently uploaded

🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsRoshan Dwivedi
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 

Recently uploaded (20)

🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 

מיכאל

  • 1. Hadoop for DBA’s Michael Naumov DB Expert michaeln@liveperson.com
  • 2. LP Facts 8500+ customers 450M unique visitors per month 1.3B visits per month 60TB of new data every month
  • 3. LP Facts Databases in Liveperson Oracle – B2B SQL Server – B2C Hadoop – Raw Data Vertica – Forecasting / BI Cassandra – Application HA MySQL - Segmentation MySQL NDB - ETL MongoDB – Predictive Targeting
  • 4. Hadoop in Liveperson • 2TB Of data streamed into Hadoop each day • 100+ Data Nodes serving our data needs • DataNodes are 36GB RAM, 1TBx12 SATA disks, 2 quad CPU • Dozens (and growing) of daily MR jobs • 5 different project (and growing) based on Hadoop eco-system
  • 5. Hadoop What is Hadoop? • Open Source project from Apache • Able to store and process large amounts of data. Including not only structured data but also complex, unstructured data as well. • Hadoop is not actually a single product but instead a collection of several components • Commodity hardware. Low software and hardware costs • Shared nothing machines - Scalability
  • 6. Example Comparison: RDBMS vs. Hadoop Typical Traditional RDBMS Hadoop Data Size Gigabytes Petabytes Access Interactive and Batch Batch – NOT Interactive Updates Read / Write many times Write once, Read many times Structure Static Schema Dynamic Schema Scaling Nonlinear Linear Query Response Can be near immediate Has latency (due to batch processing) Time
  • 7. Hadoop Distributed File System - HDFS • HDFS has three types of Nodes • Namenode (MasterNode) • Distribute files in the cluster • Responsible for the replication between the datanodes and for file blocks location • Datanodes • Responsible for actual file store • Serving data from files(data) to client • BackupNode (version 0.23 and up) • It‟s a backup of the NameNode
  • 8. Hadoop Ecosystem • MapReduce - Framework for writing/executing algorithms • Hbase – Distributed, versioned, column-oriented database • Hive - SQL like interface for large datasets stored in HDFS. • Pig - Pig consists on a high-level language (Pig Latin) for expressing data analysis programs • Sqoop - (“SQL-to-Hadoop”) is a straightforward tool. Able to Imports/export tables or databases in HDFS/Hive/Hbase
  • 9. MapReduce What is MapReduce? • Runs programs (jobs) across many computers • Protects against single server failure by re-run failed steps. • MR jobs can be written in Java, C, Phyton, Ruby and etc • Users only write Map and Reduce functions • MAP - Takes a large problem and divides into sub problems. Performs the same function on all subsystems • REDUCE - Combine the output from all sub-problems Example: $HADOOP_HOME/bin/hadoop jar @HADOOP_HOME/hadoop-streaming.jar - input myInputDirs - output myOutputDir - mapper /bin/cat - reducer /bin/wc
  • 10. Hbase What is Hbase and why should you use HBase? • Huge volumes of randomly accessed data. • There is no restrictions on column numbers for rows it‟s dynamic. • Consider HBase when you‟re loading data by key, searching data by key (or range), serving data by key, querying data by key or when storing data by row that doesn‟t conform well to a schema. Hbase dont’s? • It doesn‟t talk SQL, have an optimizer, support in transactions or joins. If you don‟t use any of these in your database application then HBase could very well be the perfect fit. Example: create ‘blogposts’, ‘post’, ‘image’ ---create table put „blogposts‟, „id1′, „post:title‟, „Hello World‟ ---insert value put „blogposts‟, „id1′, „post:body‟, „This is a blog post‟ ---insert value put „blogposts‟, „id1′, „image:header‟, „image1.jpg‟ ---insert value get „blogposts‟, „id1′ ---select records
  • 11. Hive What is Hive? • Built on the MapReduce framework so it generates M/R jobs behind • Hive is a data warehouse that enables easy data summarization and ad-hoc queries via an SQL-like interface for large datasets stored in HDFS/Hbase. • Have partitioning and partition swapping • Good for random sampling Example: CREATE EXTERNAL TABLE vs_hdfs ( select session_id, site_id string, get_json_object(concat(tttt, "}"), '$.BY'), session_id string, get_json_object(concat(tttt, "}"), '$.TEXT') from time_stamp bigint, ( visitor_id bigint, select session_id,concat("{", row_unit string, regexp_replace(event, "[{|}]", ""), "}") tttt evts string, from ( biz string, select session_id,get_json_object(plne, plne string, '$.PLine.evts[*]') pln dims string) from vs_hdfs_v1 where site='6964264' partitioned by (site string,day string) and day='20120201' and plne!='{}' limit 10 ) t ROW FORMAT DELIMITED FIELDS TERMINATED LATERAL VIEW explode(split(pln, "},{")) BY '001' adTable AS event )t2 STORED AS SEQUENCEFILE LOCATION '/home/data/';
  • 12. Pig What is Pig? • Data flow processing • Uses Pig Latin query language • Highly parallel in order to distribute data processing across many servers • Combining multiple data sources (Files, Hbase, Hive) Example:
  • 13. Sqoop What is Sqoop? • It‟s a command line tool for moving data between HDFS and relational database systems. • You can download drivers for Sqoop from Microsoft and • Import Data/Query results from SQL Server to Hadoop. • Export Data from Hadoop to SQL Server. • It‟s like BCP Example: $bin/sqoop import --connect 'jdbc:sqlserver://10.80.181.127;username=dbuser;password=dbpasswd;database=tpch' --table lineitem --hive-import $bin/sqoop export --connect 'jdbc:sqlserver://10.80.181.127;username=dbuser;password=dbpasswd;database=tpch' --table lineitem --export-dir /data/lineitemData
  • 14. Other projects in Hadoop There is many other projects we didn‟t talk about • Chukwa • Mahout • Avro • Zookeeper • Fuse • Flume • Oozie • Hue • Hiho • ………………..
  • 15. Hadoop and Microsoft • Bring Hive data directly to Excel through the Microsoft Hive Add-in for Excel • Build a PowerPivot/PowerView on top of Hive • Instead of manually refreshing a PowerPivot workbook based on Hive on their desktop, users can use PowerPivot for SharePoint to schedule a data refresh feature to refresh a central copy shared with others, without worrying about the time or resources it takes. • BI Professionals can build BI Semantic Model or Reporting Services Reports on Hive in SQL Server Data tools
  • 16. HOW DO I GET STARTED • Microsoft • https://www.hadooponazure.com/ • http://www.microsoft.com/download/en/details.aspx?id=27584 (sqoop driver) • Open Source: http://hadoop.apacha.org • Vendors • http://www.cloudera.com • http://hortonworks.com • http://mapr.com

Editor's Notes

  1. https://www.hadooponazure.com/Account