HIVE
data warehousing using Hadoop




Facebook Data Team
Motivation

 Structured log and dimension data
  – Well known schemas, different serialization formats (binary/text)
  – Rich data structures – nesting/maps/lists

 Query language over structured data
  – SQL eases adoption by business analysts and reduces the learning
    curve for everyone
  – Developers love streaming and direct access to map-reduce
  – Query Language brings together SQL and Streaming

 Data Management
  – Tables/Partitions for easy data addressability
  – Abstractions allow optimizations:
        Organize data for large joins/sampling
        Add indices/manage compression/replication transparently
What is HIVE?
[Architecture diagram. Components shown: Mgmt. Web UI; Hive CLI (Browsing, Queries, DDL); Thrift API; MetaStore; Hive QL; Parser; Planner; Execution; SerDe (Thrift, Jute, JSON, ..); Map Reduce; HDFS]
Dealing with Structured Data

 Type system
  – Primitive types
  – Recursively build up using Composition/Maps/Lists
 Generic (De)Serialization Interface (SerDe)
  – To recursively list schema
  – To recursively access fields within a row object
 Serialization families implement interface
  – Thrift (Binary and Delimited Text), RecordIO, JSON/PADS(?)
 XPath-like field expressions
  – profiles.network[@is_primary=1].id
 Built-in DDL
  – Define schema over delimited text files
  – Leverages Thrift DDL
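
The recursive schema listing and the XPath-like field access on this slide can be illustrated with a small Python sketch. This is not Hive's SerDe interface: rows are assumed to be plain nested dicts and lists, `list_schema` and `select` are hypothetical helper names, and only the single `[@attr=value]` filter form shown on the slide is handled.

```python
import re

# One path segment: a field name with an optional [@attr=value] filter.
_SEGMENT = re.compile(r"^(\w+)(?:\[@(\w+)=(\w+)\])?$")


def list_schema(value, prefix=""):
    """Recursively list (path, type name) pairs for a nested row object."""
    if isinstance(value, dict):
        fields = []
        for key in sorted(value):
            path = f"{prefix}.{key}" if prefix else key
            fields.extend(list_schema(value[key], path))
        return fields
    if isinstance(value, list):
        # Assumption: a list is described by its first element.
        if not value:
            return [(prefix + "[]", "unknown")]
        return list_schema(value[0], prefix + "[]")
    return [(prefix, type(value).__name__)]


def select(row, expression):
    """Evaluate a dotted path; list fields fan out into their elements."""
    current = [row]
    for segment in expression.split("."):
        match = _SEGMENT.match(segment)
        if match is None:
            raise ValueError(f"unsupported segment: {segment!r}")
        name, attr, wanted = match.groups()
        next_values = []
        for item in current:
            value = item[name]
            children = value if isinstance(value, list) else [value]
            for child in children:
                # Filter values are compared as strings in this sketch.
                if attr is None or str(child.get(attr)) == wanted:
                    next_values.append(child)
        current = next_values
    return current


row = {"profiles": {"network": [{"id": 10, "is_primary": 0},
                                {"id": 20, "is_primary": 1}]}}
primary_ids = select(row, "profiles.network[@is_primary=1].id")
schema = list_schema(row)
```

Here `primary_ids` holds the ids of the network entries whose `is_primary` field equals 1, and `schema` lists every leaf field with its Python type name.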
Data Model
[Data model diagram. Labels shown: Tables (clicks, views, …); Dimensions (IP, userId, AdId); Schema Library; #Partitions=32; Sort-key=uid; Hash Partitioning; Logical Partitioning; HDFS; MetaStore]
Example HDFS paths from the diagram:
/hive/clicks
/hive/clicks/ds=2008-03-25
/hive/clicks/ds=2008-03-25/0
MetaStore

 Stores Table/Partition properties:
  –   Table schema and SerDe library
  –   Table Location on HDFS
  –   Logical Partitioning keys and types
  –   Sort column
  –   Mapping from columns to well known Dimensions


 Thrift API
  – Current clients in PHP (Web Interface), Python (CLI), Java (Query
    Engine), Perl (Tests)
 Stores all properties in text files
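
As a rough illustration of how the stored table properties could map a row to an HDFS path such as /hive/clicks/ds=2008-03-25/0, here is a minimal Python sketch. `TableInfo`, `bucket_for`, and `partition_path` are hypothetical names, and the bucket is chosen by plain modulo on an integer column, which is an assumption: the slides do not say which hash function Hive uses.

```python
from dataclasses import dataclass


@dataclass
class TableInfo:
    """Hypothetical stand-in for per-table properties kept by the MetaStore."""
    name: str
    location: str          # table location on HDFS
    partition_keys: list   # logical partitioning keys, in order
    num_buckets: int       # number of hash partitions
    bucket_column: str     # column hashed to pick a hash partition


def bucket_for(table, row):
    # Assumption: modulo on an integer column stands in for the real hash.
    return row[table.bucket_column] % table.num_buckets


def partition_path(table, partition_values, row):
    """Build <location>/<key>=<value>/.../<bucket> for one row."""
    parts = [f"{key}={partition_values[key]}" for key in table.partition_keys]
    return "/".join([table.location, *parts, str(bucket_for(table, row))])


clicks = TableInfo("clicks", "/hive/clicks", ["ds"], 32, "uid")
path = partition_path(clicks, {"ds": "2008-03-25"}, {"uid": 64})
```

With 32 hash partitions, a row with uid 64 lands in bucket 0 of the ds=2008-03-25 partition in this sketch.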
Hive CLI

 Implemented in Python
  – uses MetaStore Thrift API
 DDL:
  – create table/drop table/rename table
  – alter table add column etc.
 Browsing:
  – show tables
  – describe table
  – cat table
 Loading Data
  – load data inpath <path1, …> into table <tablename/partition-spec>
    [bucketed <N> ways by <dimension>]
 Queries
  – Issue queries in Hive QL.
Hive Query Language

 Philosophy
  – SQL like constructs + Hadoop Streaming


 Query Operators in initial version
  –   Projections
  –   Equijoins and Cogroups
  –   Group by
  –   Sampling


 Output of these operators can be:
  – passed to Streaming mappers/reducers
  – stored in another Hive Table
  – written to HDFS files
Hive Query Language

 Package these capabilities into a more formal SQL like query language
 in next version
 Introduce other important constructs:
  –   Views
  –   Multi table inserts
  –   Order bys
  –   Select distincts
  –   SQL-like column expressions
  –   Additional built-in functions
 Still a work in progress
Query Language - Examples

  Multi table inserts

  FROM ad_impressions_stg imps
   INSERT INTO ad_legals/ds=2008-03-08 select imps.* where imps.legal = 1
   INSERT INTO ad_non_legals/ds=2008-03-08 select imps.* where imps.legal = 0
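
The point of a multi table insert is that the source is scanned once while rows are routed to several destinations. A minimal Python sketch of that idea follows; `multi_insert` is a hypothetical helper, destinations are modeled as in-memory lists keyed by name, and the sample rows are made up for illustration.

```python
def multi_insert(rows, routes):
    """Scan rows once; append each row to every destination whose predicate matches."""
    outputs = {dest: [] for dest, _ in routes}
    for row in rows:
        for dest, predicate in routes:
            if predicate(row):
                outputs[dest].append(row)
    return outputs


# Made-up staging rows standing in for ad_impressions_stg.
staged = [
    {"adid": 1, "legal": 1},
    {"adid": 2, "legal": 0},
    {"adid": 3, "legal": 1},
]

result = multi_insert(staged, [
    ("ad_legals/ds=2008-03-08", lambda r: r["legal"] == 1),
    ("ad_non_legals/ds=2008-03-08", lambda r: r["legal"] == 0),
])
```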


  Joins

 FROM ad_impressions imps, ad_dimensions ads
  INSERT INTO ad_legals_joined select imps.*, ads.campaignid
             JOIN ON(imps.adid, ads.adid)
             WHERE imps.legal = 1
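
The join example can be read as an inner equijoin on adid followed by a filter and a projection. The sketch below shows one way to express that in Python with an in-memory hash join; it says nothing about how Hive executes the join on map-reduce, and `equijoin` plus the sample rows are assumptions made for illustration.

```python
from collections import defaultdict


def equijoin(left, right, key):
    """Yield (left_row, right_row) pairs whose values for `key` are equal."""
    index = defaultdict(list)
    for right_row in right:
        index[right_row[key]].append(right_row)
    for left_row in left:
        for right_row in index.get(left_row[key], []):
            yield left_row, right_row


# Made-up rows standing in for ad_impressions and ad_dimensions.
imps = [
    {"adid": 1, "uid": 7, "legal": 1},
    {"adid": 2, "uid": 8, "legal": 0},
    {"adid": 3, "uid": 9, "legal": 1},
]
ads = [
    {"adid": 1, "campaignid": 100},
    {"adid": 2, "campaignid": 200},
]

# select imps.*, ads.campaignid ... WHERE imps.legal = 1
ad_legals_joined = [
    {**imp, "campaignid": ad["campaignid"]}
    for imp, ad in equijoin(imps, ads, "adid")
    if imp["legal"] == 1
]
```

The impression with adid 3 has no matching dimension row, so an inner join drops it.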
Query Language - Examples

 Group By

 FROM ad_legals_joined imps
        INSERT INTO hdfs://hadoop001:9000/user/ads/adid_uu_summary
               select imps.adid, count_distinct(imps.uid)
               group by(imps.adid)
        INSERT INTO hdfs://hadoop001:9000/user/ads/campaignid_uu_summary
               select imps.campaignid, count_distinct(imps.uid)
               group by(imps.campaignid)
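
What `count_distinct(imps.uid)` with a group by computes can be sketched in a few lines of Python: one set of user ids per group, then the set sizes. `count_distinct_by` is a hypothetical helper and the rows are made-up sample data; the sketch keeps everything in memory and does not reflect the multi-stage map-reduce plan mentioned later in the deck.

```python
from collections import defaultdict


def count_distinct_by(rows, group_col, value_col):
    """Return {group value: number of distinct values of value_col}."""
    seen = defaultdict(set)
    for row in rows:
        seen[row[group_col]].add(row[value_col])
    return {group: len(values) for group, values in seen.items()}


# Made-up rows standing in for ad_legals_joined.
rows = [
    {"adid": 1, "campaignid": 100, "uid": 7},
    {"adid": 1, "campaignid": 100, "uid": 7},  # repeat view by the same user
    {"adid": 1, "campaignid": 100, "uid": 8},
    {"adid": 2, "campaignid": 100, "uid": 7},
]

adid_uu_summary = count_distinct_by(rows, "adid", "uid")
campaignid_uu_summary = count_distinct_by(rows, "campaignid", "uid")
```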
Query Language – HadoopStreaming

 APPLY ON TABLE

 CREATE OPERATOR filter_legal using 'exec://filter_legal.py'
        (ts date, adid long, uid long)

 FROM (APPLY filter_legal ON TABLE ad_impressions)
        INSERT INTO ad_legals where ts >= '2008-03-11' and ts < '2008-03-12'


 APPLY can also be applied after JOIN as reducer script

 FROM ad_impressions imps, ad_dimensions ads
      INSERT INTO ad_legals_joined select imps.*, ads.campaignid
                  JOIN ON(imps.adid, ads.adid)
                  APPLY filter_legal BEFORE OUTPUT
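
The slides do not show the contents of filter_legal.py. As a hedged guess at what such a streaming operator might look like, the sketch below follows the usual Hadoop Streaming convention of tab-separated lines read from standard input and written to standard output. The four-column input layout (ts, adid, uid, legal) is an assumption; only the three output columns come from the CREATE OPERATOR declaration.

```python
def filter_legal(lines):
    """Yield 'ts<TAB>adid<TAB>uid' for input rows whose legal flag is '1'.

    Assumed input layout: ts, adid, uid, legal (tab-separated).
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 4:
            continue  # skip malformed rows instead of failing the task
        ts, adid, uid, legal = fields
        if legal == "1":
            yield "\t".join((ts, adid, uid))


def run(stream_in, stream_out):
    """Streaming entry point; a real script would call run(sys.stdin, sys.stdout)."""
    for record in filter_legal(stream_in):
        stream_out.write(record + "\n")


sample = ["2008-03-11\t1\t7\t1\n", "2008-03-11\t2\t8\t0\n", "bad row\n"]
kept = list(filter_legal(sample))
```

Keeping the filter as a generator separate from the I/O wrapper makes the logic testable without a Hadoop cluster.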
Query Language – Views

 Used for expressing
  – Union alls
  – APPLY operators


 Example

 CREATE VIEW actions
 SELECT photo_views.*
 UNION ALL
 SELECT video_views.*
 UNION ALL
 SELECT profile_views.* …
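
A view built from UNION ALL simply concatenates its inputs and keeps duplicates. In Python terms this resembles a lazily evaluated chain over several row sources; `actions_view` is a hypothetical name and the inputs here are placeholder lists rather than real tables.

```python
from itertools import chain


def actions_view(photo_views, video_views, profile_views):
    """Return a lazy UNION ALL over the three sources; duplicates are kept."""
    return chain(photo_views, video_views, profile_views)


# Placeholder rows: (user id, action) pairs.
actions = list(actions_view(
    [(7, "photo")],
    [(7, "video"), (8, "video")],
    [(7, "photo")],
))
```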
Hive Usage in Facebook

 Applications:
  – Summarization
       E.g.: daily/weekly aggregations of impression/click counts
  – Ad hoc Analysis
       E.g.: how many group admins, broken down by state/country
  – Data Mining (Assembling training data)
       E.g.: user engagement as a function of user attributes
 Usage statistics:
  – Total Users: ~40 (about 25% of engineering!)
  – Hive Data (compressed): 22 TB total, ~200 GB incoming per day
  – Jobs over last 7 days:
        Total Jobs: 3514, Projections: 821, Joins: 152, Aggregates: 800,
        Loaders: 600
     * Aggregates biased because of multi-stage map-reduce
Conclusion

 Release to Open Source in 3-4 months
 People:
  –   Suresh Anthony (suresh@facebook.com)
  –   Jeff Hammerbacher (jeffh@)
  –   Joydeep Sarma (jssarma@)
  –   Ashish Thusoo (athusoo@)
  –   Pete Wyckoff (pwyckoff@)

20080529dublinpt3
