Introduction to SQL on Hadoop
Samuel Yee
What is RDBMS?
• Stands for “Relational Database Management System”.
• Data in an RDBMS is stored in database objects called tables.
• A table is a collection of related data entries, organized into columns and rows.
• Each RDBMS hosts one or more databases.
• Each database consists of one or more tables.
Sample Employee Table on RDBMS
Emp_no   DOB          First Name   Last Name      Gender   Date Joined
499990   11/3/1963    Khaled       Kohling        M        10/10/1985
499991   2/26/1962    Pohua        Sichman        F        1/12/1989
499992   10/12/1960   Siamak       Salverda       F        5/10/1987
499993   6/4/1963     DeForest     Mullainathan   M        4/7/1997
499994   2/26/1952    Navin        Argence        F        4/24/1990
What is SQL?
• Stands for “Structured Query Language”.
• SQL lets you access and manipulate databases.
• SQL is an ANSI (American National Standards Institute) standard.
• Major commands include SELECT, INSERT, UPDATE and DELETE; the WHERE clause filters the rows they act on.
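
As a rough illustration of these commands, the sketch below loads the sample employee table into an in-memory SQLite database through Python's built-in sqlite3 module. SQLite stands in for the MySQL server used in the talk, and the table name, column names and ISO date format are assumptions made for this example, not taken from the slides.

```python
import sqlite3

# In-memory SQLite database; a stand-in for the MySQL server from the talk.
conn = sqlite3.connect(":memory:")

# Table and column names are assumptions for this sketch.
conn.execute(
    """CREATE TABLE employees (
           emp_no     INTEGER PRIMARY KEY,
           birth_date TEXT,
           first_name TEXT,
           last_name  TEXT,
           gender     TEXT,
           hire_date  TEXT)"""
)

# INSERT: load the five sample rows (dates rewritten as ISO strings).
rows = [
    (499990, "1963-11-03", "Khaled", "Kohling", "M", "1985-10-10"),
    (499991, "1962-02-26", "Pohua", "Sichman", "F", "1989-01-12"),
    (499992, "1960-10-12", "Siamak", "Salverda", "F", "1987-05-10"),
    (499993, "1963-06-04", "DeForest", "Mullainathan", "M", "1997-04-07"),
    (499994, "1952-02-26", "Navin", "Argence", "F", "1990-04-24"),
]
conn.executemany("INSERT INTO employees VALUES (?, ?, ?, ?, ?, ?)", rows)

# SELECT with a WHERE clause: employees who joined before 1990.
early_hires = conn.execute(
    "SELECT first_name, last_name FROM employees "
    "WHERE hire_date < '1990-01-01' ORDER BY emp_no"
).fetchall()

# UPDATE and DELETE use WHERE to pick the rows they change.
conn.execute("UPDATE employees SET last_name = 'Smith' WHERE emp_no = 499991")
conn.execute("DELETE FROM employees WHERE emp_no = 499994")

remaining = conn.execute("SELECT COUNT(*) FROM employees").fetchone()[0]
updated_name = conn.execute(
    "SELECT last_name FROM employees WHERE emp_no = 499991"
).fetchone()[0]
```

Without a WHERE clause, UPDATE and DELETE would apply to every row in the table.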
Demo on MySQL
• Querying a relational database of more than 300K employee records, hosted on the AWS cloud (free tier)
Data Warehouse
• An OLAP (online analytical processing) database used mainly for analysis, such as
studying historical trends and patterns, rather than for daily operational
transactions.
• Imports data from various sources, typically different databases
and ERP systems.
• The process of importing transactional data into the warehouse and
reshaping it is referred to as Extract, Transform and Load
(ETL).
• Provides summarized, multi-dimensional views of consolidated
data, i.e. a data cube. Each dimension gives a different perspective: the time
dimension shows the breakdown of sales by year, quarter, month,
day and hour; the product dimension shows which products bring
in the most revenue.
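
The time and product dimensions described above can be sketched as two GROUP BY roll-ups over a small fact table. This is only an illustration using Python's sqlite3 module; the `sales` table, its columns and every row in it are made up for the example, and a real warehouse would precompute or index such aggregates rather than scan raw rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sale_date TEXT, product TEXT, amount REAL)")

# Hypothetical fact rows, invented for illustration only.
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("2015-01-15", "Book", 20.0),
        ("2015-02-03", "Bike", 300.0),
        ("2015-07-21", "Book", 35.0),
        ("2016-03-09", "Bike", 450.0),
        ("2016-03-30", "Book", 15.0),
    ],
)

# Time dimension: roll sales up by year and quarter.
# (month + 2) / 3 is integer division in SQLite, mapping months 1-12 to quarters 1-4.
by_quarter = conn.execute(
    """SELECT strftime('%Y', sale_date) AS year,
              (CAST(strftime('%m', sale_date) AS INTEGER) + 2) / 3 AS quarter,
              SUM(amount)
       FROM sales
       GROUP BY year, quarter
       ORDER BY year, quarter"""
).fetchall()

# Product dimension: which product brings in the most revenue?
by_product = conn.execute(
    "SELECT product, SUM(amount) AS revenue FROM sales "
    "GROUP BY product ORDER BY revenue DESC"
).fetchall()
```

Each query is one "slice" of the cube: the same facts, summarized along a different dimension.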
Problems of Relational Databases & Warehouses
• Require a schema, i.e. a data model planned in advance, with strict data types that are difficult to change.
• Not suited to unstructured data, e.g. social media, story books, news, journals and photos.
• By the author's estimate, 99% of real-world data is unstructured.
• Expensive, hard to adapt and hard to scale (capacity is often measured in gigabytes, or
terabytes at best), and often require specialized hardware and licensed proprietary software.
• Almost all Big Tech companies (Facebook, Google, Yahoo! etc.) decided long ago that
a traditional RDBMS is a poor fit for data and business models that change frequently and
are measured in petabytes, exabytes, zettabytes and beyond.
• However, many data analysts are familiar with traditional ETL and BI concepts but not with
Hadoop programming.
Hadoop + NoSQL
• Development of the Hadoop Distributed File System (HDFS) and associated NoSQL databases such as
Cassandra and HBase.
• NoSQL is usually read as “Not Only SQL”.
• Can store data in raw formats and decide what to do with it later, i.e. data lakes.
• NoSQL stores can be schema-based or schema-less, and are flexible and adaptable to change.
• Well suited to both structured and unstructured big data.
• Can expand dynamically using cheap commodity hardware and free open-source
software.
• The cost of storing data in a Hadoop solution grows linearly with the volume of data, and there is no ultimate
limit.
• The Hadoop + NoSQL ecosystem brings familiar data warehouse and BI concepts back to Hadoop.
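
The data-lake idea of storing raw data first and deciding what to do with it later is often called schema-on-read. A minimal sketch, assuming raw events arrive as JSON lines; the field names and the `read_purchases` helper are hypothetical and chosen only for this example.

```python
import json

# Raw events are kept exactly as they arrive; nothing is validated on write.
raw_lines = [
    '{"user": "alice", "action": "view", "product_id": 101}',
    '{"user": "bob", "action": "buy", "product_id": 201, "price": 299.0}',
    '{"user": "carol", "comment": "Great bike!"}',
]


def read_purchases(lines):
    """Apply a schema at read time: keep only records shaped like purchases."""
    purchases = []
    for line in lines:
        record = json.loads(line)
        if record.get("action") == "buy" and "price" in record:
            purchases.append(
                (record["user"], record["product_id"], float(record["price"]))
            )
    return purchases


purchases = read_purchases(raw_lines)
```

A different question later (say, analyzing comments) would read the same raw lines with a different schema, without reloading or altering the stored data.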
Data Models
Column-Based Model
Unique Name   Value
101           ProductName = "Book 101 Title"
              ISBN = "111-1111111111"
              Authors = [ "Author 1", "Author 2" ]
201           ProductName = "18-Bicycle 201"
              Brand = "Brand-Company A"
              Color = [ "Red", "Black" ]
              ProductCategory = "Bike"
Examples: Cassandra, HBase, Vertica
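
The logical view of the column-based model above can be sketched as a mapping from row key to a set of named columns, where each row carries only the columns it needs. This models the data layout a client sees, not how Cassandra or HBase physically store data; the `read_column` helper is hypothetical.

```python
# Row key -> {column name -> value}; rows need not share the same columns.
products = {
    101: {
        "ProductName": "Book 101 Title",
        "ISBN": "111-1111111111",
        "Authors": ["Author 1", "Author 2"],
    },
    201: {
        "ProductName": "18-Bicycle 201",
        "Brand": "Brand-Company A",
        "Color": ["Red", "Black"],
        "ProductCategory": "Bike",
    },
}


def read_column(table, column):
    """Return {row_key: value} for the rows that actually have this column."""
    return {key: cols[column] for key, cols in table.items() if column in cols}


# Adding a column to a single row requires no schema change.
products[101]["ProductCategory"] = "Book"

names = read_column(products, "ProductName")
brands = read_column(products, "Brand")
```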
Key-Value Model
{
  Id = 101
  ProductName = "Book 101 Title"
  ISBN = "111-1111111111"
  Authors = [ "Author 1", "Author 2" ]
  ProductCategory = "Book"
}
{
  Id = 201
  ProductName = "18-Bicycle 201"
  Color = [ "Red", "Black" ]
  ProductCategory = "Bike"
}
Examples: Amazon DynamoDB, MemcacheDB, Redis
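
A toy version of the key-value model: the store treats each value as an opaque blob and can only look it up by key. The `KeyValueStore` class and the `product:<id>` key format are hypothetical; real stores such as Redis or DynamoDB add persistence, distribution and richer value types.

```python
import json


class KeyValueStore:
    """Toy key-value store: values are opaque bytes, retrievable only by key."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        # Serialize so the store itself never inspects the value's structure.
        self._data[key] = json.dumps(value).encode("utf-8")

    def get(self, key, default=None):
        raw = self._data.get(key)
        return default if raw is None else json.loads(raw.decode("utf-8"))


store = KeyValueStore()
store.put("product:101", {
    "ProductName": "Book 101 Title",
    "ISBN": "111-1111111111",
    "Authors": ["Author 1", "Author 2"],
    "ProductCategory": "Book",
})
store.put("product:201", {
    "ProductName": "18-Bicycle 201",
    "Color": ["Red", "Black"],
    "ProductCategory": "Bike",
})

bike = store.get("product:201")
missing = store.get("product:999")
```

Because lookups go through the key alone, a question such as "all products in category Bike" would require scanning every value or maintaining a separate index.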
Document-Based Model
Examples: Apache CouchDB, MongoDB
Graph-Based Model
Examples: AllegroGraph, Neo4j, InfiniteGraph, OrientDB, Virtuoso, Stardog
Pig and Hive
• Not everyone can write Java code for Hadoop applications.
• Pig Latin: a high-level abstraction over Java MapReduce programming.
• Hive: an early data warehouse on the Hadoop file system, queried with the SQL-like HiveQL.
• Hive supports data-warehousing activities on Hadoop, e.g. Extract, Transform and Load (ETL).
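
To show what Pig and Hive abstract away, here is a single-process sketch of the map, shuffle and reduce steps for a word count. It is plain Python, not Hadoop's API; the function names are hypothetical, and on a real cluster each phase would run in parallel across many machines.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one line of text."""
    return [(word.strip(".,;:!?").lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single count."""
    return key, sum(values)


lines = ["To be, or not to be", "that is the question"]
pairs = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
```

In Hive, the same result would come from a single GROUP BY query, which the engine turns into jobs of this shape.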
Demo
• Pig and Hive (ETL) demo on the complete works of Shakespeare (unstructured data)!
Many SQL Processing Engines and NoSQL DBs on the Hadoop Ecosystem
• Apache Impala by Cloudera
• Apache Drill by MapR
• HAWQ (HDFS) and GemFire (in-memory) by Pivotal
• Presto by Facebook
• Apache Spark SQL (successor to Shark), distributed by almost all Hadoop vendors
• There are many more.
• Check out https://hadoopecosystemtable.github.io/

