“Big Data” for the SQL professional
Stefan Bauer
stef-bauer.com/2012/12/10/you-need-a-zetta-what
A little about me…
• Data Warehouse Administrator
• Author
• Architect (logical/physical)
• DBA (monitoring, space management, etc.)
• SSIS Developer (build it… run it… support it)
• SSAS/SSRS (performance tuning, supporting)
• Performance monitoring (is it all working?)
• I am a geek (some people have pointed that out about me… judge for yourself)
What we will cover
• Why do you care (or at least why you should)?
• General overview
• Basic terms (get us on the same page)
• A look at some of the technology (aka demo)
  • Elastic MapReduce (EMR) job flow using a HiveQL script
  • Redshift – starting a cluster
• All of the technical parts are in a multi-part series on my blog
What kind of blocks do you sort through?
Interesting technology… might not be for you
Getting there… might be something interesting to start working out the details…
You have big data… and you know it!
What is that Hadoop thing I keep hearing about?
• A framework (collection of technologies)
• Complex processing
• Massively parallel
• Large amounts of data
• Commodity hardware
Hadoop … what it is not
• Ad hoc analytics
• Low latency between data arrival, analysis, and query usage
• “Fast” (speed is a relative thing)
  • Facebook has interactive queries on the Hadoop framework
• Good for small data
Terms
• Cloud
• Cluster
• Hadoop
• Hadoop Distributed File System (HDFS)
• Hue (Web Interface for MapReduce/Oozie)
• MapReduce
• Job Tracker
• Task Trackers (on Data Nodes)
• Oozie (Workflow Management)
Terms
• Pig (Distributed Transformation Scripting)
• Beeswax (Wrapper for Hive)
• Hive
• EDW on (10s, 100s, 1000s of servers)
• HiveQL (Based on ANSI SQL)
• Reporting Tools/Business Analytics
• Name Node
• Data Nodes
• ZooKeeper (Distributed Configuration Management)
• Cloudera/MapR/Amazon/Hortonworks …
HDFS
Cloudera
Hive
HiveQL
-- Register the JSON SerDe, then expose the raw JSON logs as an external table.
add jar s3://testing-royall-com/hive/libs/json-serde-1.1.6.jar;
CREATE external TABLE log_table (
message_in string,
level_in int,
ip_in string,
type_in string,
timestamp_in string,
id_in string,
pid_in string,
src_in struct<classname:string, linenumber:int>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES ( "mapping.type_in" = "@type",
"mapping.message_in" = "__message",
"mapping.level_in" = "__level",
"mapping.ip_in" = "__ip",
"mapping.src_in" = "__src",
"mapping.timestamp_in" = "@timestamp",
"mapping.id_in" = "__id",
"mapping.pid_in" = "__pid",
"ignore.malformed.json" = "true")
LOCATION '${INPUT}';
-- Count log records per type and write the result as tab-delimited text.
CREATE TABLE output_tbl (type string, cnt int)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION '${OUTPUT}' ;
INSERT OVERWRITE TABLE output_tbl
SELECT type_in, count(*) as cnt
FROM log_table
GROUP BY type_in;
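For SQL people it can help to see roughly what a MapReduce job does with the `GROUP BY type_in` count over `log_table`. The following is a minimal single-process sketch in Python, not how Hive actually plans or runs the query. The sample records and the function names (`map_phase`, `shuffle`, `reduce_phase`, `count_by_type`) are made up for illustration, and handling of malformed JSON is not modelled.

```python
import json
from collections import defaultdict

# Hypothetical sample records; only the "@type" key matters for this count.
SAMPLE_LINES = [
    '{"@type": "error", "__message": "disk full", "__level": 3}',
    '{"@type": "info", "__message": "service started", "__level": 6}',
    '{"@type": "error", "__message": "timeout", "__level": 3}',
]


def map_phase(line):
    """Map: parse one JSON log line and emit a (type, 1) pair."""
    record = json.loads(line)
    return [(record.get("@type"), 1)]


def shuffle(pairs):
    """Shuffle: collect all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single count."""
    return key, sum(values)


def count_by_type(lines):
    """Run map, shuffle, and reduce in sequence on one machine."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(count_by_type(SAMPLE_LINES))
```

On a real cluster the map calls run in parallel on the nodes holding the data, and the shuffle moves each key's values to the node that reduces them. That parallelism is where the scale comes from.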
HiveQL
ADD JAR s3://elasticmapreduce/training/lib/hive-contrib-0.8.0.jar ;
CREATE EXTERNAL TABLE wikipedia (
edittime string,
contributor string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH serdeproperties (
"input.regex" = ".*<revision>.*<timestamp>(.+)</timestamp>.*<contributor>.*<username>(.*)</username>.*</contributor>.*</revision>.*",
"output.format.string" = "%1$s %2$s"
)
LOCATION '${INPUT}' ;
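The `input.regex` above does all the parsing for the `wikipedia` table: each capture group becomes one column, in order. A rough Python sketch of that idea follows. It assumes one revision per input line, and the sample line and the `parse_row` helper are hypothetical. Python's `re` is close to, but not identical to, the Java regex engine Hive uses.

```python
import re

# Same pattern as the "input.regex" SerDe property, split across two
# string literals only for readability.
ROW_PATTERN = re.compile(
    r".*<revision>.*<timestamp>(.+)</timestamp>.*<contributor>.*"
    r"<username>(.*)</username>.*</contributor>.*</revision>.*"
)

# Hypothetical input line, loosely shaped like a Wikipedia revision record.
SAMPLE_LINE = (
    "<page><revision><id>1</id>"
    "<timestamp>2012-12-10T08:00:00Z</timestamp>"
    "<contributor><username>ExampleUser</username><id>7</id></contributor>"
    "</revision></page>"
)


def parse_row(line):
    """Return (edittime, contributor); both None when the line does not match."""
    match = ROW_PATTERN.fullmatch(line)
    if match is None:
        # The regex SerDe yields NULL columns for rows that do not match.
        return (None, None)
    return match.group(1), match.group(2)


if __name__ == "__main__":
    print(parse_row(SAMPLE_LINE))
    print(parse_row("<page>no revision here</page>"))
```

Trying the pattern on a few sample lines like this before starting a cluster is a cheap way to catch a regex that silently produces all-NULL rows.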
HiveQL
• Demo – Create/Run EMR
• Demo – Create Redshift cluster
CREATE TABLE big_contributors (contributor string, numedits int)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION '${OUTPUT}' ;
INSERT OVERWRITE TABLE big_contributors
SELECT contributor, COUNT(*) AS numedits
FROM wikipedia
GROUP BY contributor
SORT BY numedits DESC
LIMIT 20 ;
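As a sanity check on small samples, the intent of the `big_contributors` query (count edits per contributor, keep the 20 busiest) can be reproduced in a few lines of Python. This is only a sketch of the intended result, not of how Hive executes it. The `top_contributors` helper and the sample rows are assumptions for illustration.

```python
from collections import Counter

# Hypothetical (edittime, contributor) rows, as the wikipedia table would expose them.
SAMPLE_ROWS = [
    ("2012-12-10T08:00:00Z", "alice"),
    ("2012-12-10T08:05:00Z", "bob"),
    ("2012-12-10T08:07:00Z", "alice"),
    ("2012-12-10T08:09:00Z", "alice"),
    ("2012-12-10T08:11:00Z", "bob"),
    ("2012-12-10T08:15:00Z", "carol"),
]


def top_contributors(rows, limit=20):
    """Count edits per contributor and return the `limit` largest, biggest first."""
    counts = Counter(contributor for _edittime, contributor in rows)
    return counts.most_common(limit)


if __name__ == "__main__":
    for contributor, numedits in top_contributors(SAMPLE_ROWS, limit=2):
        # Tab-separated, like the TEXTFILE output the Hive table declares.
        print(f"{contributor}\t{numedits}")
```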
Redshift
What is a column store anyway?
Compression
8 KB / 64 KB / 1 MB
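One way to picture the column-store question and why it goes hand in hand with compression: keeping each column's values together puts similar values next to each other, and repeated neighbours compress well. The Python sketch below is a toy illustration under that assumption and does not describe Redshift's actual storage format. The sample rows and helper names are hypothetical. Run-length encoding is used because it is the simplest scheme to show, and it is one of the column encodings Redshift offers.

```python
from itertools import groupby

# Hypothetical rows: (type, level, ip), stored row by row as a row store would.
ROWS = [
    ("error", 3, "10.0.0.1"),
    ("error", 3, "10.0.0.2"),
    ("info", 6, "10.0.0.1"),
    ("info", 6, "10.0.0.3"),
]


def to_columns(rows):
    """Pivot row tuples into one list per column (the column-store layout)."""
    return [list(column) for column in zip(*rows)]


def run_length_encode(values):
    """Collapse runs of equal neighbouring values into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]


if __name__ == "__main__":
    type_col, level_col, ip_col = to_columns(ROWS)
    print(run_length_encode(type_col))   # low-cardinality column: few long runs
    print(run_length_encode(ip_col))     # high-cardinality column: little gain
```

A query that only touches `type` reads just that one compressed column instead of every full row, which is the main appeal for analytic workloads.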
Copy Data
• From S3… (or DynamoDB)
copy <table name> from 's3://<s3 file>'
credentials 'aws_access_key_id=<yourkey>;aws_secret_access_key=<yourkey>'
csv delimiter '|';
Check back on the demos…
Questions?
@stefbauer
Stef_Bauer@hotmail.com
Stef-Bauer.com
http://spkr8.com/t/25821
