On the move with Big Data (Hadoop, Pig, Sqoop, SSIS...)

Stéphane Fréchette
Thursday February 13, 2014

Who am I?
My name is Stéphane Fréchette

SQL Server MVP - I’m a Database & Business Intelligence Professional and Founder | CEO
of
I have a passion for architecting, designing and building solutions that matter.
Self-proclaimed Open Data Hacker/Advocate. I founded Gatineau Ouverte, a citizen-led
initiative that aims to promote open access to the civic data of the city of Gatineau.

Twitter: @sfrechette
Blog: stephanefrechette.com
Email: stephanefrechette@ukubu.com

Session Outline
• What is Big Data?
• Apache Hadoop
• Hadoop Ecosystem
• Windows Azure HDInsight
• On the move…
• SSIS, Sqoop, Pig
• Demos
• Resources

Apache Hadoop
• Open-source software framework that allows for the distributed processing
of large data sets across clusters of computers using simple programming
models
• Designed to scale up from single servers to thousands of machines, each
offering local computation and storage

Hadoop Ecosystem
• Core components:
• HDFS (Hadoop Distributed File System) -> Storage
• MapReduce -> Processing
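
To make the MapReduce processing model concrete, here is a minimal single-process sketch in plain Python. It is not Hadoop code and uses no Hadoop API; the function names (map_phase, shuffle, reduce_phase, word_count) and the sample input are assumptions made for illustration. It only mimics the three steps a MapReduce job goes through: map, shuffle, reduce.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group emitted values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over an iterable of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    # Hypothetical sample input; on a real cluster the lines would come from HDFS.
    print(word_count(["big data on the move", "move the data"]))
```

On a real cluster the map and reduce steps run in parallel on many machines and the shuffle moves data between them; the sketch keeps everything in one process so the data flow is easy to follow.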

What is Pig?
• Write complex MapReduce jobs using a simple scripting language (Pig Latin)
• A platform for analyzing large data sets that consists of a high-level language
for expressing data analysis programs
• Pig compiles Pig Latin scripts into MapReduce jobs on the fly

http://pig.apache.org
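
As a rough illustration of the kind of data flow a Pig Latin script expresses, the sketch below mirrors a small filter / group / aggregate pipeline in plain Python, with the corresponding Pig Latin statements shown as comments. The relation names (logs, big, grouped), the fields (user, bytes), the threshold and the sample rows are all hypothetical; Pig itself would compile such a script into MapReduce jobs rather than run it in one process.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical input rows in the shape (user, bytes).
# Pig Latin: logs = LOAD 'logs.csv' USING PigStorage(',') AS (user:chararray, bytes:int);
ROWS = [("ann", 120), ("bob", 80), ("ann", 300), ("bob", 150)]


def total_bytes_per_user(rows, min_bytes=100):
    """Mimic a FILTER / GROUP / FOREACH pipeline on an in-memory list."""
    # Pig Latin: big = FILTER logs BY bytes > 100;
    big = [row for row in rows if row[1] > min_bytes]

    # Pig Latin: grouped = GROUP big BY user;
    # itertools.groupby only groups adjacent items, so sort by user first.
    big.sort(key=itemgetter(0))

    # Pig Latin: totals = FOREACH grouped GENERATE group, SUM(big.bytes);
    return {
        user: sum(size for _, size in group)
        for user, group in groupby(big, key=itemgetter(0))
    }


if __name__ == "__main__":
    print(total_bytes_per_user(ROWS))
```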

What is Sqoop?
• Command-line interface application to transfer bulk data between Hadoop
and relational datastores

http://sqoop.apache.org
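
Because Sqoop is driven from the command line, a transfer is usually described as a list of arguments. The Python sketch below only assembles such an argument list for a "sqoop import" from a relational table into HDFS and prints it; it does not run Sqoop. The helper name sqoop_import_args, the JDBC URL, the user, the table name and the target directory are all hypothetical placeholders.

```python
import shlex


def sqoop_import_args(jdbc_url, username, table, target_dir, num_mappers=1):
    """Build the argument list for a basic 'sqoop import' of one table."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,        # JDBC connection string of the source database
        "--username", username,
        "-P",                         # prompt for the password instead of passing it inline
        "--table", table,             # source table to read
        "--target-dir", target_dir,   # HDFS directory that receives the data
        "--num-mappers", str(num_mappers),
    ]


if __name__ == "__main__":
    # All values below are placeholders, not settings from the session demos.
    args = sqoop_import_args(
        jdbc_url="jdbc:sqlserver://localhost:1433;databaseName=SampleDb",
        username="demo_user",
        table="Customers",
        target_dir="/user/demo/customers",
        num_mappers=4,
    )
    # shlex.join quotes each argument so the line can be pasted into a shell.
    print(shlex.join(args))
```

Moving data the other way, from HDFS back into a relational table, uses "sqoop export" with a similar set of connection arguments.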

What is Hive?
• A data warehouse infrastructure built on top of Hadoop for providing data
summarization, query, and analysis
• Provides an SQL-like language called HiveQL to query data
• Enables integration between Hadoop and BI and visualization tools

http://hive.apache.org

What is SSIS?
• SQL Server Integration Services is a platform for data integration and
workflow applications. A fast and flexible tool used for data extraction,
transformation, and loading (ETL).
• Contains a rich set of built-in tasks and transformations, plus tools for constructing
packages…
• Used to solve complex business problems

Windows Azure HDInsight
• HDInsight is a Hadoop-based service from Microsoft that brings a 100
percent Apache Hadoop solution to the cloud
• Based on the Hortonworks Data Platform
• Scalable, on-demand service

Demos
(let’s move some data…)

Resources
• Apache Projects (list with links) http://bit.ly/MfpLtE
• Windows Azure HDInsight http://bit.ly/1dnlAX1
• HDInsight Tutorials and Guide http://bit.ly/LWRYol
• Hortonworks Sandbox 2.0 http://bit.ly/1gkkCte
• Hortonworks Tutorial Gallery http://bit.ly/1nvMAEX
• Microsoft JDBC Driver 4.0 for SQL Server http://bit.ly/1kEgJ7O
• Microsoft Hive ODBC Driver http://bit.ly/NFkhcH
• GitHub: WindowsAzure / azure-content http://bit.ly/1hfthlF
• SSIS Custom Task – Disorderly Data (Ken Ross) http://bit.ly/1nvIH2G
• GitHub https://github.com/kzhen/SSISHDFS

Thank You
For attending this session
