Hadoop is Happening
May 1, 2014
Syncsort Confidential and Proprietary - do not copy or distribute
Agenda
Hadoop Evolution
Use Cases
The Hadoop Ecosystem, from open source to vendor solutions
Tooling, implementation and skillset challenges
Real-World Case Studies
Future of Hadoop
Q&A
Our Guest – Chida from OpenOsmium
20+ years of enterprise application development experience, focused on Big Data & Cloud
Founder of Big Data solution provider OpenOsmium
DC tech community meetup organizer
– Google Developer Group, Tech Breakfast, NoVA Hadoop User Group
Open Source, Big Data and Cloud Advocate
703-568-7426, chida@openosmium.com
EVOLUTION OF HADOOP
Evolution of Hadoop – Data Volumes are Growing
Evolution of Hadoop – Key Events
Timeline markers: 2000, 2004, 2008, 2010, 2012, 2013, Next?
– Search engine problem @ Google
– 3 Google white papers: GFS, MapReduce, BigTable (including "MapReduce: Simplified Data Processing on Large Clusters")
– Yahoo!: HDFS, MapReduce, HBase
– Vendors: Cloudera, MapR, Hortonworks
– Hadoop 2.0
– Next?
Why Hadoop As a Data Management Platform?
The reliability of a mainframe, the massive performance at scale of an MPP appliance, the storage capacity of a SAN, all at a disruptively low price point
The Economics of Data
Cost of managing 1 TB of data:
Mainframe: $20,000 – $100,000
EDW: $15,000 – $80,000
Hadoop: $250 – $2,000
Scalability
Performance
Reliability
Agility
Skills Supply
But there’s more…
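To see what the per-TB ranges mean at a realistic volume, the sketch below scales them to a given number of terabytes. It assumes cost grows linearly with volume, which the slide does not claim; the `cost_range` helper and the 50 TB example volume are hypothetical.

```python
# Per-TB cost ranges (USD) quoted on the slide above.
COST_PER_TB = {
    "Mainframe": (20_000, 100_000),
    "EDW": (15_000, 80_000),
    "Hadoop": (250, 2_000),
}

def cost_range(platform: str, terabytes: float) -> tuple[float, float]:
    """Return (low, high) cost for `terabytes` TB, ASSUMING costs scale linearly."""
    low, high = COST_PER_TB[platform]
    return low * terabytes, high * terabytes

if __name__ == "__main__":
    volume_tb = 50  # hypothetical data volume
    for platform in COST_PER_TB:
        low, high = cost_range(platform, volume_tb)
        print(f"{platform:>9}: ${low:,.0f} to ${high:,.0f}")
```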
Hadoop - The Big Picture
Logical layer
– Computation: unified computation provided by the MapReduce distributed computing framework
– Storage: unified storage provided by a distributed file system called HDFS
Physical layer
– Commodity hardware: machines that each contain a set of disks and CPU cores
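The storage layer can be made concrete with a little arithmetic. This sketch estimates how many HDFS blocks a file occupies and how much raw disk it consumes once replicated. It assumes the Hadoop 2.x defaults of a 128 MB block size (`dfs.blocksize`) and 3 replicas (`dfs.replication`); real clusters often change both, and `hdfs_footprint` is a hypothetical helper, not a Hadoop API.

```python
# Assumed Hadoop 2.x defaults; clusters frequently override both in hdfs-site.xml.
BLOCK_SIZE_BYTES = 128 * 1024 * 1024  # dfs.blocksize
REPLICATION = 3                        # dfs.replication

def hdfs_footprint(file_size_bytes: int,
                   block_size: int = BLOCK_SIZE_BYTES,
                   replication: int = REPLICATION) -> tuple[int, int]:
    """Return (number_of_blocks, raw_bytes_on_disk) for one file stored in HDFS."""
    # Ceiling division: a file that does not fill its last block still needs that block.
    blocks = -(-file_size_bytes // block_size)
    # A partial last block only uses as much disk as the data it holds,
    # so raw usage is size x replication, not blocks x block_size x replication.
    raw_bytes = file_size_bytes * replication
    return blocks, raw_bytes

if __name__ == "__main__":
    blocks, raw = hdfs_footprint(1024**4)  # a 1 TiB file
    print(f"{blocks} blocks, {raw / 1024**4:.0f} TiB of raw disk")
```

Because each block is replicated on several machines, MapReduce can usually schedule work on a node that already holds a copy of the data it needs.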
MapReduce – Football Stadium Analogy
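As a generic illustration of the map, shuffle and reduce phases (not the stadium analogy itself), this sketch runs the classic word-count job in plain Python. It does not use the Hadoop API; all function names are illustrative, and everything runs sequentially in one process.

```python
from collections import defaultdict
from typing import Iterable, Iterator

def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Mapper: emit (word, 1) for every whitespace-separated word in one input line."""
    for word in line.lower().split():
        yield word, 1

def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle/sort: group emitted values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))

def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reducer: sum the partial counts for one key."""
    return key, sum(values)

def word_count(lines: Iterable[str]) -> dict[str, int]:
    """Run map, shuffle and reduce one after another.

    On a cluster, mappers run in parallel near the HDFS blocks they read and
    reducers run in parallel over disjoint sets of keys.
    """
    mapped = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())

if __name__ == "__main__":
    print(word_count(["the quick brown fox", "the lazy dog", "The end"]))
```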
Yesterday’s Architecture
Tomorrow’s Data Architecture
HADOOP USE CASES
Hadoop Use Cases
Data Lake
Offload Mainframe Data & Batch Workloads
Machine Data
Cyber Security
Fraud Detection
Offload ELT from Data Warehouse
Clickstream / Weblogs, EMR (Electronic Medical Records)
Social Media Data
Geospatial Analysis
Video and Audio Analytics
Real-Time Processing
Predictive Analytics
Unstructured Data
Active Archive
Multi-media
Leverage “Dark Data”
Sentiment Analysis
Enterprise Data Hub
Hadoop Use Cases
A Roadmap for Hadoop Success
– Offload batch & ELT workloads from data warehouse and mainframe systems into Hadoop
– Develop an active archive; shed light on dark data
– Build your Enterprise Data Hub (Data Lake!)
– Leverage new data sources
– Extend BI with data discovery & exploration
– Deliver next-generation analytics
Sample Use Case: Offload
Phase I: Identify
• Identify the data & workloads most suitable for offload
• Focus on those that will deliver maximum savings & performance
Phase II: Offload
• Access and move virtually any data to Hadoop with one tool
• Easily replicate existing workloads in Hadoop using a graphical user interface
Phase III: Optimize & Secure
• Deploy and optimize the new environment
• Manage & secure all your data with business-class tools
Phase 2: Deliver ‘Next-generation’ Applications
Advanced – ‘Next-gen’ – Applications for Hadoop
– Semi-structured data analytics
• Clickstream/Weblog, Electronic Medical Records
– Unstructured data analytics
• Video, audio, documents, text, social media
• Predictive modeling
– Geospatial analysis
– Real-Time Processing
Use Cases Across Industries
Use cases by vertical, spanning Refine, Explore and Enrich:
Retail & Web
• Log Analysis / Site Optimization
• Loyalty Program Optimization
• Brand and Sentiment Analysis
• Market Basket Analysis
• Dynamic Pricing
• Session & Content Optimization
• Product Recommendation
Telco
• Customer Profiling
• Equipment Failure Prediction
• Location-Based Advertising
Government
• Threat Identification
• Person-of-Interest Discovery
• Mission Work
Finance
• Risk Modeling & Fraud Identification
• Trade Performance Analytics
• Surveillance and Fraud Detection
• Customer Risk Analysis
• Real-Time Upsell and Cross-Sell Marketing Offers
Energy
• Smart Grid: Production Optimization
• Grid Failure Prevention
• Smart Meters
• Individual Power Grid
Manufacturing
• Supply Chain Optimization
• Customer Churn Analysis
• Dynamic Delivery
• Replacement Parts
Healthcare
• Electronic Medical Records (EMPI)
• Clinical Decision Support
• Clinical Trials Analysis
• Insurance Premium Determination
IMPLEMENTATION & SKILLSET
CHALLENGES
Overview of Hadoop Challenges
Hardware??
Skills??
Training??
Rapid change of the Hadoop ecosystem?
Example 1 - ETL in Hadoop
COLLECT – HARD
• FS Shell put command
• Flume
• Sqoop
PROCESS (Sort, Join, Aggregate, Copy, Merge) – HARDER
• Pig
• HiveQL
• Java
DISTRIBUTE – HARD
• Sqoop
• FS Shell get command
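The PROCESS stage is where Pig, HiveQL or Java jobs do the sorting, joining and aggregating. As a rough, non-Hadoop illustration of what such a job computes, this sketch performs an inner hash join, a per-key sum and a sort over hypothetical toy records; the HiveQL in the docstring is an equivalent query, assuming tables named `orders` and `customers`.

```python
from collections import defaultdict

# Hypothetical toy inputs standing in for two datasets already collected into HDFS.
customers = [
    {"cust_id": 1, "region": "East"},
    {"cust_id": 2, "region": "West"},
    {"cust_id": 3, "region": "East"},
]
orders = [
    {"order_id": 10, "cust_id": 1, "amount": 120.0},
    {"order_id": 11, "cust_id": 2, "amount": 75.5},
    {"order_id": 12, "cust_id": 1, "amount": 30.0},
    {"order_id": 13, "cust_id": 3, "amount": 200.0},
    {"order_id": 14, "cust_id": 9, "amount": 15.0},  # no matching customer
]

def revenue_by_region(customers, orders):
    """Inner-join orders to customers on cust_id, sum amounts per region, sort by region.

    Roughly what this HiveQL would compute:
      SELECT c.region, SUM(o.amount) FROM orders o
      JOIN customers c ON o.cust_id = c.cust_id
      GROUP BY c.region ORDER BY c.region;
    """
    region_of = {c["cust_id"]: c["region"] for c in customers}  # build side of a hash join
    totals = defaultdict(float)
    for order in orders:
        region = region_of.get(order["cust_id"])
        if region is None:  # an inner join drops unmatched orders
            continue
        totals[region] += order["amount"]  # aggregate
    return sorted(totals.items())  # sort by region

if __name__ == "__main__":
    print(revenue_by_region(customers, orders))
```

On a cluster the same logic is split across mappers and reducers, which is a large part of why this stage is the hardest of the three to hand-code.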
Example 2 – Mainframe Data Ingestion
Perception: Just Call the Mainframe Guy…
Images: http://monkeestv.tripod.com/BatMonkee/
Example 2 – Mainframe Data Ingestion
Reality: Every Change = Time, Cost
Each change repeats the same cycle:
• Call the MF guy
• SMS compression
• DB tables, flat files
• Filtering, reformatting
• Copy, sort, join, aggregation
• EBCDIC to ASCII conversion
• COBOL copybooks
Image: bottletales.com
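Two of the steps above, EBCDIC-to-ASCII conversion and interpreting COBOL copybook fields, are easy to underestimate. The sketch below decodes one fixed-length record with Python's built-in `cp037` codec (a common US EBCDIC code page; the right code page depends on the source system) and unpacks a COMP-3 packed-decimal field. The copybook layout and the `unpack_comp3`/`parse_record` helpers are hypothetical.

```python
from decimal import Decimal

def unpack_comp3(raw: bytes, scale: int = 0) -> Decimal:
    """Decode a COBOL COMP-3 (packed-decimal) field.

    Each byte holds two BCD digits, except the last byte, whose low nibble is the
    sign: 0xD means negative, 0xC or 0xF means positive/unsigned. `scale` is the
    number of implied decimal places (the digits after V in the PIC clause).
    """
    if not raw:
        raise ValueError("empty packed-decimal field")
    digits = []
    for byte in raw[:-1]:
        digits.extend((byte >> 4, byte & 0x0F))
    digits.append(raw[-1] >> 4)
    sign = raw[-1] & 0x0F
    if any(d > 9 for d in digits) or sign not in (0xC, 0xD, 0xF):
        raise ValueError(f"invalid packed decimal: {raw.hex()}")
    value = int("".join(str(d) for d in digits))
    return Decimal(-value if sign == 0xD else value).scaleb(-scale)

# Hypothetical copybook for a 14-byte record:
#   01 CUSTOMER-REC.
#      05 CUST-NAME     PIC X(10).
#      05 CUST-BALANCE  PIC S9(5)V99 COMP-3.   7 digits + sign = 4 bytes
def parse_record(record: bytes) -> dict:
    """Convert one EBCDIC record to Python values (cp037 is an assumed code page)."""
    return {
        "name": record[0:10].decode("cp037").rstrip(),
        "balance": unpack_comp3(record[10:14], scale=2),
    }

if __name__ == "__main__":
    record = "SMITH".ljust(10).encode("cp037") + bytes([0x00, 0x12, 0x34, 0x5D])
    print(parse_record(record))
```

Note that a plain byte-for-byte EBCDIC-to-ASCII translation would corrupt the packed field, which is why the copybook layout has to be known before conversion.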
Big Data Team
Senior Linux/Unix Admins → Hadoop Administrators
Infrastructure Engineers
Java Developers → Hadoop Developers
Object-Oriented Developers → Hadoop Developers
Data Analysts
Functional Users → Hadoop Analytics Users
Project Managers!
Chief Data Officer
Executive Management
Enterprise Adoption Approach
Agile
Ideal Use Case for the company
Proof-of-concept or Pilot
Tech Heavy
Aware of Available Options – Many…
Work with Solution Architects
Infrastructure Analysis
Security Options
Testing… Testing…
Integrating with the Current Stack
Cost… Cost…
Promises vs. Reality
THE HADOOP ECOSYSTEMS –
FROM OPEN SOURCE TO VENDOR TOOLS
Hadoop Distributions
Vendor Landscape
Distributions / Platforms
Data Integration/ETL
Search
Document Store
Database / Data Warehouse
Social Operational
XML Database
Graphs
REAL-WORLD CASE STUDIES
Understanding Mainframe Data at Major US Bank
Before: Manual Effort
Customer hit a wall after months of manual effort migrating Mainframe data:
• Difficult to find data errors; no Mainframe application logic that matches the copybook
• Large and complex copybooks
• Dependent on the Mainframe team to provide data
• Very manual-intensive; inadequate documentation
• Not scalable: only a few Java + Mainframe experts could do the work
After: DMX-h + CDH
• Easy to validate copybooks and find data errors
• Ability to pull data directly from the Mainframe without relying on the Mainframe team
• No coding, no scripting: easier to document, maintain & reuse
• Enables developers with a broader set of skills to build complex migration jobs
86-page copybook: weeks with manual effort vs. 4 hrs with DMX-h + CDH
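Validating data against a copybook largely comes down to checking that every record has the expected length and that binary fields hold legal values. This sketch reports records of the wrong length and COMP-3 fields whose nibbles are not valid packed decimal. It assumes a hypothetical two-field layout rather than the bank's 86-page copybook; `Field`, `LAYOUT` and `validate` are illustrative names, not part of any product.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Field:
    name: str
    offset: int  # byte offset within the record
    length: int  # length in bytes
    kind: str    # "text" (PIC X, EBCDIC) or "comp3" (packed decimal)

# Hypothetical layout; in practice it would be generated from the copybook.
LAYOUT = [
    Field("CUST-NAME", 0, 10, "text"),
    Field("CUST-BALANCE", 10, 4, "comp3"),  # PIC S9(5)V99 COMP-3
]
RECORD_LENGTH = sum(f.length for f in LAYOUT)

def is_valid_comp3(raw: bytes) -> bool:
    """All nibbles but the last must be digits 0-9; the last must be a sign nibble (C, D or F)."""
    nibbles = [n for byte in raw for n in (byte >> 4, byte & 0x0F)]
    return bool(nibbles) and all(n <= 9 for n in nibbles[:-1]) and nibbles[-1] in (0xC, 0xD, 0xF)

def validate(records: list[bytes]) -> list[str]:
    """Return one message per problem found; an empty list means every record fits the layout."""
    errors = []
    for i, record in enumerate(records):
        if len(record) != RECORD_LENGTH:
            errors.append(f"record {i}: length {len(record)}, expected {RECORD_LENGTH}")
            continue  # field offsets are meaningless if the length is wrong
        for field in LAYOUT:
            raw = record[field.offset:field.offset + field.length]
            # Text fields are not checked here: cp037 maps every byte to some character.
            if field.kind == "comp3" and not is_valid_comp3(raw):
                errors.append(f"record {i}: {field.name} is not valid packed decimal ({raw.hex()})")
    return errors

if __name__ == "__main__":
    good = "SMITH".ljust(10).encode("cp037") + bytes([0x00, 0x12, 0x34, 0x5D])
    for message in validate([good, good[:-1]]):
        print(message)
```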
Social Security Administration
The Challenge:
– The SSA has an expensive problem with fraudulent claims for benefits, and it needs more and better data to prevent and punish that fraud.
The Office of the Inspector General for the SSA reports that:
– “Nationally, in Fiscal Year 2011, there were more than 103,000 allegations of Social Security fraud, with more than 7,000 criminal investigations resulting in 1,374 convictions and more than $410 million in recoveries, fines, restitution, judgments, settlements, and savings.”
Why Hadoop?
– Data processing time: 30 hrs on the mainframe vs. 2 hrs on the PoC cluster
– Accuracy: obituary data from social media is likely more accurate than the current death file
Optimizing the EDW at Large Teradata Customer
• Offload ELT processing from Teradata into CDH using DMX-h
• Implement a flexible architecture for staging and change data capture
• Ability to pull data directly from the Mainframe
• No coding. Easier to maintain & reuse
• Enable developers with a broader set of skills to build complex ETL workflows
Elapsed time: HiveQL 360 min vs. DMX-h 15 min
Development effort: HiveQL 12 man-weeks vs. DMX-h 4 man-weeks
Impact on Loans Application Project:
• Cut development time to one-third (4 vs. 12 man-weeks)
• Reduced complexity: from 140 HiveQL scripts to 12 DMX-h graphical jobs
• Eliminated the need for Java user-defined functions
• 24x faster!
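The headline claims can be checked directly against the chart values. Going from 12 to 4 man-weeks means development time fell to one-third, a two-thirds reduction. The variable names below are illustrative.

```python
from fractions import Fraction

# Values read from the slide's two charts and its impact bullets.
HIVEQL_MINUTES, DMXH_MINUTES = 360, 15
HIVEQL_WEEKS, DMXH_WEEKS = 12, 4
HIVEQL_SCRIPTS, DMXH_JOBS = 140, 12

speedup = HIVEQL_MINUTES / DMXH_MINUTES            # elapsed-time ratio
effort_ratio = Fraction(DMXH_WEEKS, HIVEQL_WEEKS)  # share of the original effort
effort_saved = 1 - effort_ratio                    # fraction of effort eliminated
consolidation = HIVEQL_SCRIPTS / DMXH_JOBS         # HiveQL scripts replaced per graphical job

if __name__ == "__main__":
    print(f"speedup: {speedup:.0f}x")
    print(f"development effort: {effort_ratio} of the original ({effort_saved} saved)")
    print(f"about {consolidation:.1f} HiveQL scripts per DMX-h job")
```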
Log File Processing
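Log files are a classic Hadoop workload. As a generic illustration in the MapReduce style, this sketch parses Apache Common Log Format lines with a regular expression, emits (status, 1) pairs from a mapper, and sums them as a reducer would. The regex, the function names and the sample lines are assumptions; real logs often use the Combined format or custom layouts.

```python
import re
from collections import Counter
from typing import Iterable, Iterator

# Apache Common Log Format: host ident authuser [timestamp] "request" status bytes
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-)'
)

SAMPLE_LINES = [
    '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326',
    '10.0.0.2 - - [10/Oct/2000:13:56:01 -0700] "GET /missing.html HTTP/1.0" 404 -',
    'this line is malformed and will be skipped',
]

def map_line(line: str) -> Iterator[tuple[str, int]]:
    """Mapper: emit (status_code, 1) for a well-formed line; malformed lines emit nothing."""
    match = CLF_PATTERN.match(line)
    if match:
        yield match.group("status"), 1

def status_counts(lines: Iterable[str]) -> Counter:
    """Sum mapper output per status code, as a summing reducer would."""
    counts: Counter = Counter()
    for line in lines:
        for status, one in map_line(line):
            counts[status] += one
    return counts

if __name__ == "__main__":
    print(status_counts(SAMPLE_LINES))
```

Skipping malformed lines instead of failing matters at Hadoop scale, where a single bad record should not abort a job over billions of lines.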
Video - Placemeter
http://vimeo.com/69091237
What to do next
No one is impartial, but it’s still worth talking to:
– Vendors
– Industry Analysts
– Industry Peers
– People at Meetups
– Practitioners like Chida
Big Data – Projects
Crucial Considerations for AI-ready Data.pdf
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
Justifying Capacity Managment Webinar 4/10
Justifying Capacity Managment Webinar 4/10Justifying Capacity Managment Webinar 4/10
Justifying Capacity Managment Webinar 4/10
 
Automate Studio Training: Materials Maintenance Tips for Efficiency and Ease ...
Automate Studio Training: Materials Maintenance Tips for Efficiency and Ease ...Automate Studio Training: Materials Maintenance Tips for Efficiency and Ease ...
Automate Studio Training: Materials Maintenance Tips for Efficiency and Ease ...
 
Leveraging Mainframe Data in Near Real Time to Unleash Innovation With Cloud:...
Leveraging Mainframe Data in Near Real Time to Unleash Innovation With Cloud:...Leveraging Mainframe Data in Near Real Time to Unleash Innovation With Cloud:...
Leveraging Mainframe Data in Near Real Time to Unleash Innovation With Cloud:...
 
Testjrjnejrvnorno4rno3nrfnfjnrfnournfou3nfou3f
Testjrjnejrvnorno4rno3nrfnfjnrfnournfou3nfou3fTestjrjnejrvnorno4rno3nrfnfjnrfnournfou3nfou3f
Testjrjnejrvnorno4rno3nrfnfjnrfnournfou3nfou3f
 
Data Innovation Summit: Data Integrity Trends
Data Innovation Summit: Data Integrity TrendsData Innovation Summit: Data Integrity Trends
Data Innovation Summit: Data Integrity Trends
 
AI You Can Trust - Ensuring Success with Data Integrity Webinar
AI You Can Trust - Ensuring Success with Data Integrity WebinarAI You Can Trust - Ensuring Success with Data Integrity Webinar
AI You Can Trust - Ensuring Success with Data Integrity Webinar
 
Optimisez la fonction financière en automatisant vos processus SAP
Optimisez la fonction financière en automatisant vos processus SAPOptimisez la fonction financière en automatisant vos processus SAP
Optimisez la fonction financière en automatisant vos processus SAP
 
SAPS/4HANA Migration - Transformation-Management + nachhaltige Investitionen
SAPS/4HANA Migration - Transformation-Management + nachhaltige InvestitionenSAPS/4HANA Migration - Transformation-Management + nachhaltige Investitionen
SAPS/4HANA Migration - Transformation-Management + nachhaltige Investitionen
 

Recently uploaded

2024 RoOUG Security model for the cloud.pptx
2024 RoOUG Security model for the cloud.pptx2024 RoOUG Security model for the cloud.pptx
2024 RoOUG Security model for the cloud.pptx
Georgi Kodinov
 
Introduction to Pygame (Lecture 7 Python Game Development)
Introduction to Pygame (Lecture 7 Python Game Development)Introduction to Pygame (Lecture 7 Python Game Development)
Introduction to Pygame (Lecture 7 Python Game Development)
abdulrafaychaudhry
 
Enhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdfEnhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdf
Globus
 
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
Shahin Sheidaei
 
Enterprise Resource Planning System in Telangana
Enterprise Resource Planning System in TelanganaEnterprise Resource Planning System in Telangana
Enterprise Resource Planning System in Telangana
NYGGS Automation Suite
 
Launch Your Streaming Platforms in Minutes
Launch Your Streaming Platforms in MinutesLaunch Your Streaming Platforms in Minutes
Launch Your Streaming Platforms in Minutes
Roshan Dwivedi
 
Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024
Paco van Beckhoven
 
First Steps with Globus Compute Multi-User Endpoints
First Steps with Globus Compute Multi-User EndpointsFirst Steps with Globus Compute Multi-User Endpoints
First Steps with Globus Compute Multi-User Endpoints
Globus
 
Vitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume MontevideoVitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume Montevideo
Vitthal Shirke
 
Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604
Fermin Galan
 
BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024
Ortus Solutions, Corp
 
AI Pilot Review: The World’s First Virtual Assistant Marketing Suite
AI Pilot Review: The World’s First Virtual Assistant Marketing SuiteAI Pilot Review: The World’s First Virtual Assistant Marketing Suite
AI Pilot Review: The World’s First Virtual Assistant Marketing Suite
Google
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
Safe Software
 
Large Language Models and the End of Programming
Large Language Models and the End of ProgrammingLarge Language Models and the End of Programming
Large Language Models and the End of Programming
Matt Welsh
 
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoamOpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
takuyayamamoto1800
 
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Globus
 
May Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdfMay Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdf
Adele Miller
 
APIs for Browser Automation (MoT Meetup 2024)
APIs for Browser Automation (MoT Meetup 2024)APIs for Browser Automation (MoT Meetup 2024)
APIs for Browser Automation (MoT Meetup 2024)
Boni García
 
Globus Connect Server Deep Dive - GlobusWorld 2024
Globus Connect Server Deep Dive - GlobusWorld 2024Globus Connect Server Deep Dive - GlobusWorld 2024
Globus Connect Server Deep Dive - GlobusWorld 2024
Globus
 
Pro Unity Game Development with C-sharp Book
Pro Unity Game Development with C-sharp BookPro Unity Game Development with C-sharp Book
Pro Unity Game Development with C-sharp Book
abdulrafaychaudhry
 

Recently uploaded (20)

2024 RoOUG Security model for the cloud.pptx
2024 RoOUG Security model for the cloud.pptx2024 RoOUG Security model for the cloud.pptx
2024 RoOUG Security model for the cloud.pptx
 
Introduction to Pygame (Lecture 7 Python Game Development)
Introduction to Pygame (Lecture 7 Python Game Development)Introduction to Pygame (Lecture 7 Python Game Development)
Introduction to Pygame (Lecture 7 Python Game Development)
 
Enhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdfEnhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdf
 
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
 
Enterprise Resource Planning System in Telangana
Enterprise Resource Planning System in TelanganaEnterprise Resource Planning System in Telangana
Enterprise Resource Planning System in Telangana
 
Launch Your Streaming Platforms in Minutes
Launch Your Streaming Platforms in MinutesLaunch Your Streaming Platforms in Minutes
Launch Your Streaming Platforms in Minutes
 
Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024
 
First Steps with Globus Compute Multi-User Endpoints
First Steps with Globus Compute Multi-User EndpointsFirst Steps with Globus Compute Multi-User Endpoints
First Steps with Globus Compute Multi-User Endpoints
 
Vitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume MontevideoVitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume Montevideo
 
Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604
 
BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024
 
AI Pilot Review: The World’s First Virtual Assistant Marketing Suite
AI Pilot Review: The World’s First Virtual Assistant Marketing SuiteAI Pilot Review: The World’s First Virtual Assistant Marketing Suite
AI Pilot Review: The World’s First Virtual Assistant Marketing Suite
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
 
Large Language Models and the End of Programming
Large Language Models and the End of ProgrammingLarge Language Models and the End of Programming
Large Language Models and the End of Programming
 
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoamOpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
 
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
 
May Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdfMay Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdf
 
APIs for Browser Automation (MoT Meetup 2024)
APIs for Browser Automation (MoT Meetup 2024)APIs for Browser Automation (MoT Meetup 2024)
APIs for Browser Automation (MoT Meetup 2024)
 
Globus Connect Server Deep Dive - GlobusWorld 2024
Globus Connect Server Deep Dive - GlobusWorld 2024Globus Connect Server Deep Dive - GlobusWorld 2024
Globus Connect Server Deep Dive - GlobusWorld 2024
 
Pro Unity Game Development with C-sharp Book
Pro Unity Game Development with C-sharp BookPro Unity Game Development with C-sharp Book
Pro Unity Game Development with C-sharp Book
 

Hadoop is Happening

  • 2. Agenda: Hadoop Evolution; Use Cases; The Hadoop Ecosystem, from open source to vendor solutions; Tooling, implementation and skillset challenges; Real-World Case Studies; Future of Hadoop; Q&A
  • 3. Our Guest – Chida from OpenOsmium: 20+ years of enterprise application development experience, focused on Big Data & Cloud; Founder of Big Data solution provider OpenOsmium; DC tech community organizer of meetups (Google Developer Group, Tech Breakfast, NoVA Hadoop User Group); Open source, Big Data and Cloud advocate; 703-568-7426, chida@openosmium.com
  • 4. EVOLUTION OF HADOOP
  • 5. Evolution of Hadoop – Data Volumes are Growing
  • 6. Evolution of Hadoop – Key Events (timeline: 2000, 2004, 2008, 2010, 2012, 2013, Next?): Search engine problem @ Google; 3 white papers: GFS, MapReduce, BigTable (“MapReduce: Simplified Data Processing on Large Clusters”); Yahoo!: HDFS, MapReduce, HBase; MapR, Hortonworks, Cloudera; Hadoop 2.0
  • 7. Why Hadoop As a Data Management Platform? The Reliability of a Mainframe, The Massive Performance at Scale of an MPP appliance, The Storage Capacity of a SAN, All at a Disruptively Low Price Point
  • 8. The Economics of Data: Cost of managing 1 TB of data – Mainframe: $20,000 – $100,000; EDW: $15,000 – $80,000; Hadoop: $250 – $2,000. Other factors: Scalability, Performance, Reliability, Agility, Skills Supply. But there’s more…
  • 9. Hadoop - The Big Picture: Logical layer – unified computation (Computation) provided by the MapReduce distributed computing framework, and unified storage (Storage) provided by a distributed file system called HDFS. Physical layer – commodity hardware, where each machine contains a bunch of disks and cores.
  • 10. MapReduce – Football Stadium Analogy
  • 11. Yesterday’s Architecture
  • 12. Tomorrow’s Data Architecture
  • 13. HADOOP USE CASES
  • 14. Hadoop Use Cases: Data Lake; Offload Mainframe Data & Batch Workloads; Machine Data; Cyber Security; Fraud Detection; Offload ELT from Data Warehouse; Clickstream / Weblogs, EMR; Social Media Data; Geo Spatial; Analyzing Video and Audio; Analytics; Real-Time Processing; Predictive Analytics; Unstructured Data; Active Archive; Multi-media; Leverage “Dark Data”; Sentiment Analysis; Enterprise Data Hub
  • 15. Hadoop Use Cases – A Roadmap for Hadoop Success: – Offload batch & ELT workloads from data warehouse and mainframe systems into Hadoop – Develop an active archive, shed light on dark data – Build your Enterprise Data Hub (Data Lake!) – Leverage new data sources – Extend BI with data discovery & exploration – Deliver next-generation analytics
  • 16. Sample Use Case: Offload – Phase I: Identify • Identify data & workloads most suitable for offload • Focus on those that will deliver maximum savings & performance – Phase II: Offload • Access and move virtually any data to Hadoop with one tool • Easily replicate existing workloads in Hadoop using a graphical user interface – Phase III: Optimize & Secure • Deploy and optimize the new environment • Manage & secure all your data with business-class tools
  • 17. Phase 2: Deliver ‘Next-generation’ Applications – Advanced (‘Next-gen’) Applications for Hadoop: – Semi-structured data analytics • Clickstream/Weblog, Electronic Medical Records – Unstructured data analytics • Video, audio, documents, text, social • Predictive modeling – Geospatial analysis – Real-Time Processing
  • 18. Use Cases Across Industries (columns: Vertical, Refine, Explore, Enrich) – Retail & Web: Log Analysis/Site Optimization; Loyalty Program Optimization; Brand and Sentiment Analysis; Market Basket Analysis; Dynamic Pricing; Session & Content Optimization; Product Recommendation – Telco: Customer Profiling; Equipment Failure Prediction; Location-Based Advertising – Government: Threat Identification; Person of Interest Discovery; Mission Work – Finance: Risk Modeling & Fraud Identification; Trade Performance Analytics; Surveillance and Fraud Detection; Customer Risk Analysis; Real-Time Upsell and Cross-Sell Marketing Offers – Energy: Smart Grid: Production Optimization; Grid Failure Prevention; Smart Meters; Individual Power Grid – Manufacturing: Supply Chain Optimization; Customer Churn Analysis; Dynamic Delivery; Replacement Parts – Healthcare: Electronic Medical Records (EMPI); Clinical Decision Support; Clinical Trials Analysis; Insurance Premium Determination
  • 19. IMPLEMENTATION & SKILLSET CHALLENGES
  • 20. Overview of Hadoop Challenges: Hardware?? Skills?? Training?? Rapid change of Hadoop Ecosystem?
  • 21. Example 1 - ETL in Hadoop: COLLECT (HARD) – FS Shell put command, Flume, Sqoop; PROCESS (HARDER) – Sort, Join, Aggregate, Copy, Merge with Pig, HiveQL, Java; DISTRIBUTE (HARD) – Sqoop, FS Shell get command
  • 22. Example 2 – Mainframe Data Ingestion. Perception: Just Call the Mainframe Guy… (Images: http://monkeestv.tripod.com/BatMonkee/)
  • 23. Example 2 – Mainframe Data Ingestion: Reality. Every Change = Time, Cost: SMS Compression; DB Tables, Flat Files; Filtering, Reformatting; Copy, Sort, Join, Aggregation; EBCDIC to ASCII; COBOL copybooks; Call MF Guy (Image: bottletales.com)
  • 24. Big Data Team: Senior Linux/Unix Admin; Hadoop Administrators; Infrastructure Engineers; Java Developers → Hadoop Developers; Object Oriented Developers → Hadoop Developers; Data Analysts; Functional Users → Hadoop Analytics Users; Project Managers!; Chief Data Officer; Executive Management
  • 25. Enterprise Adoption Approach: Agile; Ideal use case for the company; Proof-of-concept or pilot; Tech heavy; Aware of available options – many…; Work with solution architects; Infrastructure analysis; Security options; Testing… testing…; Integrating with current stack; Cost… cost…; Promises vs. reality
  • 26. THE HADOOP ECOSYSTEM – FROM OPEN SOURCE TO VENDOR TOOLS
  • 27. Hadoop Distributions
  • 28. Vendor Landscape: Distributions / Platforms; Data Integration / ETL; Search; Document Store; Database / Data Warehouse; Social; Operational; XML Database; Graphs
  • 29. REAL-WORLD CASE STUDIES
  • 30. Understanding Mainframe Data at a Major US Bank: The customer hit a wall after months of manual effort migrating mainframe data. Before (manual effort): • Difficult to find data errors; no mainframe application logic that matches the copybook • Large and complex copybooks • Depends on the mainframe team to provide data • Very manual-intensive; inadequate documentation • Not scalable: only a few Java + mainframe experts could do the work. After (DMX-h + CDH): • Easy to validate copybooks and find data errors • Ability to pull data directly from the mainframe without relying on the mainframe team • No coding. No scripting. Easier to document, maintain & reuse • Enables developers with a broader set of skills to build complex migration jobs. 86-page copybook: ? weeks before vs. 4 hrs after
  • 31. Social Security Administration. The Challenge: – The SSA has an expensive problem with fraudulent claims for benefits, and it needs more and better data to prevent and punish that fraud. The Office of the Inspector General for the SSA reports that: – “Nationally, in Fiscal Year 2011, there were more than 103,000 allegations of Social Security fraud, with more than 7,000 criminal investigations resulting in 1,374 convictions and more than $410 million in recoveries, fines, restitution, judgments, settlements, and savings.” Why Hadoop? – Data processing time: 30 hrs on the mainframe; the PoC cluster completed it in 2 hrs – Accuracy: obituary data from social media is likely more accurate than the current death file
  • 32. Optimizing the EDW at a Large Teradata Customer: • Offload ELT processing from Teradata into CDH using DMX-h • Implement a flexible architecture for staging and change data capture • Ability to pull data directly from the mainframe • No coding. Easier to maintain & reuse • Enable developers with a broader set of skills to build complex ETL workflows. Elapsed time: HiveQL 360 min vs. DMX-h 15 min. Development effort: HiveQL 12 man-weeks vs. DMX-h 4 man-weeks. Impact on Loans Application Project: – Cut development time to 1/3 – Reduced complexity: from 140 HiveQL scripts to 12 DMX-h graphical jobs – Eliminated the need for Java user-defined functions – 24x faster!
  • 33. Log File Processing
  • 34. Video - Placemeter: http://vimeo.com/69091237
  • 35. What to do next: No one is impartial, but it’s still worth talking to: – Vendors – Industry Analysts – Industry Peers – People at Meetups – Practitioners like Chida
  • 36. Why Hadoop As a Data Management Platform? The Reliability of a Mainframe, The Massive Performance at Scale of an MPP appliance, The Storage Capacity of a SAN, All at a Disruptively Low Price Point
  • 37. Big Data – Projects
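To make slide 8's cost ranges concrete, the short Python calculation below expresses mainframe and EDW cost per TB as a multiple of Hadoop's. The dollar figures are the slide's; how the low and high ends of each range should be paired is an assumption, so the sketch reports both the like-for-like ratios and the widest possible spread. The function name is illustrative only.

```python
# Per-TB cost ranges (USD) as quoted on slide 8: (low end, high end).
COST_PER_TB = {
    "Mainframe": (20_000, 100_000),
    "EDW": (15_000, 80_000),
    "Hadoop": (250, 2_000),
}


def cost_multiple(platform: str, baseline: str = "Hadoop") -> dict[str, float]:
    """How many times more `platform` costs per TB than `baseline`."""
    lo, hi = COST_PER_TB[platform]
    base_lo, base_hi = COST_PER_TB[baseline]
    return {
        # Assumption: low end compares with low end, high end with high end.
        "low_vs_low": lo / base_lo,
        "high_vs_high": hi / base_hi,
        # Widest possible spread if the ranges are paired the other way round.
        "min": lo / base_hi,
        "max": hi / base_lo,
    }


if __name__ == "__main__":
    for name in ("Mainframe", "EDW"):
        print(name, cost_multiple(name))
```

With the slide's numbers, like-for-like pairing puts the mainframe at roughly 50x to 80x Hadoop's cost per TB and the EDW at 40x to 60x; the full spread is much wider (10x to 400x and 7.5x to 320x).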
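Slide 9 splits Hadoop into HDFS for storage and MapReduce for computation, and slide 10 introduces MapReduce by analogy. As a rough illustration only, the single-process Python sketch below mimics the three MapReduce stages (map, shuffle/sort, reduce) on a word count. It is not Hadoop's API: the function names are hypothetical, and a real job would run many map and reduce tasks in parallel across the cluster, with the framework performing the shuffle between them.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Mapper: emit a (word, 1) pair for every whitespace-separated token."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle/sort: group intermediate values by key, with keys in sorted order."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reducer: combine all values for one key into a single count."""
    return key, sum(values)


def word_count(splits: list[str]) -> dict[str, int]:
    # Each string stands in for one input split (e.g. one HDFS block) that a
    # separate map task would process on the node holding that block.
    intermediate = (pair for split in splits for pair in map_phase(split))
    grouped = shuffle_phase(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    blocks = ["Hadoop stores data in HDFS", "MapReduce processes data in Hadoop"]
    print(word_count(blocks))
    # {'data': 2, 'hadoop': 2, 'hdfs': 1, 'in': 2, 'mapreduce': 1, 'processes': 1, 'stores': 1}
```

Because each mapper only sees its own split and each reducer only sees one key's values, the work can be spread over commodity machines, which is the point slide 9 makes.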
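Slide 21 rates the PROCESS step (sort, join, aggregate) as the hardest part of ETL in Hadoop when it is hand-coded in Pig, HiveQL or Java. To show what that involves, the Python sketch below simulates a reduce-side join, a common MapReduce join pattern, followed by an aggregation. The datasets, field names and functions are hypothetical, and everything runs in one process rather than on a cluster.

```python
from collections import defaultdict
from typing import Iterable, Iterator

# Hypothetical sample inputs: (customer_id, region) and (customer_id, order_amount).
CUSTOMERS = [("c1", "East"), ("c2", "West")]
ORDERS = [("c1", 100.0), ("c2", 40.0), ("c1", 25.0), ("c3", 10.0)]

Tagged = tuple[str, tuple[str, object]]


def map_customers(records: Iterable[tuple[str, str]]) -> Iterator[Tagged]:
    # Tag each record with its source so the reducer can tell the inputs apart.
    for cust_id, region in records:
        yield cust_id, ("C", region)


def map_orders(records: Iterable[tuple[str, float]]) -> Iterator[Tagged]:
    for cust_id, amount in records:
        yield cust_id, ("O", amount)


def reduce_join(tagged: list[tuple[str, object]]) -> Iterator[tuple[str, float]]:
    """Inner join for one key: pair every customer row with every order row."""
    regions = [value for tag, value in tagged if tag == "C"]
    amounts = [value for tag, value in tagged if tag == "O"]
    for region in regions:
        for amount in amounts:
            yield region, amount


def join_and_aggregate(customers, orders) -> dict[str, float]:
    # Shuffle: group tagged records from both mappers by the join key.
    groups: dict[str, list[tuple[str, object]]] = defaultdict(list)
    for key, value in [*map_customers(customers), *map_orders(orders)]:
        groups[key].append(value)

    # Reduce: join within each key group (in sorted key order, as Hadoop
    # presents keys to a reducer), then total the order amounts per region.
    totals: dict[str, float] = defaultdict(float)
    for key in sorted(groups):
        for region, amount in reduce_join(groups[key]):
            totals[region] += amount
    return dict(totals)


if __name__ == "__main__":
    print(join_and_aggregate(CUSTOMERS, ORDERS))  # {'East': 125.0, 'West': 40.0}
```

Order `c3` has no matching customer, so the inner join drops it; a left or full outer join would need different reducer logic, which is part of why hand-written joins are rated HARDER.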
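Slides 23 and 30 list the mechanics that make mainframe ingestion hard: EBCDIC to ASCII conversion, COBOL copybooks, and finding data errors. The Python sketch below decodes one fixed-length record against a small, hypothetical copybook layout. It uses the standard library's `cp037` codec (one EBCDIC code page; which code page a given system uses is an assumption to verify) and unpacks a COMP-3 packed-decimal field, rejecting malformed values. Real copybooks add REDEFINES, OCCURS and other features that this sketch does not handle.

```python
from decimal import Decimal

# Hypothetical layout derived from a COBOL copybook such as:
#   01 CUSTOMER-REC.
#      05 CUST-ID    PIC X(6).
#      05 CUST-NAME  PIC X(10).
#      05 BALANCE    PIC S9(5)V99 COMP-3.   (7 digits + sign nibble = 4 bytes)
# Each entry: (field name, start offset, end offset, kind, implied decimals)
LAYOUT = [
    ("cust_id", 0, 6, "text", 0),
    ("cust_name", 6, 16, "text", 0),
    ("balance", 16, 20, "comp3", 2),
]
RECORD_LENGTH = 20
CODEPAGE = "cp037"  # assumption: EBCDIC US/Canada; confirm the source system's code page


def unpack_comp3(raw: bytes, scale: int) -> Decimal:
    """Decode a packed-decimal (COMP-3) field: two digits per byte, sign in the last nibble."""
    nibbles = []
    for byte in raw:
        nibbles.extend((byte >> 4, byte & 0x0F))
    *digits, sign = nibbles
    if any(d > 9 for d in digits):
        raise ValueError(f"invalid packed digit in {raw.hex()}")
    # This sketch accepts only the preferred sign nibbles: C (+), D (-), F (unsigned).
    if sign not in (0xC, 0xD, 0xF):
        raise ValueError(f"invalid sign nibble {sign:X} in {raw.hex()}")
    value = int("".join(str(d) for d in digits))
    return Decimal(-value if sign == 0xD else value).scaleb(-scale)


def decode_record(record: bytes) -> dict[str, object]:
    """Convert one fixed-length EBCDIC record into Python values."""
    if len(record) != RECORD_LENGTH:
        raise ValueError(f"expected {RECORD_LENGTH} bytes, got {len(record)}")
    result: dict[str, object] = {}
    for name, start, end, kind, scale in LAYOUT:
        field = record[start:end]
        if kind == "text":
            result[name] = field.decode(CODEPAGE).rstrip()
        else:
            result[name] = unpack_comp3(field, scale)
    return result


if __name__ == "__main__":
    sample = (
        "C00042".encode(CODEPAGE)
        + "SMITH".ljust(10).encode(CODEPAGE)
        + bytes.fromhex("0012345C")  # +123.45 as packed decimal
    )
    print(decode_record(sample))
```

Once decoded into Python strings and `Decimal` values, writing the record out as ASCII or UTF-8 text is straightforward; the hard part, as the slides note, is keeping the layout in step with the copybook every time it changes.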
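The case studies on slides 31 and 32 quote before/after figures. The quick Python check below derives the ratios from those figures: 360 vs. 15 minutes is the quoted 24x, and 12 vs. 4 man-weeks means development took one-third of the time (a two-thirds reduction). The helper names are illustrative only.

```python
def speedup(before: float, after: float) -> float:
    """How many times faster the new run is (before / after)."""
    return before / after


def reduction_pct(before: float, after: float) -> float:
    """Percentage saved going from `before` to `after`."""
    return 100 * (before - after) / before


# Figures as reported on slides 31 and 32.
CASES = {
    "SSA processing time (hours)": (30, 2),
    "Teradata offload elapsed time (minutes)": (360, 15),
    "Loans project development effort (man-weeks)": (12, 4),
}

if __name__ == "__main__":
    for label, (before, after) in CASES.items():
        print(f"{label}: {speedup(before, after):.1f}x, "
              f"{reduction_pct(before, after):.1f}% less")
```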