INTEL CONFIDENTIAL, FOR INTERNAL USE ONLY
Evolving Hadoop for the Data Society
Open Platform for Next-Gen Analytics
vin.sharma
strategy & marketing
open source x open data
Hope trumps hype
Virtuous cycle of data-driven innovation
Cycle nodes: Cloud, Clients, Intelligent Systems
Cycle links: richer data to analyze, richer user experiences, richer data from devices
2.8 zettabytes of data generated worldwide in 2012 (1)
40 zettabytes of data will be generated worldwide in 2020 (1)
Sources: (1) IDC Digital Universe 2020, (2) IDC
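As a rough check on the two IDC figures quoted on this slide, the growth they imply can be worked out directly. The sketch below assumes both numbers are annual worldwide totals and that growth is smooth between 2012 and 2020; the function name is illustrative only.

```python
# Figures quoted on the slide (IDC Digital Universe): zettabytes generated worldwide.
START_YEAR, START_ZB = 2012, 2.8
END_YEAR, END_ZB = 2020, 40.0


def compound_annual_growth(start: float, end: float, years: int) -> float:
    """Return the constant yearly growth rate that turns `start` into `end`."""
    return (end / start) ** (1.0 / years) - 1.0


growth_factor = END_ZB / START_ZB
cagr = compound_annual_growth(START_ZB, END_ZB, END_YEAR - START_YEAR)

print(f"Total growth 2012-2020: {growth_factor:.1f}x")
print(f"Implied compound annual growth: {cagr:.1%}")
```

Under those assumptions the yearly volume grows about 14x over eight years, or roughly 39% per year.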
Democratize data analysis
Enhance scientific understanding, drive innovation,
and accelerate medical cures
Create new data-driven business models, reduce
resource waste, improve organizational processes
Increase public safety with smart traffic and
improve energy efficiency with smart grids
Models and Cases
Data-Intensive Discovery
Diagram layers: Data Management, Data Analysis, Data Value
Domains: Life Sciences, Physical Sciences, Social Sciences
Data sources: Genome Data, EMR, Clinical Trials, Sensor Data, Images, Sim Data, Census Data, Text, A/V, Surveys
Outcomes: Drug Discovery, Treatment Optimization, Hypothesis Formation, Modeling & Prediction, Astronomy, Particle Physics, Public Policy, Trend Analysis
Data-Intensive Discovery: Genomics
Value
• Enable researchers to discover biomarkers and drug targets by correlating genomic data sets
• 90% gain in throughput; 6X data compression
Analytics
• Provide curated data sets with pre-computed analysis (classification, correlation, biomarkers)
• Provide APIs for applications to combine and analyze public and private data sets
Data Management
• Use Hive and Hadoop for query and search
• Dynamically partition and scale HBase
• 10-node cluster / Intel Xeon E5 processors
• 10GbE network
Diagram label: Intel Distribution
Data-Driven Business
Diagram layers: Data Management, Data Analysis, Data Value
Domains: Telco, Retail, FSI
Data sources: Content, CDR, IP Traffic, Shop, Product, Customer Behavior, Transactions
Outcomes: Customer Service, Network Optimization, Product Innovation, Market Insight, Business Efficiency, Behavior Modeling, Fraud Analytics, Client Engagement
Data-Driven Business: Customer Service
Value
• 300 million wireless subscribers
• Enable subscriber access to billing data
• 30X gain in performance; lower TCO
Analytics
• Provides real-time retrieval of 6 months of data
• Supports new BI with 15 types of queries
• Enables targeted ad serving and promotions
Data Management
• Use Hadoop/HBase for search and analysis
• 30 TB/month of billing data
• 300K reads/second; 800K inserts/second
• 133-node cluster / Intel Xeon E5 processors
Diagram labels: CDR, Subscriber Self Service
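A back-of-envelope sizing of the figures on this slide, as a sketch only. It assumes decimal units (1 TB = 10^12 bytes), a 30-day month, load spread evenly over the 133 nodes, and no replication overhead; none of these assumptions come from the slide.

```python
# Figures quoted on the slide.
BILLING_TB_PER_MONTH = 30
RETENTION_MONTHS = 6          # "real-time retrieval of 6 months" of data
READS_PER_SECOND = 300_000
INSERTS_PER_SECOND = 800_000
NODES = 133

# Assumptions (not from the slide): decimal terabytes and a 30-day month.
BYTES_PER_TB = 10**12
SECONDS_PER_MONTH = 30 * 24 * 60 * 60

# Data kept online for real-time retrieval, before any replication.
retained_tb = BILLING_TB_PER_MONTH * RETENTION_MONTHS

# Average ingest rate if the monthly volume arrived evenly over the month.
avg_ingest_bytes_per_s = BILLING_TB_PER_MONTH * BYTES_PER_TB / SECONDS_PER_MONTH

# Cluster-wide rates divided evenly across nodes.
reads_per_node = READS_PER_SECOND / NODES
inserts_per_node = INSERTS_PER_SECOND / NODES

print(f"Retained data: {retained_tb} TB")
print(f"Average ingest: {avg_ingest_bytes_per_s / 1e6:.1f} MB/s")
print(f"Per node: {reads_per_node:.0f} reads/s, {inserts_per_node:.0f} inserts/s")
```

The per-node figures are only an even-split estimate; real HBase clusters rarely distribute load perfectly across region servers.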
Data-Rich Communities
Diagram layers: Data Management, Data Analysis, Data Value
Domains: Utilities, Police & Security, Government Services
Data sources: Meter Data, Infrastructure Data, Monitor Data, Behavior, ID, Demographics
Outcomes: Customer Service, Network Optimization, Smart Grids, Safe Streets, Crime Detection, Crime Prevention, ID Programs, Service Agility, Waste & Fraud Analysis
Data-Rich Communities: Smart City
Value
• Enforce traffic laws and detect license fraud
• Monitor and predict traffic patterns
• In a city of 31 million people
Analytics
• Detect traffic law violations automatically
• Detect driver license fraud by data mining
• Forecast traffic with predictive analytics
Data Management
• 30,000 cameras
• 6Mb/s stream rate per camera
• 15 PB of images in active use
• 2 billion records in HBase
Diagram labels: Detection, Prevention, Regional, Local
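To put the camera figures on this slide in proportion, the aggregate stream rate can be estimated with simple arithmetic. This is a sketch that assumes "Mb/s" means megabits per second, decimal units throughout, and every camera streaming continuously; the slide does not say how much of the stream is actually stored.

```python
# Figures quoted on the slide.
CAMERAS = 30_000
STREAM_MBIT_PER_S = 6         # assumed to be megabits per second, per camera
ACTIVE_IMAGES_PB = 15

# Assumptions (not from the slide): decimal units, continuous streaming.
BITS_PER_MBIT = 10**6
BYTES_PER_PB = 10**15
SECONDS_PER_DAY = 24 * 60 * 60

aggregate_bits_per_s = CAMERAS * STREAM_MBIT_PER_S * BITS_PER_MBIT
aggregate_gbit_per_s = aggregate_bits_per_s / 10**9
aggregate_bytes_per_s = aggregate_bits_per_s / 8

# Raw volume per day if every stream were stored in full.
pb_per_day = aggregate_bytes_per_s * SECONDS_PER_DAY / BYTES_PER_PB

# How many days of full-rate video the 15 PB active set would correspond to.
days_of_video_in_active_set = ACTIVE_IMAGES_PB / pb_per_day

print(f"Aggregate stream rate: {aggregate_gbit_per_s:.0f} Gb/s")
print(f"Raw volume: {pb_per_day:.2f} PB/day")
print(f"15 PB is about {days_of_video_in_active_set:.1f} days of full-rate video")
```

Under these assumptions the cameras produce about 180 Gb/s in aggregate, or a little under 2 PB per day, so 15 PB corresponds to a bit less than eight days of unfiltered video.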
Platform
Silicon (Si): atomic number 14, atomic weight 28.085
At the intersection of transformative forces
HPC: Enabling exascale (10^18) computing on massive data sets
Cloud: Helping enterprises build open interoperable clouds
Open Source: Contributing code and fostering ecosystem
Intel® Distribution for Apache Hadoop* software
* Other names and brands may be claimed as the property of others.
Hardware-enhanced performance & security
Enables partner innovation in analytics
Strengthens Apache Hadoop* ecosystem
Intel® Distribution for Apache Hadoop* software
version 3.x
All external names and brands are claimed as the property of others.
Intel® Manager for Apache Hadoop software: Deployment, Configuration, Monitoring, Alerts, and Security
HDFS 2.0.3: Hadoop Distributed File System
YARN (MRv2): Distributed Processing Framework
HBase 0.96.1: Columnar Store
ZooKeeper 3.4.5: Coordination
Flume 1.3.0: Log Collector
Sqoop 1.4.1: Data Exchange
Pig 0.9.2: Scripting
Hive 0.10.0: SQL Query
Oozie 3.3.0: Workflow
Mahout 0.7: Machine Learning
HCatalog: Metadata
Connectors: Ingest, Analysis, Visual
Legend: Intel proprietary; Intel enhancements contributed to open source; open source components included without change
Intel® Distribution for Apache Hadoop* software
version 2.3
• File-based encryption in HDFS
• Up to 20x faster decryption with AES-NI*
• Role-based access control for Hadoop services
• Up to 8.5X faster Hive queries using HBase co-processor
• Adaptive data replication in HDFS and HBase
• Optimized for SSD with Cache Acceleration Software
• Integrated text search with Lucene
• Simplified deployment & comprehensive monitoring
• Automated configuration with Intel® Active Tuner
• Deployment of HBase across multiple datacenters
• Detailed profiling of Hadoop jobs
• Simplified design of HBase schemas (+ in 2.4)
• REST APIs for deployment and management (+ in 2.4)
*Based on internal testing
Hardware-enhanced Security
Optimized Performance
Simplified Management
Intel® Distribution for Apache Hadoop* software
version 3.0
• Cell-level ACLs in HBase
• Encryption support in Hive and Pig
• Secure inter-node communication with SSL
• Compression and CRC with SSE 4.2
• Up to 8.5X faster Hive queries using HBase co-processor
• Adaptive replication in HDFS and HBase
• Snapshot support in Hadoop
• SNMP support for monitoring
*Based on internal testing
• Hadoop 2.0.3 and YARN support
• Lustre support
• GlusterFS support
• HCatalog support
Security & Performance
Enterprise data requires defense in depth
Firewall
Gateway
AuthN
AuthZ
Encryption
Audit & Alerts
Containment
Intel Expressway protects Hadoop APIs
AuthN
RBAC
Encryption
Containment
• Enforces consistent security policies across all Hadoop services
• Serves as a trusted proxy to Hadoop, HBase, and WebHDFS APIs
• Complies with Common Criteria EAL4+, HSM, FIPS 140-2 certifications
• Deploys as software, virtual appliance, or hardware appliance
HCatalog
Stargate
WebHDFS
Firewall
REST APIs
Kerberos authenticates Hadoop services
Encryption
Containment
Firewall
APIs
Authentication
Diagram: five-step Kerberos exchange with the KDC (request ticket, send service ticket, request service, validate ticket, send response)
Intel Manager
• Wizard enables setup of
secure cluster with
encrypted key exchange
• Manager generates principal
and keytab for Hadoop
services
• Manager enables batch
upload of keytab files
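The ticket exchange on this slide can be illustrated with a toy model. The sketch below is not real Kerberos and does not use any Kerberos library: it only shows the idea that a KDC and a service share a secret (the role a keytab plays), so the service can validate a ticket without contacting the KDC. Real Kerberos encrypts tickets and involves more messages; here the ticket is merely authenticated with an HMAC. All class names, principals, and keys are hypothetical.

```python
import hashlib
import hmac
import json
import time


class ToyKDC:
    """Toy key distribution center. Illustrative only, NOT real Kerberos."""

    def __init__(self):
        self._service_keys = {}

    def register_service(self, principal, key):
        # Plays the role of a keytab entry: a secret shared by KDC and service.
        self._service_keys[principal] = key

    def issue_ticket(self, client, service, lifetime_s=300):
        # Client requests a ticket; the KDC sends back a service ticket.
        body = {"client": client, "service": service,
                "expires": time.time() + lifetime_s}
        payload = json.dumps(body, sort_keys=True)
        mac = hmac.new(self._service_keys[service], payload.encode(),
                       hashlib.sha256).hexdigest()
        return {"payload": payload, "mac": mac}


class ToyService:
    """Toy Hadoop service that validates tickets with its own key."""

    def __init__(self, principal, key):
        self.principal = principal
        self._key = key

    def handle(self, ticket, request):
        # Validate the ticket, then send a response.
        expected = hmac.new(self._key, ticket["payload"].encode(),
                            hashlib.sha256).hexdigest()
        if not hmac.compare_digest(expected, ticket["mac"]):
            return "denied: bad ticket"
        body = json.loads(ticket["payload"])
        if body["service"] != self.principal or body["expires"] < time.time():
            return "denied: wrong service or expired"
        return f"ok: {request} for {body['client']}"


service_key = b"hypothetical-keytab-secret"
kdc = ToyKDC()
kdc.register_service("hdfs/node1", service_key)
service = ToyService("hdfs/node1", service_key)

ticket = kdc.issue_ticket("alice", "hdfs/node1")
response = service.handle(ticket, "list /data")

# A ticket altered after issue no longer matches its MAC.
forged = dict(ticket, payload=ticket["payload"].replace("alice", "mallory"))
forged_response = service.handle(forged, "list /data")
```

The point of the sketch is the trust relationship: the service never sees the client's password, only a ticket it can check with its own key.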
Manager simplifies role-based access control
Firewall
AuthZ
• File, table, and service-level controls
• Intel Manager pushes ACLs to each node
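A minimal sketch of what role-based checks at the file, table, and service level might look like. The data model, role names, and the `push_acls` helper are hypothetical illustrations, not the Intel Manager's actual format or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Permission:
    resource_type: str   # "file", "table", or "service"
    resource: str
    action: str


# Hypothetical roles and grants.
ROLE_PERMISSIONS = {
    "analyst": {
        Permission("table", "billing", "read"),
        Permission("file", "/data/reports", "read"),
    },
    "admin": {
        Permission("table", "billing", "read"),
        Permission("table", "billing", "write"),
        Permission("service", "hbase", "restart"),
    },
}

USER_ROLES = {"alice": {"analyst"}, "bob": {"admin"}}


def is_allowed(user, resource_type, resource, action):
    """Return True if any of the user's roles grants the permission."""
    wanted = Permission(resource_type, resource, action)
    return any(wanted in ROLE_PERMISSIONS.get(role, set())
               for role in USER_ROLES.get(user, set()))


def push_acls(nodes):
    """Model a manager pushing the same ACL set to each node (one copy per node)."""
    return {node: {role: set(perms) for role, perms in ROLE_PERMISSIONS.items()}
            for node in nodes}


node_acls = push_acls(["node1", "node2", "node3"])
```

Users are mapped to roles and roles to permissions, so changing what analysts can do means editing one role rather than every user.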
Intel Distribution provides HDFS encryption
Firewall
RBAC
• Extends compression codec into crypto codec
• Provides an abstract API for general use
Diagram: MapReduce pipeline (RecordReader, Map, Combiner, Partitioner, Local, Merge & Sort, Reduce, RecordWriter) reading from and writing to HDFS, with Decrypt, Encrypt, Derivative Encrypt, and Derivative Decrypt stages
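The design idea on this slide, a crypto codec that plugs in where a compression codec would, can be sketched with a shared abstract interface. This is an illustration in Python, not the Hadoop codec API. The cipher below is a toy XOR keystream built from SHA-256, used only because the standard library has no AES; it is NOT AES and NOT secure. All class and function names are hypothetical.

```python
import hashlib
import zlib
from abc import ABC, abstractmethod


class Codec(ABC):
    """Abstract byte codec: the shared interface in this sketch."""

    @abstractmethod
    def encode(self, data: bytes) -> bytes:
        """Transform bytes on the way to storage."""

    @abstractmethod
    def decode(self, data: bytes) -> bytes:
        """Reverse the transformation on the way back."""


class ZlibCodec(Codec):
    """Compression codec backed by zlib."""

    def encode(self, data: bytes) -> bytes:
        return zlib.compress(data)

    def decode(self, data: bytes) -> bytes:
        return zlib.decompress(data)


class ToyCryptoCodec(Codec):
    """Placeholder cipher (XOR with a SHA-256 counter keystream). Not secure."""

    def __init__(self, key: bytes) -> None:
        self._key = key

    def _keystream(self, length: int) -> bytes:
        blocks = []
        counter = 0
        while 32 * len(blocks) < length:
            block = hashlib.sha256(self._key + counter.to_bytes(8, "big")).digest()
            blocks.append(block)
            counter += 1
        return b"".join(blocks)[:length]

    def encode(self, data: bytes) -> bytes:
        return bytes(a ^ b for a, b in zip(data, self._keystream(len(data))))

    def decode(self, data: bytes) -> bytes:
        return self.encode(data)  # XOR with the same keystream undoes itself


def store(codec: Codec, record: bytes) -> bytes:
    """Writer side: the caller does not care which codec it was given."""
    return codec.encode(record)


def load(codec: Codec, blob: bytes) -> bytes:
    """Reader side: reverses whatever the writer's codec did."""
    return codec.decode(blob)


record = b"subscriber=42,amount=19.99"
crypto = ToyCryptoCodec(b"hypothetical-key")
stored = store(crypto, record)
restored = load(crypto, stored)
```

Because readers and writers depend only on `Codec`, swapping compression for encryption requires no change to the code that moves records.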
Intel AES-NI accelerates decryption 20x
AES Encryption, speed (MB/s)
            64k    4k    1k
AES-NI      460   457   454
No AES-NI    87    87    86

AES Decryption, speed (MB/s)
            64k    4k    1k
AES-NI     1266  1259  1253
No AES-NI    64    63    63

Chart callouts: 20X (decryption), 6X (encryption)
Software and workloads used in performance tests may have been optimized for performance only on Intel® microprocessors. Performance tests, such as SYSmark*
and MobileMark*, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the
results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance
of that product when combined with other products. For more information go to http://www.intel.com/performance.
• OpenSSL 1.0.1c optimized to use Intel AES-NI (7 processor instructions that accelerate AES)
• Intel Distribution crypto framework uses OpenSSL 1.0.1c
• Patch and design document released to open source (JIRA HADOOP-9331)
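The speedups can be recomputed from the throughput numbers on this slide. The sketch below simply divides the AES-NI figures by the figures without AES-NI for each of the three column sizes; the dictionary layout and function name are illustrative.

```python
# Throughput in MB/s as shown on the slide, for the 64k, 4k, and 1k columns.
SIZES = ["64k", "4k", "1k"]
THROUGHPUT_MB_S = {
    "encryption": {"AES-NI": [460, 457, 454], "No AES-NI": [87, 87, 86]},
    "decryption": {"AES-NI": [1266, 1259, 1253], "No AES-NI": [64, 63, 63]},
}


def speedups(operation):
    """Return the AES-NI speedup factor for each column size."""
    accelerated = THROUGHPUT_MB_S[operation]["AES-NI"]
    baseline = THROUGHPUT_MB_S[operation]["No AES-NI"]
    return [fast / slow for fast, slow in zip(accelerated, baseline)]


encryption_speedups = speedups("encryption")
decryption_speedups = speedups("decryption")

for operation in THROUGHPUT_MB_S:
    for size, factor in zip(SIZES, speedups(operation)):
        print(f"{operation:>10} {size:>3}: {factor:.1f}x")
```

From these numbers the ratios come out near 5.3x for encryption and between about 19.8x and 20.0x for decryption, so the 20X callout matches the decryption data closely while the tabulated encryption data sits somewhat below 6X.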
Learn more about Intel and Hadoop
• Unique insights that help you tune,
secure, and manage your deployment
in addition to essential understanding
of Apache Hadoop
• Distilled from years of Intel
experience in deploying and
optimizing Apache Hadoop and HBase
for enterprises
• Based on Intel expertise in optimizing
the full Hadoop stack – from Hive on
Hadoop through Java to Linux on x86
hardware
http://hadoop.intel.com
http://www.intel.com/bigdata
Intel Training and Certification; Case Studies and Resources
Agility
Savanna: Hadoop on OpenStack
Ilya Elterman
Senior Director Cloud Services
Hadoop on OpenStack Use Cases
• Dev and QA teams - fast cluster provisioning
• Data Scientists/Analysts - API to run analytic jobs with infrastructure provisioning happening under the hood
• Administrators - centralized cluster management and monitoring
Savanna Key Principles
The goal is to create a native OpenStack component to provision and operate Hadoop clusters on top of OpenStack. Key characteristics:
• Open source
• Native for OpenStack
• Support for different Hadoop distributions
• Makes resources dedicated to IaaS cloud available for Hadoop workloads
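One way to picture "support for different Hadoop distributions" is a plugin interface that each distribution implements. The sketch below is purely hypothetical: the class names, role names, and `plan_cluster` function are illustrations and do not reflect Savanna's actual code or REST API.

```python
from abc import ABC, abstractmethod


class ProvisioningPlugin(ABC):
    """Hypothetical plugin interface; not Savanna's actual API."""

    name = "base"

    @abstractmethod
    def node_roles(self, workers):
        """Return one list of Hadoop roles per VM to be started."""


class VanillaHadoopPlugin(ProvisioningPlugin):
    """Plain Apache Hadoop layout: one master plus workers."""

    name = "vanilla"

    def node_roles(self, workers):
        master = [["namenode", "jobtracker"]]
        return master + [["datanode", "tasktracker"] for _ in range(workers)]


class VendorPlugin(ProvisioningPlugin):
    """Vendor layout that adds a separate management VM (an assumption)."""

    name = "vendor"

    def node_roles(self, workers):
        head = [["manager"], ["namenode", "jobtracker"]]
        return head + [["datanode", "tasktracker"] for _ in range(workers)]


PLUGINS = {plugin.name: plugin for plugin in (VanillaHadoopPlugin(), VendorPlugin())}


def plan_cluster(distribution, workers):
    """Turn a distribution name and worker count into a list of VM specs."""
    if distribution not in PLUGINS:
        raise ValueError(f"unknown distribution: {distribution}")
    roles = PLUGINS[distribution].node_roles(workers)
    return [{"vm": f"hadoop-vm-{index}", "roles": vm_roles}
            for index, vm_roles in enumerate(roles)]


vanilla_plan = plan_cluster("vanilla", 3)
vendor_plan = plan_cluster("vendor", 3)
```

The provisioning core only needs the list of VM specs; which roles go on which VM stays inside the plugin for each distribution.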
Savanna Architecture Overview
Diagram components: Savanna Python Client, REST API, Cluster Configuration Manager, Horizon (Savanna Pages), Keystone, Auth, DAL, Nova, Glance, Swift, Provisioning Plugin, VM Manager, Image Registry, Hadoop VMs
Savanna Roadmap
Phase 1 – Completed, April 13th
Basic cluster provisioning with “pre-built” images
Phase 2 – In Progress, July 15th
Pluggable mechanism of integration with vendor tooling
and cluster operations support
Phase 3 – Scoping, 2-3 months
“Analytics as a service”: job execution framework, support for different scripting languages
Learn more about Savanna
• All code and documentation open source
• Latest version 0.1.2 from 05/13
• Launchpad home page
• https://launchpad.net/savanna
• Code on stackforge
o Integrated with OpenStack CI/CD
o https://github.com/stackforge/savanna
• Active community
• https://lists.launchpad.net/savanna-all/
Live Demo
Savanna with Intel Distribution
at Intel Booth
Evolving Hadoop for the Data Society

More Related Content

What's hot

Beyond Kerberos and Ranger - Tips to discover, track and manage risks in hybr...
Beyond Kerberos and Ranger - Tips to discover, track and manage risks in hybr...Beyond Kerberos and Ranger - Tips to discover, track and manage risks in hybr...
Beyond Kerberos and Ranger - Tips to discover, track and manage risks in hybr...DataWorks Summit
 
Just the sketch: advanced streaming analytics in Apache Metron
Just the sketch: advanced streaming analytics in Apache MetronJust the sketch: advanced streaming analytics in Apache Metron
Just the sketch: advanced streaming analytics in Apache MetronDataWorks Summit
 
Power of Splunk Search Processing Language (SPL)
Power of Splunk Search Processing Language (SPL)Power of Splunk Search Processing Language (SPL)
Power of Splunk Search Processing Language (SPL)Splunk
 
Fighting cyber fraud with hadoop
Fighting cyber fraud with hadoopFighting cyber fraud with hadoop
Fighting cyber fraud with hadoopNiel Dunnage
 
How to Design, Build and Map IT and Business Services in Splunk
How to Design, Build and Map IT and Business Services in SplunkHow to Design, Build and Map IT and Business Services in Splunk
How to Design, Build and Map IT and Business Services in SplunkSplunk
 
Deep Learning in Security - Examples, Infrastructure, Challenges, and Suggest...
Deep Learning in Security - Examples, Infrastructure, Challenges, and Suggest...Deep Learning in Security - Examples, Infrastructure, Challenges, and Suggest...
Deep Learning in Security - Examples, Infrastructure, Challenges, and Suggest...DataWorks Summit
 
Splunk Ninjas: New Features and Search Dojo
Splunk Ninjas: New Features and Search DojoSplunk Ninjas: New Features and Search Dojo
Splunk Ninjas: New Features and Search DojoSplunk
 
How to Design, Build and Map IT and Business Services in Splunk
How to Design, Build and Map IT and Business Services in SplunkHow to Design, Build and Map IT and Business Services in Splunk
How to Design, Build and Map IT and Business Services in SplunkSplunk
 
PaNDA - a platform for Network Data Analytics: an overview
PaNDA - a platform for Network Data Analytics: an overviewPaNDA - a platform for Network Data Analytics: an overview
PaNDA - a platform for Network Data Analytics: an overviewCisco DevNet
 
Getting Started with Big Data: Planning Guide
Getting Started with Big Data: Planning GuideGetting Started with Big Data: Planning Guide
Getting Started with Big Data: Planning GuideIntel IT Center
 
Splunk 101
Splunk 101Splunk 101
Splunk 101Splunk
 
HDF 3.2 - What's New
HDF 3.2 - What's NewHDF 3.2 - What's New
HDF 3.2 - What's NewHortonworks
 
Hortonworks - IBM Cognitive - The Future of Data Science
Hortonworks - IBM Cognitive - The Future of Data ScienceHortonworks - IBM Cognitive - The Future of Data Science
Hortonworks - IBM Cognitive - The Future of Data ScienceThiago Santiago
 
What’s New: Splunk App for Stream and Splunk MINT
What’s New: Splunk App for Stream and Splunk MINTWhat’s New: Splunk App for Stream and Splunk MINT
What’s New: Splunk App for Stream and Splunk MINTSplunk
 
Using The Internet of Things for Population Health Management - StampedeCon 2016
Using The Internet of Things for Population Health Management - StampedeCon 2016Using The Internet of Things for Population Health Management - StampedeCon 2016
Using The Internet of Things for Population Health Management - StampedeCon 2016StampedeCon
 
Social Media Monitoring with NiFi, Druid and Superset
Social Media Monitoring with NiFi, Druid and SupersetSocial Media Monitoring with NiFi, Druid and Superset
Social Media Monitoring with NiFi, Druid and SupersetThiago Santiago
 
Splunk for IT Operations
Splunk for IT OperationsSplunk for IT Operations
Splunk for IT OperationsSplunk
 
Machine Data 101 Hands-on
Machine Data 101 Hands-onMachine Data 101 Hands-on
Machine Data 101 Hands-onSplunk
 
A Tale of Two Regulations: Cross-Border Data Protection For Big Data Under GD...
A Tale of Two Regulations: Cross-Border Data Protection For Big Data Under GD...A Tale of Two Regulations: Cross-Border Data Protection For Big Data Under GD...
A Tale of Two Regulations: Cross-Border Data Protection For Big Data Under GD...DataWorks Summit/Hadoop Summit
 
What's New in 6.3 + Data On-Boarding
What's New in 6.3 + Data On-BoardingWhat's New in 6.3 + Data On-Boarding
What's New in 6.3 + Data On-BoardingSplunk
 

What's hot (20)

Beyond Kerberos and Ranger - Tips to discover, track and manage risks in hybr...
Beyond Kerberos and Ranger - Tips to discover, track and manage risks in hybr...Beyond Kerberos and Ranger - Tips to discover, track and manage risks in hybr...
Beyond Kerberos and Ranger - Tips to discover, track and manage risks in hybr...
 
Just the sketch: advanced streaming analytics in Apache Metron
Just the sketch: advanced streaming analytics in Apache MetronJust the sketch: advanced streaming analytics in Apache Metron
Just the sketch: advanced streaming analytics in Apache Metron
 
Power of Splunk Search Processing Language (SPL)
Power of Splunk Search Processing Language (SPL)Power of Splunk Search Processing Language (SPL)
Power of Splunk Search Processing Language (SPL)
 
Fighting cyber fraud with hadoop
Fighting cyber fraud with hadoopFighting cyber fraud with hadoop
Fighting cyber fraud with hadoop
 
How to Design, Build and Map IT and Business Services in Splunk
How to Design, Build and Map IT and Business Services in SplunkHow to Design, Build and Map IT and Business Services in Splunk
How to Design, Build and Map IT and Business Services in Splunk
 
Deep Learning in Security - Examples, Infrastructure, Challenges, and Suggest...
Deep Learning in Security - Examples, Infrastructure, Challenges, and Suggest...Deep Learning in Security - Examples, Infrastructure, Challenges, and Suggest...
Deep Learning in Security - Examples, Infrastructure, Challenges, and Suggest...
 
Splunk Ninjas: New Features and Search Dojo
Splunk Ninjas: New Features and Search DojoSplunk Ninjas: New Features and Search Dojo
Splunk Ninjas: New Features and Search Dojo
 
How to Design, Build and Map IT and Business Services in Splunk
How to Design, Build and Map IT and Business Services in SplunkHow to Design, Build and Map IT and Business Services in Splunk
How to Design, Build and Map IT and Business Services in Splunk
 
PaNDA - a platform for Network Data Analytics: an overview
PaNDA - a platform for Network Data Analytics: an overviewPaNDA - a platform for Network Data Analytics: an overview
PaNDA - a platform for Network Data Analytics: an overview
 
Getting Started with Big Data: Planning Guide
Getting Started with Big Data: Planning GuideGetting Started with Big Data: Planning Guide
Getting Started with Big Data: Planning Guide
 
Splunk 101
Splunk 101Splunk 101
Splunk 101
 
HDF 3.2 - What's New
HDF 3.2 - What's NewHDF 3.2 - What's New
HDF 3.2 - What's New
 
Hortonworks - IBM Cognitive - The Future of Data Science
Hortonworks - IBM Cognitive - The Future of Data ScienceHortonworks - IBM Cognitive - The Future of Data Science
Hortonworks - IBM Cognitive - The Future of Data Science
 
What’s New: Splunk App for Stream and Splunk MINT
What’s New: Splunk App for Stream and Splunk MINTWhat’s New: Splunk App for Stream and Splunk MINT
What’s New: Splunk App for Stream and Splunk MINT
 
Using The Internet of Things for Population Health Management - StampedeCon 2016
Using The Internet of Things for Population Health Management - StampedeCon 2016Using The Internet of Things for Population Health Management - StampedeCon 2016
Using The Internet of Things for Population Health Management - StampedeCon 2016
 
Social Media Monitoring with NiFi, Druid and Superset
Social Media Monitoring with NiFi, Druid and SupersetSocial Media Monitoring with NiFi, Druid and Superset
Social Media Monitoring with NiFi, Druid and Superset
 
Splunk for IT Operations
Splunk for IT OperationsSplunk for IT Operations
Splunk for IT Operations
 
Machine Data 101 Hands-on
Machine Data 101 Hands-onMachine Data 101 Hands-on
Machine Data 101 Hands-on
 
A Tale of Two Regulations: Cross-Border Data Protection For Big Data Under GD...
A Tale of Two Regulations: Cross-Border Data Protection For Big Data Under GD...A Tale of Two Regulations: Cross-Border Data Protection For Big Data Under GD...
A Tale of Two Regulations: Cross-Border Data Protection For Big Data Under GD...
 
What's New in 6.3 + Data On-Boarding
What's New in 6.3 + Data On-BoardingWhat's New in 6.3 + Data On-Boarding
What's New in 6.3 + Data On-Boarding
 

Similar to Evolving Hadoop for the Data Society

Intel APJ Enterprise Day - Keynote by RK Hiremane
Intel APJ Enterprise Day - Keynote by RK HiremaneIntel APJ Enterprise Day - Keynote by RK Hiremane
Intel APJ Enterprise Day - Keynote by RK HiremaneIntelAPAC
 
Partner Keynote: Intel - The New Frontier of Cloud Computing
Partner Keynote: Intel - The New Frontier of Cloud ComputingPartner Keynote: Intel - The New Frontier of Cloud Computing
Partner Keynote: Intel - The New Frontier of Cloud ComputingAmazon Web Services
 
Intel’s Big Data and Hadoop Security Initiatives - StampedeCon 2014
Intel’s Big Data and Hadoop Security Initiatives - StampedeCon 2014Intel’s Big Data and Hadoop Security Initiatives - StampedeCon 2014
Intel’s Big Data and Hadoop Security Initiatives - StampedeCon 2014StampedeCon
 
AWS Summit Berlin 2013 - Big Data Analytics
AWS Summit Berlin 2013 - Big Data AnalyticsAWS Summit Berlin 2013 - Big Data Analytics
AWS Summit Berlin 2013 - Big Data AnalyticsAWS Germany
 
Dell AI Telecom Webinar
Dell AI Telecom WebinarDell AI Telecom Webinar
Dell AI Telecom WebinarBill Wong
 
Phoenix Data Conference - Big Data Analytics for IoT 11/4/17
Phoenix Data Conference - Big Data Analytics for IoT 11/4/17Phoenix Data Conference - Big Data Analytics for IoT 11/4/17
Phoenix Data Conference - Big Data Analytics for IoT 11/4/17Mark Goldstein
 
Trusted Analytics as a Service (BDT209) | AWS re:Invent 2013
Trusted Analytics as a Service (BDT209) | AWS re:Invent 2013Trusted Analytics as a Service (BDT209) | AWS re:Invent 2013
Trusted Analytics as a Service (BDT209) | AWS re:Invent 2013Amazon Web Services
 
Girish Juneja - Intel Big Data & Cloud Summit 2013
Girish Juneja - Intel Big Data & Cloud Summit 2013Girish Juneja - Intel Big Data & Cloud Summit 2013
Girish Juneja - Intel Big Data & Cloud Summit 2013IntelAPAC
 
Big Data Intel® Platform
Big Data Intel® PlatformBig Data Intel® Platform
Big Data Intel® Platformxband
 
Data Amp South Africa - SQL Server 2017
Data Amp South Africa - SQL Server 2017Data Amp South Africa - SQL Server 2017
Data Amp South Africa - SQL Server 2017Travis Wright
 
Big Data Technologies.pdf
Big Data Technologies.pdfBig Data Technologies.pdf
Big Data Technologies.pdfRAHULRAHU8
 
Streaming Data and Stream Processing with Apache Kafka
Streaming Data and Stream Processing with Apache KafkaStreaming Data and Stream Processing with Apache Kafka
Streaming Data and Stream Processing with Apache Kafkaconfluent
 
How to scale your PaaS with OVH infrastructure?
How to scale your PaaS with OVH infrastructure?How to scale your PaaS with OVH infrastructure?
How to scale your PaaS with OVH infrastructure?OVHcloud
 
Big data intel platform commenting
Big data   intel platform commentingBig data   intel platform commenting
Big data intel platform commentingIntel IT Center
 
Excellent slides on the new z13s announced on 16th Feb 2016
Excellent slides on the new z13s announced on 16th Feb 2016Excellent slides on the new z13s announced on 16th Feb 2016
Excellent slides on the new z13s announced on 16th Feb 2016Luigi Tommaseo
 
Splunk hunkbeta
Splunk hunkbetaSplunk hunkbeta
Splunk hunkbetaAhnku Toh
 
SQL Server 2017 Overview and Partner Opportunities
SQL Server 2017 Overview and Partner OpportunitiesSQL Server 2017 Overview and Partner Opportunities
SQL Server 2017 Overview and Partner OpportunitiesTravis Wright
 
Advanced Analytics: Going From Big Data to Big Answers
Advanced Analytics: Going From Big Data to Big AnswersAdvanced Analytics: Going From Big Data to Big Answers
Advanced Analytics: Going From Big Data to Big Answersajayc47
 
Using the Power of Big SQL 3.0 to Build a Big Data-Ready Hybrid Warehouse
Using the Power of Big SQL 3.0 to Build a Big Data-Ready Hybrid WarehouseUsing the Power of Big SQL 3.0 to Build a Big Data-Ready Hybrid Warehouse
Using the Power of Big SQL 3.0 to Build a Big Data-Ready Hybrid WarehouseRizaldy Ignacio
 

Similar to Evolving Hadoop for the Data Society (20)

Intel APJ Enterprise Day - Keynote by RK Hiremane
Intel APJ Enterprise Day - Keynote by RK HiremaneIntel APJ Enterprise Day - Keynote by RK Hiremane
Intel APJ Enterprise Day - Keynote by RK Hiremane
 
Partner Keynote: Intel - The New Frontier of Cloud Computing
Partner Keynote: Intel - The New Frontier of Cloud ComputingPartner Keynote: Intel - The New Frontier of Cloud Computing
Partner Keynote: Intel - The New Frontier of Cloud Computing
 
Intel’s Big Data and Hadoop Security Initiatives - StampedeCon 2014
Intel’s Big Data and Hadoop Security Initiatives - StampedeCon 2014Intel’s Big Data and Hadoop Security Initiatives - StampedeCon 2014
Intel’s Big Data and Hadoop Security Initiatives - StampedeCon 2014
 
AWS Summit Berlin 2013 - Big Data Analytics
AWS Summit Berlin 2013 - Big Data AnalyticsAWS Summit Berlin 2013 - Big Data Analytics
AWS Summit Berlin 2013 - Big Data Analytics
 
Dell AI Telecom Webinar
Dell AI Telecom WebinarDell AI Telecom Webinar
Dell AI Telecom Webinar
 
Phoenix Data Conference - Big Data Analytics for IoT 11/4/17
Phoenix Data Conference - Big Data Analytics for IoT 11/4/17Phoenix Data Conference - Big Data Analytics for IoT 11/4/17
Phoenix Data Conference - Big Data Analytics for IoT 11/4/17
 
The Intel Xeon Scalable Processor and IoT
The Intel Xeon Scalable Processor and IoTThe Intel Xeon Scalable Processor and IoT
The Intel Xeon Scalable Processor and IoT
 
Trusted Analytics as a Service (BDT209) | AWS re:Invent 2013
Trusted Analytics as a Service (BDT209) | AWS re:Invent 2013Trusted Analytics as a Service (BDT209) | AWS re:Invent 2013
Trusted Analytics as a Service (BDT209) | AWS re:Invent 2013
 
Girish Juneja - Intel Big Data & Cloud Summit 2013
Girish Juneja - Intel Big Data & Cloud Summit 2013Girish Juneja - Intel Big Data & Cloud Summit 2013
Girish Juneja - Intel Big Data & Cloud Summit 2013
 
Big Data Intel® Platform
Big Data Intel® PlatformBig Data Intel® Platform
Big Data Intel® Platform
 
Data Amp South Africa - SQL Server 2017
Data Amp South Africa - SQL Server 2017Data Amp South Africa - SQL Server 2017
Data Amp South Africa - SQL Server 2017
 
Big Data Technologies.pdf
Big Data Technologies.pdfBig Data Technologies.pdf
Big Data Technologies.pdf
 
Streaming Data and Stream Processing with Apache Kafka
Streaming Data and Stream Processing with Apache KafkaStreaming Data and Stream Processing with Apache Kafka
Streaming Data and Stream Processing with Apache Kafka
 
How to scale your PaaS with OVH infrastructure?
How to scale your PaaS with OVH infrastructure?How to scale your PaaS with OVH infrastructure?
How to scale your PaaS with OVH infrastructure?
 
Big data intel platform commenting
Big data   intel platform commentingBig data   intel platform commenting
Big data intel platform commenting
 
Excellent slides on the new z13s announced on 16th Feb 2016
Excellent slides on the new z13s announced on 16th Feb 2016Excellent slides on the new z13s announced on 16th Feb 2016
Excellent slides on the new z13s announced on 16th Feb 2016
 
Splunk hunkbeta
Splunk hunkbetaSplunk hunkbeta
Splunk hunkbeta
 
SQL Server 2017 Overview and Partner Opportunities
SQL Server 2017 Overview and Partner OpportunitiesSQL Server 2017 Overview and Partner Opportunities
SQL Server 2017 Overview and Partner Opportunities
 
Advanced Analytics: Going From Big Data to Big Answers
Advanced Analytics: Going From Big Data to Big AnswersAdvanced Analytics: Going From Big Data to Big Answers
Advanced Analytics: Going From Big Data to Big Answers
 
Using the Power of Big SQL 3.0 to Build a Big Data-Ready Hybrid Warehouse
Using the Power of Big SQL 3.0 to Build a Big Data-Ready Hybrid WarehouseUsing the Power of Big SQL 3.0 to Build a Big Data-Ready Hybrid Warehouse
Using the Power of Big SQL 3.0 to Build a Big Data-Ready Hybrid Warehouse
 

Recently uploaded

Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Bluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfBluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfngoud9212
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
Science&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfScience&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfjimielynbastida
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDGMarianaLemus7
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Neo4j
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 

Evolving Hadoop for the Data Society

  • 1. INTEL CONFIDENTIAL, FOR INTERNAL USE ONLY. Evolving Hadoop for the Data Society: Open Platform for Next-Gen Analytics. vin.sharma, strategy & marketing. open source x open data
  • 3. Virtuous cycle of data-driven innovation. Cycle labels: CLOUD; richer data to analyze; CLIENTS; richer user experiences; richer data from devices; INTELLIGENT SYSTEMS. 2.8 zettabytes of data generated worldwide in 2012 (1). 40 zettabytes of data will be generated worldwide in 2020 (1). Sources: (1) IDC Digital Universe 2020, (2) IDC
  • 4. Democratize data analysis. Enhance scientific understanding, drive innovation, and accelerate medical cures. Create new data-driven business models, reduce resource waste, improve organizational processes. Increase public safety with smart traffic and improve energy efficiency with smart grids.
  • 6. Data-Intensive Discovery (diagram; layers: Data Management, Data Analysis, Data Value). Domains: Life Sciences, Physical Sciences, Social Sciences. Data: genome data, EMR, clinical trials, sensor data, images, sim data, census data, text, A/V, surveys. Analysis and value labels: drug discovery, treatment optimization, hypothesis formation, modeling & prediction, astronomy, particle physics, public policy, trend analysis.
  • 7. Data-Intensive Discovery: Genomics (Intel Distribution). Value: enable researchers to discover biomarkers and drug targets by correlating genomic data sets; 90% gain in throughput; 6X data compression. Analytics: provide curated data sets with pre-computed analysis (classification, correlation, biomarkers); provide APIs for applications to combine and analyze public and private data sets. Data Management: use Hive and Hadoop for query and search; dynamically partition and scale HBase; 10-node cluster with Intel Xeon E5 processors; 10GbE network.
  • 8. Data-Driven Business (diagram; layers: Data Management, Data Analysis, Data Value). Sectors: Telco, Retail, FSI. Data: content, CDR, IP traffic, shop, product, customer behavior, transactions. Analysis and value labels: customer service, network optimization, product innovation, market insight, business efficiency, behavior modeling, fraud analytics, client engagement.
  • 9. Data-Driven Business: Customer Service (CDR subscriber self-service). Value: 300 million wireless subscribers; enable subscriber access to billing data; 30X gain in performance; lower TCO. Analytics: provides real-time retrieval of 6 months of data; supports new BI with 15 types of queries; enables targeted ad serving and promotions. Data Management: use Hadoop/HBase for search and analysis; 30 TB/month of billing data; 300K reads/second; 800K inserts/second; 133-node cluster with Intel Xeon E5 processors.
  • 10. Data-Rich Communities (diagram; layers: Data Management, Data Analysis, Data Value). Sectors: Utilities, Police & Security, Government Services. Data: meter data, infrastructure data, monitor data, behavior, ID, demographics. Analysis and value labels: customer service, network optimization, smart grids, safe streets, crime detection, crime prevention, service agility, waste & fraud analysis, ID programs.
  • 11. Data-Rich Communities: Smart City. Value: enforce traffic laws and detect license fraud; monitor and predict traffic patterns; in a city of 31 million people. Analytics: detect traffic law violations automatically; detect driver license fraud by data mining; forecast traffic with predictive analytics. Data Management: 30,000 cameras; 6 Mb/s stream rate per camera; 15 PB of images in active use; 2 billion records in HBase. Diagram labels: Detection, Prevention, Regional, Local.
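
The camera figures on the Smart City slide imply a large aggregate ingest rate. The following is a rough back-of-the-envelope sketch, not a figure from the deck. It assumes that "Mb/s" means megabits per second, that units are decimal, and that every camera streams continuously; the slide states none of these.

```python
# Figures taken from the slide.
CAMERAS = 30_000
STREAM_MBIT_PER_S = 6        # per camera; assumed to be megabits, not megabytes
ACTIVE_IMAGES_PB = 15        # images in active use

def aggregate_gbit_per_s(cameras: int, mbit_per_s: float) -> float:
    """Total stream rate across all cameras, in gigabits per second."""
    return cameras * mbit_per_s / 1_000

def petabytes_per_day(gbit_per_s: float) -> float:
    """Raw volume per day if the stream never pauses (decimal units)."""
    gbytes_per_s = gbit_per_s / 8
    return gbytes_per_s * 86_400 / 1_000_000

total_gbit = aggregate_gbit_per_s(CAMERAS, STREAM_MBIT_PER_S)
daily_pb = petabytes_per_day(total_gbit)
days_covered = ACTIVE_IMAGES_PB / daily_pb

print(f"aggregate rate: {total_gbit:.0f} Gb/s")
print(f"raw volume:     {daily_pb:.3f} PB/day")
print(f"15 PB covers:   {days_covered:.1f} days of raw stream")
```

Under these assumptions the cameras produce about 180 Gb/s in aggregate, so 15 PB would hold roughly a week of unfiltered video. The deck does not say how the images are sampled or retained, so the real retention window may differ.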
  • 14. At the intersection of transformative forces. HPC: enabling exascale (10^18) computing on massive data sets. Cloud: helping enterprises build open interoperable clouds. Open Source: contributing code and fostering ecosystem.
  • 15. Intel® Distribution for Apache Hadoop* software. Hardware-enhanced performance & security. Enables partner innovation in analytics. Strengthens Apache Hadoop* ecosystem. * Other names and brands may be claimed as the property of others.
  • 16. Intel® Distribution for Apache Hadoop* software version 3.x. Intel® Manager for Apache Hadoop software: deployment, configuration, monitoring, alerts, and security. Components: HDFS 2.0.3 (Hadoop Distributed File System); YARN (MRv2) (distributed processing framework); HBase 0.96.1 (columnar store); ZooKeeper 3.4.5 (coordination); Flume 1.3.0 (log collector); Sqoop 1.4.1 (data exchange); Pig 0.9.2 (scripting); Hive 0.10.0 (SQL query); Oozie 3.3.0 (workflow); Mahout 0.7 (machine learning); HCatalog (metadata); Connectors (ingest, analysis, visual). Legend: Intel proprietary; Intel enhancements contributed to open source; open source components included without change. All external names and brands are claimed as the property of others.
  • 17. Intel® Distribution for Apache Hadoop* software version 2.3. Feature areas: Hardware-enhanced Security, Optimized Performance, Simplified Management. Features: file-based encryption in HDFS; up to 20x faster decryption with AES-NI*; role-based access control for Hadoop services; up to 8.5X faster Hive queries using HBase co-processor; adaptive data replication in HDFS and HBase; optimized for SSD with Cache Acceleration Software; integrated text search with Lucene; simplified deployment & comprehensive monitoring; automated configuration with Intel® Active Tuner; deployment of HBase across multiple datacenters; detailed profiling of Hadoop jobs; simplified design of HBase schemas (+ in 2.4); REST APIs for deployment and management (+ in 2.4). *Based on internal testing.
  • 18. Intel® Distribution for Apache Hadoop* software version 3.0. Cell-level ACLs in HBase; encryption support in Hive and Pig; secure inter-node communication with SSL; compression and CRC with SSE 4.2; up to 8.5X faster Hive queries using HBase co-processor; adaptive replication in HDFS and HBase; snapshot support in Hadoop; SNMP support for monitoring; Hadoop 2.0.3 and YARN support; Lustre support; GlusterFS support; HCatalog support. *Based on internal testing.
  • 20. Enterprise data requires defense in depth. Layers: Firewall, Gateway, AuthN, AuthZ, Encryption, Audit & Alerts, Containment.
  • 21. Intel Expressway protects Hadoop APIs. Enforces consistent security policies across all Hadoop services; serves as a trusted proxy to Hadoop, HBase, and WebHDFS APIs; complies with Common Criteria EAL4+, HSM, FIPS 140-2 certifications; deploys as software, virtual appliance, or hardware appliance. Diagram labels: Firewall, AuthN, RBAC, Encryption, Containment; REST APIs: HCatalog, Stargate, WebHDFS.
  • 22. Kerberos authenticates Hadoop services. Diagram (KDC, five numbered steps), labeled: request ticket, send service ticket, request service, send response, validate ticket. Intel Manager: wizard enables setup of secure cluster with encrypted key exchange; Manager generates principal and keytab for Hadoop services; Manager enables batch upload of keytab files. Other diagram labels: Firewall, APIs, Authentication, Encryption, Containment.
  • 23. Manager simplifies role-based access control. File, table, and service-level controls; Intel Manager pushes ACLs to each node. Diagram labels: Firewall, AuthZ.
  • 24. Intel Distribution provides HDFS encryption. Extends compression codec into crypto codec; provides an abstract API for general use. Diagram labels: MapReduce stages (RecordReader, Map, Combiner, Partitioner, Local Merge & Sort, Reduce, RecordWriter); HDFS; Decrypt; Encrypt; Derivative Encrypt; Derivative Decrypt; Firewall; RBAC.
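
The idea of extending a compression codec into a crypto codec can be illustrated with a small sketch in which both kinds of codec share one bytes-in, bytes-out interface, so the record reader and writer do not need to know which one they hold. This is not Hadoop's Java API or Intel's implementation: the class and function names are hypothetical, and the XOR cipher is a toy placeholder that is not secure (the deck says the actual framework uses AES through OpenSSL).

```python
import zlib
from abc import ABC, abstractmethod
from itertools import cycle


class Codec(ABC):
    """Hypothetical codec interface: transforms bytes on write and on read."""

    @abstractmethod
    def encode(self, data: bytes) -> bytes: ...

    @abstractmethod
    def decode(self, data: bytes) -> bytes: ...


class ZlibCodec(Codec):
    """An ordinary compression codec."""

    def encode(self, data: bytes) -> bytes:
        return zlib.compress(data)

    def decode(self, data: bytes) -> bytes:
        return zlib.decompress(data)


class ToyXorCryptoCodec(Codec):
    """Same interface, but encode/decode mean encrypt/decrypt.

    XOR with a repeating key is a placeholder only and is NOT secure.
    """

    def __init__(self, key: bytes):
        if not key:
            raise ValueError("key must not be empty")
        self._key = key

    def _xor(self, data: bytes) -> bytes:
        return bytes(b ^ k for b, k in zip(data, cycle(self._key)))

    def encode(self, data: bytes) -> bytes:
        return self._xor(data)

    def decode(self, data: bytes) -> bytes:
        return self._xor(data)


def write_record(codec: Codec, record: bytes) -> bytes:
    """Stand-in for a record writer: applies the codec before storage."""
    return codec.encode(record)


def read_record(codec: Codec, stored: bytes) -> bytes:
    """Stand-in for a record reader: reverses the codec after loading."""
    return codec.decode(stored)


record = b"subscriber=42,amount=19.99"
for codec in (ZlibCodec(), ToyXorCryptoCodec(b"demo-key")):
    stored = write_record(codec, record)
    assert read_record(codec, stored) == record
    print(type(codec).__name__, "round-trips", len(stored), "stored bytes")
```

The design point is that the reader and writer depend only on the abstract interface, which is presumably what lets encryption slot into the places in the pipeline where compression already happens.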
  • 25. Intel AES-NI accelerates decryption 20x. AES encryption speed (MB/s) at 64k / 4k / 1k: AES-NI 460 / 457 / 454; no AES-NI 87 / 87 / 86 (chart label: 6X). AES decryption speed (MB/s) at 64k / 4k / 1k: AES-NI 1266 / 1259 / 1253; no AES-NI 64 / 63 / 63 (chart label: 20X). OpenSSL 1.0.1c optimized to use Intel AES-NI (7 math functions in processor accelerate AES); Intel Distribution crypto framework uses OpenSSL 1.0.1c; patch and design document released to open source (JIRA HADOOP-9331). Software and workloads used in performance tests may have been optimized for performance only on Intel® microprocessors. Performance tests, such as SYSmark* and MobileMark*, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more information go to http://www.intel.com/performance.
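
The speedup labels on the AES-NI slide can be checked against the plotted values with a few lines of arithmetic. The numbers below are copied from the slide; the dictionary layout and function name are only for illustration.

```python
# Throughput in MB/s as plotted on the slide, in the order 64k, 4k, 1k.
SIZES = ["64k", "4k", "1k"]
SPEEDS_MB_S = {
    "encryption": {"aes_ni": [460, 457, 454], "no_aes_ni": [87, 87, 86]},
    "decryption": {"aes_ni": [1266, 1259, 1253], "no_aes_ni": [64, 63, 63]},
}

def speedups(with_ni, without_ni):
    """Ratio of AES-NI throughput to software-only throughput per size."""
    return [fast / slow for fast, slow in zip(with_ni, without_ni)]

for operation, series in SPEEDS_MB_S.items():
    ratios = speedups(series["aes_ni"], series["no_aes_ni"])
    summary = ", ".join(f"{size}: {r:.1f}x" for size, r in zip(SIZES, ratios))
    print(f"{operation}: {summary}")
```

The decryption ratios come out at roughly 20x, matching the slide title. The encryption ratios come out at about 5.3x, somewhat below the 6X label on the chart.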
  • 26. Learn more about Intel and Hadoop. Intel Training and Certification; Case Studies and Resources. Unique insights that help you tune, secure, and manage your deployment in addition to essential understanding of Apache Hadoop; distilled from years of Intel experience in deploying and optimizing Apache Hadoop and HBase for enterprises; based on Intel expertise in optimizing the full Hadoop stack, from Hive on Hadoop through Java to Linux on x86 hardware. http://hadoop.intel.com http://www.intel.com/bigdata
  • 28. Savanna: Hadoop on OpenStack. Ilya Elterman, Senior Director, Cloud Services
  • 29. Hadoop on OpenStack Use Cases. Dev and QA teams: fast cluster provisioning. Data scientists/analysts: API to run analytic jobs, with infrastructure provisioning happening under the hood. Administrators: centralized cluster management and monitoring.
  • 30. Savanna Key Principles. The goal is to create a native OpenStack component to provision and operate Hadoop clusters on top of OpenStack. Key characteristics: open source; native for OpenStack; support for different Hadoop distributions; makes resources dedicated to IaaS cloud available for Hadoop workloads.
  • 32. Savanna Roadmap. Phase 1 (completed, April 13th): basic cluster provisioning with "pre-built" images. Phase 2 (in progress, July 15th): pluggable mechanism of integration with vendor tooling and cluster operations support. Phase 3 (scoping, 2-3 months): "Analytics as a service", with a job execution framework and support for different scripting languages.
  • 33. Learn more about Savanna. All code and documentation open source. Latest version 0.1.2 from 05/13. Launchpad home page: https://launchpad.net/savanna. Code on stackforge (integrated with OpenStack CI/CD): https://github.com/stackforge/savanna. Active community: https://lists.launchpad.net/savanna-all/
  • 34. Live Demo: Savanna with Intel Distribution at Intel Booth