Evolution of Big Data at Intel - Crawl, Walk and Run Approach
Gomathy Bala | Director
Chandhu Yalla | Manager & Architect
Key Contributors: Sonja Sandeen, Seshu Edala, Nghia Ngo and Darin Watson
IT BI Big Data Team
Copyright © 2014, Intel Corporation. All rights reserved.
Legal Notices
This presentation is for informational purposes only. INTEL MAKES NO WARRANTIES, EXPRESS OR IMPLIED, IN THIS SUMMARY.
The content in this presentation is being shared under NDA.
Intel and the Intel logo are trademarks of Intel Corporation in the U.S. and/or other countries.
* Other names and brands may be claimed as the property of others.
Agenda
• Intel IT Big Data Journey
• Enterprise DW architecture
• BI Big Data 3-Year Roadmap
• Big Data Ecosystem Architecture
• Platform Strategies & BKMs
• Summary
Intel IT Big Data Journey
Timeline: 2011 | 2012 | 2013 | 2014 | 2015
• Big Data & Analytics Strategy
• Production Online
• Telmap: 1st Use Case
• Preproduction Online
• Hadoop Evaluation
• IDH to CDH
• Hadoop 2.0
• $176M BV
• Production: Security BI, Attribute Reduction System, ATM Ellipses Engine, IAH-Retail Analytics
• 6 Environments
• CDH 5.3
• 4 Use Cases in Preproduction
• 12 POC Use Cases
• 6 Use Cases in Production
• $290K investment
• $948/TB
• 3 Use Cases in Production
• Smart-What, Marketing-IAH, Incident Predictability
• $6M BV
• CDH 5.1
• IAH – Cloud CRM In Production
• Enterprise Standards, Guidance, Processes for Platform & Capabilities
15 Active Use Cases | $290K + 10.5 HC Investment | Delivered $182M BV
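The headline figure can be cross-checked against the business value (BV) milestones quoted on the timeline. The sketch below is only a sanity check: it assumes the $6M and $176M figures are the only components of the $182M total, and the dictionary keys and variable names are illustrative rather than taken from the deck.

```python
# Business value (BV) milestones quoted on the timeline, in millions of USD.
# Assumption: these two figures are the only components of the headline total.
bv_milestones_musd = {"$6M BV milestone": 6, "$176M BV milestone": 176}

total_bv_musd = sum(bv_milestones_musd.values())

# The quoted $290K investment; the 10.5 headcount (HC) is not costed in the
# deck, so the ratio below overstates the real return.
investment_musd = 0.290
bv_per_dollar = total_bv_musd / investment_musd

print(f"Total BV: ${total_bv_musd}M")
print(f"BV per invested dollar (excluding headcount): {bv_per_dollar:.0f}x")
```

The sum matches the "Delivered $182M BV" summary line; the ratio is indicative only because headcount cost is excluded.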
Big Data & Analytics Really Delivers!
From 2014 – 2015 Intel IT Business Review – Annual Edition
Kim's Video
2014-2017 Vision: Real-Time Enterprise
Any Data Source
ERP
In Memory Real-Time Data Platform
CRM
SCM
SRM
ECC
BW
ECCW
Real-Time & Self Service Analytics Platform
MDG
NW
Teradata Cloudera Hadoop Data Lake
Reporting Tools
Data Tiering: Hot-Cold data
Enterprise Data Warehouse
Other Apps
Custom
Intel
…
NRT
Predictive Analytics
BPC
BCS
Cloud
BI
SaaS
New Apps.
Downstream Applications
Enterprise Data Architecture with Hadoop and Other MPP DWH
Current & Future Strategy
Future | Present
FE Tools
CLS/Proxy
High speed data loader
Big Data
• Machine Learning
• Log Processing
• Unstructured data
Use Cases
• High volume counter analytics
• Text Parsing/Mining
• Strategic/Operational reporting
• Interactive Reporting
Use Cases
• High concurrent user analytics – Supply/Order
• Mission critical analytics – Finance/HR
SQL on Hadoop
EDW
Mfg Data
A percentage of traditional BI use cases
IMT
BI Big Data | 3-Year Roadmap
2015: Big Data + AA
2016: Big Data + SSAA + Traditional BI
2017: Big Data + SSAA + Traditional BI
Scalable and well designed Hadoop Platform
• Evolve IMT + Hadoop
• Data Lineage & Data Catalog
• Streaming Capabilities
• Advanced SQL on Hadoop
• ACID semantics
• Evolve Big Data + SSAA per ecosystem roadmaps
• BC/DR
• End to end enterprise features
• Enterprise ready: OLAP and Traditional DW
Hadoop is an open source framework designed for big data analytics. Hadoop is evolving rapidly, but it will still take a couple of years for it to mature and support "traditional BI" use cases.
Legend
Orange Text: Traditional BI Capabilities
Green Text: Big Data/AA Capabilities
• Security (RBAC, ITS/IRS)
• Data Governance
• Data Discovery
• Self Service AA Framework
• IMT + Hadoop
• AVP + Hadoop
• In-memory + Near real time capabilities
• SQL on Hadoop
Big Data Platform – Ecosystem Architecture & Maturity
User layer: End User | Data Steward | Business Analyst | Data Scientist | Developer | Auditor
Presentation Layer: Data Virtualization | Data Discovery | Adv. Analytics | Adv. Visualization
Analytical layer: Machine Learning | Statistical | Numerical | Time series | Textual/Log | Spatial | Graph
Processing Layer: NRT/Stream Processing | In-Memory Processing | Batch Processing
Storage Model: Textual/Log DB | Hierarchy DB | Relational DB | Graph DB | Columnar DB
Data Integration: Data Ingestion | Data Egression | Source/Target APIs | 3rd Party Drivers
Middleware: Ent. Scheduler Srvs | Metadata Mgmt | Workload Mgmt
Infrastructure: Platform Virtualization | Platform Management | Network Management | Systems Management
Data Management
Data Consumption
Continuous Integration | Dev Framework | Security
Processes: Prescriptive Guidance | Change | Release | Governance | Engagement | Service Management | Training | Support
Legend: Other Vendors offered capabilities | Majority CDH offered capabilities
BI Big Data Platform
Hadoop Project Sandbox – CDH 5.3
Multiple Instances
Deployed on Intel Cloud & MyCloud
environments. TTM to business: 2-3 Days
Hadoop Pre-Production – CDH 5.3
10 data nodes | 399TB | 320 vcores
Use cases in Dev/POC: 14
Hadoop Production – CDH 5.3
22 data nodes | 658TB | 704 vcores
Use cases Live in prod: 7
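The per-node sizing implied by these figures can be worked out directly. This is a minimal sketch using only the node counts, storage, and vcore totals quoted above; the `Cluster` class and its property names are illustrative, and the deck does not say whether the TB figures are raw or usable capacity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    """Cluster totals as quoted on the slide (hypothetical container type)."""
    name: str
    data_nodes: int
    storage_tb: float
    vcores: int

    @property
    def tb_per_node(self) -> float:
        # Average storage per data node; raw vs. usable is not stated in the deck.
        return self.storage_tb / self.data_nodes

    @property
    def vcores_per_node(self) -> float:
        return self.vcores / self.data_nodes


clusters = [
    Cluster("Pre-Production", data_nodes=10, storage_tb=399, vcores=320),
    Cluster("Production", data_nodes=22, storage_tb=658, vcores=704),
]

for c in clusters:
    print(f"{c.name}: {c.tb_per_node:.1f} TB/node, {c.vcores_per_node:.0f} vcores/node")
```

Both environments work out to 32 vcores per data node, while average storage per node differs between them.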
• Hadoop 2.0 architecture provides reliability, scalability & performance
• High availability and scalability design
• Well positioned to meet 2015 business use case requirements
• Repeatable architecture for faster builds
• Capacity additions: add data nodes (white boxes, waterfall equipment, or HP servers)
• TTM: Varies depending on HW (3 wks-2 months)
Cluster diagram labels:
• Name Node / Resource Mgr (shown twice; NN hi-av)
• Data Nodes (five shown): heartbeat, balancing, replication
• YARN: scale to meet business needs
• Gateway nodes: Login (ssh) with AD authentication & authorization, access cluster, run HDFS commands, submit jobs, etc.
• Management Node
• Job/Workflow Management
• Source Data: DB data, unstructured, semi-structured
• Data Movement/ETL
• EDW or Datamart: DB data
• Visualization Tools
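The gateway-node workflow named on this slide (ssh login, HDFS commands, job submission) might look roughly like the sketch below. It only builds the command lines and does not execute them. The host name, user, paths, JAR, and class name are hypothetical; `hdfs dfs -put`, `hdfs dfs -ls`, and `yarn jar` are standard Hadoop CLI commands, but the deck does not show which commands Intel's users actually run.

```python
import shlex
from typing import List


def hdfs_put(local_path: str, hdfs_dir: str) -> List[str]:
    """Argument vector for copying a local file into HDFS."""
    return ["hdfs", "dfs", "-put", local_path, hdfs_dir]


def hdfs_ls(hdfs_dir: str) -> List[str]:
    """Argument vector for listing an HDFS directory."""
    return ["hdfs", "dfs", "-ls", hdfs_dir]


def submit_jar(jar_path: str, main_class: str, *args: str) -> List[str]:
    """Argument vector for submitting a job JAR through YARN."""
    return ["yarn", "jar", jar_path, main_class, *args]


def over_ssh(user: str, gateway: str, command: List[str]) -> List[str]:
    """Wrap a command so that it would run on the gateway node via ssh."""
    return ["ssh", f"{user}@{gateway}", shlex.join(command)]


# Hypothetical session: land a file, check it, then submit a job.
steps = [
    hdfs_put("events.log", "/data/landing"),
    hdfs_ls("/data/landing"),
    submit_jar("wordcount.jar", "com.example.WordCount", "/data/landing", "/data/out"),
]
for step in steps:
    print(" ".join(over_ssh("analyst", "gateway.example.com", step)))
```

Keeping the commands as argument lists makes them easy to pass to a process launcher later without re-parsing shell strings.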
BKMs | Summary
• Skills and resources, with time to ramp up.
• Starting small is OK. Focus on design and scalability for the platform.
• Technical product evaluation: stick with a distribution built on the core open source Hadoop stack rather than proprietary software.
• Security is a big deal to Intel; implementing Big Data security capabilities is a key focus.
• To understand the data, use an iterative discovery method with technical, business, and modeling teams.
• Intel IT's Big Data journey benefited heavily from the Cloudera partnership.
• Open source will play a big role in advancing Big Data capabilities and analytics.
BI Big Data IT@Intel Resource Info
BI Big Data IT@Intel Resource Links:
1. Hadoop Migration Success Story: How Intel IT Moved to Cloudera
2. Mining Big Data in the Enterprise for Better Business Intelligence
3. Enabling Big Data Platforms and Solutions with Centralized Data Management
4. Integrating Apache Hadoop* into Intel’s Big Data Environment
5. Using a Multiple Data Warehouse Strategy to Improve BI Analytics
To learn more: www.intel.com/bigdata
Q & A
Intel Confidential — Do Not Forward
Backup
Big Data Capability Catalog
List includes only the capabilities planned for next 6 months.
Status legend: Enabled | WIP | Planned
Availability legend: Avail. Now | 1-3 Months | 3-6+ Months
Cloudera* Distribution of Hadoop (CDH)
Components: Hive | HDFS | MapReduce | ZooKeeper | Pig | Mahout | HDFS Compress | Whirr | HBase | Storm | HCatalog | Accumulo | YARN | Spark | Hue | Solr | Impala | Parquet | DataFu | Oozie | Kafka | Sqoop | Flume | Sentry
3rd Party SW/Connectors and Integration: Sqoop JDBC | Other DW | Autosys | SecureGIT | Impala JDBC | Hive ODBC | Impala ODBC | TDCH | Google Analytics | SFDC
Data Integration: DI Gateway | SFTP | SMBClient | Camel
Cloudera Manager* (System Management): Deployment | Monitoring | Reporting | Diagnostics | Alerting | Service Management | Rolling Upgrades | Config Rollbacks
Cloudera Navigator* (Data Management): Audit | Access Control | Discovery | Explore | Lineage | Lifecycle
Infrastructure: Network | Servers | Storage | Security | OS | Hi-Av | EAM / AD Integration
Process: Governance | Change | Release | Engagement | Service mgmt. | Prescriptive Guidance | Training
Migration to Cloudera – 6 BKMs
i. Find Differences with a Comparative Evaluation in a Sandbox Environment
ii. Define Your Strategy for the Cloudera Implementation
iii. Split the Hardware Environment
iv. Upgrade the Hadoop Version
v. Create a Preproduction-to-Production Pipeline
vi. Rebalance the Data
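The last BKM, rebalancing the data, can be illustrated with the threshold rule the HDFS balancer uses: a data node counts as balanced when its utilization is within a threshold (in percentage points) of the overall cluster utilization. The sketch below only classifies nodes and moves no data; the node names and capacities are hypothetical, and the deck does not state which threshold Intel used.

```python
from typing import Dict, List, Tuple


def classify_nodes(
    used_tb: Dict[str, float],
    capacity_tb: Dict[str, float],
    threshold_pct: float = 10.0,
) -> Tuple[float, List[str], List[str]]:
    """Return (cluster utilization %, over-utilized nodes, under-utilized nodes)."""
    cluster_util = 100.0 * sum(used_tb.values()) / sum(capacity_tb.values())
    over: List[str] = []
    under: List[str] = []
    for node in sorted(used_tb):
        node_util = 100.0 * used_tb[node] / capacity_tb[node]
        if node_util > cluster_util + threshold_pct:
            over.append(node)   # candidate source of blocks to move
        elif node_util < cluster_util - threshold_pct:
            under.append(node)  # candidate destination for blocks
    return cluster_util, over, under


# Hypothetical cluster after adding two fresh nodes (dn3, dn4) during a migration.
capacity = {"dn1": 30.0, "dn2": 30.0, "dn3": 30.0, "dn4": 30.0}
used = {"dn1": 27.0, "dn2": 24.0, "dn3": 3.0, "dn4": 6.0}

cluster_util, over, under = classify_nodes(used, capacity)
print(f"cluster utilization: {cluster_util:.0f}%")
print("over-utilized:", over)
print("under-utilized:", under)
```

A skew like this is typical after splitting hardware or adding nodes, which is presumably why rebalancing appears as the final step.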
Building Block Strategy to Enterprise Security of Hadoop
Q1’15: Perimeter access with LDAP + finer grain controls with Sentry. The second building block towards enterprise grade security design.
Q2’15: Add Kerberos to enable more Hadoop components and further secure the platform.
2H’15: Exploration starting, awaiting product and target to adopt in 2H’15 in Production.
Timeline: Now | Q2’15 | 2H’15
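The "finer grain controls with Sentry" building block rests on role-based authorization: users belong to groups (for example from LDAP), groups are granted roles, and roles carry privileges on objects. The sketch below is a toy model of that lookup chain, not the Sentry API; every user, group, role, and table name in it is hypothetical.

```python
from typing import Dict, Set, Tuple

# A privilege is an (action, object) pair, e.g. ("SELECT", "sales_db.orders").
Privilege = Tuple[str, str]

# Hypothetical directory and policy data.
user_groups: Dict[str, Set[str]] = {"alice": {"analysts"}, "bob": {"etl"}}
group_roles: Dict[str, Set[str]] = {"analysts": {"read_sales"}, "etl": {"load_sales"}}
role_privileges: Dict[str, Set[Privilege]] = {
    "read_sales": {("SELECT", "sales_db.orders")},
    "load_sales": {("SELECT", "sales_db.orders"), ("INSERT", "sales_db.orders")},
}


def is_allowed(user: str, action: str, obj: str) -> bool:
    """Resolve user -> groups -> roles -> privileges and check for a match."""
    for group in user_groups.get(user, set()):
        for role in group_roles.get(group, set()):
            if (action, obj) in role_privileges.get(role, set()):
                return True
    return False  # default deny: unknown users and ungranted actions are refused


print(is_allowed("alice", "SELECT", "sales_db.orders"))
print(is_allowed("alice", "INSERT", "sales_db.orders"))
```

Granting privileges to roles rather than to individual users keeps the policy small as the number of users grows.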
Hadoop Maturity & Evolution
Hadoop 1.0 (2005 - 2012)
Stack: MapReduce (batch data processing, cluster resource management) over HDFS 1.0 (redundant, reliable data storage)
+ Scalable data storage and processing platform
+ Positioned for batch processing workloads, Map and Reduce only
+ Apache Hive offers a SQL-like query language
- Lacks reliability and stability
- No support for low latency queries
Hadoop 2.0 (2013 - 2014)
Stack: Batch (MapReduce) and Others (data processing) over YARN (cluster resource management) over HDFS (redundant, reliable data storage)
• Apache YARN allows you to run multiple applications in Hadoop and provides reliability, scalability and performance
• Advanced Resource Management
• Apache Hive offers a 50x improvement in performance for queries
• Cloudera Impala to support low latency query requirements with SQL-92 and SQL-2000 support
2015 - 2017
Stack: applications run natively in Hadoop: Interactive (Impala), In-Memory (Spark), Batch (MapReduce), Online (HBase), Graph, Others (Search, Storm etc.), over YARN (cluster resource management) over HDFS 2.0 (redundant, reliable data storage)
• Data at Rest Encryption and Row Level/Cell Level Security planned
• Data Streaming and Search Capability
• GraphDB
• Expanded Data Governance
• IMT + Hadoop Integration
• Improved Front End tool integration/support
• Deeper Diagnostics for multiple components
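The MapReduce batch model that this slide starts from can be shown in miniature as a word count with explicit map, shuffle, and reduce phases. This is a single-process illustration of the programming model only; it does not use Hadoop, and the function names are illustrative.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input record."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


print(word_count(["big data", "Big Data at Intel"]))
```

On a real cluster the map and reduce calls run in parallel across data nodes and the shuffle moves data over the network, which is one reason this model suits batch work better than low latency queries.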
2014 Intel IT Vital Statistics
>6,300 IT employees
59 global IT sites
>98,000 Intel employees (1)
168 Intel sites in 65 countries
64 data centers (91 data centers in 2010)
80% of servers virtualized (42% virtualized in 2010, goal of 75%)
>147,000 devices
100% of laptops encrypted
100% of laptops with SSDs
>43,200 handheld devices
57 mobile applications developed
Source: Information provided by Intel IT as of Jan 2014
(1) Total employee count does not include wholly owned subsidiaries that Intel IT does not directly support
Big Data in the Industry
• Recommendation Engine
• Fraud Detection
• Sentiment Analytics
• Behavioral Targeting
• Customer Experience Analytics
• Marketing Campaign Analytics
Learn more about Intel IT’s Initiatives at
www.intel.com/IT
Sharing Intel IT Best Practices
With the World

More Related Content

What's hot

Big data by Mithlesh sadh
Big data by Mithlesh sadhBig data by Mithlesh sadh
Big data by Mithlesh sadh
Mithlesh Sadh
 
Big Data
Big DataBig Data
Big Data
Vinayak Kamath
 
Big Data Open Source Technologies
Big Data Open Source TechnologiesBig Data Open Source Technologies
Big Data Open Source Technologies
neeraj rathore
 
Three Big Data Case Studies
Three Big Data Case StudiesThree Big Data Case Studies
Three Big Data Case Studies
Atidan Technologies Pvt Ltd (India)
 
Data Warehouse - Incremental Migration to the Cloud
Data Warehouse - Incremental Migration to the CloudData Warehouse - Incremental Migration to the Cloud
Data Warehouse - Incremental Migration to the Cloud
Michael Rainey
 
Data Privacy in the DMBOK - No Need to Reinvent the Wheel
Data Privacy in the DMBOK - No Need to Reinvent the WheelData Privacy in the DMBOK - No Need to Reinvent the Wheel
Data Privacy in the DMBOK - No Need to Reinvent the Wheel
DATAVERSITY
 
Architect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh ArchitectureArchitect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh Architecture
Databricks
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)
James Serra
 
Everis big data_wilson_v1.4
Everis big data_wilson_v1.4Everis big data_wilson_v1.4
Everis big data_wilson_v1.4
wilson_lucas
 
Introduction to big data
Introduction to big dataIntroduction to big data
Introduction to big data
Hari Priya
 
Oracle Cloud Infrastructure.pptx
Oracle Cloud Infrastructure.pptxOracle Cloud Infrastructure.pptx
Oracle Cloud Infrastructure.pptx
GarvitNTT
 
Big Data
Big DataBig Data
Big Data
Rohit Jain
 
A Roadmap to Data Migration Success
A Roadmap to Data Migration SuccessA Roadmap to Data Migration Success
A Roadmap to Data Migration Success
FindWhitePapers
 
Ppt for Application of big data
Ppt for Application of big dataPpt for Application of big data
Ppt for Application of big data
Prashant Sharma
 
Data Cleansing
Data CleansingData Cleansing
Big Data Analytics
Big Data AnalyticsBig Data Analytics
Big Data Analytics
RohithND
 
Microsoft Cloud Computing
Microsoft Cloud ComputingMicrosoft Cloud Computing
Microsoft Cloud Computing
David Chou
 
Big Data and Classification
Big Data and ClassificationBig Data and Classification
Big Data and Classification
303Computing
 
Migrating to Cloud: Inhouse Hadoop to Databricks (3)
Migrating to Cloud: Inhouse Hadoop to Databricks (3)Migrating to Cloud: Inhouse Hadoop to Databricks (3)
Migrating to Cloud: Inhouse Hadoop to Databricks (3)
Knoldus Inc.
 
Key Elements of a Successful Data Governance Program
Key Elements of a Successful Data Governance ProgramKey Elements of a Successful Data Governance Program
Key Elements of a Successful Data Governance Program
DATAVERSITY
 

What's hot (20)

Big data by Mithlesh sadh
Big data by Mithlesh sadhBig data by Mithlesh sadh
Big data by Mithlesh sadh
 
Big Data
Big DataBig Data
Big Data
 
Big Data Open Source Technologies
Big Data Open Source TechnologiesBig Data Open Source Technologies
Big Data Open Source Technologies
 
Three Big Data Case Studies
Three Big Data Case StudiesThree Big Data Case Studies
Three Big Data Case Studies
 
Data Warehouse - Incremental Migration to the Cloud
Data Warehouse - Incremental Migration to the CloudData Warehouse - Incremental Migration to the Cloud
Data Warehouse - Incremental Migration to the Cloud
 
Data Privacy in the DMBOK - No Need to Reinvent the Wheel
Data Privacy in the DMBOK - No Need to Reinvent the WheelData Privacy in the DMBOK - No Need to Reinvent the Wheel
Data Privacy in the DMBOK - No Need to Reinvent the Wheel
 
Architect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh ArchitectureArchitect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh Architecture
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)
 
Everis big data_wilson_v1.4
Everis big data_wilson_v1.4Everis big data_wilson_v1.4
Everis big data_wilson_v1.4
 
Introduction to big data
Introduction to big dataIntroduction to big data
Introduction to big data
 
Oracle Cloud Infrastructure.pptx
Oracle Cloud Infrastructure.pptxOracle Cloud Infrastructure.pptx
Oracle Cloud Infrastructure.pptx
 
Big Data
Big DataBig Data
Big Data
 
A Roadmap to Data Migration Success
A Roadmap to Data Migration SuccessA Roadmap to Data Migration Success
A Roadmap to Data Migration Success
 
Ppt for Application of big data
Ppt for Application of big dataPpt for Application of big data
Ppt for Application of big data
 
Data Cleansing
Data CleansingData Cleansing
Data Cleansing
 
Big Data Analytics
Big Data AnalyticsBig Data Analytics
Big Data Analytics
 
Microsoft Cloud Computing
Microsoft Cloud ComputingMicrosoft Cloud Computing
Microsoft Cloud Computing
 
Big Data and Classification
Big Data and ClassificationBig Data and Classification
Big Data and Classification
 
Migrating to Cloud: Inhouse Hadoop to Databricks (3)
Migrating to Cloud: Inhouse Hadoop to Databricks (3)Migrating to Cloud: Inhouse Hadoop to Databricks (3)
Migrating to Cloud: Inhouse Hadoop to Databricks (3)
 
Key Elements of a Successful Data Governance Program
Key Elements of a Successful Data Governance ProgramKey Elements of a Successful Data Governance Program
Key Elements of a Successful Data Governance Program
 

Viewers also liked

Big Data Platform Processes Daily Healthcare Data for Clinic Use at Mayo Clinic
Big Data Platform Processes Daily Healthcare Data for Clinic Use at Mayo ClinicBig Data Platform Processes Daily Healthcare Data for Clinic Use at Mayo Clinic
Big Data Platform Processes Daily Healthcare Data for Clinic Use at Mayo Clinic
DataWorks Summit
 
Hadoop crash course workshop at Hadoop Summit
Hadoop crash course workshop at Hadoop SummitHadoop crash course workshop at Hadoop Summit
Hadoop crash course workshop at Hadoop Summit
DataWorks Summit
 
Spark crash course workshop at Hadoop Summit
Spark crash course workshop at Hadoop SummitSpark crash course workshop at Hadoop Summit
Spark crash course workshop at Hadoop Summit
DataWorks Summit
 
Bigger, Faster, Easier: Building a Real-Time Self Service Data Analytics Ecos...
Bigger, Faster, Easier: Building a Real-Time Self Service Data Analytics Ecos...Bigger, Faster, Easier: Building a Real-Time Self Service Data Analytics Ecos...
Bigger, Faster, Easier: Building a Real-Time Self Service Data Analytics Ecos...DataWorks Summit
 
Internet of Things Crash Course Workshop at Hadoop Summit
Internet of Things Crash Course Workshop at Hadoop SummitInternet of Things Crash Course Workshop at Hadoop Summit
Internet of Things Crash Course Workshop at Hadoop Summit
DataWorks Summit
 
Airflow - An Open Source Platform to Author and Monitor Data Pipelines
Airflow - An Open Source Platform to Author and Monitor Data PipelinesAirflow - An Open Source Platform to Author and Monitor Data Pipelines
Airflow - An Open Source Platform to Author and Monitor Data PipelinesDataWorks Summit
 
Internet of things Crash Course Workshop
Internet of things Crash Course WorkshopInternet of things Crash Course Workshop
Internet of things Crash Course Workshop
DataWorks Summit
 
June 10 145pm hortonworks_tan & welch_v2
June 10 145pm hortonworks_tan & welch_v2June 10 145pm hortonworks_tan & welch_v2
June 10 145pm hortonworks_tan & welch_v2DataWorks Summit
 
large scale collaborative filtering using Apache Giraph
large scale collaborative filtering using Apache Giraphlarge scale collaborative filtering using Apache Giraph
large scale collaborative filtering using Apache Giraph
DataWorks Summit
 
Apache Lens: Unified OLAP on Realtime and Historic Data
Apache Lens: Unified OLAP on Realtime and Historic DataApache Lens: Unified OLAP on Realtime and Historic Data
Apache Lens: Unified OLAP on Realtime and Historic DataDataWorks Summit
 
From Beginners to Experts, Data Wrangling for All
From Beginners to Experts, Data Wrangling for AllFrom Beginners to Experts, Data Wrangling for All
From Beginners to Experts, Data Wrangling for All
DataWorks Summit
 
Hadoop Performance Optimization at Scale, Lessons Learned at Twitter
Hadoop Performance Optimization at Scale, Lessons Learned at TwitterHadoop Performance Optimization at Scale, Lessons Learned at Twitter
Hadoop Performance Optimization at Scale, Lessons Learned at Twitter
DataWorks Summit
 
Hadoop Eagle - Real Time Monitoring Framework for eBay Hadoop
Hadoop Eagle - Real Time Monitoring Framework for eBay HadoopHadoop Eagle - Real Time Monitoring Framework for eBay Hadoop
Hadoop Eagle - Real Time Monitoring Framework for eBay Hadoop
DataWorks Summit
 
How to use Parquet as a Sasis for ETL and Analytics
How to use Parquet as a Sasis for ETL and AnalyticsHow to use Parquet as a Sasis for ETL and Analytics
How to use Parquet as a Sasis for ETL and AnalyticsDataWorks Summit
 
Improving HDFS Availability with IPC Quality of Service
Improving HDFS Availability with IPC Quality of ServiceImproving HDFS Availability with IPC Quality of Service
Improving HDFS Availability with IPC Quality of ServiceDataWorks Summit
 
Apache Kylin - Balance Between Space and Time
Apache Kylin - Balance Between Space and TimeApache Kylin - Balance Between Space and Time
Apache Kylin - Balance Between Space and Time
DataWorks Summit
 
Scaling HDFS to Manage Billions of Files with Key-Value Stores
Scaling HDFS to Manage Billions of Files with Key-Value StoresScaling HDFS to Manage Billions of Files with Key-Value Stores
Scaling HDFS to Manage Billions of Files with Key-Value Stores
DataWorks Summit
 
Sqoop on Spark for Data Ingestion
Sqoop on Spark for Data IngestionSqoop on Spark for Data Ingestion
Sqoop on Spark for Data Ingestion
DataWorks Summit
 
a Secure Public Cache for YARN Application Resources
a Secure Public Cache for YARN Application Resourcesa Secure Public Cache for YARN Application Resources
a Secure Public Cache for YARN Application ResourcesDataWorks Summit
 
Comparison of Transactional Libraries for HBase
Comparison of Transactional Libraries for HBaseComparison of Transactional Libraries for HBase
Comparison of Transactional Libraries for HBase
DataWorks Summit/Hadoop Summit
 

Viewers also liked (20)

Big Data Platform Processes Daily Healthcare Data for Clinic Use at Mayo Clinic
Big Data Platform Processes Daily Healthcare Data for Clinic Use at Mayo ClinicBig Data Platform Processes Daily Healthcare Data for Clinic Use at Mayo Clinic
Big Data Platform Processes Daily Healthcare Data for Clinic Use at Mayo Clinic
 
Hadoop crash course workshop at Hadoop Summit
Hadoop crash course workshop at Hadoop SummitHadoop crash course workshop at Hadoop Summit
Hadoop crash course workshop at Hadoop Summit
 
Spark crash course workshop at Hadoop Summit
Spark crash course workshop at Hadoop SummitSpark crash course workshop at Hadoop Summit
Spark crash course workshop at Hadoop Summit
 
Bigger, Faster, Easier: Building a Real-Time Self Service Data Analytics Ecos...
Bigger, Faster, Easier: Building a Real-Time Self Service Data Analytics Ecos...Bigger, Faster, Easier: Building a Real-Time Self Service Data Analytics Ecos...
Bigger, Faster, Easier: Building a Real-Time Self Service Data Analytics Ecos...
 
Internet of Things Crash Course Workshop at Hadoop Summit
Internet of Things Crash Course Workshop at Hadoop SummitInternet of Things Crash Course Workshop at Hadoop Summit
Internet of Things Crash Course Workshop at Hadoop Summit
 
Airflow - An Open Source Platform to Author and Monitor Data Pipelines
Airflow - An Open Source Platform to Author and Monitor Data PipelinesAirflow - An Open Source Platform to Author and Monitor Data Pipelines
Airflow - An Open Source Platform to Author and Monitor Data Pipelines
 
Internet of things Crash Course Workshop
Internet of things Crash Course WorkshopInternet of things Crash Course Workshop
Internet of things Crash Course Workshop
 
June 10 145pm hortonworks_tan & welch_v2
June 10 145pm hortonworks_tan & welch_v2June 10 145pm hortonworks_tan & welch_v2
June 10 145pm hortonworks_tan & welch_v2
 
large scale collaborative filtering using Apache Giraph
large scale collaborative filtering using Apache Giraphlarge scale collaborative filtering using Apache Giraph
large scale collaborative filtering using Apache Giraph
 
Apache Lens: Unified OLAP on Realtime and Historic Data
Apache Lens: Unified OLAP on Realtime and Historic DataApache Lens: Unified OLAP on Realtime and Historic Data
Apache Lens: Unified OLAP on Realtime and Historic Data
 
From Beginners to Experts, Data Wrangling for All
From Beginners to Experts, Data Wrangling for AllFrom Beginners to Experts, Data Wrangling for All
From Beginners to Experts, Data Wrangling for All
 
Hadoop Performance Optimization at Scale, Lessons Learned at Twitter
Hadoop Performance Optimization at Scale, Lessons Learned at TwitterHadoop Performance Optimization at Scale, Lessons Learned at Twitter
Hadoop Performance Optimization at Scale, Lessons Learned at Twitter
 
Hadoop Eagle - Real Time Monitoring Framework for eBay Hadoop
Hadoop Eagle - Real Time Monitoring Framework for eBay HadoopHadoop Eagle - Real Time Monitoring Framework for eBay Hadoop
Hadoop Eagle - Real Time Monitoring Framework for eBay Hadoop
 
How to use Parquet as a Sasis for ETL and Analytics
How to use Parquet as a Sasis for ETL and AnalyticsHow to use Parquet as a Sasis for ETL and Analytics
How to use Parquet as a Sasis for ETL and Analytics
 
Improving HDFS Availability with IPC Quality of Service
Improving HDFS Availability with IPC Quality of ServiceImproving HDFS Availability with IPC Quality of Service
Improving HDFS Availability with IPC Quality of Service
 
Apache Kylin - Balance Between Space and Time
Apache Kylin - Balance Between Space and TimeApache Kylin - Balance Between Space and Time
Apache Kylin - Balance Between Space and Time
 
Scaling HDFS to Manage Billions of Files with Key-Value Stores
Scaling HDFS to Manage Billions of Files with Key-Value StoresScaling HDFS to Manage Billions of Files with Key-Value Stores
Scaling HDFS to Manage Billions of Files with Key-Value Stores
 
Sqoop on Spark for Data Ingestion
Sqoop on Spark for Data IngestionSqoop on Spark for Data Ingestion
Sqoop on Spark for Data Ingestion
 
a Secure Public Cache for YARN Application Resources
a Secure Public Cache for YARN Application Resourcesa Secure Public Cache for YARN Application Resources
a Secure Public Cache for YARN Application Resources
 
Comparison of Transactional Libraries for HBase
Comparison of Transactional Libraries for HBaseComparison of Transactional Libraries for HBase
Comparison of Transactional Libraries for HBase
 

Similar to Evolution of Big Data at Intel - Crawl, Walk and Run Approach

Horses for Courses: Database Roundtable
Horses for Courses: Database RoundtableHorses for Courses: Database Roundtable
Horses for Courses: Database Roundtable
Eric Kavanagh
 
Oracle Big Data Appliance and Big Data SQL for advanced analytics
Oracle Big Data Appliance and Big Data SQL for advanced analyticsOracle Big Data Appliance and Big Data SQL for advanced analytics
Oracle Big Data Appliance and Big Data SQL for advanced analytics
jdijcks
 
Exclusive Verizon Employee Webinar: Getting More From Your CDR Data
Exclusive Verizon Employee Webinar: Getting More From Your CDR DataExclusive Verizon Employee Webinar: Getting More From Your CDR Data
Exclusive Verizon Employee Webinar: Getting More From Your CDR DataPentaho
 
Impala Unlocks Interactive BI on Hadoop
Impala Unlocks Interactive BI on HadoopImpala Unlocks Interactive BI on Hadoop
Impala Unlocks Interactive BI on Hadoop
Cloudera, Inc.
 
Simplifying Real-Time Architectures for IoT with Apache Kudu
Simplifying Real-Time Architectures for IoT with Apache KuduSimplifying Real-Time Architectures for IoT with Apache Kudu
Simplifying Real-Time Architectures for IoT with Apache Kudu
Cloudera, Inc.
 
Pivotal deep dive_on_pivotal_hd_world_class_hdfs_platform
Pivotal deep dive_on_pivotal_hd_world_class_hdfs_platformPivotal deep dive_on_pivotal_hd_world_class_hdfs_platform
Pivotal deep dive_on_pivotal_hd_world_class_hdfs_platformEMC
 
EMC Pivotal overview deck
EMC Pivotal overview deckEMC Pivotal overview deck
EMC Pivotal overview deckmister_moun
 
Bigdata.sunil_6+yearsExp
Bigdata.sunil_6+yearsExpBigdata.sunil_6+yearsExp
Bigdata.sunil_6+yearsExpbigdata sunil
 
Bridging the Big Data Gap in the Software-Driven World
Bridging the Big Data Gap in the Software-Driven WorldBridging the Big Data Gap in the Software-Driven World
Bridging the Big Data Gap in the Software-Driven World
CA Technologies
 
Big data for Telco: opportunity or threat?
Big data for Telco: opportunity or threat?Big data for Telco: opportunity or threat?
Big data for Telco: opportunity or threat?
Swiss Big Data User Group
 
2014.07.11 biginsights data2014
2014.07.11 biginsights data20142014.07.11 biginsights data2014
2014.07.11 biginsights data2014
Wilfried Hoge
 
Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It! Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It!
Cécile Poyet
 
Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It! Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It!
Cécile Poyet
 
Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It! Boost Performance with Scala – Learn From Those Who’ve Done It!
Boost Performance with Scala – Learn From Those Who’ve Done It!
Hortonworks
 
A new platform for a new era emc
A new platform for a new era   emcA new platform for a new era   emc
A new platform for a new era emcTaldor Group
 
Big Data: Myths and Realities
Big Data: Myths and RealitiesBig Data: Myths and Realities
Big Data: Myths and Realities
Toronto-Oracle-Users-Group
 
The Future of Data Management: The Enterprise Data Hub
The Future of Data Management: The Enterprise Data HubThe Future of Data Management: The Enterprise Data Hub
The Future of Data Management: The Enterprise Data HubCloudera, Inc.
 
Data Virtualization: An Introduction
Data Virtualization: An IntroductionData Virtualization: An Introduction
Data Virtualization: An Introduction
Denodo
 

Evolution of Big Data at Intel - Crawl, Walk and Run Approach

  • 1. Evolution of Big Data at Intel - crawl, walk and run approach Gomathy Bala | Director Chandhu Yalla | Manager & Architect Key Contributors: Sonja Sandeen, Seshu Edala, Nghia Ngo and Darin Watson IT BI Big Data Team
  • 2. Copyright © 2014, Intel Corporation. All rights reserved. Legal Notices: This presentation is for informational purposes only. INTEL MAKES NO WARRANTIES, EXPRESS OR IMPLIED, IN THIS SUMMARY. The content in this presentation is being shared under NDA. Intel and the Intel logo are trademarks of Intel Corporation in the U.S. and/or other countries. * Other names and brands may be claimed as the property of others.
  • 3. Agenda: • Intel IT Big Data Journey • Enterprise DW architecture • BI Big Data 3 yr Roadmap • Big Data Ecosystem Architecture • Platform Strategies & BKMs • Summary
  • 4. Intel IT Big Data Journey (timeline: 2011, 2012, 2013, 2014, 2015). Milestones: Big Data & Analytics Strategy; Production Online; Telmap: 1st Use Case; Preproduction Online; Hadoop Evaluation; IDH to CDH; Hadoop 2.0; $176M BV; Production: Security BI, Attribute Reduction System, ATM Ellipses Engine, IAH-Retail Analytics; 6 Environments; CDH 5.3; 4 Use Cases in Preproduction; 12 POC Use Cases; 6 Use Cases in Production; $290K investment; $948/TB; 3 Use Cases in Production; Smart-What, Marketing-IAH, Incident Predictability; $6M BV; CDH 5.1; IAH – Cloud CRM In Production; Enterprise Standards, Guidance, Processes for Platform & Capabilities. Summary: 15 Active Use Cases | $290K + 10.5 HC Investment | Delivered $182M BV
  • 5. Big Data & Analytics Really Delivers! From 2014 – 2015 Intel IT Business Review – Annual Edition. Kim's Video
  • 6. 2014-2017 Vision: Real-Time Enterprise. Any Data Source ERP In Memory Real-Time Data Platform CRM SCM SRM ECC BW ECCW Real-Time & Self Service Analytics Platform MDG NW Teradata Cloudera Hadoop Data Lake Reporting Tools Data Tiering Hot-Cold data Enterprise Data Warehouse Other Apps Custom Intel … NRT Predictive Analytics BPC BCS Cloud BI SaaS New Apps. Downstream Applications
  • 7. FE Tools CLS/Proxy High speed data loader BigData • Machine Learning • Log Processing • Unstructured data Use Cases • High volume counter Analytics • Text Parsing/Mining • Strategic/Operational reporting • Interactive Reporting Use Cases • High Concurrent user analytics - Supply/Order • Mission critical analytics – Finance/HR SQL on Hadoop Enterprise Data Architecture with Hadoop and Other MPP DWH Current & Future Strategy Future Present EDW Mfg Data A percentage of Traditional BI use cases IMT
  • 8. BI Big Data | 3-Year Roadmap: Big Data + AA; Big Data + SSAA + Traditional BI; Big Data + SSAA + Traditional BI (2015, 2016, 2017). • Scalable and well-designed Hadoop Platform • Evolve IMT + Hadoop • Data Lineage & Data Catalog • Streaming Capabilities • Advanced SQL on Hadoop • ACID semantics • Evolve Big Data + SSAA per ecosystem roadmaps • BC/DR • End to end enterprise features • Enterprise ready: OLAP and Traditional DW. Hadoop is an open source framework designed for big data analytics. Hadoop is evolving rapidly, but it will still take a couple of years for it to mature and support “traditional BI” use cases. Legend: Orange Text: Traditional BI Capabilities; Green Text: Big Data/AA Capabilities. • Security (RBAC, ITS/IRS) • Data Governance • Data Discovery • Self Service AA Framework • IMT + Hadoop • AVP + Hadoop • In-memory + Near real time capabilities • SQL on Hadoop
  • 9. Big Data Platform – Ecosystem Architecture & Maturity. Data Integration NRT/Stream Processing In-Memory Processing Processing Layer Batch Processing Data Virtualization Data Discovery Adv. Analytics Adv. Visualization Data Management Presentation Layer End User Data Steward Business Analyst Data Scientist Developer User layer Auditor Machine Learning Analytical layer Statistical Numerical Time series Textual/Log Spatial Graph Textual/Log DB Hierarchy DB Relational DB Graph DB Storage Model Platform Virtualization Infrastructure Platform Management Network Management Systems Management Data Ingestion Continuous Integration Dev Framework Security Source/Target APIs 3rd Party Drivers Ent. Scheduler Srvs Metadata Mgmt Workload Mgmt Middleware *Other names and brands may be claimed as the property of others. Columnar DB Data Egression Other Vendors offered capabilities Majority CDH offered capabilities Data Consumption Prescriptive Guidance Change Release Governance Engagement Service Management Training Support Processes
  • 10. BI Big Data Platform. Hadoop Project Sandbox – CDH 5.3: Multiple Instances Deployed on Intel Cloud & MyCloud environments. TTM to business: 2-3 Days. Hadoop Pre-Production – CDH 5.3: 10 data nodes | 399TB | 320 vcores. Use cases in Dev/POC: 14. Hadoop Production – CDH 5.3: 22 data nodes | 658TB | 704 vcores. Use cases Live in prod: 7. • Hadoop 2.0 architecture provides reliability, scalability & performance • High availability and scalability design • Well positioned to meet 2015 business use case requirements • Repeatable architecture for faster builds. • Capacity additions: Add data node. White boxes, Waterfall equipment or HP servers • TTM: Varies depending on HW (3 wks-2 months). Job/Workflow Management Data Node Data Node Data Node Data Node Data Node Name Node Resource Mgr Name Node Resource Mgr heartbeat, balancing, replication YARN Scale to meet business needs Gateway Nodes (NN hi-av) Gateway nodes Login (ssh) : AD authentication & authorization, access cluster, run HDFS commands, submit jobs, etc. Management Node Source Data DB Data Visualization Tools Data Movement/ETL EDW or Datamart DB data Unstructured Semi-structured
  • 11. BKMs | Summary: • Skills and resources need time to ramp up • Starting small is OK. Focus on design and scalability for the platform. • Technical product evaluation: stick with a distribution that is built on the core open source Hadoop stack rather than proprietary software • Security is a big deal to Intel; implementing Big Data security capabilities is a key focus • To understand the data, use an iterative discovery method with technical, business and modeling teams. • Intel IT Big Data Journey benefited heavily from Cloudera partnership • Open source will play a big role in advancing Big Data capabilities and analytics
  • 12. BI Big Data IT@Intel Resource Info. BI Big Data IT@Intel Resource Links: 1. Hadoop Migration Success Story: How Intel IT Moved to Cloudera 2. Mining Big Data in the Enterprise for Better Business Intelligence 3. Enabling Big Data Platforms and Solutions with Centralized Data Management 4. Integrating Apache Hadoop* into Intel’s Big Data Environment 5. Using a Multiple Data Warehouse Strategy to Improve BI Analytics To learn more: www.intel.com/bigdata
  • 13. Q & A
  • 14. Intel Confidential — Do Not Forward
  • 15. Backup
  • 16. Big Data Capability Catalog Hive HDFS MapReduce Zookeeper Pig Mahout Network Servers Storage Security OS Hi-Av EAM / AD Integration HDFS Compress WHIRR Hbase Governance Change Release Engagement Service mgmt. Prescriptive Guidance Training SQOOP JDBC Other DW Infrastructure Process Cloudera* Distribution of Hadoop (CDH) *Other names and brands may be claimed as the property of others. Storm Hcatalog ACCUMULO YARN SPARK Autosys SecureGIT Impala JDBC Hive ODBC 3rd Party SW/Connectors Integration HUE SOLR IMPALA PARQUET DataFu Impala ODBC TDCH Oozie Kafka Sqoop DI Gateway Flume SFTP SMBClient Data Integration Camel Enabled Planned WIP Avail. Now 1-3 Months 3-6+ Months Cloudera Manager* System Management Cloudera Navigator* Data Management Audit Access Control Discovery Explore Lineage Lifecycle Deployment Monitoring Reporting Diagnostics Alerting Service Management Rolling Upgrades Config Rollbacks List includes only the capabilities planned for next 6 months. Google Analytics SFDC Sentry
  • 17. Migration to Cloudera – 6 BKMs: i. Find Differences with a Comparative Evaluation in a Sandbox Environment ii. Define Your Strategy for the Cloudera Implementation iii. Split the Hardware Environment iv. Upgrade the Hadoop Version v. Create a Preproduction-to-Production Pipeline vi. Rebalance the Data
  • 18. Building Block Strategy to Enterprise Security of Hadoop. Q1’15: perimeter access with LDAP plus finer-grained controls with Sentry; the second building block towards an enterprise-grade security design. Q2’15: add Kerberos to enable more Hadoop components and further secure the platform. 2H’15: exploration starting; awaiting product, with a target to adopt in Production in 2H’15. Timeline: Now | Q2’15 | 2H’15
  • 19. Hadoop Maturity & Evolution. Hadoop 1.0: MapReduce (batch data processing, cluster resource management); HDFS 1.0 (redundant, reliable data storage). YARN (cluster resource management); HDFS 2.0 (redundant, reliable data storage); Interactive (Impala); In-Memory (Spark); Batch (MapReduce); Online (HBase); Others (Search, Storm etc.); Graph; Applications Run Natively In Hadoop. + Scalable data storage and processing platform + Positioned for batch processing workloads for Map and Reduce only + Apache Hive offers a SQL-like query language - Lacks reliability and stability - No support for low latency queries • Apache YARN allows you to run multiple applications in Hadoop and provides reliability, scalability and performance • Advanced Resource Management • Apache Hive offers a 50x improvement in performance for queries • Cloudera Impala to support low latency query requirements with SQL-92 and SQL-2000 support • Data at Rest Encryption and Row Level/Cell Level Security planned • Data Streaming and Search Capability • GraphDB • Expanded Data Governance • IMT + Hadoop Integration • Improved Front End tool integration/support • Deeper Diagnostics for multiple components. 2005 - 2012 | 2013 - 2014: Hadoop 2.0: HDFS (redundant, reliable data storage); YARN (cluster resource management); Batch (MapReduce); Others (data processing) | 2015 - 2017
  • 20. 2014 Intel IT Vital Statistics • >6,300 IT employees • 59 global IT sites • >98,000 Intel employees [1] • 168 Intel sites in 65 countries • 64 data centers (91 data centers in 2010) • 80% of servers virtualized (42% virtualized in 2010, goal of 75%) • >147,000 devices • 100% of laptops encrypted • 100% of laptops with SSDs • >43,200 handheld devices • 57 mobile applications developed. Source: Information provided by Intel IT as of Jan 2014. [1] Total employee count does not include wholly owned subsidiaries that Intel IT does not directly support.
  • 21. Big Data in the Industry: Recommendation Engine • Fraud Detection • Sentiment Analytics • Behavioral Targeting • Customer Experience Analytics • Marketing Campaign Analytics
  • 22. Sharing Intel IT Best Practices With the World. Learn more about Intel IT’s initiatives at www.intel.com/IT
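
Slide 19 describes Hadoop 1.0 as positioned for batch workloads built from Map and Reduce steps only. As a rough illustration of that model, the sketch below runs a word count through explicit map, shuffle and reduce phases in plain Python. It does not use Hadoop; the function names and the sample input are assumptions made for this example and do not come from the slides.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key, standing in for the framework's shuffle/sort step."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over an in-memory batch."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical input records, invented for illustration.
    sample = ["big data at Intel", "Big Data platform"]
    print(word_count(sample))
```

Each phase only sees its own input, which is why the real framework can spread map and reduce tasks across a cluster; here everything runs in one process to keep the sketch short.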

Editor's Notes

  1. 2
  2. Stream Processing or Complex Event Processing -- small chunks of data arriving at rapid intervals [smaller quantum, requiring transformation]. E.g. sensor data from manufacturing floors.
     Batch Processing -- aggregated chunks of data, perhaps collected over a long span, waiting to be analyzed in one run. OLAP processing. E.g. Gold path analysis on intel.com.
     In-memory processing -- running interactive analytics over large batches of summary/factual data by leveraging memory as the pre-emptive transient store. E.g. SQL aggregates/operational metrics from an OLAP process.
     Machine Learning -- class of unsupervised and supervised learning techniques destined for a decision support or an expert system.
       Unsupervised Learning (no "response" variable, just observations). Tools: Mahout.
         Clustering -- E.g. customer segmentation; clustering users by age, ethnicity, gender, income standards, geo, profession, and buying propensity for new form factors.
         Frequent pattern mining -- E.g. co-branding strategies. People buying RealSense cameras also download Intel XDK kits within 7 days of purchase.
       Supervised Learning [predicting a "response" variable when encountering a new "condition"; the response patterns are learned from prior training sets, of course]. Tools: H2O.
         Regression -- E.g. YoY growth for DCG Xeon co-processor shipments at 16% between 2011 and 2014. This year, we will ship 36 million units; current inventory levels are at 23 million.
         Classification -- E.g. Customer (Widgets Inc) responses to email automation and phone calls have been favorable in the last 3 months. The last upgrade was 2 years ago. The likelihood of an enterprise upgrade is "high".
     Textual -- class of algorithms that "derive" meaning from what is otherwise flat left-to-right, top-to-bottom "text". Shred sentence structure into nouns, verbs, adjectives and adverbs; count entities and turn "text" into "terms" [features]. Encode the features into a term-document or a "graph" representation so that traditional analytics, i.e. machine learning (supervised and unsupervised techniques), may be applied. Lucene and SOLR are useful for indexing/tokenizing text; NLTK or the Stanford parsers are useful to "tag" terms with classes of linguistic tokens such as nouns and verbs. E.g. identify service management tickets that entail Windows 8.1 issues.
     Log -- Logs are textual in syntax but do not possess linguistic rigor. Such content is useful when simply indexed as is and searched. The machines do not "decode" meaning. Humans synthesize and add logical rules when the content is surfaced back via a search interface. E.g. Logstash used to monitor errors in log4j logs of Hive jobs.
     Spatial -- class of problems that deal with the spatial layout of entities. E.g. every die is sacred: rationing and allocating sub-systems on a die via simulation techniques to minimize wastage and maximize "premium" quality. Or optimizing lithographic etches that minimize orthogonal cuts by employing space-filling heuristics.
     Statistical -- class of problems that infer patterns from data that exhibits stochastic characteristics. E.g. computing aggregations like stddev, min, max and avg yields of a graphics die, and performing outlier analysis.
     Numerical -- class of problems that deal with data that exhibits deterministic characteristics. E.g. Taguchi methods or iterative Monte Carlo methods that search for global minima/maxima. Genetic algorithms, deep learning methods/neural networks etc.
     Time-series -- class of problems that deal with data that exhibits stochasticity, but also exhibits temporal/seasonal resonance patterns. E.g. noise-cancellation filters that employ feedback loops, or predicting stock-price movement etc.
     Graph -- class of problems that compute statistics about entities connected to other entities. E.g. computing pagerank/link-popularity of a web page, congestion patterns of a traffic flow, sewage system planning etc.
     Storage Models
       Textual/Binary -- No DDL. All data is stored row-first, column-next where there is only one BLOB column per row. E.g. Zip files, mainframes.
       Relational -- well specified DDL, but data is stored row-first [co-located fields of a row]; locking semantics at row level. Yields faster entity retrievals but poorer compression ratios when heterogeneous fields co-exist in data. The index is built for row-offsets. E.g. Oracle, MySQL.
       Columnar -- well specified DDL, but data is stored column-first [all first names are co-stored in one file, last names co-stored in another etc]; locking semantics at cell level. Yields faster aggregates [min, max on a single field] and better compression ratios [because all fields of a columnar file are a homogeneous type]. But lacks atomic consistency because a record change translates into mutations in multiple "columnar/co-location" files. E.g. HBase, Cassandra.
       Hierarchy -- well specified structural definition. Mostly follows a denormalized parent-child taxonomy. All fields relevant to a record are stored as a "hierarchic document" a la XML or JSON document. Yields a great consistency model because the grain of the data is a "document". Any mutation will always mean a complete denormalized update of the full document (JSON or XML). E.g. MongoDB, CouchDB.
       GraphDB -- native adjacency property graph that stores entities as "vertices" of a graph, relations as "edges", and attributes as "properties". Since indices are combinatorially developed on all of them (entities, relations, and attributes), adjacency mining, filtering and mutations are performant and atomic. E.g. Neo4J, TitanDB.
  3. SLIDE PURPOSE: Who are we? We are the IT organization at Intel (IT@Intel). Core background information on Intel IT and our mission/goals/capabilities.
     Key Messages: We are the IT organization inside Intel's business. Our organization is a large, diverse, multinational enterprise with a wide variety of operational requirements and needs. Our Vision is to accelerate Intel's quest to connect and enrich the lives of every person on Earth by the end of the decade. Our Mission is to grow Intel's business through Information Technology by facilitating IT Consumerization, delivering IT efficiency and continuity through Cloud Computing, increasing employee productivity through seamless connectivity and Security, providing significant business value through Business Intelligence initiatives, and driving increased collaboration through Social Computing.
     Review some of the information/key stats shown here.
     Size and Location: 6,334 IT employees supporting over 98,000 employees. Note: Intel IT only reflects the number of employees we support directly (we exclude Intel employees who support wholly owned subsidiaries). Remote support is vital.
     Data Centers and Facilities: 59 Data Centers worldwide (down from 142 in 2007). [Need to confirm this data: ~55,000 servers (down from 100,000 in 2007) consuming a large electrical and power/cooling load (roughly 55MW total power). Our Data Centers also support 300M email messages (per month) and >2,183 Terabytes of WAN traffic (per month)] and provide 45 petabytes of raw storage capacity.
     Employee / Client Technology: Support over 147K devices (note the >1 device-per-employee ratio; this ratio is growing with support of BYO and custom technology delivery to meet business needs). Mobile PCs (laptops) have been our core employee technology standard, at 80%+, since 1997. We have been actively evaluating, enabling and supporting many companion devices for improved productivity and flexibility. [Need to add what we are doing with tablets - Janet] >43,200 handhelds (a variety of form factors (phones/tablets), vendors, software and solutions); the majority of these devices are now EMPLOYEE OWNED. Intel IT continues to embrace consumerization of IT, and mobile applications are a major component of our strategy. We have delivered 57 mobile apps and counting to support new form factors. Our goal is to deliver a seamless, secure experience for our employees across a wide spectrum of devices by putting user experience first.
     Enabled Leadership Business Capabilities: Enable a top 25 supply chain (recognized by Gartner, previously AMR Research): #25 in 2009, #18 in 2010, #16 in 2011, #7 in 2012 and #5 in 2013. A key focus for IT innovation; delivered solid business results and competitive differentiation for Intel.
     Additional fun facts: 100% of Intel laptops have SSDs and 100% are deployed with disk encryption.
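
The storage-model comparison in note 2 (Relational stored row-first, Columnar stored column-first) can be made concrete with a small in-memory sketch. It shows why a row layout suits whole-entity retrieval while a column layout suits single-field aggregates such as min, max, avg and stddev. The records, field names and helper functions below are hypothetical, invented for illustration; this is not how any particular database implements either layout.

```python
import statistics

# Hypothetical die-yield records in a row-first layout: the fields of one
# entity are co-located. Values are invented for illustration only.
ROWS = [
    {"lot": "A1", "product": "gfx", "yield_pct": 91.5},
    {"lot": "A2", "product": "gfx", "yield_pct": 89.0},
    {"lot": "A3", "product": "gfx", "yield_pct": 93.5},
    {"lot": "A4", "product": "gfx", "yield_pct": 60.0},
]


def to_columns(rows):
    """Pivot row-first records into a column-first layout (one list per field)."""
    return {field: [row[field] for row in rows] for field in rows[0]}


def fetch_entity(rows, lot):
    """Row layout: a single lookup returns every field of the entity."""
    return next(row for row in rows if row["lot"] == lot)


def column_stats(columns, field):
    """Column layout: an aggregate reads one homogeneous list and nothing else."""
    values = columns[field]
    return {
        "min": min(values),
        "max": max(values),
        "avg": statistics.mean(values),
        "stddev": statistics.pstdev(values),
    }


if __name__ == "__main__":
    columns = to_columns(ROWS)
    print(fetch_entity(ROWS, "A2"))
    print(column_stats(columns, "yield_pct"))
```

In the column layout each list holds values of a single type, which is the property note 2 credits for better compression; the cost is that updating one record touches every list.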