2. Agenda
12-12:30pm Registration and Lunch
12:30-12:40pm Welcome and Introductions -- Art Hansen
12:40-1:45pm Keynote Presentation -- Chris Ward, Brian Vaughan, James Bigger
1:45-2:20pm Hadoop in the Real World by MapR -- David Feldman
2:20-2:30pm Break
2:30-2:45pm Cisco Unified Computing System Rack Mount Servers for Big Data -- Wade Ison
2:45-3:30pm Big Data Brainstorm Breakouts
3:30-4:30pm Refreshments, Q&A Session, and Conclusion
4:30pm Raffle Drawing for iPad
3. Big Data as a Competitive Strategy
Harvard’s Michael Porter:
1. Cost Leadership Strategy (Wal-Mart)
2. Differentiation Strategy (Southwest)
3. Innovation Strategy (Apple)
4. Operational Effectiveness Strategy (UPS)
5. Technology-based Competitive Strategy
4. What do we have that makes us different?
• Custom Apps
• Process (Workflow)
• Big Data
• People
• Culture
5. Big Data’s Financial Benefits
Gartner predicts that “Big Data will deliver transformational benefits to enterprises
within 2 to 5 years, and by 2015 will enable enterprises adopting this technology to
outperform competitors by 20% in every available financial metric.”
6. Goals for Today:
• Understand Big Data Major Building Blocks
• Learn the major patterns
• Understand how to introduce Big Data into the enterprise in practical ways
• Identify a solid use case for Big Data
Tips for Winning:
• High ROI in less than a year
• Must be applied to things that are important to the business
• Use of multiple patterns encouraged
• New ways of correlating data that was formerly not correlated
• Remember Big Data patterns usually require scale
7. WWT Big Data Leadership Team
20 years of management
consulting and
entrepreneurial experience.
Expertise in financial services,
insurance and telecom. Prior
consulting experience with
Opera Solutions and A. T.
Kearney.
Ph.D. in Physics from Oxford
University.
James Bigger
Principal
Consultant
15 years in management
consulting, analytics and
software experience.
Expertise in healthcare and
insurance. Prior experience
with Opera Solutions, Mitchell
Madison Group and
Broadlane.
Ph.D. in Physics from Stanford
University.
Brian Vaughan
Principal
Consultant
20 years in management
consulting and executive
leadership. Expertise in retail,
marketing, hospitality &
financial services. Prior
consulting experience with
Opera Solutions and The
Boston Consulting Group.
BA from Princeton University,
MBA from the University of
Virginia Darden School of
Business.
Chris Ward
Principal
Consultant
Over 20 years of experience
in a range of IT and security
disciplines. Responsible for
deploying large, secure,
Hadoop-based platforms for
the U. S. Government. 10 years
of international experience
implementing networking and
virtual data center
environments
Undergraduate degree from
AIU.
Matt DuBell
Principal Systems
Engineer
Over 7 years of experience in
management and analytics
consulting. Led engagements
in telecom at Opera Solutions.
Previous experience
performing predictive analytics
for NASA and USAF at The
Aerospace Corporation.
Ph.D. in Mechanical
Engineering from Pennsylvania
State University.
Yoni Malchi
Engagement
Manager
18 years of analytics and
software development
experience. Expertise in
financial services, healthcare,
insurance, retail and marketing
science. Prior analytics
development experience at
Opera Solutions, FICO and J.D.
Power and Associates.
Ph.D. in Physics from Stanford
University.
Jason Lu
Chief Scientist
Over 7 years of management
consulting and entrepreneurial
experience. Expertise in
financial services, travel, and
retail sectors across US and
Europe. Led Big Data strategy
and analytical engagements at
Opera Solutions.
MSci in Astrophysics from the
University of Cambridge.
Jamie Milne
Engagement
Manager
Over 8 years of experience in
analytics consulting and
delivery management. Ran
engagements in wealth
management, corporate
security, marketing, education
and transportation at Opera
Solutions and IBM Global
Business Services.
BS in Mathematics from
Georgetown University.
Chris Infanti
Engagement
Manager
Over 20 years of experience
in enterprise datacenter,
building innovative solutions
in Big Data, storage, HPC,
virtualization, data migration
and enterprise applications.
Formerly lead architect for
NetApp's Big Data solutions,
and led the development
of the FlexPod select
solutions.
B.S. in Electrical Engineering.
Prem Jain
Principal
Architect
8. Volume, Variety and Velocity of Data are Exploding
The production of data is expanding at an astonishing rate. Drivers include the switch from analog to
digital technologies and the creation of structured and unstructured data by individuals and companies
via social media and the Web
• Every 60 Seconds:
- 98,000+ tweets
- 695,000 status updates
- 11 million instant messages
- 698,445 Google searches
- 168 million+ emails sent
- 1,820TB of data created
- 217 new mobile web users
• The need to process more data
faster to respond to dynamic
business trends has brought new
requirements for database
architectures
• We believe the industry stands at
the cusp of the most significant
revolution in database and,
therefore, application architectures
in the past 20 years.
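To put the per-minute figures in perspective, the sketch below scales the slide's 1,820 TB per 60 seconds to daily and yearly volumes. It assumes a constant rate and decimal units (1 EB = 1,000,000 TB, 1 ZB = 1,000,000,000 TB); the slide states neither, so treat the result as a rough order of magnitude.

```python
TB_PER_MINUTE = 1_820            # from the slide: 1,820 TB created every 60 seconds
MINUTES_PER_DAY = 60 * 24
TB_PER_EB = 1_000_000            # assumption: decimal units
TB_PER_ZB = 1_000_000_000        # assumption: decimal units

tb_per_day = TB_PER_MINUTE * MINUTES_PER_DAY
eb_per_day = tb_per_day / TB_PER_EB
zb_per_year = tb_per_day * 365 / TB_PER_ZB   # assumption: constant rate all year

print(f"{tb_per_day:,} TB/day = {eb_per_day:.2f} EB/day = {zb_per_year:.2f} ZB/year")
```

Under these assumptions the rate works out to roughly 2.6 EB per day, or just under 1 ZB per year.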
Velocity, Variety, Volume
[Chart: Enterprise Managed Data vs. Enterprise Created Data, in ZB, 2010 to 2020]
[Chart: Unstructured vs. structured data storage, in EB, 2009 to 2014]
Source: IDC, Gartner, EMC, Worldwide File-Based Storage 2010-2014 Forecast
9. Vendor Landscape Is Crowded and Growing
Data Sources
& Capture
IT
Infrastructure
Data Management
& Integration
Analytics Platforms
and Solutions
Analytics Services and
Support
Data Vendors Infrastructure Vendors
Open Data Platforms
Proprietary Data Platforms
Extended infrastructure +
data platforms
Systems
Integrators
Specialized End-to-End Solutions
Analytics Service Providers
Vertical Analytics Solutions
10. Key Big Data Technologies
Hadoop: Distributed File System and Processing Language
Characteristics
• Parallel storage/processing
• Flexible programming model
• Horizontal scaling
• Batch processing
Enablement / Uses
• Pre-processing of data for analytics
• ETL for transforming unstructured data to structured
• Data summarization
NoSQL: Non-relational Key-Value Database
Characteristics
• Fast read/write
• Real time query
• Horizontal scaling
• Simple programming model
• Dynamic schema
Enablement / Uses
• Real-time ingest
• Rapid retrieval
• Input to MapReduce
Columnar: Column-Oriented Analytics Database
Characteristics
• Relational
• Efficient compression
• Optimized for fast read of many/all records
Enablement / Uses
• Online Analytical Processing (OLAP)
• Data storage and retrieval for advanced analytics
In-Memory: In-Memory Database and Processing
Characteristics
• Relational
• Random Access
• Extremely Fast
Enablement / Uses
• Complex Event Processing
• Real Time Analytics
• Potential to use a common database for transactions and analytics
Technologies are arranged from Foundational to Emerging
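The column-oriented characteristics listed on this slide (efficient compression, fast reads across many records) can be illustrated with a toy example. This is a minimal sketch in plain Python, not how any commercial columnar database is implemented; the records, field names and helper functions are invented for illustration.

```python
from itertools import groupby

# Hypothetical sample records; the field names are invented for illustration.
rows = [
    {"state": "MO", "plan": "basic", "premium": 100},
    {"state": "MO", "plan": "basic", "premium": 120},
    {"state": "MO", "plan": "plus", "premium": 150},
    {"state": "IL", "plan": "plus", "premium": 160},
]

def to_columns(records):
    """Pivot row-oriented records into one list per field."""
    return {name: [r[name] for r in records] for name in records[0]}

def run_length_encode(values):
    """Collapse consecutive repeats into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]

columns = to_columns(rows)

# An aggregate over one field touches a single list, not every full record.
total_premium = sum(columns["premium"])

# Repeated values within a column compress well.
encoded_state = run_length_encode(columns["state"])
```

Here `total_premium` is 530 and `encoded_state` is `[("MO", 3), ("IL", 1)]`: the aggregate reads one column only, and the repeated state values collapse to two pairs.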
11. The Big Data Software Stack
The big data ecosystem includes open source and proprietary distributions that span the stack from
ingest through analytics
JobFlow
USER/MACHINE WORKFLOW
Enterprise Structured Enterprise Unstructured 3rd Party Web/ Unstructured
Flexible interfaces:
TRANSFORM
ANALYTICS
DATABASE
ANALYTICS
ACCESS/
QUERIES
INGEST
FILE SYSTEM/
DATABASE
MANAGEMENT
Columnar
In Memory
Parallel RDBMS
EMC/PIVOTAL HD /
GREENPLUM
HP/VERTICA/CLOUDERA
ORACLE BIG DATA
EXADATA/EXALYTICS
IBM INFOSPHERE
BIGINSIGHTS
SAP HANA
TERRACOTTA BIGMEMORY
ZOOKEEPER
CLOUDERA
HORTONWORKS
MAPR
PIVOTALHD
HADOOP
CASSANDRA
HBASE
MONGODB
TERADATA
NETEZZA
GREENPLUM
VERTICA
OLAP
Natural Language
Custom Analytics
Custom APIs
SQL
OPEN SOURCE
COMMERCIAL
OPEN SOURCE
Fast,
Scalable
Provisioning
Maintenance
Flexible,
Compressed,
Fast Read
Optimized
for high vol
reads
Interfaces to
accept data
Real Time
& Batch
HDFS
NoSQL
- Document
- Key-Value
- Wide Column
SQL
PIG
HIVE
R
PYTHON
SAS
SPSS
Batch
Streaming
SQOOP
FLUME
SPLUNK
TALEND
LAYER PROPERTIES OPTIONS EXAMPLES OF PRODUCTS INTEGRATED OFFERINGS
MapReduce HADOOP
Parallel,
Distributed
ODS
Data
Warehouse
Call
Center
Server
Logs
Financial Demographic
OOZIE
DATA
ACQUIRE
ORGANIZE
ANALYZE
DECIDE
SOLUTIONS
MICROSTRATEGY
BUSINESS OBJECTS
COGNOS
ORACLE OBIEE PLUS
12. Technology: Expanding the Traditional Stack
Big Data requires a technology stack that leverages existing infrastructure and introduces new
technology for distributed parallel processing
Queries (SQL)
Relational Databases
Monolithic Hardware
(few CPUs and network
computers)
“Shared Disk/Memory”
Architecture
(centralized processing)
Direct Record Access or Queries
Monolithic Hardware
(few CPUs and network
computers)
“Shared Disk/Memory”
Architecture
(centralized processing)
NoSQL
Database
Parallel
Relational
Database
Distributed
File
System
High-Performance
Traditional
Relational
Database
MapReduce Programs
Distributed Hardware
(multicore CPUs, multiple computers
connected via high-performance network)
“Shared Nothing” Architecture
(distributed parallel processing)
INTERFACE
DATABASE/
DISTRIBUTED
PROCESSING
FRAMEWORK
HARDWARE
TRADITIONAL RELATIONAL
DATABASE STACK
STACK FOR THE NEW DATA
FOUNDATION
Source: IDC, CSC, Gartner
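The "MapReduce Programs" interface in the new stack can be illustrated in a few lines. The sketch below is a single-process imitation of the map, shuffle and reduce phases, not Hadoop code; the function names and input lines are invented for illustration. In a real shared-nothing cluster each map call would run on the node that holds its slice of the data.

```python
from collections import defaultdict

def map_phase(line):
    """Emit (word, 1) for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)

def word_count(lines):
    # Each line is mapped independently; no state is shared between map calls.
    mapped = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())

counts = word_count(["big data big value", "data at scale"])
```

Because map calls share nothing and reduce calls only need the values for their own key, both phases can be spread across many machines, which is the property the "Shared Nothing" label refers to.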
13. Analytics: Translating Business Needs to Math
Regardless of industry, many use cases translate into a limited class of “math problems” that big-data
platforms (unlike transactional platforms) are optimized to solve at scale
Business Need
• Recommendation
• Risk Scoring
• Pricing
• Capacity Planning
• Cost Reduction
• Matching
• Retrieval
Class of Analytics
• Regression
• Classification
• Clustering
• Forecasting
• Optimization
• Simulation
• Sparse Data Inference
• Anomaly Detection
• Natural Language Processing
• Intelligent Data Design
Method
• ARMA
• Decision Trees
• Genetic Algorithms
• Graph Theory
• Kalman Filter
• KNN
• Linear Regression
• Logistic Regression
• Matrix Factorization
• Monte Carlo
• Neural Networks
• Sorting
• Survival Time Analysis
• Visualization
Analytics Ready Stack: Hardware & Software
• Parallel
• Distributed
• Shared Nothing
• Columnar
• NoSQL
• In-Memory
14. Defining The Business Opportunity Is The Starting Point
The power of “Big Data” lies in bringing together data in a timely fashion from sources within and external
to the enterprise - structured and unstructured - to create a complete view of critical business issues,
therefore enabling advanced analytics to unlock key insights that drive significant business value
Outcome
Analytics
Data
Technology
Clearly defined use cases with the potential to deliver
significant value by distilling vast data into new, previously
unknowable intelligence
Advanced machine learning techniques to analyze
data and mine for insights to drive critical business
decisions
Structured or unstructured, internal or
external, requiring new methods of
storage/integration
Emerging/new technology stacks
using scalable, distributed
architectures
15. Telematics is Transforming Auto Insurance
Big Data Use Case
Combine driving behavior data with actuarial
data to create individualized risk models that
more accurately predict claims losses, enabling
risk-adjusted pricing to gain market
share and increase margins
Business Imperative
To gain profitable market share, insurance
companies need to offer the lowest “risk
adjusted” pricing possible to consumers
Methods
• KNN
• Linear Regression
• SVD
Class of Analytics
• Regression
• Clustering
• Anomaly Detection
• Sensors to capture
routes, miles driven,
time of day, braking
patterns, driving speed
• Geospatial maps
tied to database
layers
Science & Data
HDFS
MapReduce
NoSQL
Data W/H
In database
Analytics
Data Marts
Technology
Data
Case Study: Insurance
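KNN, one of the methods named on this slide, can be sketched as follows. This is only an illustration of the technique, not the model used in the case study: the two features, the training data and the function name are all hypothetical, and a real model would scale features and use far more data.

```python
import math

# Hypothetical training data: (annual miles in thousands, hard-braking events
# per 100 miles) paired with observed annual claim loss in dollars.
history = [
    ((5.0, 1.0), 200.0),
    ((6.0, 1.5), 250.0),
    ((12.0, 4.0), 900.0),
    ((15.0, 5.0), 1200.0),
    ((14.0, 4.5), 1100.0),
]

def knn_predict(features, training, k=3):
    """Average the claim loss of the k closest drivers (Euclidean distance)."""
    by_distance = sorted(training, key=lambda item: math.dist(features, item[0]))
    nearest = by_distance[:k]
    return sum(loss for _, loss in nearest) / len(nearest)

low_risk = knn_predict((5.5, 1.2), history, k=2)
high_risk = knn_predict((14.0, 4.6), history, k=3)
```

A driver who resembles the low-mileage, gentle-braking group gets a lower predicted loss, which is the basis for the individualized, risk-adjusted price described above.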
16. Predictive Maintenance
FTP over
MESH
Data Logger
Data Logger
• One per truck
• (Logs, Sensors, OEM
Alarms, VIMS Service
Port)
Equipment
Maintenance
Dispatch &
Operator
Fuel, Oil
Analysis, etc.
Hours
1
Urgent Component
Problem
2 Critical Sensor Problem
Stratifying Alarms
3
Important/Not Urgent
Component/Sensor Problem
4
Not Important Component
or Sensor Problem
5 Noise - Ignore
Data Logger
Data Driven Preventative Maintenance
Data/Analytics driven timing for preventative maintenance
(e.g., oil changes) on individual Trucks
1 Urgent Component
Problems
e.g., Engine, Transmission,
Differentials, Torque
Converters, Final Drives
Major Component Failure Model(s)
Project Scope
• 252 Trucks – 200
sensors per truck
• 7 Mine sites
• 10,000
readings/second
Data Integration
• Integrating 15+ siloed data sources
in multiple file formats
• 10 Terabytes of data
• 3 year historical data ecosystem
Business Impact: Higher equipment up-time; reduced critical component failure; better
preventative maintenance and increased labor productivity
Case Study: Mining
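The five-tier alarm stratification on this slide could be expressed as a simple triage function. The tier names follow the slide; everything else in this sketch (the fields, the thresholds and the rule order) is a placeholder assumption, since the slide does not describe how alarms were actually scored.

```python
from dataclasses import dataclass
from enum import IntEnum

class AlarmTier(IntEnum):
    URGENT_COMPONENT = 1        # tier names taken from the slide
    CRITICAL_SENSOR = 2
    IMPORTANT_NOT_URGENT = 3
    NOT_IMPORTANT = 4
    NOISE = 5

@dataclass
class Alarm:
    source: str           # "component" or "sensor" (hypothetical field)
    failure_risk: float   # modeled probability of failure, 0..1 (hypothetical)
    repeat_count: int     # times seen in the look-back window (hypothetical)

def stratify(alarm: Alarm) -> AlarmTier:
    """Assign one of the five tiers using placeholder thresholds."""
    if alarm.repeat_count <= 1 and alarm.failure_risk < 0.05:
        return AlarmTier.NOISE
    if alarm.failure_risk >= 0.8:
        if alarm.source == "component":
            return AlarmTier.URGENT_COMPONENT
        return AlarmTier.CRITICAL_SENSOR
    if alarm.failure_risk >= 0.3:
        return AlarmTier.IMPORTANT_NOT_URGENT
    return AlarmTier.NOT_IMPORTANT

tiers = [stratify(a) for a in (
    Alarm("component", 0.9, 12),
    Alarm("sensor", 0.85, 4),
    Alarm("sensor", 0.4, 3),
    Alarm("component", 0.1, 5),
    Alarm("sensor", 0.01, 1),
)]
```

In practice the `failure_risk` input would come from the component failure models mentioned on the slide rather than from fixed numbers.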
17. Data Warehouse Augmentation: Value Proposition
Augmenting the Data Warehouse with a less expensive Hadoop system will allow companies to free up valuable
space on their DW systems to run faster queries and analysis, whilst storing large volumes of their data universe
CURRENT
Traditional Data Warehouse
Full Data Universe: CRM, Social Media, Billing, Web logs, Payments, Scheduling
Data tiers: Cold Data, Warm Data, Hot Data
1. A significant amount of data is thrown out during the ETL process that may be valuable in the future
2. About 50% of data that is brought into a typical Data Warehouse system is rarely accessed: Cold Data
3. About 80% of the queries and reporting performed on Hot Data do not need to be at DW speeds
PROPOSED
Traditional Data Warehouse plus WWT Hadoop Appliance
Full Data Universe: CRM, Social Media, Billing, Web logs, Payments, Scheduling
Data tiers: Cold Data, Warm Data, Hot Data
1. Utilize additional Hadoop-based storage to store full data universe
− Files can be stored in natural format
2. Store Cold Data in Hadoop, taking advantage of lower cost per TB
− Teradata: $17K
− Hadoop: $2K
3. Continue to take advantage of DW agility and speed in real-time analysis and querying
Potential jumping-off point for Big Data Business Impact project
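The cost argument can be made concrete with the slide's own per-TB figures. The sketch below assumes a hypothetical 100 TB warehouse and that the two per-TB prices are directly comparable; the slide does not say what each price includes, so the result is indicative only.

```python
TERADATA_COST_PER_TB = 17_000   # from the slide
HADOOP_COST_PER_TB = 2_000      # from the slide
COLD_FRACTION = 0.50            # from the slide: about 50% is rarely accessed

def storage_cost(total_tb, fraction_on_hadoop):
    """Cost when the given fraction of data lives on Hadoop, the rest on the DW."""
    hadoop_tb = total_tb * fraction_on_hadoop
    dw_tb = total_tb - hadoop_tb
    return dw_tb * TERADATA_COST_PER_TB + hadoop_tb * HADOOP_COST_PER_TB

warehouse_tb = 100  # hypothetical warehouse size, not from the slide
current = storage_cost(warehouse_tb, 0.0)
proposed = storage_cost(warehouse_tb, COLD_FRACTION)
savings = current - proposed
```

For the assumed 100 TB, the current layout costs $1.7M and the proposed layout $950K, a difference of $750K.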
18. Integrating Many Data Sources To Provide Lift
Purchase
History
Service
History
Web
Data
Campaign
Metadata
Destination
Word clouds
Partner
Hotels
Profiled 100+m
transactions for
millions of customers
Linked data for
millions of customer
interactions and
service records
Analyzed billions of
page-views for
behavioral indicators
Extracted meaning
from tens of
thousands of email
campaigns
Mapped destinations
to key “feature tags”
which explain
selection
Geotagged tens of
thousands of partner
hotels by
understanding free
text description
Case Study: Global Airline
[Figure: Customer Travel Profile (ID = xxxx), a timeline from Nov 2010 to Sept 2011 of Flight, Hotel, Car Rental, Holiday and Experience events]
[Figure: Lift chart, Uptake % vs. % Offered, both axes 0% to 100%]
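A lift chart of the kind shown on this slide (uptake % against % offered) can be computed from model scores and observed outcomes. The following is a minimal sketch with invented scores and outcomes; the function name is hypothetical and the data is not from the case study.

```python
def gains_curve(scores, accepted):
    """Return (% offered, % of total uptake captured) points, best scores first."""
    ranked = sorted(zip(scores, accepted), key=lambda pair: pair[0], reverse=True)
    total_accepts = sum(accepted)
    points, captured = [], 0
    for i, (_, outcome) in enumerate(ranked, start=1):
        captured += outcome
        points.append((i / len(ranked), captured / total_accepts))
    return points

# Hypothetical model scores and observed outcomes (1 = customer took the offer).
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
accepted = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]

curve = gains_curve(scores, accepted)
offered_40, uptake_40 = curve[3]          # after offering to the top 4 of 10
lift_at_40 = uptake_40 / offered_40
```

In this toy data, offering to the top 40% of customers captures 75% of all uptake, a lift of about 1.9 over offering at random.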
19. Social Media Analytics
Typically, social media tools focus on monitoring past/present activity. Predictive analytics allows users to identify
important threads and intervene early, shifting the focus to future activity
• Details on particular themes or attributes
• Forecasts trends and provides a mechanism to intervene in attributes that are going viral
• Word cloud shows ongoing buzz and sentiment
• Tabular view shows emerging themes and sentiment, virality score and recommended time-window for action
Case Study: Consumer Electronics
20. Curriculum Management Engine
We designed a recommendation engine that generates a dynamic set of recommendations on a daily basis
(over 1MM/day, from sales force handhelds, website, call centers) that learns and adapts to increase its
ability to change behaviors over time through a Curriculum Management Engine
Plan for Smith Household:
Total Wallet = $600
Aspiration: Achieve 60% share
of wallet up from 40%
How:
• Habituate Pizza and Ice
Cream and Increase
Frequency
• Move Into Dinner Entrees &
Sides
• Move Into Higher Margin
Breakfast Entrees
• Increase Frequency of
Purchases
VISIT #1:
1. Haven’t Bought In A While:
2. Others On My Route Like:
3. Would You Like Another?:
4. Just for You -- $1.00 Off
Household
Response
VISIT #2
1. Would You Like Another?
2. Others On My Route Like:
3. No pizza; not yet consumed
4. Just For You
Nature of
Recommendations
• Individuated Offers –
Especially for You
• Cross-Sell/ Up-sell –
Based on latent needs
• Reminders – Haven’t
bought in a while
• Trials – Never tried but
similar people like it
• Promotions – Being a
loyal customer
Recommendations for Grocery Retailer’s
Customers Delivered $100 million p.a. in EBIT
Case Study: Food Grocer
21. Using Internal and External Data with Advanced
Analytics for Site Selection
• Comprehensive performance data
– Front store / pharmacy sales
– Customer and patient demographics
– Local area demographics
• Web Scraping and Text Analytics
– Neighborhood business profile
– Competitor performance
– Healthcare alternatives (ER, Urgent Care, PCPs)
• Non-linear, multivariate predictive models
– Linear/Logistic Regression
– Decision Trees (CART)
– Random Forest
– Gradient Boosting Machine
– Neural Networks
• Incorporation of all data, including variables
usually viewed as “qualitative”
[Chart: Model Performance, Predicted Patient Volume vs. Actual Patient Volume, R = 0.75]
[Chart: Potential Volume Impact, Original Expansion Plan 0.71 vs. Model Recommendation 0.83 (+17%)]
Case Study: Retail Pharmacy
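The two numbers reported on this slide, the correlation R between predicted and actual volume and the +17% volume impact, can both be reproduced with short calculations. In the sketch below the predicted and actual series are invented, so the resulting R is not the slide's 0.75; only the 0.71 and 0.83 values come from the slide.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical predicted vs. actual patient volumes (indexed, not real data).
predicted = [0.4, 0.6, 0.8, 1.0, 1.2, 1.4]
actual = [0.5, 0.5, 0.9, 0.9, 1.4, 1.2]
r = pearson_r(predicted, actual)

# Uplift implied by the slide's two bars: 0.71 (original plan) vs. 0.83 (model).
uplift = 0.83 / 0.71 - 1
```

The uplift evaluates to about 0.169, which rounds to the +17% shown on the slide.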
22. Designing Appropriate Reference Architectures
A reference architecture is a specific set of software and hardware components that together comprise
an Analytics-Ready Infrastructure
USER/MACHINE WORKFLOW
Visualization Forecasts Pricing Reports Alerts Scores Offers
NETWORK
LAYER DESCRIPTION EXAMPLES OF PRODUCTS
DATA
FILE SYSTEM/
DATABASES
Enterprise Structured Enterprise Unstructured 3rd Party Web/ Unstructured
ODS
Data
Warehouse
Call
Center
Server
Logs
Financial Demographic
CUSTOM ANALYTICS
ANALYTICS TOOLS
ANALYTICS DATABASES
• Flexible, Compressed, Fast Read
• Columnar, In Memory, Parallel
RDBMS
• High-level programming languages
with packaged analytical modules
• Can be either general purpose or
industry/function specific
• Services
• Advanced models
• Parallel, Distributed
• HDFS or NoSQL
• Interfaces to accept fast and
varied data
“Analytics-Ready
Infrastructure”
COMPUTE
STORAGE
INGEST
• 10GbE, low latency
• Commodity, rack mount
• Purpose built servers
• Internal JBOD, Direct Attached,
Network
SAS R PYTHON SPSS
VERTICA GREENPLUM TERADATA NETEZZA
EXADATA SAP HANA
CLOUDERA MAPR HORTONWORKS PIVOTALHD
MARKLOGIC DATATACTICS ORACLE NOSQL
FLUME SQOOP TALEND VELOCIDATA
UCS-C240 UCS-C460 HP 380P HP SL4540
UCS 6200 NEXUS 2200 HP 5800 DELL FORCE10
JBOD SATA JBOD SSD E-SERIES ISILON
23. Four Major Big Data Challenges Facing Most Companies
In our meetings with customers, four issues are consistently brought up as major challenges related to
creating a big data capability that can effectively support the business units
Key Big Data Challenges
Defining the business value proposition
• What problem/opportunity are we pursuing?
• What is the value that can be created?
Navigating a crowded and evolving vendor landscape
• How do we separate marketing hype from reality?
• Who should we use? Who can we trust?
Organizing for success
• Where does Big Data fit?
• What belongs in the BUs vs. centralized?
• Who is responsible for data integrity?
• Where do we find the critical resources needed to deliver Big Data solutions?
Deploying new technologies and combining with existing architecture
• How do we create an effective integrated Big Data stack?
• What new technologies do we need and how do they fit together?
24. Dual Approach to Delivering Big Data Solutions
WWT offers customers both strategic and tactical approaches to derive value from the application of Big Data
analytics and technology
• Strategic Roadmap
− Big Data Strategy
− Use Case Design
• Use Case PoC
− Analytics Development
− Workflow Integration
• Data Warehouse Augmentation
− ETL Offload
− Data Lake Creation
• SAP HANA Implementation
• Big Data Stack Build / Optimization
• Production Support & Sustainment
BIG DATA BUSINESS
IMPACT
Extract value from data to drive
multiple Use Cases
BIG DATA TECHNOLOGY
OPTIMIZATION
Accomplish data tasks, faster, cheaper,
better
25. Service and Solution Offerings
EXAMPLE SCALE OUT HARDWARE
• Multiple Nexus 6000/7000 Series switches
• 5 – 50 Big Data racks
• Cisco SAP HANA scale-out (e.g. 8-16 UCS-B200)
• Software scale-out
EXAMPLE STARTER KIT:
Cisco SAP HANA Medium Appliance (2 UCS-C460)
• Big Data Solution Stack:
o 2 UCS 6296PP
o Each Big Data rack:
2 Nexus 2232PP
8-16 HP DL380 or SL4540, UCS-C240, etc.
o Initially: 1 – 2 racks
o Software: MapR, E.
• Develop a roadmap for
implementing Big Data
- Use case exploration
- Data Governance,
Infrastructure and
Analytics ownership
• Define high impact use
cases
• Design and test
appropriate reference
architectures
Plan Design Pilot Scale
WWT
Offerings
Indicative
Infra-
structure
• Create detailed
description of selected
pilot use cases
- Analytics
- Workflow
integration
• Test various reference
architectures
• “Stand-up” reference
architecture
• Design the pilot
- Success criteria
- Timeline
- Scope
• Identify and prepare
data
• Build analytical models
• Design workflow
• Implement, manage and
monitor
Analytics-Ready Infrastructure Solution Development
• Implement design
changes from pilot
learnings
• Invest in software
development as
necessary to improve UI
• Prepare ETL process for
scale
• Build out infrastructure
as required to support
rollout
4. Production Support
• Operationalizing POC
• Infrastructure Sustainment
• Training
• Ongoing support
3. Proof of Concept
• POC design
• Analytical models
• Customer data loaded,
processed and analyzed
1. Strategic Roadmap
• Use case definition
• Organizational alignment
• Big Data Architecture high
level design
2. Big Data Stack Build
• Detailed design Big Data
architecture and BOM
• Procure, configure and
deploy Big Data stack
26. Advanced Technology Center (ATC)
COLLABORATION ENTERPRISE NETWORKS SECURITY DATA CENTER
A highly collaborative ecosystem to design,
build, educate, demo & deploy advanced
technology solutions for our customers &
partners
Hands-on Access to over $50M in Equipment
• Point Product Demos
• Tech. Training Sessions
• EBCs / ATC Tours
• Tech Days Demos
• Customer Proof of Concepts
• Reference Arch. Dev.
• Product Training / PS
• Version Upgrade Testing
• Strategic Ref. Arch. Demo
(RAD)
• Product Comparison – Func.
• Product Comparison – Perf.
• Customer Access to Lab
• Customer Environment
• Workshop Demos
• Early Field Trials / Beta Code
• Certification
• Next Generation
Networking
• Nexus (7K, 5K, 3K & 2K)
• Virtual Networking
(Nexus 1000v)
• OTV, LISP, Fabric Path
• Layer 2 Extension
• DR/BC Networking
• BYOD (Bring Your Own
Device) & Secure
Mobility
• Jukebox
• ISE & RSA
• ASA 1000v
• VSG (Virtual Security
Gateway)
• Cyber Security Solutions
• Unified
Communications
• Tandberg Video
• VXI (View &
XenDesktop)
• WebEx, Call Center &
Collaboration Solutions
• Phones, Backpacks &
Soft Phone Clients
• Telepresence &
Business Video
• Vblock, FlexPod &
CloudSystem Matrix
• EMC & NetApp Storage
• vSphere / XenServer
• vCloud Director
• VDI (View /
XenDesktop)
• Cisco CIAC & BMC CLM
• EMC’s UIM & Cloupia
• FAST MDC (Mobile Data
Center) Solutions
27. ATC Big Data Functions: Overview
Three functions of the ATC have been identified, which will support Sales (and other) processes
Function: Proof of Concept
Description
• Test customer solutions prior to full onsite implementation, e.g.
− Run Use Case analytical models and architectures on Big Data machines
− Create Big Data hardware/software stack, potentially with client data
Usage
• Mid-term project basis, to provide an environment for customer, based on a running engagement
Function: Technology Comparison
Description
• Compare Big Data solutions to provide insight into strengths and weaknesses of each
• Run “bake-offs” to gauge how well a full solution can be built using certain components
Usage
• To test generic POCs, may be customer-driven
• Inform Big Data Team on best solutions
Function: Field Demo
Description
• Showcase Big Data capabilities by hosting demos of WWT PoCs and analysis
− Run Use Case analytical models and architectures on Big Data machines
Usage
• Tool for sales calls and EBCs
28. Big Data Environment Set-up: ATC Reference Architectures
Four analytics-ready infrastructure stacks have been developed in the ATC to showcase Big Data technologies
DATA
Enterprise Structured Enterprise Unstructured 3rd Party Web/ Unstructured
ODS
Data
Warehouse
Call
Center
Server
Logs
Financial Demographic
STORAGE
REFERENCE
ARCHITECTURE 1
NETWORK
FILE SYSTEM/
DATABASES
ANALYTICS TOOLS
ANALYTICS
DATABASES
COMPUTE
INGEST
REFERENCE
ARCHITECTURE 2
HP Internal Local
Storage
UCS – NetApp Direct
Attached Storage
UCS 6296UP NEXUS 2232PP
UCS-C220M3
REFERENCE
ARCHITECTURE 3
UCS – Isilon Network
Storage
UCS 6296 NEXUS 2200
HAWQ HBASE
PIVOTALHD
UCS-C240
MICROSTRATEGY MICROSTRATEGY
REFERENCE
ARCHITECTURE 4
SAP HANA
HITACHI
UCS B BLADES
JBOD SATA
HORTON
IMPALA
NEXUS 2200
HP DL 380
HBASE
R PYTHON R PYTHON R PYTHON
HITACHI NETAPP E5460 ISILON
VELOCIDATA VELOCIDATA VELOCIDATA
MAPR
CLOUDERA CLOUDERA
GEMFIRE
IMPALA HBASE
JAVA JAVA JAVA
In Process Current In Process
SPLUNK SPLUNK SPLUNK
HORTON MAPR HORTON MAPR
CLOUDERA
SAP HANA
VELOCIDATA SPLUNK
“Transformational benefits,” however, will be delivered to very few enterprises, according to another Gartner prediction from December 2011: “Through 2015, more than 85 percent of Fortune 500 organizations will fail to effectively exploit big data for competitive advantage.”
A key point about Big Data Analytics is that it doesn’t replace the BI tools or EDWs that are in use today. It is imperative that organizations include their historical structured data in their analysis. It is the ability to analyze across multiple data sets that delivers new understanding.
A big driver of Big Data is the transformation of the data from the structured data we have historically analyzed to the data that is emerging today. Often it is real-time data generated via sensors or click stream. Semi-structured data is data that can often be transformed into structured data if manipulated in the correct manner. Examples include log files or emails. Unstructured data is one of the fastest growing data types and often the richest in information that can be gleaned from it. Examples include video, tweets and other social media and machine generated data.
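As an illustration of turning semi-structured data into structured data, the sketch below parses one web server log line into a record with typed fields. The log layout is an assumption (an Apache-style access log); real formats vary, and the pattern, function name and sample line are invented for this example.

```python
import re

# Assumed log layout (Apache-style access log); real formats vary by server.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<size>\d+)'
)

def parse_log_line(line):
    """Turn one semi-structured log line into a structured record, or None."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None          # line does not follow the assumed layout
    record = match.groupdict()
    record["status"] = int(record["status"])
    record["size"] = int(record["size"])
    return record

sample = '10.0.0.1 - - [12/Mar/2013:10:15:32 -0500] "GET /index.html HTTP/1.1" 200 5120'
record = parse_log_line(sample)
```

Once each line is a record with named, typed fields, it can be loaded into a table and joined with the structured data the organization already holds.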
Because of the volume and complexity of the data itself, the preferred approach for processing big data is clustered computing with Massively Parallel Processing (MPP), which enables simultaneous, parallel data ingest, loading and analysis.
Architectural
Independent
Multivendor
Tony Berg will cover more in-depth