(Big) Data Processing for Next Generation Business Value. Presented at the Leaders Building Leaders Conference, held at Union College on April 3, 2015.
https://www.ucollege.edu/academics/business-and-computer-science/leaders-building-leaders
This presentation is a semi-technical overview of big data and related use-cases, the Apache Hadoop software stack, and some example data-science / analysis models.
1. Slideshare Copy
• Presented at the Leaders Building Leaders summit
• April 3, 2015 @ Union College
https://www.ucollege.edu/academics/business-and-computer-science/leaders-building-leaders
2. THE NEW MODEL
(Big) Data Processing for Next Generation Business Value
Leaders Building Leaders Friday, 2015-04-03
3. What is This?
Abstract:
• Innovations in computing technology have led to the
development of new platforms, such as Apache Hadoop, that
enable a new style of data processing.
• This new approach offers low-cost, scalable computing on commodity hardware and provides data processing capabilities that combine the traditional SQL-based RDBMS with new mechanisms for continuous, real-time, and deep analysis of both stored and streaming information.
• The technology enables a modern data architecture that
enterprises rely on for more timely and in-depth decision
making.
4. Who Am I? David Kaiser
@ddkaiser
linkedin.com/in/dkaiser
facebook.com/dkaiser
dkaiser@dkaiser.org
dkaiser@hortonworks.com
1995 Union College – B.S. Computer Information Systems
23 years experience with Linux & Open-Source Software
Career Emphases:
• Data Warehousing, Enterprise Data Modeling
• High Performance Computing (HPC)
• Geospatial & Multi-dimensional Analytics
• Open-Source Solutions and Architecture
5 years experience with Apache Hadoop
Employed at Hortonworks as a Senior Solutions Engineer
5. Who Are You?
• Computer Scientists?
• Business Advocates?
• Data Scientists?
• Industry Practitioners?
• Consumers?
6. Timeline View
• In my field, time is often the most-important dimension
• Timelines provide context, realization through visualization
• Watch the top border for the next 45 minutes
Historical/ Contextual
Business Use-Cases
Technology Brief
Scalable Computing
Data Science
Q = Questions
(‘Timeline View’ slide graphic)
7. 25 Years – Technology Evolution
• Classroom Technology Then and Now:
• Chalkboards → Whiteboards
• Homework Collection Box → Moodle
• “Overhead” Projectors → Digital Projectors
Technology has helped evolve
the education process to a
more interactive, socially
connected, media-driven and
real-time stream of events.
10. HP 3000 Mainframe @ Union
1978 – HP 3000 Model II
1986 – HP 3000 Series 70
• Every terminal on the Union College
campus was wired to the mainframe.
Campus Life depended on the HP 3000:
• Checking out a library book
• Purchasing food in the cafeteria or deli
• Checking-in at the Lifestyle Center
• Receiving your semester grades
• Testing your code (Database Design &
COBOL Programming class)
A very centralized topology – one system
stored every: application, file, record
1986: base machine with 8MB RAM, $150,000
Configured w/13GB disk, 16MB RAM, $250,000
12. 25 Years – Networking Evolves
                 1990                  2015                       2015 Advantage
Media            Copper Wire           Fiber-Optic
Strands          500                   2
Weight           10+ pounds per foot   5 grams per foot           900x Lighter
Voice Capacity   250 voice calls       8,000 voice calls (1997)   32x More
Data Capacity    7 MB/s                1 GB/s (in 1997)           145x More
15. 25 Years – Storage Innovation
           1988                       2015                           2015 Advantage
Media      Spinning magnetic plates   Solid-state (SSD) NAND flash   Shock resistant
Weight     6.8 pounds                 0.4 grams                      7700x Lighter
Power      4 Amps                     40 mA                          100x Less Energy
Cost       $394                       $399                           none
Capacity   80 MB                      200 GB                         2500x Larger
18. Cloud – The New Datastore
• What is a CDN? A Content Delivery Network
• High speed content from the cloud
• Social shares (your Facebook photos)
• Spotify Songs
• Hulu Video Clips
• Delivery of online games, online ads, etc.
• Just One Example
• http://www.edgecast.com/network/
• Analyzing CDN usage logs provides great insight
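As a toy illustration of that insight, here is a minimal sketch of finding the most-requested assets in CDN access logs; the log lines and their field layout are invented for this example:

```python
from collections import Counter

# Invented sample access-log lines: client IP, method, path, status
logs = [
    "203.0.113.5 GET /song/123 200",
    "203.0.113.7 GET /song/123 200",
    "203.0.113.5 GET /clip/9 404",
]

# Count requests per path (the third whitespace-separated field)
hits = Counter(line.split()[2] for line in logs)
print(hits.most_common(1))  # [('/song/123', 2)]
```

The same grouping-and-counting pattern, run over billions of log lines, is exactly the kind of job Hadoop parallelizes.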
20. Degrees @ Union Kept Pace
1960s -> 1970s/1980s -> 1990 -> 2000 -> 2010 -> Now:
• Tabulating? (1960s) -> Data Processing (1970s) -> Computer Information Systems -> Computer Science
• Mainframe -> Personal -> Client/Server -> Internet -> Cloud Computing
• Serial -> Ethernet Network -> Fiber-Optic -> Wi-Fi
• Hard Disk -> SSD / Flash -> Online Storage
22. The World’s Data
• Explosive increase in amount of data to process
• Transition from centralized (mainframe)
• To: all those distributed devices
• Increase in Data transfer and storage
2.8 Zettabytes
in 2012
44 Zettabytes
in 2020
1 ZB is 10 to the 21st power bytes
(1,000,000,000,000,000,000,000 bytes);
2 to the 70th power bytes (the binary zebibyte) is approximately the same size.
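As a quick check of the figures above (decimal zettabyte vs. the binary 2^70):

```python
zettabyte = 10 ** 21          # decimal ZB, as defined on the slide
zebibyte = 2 ** 70            # binary counterpart
print(zettabyte)              # 1000000000000000000000
print(zebibyte / zettabyte)   # ~1.18, so the two are close
```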
26. Internet of Things -> Even More Data
• IoT is a concept of every thing being networked
• http://postscapes.com/internet-of-things-examples/
• “Smart Home”
• Energy efficiency
• Proactive shopping
• Environmental / Pollution Monitoring
• Integrating major platforms: Auto, Entertainment, Comms
• Already receive a text on my phone when my car needs service
• Can send a Google Map POI from the phone to the car navigation
• ARM processor shipments: 64 billion since 1993
• More than 12 billion in 2014 alone
28. Traditional Data – an Incomplete Picture
12 data points per home, per year
29. Smart Meter Data – 100,000x More Info
5 different data measurements * 4/hr * 24 * 365 = 175,200 data points
San Diego County, 1.8M meters. LA County, 7.1M meters.
1.5 Trillion data points per year for 2 counties
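The arithmetic on this slide can be checked directly, using the meter counts quoted above:

```python
# Per-meter data points: 5 measurements, 4 readings/hour, 24 h/day, 365 days
points_per_meter_year = 5 * 4 * 24 * 365
print(points_per_meter_year)  # 175200

# San Diego County (1.8M meters) + LA County (7.1M meters)
meters = 1_800_000 + 7_100_000
total = points_per_meter_year * meters
print(total)  # 1559280000000 -> roughly 1.5 trillion points per year
```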
30. Providing New Analytics
Challenge: Power outages cost the US economy
$80 billion annually
• Utilities must match supply with demand
• Slow response to peak demand requires
expensive “peaker plants” or can cause blackouts
• Understanding voltages at edge-points is key
Solution: Managing voltage levels saves energy, reduces peak-driven strains on the grid
• Smart meters provide greater monitoring and control
• Companies can pro-actively manage the grid to avoid outages
• Analyze transmission repair needs and dispatch crews more effectively
33. Traditional Auto Insurance Data Collection
Historical collection of driving behavior data: tickets and accidents
34. Collecting New Driving Data with Sensors
New Applications: Save lives, avoid tickets, reduce premiums
35. Longer Data Retention & Faster Analysis
Challenge: Risk analysis lagged because of
architecture gaps
• Volume, velocity and variety of data taxed
existing storage platform
• ETL process captured only 25% of the data,
took 5-7 days to complete
Solution: More data improves assessment of actual risk
• Vastly improved interactive analysis with Apache Hive
• ETL acceleration: now process 100% of the data in three days or less
37. Manufacturing Data for Defect Analysis
Test data determines overall product quality, enables failure analysis
(such as yield rate) for manufacturing performance
Note: Images are not of the client’s operations (for discussion purposes only)
38. Data for Real-Time Decisions and
Historical Analysis
Challenge: Data scarcity made root cause
analysis difficult for returned products
• 200 million units manufactured annually
• Despite world-leading manufacturing process,
more than 10,000 units returned monthly
• Subset (selected fields) of manufacturing data
retained for only 3 months
Solution: Longer data retention for better root cause analysis
• All manufacturing data retained for 24 months
• 10x improvement in speed to insight
• Searchable data for >1,000 employees
40. 360-Degree View of LCV* for Home Supply Retailer
Customer behavior data stored in silos, difficult to join for 360-view
Note: Images are not of the client’s operations (for discussion purposes only)
LCV: Lifetime Customer Value
41. Targeted Marketing, Data Storage Savings
Challenge: Lack of unified customer records
• Global distribution: home, online and 1000s of stores
• No “golden record” of customer across all channels
(web traffic, POS and in-home services in silos)
• Limited ability to do targeted marketing
• Data storage costs increasing
Solution: Storage savings & a golden record for targeted marketing
• Golden record enables targeted, personalized marketing
• Data warehouse offload saves millions in recurring annual expense
• New use case: price optimization versus competitors → millions in top-line growth
42. Recommendation Engines
Machine Learning in Action
• As you order items from Amazon
• Your Netflix video viewing choices
influence your suggested videos
• Your Spotify listening list
influences your suggested artists
• Even Google Maps adjusts what
you see depending on your
history; for example, I often
search for meeting places and
now Google Maps shows this by
default.
43. Behavior, Co-occurrence, and Text
Retrieval
Making Predictions
• Behavior of users is the best clue
to what they want.
• Co-occurrence is a simple basis
to compute significant indicators
of what should be recommended.
• There are similarities between
the weighting of indicator scores
in output of such a model and the
mathematics that underlie text
retrieval engines.
• This mathematical similarity
makes it possible to exploit text
based search to deploy a
recommender using Apache Solr/
Lucene.
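The co-occurrence idea above can be sketched in a few lines; the user histories here are invented for illustration:

```python
from collections import Counter
from itertools import permutations

# Invented user histories: the set of items each user interacted with
histories = [
    {"A", "B", "C"},
    {"A", "B"},
    {"B", "C"},
]

# Count how often each ordered pair of items co-occurs in a history
cooc = Counter()
for items in histories:
    for a, b in permutations(items, 2):
        cooc[(a, b)] += 1

def recommend(item, k=2):
    # Rank other items by how often they co-occur with `item`
    scores = {b: n for (a, b), n in cooc.items() if a == item}
    return sorted(scores, key=scores.get, reverse=True)[:k]

print(recommend("A"))  # ['B', 'C'] -- B co-occurs with A twice, C once
```

In a production recommender the indicator scores derived from these counts would be indexed in Solr/Lucene, as the slide describes, so that recommendation becomes a text-style search query.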
51. New Data Paradigm is Driving a Shift in IT...
LAGGARDS – Traditional Systems
• Data constrained to apps
• Can’t manage new data
• Costly to scale
• Limited ability to innovate
LEADERS – New Data, New Opportunity
• Traditional data (ERP, CRM, SCM) plus new data (clickstream, geolocation, web data, Internet of Things, files/emails, server logs)
• Industry leadership via full fidelity of data and advanced analytics
Business value scales with data: 2.8 zettabytes in 2012 → 44 zettabytes in 2020
1 Zettabyte (ZB) = 1 million Petabytes (PB); Sources: IDC, IDG Enterprise, and AMR Research
52. …From Reactive to Proactive Value Chains
Industry leaders are moving from reactive to proactive:
• Retail: mass branding → real-time personalization and 360° customer view
• Financial Services: daily risk analysis → real-time trade surveillance & compliance analysis
• Healthcare: mass treatment → proactive diagnostics and designer medicine
• Manufacturing: break then fix → proactive maintenance
• Telco: customer service silos → personalized quality of service
53. To Realize Full Potential, a New Approach Is Needed
• Existing systems plus new sources: clickstream, web & social, geolocation, Internet of Things, server logs, files/emails
• The goal: turn data into new value ($)
• The problem: data architectures don’t scale
• Costs
• Data structure
• Silos
54. Modern Data Architecture Emerges
[Diagram: sources (clickstream, web & social, geolocation, Internet of Things, server logs, files/emails) and existing systems (ERP, CRM, SCM) feed a large shared data storage layer on a distributed high-performance compute system, which serves batch, interactive, real-time, MPP/EDW and partner ISV workloads into analytics (data marts, business analytics, visualization & dashboards) and applications.]
Goal: To Unify Data & Processing
Modern Data Architecture:
• Enables applications to access all enterprise data through an efficient centralized architecture
• Provides versatility to handle any applications and datasets no matter the size or type
• Leverages new and existing data center infrastructure investments
• Scalable and affordable; low cost per TB
55. Modern Data Architecture Emerges
[Same diagram as the previous slide, with the shared storage and compute layers now labeled: Hadoop HDFS (Distributed File System) and Hadoop YARN: Data Operating System.]
Goal: To Unify Data & Processing
Modern Data Architecture:
• Apache Hadoop
• HDFS provides the replicated, distributed data storage
• YARN provides the scalable compute grid
• Standardized platform provides a base to host all big-data applications
57. • To understand Hadoop, let’s first look at:
• High-Performance Computing
• Distributed Processing
58. History of Super-Computing: Cray-1
• “Unified memory”: all processing accessible to all memory
• Intricate hand-wired backplane
• Expensive liquid cooling system
59. Cray Jaguar XT
• Move to distributed / multi-node
• Still uses an expensive liquid cooling system
60. Apache Hadoop
• Partitioned
• Distributed
• High Performance
• Flexible, Supports Many Types of Apps and Workloads
• Runs on Commodity Hardware : Affordability
61. Apache Hadoop: Big Data Platform
Store ALL DATA in one place…
Interact with that data in MULTIPLE WAYS, with predictable performance and quality of service
Applications run natively in Hadoop on two core layers:
• HDFS2 (redundant, reliable storage)
• YARN (cluster operating system)
Workloads running on YARN:
• BATCH (MapReduce)
• INTERACTIVE (Tez)
• STREAMING (Storm, S4, …)
• GRAPH (Giraph)
• IN-MEMORY (Spark)
• HPC MPI (OpenMPI)
• ONLINE (HBase)
• OTHER (Search, Weave, …)
62. Hadoop + Linux
Provides a 100% open-source framework for efficient, scalable data processing on commodity hardware:
• Commodity hardware
• Linux – the open-source operating system
• Hadoop – the open-source data operating system
63. Hive – MR vs. Hive – Tez
MapReduce and Tez dataflows for the same query:

SELECT a.x, AVERAGE(b.y) AS avg
FROM a JOIN b ON (a.id = b.id) GROUP BY a
UNION SELECT x, AVERAGE(y) AS AVG
FROM c GROUP BY x
ORDER BY AVG;

[Diagram: the query plan (SELECT b.id; SELECT c.price; JOIN(a, b); JOIN(a, c); GROUP BY a.state; COUNT(*); AVERAGE(c.price); SELECT a.state, c.itemId) executes on MapReduce as several separate map/reduce jobs, each writing intermediate results back to HDFS between stages, whereas Tez runs the whole plan as a single dataflow with no intermediate HDFS writes.]
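To make the map/reduce stages in the diagram concrete, here is a toy word count in plain Python (not Hadoop) showing the map-then-group-then-reduce pattern; in Hive-on-MapReduce each such stage additionally materializes its intermediate output to HDFS, which Tez avoids:

```python
from collections import defaultdict

def map_phase(records, mapper):
    # Apply the mapper to every record, emitting (key, value) pairs
    pairs = []
    for record in records:
        pairs.extend(mapper(record))
    return pairs

def reduce_phase(pairs, reducer):
    # Group values by key, then reduce each group
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return {key: reducer(values) for key, values in groups.items()}

lines = ["big data", "big value"]
pairs = map_phase(lines, lambda line: [(word, 1) for word in line.split()])
counts = reduce_phase(pairs, sum)
print(counts)  # {'big': 2, 'data': 1, 'value': 1}
```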
69. What is Data Science?
• The scientific exploration of data to extract meaning or
insight, and the construction of software systems to utilize
such insight in a business context
70. What is Data Science? (cont.)
• …the art of discovery
…and the science of operations
71. What is a Data Scientist?
… A person who explores and discovers
interesting and valuable facts within data
and builds systems to deliver value
72. Driver: Advanced Analytic Applications
Single View: Improve acquisition & retention
• Enables a single view of each customer, allowing organizations to provide targeted, personalized customer experiences.
• Single view reduces attrition, improves cross-sell and improves customer satisfaction.
Predictive Analytics: Identify next best action
• Capture, store and process large volumes of data streaming from connected devices.
• Stream processing and data science help introduce new analytics for real-time and batch analysis.
Data Discovery: Uncover new findings
• Allows exploration of new data types and large data sets that were previously too big to capture, store & process.
• Unlocks insights from data such as clickstream, geolocation, sensor, server log, social, text and video data.
73. Driver: Advanced Analytic Applications – Use Cases by Industry
Columns: Single View (improve acquisition and retention); Data Discovery (uncover new findings); Predictive Analytics (identify your next best action)
Financial Services: New Account Risk Screens; Insurance Underwriting; Trading Risk; Improved Customer Service; Aggregate Banking Data as a Service; Cross-sell & Upsell of Financial Products; Identify Claims Errors for Reimbursement; Risk Analysis for Usage-Based Car Insurance
Telecom: Unified Household View of the Customer; Protect Customer Data from Employee Misuse; Searchable Data for NPTB Recommendations; Analyze Call Center Contact Records; Call Detail Records (CDR) Analysis; Network Infrastructure Capacity Planning; Inferred Demographics for Improved Targeting; Tiered Service for High-Value Customers; Proactive Maintenance on Transmission Equipment
Retail: 360° View of the Customer; Website Optimization for Path to Purchase; Supply Chain Optimization; Localized, Personalized Promotions; Data-Driven Pricing; Improved Loyalty Programs; A/B Testing for Online Advertisements; Customer Segmentation; In-Store Shopper Behavior; Personalized, Real-Time Offers
Healthcare: Electronic Medical Records; Use Genomic Data in Medical Trials; Monitor Patient Vitals in Real-Time; Improving Lifelong Care for Epilepsy; Monitor Medical Supply Chain to Reduce Waste; Rapid Stroke Detection and Intervention; Reduce Patient Re-Admittance Rates; Healthcare Analytics as a Service; Video Analysis for Surgical Decision Support
Oil & Gas: Unify Exploration & Production Data; Geographic Exploration; Monitor Rig Safety in Real-Time; DCA to Slow Well Decline Curves; Define Operational Set Points for Wells; Proactive Maintenance for Oil Field Equipment
Government: Single View of Entity; Sentiment Analysis on Program Effectiveness; CBM & Autonomic Logistics Analysis; Prevent Fraud, Waste and Abuse; Meet Deadlines for Government Reporting; Proactive Maintenance for Public Infrastructure
74. Ex: Predictive Analytics Case Studies
Preventative Maintenance: Oil and gas co. analyzes streaming sensor data to predict issues and fix equipment before pumps break and jeopardize oil production.
Resource Optimization: Energy co. analyzes smart meter data and grid metrics to predict future consumption patterns and identify substations where voltage can be reduced to drive cost savings.
Behavioral Insight: Insurance co. collects sensor data from cars and analyzes it in hours to maintain up-to-date risk profiles, predict the likelihood of future claims, and adjust pricing and products accordingly.
75. Ex: Predictive Analytics Case Studies
[Architecture diagram: truck sensors feed Inbound Messaging (Kafka) and then Stream Processing (Storm); results flow to Real-Time Serving (HBase) and Alerts & Events (ActiveMQ), driving a Real-Time User Interface. Distributed Storage (HDFS) and Many Workloads (YARN) underpin the cluster, with Interactive Query (Hive) feeding Microsoft Excel.]
76. Data Science is iterative in nature…
Formulate Question → Acquire Data → Clean Data → Visualize/Grok → Hypothesize; Model → Measure/Evaluate → Deploy → (repeat)
77. Data Science combines proficiencies…
• Practical data science is comprised of four main groups with key supporting functions
• A data scientist needs to be proficient in all these functions, which range from technical to analytical
[Concept map, spanning technical to analytical: Pre-Processing (Signal Processing, OCR, Transform, Normalize, Aggregate); Data Exploration (Simple Statistics); Feature Engineering (Dimension Reduction, Feature Selection, Information Theory, Natural Language Processing); Data Modeling (Frequent Itemset, Anomaly Detection, Clustering, Collaborative Filter, Regression, Classification, Supervised Learning, Unsupervised Learning); supported by Data Quality, Visualization, and Reporting.]
78. Areas of expertise in data science
Data Engineer
• Data engineering (quality, ETL, pipelines, etc.)
• Computer science
• Coding (Java, Scala, Python, etc.)
Applied Scientist
• Research scientist focusing on solving real-world problems
• Machine learning, advanced statistics, applied math, NLP, visualization
Business Analyst
• Business/domain expertise
• SQL, Excel, visualization tools
Big Data Engineer
• Hadoop, Pig, Hive, Cascading, Solr, etc.
• Statistics and machine learning over large datasets
80. What is Machine Learning?
WALL-E was a machine that learned how to feel emotions after 700 years of experiences on Earth collecting human artifacts.
Machine learning is the science of getting computers to learn from data and act without being explicitly programmed.
• Machine learning is about the construction and study of systems that can learn from data.
• The core of machine learning deals with representation and generalization, so that the system performs well on unseen data instances and predicts unknown events.
• There is a wide variety of machine learning tasks and successful applications.
82. Supervised Learning
• Supervised learning: the training data (i.e. the data being presented to the machine learning algorithm) is labeled.
• In this case, the machine is tasked with classifying new data based on the provided labels.
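A minimal sketch of the idea: a 1-nearest-neighbor classifier trained on labeled points, with the data invented for illustration:

```python
# Labeled training data: (feature value, label)
train = [(1.0, "small"), (1.2, "small"), (9.8, "large"), (10.1, "large")]

def classify(x):
    # Predict the label of the closest labeled training example (1-NN)
    _, label = min(train, key=lambda pair: abs(pair[0] - x))
    return label

print(classify(1.1))  # small
print(classify(9.0))  # large
```

The "supervision" is the labels: new points are classified entirely from the labeled examples the machine was shown.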
84. Detecting Outliers – Fraud Detection
Identity Thief is a comedy about a woman in Florida stealing the identity of a man named Sandy Bigelow from Colorado.
• Local outlier factor (LOF) compares the local density of a point’s neighborhood with the local density of its neighbors. Points with substantially lower density than their neighbors are outliers.
• K-nearest-neighbor-based (KNN-based) algorithms use the average distance from a point to its K closest neighbors as the outlier factor.
• One-class SVM (one-class Support Vector Machine) is a variation of the regular SVM suitable for outlier detection.
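The KNN-based outlier factor described above can be sketched directly; the sample values are invented, with one planted outlier:

```python
# Invented one-dimensional sample; 8.0 is the planted outlier
points = [1.0, 1.1, 0.9, 1.05, 8.0]

def knn_outlier_factor(i, data, k=3):
    # Average distance from point i to its k nearest neighbors
    dists = sorted(abs(data[i] - data[j]) for j in range(len(data)) if j != i)
    return sum(dists[:k]) / k

scores = [knn_outlier_factor(i, points) for i in range(len(points))]
outlier_index = max(range(len(points)), key=lambda i: scores[i])
print(points[outlier_index])  # 8.0
```

In a fraud setting, the "points" would be feature vectors for transactions, and the highest-scoring points would be flagged for review.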
86. Some Recommended Books – Pt. 1
• I own these / use for reference or ideas
• Recommended Books on Data Analysis
• Visualizing Data, O'Reilly / Fry
• Data Analysis with Open Source Tools, O'Reilly / Janert
• Books on Apache Hadoop / MapReduce, Computation
• Hadoop: The Definitive Guide, O'Reilly / White
• MapReduce Design Patterns, O'Reilly / Miner, Shook
• Apache Hadoop YARN, Pearson / Murthy et al.
• Data-Intensive Text Processing with MapReduce
• High Performance Computing, O'Reilly / Dowd, Severance
• Business Centric Titles
• The Art of Scalability, Addison-Wesley / Abbott, Fisher
87. Some Recommended Books – Pt. 2
• General Advice
• Books on developer language areas related to Data Science:
• Spark – Learning Spark
• Python
• R
• Books on data science
• Machine Learning: The Art and Science of Algorithms that Make Sense of Data