Peter Aiken, Ph.D. & Micah Dalton
Implementing Big Data, NOSQL, & HADOOP
Demystifying Big Data: Bigger is (Usually) Better
Copyright 2017 by Data Blueprint Slide # 1
Peter Aiken, Ph.D.
• DAMA International President 2009-2013
• DAMA International Achievement Award 2001 (with Dr. E. F. "Ted" Codd)
• DAMA International Community Award 2005
• 33+ years in data management
• Repeated international recognition
• Founder, Data Blueprint (datablueprint.com)
• Associate Professor of IS (vcu.edu)
• DAMA International (dama.org)
• 10 books and dozens of articles
• Experienced with 500+ data management practices
• Multi-year immersions:
– US DoD (DISA/Army/Marines/DLA)
– Nokia
– Deutsche Bank
– Wells Fargo
– Walmart
– …
• Monetizing Data Management: Unlocking the Value in Your Organization's Most Important Asset (Peter Aiken with Juanita Billings; foreword by John Bottega)
• The Case for the Chief Data Officer: Recasting the C-Suite to Leverage Your Most Valuable Asset (Peter Aiken and Michael Gorman)
Micah Dalton
Micah is a senior business leader with twenty years of management experience building and leading teams to deliver results across various industries, including financial services, public sector, non-profit, and higher education. Micah's expertise in offering pragmatic business solutions has made him a valuable member of client teams. Micah's skills focus on using data to drive root cause identification, analytics, strategy, financial analysis and reporting, procurement strategy and cost management, and operations analysis and management. Micah helped lead the development of Capital One's Six Sigma program and completed his Black Belt training. Micah also holds certifications in Organizational Change Management (PROSCI) and Data Management (CDMP-Associate from DAMA). Micah earned his MBA from Duke's Fuqua School of Business, focusing on corporate finance and business strategy. Prior to that, Micah earned his Bachelor's degree in economics from Mary Washington College. Additionally, Micah was a member of the 2014 class of Leadership Metro Richmond and has been an adjunct professor of Marketing at the University of Mary Washington.
Implementing Big Data, NOSQL, & HADOOP
Demystifying Big Data: Bigger is (Usually) Better
• Why it is important to consider the messenger
– What is being "sold"?
– We are using the wrong vocabulary to discuss this topic
• Technically, what are Big Data technologies good at?
– Computers → commodity-based computing infrastructure
– Flash memory is currently obeying Moore's Law
– RAM → increased processing
– Parallel-friendly approaches (lots of repeatable actions)
• Successful Big Data approaches ...
– Innovation
– Reengineering (precise definition)
– Throwaway prototyping
• How does that help operationally?
– Solid support community
– Examples
Welcome to the Post-Big Data Era!
Figure: Big Data: Expanding on 3 Fronts at an Increasing Rate (Data Volume, Data Velocity, Data Variety)
Big Data (has something to do with Vs, doesn't it?)
• Volume
– Amount of data
• Velocity
– Speed of data in and out
• Variety
– Range of data types and sources
– Volume, Velocity, and Variety: Doug Laney, 2001
• Variability
– Many options or variable interpretations confound analysis
– ISRC, 2011
• Vitality
– A dynamically changing Big Data environment in which analysis and predictive models must continually be updated as changes occur, to seize opportunities as they arrive
– CIA, 2011
• Virtual
– Scoping the discussion to include only online assets
– Courtney Lambert, 2012
• Value/Veracity
– Stuart Madnick (John Norris Maguire Professor of Information Technology, MIT Sloan School of Management & Professor of Engineering Systems, MIT School of Engineering)
The 13 V’s of Big Data
• Vast Volume of Vigorously, Verified, Vexingly, Variable,
Verbose yet Valuable, Vital, Visualized, high Velocity and
Veracity data that encourages the Vanity of the big data
experts
– Original from John Mashey, Silicon Graphics, 1998 (with contributed extensions)
• We have no objective definition of big data!
– Any measurements, claims of success, quantifications, etc. must be viewed skeptically and with suspicion!
"I shall not today attempt further to define the kinds of material ... But I know it when I see it ..." (Justice Potter Stewart)
Big Data [ Techniques / Technologies ]
Big Data Techniques
• New techniques that can improve the productivity of any analytical insight cycle by an order of magnitude, and that complement, enhance, or replace conventional (existing) analysis methods
• Big data techniques are currently characterized by:
– Continuous, instantaneously available data sources
– Non-von Neumann processing (defined later in the presentation)
– Capabilities approaching or exceeding human comprehension
– Architecturally enhanceable identity/security capabilities
– Other tradeoff-focused data processing
• So a good question becomes: "Where in our existing architecture can we most effectively apply Big Data techniques?"
The Big Data Landscape
Copyright Dave Feinleib, bigdatalandscape.com
The Big Data Landscape 2.0
The Big Data Landscape 3.0
Copyright Dave Feinleib, bigdatalandscape.com
Internet of Things Landscape 2016
http://blogs.cisco.com/sp/from-internet-of-things-to-web-of-things/
How much data is generated every minute?
Figure: 451 Research Data Platforms Map, October 2014 (© 2014 by 451 Research LLC. All rights reserved. https://451research.com/dashboard/dpa). The map arranges database, Hadoop, grid/cache, search, appliance, and stream-processing products into a relational zone, a non-relational zone, and a grid/cache zone, with regions labeled "Towards E-discovery", "Towards enterprise search", and "Towards SIEM". Key: general purpose; specialist analytic; BigTables; graph; document; key value stores; -as-a-Service; key value direct access; Hadoop; MySQL ecosystem; advanced clustering/sharding; New SQL databases; data caching; data grid; search; appliances; in-memory; stream processing. Products shown include MySQL, PostgreSQL, Oracle Database, SAP HANA, MongoDB, Cassandra, HBase, Redis, Neo4J, Cloudera, Hortonworks, MapR, Memcached, and Hazelcast. A second slide gives an alphabetical index of the vendors keyed to map grid cells A1 to E6.
Big Data = Big Spending
• Enterprises are spending wildly on Big Data but don’t
know if it’s worth it yet (Business Insider, 2012)
• Big Data Technology Spending Trend:
• 83% increase over the next 3 years (worldwide):
– 2012: $28 billion
– 2013: $34 billion
– 2016: $232 billion
• Caution:
– Don't fall victim to SOS (Shiny Object Syndrome)
– A lot of money is being invested, but is it generating the expected return?
– The Gartner Hype Cycle suggests results are going to be disappointing
http://www.businessinsider.com/enterprise-big-data-spending-2012-11#ixzz2cdT8shhe
http://www.inc.com/kathleen-kim/big-data-spending-to-increase-for-it-industry.html
http://www.gartner.com/DisplayDocument?id=2195915&ref=clientFriendlyUrl
Big Data technologies, by themselves, are a one-legged stool
Governance is the major means of preventing overreliance on one-legged stools!
Cost per computing cycle declining
10X+++ rapid access
"There's now a blurring between the storage world and the memory world"
• Faster processors outstripped not only the hard disk, but also main memory
– Hard disk too slow
– Memory too small
• Flash drives remove both bottlenecks
– Combined, Apple and Yahoo have spent more than $500 million to date
• Make it look like traditional storage or more system memory
– Minimum 10x improvements
– Facebook's Dragonstone server has 3.2 TB of flash memory
• Bottom line: new capabilities!
Non-von Neumann Processing/Efficiencies
• von Neumann bottleneck (computer science)
– "An inefficiency inherent in the design of any von Neumann machine that arises from the fact that most computer time is spent in moving information between storage and the central processing unit rather than operating on it" [http://encyclopedia2.thefreedictionary.com/von+Neumann+bottleneck]
• Michael Stonebraker
– Ingres (Berkeley/MIT)
– Modern database processing is approximately 4% efficient
• Many big data architectures are attempts to address this, but:
– Zero-sum game
– Trade characteristics against each other (reliability, predictability)
– Google/MapReduce/Bigtable
– Amazon/Dynamo
– Netflix/Chaos Monkey
– Hadoop
– McDipper
• Big data techniques exploit non-von Neumann processing
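To make the data-movement point concrete, here is a toy Python sketch of one tactic many big data architectures use: moving the computation to the data instead of the data to the computation. It is an illustration under stated assumptions, not a model of Stonebraker's 4% figure. Each inner list stands in for one node's local records, the only cost counted is how many values cross the "network", and the function names are hypothetical.

```python
# Hypothetical toy model: each inner list stands for the records stored on
# one node. We count how many values cross the "network" to get one answer.

def ship_data_to_compute(partitions):
    """Move every raw value to a single processor, then aggregate there."""
    shipped = [value for part in partitions for value in part]
    return sum(shipped), len(shipped)

def ship_compute_to_data(partitions):
    """Aggregate next to the data; move only one partial result per node."""
    partials = [sum(part) for part in partitions]  # would run on each node
    return sum(partials), len(partials)

partitions = [list(range(i * 1000, (i + 1) * 1000)) for i in range(4)]
print(ship_data_to_compute(partitions))  # (7998000, 4000)
print(ship_compute_to_data(partitions))  # (7998000, 4)
```

Both approaches return the same total, but the second moves one partial result per node rather than every record. MapReduce-style frameworks apply the same idea by scheduling work near where the data is stored.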
What is NoSQL?
• Commonly interpreted as both "No SQL" and "Not Only SQL"
• A broad class of database management technologies that provide a mechanism for storing and retrieving data that doesn't follow traditional relational database methodology
• Motivations
– Simplicity of design
– Horizontal scaling
– Finer control over availability of the data
• The data structures used by NoSQL databases differ from those used in relational databases, making some operations faster in NoSQL and others faster in relational databases
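As a rough illustration of that tradeoff, the following Python sketch spreads a key-value store across several in-memory "nodes" by hashing keys (horizontal scaling). Assumptions: the class `PartitionedKVStore` and its methods are hypothetical, the nodes are plain dicts in one process, and MD5 is used only as a stable, evenly spreading hash. Lookups by key touch one node; a query on a value has to scan them all.

```python
import hashlib

class PartitionedKVStore:
    """Toy key-value store whose keys are spread over several in-memory "nodes".

    Hypothetical sketch: real NoSQL products add networking, replication,
    and persistence; this only shows how keys are mapped to partitions.
    """

    def __init__(self, num_nodes: int):
        self.nodes = [{} for _ in range(num_nodes)]

    def _node_for(self, key: str) -> int:
        # A stable hash, so the same key always maps to the same node
        # (Python's built-in hash() of a str changes between runs).
        digest = hashlib.md5(key.encode("utf-8")).hexdigest()
        return int(digest, 16) % len(self.nodes)

    def put(self, key: str, value) -> None:
        self.nodes[self._node_for(key)][key] = value

    def get(self, key: str):
        # Lookup by key touches exactly one node: the fast path.
        return self.nodes[self._node_for(key)].get(key)

    def find_by_value(self, predicate):
        # A query on a non-key attribute must scan every node; a relational
        # database would typically answer this with a secondary index.
        return sorted(key for node in self.nodes
                      for key, value in node.items() if predicate(value))

store = PartitionedKVStore(num_nodes=3)
store.put("user:1", {"name": "Ada", "city": "Richmond"})
store.put("user:2", {"name": "Grace", "city": "Arlington"})
store.put("user:3", {"name": "Edgar", "city": "Richmond"})
print(store.get("user:2")["name"])                              # Grace
print(store.find_by_value(lambda v: v["city"] == "Richmond"))  # ['user:1', 'user:3']
```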
What is Hadoop?
• A data storage and processing system that runs on clusters of commodity servers
• Able to store any kind of data in its native format
• Performs a wide variety of analyses and transformations
• Stores terabytes, and even petabytes, of data inexpensively
• Handles hardware and system failures automatically, without losing data or interrupting data analyses
• Critical components of Hadoop:
– HDFS: the Hadoop Distributed File System is the storage system for a Hadoop cluster, responsible for distributing data across the servers
– MapReduce: the processing framework that allows distributed, parallel execution of analytical jobs
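The MapReduce model can be sketched in a few lines of plain Python. This single-process simulation of the classic word-count example shows the map, shuffle, and reduce steps; it is not Hadoop's Java API, and the function names are hypothetical. On a real cluster each input split would be mapped on a server that holds it in HDFS, and reducers would run in parallel on disjoint sets of keys.

```python
from collections import defaultdict
from itertools import chain

def map_phase(split: str):
    # Emit (word, 1) for every word; on Hadoop each mapper handles one input split.
    return [(word.lower(), 1) for word in split.split()]

def shuffle_phase(pairs):
    # Group all values by key, as the framework does between map and reduce.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    # Combine each key's values; each reducer would own a subset of the keys.
    return {key: sum(values) for key, values in grouped.items()}

def word_count(splits):
    mapped = chain.from_iterable(map_phase(s) for s in splits)
    return reduce_phase(shuffle_phase(mapped))

print(word_count(["big data is big", "data beats opinion"]))
# {'big': 2, 'data': 2, 'is': 1, 'beats': 1, 'opinion': 1}
```

Because the map and reduce functions only see their own split or key, the framework is free to run them on many machines at once, which is what makes the approach "parallel-friendly".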
One of Data Blueprint's Big Data Clusters
Why NoSQL? Why Hadoop?
• Large number of users (read: the internet)
• Rapid app development and deployment
• Large number of mission-critical writes (sensors, etc.)
• Small, continuous reads and writes, especially where "consistency" is less important (social networks)
• Hadoop addresses the hard scaling problems caused by large amounts of complex data
• As the amount of data in a cluster grows, new servers can be added to a Hadoop cluster incrementally and inexpensively to store and analyze it
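Adding servers cheaply depends on not having to reshuffle most of the data each time. HDFS handles placement with a NameNode that tracks where blocks live; Dynamo-style stores (Amazon's Dynamo appears earlier in this deck) use consistent hashing instead. The sketch below is a simplified consistent-hash ring with virtual nodes, compared against naive `hash(key) % servers` placement. The class and node names are hypothetical, and the 100-virtual-node setting is an arbitrary choice for illustration.

```python
import bisect
import hashlib

def stable_hash(text: str) -> int:
    return int(hashlib.md5(text.encode("utf-8")).hexdigest(), 16)

class HashRing:
    """Consistent-hash ring with virtual nodes (simplified Dynamo-style sketch)."""

    def __init__(self, nodes, vnodes: int = 100):
        self.vnodes = vnodes
        self.ring = []  # sorted list of (hash, node_name)
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        # Each server owns many small arcs of the ring, which evens out load.
        for i in range(self.vnodes):
            bisect.insort(self.ring, (stable_hash(f"{node}#{i}"), node))

    def node_for(self, key: str) -> str:
        # Walk clockwise to the first virtual node at or after the key's hash.
        idx = bisect.bisect_left(self.ring, (stable_hash(key), ""))
        return self.ring[idx % len(self.ring)][1]

def modulo_node(key: str, nodes) -> str:
    # Naive placement: hash(key) mod number-of-servers.
    return nodes[stable_hash(key) % len(nodes)]

keys = [f"record-{i}" for i in range(10_000)]
old_nodes = ["node-a", "node-b", "node-c", "node-d"]
new_nodes = old_nodes + ["node-e"]

ring = HashRing(old_nodes)
before = {k: ring.node_for(k) for k in keys}
ring.add_node("node-e")
after = {k: ring.node_for(k) for k in keys}

ring_moved = sum(before[k] != after[k] for k in keys) / len(keys)
modulo_moved = sum(modulo_node(k, old_nodes) != modulo_node(k, new_nodes)
                   for k in keys) / len(keys)
print(f"consistent hashing moved {ring_moved:.0%} of keys")
print(f"modulo placement moved {modulo_moved:.0%} of keys")
```

Going from four servers to five, modulo placement relocates roughly 80% of keys, while the ring relocates roughly the new server's share, and every key that moves goes to the new server.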
Hadoop Use Cases in the Real World
• Risk Modeling
• Customer Churn Analysis
• Recommendation Engine
• Ad Targeting
• Point of Sale Transaction Analysis
• Sentiment Analysis on Social Media
• Analyzing network data to predict failure
• Threat analysis
• Trade Surveillance
http://blogs.informatica.com/perspectives/uk/2011/08/09/hadoop-enriches-data-science-part-2-of-hadoop-series/
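Potential Tradeoffs:
CAP theorem: consistency, availability, and partition tolerance
– When a network partition occurs, a distributed data store must give up either consistency or availability
Figure: CAP triangle with vertices Consistency, Availability, and Partition (Fault) Tolerance, contrasting RDBMS and NOSQL
• RDBMS: ACID (Atomicity, Consistency, Isolation, Durability)
• NOSQL: BASE (Basically Available, Soft state, Eventual consistency)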
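A minimal sketch of BASE-style eventual consistency, assuming last-write-wins conflict resolution with logical timestamps (one common policy; other systems use vector clocks or CRDTs). The `Replica` class is hypothetical: two replicas accept conflicting writes while partitioned, disagree for a while, and converge once they exchange state.

```python
from dataclasses import dataclass, field

@dataclass
class Replica:
    """One copy of the data that accepts writes even when cut off from peers."""
    name: str
    # key -> (timestamp, writer_name, value)
    data: dict = field(default_factory=dict)

    def write(self, key, value, timestamp: int) -> None:
        self._merge(key, (timestamp, self.name, value))

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[2]

    def _merge(self, key, entry) -> None:
        current = self.data.get(key)
        # Last-write-wins: the higher (timestamp, writer_name) pair wins, so
        # every replica picks the same winner regardless of merge order.
        if current is None or entry[:2] > current[:2]:
            self.data[key] = entry

    def sync_from(self, peer: "Replica") -> None:
        # Anti-entropy: pull the peer's entries and merge them locally.
        for key, entry in list(peer.data.items()):
            self._merge(key, entry)

a, b = Replica("a"), Replica("b")
# During a partition, each side accepts a write to the same key.
a.write("status", "online", timestamp=1)
b.write("status", "away", timestamp=2)
print(a.read("status"), b.read("status"))  # online away  (temporarily inconsistent)

# Once the partition heals, the replicas exchange state and converge.
a.sync_from(b)
b.sync_from(a)
print(a.read("status"), b.read("status"))  # away away
```

Breaking timestamp ties by replica name keeps the outcome deterministic; the price is that one of two concurrent writes is silently discarded, which may be acceptable for a presence status but not for an account balance.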
Pacman
• Decomposition
• Reassembly
– not optional!
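Decomposition and reassembly can be illustrated with Python's standard `concurrent.futures` module. In this sketch (the function names and the per-chunk work are hypothetical), the input is split into chunks, the chunks are processed concurrently, and the results are stitched back together in the original order; skipping the reassembly step would leave only a pile of partial answers.

```python
from concurrent.futures import ThreadPoolExecutor

def decompose(items, chunk_size):
    """Split the input into independent chunks (the 'Pacman bites')."""
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]

def process_chunk(chunk):
    # Hypothetical per-chunk work: normalize text records.
    return [record.strip().lower() for record in chunk]

def reassemble(results):
    """Stitch the chunk results back together in their original order."""
    return [record for chunk in results for record in chunk]

def run(items, chunk_size=2, workers=4):
    chunks = decompose(items, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in input order even if chunks finish out of order.
        results = list(pool.map(process_chunk, chunks))
    return reassemble(results)

records = ["  Alpha", "BETA ", " Gamma ", "delta", "EPSILON"]
print(run(records))  # ['alpha', 'beta', 'gamma', 'delta', 'epsilon']
```

Threads keep the example simple; for CPU-heavy work in CPython, `ProcessPoolExecutor` (which needs picklable, module-level functions) is the usual substitute.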
Sandwich use case
• Landing Zone (less expensive)
– Especially useful in cases where data is highly disposable
• Existing technologies are the contents of the sandwich
– Complemented by landing zone and archival capabilities
• Archiving/Offloading (less need for structure)
– "Cold" transactional and analytic data
Adapted from Nancy Kopp: http://ibmdatamag.com/2013/08/relishing-the-big-data-burger/
Figure: Landing zone / Existing data architectural processing / Archiving and offloading
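One way to read the sandwich pattern is as a routing rule that decides which layer a record belongs in. The sketch below is hypothetical: the `Tier` names mirror the slide, but the `validated` flag and the 365-day cold-data threshold are invented for illustration and would come from an organization's own data governance policy.

```python
from dataclasses import dataclass
from enum import Enum

class Tier(Enum):
    LANDING_ZONE = "landing zone"        # inexpensive, raw, possibly disposable
    EXISTING = "existing processing"     # current warehouse/operational systems
    ARCHIVE = "archiving/offloading"     # cold data, less structure required

@dataclass
class Record:
    record_id: str
    validated: bool   # hypothetical: has it passed existing quality checks?
    age_days: int     # hypothetical: days since the record was last used

COLD_AFTER_DAYS = 365  # hypothetical policy threshold, not from the source

def route(record: Record) -> Tier:
    """Decide which layer of the 'sandwich' should hold a record."""
    if not record.validated:
        return Tier.LANDING_ZONE
    if record.age_days > COLD_AFTER_DAYS:
        return Tier.ARCHIVE
    return Tier.EXISTING

for rec in [Record("r1", False, 2), Record("r2", True, 30), Record("r3", True, 800)]:
    print(rec.record_id, route(rec).value)
# r1 landing zone
# r2 existing processing
# r3 archiving/offloading
```

In this sketch unvalidated data stays in the inexpensive landing zone regardless of age, so nothing reaches the existing systems or the archive without first passing the existing quality checks.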
See Like a Snake
Pit Organ
Snakes with pit organs can switch back and forth between two systems (their eyes and their heat-sensing pit organs), or use both simultaneously, giving them a leg up, so to speak, when it comes to targeting a warm object.
Analytics Insight Cycle
• Things are happening
– Sensemaking techniques address "what" is happening
• Patterns/objects, hypotheses emerge
– What can be observed?
• Operationalizing
– The dots can be repeatedly connected
– "Big Data" contributions are shown in orange
• Margaret Boden's computational creativity
– Exploratory
– Combinational
– Transformational
Figure: Analytics insight cycle (labels: Existing knowledge base; Volume, Velocity, Variety; "Sensemaking" techniques; Pattern/object emergence; Analytical bottleneck; Potential/actual insights; Discernment; Combined/informed insights; Exploitable insight; Feedback)
Humans generally better:
• Sense low-level stimuli
• Detect stimuli in noisy background
• Recognize constant patterns in varying situations
• Sense unusual and unexpected events
• Remember principles and strategies
• Retrieve pertinent details without a priori connection
• Draw upon experience and adapt decisions to situations
• Select alternatives if original approach fails
• Reason inductively; generalize from observations
• Act in unanticipated emergencies and novel situations
• Apply principles to solve varied problems
• Make subjective evaluations
• Develop new solutions
• Concentrate on important tasks when overload occurs
• Adapt physical response to changes in situation
Machines generally better:
• Sense stimuli outside humans' range
• Count or measure physical quantities
• Store quantities of coded information accurately
• Monitor prespecified events, especially infrequent ones
• Make rapid and consistent responses to input signals
• Recall quantities of detailed information accurately
• Retrieve pertinent details without a priori connection
• Process quantitative data in prespecified ways
• Perform repetitive preprogrammed actions reliably
• Exert great, highly controlled physical force
• Perform several activities simultaneously
• Maintain operations under heavy operation load
• Maintain performance over extended periods of time
J. C. R. Licklider's Man-Computer Symbiosis
The best approaches combine manual and automated methods!
Gartner Recommendations
• Impact: Some of the new analytics made possible by big data have no precedent, so innovative thinking will be required to achieve value
– Top recommendation: Treat big data projects as innovation projects that will require change management efforts. The business will take time to trust new data sources and new analytics
• Impact: Creative thinking can unearth valuable information sources already inside the enterprise that are underused
– Top recommendation: Work with the business to conduct an inventory of internal data sources outside of IT's direct control, and consider augmenting existing data that is IT 'controlled.' With an innovation mindset, explore the potential insight that can be gained from each of these sources
• Impact: Big data technologies often create the ability to analyze faster, but getting value from faster analytics requires business changes
– Top recommendation: Ensure that big data projects that improve analytical speed always include a process redesign effort aimed at getting maximum benefit from that speed
Gartner 2012
Innovation
• Innovation is the development of new customer value through solutions that meet new needs, unarticulated needs, or old customer and market needs in new ways. This is accomplished through different or more effective products, processes, services, technologies, or ideas that are readily available to markets, governments, and society.
• Innovation differs from invention in that innovation refers to the use of a better and, as a result, novel idea or method, whereas invention refers more directly to the creation of the idea or method itself.
• Innovation differs from improvement in that innovation refers to the notion of doing something different (Lat. innovare: "to change") rather than doing the same thing better.
Data must be incorporated into the innovation-navigation process
Two Uses for Data in Support of Innovation
1. Using data to keep the innovation process on track
2. Using data to innovate
[Figure: "Altria IS Innovation Transformation 2012 - 2013: IS Innovation Ecosystem." A Q3-to-Q3 timeline organized into Inspiration, Tools, Infrastructure, and Accountability rows (e.g., Think Different sessions, IS Innovation Academy, town hall meetings, innovation board, performance reviews), spanning a sustaining-to-disruptive innovation spectrum, with "Data" marked at several points. © The Frontier Project, LLC]
Reengineering (Objective Definition)
• How can you state that you have improved any system if you don't understand the existing (legacy) system's strengths and weaknesses?
• Without that understanding, you can't use those strengths and weaknesses to inform the new system
• To reengineer
– You must first reverse engineer the existing system, and then
– Use that information to architect the new system
[Figure: Legacy system analysis (break down & compare) feeds new system requirements, which drive the new system and its $$$ value]
Copyright 2013 by Data Blueprint
Potential Tradeoffs
CAP theorem: consistency, availability, and partition tolerance
[Figure: CAP triangle with vertices Consistency, Availability, and Partition (Fault) Tolerance, with RDBMS and NOSQL labels]
• Small datasets can be both consistent & available
• RDBMS (ACID): Atomicity, Consistency, Isolation, Durability
• NOSQL (BASE): Basically Available, Soft state, Eventual consistency
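To make the BASE side of the tradeoff concrete, here is a minimal Python sketch of eventual consistency: each replica accepts writes on its own (staying available), a read from a replica that has not yet synchronized returns stale data, and a later anti-entropy exchange makes the replicas converge. The `Replica` class, the `anti_entropy` function, and the last-writer-wins rule keyed on a logical timestamp are assumptions for illustration; real NoSQL systems use a variety of more careful reconciliation schemes.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One copy of a key-value store; each stored value carries a logical write timestamp."""
    name: str
    data: dict = field(default_factory=dict)  # key -> (timestamp, value)

    def write(self, key, value, ts):
        # Accept the write locally even if other replicas are unreachable (availability).
        self.data[key] = (ts, value)

    def read(self, key):
        # Return whatever this replica currently holds, which may be stale.
        entry = self.data.get(key)
        return None if entry is None else entry[1]


def anti_entropy(a, b):
    """Synchronize two replicas so each key ends up with its newest write (last-writer-wins)."""
    for key in set(a.data) | set(b.data):
        entries = [e for e in (a.data.get(key), b.data.get(key)) if e is not None]
        newest = max(entries, key=lambda e: e[0])
        a.data[key] = newest
        b.data[key] = newest


east, west = Replica("east"), Replica("west")
east.write("cart:42", ["book"], ts=1)
print(west.read("cart:42"))  # None: west has not seen the write yet (stale read)
anti_entropy(east, west)
print(west.read("cart:42"))  # ['book']: the replicas have converged
```

The window between the write and the synchronization is exactly where an ACID system would instead block or refuse the request to stay consistent.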
'Throw-away' prototyping
• With 'throw-away' prototyping, a small part of the system is developed and then given to the end user to try out and evaluate. The user provides feedback, which can quickly be incorporated into the development of the main system. The prototype is then discarded (thrown away).
Some Big Data Limitations
David Brooks, New York Times
Copyright 2015 by Data Blueprint
• Data analysis struggles with the social
– Your brain is excellent at social cognition; people can
• Mirror each other’s emotional states
• Detect uncooperative behavior
• Assign value to things through emotion
– Data analysis measures the quantity of social interactions but not the quality
• It can map interactions with the co-workers you see during work days
• It can't capture devotion to childhood friends you see once a year
– When making (personal) decisions about social relationships, it’s foolish to swap the amazing machine in your skull for the crude machine on your desk
• Data struggles with context
– Decisions are embedded in sequences and contexts
– Brains think in stories, weaving together multiple causes and multiple contexts
– Data analysis is pretty bad at
• Narratives / Emergent thinking / Explaining
• Data creates bigger haystacks
– More data leads to more statistically significant correlations
– Most are spurious and deceive us
– Falsity grows exponentially with the amount of data we collect
• Big data has trouble with big problems
– For example: the economic stimulus debate
– No one has been persuaded by data to switch sides
• Data favors memes over masterpieces
– Data can detect when large numbers of people take an instant liking to some cultural product
– But many important products are hated initially because they are unfamiliar
• Data obscures values
– Data is never raw; it’s always structured according to somebody’s predispositions and values
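The "bigger haystacks" point can be demonstrated with a small simulation, sketched here in Python under stated assumptions: every variable is pure random noise, so any "significant" correlation is spurious by construction. With 30 observations per variable, |r| > 0.361 corresponds roughly to p < 0.05 (two-tailed). The function names and parameters are hypothetical. Because the number of variable pairs grows as n(n-1)/2, the count of spurious hits grows with it even though the false-positive rate per pair stays near 5%.

```python
import itertools
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def spurious_hits(num_vars, num_obs=30, r_crit=0.361, seed=0):
    """Return (pairs tested, pairs with |r| > r_crit) for independent random variables."""
    rng = random.Random(seed)
    data = [[rng.gauss(0, 1) for _ in range(num_obs)] for _ in range(num_vars)]
    pairs = list(itertools.combinations(range(num_vars), 2))
    hits = sum(1 for i, j in pairs if abs(pearson_r(data[i], data[j])) > r_crit)
    return len(pairs), hits


for num_vars in (10, 50, 200):
    pairs, hits = spurious_hits(num_vars)
    print(f"{num_vars} variables -> {pairs} pairs tested, {hits} 'significant' correlations")
```

None of the reported correlations are real; collecting more variables simply produces more of them.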
Maslow's Hierarchy of Needs
You can accomplish Advanced Data Practices without becoming proficient in the Foundational Data Practices; however, this will:
• Take longer
• Cost more
• Deliver less
• Present greater risk
(with thanks to Tom DeMarco)
Data Management Practices Hierarchy
Advanced Data Practices (technologies)
• MDM
• Mining
• Big Data
• Analytics
• Warehousing
• SOA
Foundational Data Practices (capabilities)
• Data Platform/Architecture
• Data Governance
• Data Quality
• Data Operations
• Data Management Strategy
Social Sentiment Analysis
• One of the burgeoning areas for use of Big Data / Hadoop platforms
• Allows for the landing of multiple sources of unstructured data (Twitter, Facebook, LinkedIn, etc.)
• That data can then be analyzed with algorithms looking for keywords that indicate positive/negative feedback
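A keyword-based scorer of the kind described above can be sketched in a few lines of Python. The `POSITIVE` and `NEGATIVE` keyword sets and the `sentiment` function are hypothetical; a real project would use a curated, domain-specific lexicon and would run this logic at scale on the Hadoop platform rather than in a single process.

```python
import re

# Hypothetical keyword lists for illustration only.
POSITIVE = {"love", "great", "excellent", "happy", "recommend"}
NEGATIVE = {"hate", "terrible", "awful", "broken", "refund"}


def sentiment(text):
    """Label text positive, negative, or neutral by counting matched keywords."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


posts = [
    "I love this book, great read!",
    "Arrived broken, I want a refund",
    "Shipped on Tuesday",
]
print([sentiment(p) for p in posts])  # ['positive', 'negative', 'neutral']
```

Plain keyword matching misses negation ("not great") and sarcasm, so its scores are best treated as a rough signal.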
Operational Use
• Used real-time pricing data from multiple sources to dynamically update the pricing for books in the Amazon Marketplace
• Ingested data from multiple sources, looking for real-time changes in price
• Applied a predictive model to determine the best price point and set the price of the books on the marketplace
• Increased the conversion rate, but created a race-to-the-bottom situation when not monitored
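The race-to-the-bottom risk can be illustrated with a deliberately simple Python sketch. It does not reproduce the predictive model described above; instead it assumes a hypothetical rule in which two sellers each undercut the other by one cent per round, and it shows how a minimum-price floor (one simple form of monitoring) stops the spiral. The `reprice` and `simulate` functions and all prices are hypothetical.

```python
def reprice(competitor_price, floor, undercut=0.01):
    """Undercut the competitor by `undercut`, but never go below `floor`."""
    return max(round(competitor_price - undercut, 2), floor)


def simulate(start_a, start_b, floor_a, floor_b, rounds=1000):
    """Let two automated sellers repeatedly react to each other's latest price."""
    a, b = start_a, start_b
    for _ in range(rounds):
        a = reprice(b, floor_a)  # seller A reacts to B's current price
        b = reprice(a, floor_b)  # seller B reacts to A's new price
    return a, b


print(simulate(20.00, 19.50, floor_a=0.0, floor_b=0.0))    # (0.0, 0.0): prices collapse
print(simulate(20.00, 19.50, floor_a=12.0, floor_b=12.0))  # (12.0, 12.0): floors hold
```

Each individual reprice looks reasonable; it is the interaction between two unmonitored automated rules that drives prices to zero.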
Healthcare Example: Patient Data
• Clinical data:
– Diagnosis/prognosis/treatment
– Genetic data
• Patient demographic data
• Insurance data:
– Insurance provider
– Claims data
• Prescriptions & pharmacy information
• Physical fitness data
– Activity tracking through smartphone apps & social media
• Health history
• Medical research data
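Bringing these categories together usually means combining feeds from separate systems into one view per patient. The Python sketch below assumes, for simplicity, that every feed already carries the same `patient_id`; in practice, matching identities across sources is a master data management problem in its own right. The `PatientProfile` class, the `merge_feeds` function, and the sample records are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PatientProfile:
    patient_id: str
    # category name (e.g. "clinical", "claims") -> list of records from source systems
    sources: dict = field(default_factory=dict)


def merge_feeds(feeds):
    """Group (patient_id, category, record) tuples from many feeds into one profile per patient."""
    profiles = {}
    for patient_id, category, record in feeds:
        profile = profiles.setdefault(patient_id, PatientProfile(patient_id))
        profile.sources.setdefault(category, []).append(record)
    return profiles


feeds = [
    ("p-001", "clinical", {"diagnosis": "type 2 diabetes"}),
    ("p-001", "claims", {"claim_id": "c-17", "amount": 240.0}),
    ("p-001", "fitness", {"steps_per_day": 6200}),
    ("p-002", "prescriptions", {"drug": "metformin"}),
]
profiles = merge_feeds(feeds)
print(sorted(profiles["p-001"].sources))  # ['claims', 'clinical', 'fitness']
```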
http://www.forbes.com/sites/xerox/2013/09/27/big-data-boosts-customer-loyalty-no-really/
Retail Example: Loyalty Programs & Big Data
• Companies need to understand current wants and needs AND
predict future tendencies
• Customer -> Repeat Customer -> Brand Advocate
• Customer loyalty programs & retention strategies
– Track what is being purchased and how often
– Coupons based on purchasing history
– Targeted communications, campaigns & special offers
– Social media for additional interactions
– Personalize consumer interactions
• Customer purchase history influences product placements
– Retailers rapidly respond to consumer demands
– Product placements, planogram optimization, etc.
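As a minimal sketch of "coupons based on purchasing history," the Python below counts how often each customer buys from each product category and targets a coupon at the customer's most frequent category once it passes a minimum purchase count. The function name, the threshold of three purchases, and the sample data are assumptions for illustration.

```python
from collections import Counter, defaultdict


def coupon_targets(purchases, min_purchases=3):
    """Map each customer to their most-purchased category, if bought at least min_purchases times."""
    by_customer = defaultdict(Counter)
    for customer, category in purchases:
        by_customer[customer][category] += 1  # track what is purchased and how often
    targets = {}
    for customer, counts in by_customer.items():
        category, n = counts.most_common(1)[0]
        if n >= min_purchases:
            targets[customer] = category
    return targets


purchases = [("ann", "coffee")] * 4 + [("ann", "tea"), ("bob", "snacks"), ("bob", "snacks")]
print(coupon_targets(purchases))  # {'ann': 'coffee'}
```

The threshold keeps one-off purchases from triggering offers; lowering it to two would also target Bob.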
References
• The Human Face of Big Data, Rick Smolan & Jennifer Erwitt, First Edition (November 20, 2012)
• McKinsey: Big Data: The next frontier for innovation, competition and productivity (http://www.mckinsey.com/insights/business_technology/big_data_the_next_frontier_for_innovation?p=1)
• The Washington Post: Five Myths about Big Data (http://articles.washingtonpost.com/2013-08-16/opinions/41416362_1_big-data-data-crunching-marketing-analytics)
• Gartner: Gartner’s 2013 Hype Cycle for Emerging Technologies Maps Out Evolving Relationship Between Humans and Machines (http://www.gartner.com/newsroom/id/2575515)
• The New York Times | Opinion Pages: What Data Can’t Do (http://www.nytimes.com/2013/02/19/opinion/brooks-what-data-cant-do.html?_r=1&)
• CIO.com: Five Steps for How to Better Manage Your Data (http://www.cio.com.au/article/429681/five_steps_how_better_manage_your_data/)
• Business Insider: Enterprises Aren’t Spending Wildly on ‘Big Data’ But Don’t Know If It’s Worth It Yet (http://www.businessinsider.com/enterprise-big-data-spending-2012-11#ixzz2cdT8shhe)
• Inc.com: Big Data, Big Money: IT Industry to Increase Spending (http://www.inc.com/kathleen-kim/big-data-spending-to-increase-for-it-industry.html)
• Forbes: Big Data Boosts Customer Loyalty. No, Really. (http://www.forbes.com/sites/xerox/2013/09/27/big-data-boosts-customer-loyalty-no-really/)
It’s your turn!
Use the chat feature or Twitter (#dataed) to submit your questions to everyone now
Questions?
10124 W. Broad Street, Suite C
Glen Allen, Virginia 23060
804.521.4056

Data Catalogs Are the Answer – What is the Question?DATAVERSITY
 
Data Catalogs Are the Answer – What Is the Question?
Data Catalogs Are the Answer – What Is the Question?Data Catalogs Are the Answer – What Is the Question?
Data Catalogs Are the Answer – What Is the Question?DATAVERSITY
 
Data Modeling Fundamentals
Data Modeling FundamentalsData Modeling Fundamentals
Data Modeling FundamentalsDATAVERSITY
 
Showing ROI for Your Analytic Project
Showing ROI for Your Analytic ProjectShowing ROI for Your Analytic Project
Showing ROI for Your Analytic ProjectDATAVERSITY
 
How a Semantic Layer Makes Data Mesh Work at Scale
How a Semantic Layer Makes  Data Mesh Work at ScaleHow a Semantic Layer Makes  Data Mesh Work at Scale
How a Semantic Layer Makes Data Mesh Work at ScaleDATAVERSITY
 
Is Enterprise Data Literacy Possible?
Is Enterprise Data Literacy Possible?Is Enterprise Data Literacy Possible?
Is Enterprise Data Literacy Possible?DATAVERSITY
 
The Data Trifecta – Privacy, Security & Governance Race from Reactivity to Re...
The Data Trifecta – Privacy, Security & Governance Race from Reactivity to Re...The Data Trifecta – Privacy, Security & Governance Race from Reactivity to Re...
The Data Trifecta – Privacy, Security & Governance Race from Reactivity to Re...DATAVERSITY
 
Emerging Trends in Data Architecture – What’s the Next Big Thing?
Emerging Trends in Data Architecture – What’s the Next Big Thing?Emerging Trends in Data Architecture – What’s the Next Big Thing?
Emerging Trends in Data Architecture – What’s the Next Big Thing?DATAVERSITY
 
Data Governance Trends - A Look Backwards and Forwards
Data Governance Trends - A Look Backwards and ForwardsData Governance Trends - A Look Backwards and Forwards
Data Governance Trends - A Look Backwards and ForwardsDATAVERSITY
 
Data Governance Trends and Best Practices To Implement Today
Data Governance Trends and Best Practices To Implement TodayData Governance Trends and Best Practices To Implement Today
Data Governance Trends and Best Practices To Implement TodayDATAVERSITY
 
2023 Trends in Enterprise Analytics
2023 Trends in Enterprise Analytics2023 Trends in Enterprise Analytics
2023 Trends in Enterprise AnalyticsDATAVERSITY
 
Data Strategy Best Practices
Data Strategy Best PracticesData Strategy Best Practices
Data Strategy Best PracticesDATAVERSITY
 
Who Should Own Data Governance – IT or Business?
Who Should Own Data Governance – IT or Business?Who Should Own Data Governance – IT or Business?
Who Should Own Data Governance – IT or Business?DATAVERSITY
 
Data Management Best Practices
Data Management Best PracticesData Management Best Practices
Data Management Best PracticesDATAVERSITY
 
MLOps – Applying DevOps to Competitive Advantage
MLOps – Applying DevOps to Competitive AdvantageMLOps – Applying DevOps to Competitive Advantage
MLOps – Applying DevOps to Competitive AdvantageDATAVERSITY
 

More from DATAVERSITY (20)

Architecture, Products, and Total Cost of Ownership of the Leading Machine Le...
Architecture, Products, and Total Cost of Ownership of the Leading Machine Le...Architecture, Products, and Total Cost of Ownership of the Leading Machine Le...
Architecture, Products, and Total Cost of Ownership of the Leading Machine Le...
 
Data at the Speed of Business with Data Mastering and Governance
Data at the Speed of Business with Data Mastering and GovernanceData at the Speed of Business with Data Mastering and Governance
Data at the Speed of Business with Data Mastering and Governance
 
Exploring Levels of Data Literacy
Exploring Levels of Data LiteracyExploring Levels of Data Literacy
Exploring Levels of Data Literacy
 
Building a Data Strategy – Practical Steps for Aligning with Business Goals
Building a Data Strategy – Practical Steps for Aligning with Business GoalsBuilding a Data Strategy – Practical Steps for Aligning with Business Goals
Building a Data Strategy – Practical Steps for Aligning with Business Goals
 
Make Data Work for You
Make Data Work for YouMake Data Work for You
Make Data Work for You
 
Data Catalogs Are the Answer – What is the Question?
Data Catalogs Are the Answer – What is the Question?Data Catalogs Are the Answer – What is the Question?
Data Catalogs Are the Answer – What is the Question?
 
Data Catalogs Are the Answer – What Is the Question?
Data Catalogs Are the Answer – What Is the Question?Data Catalogs Are the Answer – What Is the Question?
Data Catalogs Are the Answer – What Is the Question?
 
Data Modeling Fundamentals
Data Modeling FundamentalsData Modeling Fundamentals
Data Modeling Fundamentals
 
Showing ROI for Your Analytic Project
Showing ROI for Your Analytic ProjectShowing ROI for Your Analytic Project
Showing ROI for Your Analytic Project
 
How a Semantic Layer Makes Data Mesh Work at Scale
How a Semantic Layer Makes  Data Mesh Work at ScaleHow a Semantic Layer Makes  Data Mesh Work at Scale
How a Semantic Layer Makes Data Mesh Work at Scale
 
Is Enterprise Data Literacy Possible?
Is Enterprise Data Literacy Possible?Is Enterprise Data Literacy Possible?
Is Enterprise Data Literacy Possible?
 
The Data Trifecta – Privacy, Security & Governance Race from Reactivity to Re...
The Data Trifecta – Privacy, Security & Governance Race from Reactivity to Re...The Data Trifecta – Privacy, Security & Governance Race from Reactivity to Re...
The Data Trifecta – Privacy, Security & Governance Race from Reactivity to Re...
 
Emerging Trends in Data Architecture – What’s the Next Big Thing?
Emerging Trends in Data Architecture – What’s the Next Big Thing?Emerging Trends in Data Architecture – What’s the Next Big Thing?
Emerging Trends in Data Architecture – What’s the Next Big Thing?
 
Data Governance Trends - A Look Backwards and Forwards
Data Governance Trends - A Look Backwards and ForwardsData Governance Trends - A Look Backwards and Forwards
Data Governance Trends - A Look Backwards and Forwards
 
Data Governance Trends and Best Practices To Implement Today
Data Governance Trends and Best Practices To Implement TodayData Governance Trends and Best Practices To Implement Today
Data Governance Trends and Best Practices To Implement Today
 
2023 Trends in Enterprise Analytics
2023 Trends in Enterprise Analytics2023 Trends in Enterprise Analytics
2023 Trends in Enterprise Analytics
 
Data Strategy Best Practices
Data Strategy Best PracticesData Strategy Best Practices
Data Strategy Best Practices
 
Who Should Own Data Governance – IT or Business?
Who Should Own Data Governance – IT or Business?Who Should Own Data Governance – IT or Business?
Who Should Own Data Governance – IT or Business?
 
Data Management Best Practices
Data Management Best PracticesData Management Best Practices
Data Management Best Practices
 
MLOps – Applying DevOps to Competitive Advantage
MLOps – Applying DevOps to Competitive AdvantageMLOps – Applying DevOps to Competitive Advantage
MLOps – Applying DevOps to Competitive Advantage
 

Recently uploaded

Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAndikSusilo4
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetHyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetEnjoy Anytime
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxnull - The Open Security Community
 

Recently uploaded (20)

Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & Application
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetHyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
The transition to renewables in India.pdf
The transition to renewables in India.pdfThe transition to renewables in India.pdf
The transition to renewables in India.pdf
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
 

Implementing Big Data, NoSQL, & Hadoop - Bigger Is (Usually) Better

  • 1. Peter Aiken, Ph.D. & Micah Dalton
 Implementing Big Data, NOSQL, & HADOOP
 Demystifying Big Data: Bigger is (Usually) Better
 Copyright 2017 by Data Blueprint
 Peter Aiken, Ph.D.
 • DAMA International President 2009-2013
 • DAMA International Achievement Award 2001 (with Dr. E. F. "Ted" Codd)
 • DAMA International Community Award 2005
 • 33+ years in data management
 • Repeated international recognition
 • Founder, Data Blueprint (datablueprint.com)
 • Associate Professor of IS (vcu.edu)
 • DAMA International (dama.org)
 • 10 books and dozens of articles
 • Experienced w/ 500+ data management practices
 • Multi-year immersions:
 – US DoD (DISA/Army/Marines/DLA)
 – Nokia
 – Deutsche Bank
 – Wells Fargo
 – Walmart
 – …
 Books: Monetizing Data Management: Unlocking the Value in Your Organization's Most Important Asset (Peter Aiken with Juanita Billings, foreword by John Bottega); The Case for the Chief Data Officer: Recasting the C-Suite to Leverage Your Most Valuable Asset (Peter Aiken and Michael Gorman)
  • 2. Micah Dalton
 Micah is a senior business leader with twenty years of management experience building and leading teams to deliver results across various industries, including financial services, the public sector, non-profits, and higher education. Micah's expertise in offering pragmatic business solutions has made him a valuable member of client teams. Micah's skills focus on using data to drive root cause identification, analytics, strategy, financial analysis and reporting, procurement strategy and cost management, and operations analysis and management. Micah helped lead the development of Capital One's Six Sigma program and completed his Black Belt training. Micah also holds certifications in Organizational Change Management (PROSCI) and Data Management (CDMP-Associate from DAMA). Micah earned his MBA from Duke's Fuqua School of Business, focusing his interests on corporate finance and business strategy. Prior to that, Micah earned his Bachelor's degree in economics from Mary Washington College. Additionally, Micah was a member of the 2014 class of Leadership Metro Richmond and has been an adjunct professor of Marketing at the University of Mary Washington.
 Implementing Big Data, NOSQL, & HADOOP
 Demystifying Big Data: Bigger is (Usually) Better
 • Why it is important to consider the messenger
 – What is being "sold"?
 – We are using the wrong vocabulary to discuss this topic
 • Technically, what are Big Data technologies good at?
 – Computers → commodity-based computing infrastructure
 – Flash memory is currently obeying Moore's Law
 – RAM → increased processing
 – Parallel-friendly approaches (lots of repeatable actions)
 • Successful Big Data approaches ...
 – Innovation
 – Reengineering (precise definition)
 – Throwaway prototyping
 • How does that help operationally?
 – Solid support community
 – Examples
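The agenda mentions parallel-friendly approaches ("lots of repeatable actions") running on commodity infrastructure. That is the idea behind Hadoop's MapReduce model: the same small task runs independently on each slice of the data, and the partial results are merged afterwards. Below is a minimal single-machine sketch of that pattern in plain Python. It does not use Hadoop's API, the function names are hypothetical, and threads only stand in for cluster nodes.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_chunk(lines: list[str]) -> Counter:
    """Map step: count the words in one chunk, independently of all other chunks."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def merge_counts(left: Counter, right: Counter) -> Counter:
    """Reduce step: combine two partial results (addition is order-independent)."""
    return left + right


def word_count(lines: list[str], n_chunks: int = 4) -> Counter:
    """Split the input, run the same map task on every chunk, then merge the results."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    chunks = [lines[i::n_chunks] for i in range(n_chunks)]
    # Threads stand in for cluster nodes. In CPython they do not give true CPU
    # parallelism, but the shape of the computation (independent map tasks) is the same.
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        partials = list(pool.map(map_chunk, chunks))
    return reduce(merge_counts, partials, Counter())
```

For example, `word_count(["big data is big", "data about data"])` counts "big" and "data" twice each. The answer is the same however many chunks are used, because each map task is self-contained and merging is associative. That independence is what lets such jobs spread across many inexpensive machines.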
  • 3. Welcome to the Post-Big Data Era!
 [Figure: Big Data: Expanding on 3 Fronts at an Increasing Rate (Data Volume, Data Velocity, Data Variety)]
 Big Data (has something to do with Vs, doesn't it?)
 • Volume – Amount of data
 • Velocity – Speed of data in and out
 • Variety – Range of data types and sources
 – 2001 Doug Laney
 • Variability – Many options or variable interpretations confound analysis
 – 2011 ISRC
 • Vitality – A dynamically changing Big Data environment in which analysis and predictive models must be continually updated as changes occur, so that opportunities can be seized as they arrive
 – 2011 CIA
 • Virtual – Scoping the discussion to include only online assets
 – 2012 Courtney Lambert
 • Value/Veracity
 – Stuart Madnick (John Norris Maguire Professor of Information Technology, MIT Sloan School of Management & Professor of Engineering Systems, MIT School of Engineering)
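The Vitality point says predictive models must be continually updated as conditions change. One common way to do this is with incremental algorithms, which fold each new record into an existing estimate instead of recomputing over the full history. As a minimal illustration, assume the "model" is nothing more than a running mean and variance. The sketch below uses Welford's online algorithm. It is a stand-in chosen for brevity, not a technique the slides prescribe, and the class name is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Running mean and variance, updated one observation at a time (Welford's algorithm)."""

    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the current mean

    def update(self, x: float) -> None:
        # Fold a single new observation into the estimate without revisiting old data.
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        # Sample variance; undefined (NaN) with fewer than two observations.
        return self.m2 / (self.count - 1) if self.count > 1 else float("nan")
```

Each call to `update` costs the same regardless of how much data has already been seen, so the estimate can keep pace with a continuously arriving stream.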
  • 4. The 13 V's of Big Data
 • Vast Volume of Vigorously, Verified, Vexingly, Variable, Verbose yet Valuable, Vital, Visualized, high Velocity and Veracity data that encourages the Vanity of the big data experts
 – Original from John Mashey, Silicon Graphics, 1998 (with contributed extensions)
 • We have no objective definition of big data!
 – Any measurements, claims of success, quantifications, etc. must be viewed skeptically and with suspicion!
  • 5. I shall not today attempt further to define the kinds of material but I know it when I see it ... (Justice Potter Stewart)
 Big Data
  • 6. Big Data
 [ Techniques / Technologies ]
 Big Data
  • 7. Big Data Techniques
 • New techniques that can change the productivity of any analytical insight cycle by an order of magnitude, and that complement, enhance, or replace conventional (existing) analysis methods
 • Big data techniques are currently characterized by:
 – Continuous, instantaneously available data sources
 – Non-von Neumann processing (defined later in the presentation)
 – Capabilities approaching or past human comprehension
 – Architecturally enhanceable identity/security capabilities
 – Other tradeoff-focused data processing
 • So a good question becomes "where in our existing architecture can we most effectively apply Big Data techniques?"
 The Big Data Landscape (Copyright Dave Feinleib, bigdatalandscape.com)
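Two of the listed characteristics, continuous data sources and tradeoff-focused processing, often meet in stream algorithms. These accept an approximate or partial answer in exchange for bounded memory. One classic example is reservoir sampling (Algorithm R), offered here purely as an illustration and not as a method from the presentation. It keeps a uniform random sample of k items from a stream whose length is not known in advance. The function name is hypothetical.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int, rng: Optional[random.Random] = None) -> List[T]:
    """Keep a uniform random sample of k items from a stream of unknown length."""
    rng = rng or random.Random()
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item i survives with probability k / (i + 1), replacing a random slot.
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir
```

The tradeoff is explicit: memory stays fixed at k items no matter how long the stream runs, but any analysis works on a sample rather than on every record.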
  • 8. The Big Data Landscape 2.0
 The Big Data Landscape 3.0 (Copyright Dave Feinleib, bigdatalandscape.com)
  • 9. Internet of Things Landscape 2016
 http://blogs.cisco.com/sp/from-internet-of-things-to-web-of-things/
  • 10. How much data is generated every minute!
 [Figure: Data Platforms Map, October 2014, © 2014 by 451 Research LLC (https://451research.com/dashboard/dpa). Plots database products and vendors across a relational zone, a non-relational zone, and a grid/cache zone, with extensions towards e-discovery, enterprise search, and SIEM. Key: general purpose; specialist analytic; -as-a-Service; BigTables; graph; document; key-value stores; key-value direct access; Hadoop; MySQL ecosystem; advanced clustering/sharding; NewSQL databases; data caching; data grid; search; appliances; in-memory; stream processing.]
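The Data Platforms Map's key separates key-value stores, BigTables, and document databases from relational systems, and singles out advanced clustering/sharding. Many of these stores share one core idea: partitioning data by key across nodes. Below is a minimal in-memory sketch of hash-based sharding. The class and method names are hypothetical and do not correspond to any product on the map.

```python
import hashlib
from typing import Dict, List, Optional


class ShardedKeyValueStore:
    """Toy key-value store that spreads keys across N in-memory 'nodes' by hashing."""

    def __init__(self, n_shards: int) -> None:
        if n_shards < 1:
            raise ValueError("n_shards must be at least 1")
        self.shards: List[Dict[str, str]] = [{} for _ in range(n_shards)]

    def _shard_for(self, key: str) -> Dict[str, str]:
        # Use a stable hash (not Python's per-process randomized hash()) so the
        # same key always maps to the same shard.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.shards)
        return self.shards[index]

    def put(self, key: str, value: str) -> None:
        self._shard_for(key)[key] = value

    def get(self, key: str, default: Optional[str] = None) -> Optional[str]:
        return self._shard_for(key).get(key, default)
```

Simple modulo placement like this reassigns most keys whenever the shard count changes. Production systems such as Cassandra and Riak use consistent hashing instead, so that adding a node moves only a fraction of the keys.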
  • 11. [Figure: Alphabetical index to the Data Platforms Map, listing each product (from 1010data and Accumulo onward) with its grid reference on the map.]
#Sparksee# E1 #SPARQLBASE# C4 #Spider# B3 #Splice#Machine# B2 #Splunk# B3 #SQLite# A2 #SQLStream# B6 #SQream# B2 #Sqrrl#Enterprise# A1 #SRCH2# B2 #Starcounter# D1 #Stardog# C5 #StormDB# A6 #Stra/o# B1 #Sumo#Logic# A3 #TeSystems# C1 #Tamino#XML#Server# D6 #TempoIQ# B6 #Teradata# B6 #Teradata#Aster# C4 #Tesora# E4 #TIBCO#Ac/veSpaces# B1 #TIBCO#LogLogic# A2 #TIBCO#StreamBase# D1 #Titan# C4 #TokuDB# D2 #TokuMX# B3 #Trafodion# D3 #TransLa[ce# A4 #Treasure#Data# E1 #Trinity# C1 #UniData# C1 #UniVerse# A3 #Verizon# B3 #vFabric#Postgres# D2 #Voldemort# C3 #VoltDB# D1 #WakandaDB# D5 #WebScaleSQL# A3 #xPlenty# B6 #XtremeData# C1 #YarcData# A4 #ZeWaset# D4 #Zimory#Scale# # # # hWps://451research.com/dashboard/dpa# Big Data = Big Spending • Enterprises are spending wildly on Big Data but don’t know if it’s worth it yet (Business Insider, 2012) • Big Data Technology Spending Trend: • 83% increase over the next 3 years (worldwide): – 2012: $28 billion – 2013: $34 billion – 2016: $232 billion • Caution: – Don’t fall victim to SOS (Shiny Object 
 Syndrome) – A lot of money is being invested but 
 is it generating the expected return? – Gartner Hype Cycle suggests results 
 are going to be disappointing 22Copyright 2017 by Data Blueprint Slide # http://www.businessinsider.com/enterprise-big-data-spending-2012-11#ixzz2cdT8shhe http://www.inc.com/kathleen-kim/big-data-spending-to-increase-for-it-industry.html http://www.gartner.com/DisplayDocument?id=2195915&ref=clientFriendlyUrl
  • 12. Big Data technologies, by themselves, are a one-legged stool
Governance is the major means of preventing overreliance on one-legged stools!
Implementing Big Data, NOSQL, & HADOOP
Demystifying Big Data: Bigger is (Usually) Better
• Why it is important to consider the messenger
– What is being "sold"?
– We are using the wrong vocabulary to discuss this topic
• Technically, what are Big Data technologies good at?
– Computers → commodity-based computing infrastructure
– Flash memory is currently obeying Moore's Law
– RAM → increased processing
– Parallel-friendly approaches (lots of repeatable actions)
• Successful Big Data approaches ...
– Innovation
– Reengineering (precise definition)
– Throw-away prototyping
• How does that help operationally?
– Solid support community
– Examples
  • 13. [Chart: cost per computing cycle declining]
  • 14. 10X+++ rapid access
"There's now a blurring between the storage world and the memory world"
• Faster processors outstripped not only the hard disk, but main memory
– Hard disk too slow
– Memory too small
• Flash drives remove both bottlenecks
– Combined, Apple and Yahoo have spent more than $500 million to date
• Make it look like traditional storage or more system memory
– Minimum 10x improvements
– Dragonstone server is 3.2 TB flash memory (Facebook)
• Bottom line: new capabilities!
  • 15. Non-von Neumann Processing/Efficiencies
• von Neumann bottleneck (computer science)
– "An inefficiency inherent in the design of any von Neumann machine that arises from the fact that most computer time is spent in moving information between storage and the central processing unit rather than operating on it"
[http://encyclopedia2.thefreedictionary.com/von+Neumann+bottleneck]
• Michael Stonebraker
– Ingres (Berkeley/MIT)
– Modern database processing is approximately 4% efficient
• Many big data architectures are attempts to address this, but:
– It is a zero-sum game
– They trade characteristics against each other
• Reliability
• Predictability
– Google/MapReduce/Bigtable
– Amazon/Dynamo
– Netflix/Chaos Monkey
– Hadoop
– McDipper
• Big data techniques exploit non-von Neumann processing
What is NoSQL?
• Commonly interpreted as both "No SQL" and "Not Only SQL"
• A broad class of database management technologies that provide a mechanism for storage and retrieval of data that does not follow traditional relational database methodology
• Motivations
– Simplicity of design
– Horizontal scaling
– Finer control over availability of the data
• The data structures used by NoSQL databases differ from those used in relational databases, making some operations faster in NoSQL and others faster in relational databases
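To make the last bullet concrete, here is a minimal Python sketch in which plain dictionaries stand in for real database engines (all names and data are hypothetical). It contrasts a normalized, relational-style layout, where reading an order's customer name requires following a key (a join), with a denormalized document-style layout, where the same read is one key lookup but renaming a customer means updating every embedded copy.

```python
# Relational-style layout: normalized tables linked by a foreign key.
customers = {1: {"name": "Ada"}}
orders = [
    {"order_id": 100, "customer_id": 1, "total": 25.0},
    {"order_id": 101, "customer_id": 1, "total": 40.0},
]

# Document-style layout: each order embeds its own copy of the customer data.
order_docs = {
    100: {"total": 25.0, "customer": {"id": 1, "name": "Ada"}},
    101: {"total": 40.0, "customer": {"id": 1, "name": "Ada"}},
}


def customer_name_relational(order_id):
    """Read needs a join: find the order row, then follow customer_id."""
    order = next(o for o in orders if o["order_id"] == order_id)
    return customers[order["customer_id"]]["name"]


def customer_name_document(order_id):
    """Read is a single key lookup; the customer data is already embedded."""
    return order_docs[order_id]["customer"]["name"]


def rename_relational(customer_id, new_name):
    """Write touches one row; every order sees the change through the join."""
    customers[customer_id]["name"] = new_name
    return 1  # rows updated


def rename_document(customer_id, new_name):
    """Write must update every embedded copy; returns how many documents changed."""
    touched = 0
    for doc in order_docs.values():
        if doc["customer"]["id"] == customer_id:
            doc["customer"]["name"] = new_name
            touched += 1
    return touched
```

Renaming customer 1 costs one update in the relational layout but two in the document layout (one per order), while the document layout's read path skips the lookup step entirely. Which side wins depends on the read/write mix, which is the tradeoff the slide describes.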
  • 16. What is Hadoop?
• A data storage and processing system that runs on clusters of commodity servers
• Able to store any kind of data in its native format
• Performs a wide variety of analyses and transformations
• Stores terabytes, and even petabytes, of data inexpensively
• Handles hardware and system failures automatically, without losing data or interrupting data analyses
• Critical components of Hadoop:
– HDFS: the Hadoop Distributed File System is the storage system for a Hadoop cluster, responsible for distributing data across the servers
– MapReduce: the processing framework inside Hadoop that allows distributed, parallel execution of analytical jobs
[Photo: One of Data Blueprint's Big Data clusters]
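The MapReduce model described above can be sketched in a single Python process: a map step emits (key, value) pairs, a shuffle step groups the values by key, and a reduce step aggregates each group. This simulates the programming model only. Real Hadoop jobs are written against Hadoop's own APIs (or run scripts through Hadoop Streaming), and the framework distributes these phases across the cluster.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit (word, 1) for every word in one input record (a line of text)."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: aggregate all values collected for one key."""
    return key, sum(values)


def word_count(lines):
    # On a cluster, each line (or input split) could be mapped on a different node;
    # here the phases simply run one after another in one process.
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data is big", "data about data"]))
    # prints {'big': 2, 'data': 3, 'is': 1, 'about': 1}
```

Because each map call depends only on its own input line, and each reduce call only on one key's values, both phases parallelize naturally. That independence is what makes the model "parallel-friendly."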
  • 17. Why NoSQL? Why Hadoop?
• Large number of users (read: the internet)
• Rapid app development and deployment
• Large number of mission-critical writes (sensors, etc.)
• Small, continuous reads and writes, especially where "consistency" is less important (social networks)
• Hadoop solves the hard scaling problems caused by large amounts of complex data
• As the amount of data in a cluster grows, new servers can be added to a Hadoop cluster incrementally and inexpensively to store and analyze it
Hadoop Use Cases in the Real World
• Risk Modeling
• Customer Churn Analysis
• Recommendation Engine
• Ad Targeting
• Point-of-Sale Transaction Analysis
• Social Sentiment on Social Media
• Analyzing network data to predict failure
• Threat analysis
• Trade Surveillance
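A simplified sketch of why adding servers adds capacity. HDFS splits each file into fixed-size blocks (128 MB by default in Hadoop 2 and later, set by `dfs.blocksize`) and stores several replicas of each block (3 by default, `dfs.replication`) on different DataNodes. The round-robin placement below is an assumption made for illustration. Real HDFS uses a rack-aware placement policy, plus a separate balancer to move existing blocks onto newly added nodes.

```python
import math

BLOCK_SIZE_MB = 128  # HDFS default block size (dfs.blocksize) in Hadoop 2+
REPLICATION = 3      # HDFS default replication factor (dfs.replication)


def place_blocks(file_size_mb, nodes):
    """Split a file into blocks and assign each block's replicas to distinct nodes.

    Round-robin placement is a simplification, not HDFS's rack-aware policy.
    Returns a dict mapping node name -> list of block ids stored on that node.
    """
    if len(nodes) < REPLICATION:
        raise ValueError("need at least as many nodes as replicas")
    num_blocks = math.ceil(file_size_mb / BLOCK_SIZE_MB)
    placement = {node: [] for node in nodes}
    for block_id in range(num_blocks):
        for r in range(REPLICATION):
            # Consecutive nodes (modulo cluster size) are distinct when len(nodes) >= 3.
            node = nodes[(block_id + r) % len(nodes)]
            placement[node].append(block_id)
    return placement


def max_blocks_per_node(placement):
    """Heaviest per-node load, in blocks."""
    return max(len(blocks) for blocks in placement.values())


if __name__ == "__main__":
    one_tb_in_mb = 1024 * 1024
    four_nodes = place_blocks(one_tb_in_mb, [f"node{i}" for i in range(4)])
    eight_nodes = place_blocks(one_tb_in_mb, [f"node{i}" for i in range(8)])
    print(max_blocks_per_node(four_nodes), max_blocks_per_node(eight_nodes))
    # prints 6144 3072
```

A 1 TB file becomes 8,192 blocks (24,576 replicas). Spreading them over 8 nodes instead of 4 halves the per-node load, which is the "add servers incrementally" property from the slide in miniature.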
  • 18. [Figure. Source: http://blogs.informatica.com/perspectives/uk/2011/08/09/hadoop-enriches-data-science-part-2-of-hadoop-series/]
Potential Tradeoffs: CAP theorem: consistency, availability and partition-tolerance
[Diagram: triangle with vertices Consistency, Availability, and Partition (Fault) Tolerance. RDBMS: ACID (Atomicity, Consistency, Isolation, Durability). NOSQL: BASE (Basically Available, Soft state, Eventual consistency).]
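A toy Python sketch of the BASE side of this tradeoff. It makes several simplifying assumptions: two in-memory replicas, logical timestamps supplied by the caller, and last-writer-wins conflict resolution, which is one of several strategies eventually consistent stores use. During a partition both replicas stay available for writes, so reads can disagree. Once the partition heals and the replicas exchange state, they converge.

```python
class Replica:
    """One copy of a key-value store using last-writer-wins (LWW) per key."""

    def __init__(self, name):
        self.name = name
        self.data = {}  # key -> (timestamp, value)

    def write(self, key, value, timestamp):
        current = self.data.get(key)
        # Keep the write with the higher timestamp; equal timestamps are
        # broken by comparing values so every replica picks the same winner.
        if current is None or (timestamp, value) > current:
            self.data[key] = (timestamp, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]

    def merge_from(self, other):
        """Anti-entropy: pull every entry from another replica and apply LWW."""
        for key, (timestamp, value) in other.data.items():
            self.write(key, value, timestamp)


if __name__ == "__main__":
    a, b = Replica("A"), Replica("B")
    # Partition: each side accepts a write (availability chosen over consistency).
    a.write("status", "shipped", timestamp=1)
    b.write("status", "delivered", timestamp=2)
    print(a.read("status"), b.read("status"))  # replicas disagree
    # Partition heals: replicas exchange state and converge (eventual consistency).
    a.merge_from(b)
    b.merge_from(a)
    print(a.read("status"), b.read("status"))
```

An ACID system would instead refuse or block one of the two writes during the partition, giving up availability to keep both copies consistent.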
  • 19. Pacman
• Decomposition
• Reassembly – not optional!
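The "Pacman" point (decompose the work, then reassemble the results) can be sketched with Python's standard `concurrent.futures` module. The chunk size and the per-chunk work (a partial sum of squares) are arbitrary choices for illustration. `ThreadPoolExecutor` keeps the sketch portable. For CPU-bound work in CPython, `ProcessPoolExecutor`, which has the same interface, would be the usual choice.

```python
from concurrent.futures import ThreadPoolExecutor


def decompose(items, chunk_size):
    """Split the input into independent chunks that can be processed in parallel."""
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]


def process_chunk(chunk):
    """Per-chunk work: here, a partial sum of squares."""
    return sum(x * x for x in chunk)


def reassemble(partial_results):
    """Combine partial results into one answer; without this step you only have fragments."""
    return sum(partial_results)


def parallel_sum_of_squares(items, chunk_size=1000, workers=4):
    chunks = decompose(items, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in the same order the chunks were submitted,
        # so reassembly also works for order-sensitive combinations.
        partials = list(pool.map(process_chunk, chunks))
    return reassemble(partials)


if __name__ == "__main__":
    data = list(range(10_000))
    print(parallel_sum_of_squares(data) == sum(x * x for x in data))
```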
  • 20. Sandwich use case
• Landing zone (less expensive)
– Especially useful in cases where data is highly disposable
• Existing technologies are the contents of the sandwich
– Complemented by landing zone and archival capabilities
• Archiving/offloading (less need for structure)
– "Cold" transactional and analytic data
[Diagram layers: Landing Zone; Existing Data Architectural Processing; Archiving/Offloading]
Adapted from Nancy Kopp: http://ibmdatamag.com/2013/08/relishing-the-big-data-burger/
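One way to read the sandwich pattern as code is a routing rule that decides which layer each record belongs in. The sketch below is hypothetical. The 90-day cutoff, the record fields, and the three tier names are assumptions for illustration and are not part of the cited pattern.

```python
from dataclasses import dataclass
from datetime import date, timedelta

COLD_AFTER = timedelta(days=90)  # hypothetical cutoff for "cold" data


@dataclass
class Record:
    record_id: int
    validated: bool      # has the record been cleansed/structured yet? (hypothetical field)
    last_accessed: date


def route(record, today):
    """Assign a record to one layer of the 'sandwich'."""
    if not record.validated:
        return "landing_zone"         # cheap storage for raw, possibly disposable data
    if today - record.last_accessed > COLD_AFTER:
        return "archive_offload"      # cold data moved off the existing platform
    return "existing_processing"      # warm data stays in the current architecture


if __name__ == "__main__":
    today = date(2017, 6, 1)
    records = [
        Record(1, validated=False, last_accessed=today),
        Record(2, validated=True, last_accessed=today - timedelta(days=10)),
        Record(3, validated=True, last_accessed=today - timedelta(days=400)),
    ]
    print([route(r, today) for r in records])
```

The point of the rule is that the existing platform only carries the warm, structured middle layer, while cheaper storage absorbs both the raw intake and the cold tail.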
  • 21. See Like a Snake
  • 22. Pit Organ
They can switch back and forth between those two systems, or use both simultaneously, giving them a leg up, so to speak, when it comes to targeting a warm object.
  • 23. Analytics Insight Cycle
[Diagram labels: Existing Knowledge base; "Sensemaking" Techniques; Pattern/Object Emergence; Potential/actual insights; Discernment; Exploitable Insight; Feedback; Volume, Velocity, Variety; Analytical bottleneck; Combined/informed insights]
• Things are happening
– Sensemaking techniques address "what" is happening?
• Patterns/objects, hypotheses emerge
– What can be observed?
• Operationalizing
– The dots can be repeatedly connected
– "Big Data" contributions are shown in orange
• Margaret Boden's computational creativity
– Exploratory
– Combinational
– Transformational
J. C. R. Licklider's Man-Computer Symbiosis
Humans generally better:
• Sense low-level stimuli
• Detect stimuli in noisy background
• Recognize constant patterns in varying situations
• Sense unusual and unexpected events
• Remember principles and strategies
• Retrieve pertinent details without a priori connection
• Draw upon experience and adapt decision to situation
• Select alternatives if original approach fails
• Reason inductively; generalize from observations
• Act in unanticipated emergencies and novel situations
• Apply principles to solve varied problems
• Make subjective evaluations
• Develop new solutions
• Concentrate on important tasks when overload occurs
• Adapt physical response to changes in situation
Machines generally better:
• Sense stimuli outside human's range
• Count or measure physical quantities
• Store quantities of coded information accurately
• Monitor prespecified events, especially infrequent ones
• Make rapid and consistent responses to input signals
• Recall quantities of detailed information accurately
• Retrieve pertinent details without a priori connection
• Process quantitative data in prespecified ways
• Perform repetitive preprogrammed actions reliably
• Exert great, highly controlled physical force
• Perform several activities simultaneously
• Maintain operations under heavy operational load
• Maintain performance over extended periods of time
The best approaches combine manual and automated methods!
  • 24. Gartner Recommendations (Gartner, 2012)
• Impact: Some of the new analytics made possible by big data have no precedent, so innovative thinking will be required to achieve value
– Top recommendation: Treat big data projects as innovation projects that will require change management efforts. The business will take time to trust new data sources and new analytics.
• Impact: Creative thinking can unearth valuable information sources already inside the enterprise that are underused
– Top recommendation: Work with the business to conduct an inventory of internal data sources outside of IT's direct control, and consider augmenting existing data that is IT "controlled." With an innovation mindset, explore the potential insight that can be gained from each of these sources.
• Impact: Big data technologies often create the ability to analyze faster, but getting value from faster analytics requires business changes
– Top recommendation: Ensure that big data projects that improve analytical speed always include a process redesign effort that aims at getting maximum benefit from that speed.
  • 25. Innovation
• Innovation is the development of new customer value through solutions that meet new needs, inarticulate needs, or old customer and market needs in new ways. This is accomplished through different or more effective products, processes, services, technologies, or ideas that are readily available to markets, governments, and society.
• Innovation differs from invention in that innovation refers to the use of a better and, as a result, novel idea or method, whereas invention refers more directly to the creation of the idea or method itself.
• Innovation differs from improvement in that innovation refers to the notion of doing something different (Lat. innovare: "to change") rather than doing the same thing better.
Data must be incorporated into the innovation-navigation process
  • 26. Two Uses for Data in Support of Innovation
1. Using data to keep the innovation process on track
2. Using data to innovate
[Diagram: "Altria IS Innovation Transformation 2012 - 2013: IS Innovation Ecosystem," a Q3-Q2 roadmap of Inspiration, Tools, Infrastructure, and Accountability activities (innovation series kickoff, Think Different sessions, IS Innovation Academy classes, town hall meetings, performance reviews) spanning sustaining, transformational, commercial, and disruptive innovation, with "Data" feeding each stage. © The Frontier Project, LLC]
  • 27. Reengineering (Objective Definition)
• How can you claim that you have improved any system?
• If you don't understand the existing (legacy) system's strengths and weaknesses, you can't use them to inform the new system
• To reengineer
– You must first reverse engineer, and then
– Use that information to architect the new system
[Diagram labels: Legacy System Analysis (break down & compare); $$$ Value; New System Requirements; New System]
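Reverse engineering often starts with profiling what the legacy system actually stores. The sketch below is a minimal Python example that assumes the legacy records can be exported as dictionaries (the field names in the example are hypothetical). For each field, it reports the observed value types, how many records leave it empty, and how many distinct values appear. That is raw material for the "break down & compare" step and for the new system's requirements.

```python
def profile(records):
    """Summarize each field seen in a list of dict records.

    A value counts as missing if the key is absent, None, or an empty string.
    """
    fields = set().union(*(r.keys() for r in records)) if records else set()
    summary = {}
    for field in sorted(fields):
        values = [r.get(field) for r in records]
        present = [v for v in values if v not in (None, "")]
        summary[field] = {
            "types": sorted({type(v).__name__ for v in present}),
            "missing": len(values) - len(present),
            # repr() lets unhashable values (e.g. lists) be counted too
            "distinct": len(set(map(repr, present))),
        }
    return summary


if __name__ == "__main__":
    legacy = [  # hypothetical legacy export
        {"cust_no": "00017", "zip": "23060", "phone": ""},
        {"cust_no": 17, "zip": "23060"},
        {"cust_no": "00018", "zip": None, "phone": "804-555-0100"},
    ]
    for field, stats in profile(legacy).items():
        print(field, stats)
```

In this toy export, the profile shows `cust_no` stored as both text and integer, which is exactly the kind of legacy weakness the slide says must be understood before the new system is architected.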
  • 28. Potential Tradeoffs: CAP theorem: consistency, availability and partition-tolerance
Copyright 2013 by Data Blueprint
[Diagram: triangle with vertices Consistency, Availability, and Partition (Fault) Tolerance, annotated "Small datasets can be both consistent & available." RDBMS: ACID (Atomicity, Consistency, Isolation, Durability). NOSQL: BASE (Basically Available, Soft state, Eventual consistency).]
  • 29. 'Throw-away' prototyping
• With 'throw-away' prototyping, a small part of the system is developed and then given to the end user to try out and evaluate. The user provides feedback, which can quickly be incorporated into the development of the main system. The prototype is then discarded, or thrown away.
  • 30. Some Big Data Limitations
David Brooks, New York Times
Copyright 2015 by Data Blueprint
• Data analysis struggles with the social
– Your brain is excellent at social cognition; people can
• Mirror each other's emotional states
• Detect uncooperative behavior
• Assign value to things through emotion
– Data analysis measures the quantity of social interactions but not the quality
• It can map interactions with co-workers you see during work days
• It can't capture devotion to childhood friends you see annually
– When making (personal) decisions about social relationships, it's foolish to swap the amazing machine in your skull for the crude machine on your desk
• Data struggles with context
– Decisions are embedded in sequences and contexts
– Brains think in stories, weaving together multiple causes and multiple contexts
– Data analysis is pretty bad at
• Narratives / Emergent thinking / Explaining
• Data creates bigger haystacks
– More data leads to more statistically significant correlations
– Most are spurious and deceive us
– Falsity grows exponentially greater the more data we collect
• Big data has trouble with big problems
– For example: the economic stimulus debate
– No one has been persuaded by data to switch sides
• Data favors memes over masterpieces
– It detects when large numbers of people take an instant liking to some cultural product
– But many important products are hated initially because they are unfamiliar
• Data obscures values
– Data is never raw; it's always structured according to somebody's predispositions and values
Maslow's Hierarchy of Needs
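Brooks's "bigger haystacks" point can be checked with a short simulation. The sketch below generates purely random, independent variables, so every "significant" correlation it finds is spurious by construction. It uses the large-sample approximation that |r| > 1.96/√n is significant at roughly the 5% level. That is an assumption adequate for illustration, not an exact significance test. The number of false hits grows with the number of variable pairs examined.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def spurious_hits(num_vars, num_samples=100, seed=0):
    """Count 'significant' correlations among independent random variables."""
    rng = random.Random(seed)
    data = [[rng.gauss(0, 1) for _ in range(num_samples)] for _ in range(num_vars)]
    threshold = 1.96 / math.sqrt(num_samples)  # approx. 5% two-sided cutoff
    hits = 0
    for i in range(num_vars):
        for j in range(i + 1, num_vars):
            if abs(pearson(data[i], data[j])) > threshold:
                hits += 1
    return hits


if __name__ == "__main__":
    for k in (10, 40, 80):
        pairs = k * (k - 1) // 2
        print(f"{k} variables, {pairs} pairs -> {spurious_hits(k)} spurious 'findings'")
```

Roughly 5% of all pairs clear the bar, so going from 10 to 80 variables takes you from a couple of false "findings" to well over a hundred, with no real relationship anywhere in the data.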
  • 31. Data Management Practices Hierarchy
You can accomplish Advanced Data Practices without becoming proficient in the Foundational Data Practices; however, this will:
• Take longer
• Cost more
• Deliver less
• Present greater risk
(with thanks to Tom DeMarco)
Advanced Data Practices
• MDM
• Mining
• Big Data
• Analytics
• Warehousing
• SOA
Foundational Data Practices
• Data Platform/Architecture
• Data Governance
• Data Quality
• Data Operations
• Data Management Strategy
[Diagram axes: Technologies; Capabilities]
  • 32. Social Sentiment Analysis
• One of the burgeoning areas for use of Big Data/Hadoop platforms
• Allows for the landing of multiple sources of unstructured data (Twitter, Facebook, LinkedIn, etc.)
• The data can then be analyzed with algorithms that look for keywords indicating positive/negative feedback
Operational Use
• Utilize real-time pricing data from multiple sources to dynamically update the pricing for books in the Amazon Marketplace
• Ingested data from multiple sources, looking for real-time changes in price
• Would apply a predictive model to determine the best price point and set the price of their books on the marketplace
• Increased conversion rate, but created a race-to-the-bottom situation if not monitored
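A minimal keyword-lexicon scorer in Python illustrates the sentiment approach described above. The word lists are tiny, hypothetical examples. Production sentiment analysis uses far larger lexicons or trained models, and it must handle negation, sarcasm, and context, all of which this sketch ignores.

```python
import re

# Hypothetical, deliberately tiny lexicons for illustration.
POSITIVE = {"love", "great", "excellent", "happy", "recommend"}
NEGATIVE = {"hate", "terrible", "awful", "broken", "refund"}


def score(post):
    """Return (#positive keywords - #negative keywords) for one post."""
    words = re.findall(r"[a-z']+", post.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def label(post):
    s = score(post)
    return "positive" if s > 0 else "negative" if s < 0 else "neutral"


if __name__ == "__main__":
    posts = [
        "Love the new app, great update!",
        "Order arrived broken, want a refund",
        "Checking out the store today",
    ]
    for p in posts:
        print(label(p), "|", p)
```

On a Hadoop platform, the same per-post scoring function would typically run in the map phase, with counts per product or per day aggregated in the reduce phase.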
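The "race to the bottom" risk in the pricing example can be seen in a rule-based stand-in for the predictive model. This is an assumption for illustration only, since the slide does not describe the actual model. Two sellers each undercut the lowest competing price by one cent. Without a floor, prices ratchet down every round. A cost-based floor, one simple form of "monitoring," stops the slide.

```python
def reprice(competitor_prices, floor=None, undercut=0.01):
    """Undercut the lowest competitor price, but never go below an optional floor."""
    target = round(min(competitor_prices) - undercut, 2)
    return target if floor is None else max(target, floor)


def simulate(rounds, start_price, floor=None):
    """Two sellers start at the same price and alternately reprice against each other."""
    a = b = start_price
    for _ in range(rounds):
        a = reprice([b], floor)
        b = reprice([a], floor)
    return a, b


if __name__ == "__main__":
    print(simulate(300, 19.50))               # unmonitored: keeps falling
    print(simulate(300, 19.50, floor=15.00))  # floor halts the race
```

Each seller's rule looks reasonable in isolation; the runaway behavior only appears when two such rules react to each other, which is why the slide stresses monitoring.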
  • 33. Healthcare Example: Patient Data
• Clinical data:
– Diagnosis/prognosis/treatment
– Genetic data
• Patient demographic data
• Insurance data:
– Insurance provider
– Claims data
• Prescriptions & pharmacy information
• Physical fitness data
– Activity tracking through smartphone apps & social media
• Health history
• Medical research data
Retail Example: Loyalty Programs & Big Data
http://www.forbes.com/sites/xerox/2013/09/27/big-data-boosts-customer-loyalty-no-really/
• Companies need to understand current wants and needs AND predict future tendencies
• Customer -> Repeat Customer -> Brand Advocate
• Customer loyalty programs & retention strategies
– Track what is being purchased and how often
– Coupons based on purchasing history
– Targeted communications, campaigns & special offers
– Social media for additional interactions
– Personalize consumer interactions
• Customer purchase history influences product placements
– Retailers rapidly respond to consumer demands
– Product placements, planogram optimization, etc.
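The Customer -> Repeat Customer -> Brand Advocate progression, and coupons based on purchasing history, can be sketched as follows. The thresholds, the "advocate" criterion (here, simply having shared or reviewed a product), and the record fields are all hypothetical assumptions for illustration.

```python
from collections import Counter


def segment(purchase_count, shared_or_reviewed):
    """Hypothetical loyalty tiers based on purchase history and social activity."""
    if purchase_count >= 5 and shared_or_reviewed:
        return "brand advocate"
    if purchase_count >= 2:
        return "repeat customer"
    return "customer"


def coupon_category(purchases):
    """Target a coupon at the category this customer buys most often (None if no history)."""
    counts = Counter(item["category"] for item in purchases)
    return counts.most_common(1)[0][0] if counts else None


if __name__ == "__main__":
    history = [{"category": "coffee"}, {"category": "snacks"}, {"category": "coffee"}]
    print(segment(len(history), shared_or_reviewed=False), coupon_category(history))
```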
  • 34. References
• The Human Face of Big Data, Rick Smolan & Jennifer Erwitt, First edition (November 20, 2012)
• McKinsey: Big Data: The next frontier for innovation, competition and productivity (http://www.mckinsey.com/insights/business_technology/big_data_the_next_frontier_for_innovation?p=1)
• The Washington Post: Five Myths about Big Data (http://articles.washingtonpost.com/2013-08-16/opinions/41416362_1_big-data-data-crunching-marketing-analytics)
• Gartner: Gartner's 2013 Hype Cycle for Emerging Technologies Maps Out Evolving Relationship Between Humans and Machines (http://www.gartner.com/newsroom/id/2575515)
• The New York Times | Opinion Pages: What Data Can't Do (http://www.nytimes.com/2013/02/19/opinion/brooks-what-data-cant-do.html?_r=1&)
• CIO.com: Five Steps for How to Better Manage Your Data (http://www.cio.com.au/article/429681/five_steps_how_better_manage_your_data/)
• Business Insider: Enterprises Aren't Spending Wildly on 'Big Data' But Don't Know If It's Worth It Yet (http://www.businessinsider.com/enterprise-big-data-spending-2012-11#ixzz2cdT8shhe)
• Inc.com: Big Data, Big Money: IT Industry to Increase Spending (http://www.inc.com/kathleen-kim/big-data-spending-to-increase-for-it-industry.html)
• Forbes: Big Data Boosts Customer Loyalty. No, Really. (http://www.forbes.com/sites/xerox/2013/09/27/big-data-boosts-customer-loyalty-no-really/)
Questions?
It's your turn! Use the chat feature or Twitter (#dataed) to submit your questions to everyone now
  • 35. 10124 W. Broad Street, Suite C Glen Allen, Virginia 23060 804.521.4056