• Big Data could know us better than we know ourselves
– Dan Gardner
• We'll see this as the time in history when the world's
information was transformed from inert, passive state
and put into a unified system that brings that
information alive
– Michael Nielsen
A Framework for Implementing
NoSQL, Hadoop
• Today a street stall in Mumbai can access more
information, maps, statistics, academic papers, price
trends, futures markets, and data than a U.S.
President could only a few decades ago
– Juan Enriquez
• Not everything that can be counted counts, and
not everything that counts can be counted
– Albert Einstein
Big Data and NoSQL continue to make headlines everywhere.
However, most of what has been written about these topics is
focused on the hardware, services, and scale out. But what about
a Big Data and NoSQL Strategy, one that supports your business
strategy? Virtually every major organization thinking about these
data platforms is faced with the challenge of figuring out the
appropriate approach and the requirements. This presentation will
provide guidance on how to think about and establish realistic Big
Data management plans and expectations. We will introduce a
framework for evaluating the various choices when it comes to
implementing and succeeding with Big Data/NoSQL and walk
through a sample use case.
Takeaways:
• A Framework for evaluating Big Data techniques
• Deciding on a Big Data platform – How do you know which one
is a good fit for you?
• The means by which big data techniques can complement
existing data management practices
• The prototyping nature of practicing big data techniques
• The distinct ways in which utilizing Big Data can generate
business value
Date: June 9, 2015
Time: 2:00 PM ET/11:00 AM PT
Presenter: Peter Aiken, Ph.D. & Josh Bartels
• Soon we will salt the oceans, the land, and the sky
with uncounted numbers of sensors invisible to the
eyes but visible to one another
– Esther Dyson
• We now have a chance to become the center of our
own knowledge universe, one that constantly
reconfigures itself to match our needs
– Michael S. Malone
• We've reached a tipping point in history: today more
data is being manufactured by machines, servers,
and cell phones, than by people
– Michael E. Driscoll
• Every century, a new technology (steam power,
electricity, atomic energy, or microprocessors) has
swept away the old world with a vision of a new one.
Today, we seem to be entering the era of Big Data
– Michael Coren
Copyright 2015 by Data Blueprint
Shannon Kempe
Executive Editor at DATAVERSITY.net
Steven MacLauchlan
• 10 years of experience in Application
Development and Data Modeling with a
focus on Healthcare solutions.
• Delivers tailored data management
solutions that provide focus on data’s
business value while enhancing clients’
overall capability to manage data
• Certified Data Management Professional (CDMP)
• Computer Science degree from Virginia Commonwealth
University
• Most recent focus: Understanding emerging
data modeling trends and how these can
best be leveraged for the Enterprise.
Get Social With Us!
Live Twitter Feed
Join the conversation!
Follow us:
@datablueprint
@paiken
Ask questions and submit
your comments: #dataed
Like Us on Facebook
www.facebook.com/
datablueprint
Post questions and comments
Find industry news, insightful
content
and event updates.
Join the Group
Data Management &
Business Intelligence
Ask questions, gain insights
and collaborate with fellow
data management
professionals
Peter Aiken, Ph.D.
• 30+ years in data management
• Repeated international recognition
• Founder, Data Blueprint (datablueprint.com)
• Associate Professor of IS (vcu.edu)
• DAMA International (dama.org)
• 9 books and dozens of articles
• Experienced w/ 500+ data
management practices
• Multi-year immersions:
– US DoD
– Nokia
– Deutsche Bank
– Wells Fargo
– Walmart
– …
• DAMA International President 2009-2013
• DAMA International Achievement Award 2001 (with
Dr. E. F. "Ted" Codd)
• DAMA International Community Award 2005
• Peter Aiken with Juanita Billings, foreword by John Bottega:
Monetizing Data Management: Unlocking the Value in Your
Organization’s Most Important Asset
• Peter Aiken and Michael Gorman:
The Case for the Chief Data Officer: Recasting the C-Suite to
Leverage Your Most Valuable Asset
Josh Bartels
• Data management consultant and
leader
– Over 10 years of experience
– Multiple industries (Finance, Defense,
Insurance)
• Certifications
– Certified Data Management
Professional (CDMP)
– Project Manager (PMP)
– Data Vault 2.0 Practitioner (CDVP2)
• Education
– Master's in Business Administration
– Master's in Information Systems
• Current Efforts
– Focus on creating and migrating to
new data platforms for clients in the
financial and insurance industries
Presented by Peter Aiken, Ph.D., Josh Bartels, Steven MacLauchlan
A Framework for Implementing
NoSQL, Hadoop
Demystifying Big Data 2.0: Developing the Right
Approach for Implementing Big Data Techniques
A Framework for Implementing NoSQL, Hadoop
Demystifying Big Data 2.0: Developing the Right Approach for Implementing Big Data Techniques
• Big Data Context
– We are using the wrong vocabulary to discuss this topic
• More Precise Definitions
– Framework
– Non-von Neumann Architectures
– Hadoop/NoSQL
• Big Data
– Historical Perspective
• Big Data Approach
– Crawl, Walk, Run
• Framework Examples
– Social
– Operational BWB
• Takeaways and Q&A
Tweeting now at: #dataed
Myth #1: Big Data has a clear definition
Fact:
• The term is used so often
and in so many contexts that
its meaning has become
vague and ambiguous
• Industry experts and
scientists often disagree
http://articles.washingtonpost.com/2013-08-16/opinions/41416362_1_big-data-data-crunching-marketing-analytics
Big Data (has something to do with Vs, doesn't it?)
• Volume
– Amount of data
• Velocity
– Speed of data in and out
• Variety
– Range of data types and sources
• 2001 Doug Laney
• Variability
– Many options or variable interpretations confound analysis
• 2011 ISRC
• Vitality
– A dynamically changing Big Data environment in which analysis and predictive models
must continually be updated as changes occur to seize opportunities as they arrive
• 2011 CIA
• Virtual
– Scoping the discussion to only include online assets
• 2012 Courtney Lambert
• Value/Veracity
• Stuart Madnick (John Norris Maguire Professor of Information Technology, MIT Sloan School of
Management & Professor of Engineering Systems, MIT School of Engineering)
Defining Big Data
• Big Data are high-volume, high-velocity, and/or high-variety
information assets that require new forms of processing to
enable enhanced decision making, insight discovery
and process optimization.
– Gartner 2012
• Big data refers to datasets whose size is beyond the ability of
typical database software tools to capture, store, manage, and analyze.
– IBM 2012
• An all-encompassing term for any collection of data sets so large and complex that it
becomes difficult to process using on-hand data management tools or traditional data
processing applications
– Wikipedia 2014
• Shorthand for advancing trends in technology that open the door to a new approach
to understanding the world and making decisions.
– NY Times 2012
• The broad range of new and massive data types that have appeared over the last
decade
– Tom Davenport 2014
• Data of a very large size, typically to the extent that its manipulation and management
present significant logistical challenges.
– Oxford English Dictionary 2014
• Big data is about putting the "I" back into IT.
– Peter Aiken 2007
Big Data Techniques
• New techniques that can improve the productivity (by an order of
magnitude) of any analytical insight cycle and that complement,
enhance, or replace conventional (existing) analysis methods
• Big data techniques are currently characterized by:
– Continuous, instantaneously
available data sources
– Non-von Neumann
Processing (defined later in the presentation)
– Capabilities approaching
or past human comprehension
– Architecturally enhanceable
identity/security capabilities
– Other tradeoff-focused data processing
• So a good question becomes "where in our existing architecture
can we most effectively apply Big Data Techniques?"
Big Data Technologies, by themselves, are a One-Legged Stool
Governance is the major means
of preventing over-reliance on
one-legged stools!
The Big Data Landscape
Copyright Dave Feinleib, bigdatalandscape.com
[Figure: 451 Research Data Platforms Map, October 2014. The map places database and data platform products in a relational zone, a non-relational zone, and a grid/cache zone. Categories shown include Hadoop, NewSQL databases, the MySQL ecosystem, advanced clustering/sharding, key-value stores, document, graph, big tables, search, stream processing, in-memory, data caching, data grid, and appliances, with a key distinguishing general purpose, specialist analytic, and as-a-service offerings. An alphabetical product index keyed to grid coordinates accompanies the map. ©2014 by 451 Research LLC. All rights reserved. https://451research.com/dashboard/dpa]
Myth #2: Everyone should invest in Big Data
Fact:
• Not every company will
benefit from Big Data
• It depends on your size
and your ability
– Local pizza shop vs.
state-wide or national
chain
Big Data can create significant financial value across sectors
• Some (not all)
companies can
take advantage
of Big Data to
create value if
they want to
compete
Big Data = Big Spending
• Enterprises are spending wildly on Big Data but don’t
know if it’s worth it yet (Business Insider, 2012)
• Big Data Technology Spending Trend:
• 83% increase over the next 3 years (worldwide):
– 2012: $28 billion
– 2013: $34 billion
– Through 2016: $232 billion (cumulative)
• Caution:
– Don’t fall victim to SOS (Shiny Object
Syndrome)
– A lot of money is being invested but
is it generating the expected return?
– Gartner Hype Cycle suggests results
are going to be disappointing
http://www.businessinsider.com/enterprise-big-data-spending-2012-11#ixzz2cdT8shhe
http://www.inc.com/kathleen-kim/big-data-spending-to-increase-for-it-industry.html
http://www.gartner.com/DisplayDocument?id=2195915&ref=clientFriendlyUrl
Who wrote this … ?
• In considering any new
subject, there is
frequently a tendency
first to overrate what
we find to be already
interesting or
remarkable, and
secondly - by a sort of
natural reaction - to
undervalue the true
state of the case.
• Augusta Ada King,
Countess of Lovelace (aka
Ada Lovelace), publisher of
the first computer program
Gartner Five-phase Hype Cycle
http://www.gartner.com/technology/research/methodologies/hype-cycle.jsp
Technology Trigger: A potential technology breakthrough kicks things off. Early proof-of-concept stories and media interest
trigger significant publicity. Often no usable products exist and commercial viability is unproven.
Peak of Inflated Expectations: Early publicity produces a number of
success stories, often accompanied by scores of failures. Some
companies take action; many do not.
Trough of Disillusionment: Interest wanes as experiments and implementations fail to deliver. Producers of the
technology shake out or fail. Investments continue only if the surviving providers improve their products to the
satisfaction of early adopters.
Slope of Enlightenment: More instances of how the technology can benefit the
enterprise start to crystallize and become more widely understood. Second- and third-
generation products appear from technology providers. More enterprises fund pilots;
conservative companies remain cautious.
Plateau of Productivity: Mainstream adoption starts to
take off. Criteria for assessing provider viability are more
clearly defined. The technology’s broad market
applicability and relevance are clearly paying off.
Gartner Hype Cycle
"A focus on big data is not a substitute for the
fundamentals of information management."
2012 Big Data in Gartner’s Hype Cycle
2013 Big Data in Gartner’s Hype Cycle
2014 Big Data in Gartner’s Hype Cycle
Big Data Gartner Hype Cycle
Myth #3: Big Data is innovative
Fact:
• Big Data techniques are
innovative
• ROI and insights depend
on the size of the business
and the amount of data
used and produced, e.g.
– Local pizza place vs. Papa
John’s
– Retail
My Barn must pass a foundation inspection
• Before further construction can proceed
• No IT equivalent in most organizations
Frameworks
• A system of ideas
for guiding
analyses
• A means of
organizing project
data
• Data integration
priorities decision
making
framework
• A means of
assessing
progress
"There’s now a blurring between the storage world and the memory world"
• Faster processors outstripped
not only the hard disk, but main
memory
– Hard disk too slow
– Memory too small
• Flash drives remove both
bottlenecks
– Combined, Apple and Yahoo have
spent more than $500 million to
date
• Make it look like traditional
storage or more system
memory
– Minimum 10x improvements
– Dragonstone server is 3.2 TB flash
memory (Facebook)
• Bottom line - new capabilities!
Non-von Neumann Processing/Efficiencies
• von Neumann
bottleneck
(computer science)
– "An inefficiency inherent in
the design of any von
Neumann machine that
arises from the fact that
most computer time is
spent in moving
information between
storage and the central
processing unit rather than
operating on it"
[http://encyclopedia2.thefreedictionary.com/von+Neumann+bottleneck]
• Michael Stonebraker
– Ingres (Berkeley/MIT)
– Modern database
processing is
approximately 4%
efficient
• Many big data
architectures are
attempts to address
this, but:
– Zero sum game
– Trade characteristics
against each other
• Reliability
• Predictability
– Google/MapReduce/
Bigtable
– Amazon/Dynamo
– Netflix/Chaos Monkey
– Hadoop
– McDipper
• Big data techniques
exploit non-von
Neumann processing
• Decomposition
• Reassembly
– not optional!
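The decomposition and reassembly steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any particular platform implements it: the function names (decompose, process, reassemble, parallel_mean) are hypothetical, and a thread pool stands in for the nodes of a cluster.

```python
from concurrent.futures import ThreadPoolExecutor


def decompose(records, n_parts):
    """Split records into n_parts contiguous chunks of near-equal size."""
    size, extra = divmod(len(records), n_parts)
    chunks, start = [], 0
    for i in range(n_parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(records[start:end])
        start = end
    return chunks


def process(chunk):
    """Per-chunk work: return a partial (count, total) for an average."""
    return len(chunk), sum(chunk)


def reassemble(partials):
    """Combine the partial results into one final answer."""
    count = sum(c for c, _ in partials)
    total = sum(t for _, t in partials)
    return total / count if count else 0.0


def parallel_mean(records, n_parts=4):
    """Decompose, process each chunk on a worker, then reassemble."""
    chunks = decompose(records, n_parts)
    # Threads are only a stand-in for cluster nodes in this sketch.
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        partials = list(pool.map(process, chunks))
    return reassemble(partials)
```

Note that each worker returns a count and a total rather than its own average. Averaging the per-chunk averages would give the wrong answer whenever chunks differ in size, which is one reason the reassembly step has to be designed rather than assumed.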
One of Data Blueprint's Big Data Clusters
Analytics Insight Cycle
• Things are happening
– Sensemaking
techniques address
"what" is happening?
• Patterns/objects,
hypotheses emerge
– What can be observed?
• Operationalizing
– The dots can be
repeatedly connected
– "Big Data" contributions
are shown in orange
• Margaret Boden's
computational
creativity
– Exploratory
– Combinational
– Transformational
[Figure: analytics insight cycle diagram. Labels: Volume, Variety, Velocity; Existing knowledge base; Pattern/object emergence; Analytical bottleneck; Potential/actual insights; Exploitable insight; Feedback]
Big Data: Two prominent use cases
• A sandwich offers a good analogy
for how big data and existing
technologies fit together
• Landing Zone (less expensive)
– Especially useful in cases where data
is highly disposable
• Existing technologies are the contents,
sandwiched between and complemented by
landing zone and archival capabilities
• Archiving/Offloading (less need
for structure)
– "Cold" transactional and analytic
data
Adapted from Nancy Kopp:
http://ibmdatamag.com/2013/08/relishing-the-big-data-burger/
[Figure: sandwich diagram with layers labeled Landing Zone, Existing Data Architectural Processing, and Archiving/Offloading]
What is NoSQL?
• Commonly interpreted as "Not Only SQL"
• Broad class of database management technologies that
provide a mechanism for storage and retrieval of data that
doesn’t follow traditional relational database methodology.
• Motivations
– Simplicity of design
– Horizontal scaling
– Finer control over availability of the data.
• The data structures used by NoSQL databases differ from
those used in relational databases, making some
operations faster in NoSQL
and others faster in relational
databases.
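The point about data structures can be made concrete with a small sketch. It is an illustration only: the customer and order data, table names, and function names are invented for the example, SQLite stands in for a relational database, and a plain Python dict stands in for a document store.

```python
import sqlite3

# Relational form: customers and orders live in separate tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customer VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.0);
""")


def relational_orders_for(customer_id):
    """Fetching one customer's orders requires a join across tables."""
    return conn.execute(
        "SELECT o.id, o.amount FROM orders o "
        "JOIN customer c ON c.id = o.customer_id "
        "WHERE c.id = ? ORDER BY o.id",
        (customer_id,),
    ).fetchall()


def relational_total_revenue():
    """A cross-customer aggregate is a single declarative query."""
    return conn.execute("SELECT SUM(amount) FROM orders").fetchone()[0]


# Document form: each customer is one nested record keyed by id.
documents = {
    1: {"name": "Ada", "orders": [{"id": 10, "amount": 25.0},
                                  {"id": 11, "amount": 40.0}]},
    2: {"name": "Grace", "orders": [{"id": 12, "amount": 15.0}]},
}


def document_orders_for(customer_id):
    """One key lookup returns the customer with orders already attached."""
    return [(o["id"], o["amount"]) for o in documents[customer_id]["orders"]]


def document_total_revenue():
    """The aggregate has to walk every document in application code."""
    return sum(o["amount"] for doc in documents.values() for o in doc["orders"])
```

Both forms return the same answers. The difference lies in which question is cheap: the document form favors "everything about one customer", while the relational form favors questions that cut across customers.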
What is Hadoop?
• A data storage and processing
system, that runs on clusters of commodity servers.
• Able to store any kind of data in its native format.
• Perform a wide variety of analyses and transformations.
• Store terabytes, and even petabytes, of data
inexpensively.
• Handles hardware and system failures automatically,
without losing data or interrupting data analyses.
• Critical components of Hadoop:
– HDFS: The Hadoop Distributed File System is the storage system
for a Hadoop cluster, responsible for distributing data across the
servers.
– MapReduce: The processing engine of Hadoop, which allows for distributed
and parallel analytical job execution.
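The MapReduce idea can be sketched without a cluster. The following is a single-process illustration of the three phases (map, shuffle, reduce) applied to a word count; it does not use the Hadoop API, and the function names are chosen for the example.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence on a list of lines."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())
```

Because each map call sees only one line and each reduce call sees only one key, a real cluster can run many of them at the same time on different servers; this sketch simply runs them one after another.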
Why NoSQL? Why Hadoop?
• Large number of users (read: the internet)
• Rapid app development and deployment
• Large number of mission critical writes (sensors/etc)
• Small, continuous reads and writes, especially where
“Consistency” is less important (social networks)
• Hadoop solves the hard scaling problems caused by large
amounts of complex data.
• As the amount of data in a cluster grows,
new servers can be added to a Hadoop
cluster incrementally and inexpensively
to store and analyze it.
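Adding servers incrementally only pays off if existing data does not have to be reshuffled wholesale. One technique used by Dynamo-style key-value stores is consistent hashing, sketched below. This is an assumption-laden illustration: the HashRing class, server names, and key names are hypothetical, and HDFS itself places data blocks differently (through its NameNode), so the sketch applies to the NoSQL side of the discussion.

```python
import bisect
import hashlib


def _hash(value):
    """Map a string to a large integer position on the ring."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Consistent-hash ring with virtual nodes (replicas) per server."""

    def __init__(self, servers=(), replicas=100):
        self.replicas = replicas
        self._ring = []  # sorted list of (position, server) points
        for server in servers:
            self.add_server(server)

    def add_server(self, server):
        """Place `replicas` points for the server on the ring."""
        for i in range(self.replicas):
            bisect.insort(self._ring, (_hash(f"{server}#{i}"), server))

    def server_for(self, key):
        """Return the server owning the first point at or after the key."""
        if not self._ring:
            raise LookupError("ring has no servers")
        index = bisect.bisect(self._ring, (_hash(key), ""))
        if index == len(self._ring):
            index = 0  # wrap around to the start of the ring
        return self._ring[index][1]
```

When a fourth server joins a three-server ring, the only keys that change owner are the ones the new server takes over; keys never move between the existing servers. That property is what keeps incremental growth inexpensive.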
Hadoop Use Cases in the Real World
• Risk Modeling
• Customer Churn Analysis
• Recommendation Engine
• Ad Targeting
• Point of Sale Transaction Analysis
• Social Sentiment on Social Media
• Analyzing network data to predict failure
• Threat analysis
• Trade Surveillance
http://blogs.informatica.com/perspectives/uk/2011/08/09/hadoop-enriches-data-science-part-2-of-hadoop-series/
Some Big Data Limitations
• Data analysis struggles with the social
– Your brain is excellent at social cognition - people can
• Mirror each other’s emotional states
• Detect uncooperative behavior
• Assign value to things through emotion
– Data analysis measures the quantity of social
interactions but not the quality
• Map interactions with co-workers you see during work days
• Can't capture devotion to childhood friends seen annually
– When making (personal) decisions about social
relationships, it’s foolish to swap the amazing machine
in your skull for the crude machine on your desk
• Data struggles with context
– Decisions are embedded in sequences and contexts
– Brains think in stories - weaving together multiple
causes and multiple contexts
– Data analysis is pretty bad at
• Narratives / Emergent thinking / Explaining
• Data creates bigger haystacks
– More data leads to more statistically significant
correlations
– Most are spurious and deceive us
– Falsity grows exponentially with the amount of data
we collect
• Big data has trouble with big problems
– For example: the economic stimulus debate
– No one has been persuaded by data to switch sides
• Data favors memes over masterpieces
– Detect when large numbers of people take an instant
liking to some cultural product
– Products are hated initially because they are unfamiliar
• Data obscures values
– Data is never raw; it’s always structured according to
somebody’s predispositions and values
Myth #4: Big Data is just another IT project
Copyright 2013 by Data Blueprint
Fact:
• Big Data is not your typical IT
project
– Does not answer typical IT questions
– Trend analysis, agile, actionable, etc.
– Fundamentally different approach
• Big Data Projects are exploratory
• Big Data enables new capabilities
• Big Data can be a disruptive
technology
• It might sound simple but that
doesn’t mean it’s easy
• Beware of SOS (Shiny Object
Syndrome)
http://articles.washingtonpost.com/2013-08-16/opinions/41416362_1_big-data-data-crunching-marketing-analytics
Myth #5: Big Data is new
Fact:
• The term originated in Silicon
Valley in the 1990s
• The concept has been used
previously
– 800 year old linguistic datasets
– Use in sciences in 1600s
– Kepler, Sloan Digital Sky Survey,
Statisticians’ view
• Much harder to leverage Big Data
when you lack appropriate
techniques
The Bills of Mortality was an Early Data Collection
Mortality Geocoding
Where is it happening?
("Whereas of the Plague")
Plague Peak
When is it happening?
Black Rats (Rattus rattus)
Why is it happening?
What will happen?
Formalizing Data Management
• Defend the Realm:
The authorized history of MI5
by Christopher Andrew
• World War I
• 1914
• At war with much
of Europe
• 14,000,000 Germans living
in the United Kingdom
• How to efficiently and
effectively manage
information on that many
individuals?
• The Security Service is responsible for "protecting
the UK against threats to national security from
espionage, terrorism and sabotage, from the activities
of agents of foreign powers, and from actions intended
to overthrow or undermine parliamentary democracy by
political, industrial or violent means."
“As a final thought, how about a machine that
would send, via closed-circuit television, visual and
oral information needed immediately at high-level
conferences or briefings? Let’s say that a group of
senior officers are contemplating a covert action
program for Afghanistan. Things go well until
someone asks “Well, just how many schools are
there in the country, and what is the literacy rate?”
No one in the room knows. (Remember, this is an
imaginary situation). So the junior member present
dials a code number into a device at one end of the
table. Thirty seconds later, on the screen overhead,
a teletype printer begins to hammer out the
required data. Before the meeting is over, the group
has been given, through the same method, the
names of countries that have airlines into
Afghanistan, a biographical profile of the Soviet
ambassador there, and the Pakistani order of battle
along the Afghanistan frontier. Neat, no?”
• Predicted not just
the use of
computing in the
intelligence
community
• Also forecast
predictive
analytics
• Accompanying
privacy
challenges
http://articles.washingtonpost.com/2013-08-16/opinions/41416362_1_big-data-data-crunching-marketing-analytics
Myth #6: Big Data provides all the Answers
Fact:
• Big Data does not mean the end of
scientific theory
• Be careful or you’ll end up with
spurious correlations
– Don’t just go fishing for correlations and
hope they will explain the world
• To get to the WHY of things, you
need ideas, hypotheses and theories
• Having more data does not
substitute for thinking hard,
recognizing anomalies and exploring
deep truths
• You need the right approach
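The warning about fishing for correlations can be illustrated with a small simulation. This sketch is not from the original slides; the series length, the number of candidates, and all names are arbitrary assumptions. It compares one random target against many unrelated random series and reports the strongest match, which tends to look meaningful purely by chance.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def strongest_chance_correlation(n_points=10, n_candidates=1000, seed=0):
    """Largest |r| between a random target and unrelated random series."""
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_points)]
    best = 0.0
    for _ in range(n_candidates):
        candidate = [rng.gauss(0, 1) for _ in range(n_points)]
        best = max(best, abs(pearson(target, candidate)))
    return best


best_r = strongest_chance_correlation()
print(f"strongest correlation found by chance: {best_r:.2f}")
```

None of the candidate series has any relationship to the target, so a high value here reflects only the number of comparisons made, not an underlying cause.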
• Identify business opportunity
• How can data be leveraged in
exploring
– External market place
• Analyze opportunities and threats
– Internal efficiencies
• Analyze strengths and weaknesses
Example: 2012 Olympic Summer Games
1. Volume: 845 million FB users averaging 15 TB+ of data/day
2. Velocity: 60 GB of data per second
3. Variety: 8.5 billion devices connected
4. Variability: Sponsor data, athlete data, etc.
5. Vitality: Data Art project “Emoto”
6. Virtual: Social media
• Based on my 6 V analysis, do I need a Big Data solution, or does my current BI solution address my business opportunity?
– Do the 6 Vs indicate general Big Data characteristics?
– What are the limitations of my current BI environment? (Technology constraint)
– What are my budgetary restrictions? (Financial constraint)
– What is my current Big Data knowledge base? (Knowledge constraint)
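The questions above can be recorded as a simple checklist. The sketch below is only one way to do that: the field names, the yes/no simplification, and the `min_vs` threshold are all assumptions made for illustration, not rules from the presentation.

```python
from dataclasses import dataclass


@dataclass
class Assessment:
    # Which of the six Vs show Big Data characteristics (hypothetical yes/no answers).
    volume: bool
    velocity: bool
    variety: bool
    variability: bool
    vitality: bool
    virtual: bool
    # The three constraints from the slide, simplified to yes/no.
    current_bi_is_limiting: bool    # technology constraint
    budget_available: bool          # financial constraint
    team_has_big_data_skills: bool  # knowledge constraint

    def v_count(self) -> int:
        """Number of Vs that were answered yes."""
        return sum([self.volume, self.velocity, self.variety,
                    self.variability, self.vitality, self.virtual])


def recommend(a: Assessment, min_vs: int = 3) -> str:
    """Illustrative rule only: min_vs is an arbitrary threshold."""
    if a.v_count() < min_vs or not a.current_bi_is_limiting:
        return "stay with current BI"
    if not (a.budget_available and a.team_has_big_data_skills):
        return "address budget/knowledge gaps first"
    return "prototype a Big Data solution"


olympics = Assessment(True, True, True, True, True, True, True, True, True)
print(recommend(olympics))
```

Writing the answers down this way mainly forces each question to be answered explicitly; the recommendation rule itself should be adapted to the organization.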
• MUST have both
Foundational and
Technical practice
expertise
• Data Strategy
• Data Governance
• Data Architecture
• Data Education
• Data Quality
• Data Integration
• Data Platforms
• BI/Analytics
• Needs to be actionable
• Generally well understood by
business
• Document what has been learned
• Perfect results are not
necessary
• Reiterate and refine
• Iterative process to
reach decision point
• Use as feedback for
next exploration
Myth #7: You need Big Data for Insights
Fact:
• Distinction between Big Data and
doing analytics
– Big Data is defined by the technology stack
that you use
– Big Data is used for predictive and
prescriptive analytics
• Use existing data for reporting, figure
out bottlenecks and optimize current
business model
• Understand how your data is structured, architected, and stored
Social Sentiment Analysis
• One of the burgeoning areas
for use of Big Data / Hadoop
platforms.
• Allows for the landing of multiple sources of unstructured data (Twitter, Facebook, LinkedIn, etc.)
• Data can then be analyzed with algorithms that look for keywords indicating positive/negative feedback
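A keyword-based approach of the kind described above might look roughly like the following. This is a minimal sketch, not the presenters' implementation: the keyword lists, the function names, and the sample posts are made up for illustration, and a real deployment would use a curated lexicon or a trained model.

```python
import re

# Hypothetical keyword lists, chosen only for this example.
POSITIVE = {"love", "great", "awesome", "happy", "excellent"}
NEGATIVE = {"hate", "terrible", "awful", "angry", "broken"}


def score_post(text: str) -> int:
    """Count positive keywords minus negative keywords in one post."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def classify(text: str) -> str:
    """Map the keyword score to a coarse sentiment label."""
    score = score_post(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


posts = [
    "Love the new release, great job!",
    "The app is broken again. Terrible.",
    "Just downloaded the update.",
]
labels = [classify(p) for p in posts]
print(labels)  # ['positive', 'negative', 'neutral']
```

Simple keyword counting misses sarcasm and negation ("not great"), which is one reason results should be treated as directional rather than exact.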
Operational Use
• Utilized real-time pricing data from multiple sources to dynamically update the pricing for books in the Amazon Marketplace.
• Ingested data from multiple sources, looking for real-time changes in price.
• Applied a predictive model to determine the best price point and set the price of their books on the marketplace.
• Increased the conversion rate, but created a race-to-the-bottom situation if not monitored.
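The race-to-the-bottom risk can be made concrete with a small repricing rule. The sketch below substitutes a simple "undercut the cheapest competitor" rule for the predictive model mentioned above, and adds a price floor as one possible guard; the function name, the undercut amount, and the floor value are assumptions for illustration.

```python
from decimal import Decimal


def reprice(competitor_prices, floor, undercut=Decimal("0.01")):
    """Undercut the cheapest competitor, but never drop below the floor price."""
    if not competitor_prices:
        return None  # no market signal; leave the current price unchanged
    candidate = min(competitor_prices) - undercut
    return max(candidate, floor)


floor = Decimal("8.00")  # hypothetical minimum acceptable price
print(reprice([Decimal("12.49"), Decimal("11.99")], floor))  # 11.98
print(reprice([Decimal("7.50")], floor))                     # 8.00
```

Without the floor, two sellers running the same rule against each other would keep lowering their prices one cent at a time, which is the unmonitored behavior the slide warns about.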
Healthcare Example: Patient Data
• Clinical data:
– Diagnosis/prognosis/treatment
– Genetic data
• Patient demographic data
• Insurance data:
– Insurance provider
– Claims data
• Prescriptions & pharmacy information
• Physical fitness data
– Activity tracking through
smartphone apps & social media
• Health history
• Medical research data
http://www.forbes.com/sites/xerox/2013/09/27/big-data-boosts-customer-loyalty-no-really/
Retail Example: Loyalty Programs & Big Data
• Companies need to understand current wants and needs AND
predict future tendencies
• Customer -> Repeat Customer -> Brand Advocate
• Customer loyalty programs & retention strategies
– Track what is being purchased and how often
– Coupons based on purchasing history
– Targeted communications, campaigns & special offers
– Social media for additional interactions
– Personalize consumer interactions
• Customer purchase history influences
product placements
– Retailers rapidly respond to consumer demands
– Product placements, planogram optimization, etc.
References
• The Human Face of Big Data, Rick Smolan & Jennifer Erwitt, First Edition (November 20, 2012)
• McKinsey: Big Data: The next frontier for innovation, competition and productivity (http://www.mckinsey.com/insights/business_technology/big_data_the_next_frontier_for_innovation?p=1)
• The Washington Post: Five Myths about Big Data (http://articles.washingtonpost.com/2013-08-16/opinions/41416362_1_big-data-data-crunching-marketing-analytics)
• Gartner: Gartner’s 2013 Hype Cycle for Emerging Technologies Maps Out Evolving Relationship Between Humans and Machines (http://www.gartner.com/newsroom/id/2575515)
• The New York Times | Opinion Pages: What Data Can’t Do (http://www.nytimes.com/2013/02/19/opinion/brooks-what-data-cant-do.html?_r=1&)
• CIO.com: Five Steps for How to Better Manage Your Data (http://www.cio.com.au/article/429681/five_steps_how_better_manage_your_data/)
• Business Insider: Enterprises Aren’t Spending Wildly on ‘Big Data’ But Don’t Know If It’s Worth It Yet (http://www.businessinsider.com/enterprise-big-data-spending-2012-11#ixzz2cdT8shhe)
• Inc.com: Big Data, Big Money: IT Industry to Increase Spending (http://www.inc.com/kathleen-kim/big-data-spending-to-increase-for-it-industry.html)
• Forbes: Big Data Boosts Customer Loyalty. No, Really. (http://www.forbes.com/sites/xerox/2013/09/27/big-data-boosts-customer-loyalty-no-really/)
Upcoming Events
Data Management Maturity
July 14, 2015 @ 2:00 PM ET/11:00 AM PT
Trends in Data Modeling
August 11, 2015 @ 2:00 PM ET/11:00 AM PT
Sign up here:
www.datablueprint.com/webinar-schedule
or www.dataversity.net
10124 W. Broad Street, Suite C
Glen Allen, Virginia 23060
804.521.4056
Potential Tradeoffs:
• CAP theorem: consistency, availability and partition-tolerance
• Small datasets can be both consistent & available
• CAP: Consistency, Availability, Partition (Fault) Tolerance
• ACID: Atomicity, Consistency, Isolation, Durability
• BASE: Basic Availability, Soft-state, Eventual consistency
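The BASE side of this tradeoff can be sketched with two toy replicas that accept writes independently and reconcile later. This is an illustrative model only, assuming a last-write-wins rule and caller-supplied timestamps; the class and method names are hypothetical and do not describe any particular NoSQL product.

```python
class Replica:
    """A toy key-value replica that accepts writes locally (availability first)."""

    def __init__(self):
        self.store = {}  # key -> (timestamp, value)

    def write(self, key, value, timestamp):
        """Keep the write only if it is newer than what is already stored."""
        current = self.store.get(key)
        if current is None or timestamp > current[0]:
            self.store[key] = (timestamp, value)

    def read(self, key):
        entry = self.store.get(key)
        return entry[1] if entry else None

    def sync_from(self, other):
        """Reconciliation pass: keep the newest write per key (last-write-wins)."""
        for key, (timestamp, value) in other.store.items():
            self.write(key, value, timestamp)


a, b = Replica(), Replica()
a.write("price", 10, timestamp=1)
b.write("price", 12, timestamp=2)   # during a partition the replicas disagree
stale = a.read("price")             # 10: soft state, not yet consistent
a.sync_from(b)
b.sync_from(a)
converged = (a.read("price"), b.read("price"))  # (12, 12)
print(stale, converged)
```

Both replicas stay writable during the partition, at the cost of a window in which a reader can see the older value; an ACID system would instead block or reject one of the writes.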
Additional Context
http://www.mckinsey.com/insights/business_technology/big_data_the_next_frontier_for_innovation?p=1
5 Ways in which Data creates Business Value
1. Information is transparent
and usable at much higher
frequency
2. Expose variability and
boost performance
3. Narrow segmentation of
customers and more
precisely tailored products
or services
4. Sophisticated analytics and
improved decision-making
5. Improved development of
the next generation of
products and services
Why the Big Deal about Big Data?
• We are at an inflection point: The sheer volume of data generated, stored, and mined for insights has become economically relevant to businesses, government, and consumers (McKinsey)
• We believe the same important principles still apply:
– What problem are you trying to solve for your business? Your solution needs to fit your problem
– Doing data for (big) data’s sake is not going to solve any problems
– Risk of spending a lot of money on chasing Big Data that will realize little to no returns, especially at this hype cycle stage
http://www.mckinsey.com/insights/business_technology/big_data_the_next_frontier_for_innovation?p=1
http://www.cio.com.au/article/429681/five_steps_how_better_manage_your_data
Business Information Market: $1.1 Trillion a Year
• Enterprises spend an average of $38 million on information/year
• Small and medium-sized businesses spend $332,000 on average
Take Aways-Big Data Context
• Technology continues to evolve at
increasing speeds
• Big Data is here
– We have the potential to
create insights
• Spend wisely & strategically:
– Big Data is not going to solve
all your problems.
• Fact:
– Big Data is not for everyone
• Fact:
– Lack of a clear definition
• Hype Cycle:
– Current: Peak of Inflated Expectations
– Soon: Trough of Disillusionment
Take Aways: Big Data Challenges Today
• Fact: Big Data techniques are innovative but
“Big Data” is not
• Challenges are both foundational and technical, today as well as in the 1600s
• Technology continues to advance rapidly (4
Vs)
• Challenges associated with Big Data are not
new:
– Well-known foundational data management issues
– Need to align data and business with rapidly
changing environment
– Duplication, accessibility, availability
– Foundational business issues
Take Aways-Approach: Crawl, Walk, Run
• Crawl:
– Identify business opportunity and
determine whether you truly need
a Big Data solution
• Walk:
– Apply a combination of
foundational and technical data
management practices.
Document your insights and
make sure they are actionable
• Run:
– Recycle and explore. Staying
agile allows you to be exploratory.
Take Aways-Design Principles: Foundational & Technical
• Foundational data management
principles still apply
• Beware of SOS (Shiny Object
Syndrome)
• You must have a data strategy before
you can have a Big Data strategy
• Fact: You don’t need Big Data to gain
insights
• Big Data integration requirements evolve
from your strategy
• Fact: Bigger Data is not always better
Take Aways: In Summary
• Big data techniques are innovative
but “Big Data” is not
• Big Data characteristics: 6 Vs
– Volume, Velocity, Variety, Variability, Vitality,
Virtual
• Approach: Crawl-Walk-Run
• Big Data challenges require solutions
that are based on foundational and
technical data management practices
• Beware of SOS (Shiny Object
Syndrome):
– Spend wisely and strategically
– Big Data is not going to solve all your
problems
Foundational Practice: Data Strategy
• Your data strategy must
align to your organizational
business strategy and
operating model
• As the market place
becomes more data-
driven, a data-focused
business strategy is an
imperative
• Must have data strategy
before you have a Big
Data strategy
Data Strategy Considerations
• What are the questions that
you cannot answer today?
• Is there a direct reliance on
understanding customer
behavior to drive revenue?
• Do you have information
overload and are you trying to
find the signal in the noise?
• Which is more important:
– Establishing value from current
data assets/data reporting?
– Exploring Big Data
opportunities?
Foundational Practice: Data Architecture
• Common vocabulary expressing
integrated requirements ensuring
that data assets are stored,
arranged, managed, and used in
systems in support of
organizational strategy [Aiken
2010]
• Most organizations have data
assets that are not supportive of
strategies
• Big question:
– How can organizations more
effectively use their information
architectures to support
strategy implementation?
Data Architecture Considerations
• Does your current architecture for
BI and analytics support Big Data?
• Are you getting enough value out of
your current architecture?
• Can you easily integrate and share
information across your
organization?
• Do you struggle to extract the value
from your data because it is too
cumbersome to navigate and
access?
• Are you confident your data is
organized to meet the needs of
your business?
Technical Practice: Data Integration
• A data-centric
organization requires
unified data
• Integrating data across
organizational silos
creates new insights
• It is also the biggest
challenge
• Big Data techniques can
be used to complement
existing integration efforts
Data Integration Considerations
• The complexity of your data
integration challenge depends on
the questions you’re trying to
answer
• Integration requirements for Big
Data are dependent on the types of
questions you’re asking:
– Integration here may be more fuzzy than
discrete
– Integration is domain-based (based on
time, customer concept, geographic
distribution)
• Those requirements should evolve
from your strategy
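"Fuzzy rather than discrete" integration can be illustrated with approximate name matching. The sketch below uses Python's standard-library `difflib.SequenceMatcher`; the function names, the sample customer names, and the 0.8 threshold are assumptions chosen for the example, and the right threshold depends on the question being asked.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] after light normalization."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def fuzzy_match(name, candidates, threshold=0.8):
    """Return the closest candidate if it clears the threshold, else None."""
    if not candidates:
        return None
    best = max(candidates, key=lambda c: similarity(name, c))
    return best if similarity(name, best) >= threshold else None


crm_customers = ["John Smith", "Jane Doe"]  # hypothetical records from one silo
print(fuzzy_match("Jon Smith", crm_customers))   # John Smith
print(fuzzy_match("Acme Corp", ["Zenith Ltd"]))  # None
```

A discrete join on exact keys would miss the first pair entirely; a fuzzy join accepts some risk of false matches in exchange for linking records across silos.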
Technical Practice: Data Quality
• Quality is driven by fit for purpose
considerations
• Big Data quality is different:
– Basic Availability
– Soft-state
– Eventual consistency
• Directional accuracy is the goal
• Focus on your most important data assets and ensure your solutions address the root cause of any quality issues, so that your data is correct when it is first created
• Experience has shown that
organizations can never get in front of
their data quality issues if they only use
the ‘find-and-fix’ approach
Data Quality Considerations
• Big Data is trying to be
predictive
• What are the questions you
are trying to answer?
– What level of accuracy are you
looking for?
– What confidence levels?
– Example: Do I need to know exactly what the customer is going to buy, or do I just need to know the range of products he/she is going to choose from?
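The difference between an exact prediction and a range of likely products can be shown with a toy frequency model. This sketch is an assumption-laden illustration, not a recommended predictive method: it simply ranks products by how often they appear in a made-up purchase history and checks whether the actual next purchase falls in the top k.

```python
from collections import Counter


def top_k_products(history, k):
    """Return the k most frequently purchased products, most frequent first."""
    return [item for item, _ in Counter(history).most_common(k)]


def is_hit(history, actual, k):
    """True if the actual next purchase is among the top-k predictions."""
    return actual in top_k_products(history, k)


history = ["coffee", "coffee", "coffee", "tea", "tea", "milk"]  # hypothetical data
next_purchase = "tea"

exact = is_hit(history, next_purchase, k=1)     # False: the single best guess was coffee
in_range = is_hit(history, next_purchase, k=3)  # True: tea is within the top three
print(exact, in_range)
```

If the business question only needs the range (for example, which products to feature), the looser target is much easier to hit and may be accurate enough.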
Technical Practice: Data Platforms
• Do you want to measure
critical operational process
performance?
• No one data platform can
answer all your questions. This
is commonly misunderstood
and often leads to very
expensive, bloated and
ineffective data platforms.
• Understand the questions that need to be asked, then decide how to build the right data platform or how to optimize an existing one
Data Platforms Considerations
• Most big data stacks share common components: file storage, columnar store, querying engine, etc.
• Big data stacks generally look the same until you get into appliances
– Algorithms are built into the appliances themselves (e.g. Netezza, Teradata, etc.)
• Ask these questions:
– Do you want insights on your
customer’s behavior?
– Do you need real-time customer
transactional information?
– Do you need historical data or just
access to the latest transactions?
– Where do you go to find the single
version of the truth about your
customers?


Data-Ed: Monetizing Data Management
Data-Ed: Monetizing Data Management Data-Ed: Monetizing Data Management
Data-Ed: Monetizing Data Management
 
Data-Ed: Data Architecture Requirements
Data-Ed: Data Architecture Requirements Data-Ed: Data Architecture Requirements
Data-Ed: Data Architecture Requirements
 
2014 dqe handouts
2014 dqe handouts2014 dqe handouts
2014 dqe handouts
 
Data-Ed: Emerging Trends in Data Jobs
Data-Ed: Emerging Trends in Data JobsData-Ed: Emerging Trends in Data Jobs
Data-Ed: Emerging Trends in Data Jobs
 
Data-Ed: Data-centric Strategy & Roadmap
Data-Ed: Data-centric Strategy & RoadmapData-Ed: Data-centric Strategy & Roadmap
Data-Ed: Data-centric Strategy & Roadmap
 
Data-Ed: Demystifying Big Data
Data-Ed: Demystifying Big Data Data-Ed: Demystifying Big Data
Data-Ed: Demystifying Big Data
 
Data-Ed: Unlock Business Value through Document & Content Management
Data-Ed: Unlock Business Value through Document & Content ManagementData-Ed: Unlock Business Value through Document & Content Management
Data-Ed: Unlock Business Value through Document & Content Management
 
Data-Ed: Unlock Business Value Through Reference & MDM
Data-Ed: Unlock Business Value Through Reference & MDM Data-Ed: Unlock Business Value Through Reference & MDM
Data-Ed: Unlock Business Value Through Reference & MDM
 
Data-Ed: Show Me the Money: Monetizing Data Management
Data-Ed: Show Me the Money: Monetizing Data ManagementData-Ed: Show Me the Money: Monetizing Data Management
Data-Ed: Show Me the Money: Monetizing Data Management
 
Data Systems Integration & Business Value PT. 3: Warehousing
Data Systems Integration & Business Value PT. 3: Warehousing Data Systems Integration & Business Value PT. 3: Warehousing
Data Systems Integration & Business Value PT. 3: Warehousing
 
Data Systems Integration & Business Value Pt. 2: Cloud
Data Systems Integration & Business Value Pt. 2: CloudData Systems Integration & Business Value Pt. 2: Cloud
Data Systems Integration & Business Value Pt. 2: Cloud
 
Data-Ed: Data Systems Integration & Business Value PT. 1: Metadata
Data-Ed: Data Systems Integration & Business Value PT. 1: MetadataData-Ed: Data Systems Integration & Business Value PT. 1: Metadata
Data-Ed: Data Systems Integration & Business Value PT. 1: Metadata
 
Data-Ed: Unlock Business Value through Data Quality Engineering
Data-Ed: Unlock Business Value through Data Quality Engineering Data-Ed: Unlock Business Value through Data Quality Engineering
Data-Ed: Unlock Business Value through Data Quality Engineering
 
Data-Ed: Demystifying Big Data
Data-Ed: Demystifying Big DataData-Ed: Demystifying Big Data
Data-Ed: Demystifying Big Data
 
Data-Ed: Unlock Business Value through Data Governance
Data-Ed: Unlock Business Value through Data GovernanceData-Ed: Unlock Business Value through Data Governance
Data-Ed: Unlock Business Value through Data Governance
 
Leading the Data Asset Management Team: CDO or Top Data Job?
Leading the Data Asset Management Team: CDO or Top Data Job?Leading the Data Asset Management Team: CDO or Top Data Job?
Leading the Data Asset Management Team: CDO or Top Data Job?
 
Data-Ed: Building the Case for the Top Data Job
Data-Ed: Building the Case for the Top Data JobData-Ed: Building the Case for the Top Data Job
Data-Ed: Building the Case for the Top Data Job
 
Data-Ed: Unlocking business value through data modeling and data architecture...
Data-Ed: Unlocking business value through data modeling and data architecture...Data-Ed: Unlocking business value through data modeling and data architecture...
Data-Ed: Unlocking business value through data modeling and data architecture...
 

Recently uploaded

Neo4j Graph Summit 2024 Workshop - EMEA - Breda_and_Munchen.pdf
Neo4j Graph Summit 2024 Workshop - EMEA - Breda_and_Munchen.pdfNeo4j Graph Summit 2024 Workshop - EMEA - Breda_and_Munchen.pdf
Neo4j Graph Summit 2024 Workshop - EMEA - Breda_and_Munchen.pdfNeo4j
 
2024 Build Generative AI for Non-Profits
2024 Build Generative AI for Non-Profits2024 Build Generative AI for Non-Profits
2024 Build Generative AI for Non-ProfitsTimothy Spann
 
Enabling GenAI Breakthroughs with Knowledge Graphs
Enabling GenAI Breakthroughs with Knowledge GraphsEnabling GenAI Breakthroughs with Knowledge Graphs
Enabling GenAI Breakthroughs with Knowledge GraphsNeo4j
 
TCFPro24 Building Real-Time Generative AI Pipelines
TCFPro24 Building Real-Time Generative AI PipelinesTCFPro24 Building Real-Time Generative AI Pipelines
TCFPro24 Building Real-Time Generative AI PipelinesTimothy Spann
 
Báo cáo Social Media Benchmark 2024 cho dân Marketing
Báo cáo Social Media Benchmark 2024 cho dân MarketingBáo cáo Social Media Benchmark 2024 cho dân Marketing
Báo cáo Social Media Benchmark 2024 cho dân MarketingMarketingTrips
 
The market for cross-border mortgages in Europe
The market for cross-border mortgages in EuropeThe market for cross-border mortgages in Europe
The market for cross-border mortgages in Europe321k
 
Using DAX & Time-based Analysis in Data Warehouse
Using DAX & Time-based Analysis in Data WarehouseUsing DAX & Time-based Analysis in Data Warehouse
Using DAX & Time-based Analysis in Data WarehouseThinkInnovation
 
Bengaluru Tableau UG event- 2nd March 2024 Q1
Bengaluru Tableau UG event- 2nd March 2024 Q1Bengaluru Tableau UG event- 2nd March 2024 Q1
Bengaluru Tableau UG event- 2nd March 2024 Q1bengalurutug
 
Neo4j_Jesus Barrasa_The Art of the Possible with Graph.pptx.pdf
Neo4j_Jesus Barrasa_The Art of the Possible with Graph.pptx.pdfNeo4j_Jesus Barrasa_The Art of the Possible with Graph.pptx.pdf
Neo4j_Jesus Barrasa_The Art of the Possible with Graph.pptx.pdfNeo4j
 
Stochastic Dynamic Programming and You.pptx
Stochastic Dynamic Programming and You.pptxStochastic Dynamic Programming and You.pptx
Stochastic Dynamic Programming and You.pptxjkmrshll88
 
Microeconomic Group Presentation Apple.pdf
Microeconomic Group Presentation Apple.pdfMicroeconomic Group Presentation Apple.pdf
Microeconomic Group Presentation Apple.pdfmxlos0
 
STOCK PRICE ANALYSIS Furkan Ali TASCI --.pptx
STOCK PRICE ANALYSIS  Furkan Ali TASCI --.pptxSTOCK PRICE ANALYSIS  Furkan Ali TASCI --.pptx
STOCK PRICE ANALYSIS Furkan Ali TASCI --.pptxFurkanTasci3
 
Brain Tumor Detection with Machine Learning.pptx
Brain Tumor Detection with Machine Learning.pptxBrain Tumor Detection with Machine Learning.pptx
Brain Tumor Detection with Machine Learning.pptxShammiRai3
 
Air Con Energy Rating Info411 Presentation.pdf
Air Con Energy Rating Info411 Presentation.pdfAir Con Energy Rating Info411 Presentation.pdf
Air Con Energy Rating Info411 Presentation.pdfJasonBoboKyaw
 
PPT for Presiding Officer.pptxvvdffdfgggg
PPT for Presiding Officer.pptxvvdffdfggggPPT for Presiding Officer.pptxvvdffdfgggg
PPT for Presiding Officer.pptxvvdffdfggggbhadratanusenapati1
 
Prediction Of Cryptocurrency Prices Using Lstm, Svm And Polynomial Regression...
Prediction Of Cryptocurrency Prices Using Lstm, Svm And Polynomial Regression...Prediction Of Cryptocurrency Prices Using Lstm, Svm And Polynomial Regression...
Prediction Of Cryptocurrency Prices Using Lstm, Svm And Polynomial Regression...ferisulianta.com
 
How to Build an Experimentation Culture for Data-Driven Product Development
How to Build an Experimentation Culture for Data-Driven Product DevelopmentHow to Build an Experimentation Culture for Data-Driven Product Development
How to Build an Experimentation Culture for Data-Driven Product DevelopmentAggregage
 
Data Collection from Social Media Platforms
Data Collection from Social Media PlatformsData Collection from Social Media Platforms
Data Collection from Social Media PlatformsMahmoud Yasser
 
Paul Martin (Gartner) - Show Me the AI Money.pdf
Paul Martin (Gartner) - Show Me the AI Money.pdfPaul Martin (Gartner) - Show Me the AI Money.pdf
Paul Martin (Gartner) - Show Me the AI Money.pdfdcphostmaster
 

Recently uploaded (20)

Neo4j Graph Summit 2024 Workshop - EMEA - Breda_and_Munchen.pdf
Neo4j Graph Summit 2024 Workshop - EMEA - Breda_and_Munchen.pdfNeo4j Graph Summit 2024 Workshop - EMEA - Breda_and_Munchen.pdf
Neo4j Graph Summit 2024 Workshop - EMEA - Breda_and_Munchen.pdf
 
2024 Build Generative AI for Non-Profits
2024 Build Generative AI for Non-Profits2024 Build Generative AI for Non-Profits
2024 Build Generative AI for Non-Profits
 
Enabling GenAI Breakthroughs with Knowledge Graphs
Enabling GenAI Breakthroughs with Knowledge GraphsEnabling GenAI Breakthroughs with Knowledge Graphs
Enabling GenAI Breakthroughs with Knowledge Graphs
 
TCFPro24 Building Real-Time Generative AI Pipelines
TCFPro24 Building Real-Time Generative AI PipelinesTCFPro24 Building Real-Time Generative AI Pipelines
TCFPro24 Building Real-Time Generative AI Pipelines
 
Báo cáo Social Media Benchmark 2024 cho dân Marketing
Báo cáo Social Media Benchmark 2024 cho dân MarketingBáo cáo Social Media Benchmark 2024 cho dân Marketing
Báo cáo Social Media Benchmark 2024 cho dân Marketing
 
The market for cross-border mortgages in Europe
The market for cross-border mortgages in EuropeThe market for cross-border mortgages in Europe
The market for cross-border mortgages in Europe
 
Using DAX & Time-based Analysis in Data Warehouse
Using DAX & Time-based Analysis in Data WarehouseUsing DAX & Time-based Analysis in Data Warehouse
Using DAX & Time-based Analysis in Data Warehouse
 
Target_Company_Data_breach_2013_110million
Target_Company_Data_breach_2013_110millionTarget_Company_Data_breach_2013_110million
Target_Company_Data_breach_2013_110million
 
Bengaluru Tableau UG event- 2nd March 2024 Q1
Bengaluru Tableau UG event- 2nd March 2024 Q1Bengaluru Tableau UG event- 2nd March 2024 Q1
Bengaluru Tableau UG event- 2nd March 2024 Q1
 
Neo4j_Jesus Barrasa_The Art of the Possible with Graph.pptx.pdf
Neo4j_Jesus Barrasa_The Art of the Possible with Graph.pptx.pdfNeo4j_Jesus Barrasa_The Art of the Possible with Graph.pptx.pdf
Neo4j_Jesus Barrasa_The Art of the Possible with Graph.pptx.pdf
 
Stochastic Dynamic Programming and You.pptx
Stochastic Dynamic Programming and You.pptxStochastic Dynamic Programming and You.pptx
Stochastic Dynamic Programming and You.pptx
 
Microeconomic Group Presentation Apple.pdf
Microeconomic Group Presentation Apple.pdfMicroeconomic Group Presentation Apple.pdf
Microeconomic Group Presentation Apple.pdf
 
STOCK PRICE ANALYSIS Furkan Ali TASCI --.pptx
STOCK PRICE ANALYSIS  Furkan Ali TASCI --.pptxSTOCK PRICE ANALYSIS  Furkan Ali TASCI --.pptx
STOCK PRICE ANALYSIS Furkan Ali TASCI --.pptx
 
Brain Tumor Detection with Machine Learning.pptx
Brain Tumor Detection with Machine Learning.pptxBrain Tumor Detection with Machine Learning.pptx
Brain Tumor Detection with Machine Learning.pptx
 
Air Con Energy Rating Info411 Presentation.pdf
Air Con Energy Rating Info411 Presentation.pdfAir Con Energy Rating Info411 Presentation.pdf
Air Con Energy Rating Info411 Presentation.pdf
 
PPT for Presiding Officer.pptxvvdffdfgggg
PPT for Presiding Officer.pptxvvdffdfggggPPT for Presiding Officer.pptxvvdffdfgggg
PPT for Presiding Officer.pptxvvdffdfgggg
 
Prediction Of Cryptocurrency Prices Using Lstm, Svm And Polynomial Regression...
Prediction Of Cryptocurrency Prices Using Lstm, Svm And Polynomial Regression...Prediction Of Cryptocurrency Prices Using Lstm, Svm And Polynomial Regression...
Prediction Of Cryptocurrency Prices Using Lstm, Svm And Polynomial Regression...
 
How to Build an Experimentation Culture for Data-Driven Product Development
How to Build an Experimentation Culture for Data-Driven Product DevelopmentHow to Build an Experimentation Culture for Data-Driven Product Development
How to Build an Experimentation Culture for Data-Driven Product Development
 
Data Collection from Social Media Platforms
Data Collection from Social Media PlatformsData Collection from Social Media Platforms
Data Collection from Social Media Platforms
 
Paul Martin (Gartner) - Show Me the AI Money.pdf
Paul Martin (Gartner) - Show Me the AI Money.pdfPaul Martin (Gartner) - Show Me the AI Money.pdf
Paul Martin (Gartner) - Show Me the AI Money.pdf
 

Data-Ed: A Framework for NoSQL and Hadoop

  • 1. A Framework for Implementing NoSQL, Hadoop • Big Data could know us better than we know ourselves – Dan Gardner • We'll see this as the time in history when the world's information was transformed from inert, passive state, and put into a unified system that brings that information alive – Michael Nielsen • We now have a chance to become the center of our own knowledge universe, one that constantly reconfigures itself to match our needs – Michael S. Malone • Today a street stall in Mumbai can access more information, maps, statistics, academic papers, price trends, futures markets, and data than a U.S. President could only a few decades ago – Juan Enriquez • Not everything that can be counted counts, and not everything that counts can be counted – Albert Einstein • Soon we will salt the oceans, the land, and the sky with uncounted numbers of sensors invisible to the eyes but visible to one another – Esther Dyson • We've reached a tipping point in history: today more data is being manufactured by machines, servers, and cell phones than by people – Michael E. Driscoll • Every century, a new technology - steam power, electricity, atomic energy, or microprocessors - has swept away the old world with a vision of a new one. Today, we seem to be entering the era of Big Data – Michael Coren. Big Data and NoSQL continue to make headlines everywhere. However, most of what has been written about these topics is focused on the hardware, services, and scale out. But what about a Big Data and NoSQL Strategy, one that supports your business strategy? Virtually every major organization thinking about these data platforms is faced with the challenge of figuring out the appropriate approach and the requirements. This presentation will provide guidance on how to think about and establish realistic Big Data management plans and expectations. We will introduce a framework for evaluating the various choices when it comes to implementing and succeeding with Big Data/NoSQL and show how to demonstrate a sample use case. Takeaways: • A Framework for evaluating Big Data techniques • Deciding on a Big Data platform – How do you know which one is a good fit for you? • The means by which big data techniques can complement existing data management practices • The prototyping nature of practicing big data techniques • The distinct ways in which utilizing Big Data can generate business value. Date: June 9, 2015. Time: 2:00 PM ET/11:00 AM PT. Presenters: Peter Aiken, Ph.D. & Josh Bartels. Copyright 2015 by Data Blueprint
  • 2. Shannon Kempe, Executive Editor at DATAVERSITY.net
  • 3. Steven MacLauchlan • 10 years of experience in Application Development and Data Modeling with a focus on Healthcare solutions • Delivers tailored data management solutions that provide focus on data’s business value while enhancing clients’ overall capability to manage data • Certified Data Management Professional (CDMP) • Computer Science degree from Virginia Commonwealth University • Most recent focus: Understanding emerging data modeling trends and how these can best be leveraged for the Enterprise
  • 4. Get Social With Us! • Live Twitter Feed: Join the conversation! Follow us: @datablueprint @paiken. Ask questions and submit your comments: #dataed • Like Us on Facebook: www.facebook.com/datablueprint. Post questions and comments. Find industry news, insightful content and event updates. • Join the Group: Data Management & Business Intelligence. Ask questions, gain insights and collaborate with fellow data management professionals
  • 5. Peter Aiken, Ph.D. • 30+ years in data management • Repeated international recognition • Founder, Data Blueprint (datablueprint.com) • Associate Professor of IS (vcu.edu) • DAMA International (dama.org) • 9 books and dozens of articles • Experienced w/ 500+ data management practices • Multi-year immersions: – US DoD – Nokia – Deutsche Bank – Wells Fargo – Walmart – … • DAMA International President 2009-2013 • DAMA International Achievement Award 2001 (with Dr. E. F. "Ted" Codd) • DAMA International Community Award 2005 • [Book covers: "Monetizing Data Management: Unlocking the Value in Your Organization’s Most Important Asset", Peter Aiken with Juanita Billings, foreword by John Bottega; "The Case for the Chief Data Officer: Recasting the C-Suite to Leverage Your Most Valuable Asset", Peter Aiken and Michael Gorman]
  • 6. Josh Bartels • Data management consultant and leader – Over 10 years of experience – Multiple industries (Finance, Defense, Insurance) • Certifications – Certified Data Management Professional (CDMP) – Project Manager (PMP) – Data Vault 2.0 Practitioner (CDVP2) • Education – Master's in Business Administration – Master's in Information Systems • Current Efforts – Focus on the creation of, and migration to, new data platforms for clients in the financial and insurance industries
  • 7. Presented by Peter Aiken, Ph.D., Josh Bartels, Steven MacLauchlan. A Framework for Implementing NoSQL, Hadoop. Demystifying Big Data 2.0: Developing the Right Approach for Implementing Big Data Techniques
  • 8. A Framework for Implementing NoSQL, Hadoop. Demystifying Big Data 2.0: Developing the Right Approach for Implementing Big Data Techniques • Big Data Context – We are using the wrong vocabulary to discuss this topic • More Precise Definitions – Framework – Non-von Neumann Architectures – Hadoop/NoSQL • Big Data – Historical Perspective • Big Data Approach – Crawl, Walk, Run • Framework Examples – Social – Operational BWB • Takeaways and Q&A. Tweeting now at: #dataed
  • 9. A Framework for Implementing NoSQL, Hadoop. Demystifying Big Data 2.0: Developing the Right Approach for Implementing Big Data Techniques • Big Data Context – We are using the wrong vocabulary to discuss this topic • More Precise Definitions – Framework – Non-von Neumann Architectures – Hadoop/NoSQL • Big Data – Historical Perspective • Big Data Approach – Crawl, Walk, Run • Framework Examples – Social – Operational BWB • Takeaways and Q&A. Tweeting now at: #dataed
  • 10. Myth #1: Big Data has a clear definition. Fact: • The term is used so often and in so many contexts that its meaning has become vague and ambiguous • Industry experts and scientists often disagree. http://articles.washingtonpost.com/2013-08-16/opinions/41416362_1_big-data-data-crunching-marketing-analytics
  • 11. Big Data (has something to do with Vs, doesn't it?) • Volume – Amount of data • Velocity – Speed of data in and out • Variety – Range of data types and sources (2001, Doug Laney) • Variability – Many options or variable interpretations confound analysis (2011, ISRC) • Vitality – A dynamically changing Big Data environment in which analysis and predictive models must continually be updated as changes occur, to seize opportunities as they arrive (2011, CIA) • Virtual – Scoping the discussion to only include online assets (2012, Courtney Lambert) • Value/Veracity – Stuart Madnick (John Norris Maguire Professor of Information Technology, MIT Sloan School of Management & Professor of Engineering Systems, MIT School of Engineering)
  • 12. Defining Big Data • Big Data are high-volume, high-velocity, and/or high-variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization. – Gartner 2012 • Big data refers to datasets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze. – IBM 2012 • An all-encompassing term for any collection of data sets so large and complex that it becomes difficult to process using on-hand data management tools or traditional data processing applications. – Wikipedia 2014 • Shorthand for advancing trends in technology that open the door to a new approach to understanding the world and making decisions. – NY Times 2012 • The broad range of new and massive data types that have appeared over the last decade. – Tom Davenport 2014 • Data of a very large size, typically to the extent that its manipulation and management present significant logistical challenges. – Oxford English Dictionary 2014 • Big data is about putting the "I" back into IT. – Peter Aiken 2007
  • 13. Big Data Techniques • New techniques that can improve the productivity of any analytical insight cycle by an order of magnitude, and that complement, enhance, or replace conventional (existing) analysis methods • Big data techniques are currently characterized by: – Continuous, instantaneously available data sources – Non-von Neumann processing (defined later in the presentation) – Capabilities approaching or past human comprehension – Architecturally enhanceable identity/security capabilities – Other tradeoff-focused data processing • So a good question becomes "where in our existing architecture can we most effectively apply Big Data Techniques?"
  • 14. Big Data Technologies, by themselves, are a One-Legged Stool. Governance is the major means of preventing over-reliance on one-legged stools!
  • 15. The Big Data Landscape. Copyright Dave Feinleib, bigdatalandscape.com
  • 16. [Figure: Data Platforms Map, October 2014. ©2014 by 451 Research LLC, all rights reserved. https://451research.com/dashboard/dpa. The map places database and data platform products in relational, non-relational, and grid/cache zones. Key: general purpose, specialist analytic, as-a-service, BigTables, graph, document, key value stores, key value direct access, Hadoop, MySQL ecosystem, advanced clustering/sharding, NewSQL databases, data caching, data grid, search, appliances, in-memory, stream processing.]
  • 17. [Figure: alphabetical index to the Data Platforms Map, listing each product with its grid reference (for example, C2 DataStax Enterprise, C6 HP Vertica, B5 Microsoft SQL Server PDW, C4 ScaleDB). https://451research.com/dashboard/dpa]
  • 18. Myth #2: Everyone should invest in Big Data. Fact: • Not every company will benefit from Big Data • It depends on your size and your ability – Local pizza shop vs. state-wide or national chain
  • 19. Big Data can create significant financial value across sectors • Some (not all) companies can take advantage of Big Data to create value if they want to compete
  • 20. A Framework for Implementing NoSQL, Hadoop. Demystifying Big Data 2.0: Developing the Right Approach for Implementing Big Data Techniques • Big Data Context – We are using the wrong vocabulary to discuss this topic • More Precise Definitions – Framework – Non-von Neumann Architectures – Hadoop/NoSQL • Big Data – Historical Perspective • Big Data Approach – Crawl, Walk, Run • Framework Examples – Social – Operational BWB • Takeaways and Q&A. Tweeting now at: #dataed
  • 21. Big Data = Big Spending • Enterprises are spending wildly on Big Data but don’t know if it’s worth it yet (Business Insider, 2012) • Big Data Technology Spending Trend: • 83% increase over the next 3 years (worldwide): – 2012: $28 billion – 2013: $34 billion – 2016: $232 billion • Caution: – Don’t fall victim to SOS (Shiny Object Syndrome) – A lot of money is being invested but is it generating the expected return? – Gartner Hype Cycle suggests results are going to be disappointing • Sources: http://www.businessinsider.com/enterprise-big-data-spending-2012-11#ixzz2cdT8shhe http://www.inc.com/kathleen-kim/big-data-spending-to-increase-for-it-industry.html http://www.gartner.com/DisplayDocument?id=2195915&ref=clientFriendlyUrl
  • 22. Who wrote this … ? • In considering any new subject, there is frequently a tendency first to overrate what we find to be already interesting or remarkable, and secondly - by a sort of natural reaction - to undervalue the true state of the case. • Augusta Ada King, Countess of Lovelace - aka Ada Lovelace, publisher of the first computer program
  • 23. Gartner Five-phase Hype Cycle • Technology Trigger: A potential technology breakthrough kicks things off. Early proof-of-concept stories and media interest trigger significant publicity. Often no usable products exist and commercial viability is unproven. • Peak of Inflated Expectations: Early publicity produces a number of success stories, often accompanied by scores of failures. Some companies take action; many do not. • Trough of Disillusionment: Interest wanes as experiments and implementations fail to deliver. Producers of the technology shake out or fail. Investments continue only if the surviving providers improve their products to the satisfaction of early adopters. • Slope of Enlightenment: More instances of how the technology can benefit the enterprise start to crystallize and become more widely understood. Second- and third-generation products appear from technology providers. More enterprises fund pilots; conservative companies remain cautious. • Plateau of Productivity: Mainstream adoption starts to take off. Criteria for assessing provider viability are more clearly defined. The technology’s broad market applicability and relevance are clearly paying off. • Source: http://www.gartner.com/technology/research/methodologies/hype-cycle.jsp
  • 24. Gartner Hype Cycle "A focus on big data is not a substitute for the fundamentals of information management."
  • 25. 2012 Big Data in Gartner’s Hype Cycle
  • 26. 2013 Big Data in Gartner’s Hype Cycle
  • 27. 2014 Big Data in Gartner’s Hype Cycle
  • 28. Big Data Gartner Hype Cycle
  • 29. Myth #3: Big Data is innovative Fact: • Big Data techniques are innovative • ROI and insights depend on the size of the business and the amount of data used and produced, e.g. – Local pizza place vs. Papa John’s – Retail
  • 30. My Barn must pass a foundation inspection • Before further construction can proceed • No IT equivalent in most organizations
  • 31. Frameworks • A system of ideas for guiding analyses • A means of organizing project data • Data integration priorities decision making framework • A means of assessing progress
  • 32. "There’s now a blurring between the storage world and the memory world" • Faster processors outstripped not only the hard disk, but main memory – Hard disk too slow – Memory too small • Flash drives remove both bottlenecks – Combined, Apple and Yahoo have spent more than $500 million to date • Make it look like traditional storage or more system memory – Minimum 10x improvements – Dragonstone server is 3.2 TB flash memory (Facebook) • Bottom line - new capabilities!
  • 33. Non-von Neumann Processing/Efficiencies • von Neumann bottleneck (computer science) – "An inefficiency inherent in the design of any von Neumann machine that arises from the fact that most computer time is spent in moving information between storage and the central processing unit rather than operating on it" [http://encyclopedia2.thefreedictionary.com/von+Neumann+bottleneck] • Michael Stonebraker – Ingres (Berkeley/MIT) – Modern database processing is approximately 4% efficient • Many big data architectures are attempts to address this, but: – Zero sum game – Trade characteristics against each other • Reliability • Predictability – Google/MapReduce/Bigtable – Amazon/Dynamo – Netflix/Chaos Monkey – Hadoop – McDipper • Big data techniques exploit non-von Neumann processing
  • 34. • Decomposition • Reassembly – not optional!
  • 35. One of Data Blueprint's Big Data Clusters
  • 36. Analytics Insight Cycle • Things are happening – Sensemaking techniques address "what" is happening? • Patterns/objects, hypotheses emerge – What can be observed? • Operationalizing – The dots can be repeatedly connected – "Big Data" contributions are shown in orange • Margaret Boden's computational creativity – Exploratory – Combinational – Transformational [Figure labels: Existing knowledge base; Volume, Variety, Velocity; Potential/actual insights; Pattern/object emergence; Analytical bottleneck; Exploitable insight; Feedback]
  • 37. Big Data: Two prominent use cases • A sandwich offers a good analogy for how big data and existing technologies fit together • Landing Zone (less expensive) – Especially useful in cases where data is highly disposable • Existing technologies are the contents – Sandwiched between and complemented by landing zone and archival capabilities • Archiving/Offloading (less need for structure) – "Cold" transactional and analytic data • Adapted from Nancy Kopp: http://ibmdatamag.com/2013/08/relishing-the-big-data-burger/ [Figure labels: Landing Zone; Existing Data Architectural Processing; Archiving/Offloading]
  • 38. What is NoSQL? • Commonly interpreted as "Not Only SQL" • Broad class of database management technologies that provide a mechanism for storage and retrieval of data that doesn’t follow traditional relational database methodology • Motivations – Simplicity of design – Horizontal scaling – Finer control over availability of the data • The data structures used by NoSQL databases differ from those used in relational databases, making some operations faster in NoSQL and others faster in relational databases
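The last point on the NoSQL slide can be illustrated with a small sketch. It is not taken from the presentation: the tables, keys, and customer names below are made up, Python's built-in sqlite3 stands in for a relational database, and a plain dictionary stands in for a document store. Fetching one customer with all of their orders is a single key lookup in the document layout but needs a join in the relational one, while a total across all orders is a one-line query in the relational layout and a full scan in the document layout.

```python
import sqlite3

# Relational layout: data normalized into two tables (illustrative schema).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders VALUES (10, 1, 20.0), (11, 1, 5.0), (12, 2, 7.5);
""")

# Document layout: each customer is stored as one nested record under one key.
documents = {
    "customer:1": {"name": "Ada",
                   "orders": [{"id": 10, "total": 20.0}, {"id": 11, "total": 5.0}]},
    "customer:2": {"name": "Grace",
                   "orders": [{"id": 12, "total": 7.5}]},
}

# "Give me one customer with all orders": one key lookup in the document layout...
ada_document = documents["customer:1"]

# ...but a join across two tables in the relational layout.
ada_rows = conn.execute(
    "SELECT c.name, o.id, o.total FROM customers c "
    "JOIN orders o ON o.customer_id = c.id WHERE c.id = ? ORDER BY o.id",
    (1,),
).fetchall()

# "Total revenue across everything": one declarative query relationally...
relational_total = conn.execute("SELECT SUM(total) FROM orders").fetchone()[0]

# ...but a scan over every document in the document layout.
document_total = sum(
    order["total"] for doc in documents.values() for order in doc["orders"]
)

print(ada_document["name"], ada_rows, relational_total, document_total)
```

Which layout is "faster" depends on the access pattern, which is the point the slide makes.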
  • 39. What is Hadoop? • A data storage and processing system that runs on clusters of commodity servers • Able to store any kind of data in its native format • Performs a wide variety of analyses and transformations • Stores terabytes, and even petabytes, of data inexpensively • Handles hardware and system failures automatically, without losing data or interrupting data analyses • Critical components of Hadoop: – HDFS: The Hadoop Distributed File System is the storage system for a Hadoop cluster, responsible for distribution of data across the servers – MapReduce: The processing framework within Hadoop that allows for distributed and parallel execution of analytical jobs
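To make the MapReduce component more concrete, here is a minimal single-process sketch of the map, shuffle, and reduce steps using the classic word count. It is an illustration only: it does not use Hadoop's actual API, the function names are invented for this example, and each input string merely stands in for a block of data that HDFS would place on a different server.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle_phase(mapped_pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: collapse the values of each key into a single count."""
    return {key: sum(values) for key, values in groups.items()}

def word_count(splits):
    mapped = []
    for split in splits:  # on a real cluster these map calls run in parallel
        mapped.extend(map_phase(split))
    return reduce_phase(shuffle_phase(mapped))

splits = ["big data big hype", "data strategy first"]
counts = word_count(splits)
print(counts)
```

Because each map call only sees its own split and each reduce call only sees one key, the work can be spread across many servers, which is what lets the approach scale out.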
  • 40. Why NoSQL? Why Hadoop? • Large number of users (read: the internet) • Rapid app development and deployment • Large number of mission critical writes (sensors/etc) • Small, continuous reads and writes, especially where “Consistency” is less important (social networks) • Hadoop solves the hard scaling problems caused by large amounts of complex data. • As the amount of data in a cluster grows, new servers can be added to a Hadoop cluster incrementally and inexpensively to store and analyze it.
  • 41. Hadoop Use Cases in the Real World • Risk Modeling • Customer Churn Analysis • Recommendation Engine • Ad Targeting • Point of Sale Transaction Analysis • Social Sentiment on Social Media • Analyzing network data to predict failure • Threat analysis • Trade Surveillance
  • 43. Some Big Data Limitations • Data analysis struggles with the social – Your brain is excellent at social cognition - people can • Mirror each other’s emotional states • Detect uncooperative behavior • Assign value to things through emotion – Data analysis measures the quantity of social interactions but not the quality • It can map interactions with co-workers you see during work days • It can't capture devotion to childhood friends seen annually – When making (personal) decisions about social relationships, it’s foolish to swap the amazing machine in your skull for the crude machine on your desk • Data struggles with context – Decisions are embedded in sequences and contexts – Brains think in stories - weaving together multiple causes and multiple contexts – Data analysis is pretty bad at • Narratives / Emergent thinking / Explaining • Data creates bigger haystacks – More data leads to more statistically significant correlations – Most are spurious and deceive us – Falsity grows exponentially the more data we collect • Big data has trouble with big problems – For example: the economic stimulus debate – No one has been persuaded by data to switch sides • Data favors memes over masterpieces – Data analysis can detect when large numbers of people take an instant liking to some cultural product – But many important products are hated initially because they are unfamiliar • Data obscures values – Data is never raw; it’s always structured according to somebody’s predispositions and values
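The "bigger haystacks" point can be checked with a small simulation. This is a sketch under stated assumptions, not something from the slides: every series is pure random noise, so any "significant" correlation found is spurious by construction. The threshold of 0.444 is roughly the two-sided 5% critical value of Pearson's r for 20 observations, and the function names are made up for this example. The sketch only shows that the number of pairs to test (and with it the number of false hits) grows quickly as variables are added; it does not attempt to reproduce the "exponential" claim.

```python
import itertools
import math
import random

def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

def count_spurious(n_variables, n_observations=20, threshold=0.444, seed=0):
    """Return (pairs above threshold, total pairs) for unrelated random series."""
    rng = random.Random(seed)
    series = [[rng.gauss(0, 1) for _ in range(n_observations)]
              for _ in range(n_variables)]
    pairs = list(itertools.combinations(series, 2))
    hits = sum(1 for a, b in pairs if abs(pearson(a, b)) > threshold)
    return hits, len(pairs)

few_hits, few_pairs = count_spurious(4)      # 4 variables give 6 pairs
many_hits, many_pairs = count_spurious(60)   # 60 variables give 1770 pairs
print(few_hits, few_pairs, many_hits, many_pairs)
```

With about 5% of pairs expected to pass the threshold by chance alone, collecting more variables without a hypothesis mostly collects more false leads.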
  • 44. Myth #4: Big Data is just another IT project Fact: • Big Data is not your typical IT project – Does not answer typical IT questions – Trend analysis, agile, actionable, etc. – Fundamentally different approach • Big Data Projects are exploratory • Big Data enables new capabilities • Big Data can be a disruptive technology • It might sound simple but that doesn’t mean it’s easy • Beware of SOS (Shiny Object Syndrome) Copyright 2013 by Data Blueprint
  • 45. Myth #5: Big Data is new Fact: • The term originated in Silicon Valley in the 1990s • The concept has been used previously – 800 year old linguistic datasets – Use in sciences in 1600s – Kepler, Sloan Digital Sky Survey, Statisticians’ view • Much harder to leverage Big Data when you lack appropriate techniques • Source: http://articles.washingtonpost.com/2013-08-16/opinions/41416362_1_big-data-data-crunching-marketing-analytics
  • 46. The Bills of Mortality was an Early Data Collection
  • 47. Mortality Geocoding Where is it happening?
  • 48. ("Whereas of the Plague") Plague Peak When is it happening?
  • 49. Black Rats or Rattus rattus Why is it happening?
  • 50. What will happen?
  • 51. Formalizing Data Management • Defend the Realm: The Authorized History of MI5 by Christopher Andrew • World War I • 1914 • At war with much of Europe • 14,000,000 Germans living in the United Kingdom • How to efficiently and effectively manage information on that many individuals? • The Security Service is responsible for "protecting the UK against threats to national security from espionage, terrorism and sabotage, from the activities of agents of foreign powers, and from actions intended to overthrow or undermine parliamentary democracy by political, industrial or violent means."
  • 52. “As a final thought, how about a machine that would send, via closed-circuit television, visual and oral information needed immediately at high-level conferences or briefings? Let’s say that a group of senior officers are contemplating a covert action program for Afghanistan. Things go well until someone asks ‘Well, just how many schools are there in the country, and what is the literacy rate?’ No one in the room knows. (Remember, this is an imaginary situation). So the junior member present dials a code number into a device at one end of the table. Thirty seconds later, on the screen overhead, a teletype printer begins to hammer out the required data. Before the meeting is over, the group has been given, through the same method, the names of countries that have airlines into Afghanistan, a biographical profile of the Soviet ambassador there, and the Pakistani order of battle along the Afghanistan frontier. Neat, no?” • Predicted not just the use of computing in the intelligence community • Also forecast predictive analytics • Accompanying privacy challenges
  • 53. A Framework for Implementing NoSQL, Hadoop Demystifying Big Data 2.0: Developing the Right Approach for Implementing Big Data Techniques • Big Data Context – We are using the wrong vocabulary to discuss this topic • More Precise Definitions – Framework – Non-von Neumann Architectures – Hadoop/NoSQL • Big Data – Historical Perspective • Big Data Approach – Crawl, Walk, Run • Framework Examples – Social – Operational BWB • Take Aways and Q&A Tweeting now at: #dataed
  • 54. Myth #6: Big Data provides all the Answers Fact: • Big Data does not mean the end of scientific theory • Be careful or you’ll end up with spurious correlations – Don’t just go fishing for correlations and hope they will explain the world • To get to the WHY of things, you need ideas, hypotheses and theories • Having more data does not substitute for thinking hard, recognizing anomalies and exploring deep truths • You need the right approach • Source: http://articles.washingtonpost.com/2013-08-16/opinions/41416362_1_big-data-data-crunching-marketing-analytics
  • 55.
  • 56. • Identify business opportunity • How can data be leveraged in exploring – External marketplace • Analyze opportunities and threats – Internal efficiencies • Analyze strengths and weaknesses
  • 57. Example: 2012 Olympic Summer Games 1. Volume: 845 million FB users averaging 15+ TB of data/day 2. Velocity: 60 GB of data per second 3. Variety: 8.5 billion devices connected 4. Variability: Sponsor data, athlete data, etc. 5. Vitality: Data Art project “Emoto” 6. Virtual: Social media
  • 58. • Based on my 6 V analysis, do I need a Big Data solution or does my current BI solution address my business opportunity? – Do the 6 Vs indicate general Big Data characteristics? – What are the limitations of my current BI environment? (Technology constraint) – What are my budgetary restrictions? (Financial constraint) – What is my current Big Data knowledge base? (Knowledge constraint)
  • 59. • MUST have both Foundational and Technical practice expertise
  • 60.
  • 61. • Data Strategy • Data Governance • Data Architecture • Data Education
  • 62. • Data Quality • Data Integration • Data Platforms • BI/Analytics
  • 63. • Needs to be actionable • Generally well understood by business • Document what has been learned
  • 64. • Perfect results are not necessary • Reiterate and refine • Iterative process to reach decision point • Use as feedback for next exploration
  • 65.
  • 66. Myth #7: You need Big Data for Insights Fact: • Distinction between Big Data and doing analytics – Big Data is defined by the technology stack that you use – Big Data is used for predictive and prescriptive analytics • Use existing data for reporting, figure out bottlenecks and optimize current business model • Understand how your data is structured, architected and stored
  • 67. A Framework for Implementing NoSQL, Hadoop Demystifying Big Data 2.0: Developing the Right Approach for Implementing Big Data Techniques • Big Data Context – We are using the wrong vocabulary to discuss this topic • More Precise Definitions – Framework – Non-von Neumann Architectures – Hadoop/NoSQL • Big Data – Historical Perspective • Big Data Approach – Crawl, Walk, Run • Framework Examples – Social – Operational BWB • Take Aways and Q&A Tweeting now at: #dataed
  • 68. Social Sentiment Analysis • One of the burgeoning areas for use of Big Data / Hadoop platforms • Allows for the landing of multiple sources of unstructured data (Twitter, Facebook, LinkedIn, etc.) • Data can then be analyzed with algorithms looking for keywords that determine positive/negative feedback
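A keyword-based scorer of the kind described on the sentiment slide could look roughly like the sketch below. The word lists, function names, and sample posts are all invented for illustration; a production system would use far larger lexicons or a trained model, and would run the scoring step across a cluster rather than in one process.

```python
import re

# Hypothetical keyword lists; real lexicons are much larger.
POSITIVE = {"love", "great", "excellent", "happy"}
NEGATIVE = {"hate", "terrible", "awful", "slow"}

def score_post(text):
    """Return (# positive keywords) minus (# negative keywords) in the post."""
    words = re.findall(r"[a-z']+", text.lower())
    positives = sum(1 for word in words if word in POSITIVE)
    negatives = sum(1 for word in words if word in NEGATIVE)
    return positives - negatives

def label(score):
    """Turn a numeric score into a coarse feedback label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

posts = [
    "I love this great phone",
    "Terrible service, I hate waiting",
    "It arrived on Tuesday",
]
for post in posts:
    print(label(score_post(post)), "-", post)
```

Keyword counting misses sarcasm, negation ("not great"), and context, which echoes the limitations listed earlier in the deck.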
  • 69. Operational Use • Utilized real-time pricing data from multiple sources to dynamically update the pricing for books in the Amazon Marketplace • Ingested data from multiple sources, looking for real-time changes in price • Applied a predictive model to determine the best price point and set the price of their books on the marketplace • Increased conversion rate, but created a race-to-the-bottom situation if not monitored
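The race-to-the-bottom risk in the pricing example can be shown with a toy repricer. This sketch does not reproduce the predictive model mentioned on the slide: it swaps in a simple "undercut the cheapest competitor by one cent" rule, and the function names, margin, and prices (in integer cents) are assumptions made for illustration. The price floor is the monitoring safeguard; without it, two such repricers would keep undercutting each other indefinitely.

```python
def reprice(competitor_cents, unit_cost_cents, min_margin_pct=10, undercut_cents=1):
    """Undercut the cheapest competitor, but never drop below cost plus margin."""
    if not competitor_cents:
        raise ValueError("need at least one competitor price")
    floor = unit_cost_cents + unit_cost_cents * min_margin_pct // 100
    target = min(competitor_cents) - undercut_cents
    return max(target, floor)

def simulate(rounds, start_a, start_b, unit_cost_cents):
    """Let two sellers using the same rule react to each other's prices."""
    price_a, price_b = start_a, start_b
    for _ in range(rounds):
        price_a = reprice([price_b], unit_cost_cents)
        price_b = reprice([price_a], unit_cost_cents)
    return price_a, price_b

print(simulate(1, 1200, 1200, 800))     # after one round of undercutting
print(simulate(1000, 1200, 1200, 800))  # prices settle at the floor
```

Both sellers end up at the floor price, so the floor caps the loss but does not prevent the margin erosion; that is why the slide stresses monitoring.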
  • 70. Healthcare Example: Patient Data • Clinical data: – Diagnosis/prognosis/treatment – Genetic data • Patient demographic data • Insurance data: – Insurance provider – Claims data • Prescriptions & pharmacy information • Physical fitness data – Activity tracking through smartphone apps & social media • Health history • Medical research data
  • 71. Retail Example: Loyalty Programs & Big Data • Companies need to understand current wants and needs AND predict future tendencies • Customer -> Repeat Customer -> Brand Advocate • Customer loyalty programs & retention strategies – Track what is being purchased and how often – Coupons based on purchasing history – Targeted communications, campaigns & special offers – Social media for additional interactions – Personalize consumer interactions • Customer purchase history influences product placements – Retailers rapidly respond to consumer demands – Product placements, planogram optimization, etc. • Source: http://www.forbes.com/sites/xerox/2013/09/27/big-data-boosts-customer-loyalty-no-really/
  • 72. References • The Human Face of Big Data, Rick Smolan & Jennifer Erwitt, First Edition (November 20, 2012) • McKinsey: Big Data: The next frontier for innovation, competition and productivity (http://www.mckinsey.com/insights/business_technology/big_data_the_next_frontier_for_innovation?p=1) • The Washington Post: Five Myths about Big Data (http://articles.washingtonpost.com/2013-08-16/opinions/41416362_1_big-data-data-crunching-marketing-analytics) • Gartner: Gartner’s 2013 Hype Cycle for Emerging Technologies Maps Out Evolving Relationship Between Humans and Machines (http://www.gartner.com/newsroom/id/2575515) • The New York Times | Opinion Pages: What Data Can’t Do (http://www.nytimes.com/2013/02/19/opinion/brooks-what-data-cant-do.html?_r=1&) • CIO.com: Five Steps for How to Better Manage Your Data (http://www.cio.com.au/article/429681/five_steps_how_better_manage_your_data/) • Business Insider: Enterprises Are Spending Wildly on ‘Big Data’ But Don’t Know If It’s Worth It Yet (http://www.businessinsider.com/enterprise-big-data-spending-2012-11#ixzz2cdT8shhe) • Inc.com: Big Data, Big Money: IT Industry to Increase Spending (http://www.inc.com/kathleen-kim/big-data-spending-to-increase-for-it-industry.html) • Forbes: Big Data Boosts Customer Loyalty. No, Really. (http://www.forbes.com/sites/xerox/2013/09/27/big-data-boosts-customer-loyalty-no-really/)
  • 73. Upcoming Events • Data Management Maturity: July 14, 2015 @ 2:00 PM ET/11:00 AM PT • Trends in Data Modeling: August 11, 2015 @ 2:00 PM ET/11:00 AM PT • Sign up here: www.datablueprint.com/webinar-schedule or www.dataversity.net
  • 74. 10124 W. Broad Street, Suite C, Glen Allen, Virginia 23060 804.521.4056
  • 75. Potential Tradeoffs • CAP theorem: consistency, availability and partition-tolerance • Small datasets can be both consistent & available • ACID: Atomicity, Consistency, Isolation, Durability • BASE: Basic Availability, Soft-state, Eventual consistency [Figure labels: Consistency; Availability; Partition (Fault) Tolerance]
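The "eventual consistency" part of BASE can be sketched with two in-memory replicas. Everything here is a simplified assumption for illustration: the Replica class, the version numbers used for last-write-wins conflict resolution, and the anti_entropy function are made up and do not correspond to any particular product's API. The point is only that a write accepted by one replica is immediately available there, can be read stale elsewhere, and converges once the replicas exchange state.

```python
class Replica:
    """One copy of the data; each key maps to a (version, value) pair."""

    def __init__(self):
        self.data = {}

    def write(self, key, value, version):
        # Last-write-wins: keep the value with the highest version number.
        current = self.data.get(key)
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]

def anti_entropy(first, second):
    """Exchange state in both directions so the replicas converge."""
    for key, (version, value) in list(first.data.items()):
        second.write(key, value, version)
    for key, (version, value) in list(second.data.items()):
        first.write(key, value, version)

east, west = Replica(), Replica()
east.write("price", 10, version=1)
anti_entropy(east, west)             # both replicas now agree on 10

east.write("price", 12, version=2)   # accepted locally: stays available
stale_read = west.read("price")      # west has not heard yet: still 10
anti_entropy(east, west)
fresh_read = west.read("price")      # after syncing: 12
print(stale_read, fresh_read)
```

An ACID system would instead block or reject the read until every copy agreed, trading availability for consistency.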
  • 76. Additional Context
  • 77. 5 Ways in which Data creates Business Value 1. Information is transparent and usable at much higher frequency 2. Expose variability and boost performance 3. Narrow segmentation of customers and more precisely tailored products or services 4. Sophisticated analytics and improved decision-making 5. Improved development of the next generation of products and services • Source: http://www.mckinsey.com/insights/business_technology/big_data_the_next_frontier_for_innovation?p=1
  • 78. Why the Big Deal about Big Data? • We are at an inflection point: The sheer volume of data generated, stored, and mined for insights has become economically relevant to businesses, government, and consumers (McKinsey) • We believe the same important principles still apply: – What problem are you trying to solve for your business? Your solution needs to fit your problem – Doing data for (big) data’s sake is not going to solve any problems – Risk of spending a lot of money on chasing Big Data that will realize little to no returns - especially at this hype cycle stage • Source: http://www.mckinsey.com/insights/business_technology/big_data_the_next_frontier_for_innovation?p=1
  • 79. Business Information Market: $1.1 Trillion a Year • Enterprises spend an average of $38 million on information/year • Small and medium sized businesses on average spend $332,000 • Source: http://www.cio.com.au/article/429681/five_steps_how_better_manage_your_data
  • 80. Take Aways-Big Data Context • Technology continues to evolve at increasing speeds • Big Data is here – We have the potential to create insights • Spend wisely & strategically: – Big Data is not going to solve all your problems. • Fact: – Big Data is not for everyone • Fact: – Lack of a clear definition • Hype Cycle: – Current: Peak of Inflated Expectations – Soon: Trough of Disillusionment
  • 81. Take Aways: Big Data Challenges Today • Fact: Big Data techniques are innovative but “Big Data” is not • Challenges are both foundational and technical, today as well as in the 1600s • Technology continues to advance rapidly (4 Vs) • Challenges associated with Big Data are not new: – Well-known foundational data management issues – Need to align data and business with rapidly changing environment – Duplication, accessibility, availability – Foundational business issues
  • 82. Take Aways-Approach: Crawl, Walk, Run • Crawl: – Identify business opportunity and determine whether you truly need a Big Data solution • Walk: – Apply a combination of foundational and technical data management practices. Document your insights and make sure they are actionable • Run: – Recycle and explore. Staying agile allows you to be exploratory.
  • 83. Take Aways-Design Principles: Foundational & Technical • Foundational data management principles still apply • Beware of SOS (Shiny Object Syndrome) • You must have a data strategy before you can have a Big Data strategy • Fact: You don’t need Big Data to gain insights • Big Data integration requirements evolve from your strategy • Fact: Bigger Data is not always better
  • 84. Take Aways: In Summary • Big data techniques are innovative but “Big Data” is not • Big Data characteristics: 6 Vs – Volume, Velocity, Variety, Variability, Vitality, Virtual • Approach: Crawl-Walk-Run • Big Data challenges require solutions that are based on foundational and technical data management practices • Beware of SOS (Shiny Object Syndrome): – Spend wisely and strategically – Big Data is not going to solve all your problems
  • 85. Foundational Practice: Data Strategy • Your data strategy must align to your organizational business strategy and operating model • As the marketplace becomes more data-driven, a data-focused business strategy is an imperative • Must have data strategy before you have a Big Data strategy
  • 86. Data Strategy Considerations • What are the questions that you cannot answer today? • Is there a direct reliance on understanding customer behavior to drive revenue? • Do you have information overload and are you trying to find the signal in the noise? • Which is more important: – Establishing value from current data assets/data reporting? – Exploring Big Data opportunities?
  • 87. Foundational Practice: Data Architecture • Common vocabulary expressing integrated requirements ensuring that data assets are stored, arranged, managed, and used in systems in support of organizational strategy [Aiken 2010] • Most organizations have data assets that are not supportive of strategies • Big question: – How can organizations more effectively use their information architectures to support strategy implementation?
  • 88. Data Architecture Considerations • Does your current architecture for BI and analytics support Big Data? • Are you getting enough value out of your current architecture? • Can you easily integrate and share information across your organization? • Do you struggle to extract the value from your data because it is too cumbersome to navigate and access? • Are you confident your data is organized to meet the needs of your business?
  • 89. Technical Practice: Data Integration • A data-centric organization requires unified data • Integrating data across organizational silos creates new insights • It is also the biggest challenge • Big Data techniques can be used to complement existing integration efforts
  • 90. Data Integration Considerations • The complexity of your data integration challenge depends on the questions you’re trying to answer • Integration requirements for Big Data are dependent on the types of questions you’re asking: – Integration here may be more fuzzy than discrete – Integration is domain-based (based on time, customer concept, geographic distribution) • Those requirements should evolve from your strategy
  • 91. Technical Practice: Data Quality • Quality is driven by fit-for-purpose considerations • Big Data quality is different: – Basic availability – Soft-state – Eventual consistency • Directional accuracy is the goal • Focus on your most important data assets and ensure your solutions address the root cause of any quality issues – so that your data is correct when it is first created • Experience has shown that organizations can never get in front of their data quality issues if they only use the ‘find-and-fix’ approach
  • 92. Data Quality Considerations • Big Data is trying to be predictive • What are the questions you are trying to answer? – What level of accuracy are you looking for? – What confidence levels? – Example: Do I need to know exactly what the customer is going to buy or do I just need to know the range of products he/she is going to choose from?
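The difference between "exactly what the customer is going to buy" and "the range of products he/she is going to choose from" corresponds to measuring top-1 accuracy versus top-k accuracy. The sketch below uses made-up predictions and purchases, and the function name is hypothetical; it only shows how the same model output can look poor under the exact standard and adequate under the range standard.

```python
def top_k_accuracy(ranked_predictions, actual_purchases, k):
    """Share of customers whose actual purchase is among the top k predictions."""
    hits = sum(
        1
        for ranked, actual in zip(ranked_predictions, actual_purchases)
        if actual in ranked[:k]
    )
    return hits / len(actual_purchases)

# Hypothetical ranked predictions for four customers, best guess first.
ranked_predictions = [
    ["tea", "coffee", "juice"],
    ["coffee", "tea", "water"],
    ["juice", "water", "tea"],
    ["water", "juice", "coffee"],
]
actual_purchases = ["tea", "tea", "water", "coffee"]

exact = top_k_accuracy(ranked_predictions, actual_purchases, k=1)
within_two = top_k_accuracy(ranked_predictions, actual_purchases, k=2)
within_three = top_k_accuracy(ranked_predictions, actual_purchases, k=3)
print(exact, within_two, within_three)
```

Deciding up front which of these numbers the business question actually needs is what determines how much data quality effort is fit for purpose.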
  • 93. Technical Practice: Data Platforms • Do you want to measure critical operational process performance? • No one data platform can answer all your questions. This is commonly misunderstood and often leads to very expensive, bloated and ineffective data platforms. • Understand the questions that need to be asked, and how to build the right data platform or optimize an existing one
  • 94. Data Platforms Considerations • Most big data stacks share common components: file storage, columnar store, querying engine, etc. • Big data stacks generally look the same until you get into appliances – Algorithms are built into the appliances themselves (e.g. Netezza, Teradata, etc.) • Ask these questions: – Do you want insights on your customer’s behavior? – Do you need real-time customer transactional information? – Do you need historical data or just access to the latest transactions? – Where do you go to find the single version of the truth about your customers?