Grab some coffee and enjoy 
the pre-show banter before 
the top of the hour!
Step By Step – A Process for Building Analytical Insights 
The Briefing Room
Twitter Tag: #briefr 
The Briefing Room 
Welcome 
Host: 
Eric Kavanagh 
eric.kavanagh@bloorgroup.com 
@eric_kavanagh
Mission
• Reveal the essential characteristics of enterprise software, good and bad
• Provide a forum for detailed analysis of today’s innovative technologies
• Give vendors a chance to explain their product to savvy analysts
• Allow audience members to pose serious questions... and get answers!
Topics 
This Month: BIG DATA 
March: CLOUD 
April: BIG DATA 
2014 Editorial Calendar at 
www.insideanalysis.com/webcasts/the-briefing-room
Big Data 
“In God we trust. All others must bring data.”
~W. Edwards Deming, Statistician
Analyst: Kirk Borne 
Kirk Borne is a Transdisciplinary Data Scientist and an 
Astrophysicist. He is Professor of Astrophysics and 
Computational Science at George Mason University. 
He has been at Mason since 2003, where he does 
research, teaches, and advises students in the Data 
Science program. Previously, he spent nearly 20 years 
in positions supporting NASA projects, including an 
assignment as NASA's Data Archive Project Scientist 
for the Hubble Space Telescope, and as Project 
Manager in NASA's Space Science Data Operations 
Office. He has extensive experience in big data and 
data science and he is on the editorial boards of 
several scientific research journals and is an officer in 
several national and international professional 
societies devoted to data science, data mining, and 
statistics. He has published over 200 articles 
(research papers, conference papers, and book 
chapters), and given over 200 invited talks at 
conferences and universities worldwide. 
@KirkDBorne 
http://kirkborne.net
Actian 
• Actian is a database and software development company
• The Actian Analytics Platform connects to data and Big Data sources to perform actionable and advanced analytics
• The platform comprises Actian DataFlow (formerly Pervasive DataRush), Actian Matrix (formerly ParAccel) and Actian Vector
Guest: John Santaferraro 
John Santaferraro is the Vice President of Product 
Marketing at Actian. Prior to joining Actian, Santaferraro 
was an independent industry analyst in the business 
intelligence and analytics market. Before that he 
developed and executed a vertical market strategy for 
Hewlett Packard's BI group, focusing on energy, 
communications, retail, healthcare and financial services; 
he was also instrumental in helping establish HP’s new BI 
business group with a combination of solutions, products 
and consulting. In 2000, John founded a marketing and 
sales consulting company, Ferraro Consulting, providing 
business acceleration strategy for technology companies.
Supporting the Data Scientist
Accelerating Big Data 2.0
John Santaferraro – VP of Solutions and Product Marketing
Confidential © 2014 Actian Corporation
Only the Privileged Excel in Big Data Analytics
(Figure labels: Data, Value)
The “Moneyball” Effect
• Analytics Go Mainstream
  ■ Major League Baseball
  ■ Hire the best team
  ■ NSA and Big Data
  ■ ???????????????
  ■ Target and Pregnancy
  ■ Predicting pregnancies
What is a Data Scientist?
A data scientist “…incorporates varying elements and builds on techniques and theories from many fields, including mathematics, statistics, data engineering, pattern recognition and learning, advanced computing, visualization, uncertainty modeling, data warehousing, and high performance computing with the goal of extracting meaning from data and creating data products.”
(Figure: created by Calvin Andrus, depicts a mash-up of disciplines from which Data Science is derived, 13 July 2012. Source: http://en.wikipedia.org/wiki/Data_science)
Data Science Challenges
Less than 20% of data scientists have the data and compute power they need to do their jobs
The average data scientist spends 70% of their time finding data, manipulating data, and waiting for queries to run
More than 60% of all data scientists working with Hadoop are still trying to create a business case
What is a Business Scientist?
“A business scientist is an expert in the science of business, sitting between the business analyst and the data scientist, pulling together cross-functional expertise from data science, analytics, business applications, business processes, and business strategy.”
Business Science Skillset
Understand How Analytics Work
Understand Emerging Data Types
Understand Business Operations & Strategy
Learn Quickly
Think Outside the Box
Tell Compelling Stories
The Tools of the Business Scientist
• Libraries of Analytic Functions Run at Extreme Speed
  ■ Transformational Analytics
  ■ Statistical Analytics
  ■ Machine Learning Analytics
  ■ Clustering Analytics
  ■ Discovery Analytics
• Visual Framework for Data Discovery, Preparation and Analytics
  ■ Drag and Drop Interaction
  ■ Libraries of Data Preparation Operators
  ■ Libraries of Analytic Operators
  ■ High-Performance, Parallel Processing on Hadoop (or other file systems)
The Actian Analytics Platform: Accelerating Big Data 2.0™
(Diagram labels: Actian Analytics Platform™ with Connect, Analyze, Act. Actian Analytics Accelerators: Accelerate Hadoop, Accelerate Analytics, Accelerate Business Intelligence. Data sources: Enterprise Applications, Data Warehouse, Social, Internet of Things, SaaS, Mobile, WWW, Machine Data, Traditional, NoSQL. Attributes: Extreme Scale, Extreme Performance, Extreme Agility. Outcomes: Customer Delight, Competitive Advantage, World-Class Risk Management, Disruptive New Business Models. Axes: Data, Value.)
Actian Analytics Platform: The High Performance Exoskeleton for Hadoop
(Diagram labels: Actian Analytics Platform™. Connect to Any Data Source: Enterprise Data, Machine Data, Social Data, SaaS Data, Data Warehouse, Hadoop, Amazon Redshift. Manage Data Flows and Deliver Data Services. Select From Libraries of Analytics. Exchange Data and Workloads. Move Into a High Performance Analytic Engine for Low Latency. Deliver Analytic Services: Users, Machines, Business Processes, Applications.)
Actian Analytics Platform: The High Performance Exoskeleton for Hadoop
(Diagram labels: Actian Analytics Platform™, Actian Analytics™, Actian DataFlow™, Actian Matrix™, Actian Vector™, Actian DataConnect™, Hadoop, Amazon Redshift. On Demand Integration: Enterprise Data, Machine Data, Social Data, SaaS Data, Data Warehouse. On Demand Analytic Services: Users, Machines, Business Processes, Applications.)
• Visual, drag and drop interface for all data management on Hadoop
• High performance data management and analytics natively on HDFS
• SQL access to Hadoop data for low latency analytics
• High speed data transfer across relational and non-relational
Actian Analytics Platform – High Performance Data Management and Analytics Natively on HDFS
• Visual Framework: Manage the entire analytic process in a visual framework with no coding required.
• Choose from five sets of operators: Connections, Transformation, Data Quality, Analytics, Data Science
• Reuse and share all components from operators to workflows
• Automatically detect resources, plan optimal utilization, and parallelize all workloads on Hadoop
• Use dual pipeline parallelism to accelerate performance 30X
• Run fully optimized processing directly on the Hadoop node via YARN
• Take processing to where the data lives, runs natively on any Hadoop distribution
(Diagram labels: Actian Analytics Platform, Optimize, Query Pipelining, CPU Pipelining, Optimized On-HDFS Processing, Hadoop – Leader Node)
Actian Analytics Platform – High Performance, Low Latency Analytics on Hadoop Data
(Diagram labels: LEADER NODE with On-Demand Analytics, On-Demand Integration, Orchestration, Analytic Libraries, Optimizer; a grid of Hadoop nodes)
• 700+ in-database, analytic functions
• Massively Parallel, Columnar, Compressed, Compiled, Connected
• Node-to-node, bi-directional sharing of analytics & processes with Hadoop nodes
• Serve up high-performance analytic processing for any app
• Connect to any data source at the point of the query
• Manage data flows across the entire analytic process
• 5 LEVELS OF OPTIMIZATION: SQL, Planning, Execution, Communications, Memory
Actian Analytics Platform: High Speed Interaction Between Relational and Non-Relational
(Diagram labels: Actian Analytics Platform™, Actian Analytics™, Actian DataFlow™, Actian DataConnect™, Hadoop, Actian Matrix™, Node-to-Node Connection, Amazon Redshift. On Demand Integration: Enterprise Data, Machine Data, Social Data, SaaS Data, Data Warehouse. On Demand Analytic Services: Users, Machines, Business Processes, Applications.)
Actian Analytics Platform: Deep Integration for High Performance SQL Analytics
(Diagram labels: Actian Analytics Platform™, Actian Analytics™, Actian DataFlow™, Actian DataConnect™, Hadoop, Actian Matrix™, HCatalog, Hive, Amazon Redshift. On Demand Integration: Enterprise Data, Machine Data, Social Data, SaaS Data, Data Warehouse. On Demand Analytic Services: Users, Machines, Business Processes, Applications.)
Actian Analytics Platform: Deep Integration for High Performance SQL Analytics
(Diagram labels: Actian Analytics Platform™, Actian Analytics™, Actian DataFlow™, Actian DataConnect™, Hadoop, Actian Matrix™, Amazon Redshift; SQL, Python, Java. On Demand Integration: Enterprise Data, Machine Data, Social Data, SaaS Data, Data Warehouse. On Demand Analytic Services: Users, Machines, Business Processes, Applications.)
Actian Analytics Platform: The High Performance Exoskeleton for Hadoop
(Diagram labels: Actian Analytics Platform™. Connect to Any Data Source: Enterprise Data, Machine Data, Social Data, SaaS Data, Data Warehouse, Hadoop, Amazon Redshift. Manage Data Flows and Deliver Data Services. Select From Libraries of Analytics. Exchange Data and Workloads. Move Into a High Performance Analytic Engine for Low Latency. Deliver Analytic Services: Users, Machines, Business Processes, Applications.)
A Traditional Approach to Churn Analysis
(Diagram stages: CONNECT, ANALYZE, ACT)
Data sources: CRM (Account Info and Demographics)
Operators: GROUP, DERIVE FIELDS, LOGISTIC REGRESSION
Results: CUSTOMER CHURN PREDICTION; PRE-SET VISUALIZATIONS; SMALL INCREASE IN EXISTING CUST REVENUE
Constraints: LIMITED HISTORICAL DATA PULLS; LONG MODEL TURNS; LIMITED MODEL INPUTS; MINIMUM ACCURACY
An Enriched Approach to Churn Analysis
(Diagram stages: CONNECT, ANALYZE, ACT)
Data dimensions: Account Info and Demographics; Mobile Device Mgmt; Customer and Network Call Quality; Geospatial Dimensions
Data sources: CRM, SFDC, Device Data, CDR Logs
Operators: FILE PARSER, Event Filter, JOIN, AGGREGATE, GROUP, DERIVE FIELDS, GEOSPATIAL NETWORK ANALYSIS, LOGISTIC REGRESSION
Results: CUSTOMER CHURN PREDICTION WITH TARGETED CUSTOMER CONTACT LIST; FAST NETWORK ISSUE ALERTS; DECREASED PROVIDER FEES; SIGNIFICANT INCREASE IN EXISTING CUST REVENUE; IMPROVED INDUSTRY CUST SATISFACTION SCORES
An Expanding Approach to Churn Analysis
(Diagram stages: CONNECT, ANALYZE, ACT)
Data dimensions: Account Info and Demographics; Mobile Device Mgmt; Competitive Offerings; Customer and Network Call Quality; Geospatial Dimensions
Data sources: CRM, SFDC, Device Data, CDR Logs, MARKET DATA
Operators: FILE PARSER, Event Filter, JOIN, AGGREGATE, GROUP, DERIVE FIELDS, GEOSPATIAL NETWORK ANALYSIS, LOGISTIC REGRESSION
Results: CUSTOMER CHURN PREDICTION WITH TARGETED CUSTOMER CONTACT LIST; FAST NETWORK ISSUE ALERTS; DECREASED PROVIDER FEES; SIGNIFICANT INCREASE IN EXISTING CUST REVENUE; IMPROVED CUSTOMER SATISFACTION SCORES
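The operator chain named on the churn slides (group, aggregate, join, derive fields, logistic regression, contact list) can be sketched in plain Python. This is only an illustrative sketch under stated assumptions: the records, field names, learning rate, and threshold are hypothetical toy values, and the gradient-descent fit stands in for whatever the Actian platform actually runs, which the slides do not describe.

```python
import math
from collections import defaultdict

# Hypothetical toy inputs; field names are illustrative, not a real schema.
crm = [
    {"customer_id": 1, "tenure_months": 30, "churned": 0},
    {"customer_id": 2, "tenure_months": 4, "churned": 1},
    {"customer_id": 3, "tenure_months": 18, "churned": 0},
    {"customer_id": 4, "tenure_months": 6, "churned": 1},
    {"customer_id": 5, "tenure_months": 24, "churned": 0},
    {"customer_id": 6, "tenure_months": 3, "churned": 1},
]
cdr_logs = [  # one row per call: (customer_id, dropped flag)
    (1, 0), (1, 0), (1, 0), (1, 1),
    (2, 1), (2, 1), (2, 0),
    (3, 0), (3, 0),
    (4, 1), (4, 0), (4, 1), (4, 1),
    (5, 0), (5, 0), (5, 0),
    (6, 1), (6, 1),
]

def aggregate_calls(logs):
    """GROUP + AGGREGATE: dropped-call rate per customer."""
    totals = defaultdict(lambda: [0, 0])  # customer_id -> [calls, dropped]
    for customer_id, dropped in logs:
        totals[customer_id][0] += 1
        totals[customer_id][1] += dropped
    return {cid: d / n for cid, (n, d) in totals.items()}

def build_rows(crm_rows, drop_rate):
    """JOIN + DERIVE FIELDS: one feature vector per customer."""
    rows = []
    for rec in crm_rows:
        features = [
            1.0,                                      # intercept term
            rec["tenure_months"] / 12.0,              # derived: tenure in years
            drop_rate.get(rec["customer_id"], 0.0),   # joined from call logs
        ]
        rows.append((rec["customer_id"], features, rec["churned"]))
    return rows

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(rows, lr=0.5, epochs=2000):
    """LOGISTIC REGRESSION fitted by batch gradient descent."""
    w = [0.0] * len(rows[0][1])
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for _, x, y in rows:
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x))) - y
            for j, xj in enumerate(x):
                grad[j] += err * xj
        w = [wi - lr * g / len(rows) for wi, g in zip(w, grad)]
    return w

def contact_list(rows, w, threshold=0.5):
    """ACT: customers whose predicted churn probability exceeds the threshold."""
    return [cid for cid, x, _ in rows
            if sigmoid(sum(wi * xi for wi, xi in zip(w, x))) > threshold]

rows = build_rows(crm, aggregate_calls(cdr_logs))
weights = fit_logistic(rows)
print(contact_list(rows, weights))
```

Adding a data source in this sketch means adding one more aggregated feature to `build_rows`, which mirrors how the enriched and expanding slides widen the model inputs without changing the modeling step.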
Questions
(Diagram labels: Actian Analytics Platform™, Actian Analytics™, Actian DataFlow™, Actian DataConnect™, Hadoop, Actian Matrix™, Amazon Redshift; SQL, Python, Java. On Demand Integration: Enterprise Data, Machine Data, Social Data, SaaS Data, Data Warehouse. On Demand Analytic Services: Users, Machines, Business Processes, Applications.)
Thank You
www.actian.com
facebook.com/actiancorp
@actiancorp
Disclaimer
This document is for informational purposes only and is subject to change at any time without notice. The information in this document is proprietary to Actian and no part of this document may be reproduced, copied, or transmitted in any form or for any purpose without the express prior written permission of Actian. This document is not intended to be binding upon Actian to any particular course of business, pricing, product strategy, and/or development. Actian assumes no responsibility for errors or omissions in this document. Actian shall have no liability for damages of any kind including without limitation direct, special, indirect, or consequential damages that may result from the use of these materials. Actian does not warrant the accuracy or completeness of the information, text, graphics, links, or other items contained within this material. This document is provided without a warranty of any kind, either express or implied, including but not limited to the implied warranties of merchantability, fitness for a particular purpose, or non-infringement.
Perceptions & Questions 
Analyst: 
Kirk Borne
Data Science for 
Everything 
Kirk Borne 
@KirkDBorne 
School of Physics, Astronomy, & Computational Sciences 
College of Science, George Mason University, Fairfax, VA
Let us start with a Big Data Quiz … 
Complete this sentence: Big Data is … 
a) the new oil. 
b) the new black. 
c) the new bacon. 
d) sexy. 
e) everything, quantified and tracked! 
f) All of the above
Definitions of Big Data 
From Wikipedia: 
• Big Data refers to any collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications.
My suggestion: 
• Big Data refers to 
“Everything, Quantified 
and Tracked!” 
• Examples: 
– Smart Cities 
– Retail Analytics 
– Personalized Healthcare (myDNA) 
– Cybersecurity 
– National Security 
– Big Data Science Projects 
– Social Networks 
– IoT = Internet of Things 
– M2M = Machine-to-Machine 
– … everything!
Rationale for Big Data Science 
• If we collect a thorough set of parameters (high-dimensional 
data) for a complete set of items 
within our domain of study, then we would have 
a “perfect” statistical model for that domain. 
• In other words, Big Data becomes the model for 
a domain X = we call this X-informatics. 
• Anything we want to know about that domain is 
specified and encoded within the data. 
• The goal of Big Data Science is to find those 
encodings, patterns, and knowledge nuggets. 
• See article by IBM’s James Kobielus: “Big-Data Vision? 
Whole-population analytics” at http://bit.ly/QB0uYi
Characterizing and Exposing 
the Big Data Hype: 3 V’s or ? 
n If the only distinguishing characteristic was that we have lots 
of data, we would call it “Lots of Data” (or a Tonnabytes!) 
n Big Data characteristics: the 3+n V’s = 
1. Volume (lots of data = “Tonnabytes”) 
2. Variety (complexity, curse of dimensionality, many formats) 
3. Velocity (high rate of data and information flow, real-time, incoming!) 
4. Veracity (necessary & sufficient data to test many hypotheses) 
5. Value 
6. Variability 
7. Venue 
8. Vocabulary
The Data Scientist toolkit 
n It is a collection of mathematical, computational, scientific, 
and domain-specific methods, tools, and algorithms to be 
applied to Big Data for discovery, decision support, and 
data-to-knowledge transformation:… 
n Statistics 
n Data Mining (Machine Learning) & Analytics (KDD) 
n Data & Information Visualization 
n Semantics (Natural Language Processing, Ontologies) 
n Data-intensive Computing (e.g., Hadoop, Cloud, …) 
n Modeling & Simulation 
n Metadata for Indexing, Search, & Retrieval 
n Advanced Data Management & Data Structures 
n Domain-Specific Data Analysis Tools 
40
The 6 Commandments of Data Science 
(Based on “The 5 Fundamental Concepts of Data Science” : 
http://www.statisticsviews.com/details/feature/5459931/Five-Fundamental-Concepts-of-Data-Science.html) 
1. Begin with the end in mind (= goal-based, data-driven 
decision making, “knowledge discovery by design”) 
2. Data Science is Science (= hypothesis testing, and all that) 
3. Know thy data (= data profiling, unsupervised exploration) 
4. Love thy data (= including ugly data: skewed distributions, 
outliers, long & fat tails) 
5. Overfitting is a sin (= “models should be as simple as possible, 
but no simpler” ~ A.Einstein) 
6. Honor thy data’s first mile and last mile 
(a) The First Mile is the hardest. 
(ubiquitous heterogeneous data) 
(b) The Last Mile is the hardest. 
(actionable intelligence) 
http://www.datagovernance.com/cartoon_17.html
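Commandments 3 and 4 (data profiling, and paying attention to skewed distributions, outliers, and long tails) can be illustrated with a small profiling helper. This is a minimal sketch using only Python's standard `statistics` module; the `profile` function, the sample values, and the 1.5 × IQR outlier rule are assumptions chosen for illustration, not something taken from the slides.

```python
import statistics

def profile(values):
    """Summarize one numeric column: center, spread, skew, and IQR outliers."""
    n = len(values)
    mean = statistics.fmean(values)
    median = statistics.median(values)
    stdev = statistics.pstdev(values)
    # Population skewness: the third standardized moment.
    skew = sum(((v - mean) / stdev) ** 3 for v in values) / n if stdev else 0.0
    # Quartiles; values beyond 1.5 * IQR from the quartiles are flagged.
    q1, _, q3 = statistics.quantiles(values, n=4)
    fence = 1.5 * (q3 - q1)
    outliers = [v for v in values if v < q1 - fence or v > q3 + fence]
    return {"n": n, "mean": mean, "median": median, "stdev": stdev,
            "skew": skew, "outliers": outliers}

# Hypothetical long-tailed sample (for example, session lengths in minutes).
sample = [2, 3, 3, 4, 4, 5, 5, 6, 7, 60]
report = profile(sample)
print(report)
```

A mean well above the median together with positive skew is the kind of signal that suggests inspecting the tail rather than discarding it, in keeping with “love thy data”.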
Questions to Actian Corporation: 
1. Most things in the world that are labeled “2.0” typically enable some sort of social 
experience or social networking characteristic. How is ‘Big Data 2.0’ like that, 
and how is it different? 
2. You talk about Unconstrained Analytics. That sounds like “Data Science Unleashed” 
– is that a reasonable analogy? How so? 
3. How important are visual cues and visual analytics in Actian’s Big Data 2.0 design 
and implementation? And how have you incorporated them? 
4. I/O bottlenecks (for data access and movement) are typically the most severe 
technological constraints in Big Data. How does Actian manage the big constraints 
imposed by big data inertia? 
5. Data Science is truly science insofar as it involves hypothesis generation, 
experimental design, testing, analysis, and hypothesis refinement – what are some 
of the unique ways that Actian empowers and enables a data scientist to perform 
different steps in this process? 
6. One solution to the Big Data and Data Scientist talent gap is to put powerful tools 
into schools and into the hands of students, and/or to provide financial incentives to 
students (e.g., scholarships). Is Actian planning any university programs like that? 
7. Some say that Big Data 3.0 will be based on the semantics, context, and meaning 
of data – does Actian have goals or a vision in this direction? 
8. What do you see as the next evolutionary step in Big Data Science?
Upcoming Topics
This Month: BIG DATA
March: CLOUD
April: BIG DATA
www.insideanalysis.com/webcasts/the-briefing-room
2014 Editorial Calendar at www.insideanalysis.com
THANK YOU for your ATTENTION!

 

Recently uploaded (20)

Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 

Step by Step – A Process for Building Analytical Insights

  • 1. Grab some coffee and enjoy the pre-show banter before the top of the hour!
  • 2. Step By Step – A Process for Building Analytical Insights The Briefing Room
  • 3. Twitter Tag: #briefr The Briefing Room Welcome Host: Eric Kavanagh eric.kavanagh@bloorgroup.com @eric_kavanagh
  • 4. The Briefing Room Mission: Reveal the essential characteristics of enterprise software, good and bad. Provide a forum for detailed analysis of today’s innovative technologies. Give vendors a chance to explain their product to savvy analysts. Allow audience members to pose serious questions... and get answers!
  • 5. The Briefing Room Topics: This Month: BIG DATA; March: CLOUD; April: BIG DATA. 2014 Editorial Calendar at www.insideanalysis.com/webcasts/the-briefing-room
  • 6. Big Data: “In God we trust. All others must bring data.” ~W. Edwards Deming, Statistician
  • 7. Analyst: Kirk Borne. Kirk Borne is a Transdisciplinary Data Scientist and an Astrophysicist. He is Professor of Astrophysics and Computational Science at George Mason University. He has been at Mason since 2003, where he does research, teaches, and advises students in the Data Science program. Previously, he spent nearly 20 years in positions supporting NASA projects, including an assignment as NASA's Data Archive Project Scientist for the Hubble Space Telescope, and as Project Manager in NASA's Space Science Data Operations Office. He has extensive experience in big data and data science and he is on the editorial boards of several scientific research journals and is an officer in several national and international professional societies devoted to data science, data mining, and statistics. He has published over 200 articles (research papers, conference papers, and book chapters), and given over 200 invited talks at conferences and universities worldwide. @KirkDBorne http://kirkborne.net
  • 8. Actian: Actian is a database and software development company. The Actian Analytics Platform connects to data and Big Data sources to perform actionable and advanced analytics. The platform is comprised of Actian DataFlow (formerly Pervasive DataRush), Actian Matrix (formerly ParAccel) and Actian Vector.
  • 9. Guest: John Santaferraro. John Santaferraro is the Vice President of Product Marketing at Actian. Prior to joining Actian, Santaferraro was an independent industry analyst in the business intelligence and analytics market. Before that he developed and executed a vertical market strategy for Hewlett Packard's BI group, focusing on energy, communications, retail, healthcare and financial services; he was also instrumental in helping establish HP’s new BI business group with a combination of solutions, products and consulting. In 2000, John founded a marketing and sales consulting company, Ferraro Consulting, providing business acceleration strategy for technology companies.
  • 10. Supporting the Data Scientist: Accelerating Big Data 2.0. John Santaferraro – VP of Solutions and Product Marketing. Confidential © 2014 Actian Corporation
  • 11. Only the Privileged Excel in Big Data Analytics. Data Value
  • 12. The “Moneyball” Effect: Analytics Go Mainstream. Major League Baseball: hire the best team. NSA and Big Data: ???????????????. Target and Pregnancy: predicting pregnancies.
  • 13. What is a Data Scientist?
  • 14. What is a Data Scientist? A data scientist “…incorporates varying elements and builds on techniques and theories from many fields, including mathematics, statistics, data engineering, pattern recognition and learning, advanced computing, visualization, uncertainty modeling, data warehousing, and high performance computing with the goal of extracting meaning from data and creating data products.” Created by Calvin Andrus, depicts a mash-up of disciplines from which Data Science is derived, 13 July 2012. http://en.wikipedia.org/wiki/Data_science
  • 15. Data Science Challenges: Less than 20% of data scientists have the data and compute power they need to do their jobs. The average data scientist spends 70% of their time finding data, manipulating data, and waiting for queries to run. More than 60% of all data scientists working with Hadoop are still trying to create a business case.
  • 16. What is a Business Scientist? “A business scientist is an expert in the science of business, sitting between the business analyst and the data scientist, pulling together cross-functional expertise from data science, analytics, business applications, business processes, and business strategy.”
  • 17. Business Science Skillset: Understand How Analytics Work; Understand Emerging Data Types; Understand Business Operations & Strategy; Learn Quickly; Think Outside the Box; Tell Compelling Stories
  • 18. The Tools of the Business Scientist. Libraries of Analytic Functions Run at Extreme Speed: Transformational Analytics; Statistical Analytics; Machine Learning Analytics; Clustering Analytics; Discovery Analytics. Visual Framework for Data Discovery, Preparation and Analytics: Drag and Drop Interaction; Libraries of Data Preparation Operators; Libraries of Analytic Operators; High-Performance, Parallel Processing on Hadoop (or other file systems)
  • 19. The Actian Analytics Platform: Accelerating Big Data 2.0™. Diagram labels: Data Value; WWW; Machine Data; Customer Delight; Competitive Advantage; Extreme Agility; Extreme Scale; Extreme Performance; Connect; Analyze; Act; Actian Analytics Platform™; Actian Analytics Accelerators; Accelerate Hadoop; Accelerate Analytics; Accelerate Business Intelligence; Enterprise Applications; Data Warehouse; Social; Internet of Things; SaaS; Mobile; World-Class Risk Management; Disruptive New Business Models; Traditional; NoSQL
  • 20. Actian Analytics Platform: The High Performance Exoskeleton for Hadoop. Diagram labels: Amazon Redshift; Actian Analytics Platform™; Select From Libraries of Analytics; Manage Data Flows and Deliver Data Services; Hadoop; Exchange Data and Workloads; Move Into a High Performance Analytic Engine for Low Latency; Connect to Any Data Source; Enterprise Data; Machine Data; Social Data; SaaS Data; Users; Machines; Business Processes; Applications; Data Warehouse; Deliver Analytic Services
  • 21. Actian Analytics Platform: The High Performance Exoskeleton for Hadoop. Diagram labels: Actian Analytics Platform™; Actian Analytics™; Actian DataFlow™; Actian Matrix™; Actian Vector™; Actian DataConnect™; Hadoop; Amazon Redshift; On Demand Integration; On Demand Analytic Services; Enterprise Data; Machine Data; Social Data; SaaS Data; Users; Machines; Business Processes; Applications; Data Warehouse. Callouts: Visual, drag and drop interface for all data management on Hadoop; High performance data management and analytics natively on HDFS; SQL access to Hadoop data for low latency analytics; High speed data transfer across relational and non-relational
  • 22. Actian Analytics Platform – High Performance Data Management and Analytics Natively on HDFS. Take processing to where the data lives, runs natively on any Hadoop distribution. Diagram labels: Actian Analytics Platform; Query Pipelining; CPU Pipelining; Optimized, On-HDFS Processing; Hadoop – Leader Node; Optimize; Visual Framework. Callouts: Reuse and share all components from operators to workflows. Choose from five sets of operators: Connections, Transformation, Data Quality, Analytics, Data Science. Automatically detect resources, plan optimal utilization, and parallelize all workloads on Hadoop. Use dual pipeline parallelism to accelerate performance 30X. Run fully optimized processing directly on the Hadoop node via YARN. Manage the entire analytic process in a visual framework with no coding required.
  • 23. Actian Analytics Platform – High Performance, Low Latency Analytics on Hadoop Data. Diagram labels: On-Demand Analytics; On-Demand Integration; Orchestration; Analytic Libraries; Optimizer; LEADER NODE; Massively Parallel; Columnar; Compressed; Compiled; Connected. Callouts: 700+ in-database, analytic functions. Node-to-node, bi-directional sharing of analytics & processes with Hadoop nodes. Serve up high-performance analytic processing for any app. Connect to any data source at the point of the query. Manage data flows across the entire analytic process. 5 LEVELS OF OPTIMIZATION: SQL, Planning, Execution, Communications, Memory.
  • 24. Actian Analytics Platform: High Speed Interaction Between Relational and Non-Relational. Diagram labels: Amazon Redshift; Actian Analytics Platform™; Actian Analytics™; Hadoop; Actian Matrix™; Node-to-Node Connection; Actian DataConnect™; Actian DataFlow™; On Demand Integration; On Demand Analytic Services; Enterprise Data; Machine Data; Social Data; SaaS Data; Users; Machines; Business Processes; Applications; Data Warehouse
  • 25. Actian Analytics Platform: Deep Integration for High Performance SQL Analytics. Diagram labels: Amazon Redshift; Actian Analytics Platform™; Actian Analytics™; Hadoop; HCatalog; Hive; Actian Matrix™; Actian DataConnect™; Actian DataFlow™; On Demand Integration; On Demand Analytic Services; Enterprise Data; Machine Data; Social Data; SaaS Data; Users; Machines; Business Processes; Applications; Data Warehouse
  • 26. Actian Analytics Platform: Deep Integration for High Performance SQL Analytics. Diagram labels: Amazon Redshift; Actian Analytics Platform™; Actian Analytics™; Hadoop; SQL, Python, Java; Actian Matrix™; Actian DataConnect™; Actian DataFlow™; On Demand Integration; On Demand Analytic Services; Enterprise Data; Machine Data; Social Data; SaaS Data; Users; Machines; Business Processes; Applications; Data Warehouse
  • 27. Actian Analytics Platform: The High Performance Exoskeleton for Hadoop. Diagram labels: Amazon Redshift; Actian Analytics Platform™; Select From Libraries of Analytics; Manage Data Flows and Deliver Data Services; Hadoop; Exchange Data and Workloads; Move Into a High Performance Analytic Engine for Low Latency; Connect to Any Data Source; Enterprise Data; Machine Data; Social Data; SaaS Data; Users; Machines; Business Processes; Applications; Data Warehouse; Deliver Analytic Services
  • 28. A Traditional Approach to Churn Analysis. Diagram stages: CONNECT, ANALYZE, ACT. Diagram labels: Account Info and Demographics; CRM; LIMITED HISTORICAL DATA PULLS; GROUP; DERIVE FIELDS; LOGISTIC REGRESSION; LONG MODEL TURNS; LIMITED MODEL INPUTS; MINIMUM ACCURACY; CUSTOMER CHURN PREDICTION; PRE-SET VISUALIZATIONS; SMALL INCREASE IN EXISTING CUST REVENUE
  • 29. An Enriched Approach to Churn Analysis. Diagram stages: CONNECT, ANALYZE, ACT. Diagram labels: Account Info and Demographics; Mobile Device Mgmt; Customer and Network Call Quality; Geospatial Dimensions; CRM; CDR Logs (×2); Device Data; SFDC; FILE PARSER (×3); Event Filter; JOIN (×2); AGGREGATE; GROUP (×2); DERIVE FIELDS (×2); GEOSPATIAL NETWORK ANALYSIS; LOGISTIC REGRESSION; DECREASED PROVIDER FEES; FAST NETWORK ISSUE ALERTS; CUSTOMER CHURN PREDICTION WITH TARGETED CUSTOMER CONTACT LIST; SIGNIFICANT INCREASE IN EXISTING CUST REVENUE; IMPROVED INDUSTRY CUST SATISFACTION SCORES
  • 30. An Expanding Approach to Churn Analysis. Diagram stages: CONNECT, ANALYZE, ACT. Diagram labels: Account Info and Demographics; Mobile Device Mgmt; Competitive Offerings; Customer and Network Call Quality; Geospatial Dimensions; CRM; CDR Logs (×2); Device Data; SFDC; MARKET DATA; FILE PARSER (×4); Event Filter; JOIN (×2); AGGREGATE; GROUP (×2); DERIVE FIELDS (×2); GEOSPATIAL NETWORK ANALYSIS; LOGISTIC REGRESSION; DECREASED PROVIDER FEES; FAST NETWORK ISSUE ALERTS; CUSTOMER CHURN PREDICTION WITH TARGETED CUSTOMER CONTACT LIST; SIGNIFICANT INCREASE IN EXISTING CUST REVENUE; IMPROVED CUSTOMER SATISFACTION SCORES
  • 31. Questions. Diagram labels: Amazon Redshift; Actian Analytics Platform™; Actian Analytics™; Hadoop; SQL, Python, Java; Actian Matrix™; Actian DataConnect™; Actian DataFlow™; On Demand Integration; On Demand Analytic Services; Enterprise Data; Machine Data; Social Data; SaaS Data; Users; Machines; Business Processes; Applications; Data Warehouse
  • 32. Thank You. www.actian.com facebook.com/actiancorp @actiancorp
  • 33. Disclaimer: This document is for informational purposes only and is subject to change at any time without notice. The information in this document is proprietary to Actian and no part of this document may be reproduced, copied, or transmitted in any form or for any purpose without the express prior written permission of Actian. This document is not intended to be binding upon Actian to any particular course of business, pricing, product strategy, and/or development. Actian assumes no responsibility for errors or omissions in this document. Actian shall have no liability for damages of any kind including without limitation direct, special, indirect, or consequential damages that may result from the use of these materials. Actian does not warrant the accuracy or completeness of the information, text, graphics, links, or other items contained within this material. This document is provided without a warranty of any kind, either express or implied, including but not limited to the implied warranties of merchantability, fitness for a particular purpose, or non-infringement.
  • 34. Perceptions & Questions. Analyst: Kirk Borne
  • 35. Data Science for Everything Kirk Borne @KirkDBorne School of Physics, Astronomy, & Computational Sciences College of Science, George Mason University, Fairfax, VA
  • 36. Let us start with a Big Data Quiz … Complete this sentence: Big Data is … a) the new oil. b) the new black. c) the new bacon. d) sexy. e) everything, quantified and tracked! f) All of the above
  • 37. Definitions of Big Data. From Wikipedia: Big Data refers to any collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications. My suggestion: Big Data refers to “Everything, Quantified and Tracked!” Examples: Smart Cities; Retail Analytics; Personalized Healthcare (myDNA); Cybersecurity; National Security; Big Data Science Projects; Social Networks; IoT = Internet of Things; M2M = Machine-to-Machine; … everything!
  • 38. Rationale for Big Data Science • If we collect a thorough set of parameters (high-dimensional data) for a complete set of items within our domain of study, then we would have a “perfect” statistical model for that domain. • In other words, Big Data becomes the model for a domain X = we call this X-informatics. • Anything we want to know about that domain is specified and encoded within the data. • The goal of Big Data Science is to find those encodings, patterns, and knowledge nuggets. • See article by IBM’s James Kobielus: “Big-Data Vision? Whole-population analytics” at http://bit.ly/QB0uYi
  • 39. Characterizing and Exposing the Big Data Hype: 3 V’s or ? If the only distinguishing characteristic was that we have lots of data, we would call it “Lots of Data” (or “Tonnabytes”!). Big Data characteristics: the 3+n V’s = 1. Volume (lots of data = “Tonnabytes”) 2. Variety (complexity, curse of dimensionality, many formats) 3. Velocity (high rate of data and information flow, real-time, incoming!) 4. Veracity (necessary & sufficient data to test many hypotheses) 5. Value 6. Variability 7. Venue 8. Vocabulary
  • 40. The Data Scientist toolkit: It is a collection of mathematical, computational, scientific, and domain-specific methods, tools, and algorithms to be applied to Big Data for discovery, decision support, and data-to-knowledge transformation: Statistics; Data Mining (Machine Learning) & Analytics (KDD); Data & Information Visualization; Semantics (Natural Language Processing, Ontologies); Data-intensive Computing (e.g., Hadoop, Cloud, …); Modeling & Simulation; Metadata for Indexing, Search, & Retrieval; Advanced Data Management & Data Structures; Domain-Specific Data Analysis Tools
  • 41. The 6 Commandments of Data Science (Based on “The 5 Fundamental Concepts of Data Science” : http://www.statisticsviews.com/details/feature/5459931/Five-Fundamental-Concepts-of-Data-Science.html) 1. Begin with the end in mind (= goal-based, data-driven decision making, “knowledge discovery by design”) 2. Data Science is Science (= hypothesis testing, and all that) 3. Know thy data (= data profiling, unsupervised exploration) 4. Love thy data (= including ugly data: skewed distributions, outliers, long & fat tails) 5. Overfitting is a sin (= “models should be as simple as possible, but no simpler” ~ A.Einstein) 6. Honor thy data’s first mile and last mile (a) The First Mile is the hardest. (ubiquitous heterogeneous data) (b) The Last Mile is the hardest. (actionable intelligence) http://www.datagovernance.com/cartoon_17.html
  • 42. Questions to Actian Corporation: 1. Most things in the world that are labeled “2.0” typically enable some sort of social experience or social networking characteristic. How is ‘Big Data 2.0’ like that, and how is it different? 2. You talk about Unconstrained Analytics. That sounds like “Data Science Unleashed” – is that a reasonable analogy? How so? 3. How important are visual cues and visual analytics in Actian’s Big Data 2.0 design and implementation? And how have you incorporated them? 4. I/O bottlenecks (for data access and movement) are typically the most severe technological constraints in Big Data. How does Actian manage the big constraints imposed by big data inertia? 5. Data Science is truly science insofar as it involves hypothesis generation, experimental design, testing, analysis, and hypothesis refinement – what are some of the unique ways that Actian empowers and enables a data scientist to perform different steps in this process? 6. One solution to the Big Data and Data Scientist talent gap is to put powerful tools into schools and into the hands of students, and/or to provide financial incentives to students (e.g., scholarships). Is Actian planning any university programs like that? 7. Some say that Big Data 3.0 will be based on the semantics, context, and meaning of data – does Actian have goals or a vision in this direction? 8. What do you see as the next evolutionary step in Big Data Science?
  • 43. Twitter Tag: #briefr The Briefing Room
  • 44. Upcoming Topics: This Month: BIG DATA; March: CLOUD; April: BIG DATA. www.insideanalysis.com/webcasts/the-briefing-room. 2014 Editorial Calendar at www.insideanalysis.com
  • 45. THANK YOU for your ATTENTION!
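
The churn-analysis slides (28 to 30) name a flow of JOIN, GROUP, DERIVE FIELDS and LOGISTIC REGRESSION, ending in a targeted customer contact list. The sketch below is one minimal way to illustrate that flow in plain Python. It is not Actian's implementation: the toy records, the two derived fields (dropped-call rate and scaled tenure), and every function name are hypothetical, and the model is an ordinary gradient-descent logistic regression.

```python
import math
from collections import defaultdict

# Hypothetical toy inputs (not from the slides): CRM accounts and CDR logs.
accounts = [
    {"customer_id": 1, "tenure_months": 48, "churned": 0},
    {"customer_id": 2, "tenure_months": 36, "churned": 0},
    {"customer_id": 3, "tenure_months": 60, "churned": 0},
    {"customer_id": 4, "tenure_months": 6, "churned": 1},
    {"customer_id": 5, "tenure_months": 12, "churned": 1},
    {"customer_id": 6, "tenure_months": 3, "churned": 1},
]
# Each CDR entry is (customer_id, dropped) with dropped = 1 for a dropped call.
cdr_logs = [
    (1, 0), (1, 0), (1, 0), (1, 0),
    (2, 1), (2, 0), (2, 0), (2, 0),
    (3, 0), (3, 0),
    (4, 1), (4, 1), (4, 1), (4, 0),
    (5, 1), (5, 1), (5, 0), (5, 0),
    (6, 1), (6, 1),
]


def derive_features(accounts, cdr_logs):
    """GROUP the CDR logs per customer, JOIN to CRM, and DERIVE FIELDS."""
    calls = defaultdict(lambda: [0, 0])  # customer_id -> [dropped, total]
    for customer_id, dropped in cdr_logs:
        calls[customer_id][0] += dropped
        calls[customer_id][1] += 1
    rows = []
    for acc in accounts:
        dropped, total = calls[acc["customer_id"]]
        drop_rate = dropped / total if total else 0.0
        tenure = min(acc["tenure_months"], 60) / 60.0  # scaled to [0, 1]
        rows.append((acc["customer_id"], [drop_rate, tenure], acc["churned"]))
    return rows


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def churn_probability(model, x):
    weights, bias = model
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


def fit_logistic(rows, lr=0.5, epochs=2000):
    """Fit logistic regression by batch gradient descent on the mean log-loss."""
    n_features = len(rows[0][1])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for _, x, y in rows:
            err = churn_probability((weights, bias), x) - y
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(rows)
    return weights, bias


def contact_list(rows, model, threshold=0.5):
    """Customers whose predicted churn probability exceeds the threshold."""
    return [cid for cid, x, _ in rows if churn_probability(model, x) > threshold]


rows = derive_features(accounts, cdr_logs)
model = fit_logistic(rows)
targets = contact_list(rows, model)
```

On this toy data, `targets` holds the customers the model flags for contact. Adding further sources, as slides 29 and 30 do with device and market data, would amount to deriving more fields per customer before the fit.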