The Briefing Room with Dr. Kirk Borne and Actian
Live Webcast February 18, 2014
Watch the archive: https://bloorgroup.webex.com/bloorgroup/lsr.php?RCID=5be6631268cccc1e605b12b31b58ee08
Change is everywhere in the world of analytics these days, and not just due to Big Data. The maturation of parallel processing is transforming how data can be loaded, prepared and processed, which means the window of possibility has widened dramatically in terms of what can be done. This is good news for just about everyone, especially the dedicated business analyst, who can now accomplish in hours what used to take days or weeks.
Register for this episode of The Briefing Room to hear Data Science luminary Dr. Kirk Borne of George Mason University, as he describes the changing landscape of analytics. He'll be briefed by John Santaferraro of Actian, who will tout his company's analytical platform. While Santaferraro gives his talk, he'll be accompanied by a data analyst who will walk through a demonstration of the various steps for building analytical solutions.
Visit InsideAnalysis.com for more information.
Step by Step – A Process for Building Analytical Insights
1. Grab some coffee and enjoy
the pre-show banter before
the top of the hour!
2. Step By Step – A Process for Building Analytical Insights
The Briefing Room
3. Welcome
Twitter Tag: #briefr
Host:
Eric Kavanagh
eric.kavanagh@bloorgroup.com
@eric_kavanagh
4. Mission
• Reveal the essential characteristics of enterprise software, good and bad
• Provide a forum for detailed analysis of today’s innovative technologies
• Give vendors a chance to explain their product to savvy analysts
• Allow audience members to pose serious questions... and get answers!
5. Topics
This Month: BIG DATA
March: CLOUD
April: BIG DATA
2014 Editorial Calendar at
www.insideanalysis.com/webcasts/the-briefing-room
6. Big Data
“In God we trust.
All others must bring
data.”
~W. Edwards Deming, Statistician
7. Analyst: Kirk Borne
Kirk Borne is a Transdisciplinary Data Scientist and an
Astrophysicist. He is Professor of Astrophysics and
Computational Science at George Mason University.
He has been at Mason since 2003, where he does
research, teaches, and advises students in the Data
Science program. Previously, he spent nearly 20 years
in positions supporting NASA projects, including an
assignment as NASA's Data Archive Project Scientist
for the Hubble Space Telescope, and as Project
Manager in NASA's Space Science Data Operations
Office. He has extensive experience in big data and
data science and he is on the editorial boards of
several scientific research journals and is an officer in
several national and international professional
societies devoted to data science, data mining, and
statistics. He has published over 200 articles
(research papers, conference papers, and book
chapters), and given over 200 invited talks at
conferences and universities worldwide.
@KirkDBorne
http://kirkborne.net
8. Actian
• Actian is a database and software development company
• The Actian Analytics Platform connects to data and Big Data
sources to perform actionable and advanced analytics
• The platform is comprised of Actian DataFlow (formerly
Pervasive DataRush), Actian Matrix (formerly ParAccel) and
Actian Vector
9. Guest: John Santaferraro
John Santaferraro is the Vice President of Product
Marketing at Actian. Prior to joining Actian, Santaferraro
was an independent industry analyst in the business
intelligence and analytics market. Before that he
developed and executed a vertical market strategy for
Hewlett Packard's BI group, focusing on energy,
communications, retail, healthcare and financial services;
he was also instrumental in helping establish HP’s new BI
business group with a combination of solutions, products
and consulting. In 2000, John founded a marketing and
sales consulting company, Ferraro Consulting, providing
business acceleration strategy for technology companies.
35. Data Science for
Everything
Kirk Borne
@KirkDBorne
School of Physics, Astronomy, & Computational Sciences
College of Science, George Mason University, Fairfax, VA
36. Let us start with a Big Data Quiz …
Complete this sentence: Big Data is …
a) the new oil.
b) the new black.
c) the new bacon.
d) sexy.
e) everything, quantified and tracked!
f) All of the above
37. Definitions of Big Data
From Wikipedia:
• Big Data refers to any
collection of data sets so large
and complex that it becomes
difficult to process using on-hand
database management
tools or traditional data
processing applications.
My suggestion:
• Big Data refers to
“Everything, Quantified
and Tracked!”
• Examples:
– Smart Cities
– Retail Analytics
– Personalized Healthcare (myDNA)
– Cybersecurity
– National Security
– Big Data Science Projects
– Social Networks
– IoT = Internet of Things
– M2M = Machine-to-Machine
– … everything!
38. Rationale for Big Data Science
• If we collect a thorough set of parameters (high-dimensional
data) for a complete set of items
within our domain of study, then we would have
a “perfect” statistical model for that domain.
• In other words, Big Data becomes the model for
a domain X; we call this X-informatics.
• Anything we want to know about that domain is
specified and encoded within the data.
• The goal of Big Data Science is to find those
encodings, patterns, and knowledge nuggets.
• See article by IBM’s James Kobielus: “Big-Data Vision?
Whole-population analytics” at http://bit.ly/QB0uYi
39. Characterizing and Exposing
the Big Data Hype: 3 V’s or ?
• If the only distinguishing characteristic was that we have lots
of data, we would call it “Lots of Data” (or “Tonnabytes”!)
• Big Data characteristics: the 3+n V’s =
1. Volume (lots of data = “Tonnabytes”)
2. Variety (complexity, curse of dimensionality, many formats)
3. Velocity (high rate of data and information flow, real-time, incoming!)
4. Veracity (necessary & sufficient data to test many hypotheses)
5. Value
6. Variability
7. Venue
8. Vocabulary
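The “curse of dimensionality” named under Variety can be illustrated with a small experiment. The sketch below is not from the talk; it assumes uniformly random points in a unit hypercube, and the function name relative_contrast and its parameters are hypothetical. It measures how much farther the farthest point is than the nearest one, relative to the nearest distance. As the number of dimensions grows, that contrast tends to shrink, which is one reason nearest-neighbor style methods get harder on high-dimensional data.

```python
import math
import random


def relative_contrast(dim, n_points=200, seed=0):
    """Return (max_dist - min_dist) / min_dist from one random query point
    to n_points random points in the unit hypercube of dimension `dim`.

    Illustrative assumption: coordinates are independent and uniform on [0, 1).
    """
    rng = random.Random(seed)  # fixed seed so repeated runs give the same numbers
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))  # Euclidean distance
    nearest, farthest = min(dists), max(dists)
    return (farthest - nearest) / nearest


if __name__ == "__main__":
    for dim in (2, 10, 100, 500):
        print(f"dim={dim:4d}  relative contrast={relative_contrast(dim):.3f}")
```

Running the script prints one line per dimension; the values depend on the seed, but the contrast in low dimensions should be far larger than in high dimensions.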
40. The Data Scientist toolkit
• It is a collection of mathematical, computational, scientific,
and domain-specific methods, tools, and algorithms to be
applied to Big Data for discovery, decision support, and
data-to-knowledge transformation:
• Statistics
• Data Mining (Machine Learning) & Analytics (KDD)
• Data & Information Visualization
• Semantics (Natural Language Processing, Ontologies)
• Data-intensive Computing (e.g., Hadoop, Cloud, …)
• Modeling & Simulation
• Metadata for Indexing, Search, & Retrieval
• Advanced Data Management & Data Structures
• Domain-Specific Data Analysis Tools
41. The 6 Commandments of Data Science
(Based on “The 5 Fundamental Concepts of Data Science”:
http://www.statisticsviews.com/details/feature/5459931/Five-Fundamental-Concepts-of-Data-Science.html)
1. Begin with the end in mind (= goal-based, data-driven
decision making, “knowledge discovery by design”)
2. Data Science is Science (= hypothesis testing, and all that)
3. Know thy data (= data profiling, unsupervised exploration)
4. Love thy data (= including ugly data: skewed distributions,
outliers, long & fat tails)
5. Overfitting is a sin (= “models should be as simple as possible,
but no simpler” ~ A.Einstein)
6. Honor thy data’s first mile and last mile
(a) The First Mile is the hardest.
(ubiquitous heterogeneous data)
(b) The Last Mile is the hardest.
(actionable intelligence)
http://www.datagovernance.com/cartoon_17.html
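Commandments 3 and 4 (“Know thy data” and “Love thy data”) can be made concrete with a minimal profiling sketch. This is an illustration under stated assumptions, not a method from the talk: the function name profile, the sample values, and the choice of the common 1.5 × IQR rule for flagging outliers are all hypothetical. It uses only Python's standard statistics module.

```python
import statistics


def profile(values):
    """Summarize one numeric column: center, spread, skewness, and outliers.

    Assumption: outliers are flagged with the common 1.5 * IQR rule, which is
    one convention among several.
    """
    n = len(values)
    mean = statistics.fmean(values)
    median = statistics.median(values)
    stdev = statistics.pstdev(values)  # population standard deviation
    # Population skewness: the third standardized moment.
    skew = sum((v - mean) ** 3 for v in values) / (n * stdev ** 3) if stdev else 0.0
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [v for v in values if v < low or v > high]
    return {
        "count": n,
        "mean": mean,
        "median": median,
        "stdev": stdev,
        "skewness": skew,
        "outliers": outliers,
    }


if __name__ == "__main__":
    # Hypothetical sample with one extreme value in the right tail.
    sample = [12, 14, 15, 15, 16, 17, 18, 19, 21, 250]
    for key, value in profile(sample).items():
        print(f"{key:>9}: {value}")
```

A mean well above the median together with positive skewness signals a long right tail. Whether the flagged value is an error or the most interesting point in the data is a judgment for the analyst, in the spirit of Commandment 4.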
42. Questions to Actian Corporation:
1. Most things in the world that are labeled “2.0” typically enable some sort of social
experience or social networking characteristic. How is ‘Big Data 2.0’ like that,
and how is it different?
2. You talk about Unconstrained Analytics. That sounds like “Data Science Unleashed”
– is that a reasonable analogy? How so?
3. How important are visual cues and visual analytics in Actian’s Big Data 2.0 design
and implementation? And how have you incorporated them?
4. I/O bottlenecks (for data access and movement) are typically the most severe
technological constraints in Big Data. How does Actian manage the big constraints
imposed by big data inertia?
5. Data Science is truly science insofar as it involves hypothesis generation,
experimental design, testing, analysis, and hypothesis refinement – what are some
of the unique ways that Actian empowers and enables a data scientist to perform
different steps in this process?
6. One solution to the Big Data and Data Scientist talent gap is to put powerful tools
into schools and into the hands of students, and/or to provide financial incentives to
students (e.g., scholarships). Is Actian planning any university programs like that?
7. Some say that Big Data 3.0 will be based on the semantics, context, and meaning
of data – does Actian have goals or a vision in this direction?
8. What do you see as the next evolutionary step in Big Data Science?
44. Upcoming Topics
This Month: BIG DATA
March: CLOUD
April: BIG DATA
www.insideanalysis.com/webcasts/the-briefing-room
2014 Editorial Calendar at
www.insideanalysis.com