The Data Science Lab:
Enabling Flexible,
Complex Analytics
on a Single Platform
Follow the conversation on Twitter: @Kognitio #DataSci
• Thank you for joining today’s session!
• The web briefing will start momentarily.
Slides available NOW at www.slideshare.net/kognitio
Teleconference:
Use your computer, or call:
US +1 631 267 4890
Toll-Free 1-855-299-5224
Passcode: 841 203 797
Other global Dial-in numbers available at:
https://kognitio.webex.com/kognitio/globalcallin.php
- Web Briefing -
Today’s call will use the
WebEx Q & A feature
The Data Science Lab: Enabling Flexibility
Demonstrations
Summary, Question & Answer Session
Presenters: 
‐ Dr. Sharon Kirkham, Data Scientist
‐ Michael Hiskey, Product Evangelist
July 25, 2013
Use Case Scenarios: The Data Science Lab
1. Data Accessibility
• Hadoop
• Data Mash‐Up
2. Analytical Productivity
• MPP in‐memory code execution
• R scripts with MPP
3. “Graduate” Projects to B.A.U.
• Data Science and the Business
POLL
Flexible Platform for Big Data Analytics
[Diagram: Analytical Platform Layer. Pillars: flexible data access, flexible processing, flexible deployment options. Labels: All BI Tools, All OLAP Clients, Excel, Reporting, Hadoop Clusters, Enterprise Data Warehouses, Legacy Systems, Cloud Storage, Kognitio Storage, Near-line Storage (optional)]
Mature Business Intelligence & Reporting
Numbers, tables, charts, indicators
…accessed with ease and simplicity
Historical information, latency
BI tools have plateaued
Decision Support
Advanced analytics and data science
More math…a lot more math
The Analytical Enterprise
Business Analyst
Systems Admin
Data Scientist
Sexiest job of the 21st Century?
Key: “Graduation”
• Projects will need to easily Graduate
from the Data Science Lab and
become part of Business as Usual
Data scientists are in demand:
• Telling a story with data
• Build, tune and run complex data projects
• Dealing with big data from multiple sources
• Must overcome IT bottlenecks
Source: http://www.emc.com/microsites/bigdata/infographic.htm
Scenario 1: Data Accessibility
“… this exercise is to identify if improvements in data preparation can make a significant difference to the productivity and earning capacity of our analytics team”
- Global digital marketing analytics firm
source: http://newvantage.com/wp-content/uploads/2012/12/NVP-Big-Data-Survey-Themes-Trends.pdf
POLL
Scenario 1: Data Accessibility
SQL querying on Hadoop
Summary: Data Accessibility
Kognitio Hadoop Integration
• Map/Reduce agent dynamically executes on all Hadoop nodes
• Query passes selections, relevant predicates to the agents
• Data filtering & projection locally on each node
• Data filtered as it is read from file(s)
• Only data of interest is transferred and loaded into memory via parallel load streams
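The filtering and projection steps listed above can be sketched in a few lines of Python. This is only an illustration of the general idea (push the predicate and column list to each node, move only the matching data), not Kognitio's actual agent; the sample rows, `node_agent` and `parallel_load` are hypothetical names invented for this sketch, and threads stand in for the parallel load streams.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data: each inner list stands in for the file blocks held on one Hadoop node.
NODES = [
    [
        {"id": 1, "country": "UK", "spend": 120.0, "notes": "x"},
        {"id": 2, "country": "US", "spend": 80.0, "notes": "y"},
    ],
    [{"id": 3, "country": "UK", "spend": 75.5, "notes": "z"}],
    [{"id": 4, "country": "DE", "spend": 60.0, "notes": "w"}],
]


def node_agent(rows, predicate, columns):
    """Runs 'on the node': filter rows as they are read and keep only the requested columns."""
    for row in rows:
        if predicate(row):
            yield tuple(row[c] for c in columns)


def parallel_load(nodes, predicate, columns):
    """One load stream per node; only filtered, projected rows are collected in memory."""
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        streams = pool.map(
            lambda rows: list(node_agent(rows, predicate, columns)), nodes
        )
        return [row for stream in streams for row in stream]


# The query's predicate and column list are passed down to every agent.
loaded = parallel_load(NODES, lambda r: r["country"] == "UK", ["id", "spend"])
print(loaded)
```

Rows that fail the predicate and columns that were not requested (here `notes` and `country`) never leave the node, which is the point the slide makes about transferring only the data of interest.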
Scenario 2: Analytical Productivity
“…want to see a significant improvement in the analytical throughput … from current time frame of 2 weeks … to no more than 1 day”
- A marketing science analytics company
“…we run much of our analytics on a 5% sample of the data. We want to be able to run on 100% of the data in the same time as the 5% sample.”
- A leading Ad Agency
Source: http://www.wired.com/insights/2013/07/the-new-horizon-for-bi-and-analytics/
POLL
Scenario 2: Analytical Productivity
Massively parallel in-memory code execution
MPP in-memory code execution
NoSQL external scripting function:
• SQL provides standard data access framework
– Open, adaptable framework; pass data to/from any executable or interpreter
– Fully flexible MPP execution of R, Python, Java, text parsing libraries etc.
-- Register Perl as an external interpreter that exchanges rows as CSV
create interpreter perlinterp
command '/usr/bin/perl' sends 'csv' receives 'csv' ;

-- Count the 1000 most frequent words across all customer comments
select top 1000 words, count(*)
from (external script using environment perlinterp
      receives (txt varchar(32000))
      sends (words varchar(100))
      script S'endofperl(
        while(<>)
        {
          chomp();
          s/[,.!_]//g;                 # strip basic punctuation
          foreach $c (split(/ /))
          { if($c =~ /^[a-zA-Z]+$/) { print "$c\n" } }   # one alphabetic word per output row
        }
      )endofperl'
      from (select comments from customer_enquiry))dt
group by 1
order by 2 desc;
From the Demo:
The query reads long comment text from the customer enquiry table. The inline Perl converts each long text value into an output stream of words (one word per row), and the outer query selects the top 1000 words by frequency using standard SQL aggregation.
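For readers less familiar with Perl, the same word-splitting and counting logic can be approximated in plain Python. This is a single-process sketch for illustration only: the function names `split_words` and `top_words` and the sample comments are assumptions made here, and it does not reproduce the MPP execution that the platform provides.

```python
import re
from collections import Counter

STRIP = re.compile(r"[,.!_]")   # same characters the Perl substitution removes
WORD = re.compile(r"[a-zA-Z]+")  # a token must be purely alphabetic


def split_words(comment):
    """Mirror the inline Perl: drop , . ! _ then split on single spaces, keep alphabetic tokens."""
    cleaned = STRIP.sub("", comment.rstrip("\n"))
    return [tok for tok in cleaned.split(" ") if WORD.fullmatch(tok)]


def top_words(comments, n=1000):
    """Equivalent of the outer query: count words and return the n most frequent."""
    counts = Counter()
    for comment in comments:
        counts.update(split_words(comment))
    return counts.most_common(n)


# Hypothetical sample standing in for the comments column of customer_enquiry.
comments = [
    "great service, thank you!",
    "slow delivery. great staff, slow refund_process",
]
for word, freq in top_words(comments, n=2):
    print(word, freq)
```

As in the Perl version, matching is case-sensitive and tokens containing digits or other symbols are discarded rather than cleaned.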
Scenario #3: Barriers to Deployment
Accessing Analytics across the business
An Ideal Deployment Scenario
Cloud model can provide a way to quickly model, experiment, develop and build
• Deploy to existing reporting tools
• Pass ownership to IT
• Cloud instances can be “temporary”
• Repeatable framework
[Illustration: sample monthly report table comparing 2010 and 2011 figures, with a 9-month total, not adjusted]
Business Analyst
Business User
IT Admin
Data Scientist
PRESS HERE … and really cool Big Data stuff happens!
It’s all about flexibility
Question & Answer session will be conducted electronically, using the panel to the right of your screen
Learn more, Stay connected:
Free Download
kognitio.com/GoTryIt
Request a Meeting
kognitio.com/meeting
Take the Survey
kognitio.com/DSL

FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 

Data science lab enabling flexibility

  • 1. The Data Science Lab: Enabling Flexible, Complex Analytics on a Single Platform • Follow the conversation on Twitter: @Kognitio #DataSci
  • 2. Web Briefing: The Data Science Lab: Enabling Flexible, Complex Analytics • Thank you for joining today’s session! • The web briefing will start momentarily. • Slides available NOW at www.slideshare.net/kognitio • Teleconference: Use your computer, or call: US +1 631 267 4890, Toll-Free 1-855-299-5224, Passcode: 841 203 797 • Other global dial-in numbers available at: https://kognitio.webex.com/kognitio/globalcallin.php • Today’s call will use the WebEx Q & A feature
  • 3. Web Briefing: The Data Science Lab: Enabling Flexible, Complex Analytics on a single platform • The Data Science Lab: Enabling Flexibility • Demonstrations • Summary, Question & Answer Session • Presenters: Dr. Sharon Kirkham, Data Scientist; Michael Hiskey, Product Evangelist
  • 4. Enabling Flexible, Complex Analytics on a single platform • July 25, 2013 • The Data Science Lab, Use Case Scenarios: 1. Data Accessibility (Hadoop; Data Mash-Up) 2. Analytical Productivity (MPP in-memory code execution; R scripts with MPP) 3. “Graduate” Projects to B.A.U. (Data Science and the Business) • POLL
  • 5. Flexible Platform for Big Data Analytics • Flexible data access • Flexible processing • Flexible deployment options • [Diagram labels: Near-line Storage (optional), All BI Tools, All OLAP Clients, Excel, Hadoop Clusters, Enterprise Data Warehouses, Legacy Systems, Kognitio Storage, Reporting, Cloud Storage, Analytical Platform Layer]
  • 6. Mature Business Intelligence & Reporting • Numbers, tables, charts, indicators …accessed with ease and simplicity • Historical information, latency • BI tools have plateaued • Decision Support • Advanced analytics and data science • More math…a lot more math
  • 7. The Analytical Enterprise • Business Analyst • Systems Admin • Data Scientist • Sexiest job of the 21st Century? • Key: “Graduation” • Projects will need to easily graduate from the Data Science Lab and become part of Business as Usual
  • 8. Data scientists are in demand: • Telling a story with data • Build, tune and run complex data projects • Dealing with big data from multiple sources • Must overcome IT bottlenecks • Source: http://www.emc.com/microsites/bigdata/infographic.htm
  • 9. Scenario 1: Data Accessibility • “… this exercise is to identify if improvements in data preparation can make a significant difference to the productivity and earning capacity of our analytics team” - Global digital marketing analytics firm • Source: http://newvantage.com/wp-content/uploads/2012/12/NVP-Big-Data-Survey-Themes-Trends.pdf • POLL
  • 10. SQL querying on Hadoop Scenario 1: Data Accessibility
  • 11. Summary: Data Accessibility • Kognitio Hadoop Integration: • Map/Reduce agent dynamically executes on all Hadoop nodes • Query passes selections, relevant predicates to the agents • Data filtering & projection locally on each node • Data filtered as it is read from file(s) • Only data of interest is transferred and loaded into memory via parallel load streams • [Diagram labels: Hadoop Clusters, Enterprise Data Warehouses, Legacy Systems, Kognitio Storage, Reporting, Cloud Storage]
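The filtering-and-projection idea in slide 11 can be illustrated with a small sketch. This is not Kognitio's agent or any Hadoop API: the names `node_agent` and `pushdown_query`, and the sample rows, are hypothetical, and the "nodes" are plain in-memory lists visited one after another rather than in parallel. It only shows the shape of the approach, where each node applies the predicate and keeps the requested columns before anything is sent back.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, object]  # one record as read from a file on a node


def node_agent(rows: Iterable[Row],
               predicate: Callable[[Row], bool],
               columns: List[str]) -> Iterator[Row]:
    """Hypothetical per-node agent: filter rows as they are read and
    keep only the requested columns (projection)."""
    for row in rows:
        if predicate(row):
            yield {col: row[col] for col in columns}


def pushdown_query(nodes: List[List[Row]],
                   predicate: Callable[[Row], bool],
                   columns: List[str]) -> List[Row]:
    """Collect only the data of interest from every node.
    A real cluster would run the agents in parallel; this loop is sequential."""
    loaded: List[Row] = []
    for rows in nodes:
        loaded.extend(node_agent(rows, predicate, columns))
    return loaded


# Made-up sample data standing in for files held on two nodes.
nodes = [
    [{"id": 1, "country": "UK", "spend": 120, "notes": "x"},
     {"id": 2, "country": "US", "spend": 80, "notes": "y"}],
    [{"id": 3, "country": "UK", "spend": 45, "notes": "z"}],
]

result = pushdown_query(nodes, lambda r: r["country"] == "UK", ["id", "spend"])
```

Only the two matching rows, reduced to the `id` and `spend` columns, reach the caller; the unmatched row and the unused columns never leave their node.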
  • 12. Scenario 2: Analytical Productivity • “…want to see a significant improvement in the analytical throughput … from current time frame of 2 weeks … to no more than 1 day” - A marketing science analytics company • “…we run much of our analytics on a 5% sample of the data. We want to be able to run on 100% of the data in the same time as the 5% sample.” - A leading Ad Agency • Source: http://www.wired.com/insights/2013/07/the-new-horizon-for-bi-and-analytics/ • POLL
  • 13. Massively parallel in-memory code execution Scenario 2: Analytical Productivity
  • 14. MPP in-memory code execution • NoSQL external scripting function: • SQL provides standard data access framework – Open, adaptable framework; pass data to/from any executable or interpreter – Fully flexible MPP execution of R, Python, Java, text parsing libraries etc.
      -- register Perl as an external interpreter that exchanges CSV rows
      create interpreter perlinterp
        command '/usr/bin/perl'
        sends 'csv' receives 'csv';
      -- count the words emitted by the script and keep the 1000 most frequent
      select top 1000 words, count(*)
      from (external script using environment perlinterp
              receives (txt varchar(32000))
              sends (words varchar(100))
              script S'endofperl(
                while(<>) {
                  chomp();
                  s/[,.!_]//g;                  # strip basic punctuation
                  foreach $c (split(/ /)) {
                    # emit purely alphabetic tokens, one word per row
                    if($c =~ /^[a-zA-Z]+$/) { print "$c\n" }
                  }
                }
              )endofperl'
            from (select comments from customer_enquiry)) dt
      group by 1
      order by 2 desc;
    From the Demo: This reads the long comments text from the customer enquiry table; inline Perl converts the long text into an output stream of words (one word per row); the query then selects the top 1000 words by frequency using standard SQL aggregation.
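For readers who want to try the word-splitting logic outside the database, here is a minimal single-process sketch in Python. It is an assumption-laden stand-in, not the demo itself: `words_from_comment` and `top_words` are hypothetical names, the sample comments are made up, and a `Counter` replaces the SQL `group by` / `order by` / `top 1000` step. The tokenizing rules mirror the Perl on the slide (strip `, . ! _`, split on single spaces, keep purely alphabetic tokens, case-sensitive).

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple


def words_from_comment(text: str) -> List[str]:
    """Mirror the inline Perl: drop the trailing newline, remove , . ! _,
    split on single spaces, and keep only purely alphabetic tokens."""
    cleaned = re.sub(r"[,.!_]", "", text.rstrip("\n"))
    return [tok for tok in cleaned.split(" ") if re.fullmatch(r"[a-zA-Z]+", tok)]


def top_words(comments: Iterable[str], n: int = 1000) -> List[Tuple[str, int]]:
    """Stand-in for the SQL aggregation: count words across all comments
    and return the n most frequent."""
    counts: Counter = Counter()
    for comment in comments:
        counts.update(words_from_comment(comment))
    return counts.most_common(n)


# Made-up sample comments standing in for the customer_enquiry table.
comments = [
    "great service, great price!",
    "price too high.",
    "order 42 arrived late",
]

top = top_words(comments, n=2)
```

In the slide's version the same per-row script runs in parallel across the MPP platform and the counting is left to SQL; this sketch does both steps in one Python process purely for illustration.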
  • 15. Accessing Analytics across the business Scenario #3: Barriers to Deployment
  • 16. An Ideal Deployment Scenario • Cloud model can provide a way to quickly model, experiment, develop and build • Deploy to existing reporting tools • Pass ownership to IT • Cloud instances can be “temporary” • Repeatable framework • [Figure: sample report table comparing 2011 and 2010 figures] • Business Analyst • Business User • IT Admin • Data Scientist • PRESS HERE …and really cool Big Data stuff happens!
  • 17. It’s all about flexibility • Flexible data access • Flexible processing • Flexible deployment options • [Diagram labels: Near-line Storage (optional), All BI Tools, All OLAP Clients, Excel, Hadoop Clusters, Enterprise Data Warehouses, Legacy Systems, Kognitio Storage, Reporting, Cloud Storage]
  • 18. The Data Science Lab: Enabling Flexible, Complex Analytics • Question & Answer session will be conducted electronically, using the panel to the right of your screen • Learn more, stay connected: • Free Download: kognitio.com/GoTryIt • Request a Meeting: kognitio.com/meeting • Take the Survey: kognitio.com/DSL