Data science lab enabling flexibility

To empower data scientists, you need a Data Science Lab. In this session, Dr. Sharon Kirkham, director of the Kognitio Analytics Center of Excellence and a data scientist and expert on this topic, explores the kind of environment required to imagine, create, experiment, develop and grow cutting-edge Big Data applications that add value.


    Presentation Transcript

    • The Data Science Lab: Enabling Flexible, Complex Analytics on a Single Platform. Follow the conversation on Twitter: @Kognitio #DataSci
    • Web Briefing: The Data Science Lab: Enabling Flexible, Complex Analytics
        • Thank you for joining today’s session! The web briefing will start momentarily.
        • Slides available now at www.slideshare.net/kognitio
        • Teleconference: use your computer, or call US +1 631 267 4890, Toll-Free 1-855-299-5224, Passcode: 841 203 797
        • Other global dial-in numbers available at: https://kognitio.webex.com/kognitio/globalcallin.php
        • Today’s call will use the WebEx Q & A feature
    • Web Briefing: The Data Science Lab. Enabling Flexible, Complex Analytics on a single platform
        • The Data Science Lab: Enabling Flexibility
        • Demonstrations
        • Summary, Question & Answer Session
        • Presenters: Dr. Sharon Kirkham, Data Scientist; Michael Hiskey, Product Evangelist
    • Use Case Scenarios: The Data Science Lab (July 25, 2013)
        1. Data Accessibility
            • Hadoop
            • Data Mash-Up
        2. Analytical Productivity
            • MPP in-memory code execution
            • R scripts with MPP
        3. “Graduate” Projects to B.A.U.
            • Data Science and the Business
        POLL
    • Flexible Platform for Big Data Analytics
        • Flexible data access
        • Flexible processing
        • Flexible deployment options
        • Diagram labels: Analytical Platform Layer; Hadoop Clusters; Enterprise Data Warehouses; Legacy Systems; Cloud Storage; Kognitio Storage; Near-line Storage (optional); Reporting; All BI Tools; All OLAP Clients; Excel
    • Mature Business Intelligence & Reporting
        • Numbers, tables, charts, indicators … accessed with ease and simplicity
        • Historical information, latency
        • BI tools have plateaued
      Decision Support
        • Advanced analytics and data science
        • More math … a lot more math
    • The Analytical Enterprise
        • Business Analyst, Systems Admin, Data Scientist (sexiest job of the 21st century?)
        • Key: “Graduation”. Projects will need to graduate easily from the Data Science Lab and become part of Business as Usual
    • Data scientists are in demand:
        • Telling a story with data
        • Build, tune and run complex data projects
        • Dealing with big data from multiple sources
        • Must overcome IT bottlenecks
        Source: http://www.emc.com/microsites/bigdata/infographic.htm
    • Scenario 1: Data Accessibility
        “… this exercise is to identify if improvements in data preparation can make a significant difference to the productivity and earning capacity of our analytics team” - Global digital marketing analytics firm
        Source: http://newvantage.com/wp-content/uploads/2012/12/NVP-Big-Data-Survey-Themes-Trends.pdf
        POLL
    • Scenario 1: Data Accessibility. SQL querying on Hadoop
    • Summary: Data Accessibility. Kognitio Hadoop Integration
        • Map/Reduce agent dynamically executes on all Hadoop nodes
        • Query passes selections and relevant predicates to the agents
        • Data filtering and projection happen locally on each node
        • Data is filtered as it is read from file(s)
        • Only data of interest is transferred and loaded into memory via parallel load streams
        • Diagram labels: Hadoop Clusters; Enterprise Data Warehouses; Legacy Systems; Cloud Storage; Kognitio Storage; Reporting
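The flow summarized on this slide (push the predicate and column list to each node, filter and project while reading, move only the surviving rows over parallel streams) can be illustrated with a small Python sketch. This is not Kognitio's agent or API: `NODE_FILES`, `agent_scan` and `load_into_memory` are hypothetical names, the "nodes" are in-memory lists, and a thread pool stands in for the parallel load streams.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for files held on three Hadoop nodes (assumption:
# real agents would read HDFS files, not Python lists).
NODE_FILES = {
    "node1": [
        {"id": 1, "region": "EU", "spend": 120.0, "notes": "long free text"},
        {"id": 2, "region": "US", "spend": 80.0, "notes": "long free text"},
    ],
    "node2": [
        {"id": 3, "region": "EU", "spend": 75.5, "notes": "long free text"},
    ],
    "node3": [
        {"id": 4, "region": "APAC", "spend": 42.0, "notes": "long free text"},
    ],
}


def agent_scan(rows, predicate, columns):
    """Filter and project rows as they are read, on the node that holds them."""
    for row in rows:
        if predicate(row):
            # Projection: keep only the requested columns.
            yield tuple(row[c] for c in columns)


def load_into_memory(node_files, predicate, columns):
    """Run one scan per node in parallel and gather only the rows of interest."""
    with ThreadPoolExecutor(max_workers=len(node_files)) as pool:
        streams = pool.map(
            lambda rows: list(agent_scan(rows, predicate, columns)),
            node_files.values(),
        )
        return [row for stream in streams for row in stream]


# Only EU rows, and only the id and spend columns, leave the "nodes".
eu_spend = load_into_memory(NODE_FILES, lambda r: r["region"] == "EU", ["id", "spend"])
```

The point of the design is that the wide `notes` column and the non-EU rows are discarded where the data lives, so they never occupy the network or the in-memory platform.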
    • Scenario 2: Analytical Productivity
        “… want to see a significant improvement in the analytical throughput … from current time frame of 2 weeks … to no more than 1 day” - A marketing science analytics company
        “… we run much of our analytics on a 5% sample of the data. We want to be able to run on 100% of the data in the same time as the 5% sample.” - A leading ad agency
        Source: http://www.wired.com/insights/2013/07/the-new-horizon-for-bi-and-analytics/
        POLL
    • Scenario 2: Analytical Productivity. Massively parallel in-memory code execution
    • MPP in-memory code execution
        NoSQL external scripting function:
        • SQL provides standard data access framework
            – Open, adaptable framework; pass data to/from any executable or interpreter
            – Fully flexible MPP execution of R, Python, Java, text parsing libraries etc.

        create interpreter perlinterp
            command '/usr/bin/perl'
            sends 'csv'
            receives 'csv';

        select top 1000 words, count(*)
        from (external script using environment perlinterp
                receives (txt varchar(32000))
                sends (words varchar(100))
                script S'endofperl(
                    while (<>) {
                        chomp();
                        s/[,.!_]//g;
                        foreach $c (split(/ /)) {
                            if ($c =~ /^[a-zA-Z]+$/) { print "$c\n" }
                        }
                    }
                )endofperl'
              from (select comments from customer_enquiry)) dt
        group by 1
        order by 2 desc;

        From the demo: this reads long comment text from the customer_enquiry table; the in-line Perl converts each long text into an output stream of words (one word per row); the query then selects the top 1000 words by frequency using standard SQL aggregation.
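Since the slide lists Python among the languages that can run as an external script, the logic of the demo's Perl script can also be sketched in Python. This is an illustrative equivalent only, not code from the demo: `words_from_line`, `run_filter` and `top_words` are hypothetical names, and how such a script would be registered as an interpreter is not shown here. `top_words` mimics the surrounding SQL aggregation (group by word, order by count descending) so the sketch can be tried outside the database.

```python
import re
from collections import Counter

PUNCTUATION = re.compile(r"[,.!_]")   # same characters the Perl script strips
WORD = re.compile(r"[a-zA-Z]+")       # a token must be letters only


def words_from_line(line):
    """Split one comment into purely alphabetic words, like the Perl loop."""
    cleaned = PUNCTUATION.sub("", line.rstrip("\n"))
    return [tok for tok in cleaned.split(" ") if WORD.fullmatch(tok)]


def run_filter(instream, outstream):
    """Stream filter: one comment per input line, one word per output line."""
    for line in instream:
        for word in words_from_line(line):
            outstream.write(word + "\n")


def top_words(comments, n):
    """Stand-in for the SQL step: count words and return the n most frequent."""
    counts = Counter()
    for comment in comments:
        counts.update(words_from_line(comment))
    return counts.most_common(n)
```

Calling `run_filter(sys.stdin, sys.stdout)` (after `import sys`) would make the module behave as a stdin-to-stdout filter in the same way the Perl script does; the counting is left to SQL in the demo, which is why `top_words` is separate.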
    • Scenario 3: Barriers to Deployment. Accessing analytics across the business
    • An Ideal Deployment Scenario
        Cloud model can provide a way to quickly model, experiment, develop and build
        • Deploy to existing reporting tools
        • Pass ownership to IT
        • Cloud instances can be “temporary”
        • Repeatable framework
        • Diagram labels: Business Analyst; Business User; IT Admin; Data Scientist; sample monthly report table (2011 vs. 2010, not adjusted, 9-month total)
        PRESS HERE … and really cool Big Data stuff happens!
    • It’s all about flexibility
        • Flexible data access
        • Flexible processing
        • Flexible deployment options
        • Diagram labels: Hadoop Clusters; Enterprise Data Warehouses; Legacy Systems; Cloud Storage; Kognitio Storage; Near-line Storage (optional); Reporting; All BI Tools; All OLAP Clients; Excel
    • The Data Science Lab: Enabling Flexible, Complex Analytics
        The Question & Answer session will be conducted electronically, using the panel to the right of your screen.
        Learn more, stay connected:
        • Free Download: kognitio.com/GoTryIt
        • Request a Meeting: kognitio.com/meeting
        • Take the Survey: kognitio.com/DSL