
KNIME Meetup 2016-04-16



Sponsored by Data Transformed, the KNIME Meetup was a big success. Please find below the slides from Dan's, Tom's, Anand's and Chhitesh's presentations.
Registration & Networking
Keynote – Dan Cox, CEO of Data Transformed
KNIME & Harvest Analytics – Tom Park
Office of State Revenue Case Study – Anand Antony
Using Spark with KNIME – Chhitesh Shrestha
Networking & Drinks



  1. Creating Insights at the Speed of Business. W. Daniel Cox III, CPA, CMA, CFM, Chief Executive Officer
  2. WELCOME to the Meetup Group
  3. Energise Organisational Advantage through Awareness and Insight. Registration & Networking; Keynote – Dan Cox, CEO of Data Transformed; KNIME & Harvest Analytics – Tom Park; Office of State Revenue Case Study – Anand Antony; Using Spark with KNIME – Chhitesh Shrestha; Networking & Drinks
  4. Journey to Best-in-Class Analytics. We help our clients along this path (value over time): Static – report and drill-down (Laggards); Reactive – monitor and alert (Followers); Proactive – discover and predict (Performers); Dynamic – analytics-enabled business processes (Innovators)
  5. YOUR DATA. CLEARLY. Source your data; prepare your data (Data Preparation); plan with data (Budget/Planning); visualise all data (Visualisation); realise data value
  6. Data Transformed Skill Sets: Data Preparation 50%, Visualisation 30%, Budget Planning 20%. BUDGET PLANNING: Budgeting, Forecasting, Planning, Demand Planning, Workforce Management, Accounting, Financing, Cashflow, Sales Forecasting, Modelling, Campaign Forecasting. DATA PREPARATION: Data Governance, Data Quality, Master Data Management, Data Warehousing, Data Science, ETL Applications, Data Analytics, SQL Language, Python Language, Scripting, Database Management, Application Development, Database Development, Textual ETL, Text Analytics, Hadoop Ecosphere, Analytical Databases, Relational Databases, Microsoft Analysis Services, OLAP, OLTP, Multi-Dimensional Databases, Data Vault Architectures, Star-Schema Architectures, Data Marting. VISUALISATION: Dashboarding, Reporting, Charting, Location Analytics, Statistical Analytics, Data Analytics, Business Analysis, Story Telling, Semantic Layer, Presentation Layer, Collaboration
  7. Chart of Enterprise Readiness (Immature to Industrial Strength) against Performance (Slow to Fast), with regions labelled Good Enough, Production Ready, Traditional Operational and Open Source. Actian Vortex – Fast, Industrialized, Open: superior Big Data SQL with industrialized strength
  8. Do YOU Have a BIG DATA Role?
  9. Global Data Snapshot: 7,254,549,796 total world population; 3,035,749,340 internet users; 2,078,680,860 active social network users; 6,572,950,124 mobile subscribers
  10. Traditional systems under pressure. Challenges: constrains data to app; can’t manage new data; costly to scale. Traditional data (ERP, CRM, SCM) versus new data (clickstream, geolocation, web data, Internet of Things, docs and emails, server logs). Data volume figures: 2.8 zettabytes (2012), 12 zettabytes, 44 zettabytes (2020). Business value: laggards versus industry leaders
  11. Volume (exponential growth), Variety (new data types), Velocity (time to value). The digital floodgates have opened… and will never be turned off again
  12. Big Data equals Big Opportunity: data source & type; untouched value; new possibilities; universal access; time to value. 88% OF BIG DATA; $15 TRILLION; 1% OF COMPANIES
  13. Trends for BIG DATA: In the Cloud
  14. Trends for BIG DATA: Personal ETL
  15. Trends for BIG DATA: NoSQL
  16. Trends for BIG DATA: Hadoop
  17. Trends for BIG DATA: Data Lake
  18. Trends for BIG DATA: Ecosystem
  19. Trends for BIG DATA: Internet of Things
  20. Big Data Trends: 1. Big Data in the Cloud; 2. Personal ETL; 3. NoSQL; 4. Hadoop; 5. Data Lakes; 6. Big Data Ecosystem; 7. Internet of Things
  21. BIG DATA is STILL just Data: it needs to be translated into Answers
  22. Acquire, Grow & Retain Customers: Who are your best customers, and how can you keep them satisfied? Where can you find more customers like them? Big data holds the insights into who your customers are and what motivates them.
  23. Optimise Operations & Reduce Fraud: Are your operational processes and systems as efficient as they could be? Could you reduce waste and fraud if you had real-time visibility into your business? Adopting a big data and analytics strategy can help you plan, manage and maximise operations, supply chains and the use of infrastructure assets.
  24. Transform Financial Processes: Do you have real-time access to reliable information about all aspects of your business? Do you have the visibility, insight and control over financial performance to better measure, monitor and shape business outcomes? Analysing all of your data, including big data, can drive enterprise agility and provide insights to help you make better decisions.
  25. Manage Risk: How can you mitigate the financial and operational risks that could devastate your organisation? How can you manage regulatory change and reduce the risk of non-compliance? Proactively identifying, understanding and managing financial and operational risk can enable more risk-aware, confident decision making.
  26. Create New Business Models: Are your competitors making bigger strides than you in changing your industry or creating new markets? Does your organisation’s culture support innovative thinking and exploration? Explore strategic options for business growth, using new perspectives gained from exploiting big data and analytics.
  27. Improve IT Economics: Is your existing IT infrastructure able to provide the insights that decision makers need? Are you doing enough to protect your data centre and data from potential criminal activity or fraud? Lead the creation of new value and agility for your business by optimising big data and analytics for faster insight at a lower cost.
  28. Analytics Trends: 1. Data Governance; 2. Social Intelligence; 3. Analytics Organisation-Wide; 4. Community Collaboration; 5. Integration of Everything; 6. Cloud Analytics; 7. Conversational Data; 8. Journalism Data; 9. Mature Mobility; 10. Smart Analytics
  29. Areas BIG DATA is Helping: 1. Operations & Optimising; 2. Product Development; 3. Customer Experience; 4. Understanding and Targeting Customers
  30. Performance Examples: Actian is helping these companies achieve leadership. Digital Marketing: hyper-segmentation every hour; Banking: enterprise risk every 2 minutes; Retail: enterprise market basket analysis every minute; Defense: network intrusion models every second; Fraud: adjustments every nanosecond. Amazon Redshift – Actian Matrix: cloud-based, petabyte-scale data warehouse
  31. The Value of Business Intelligence: organisations competing with analytics substantially OUTPERFORM their peers, by 220%
  32. Data Transformed
  33. Actian Vector example: identical 150-million-transaction query, comparison between Actian Vector and Oracle DBMS
  34. Harvest Analytics: Tom Park
  35. Overview: KNIME & Big Data, Tom Park
  36. Gartner 2016 Magic Quadrant for Advanced Analytics Platforms. Leaders (5): SAS, IBM, KNIME, RapidMiner, Dell. Challengers (2): SAP, Angoss. Visionaries (4): Microsoft, Alteryx, Alpine Data Labs, Predixion. Niche Players (5): FICO, Lavastorm, Megaputer, Prognoz, Accenture
  37. Changes from 2015 to 2016: Salford and TIBCO were dropped for not satisfying the visual composition requirement
  38. Main Big Data Technologies: NoSQL
  39. Big Data Architecture
  40. KNIME Big Data Extensions
  41. Future Trends
  42. Missing Ingredient to Success?
  43.
  44. Office of State Revenue: Anand Antony
  45. KNIME @ OSR. Anand Antony, Senior Data Analyst, Operations Analytics and Intelligence, Office of State Revenue. Ph. 0414491765
  46. OSR: Who are we? As NSW’s principal revenue agency, OSR administers state taxation and revenue for, and on behalf of, the people of NSW: payroll tax; land tax; duties; grants such as First Home Benefits.
  47. Data Analytics Team: Who are we? Operations Analytics & Intelligence is the analytics wing of the Operations Division in OSR, with three teams: Business Intelligence, Data Analytics and Data Team. The Data Analytics team consists of 10 analysts and supports tax auditors by detecting possibly non-compliant clients, matching data from various sources (60+ data sources) and analysing it.
  48. Data Analytics Scenario – Past. Data matching, preparation and analysis: SPSS Clementine, SAS Enterprise Guide. Data mining: Salford Systems. Reporting/dashboards: Excel. Fuzzy data matching: SSA Name (Informatica).
  49. Data Analytics Scenario – Current. Data matching, preparation and analysis: KNIME (around 70% transitioned from Clementine/SAS). Data mining: Salford Systems (will be evaluating KNIME). Reporting/dashboards: Excel. Fuzzy data matching: SSA Name (Informatica).
  50. Future: Unified Analytic & Data Management Platform (diagram). Internal & external data sources; governance (data governance, data quality, data matching, metadata management); data lake on the MapR Hadoop distribution (Vortex/MapR); advanced data analytics (Actian/KNIME); machine learning (H2O/Spark, Actian/KNIME); presentation layer (datamart, on-the-fly/sandpit); visualisation (Spotfire/Tableau/graph DBs)
  51. Why KNIME? Start with canvas programming, and enrich it with code via coding snippets (mostly the Java Snippet at the moment). Fast and easy learning curve for data scientists. Can tackle almost any analytic task.
  52. KNIME: having the best of both worlds! Canvas programming and coding.
  53. What do we use KNIME for? Pretty much everything (except reporting and data mining)! Data reading (text files, databases, non-standard formats); data merging (potentially fuzzy matching too in future); data manipulation; creating new variables; data output; modelling (possibly in future).
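Slides 48 and 53 mention fuzzy data matching (currently done with SSA Name, possibly in KNIME in future). The sketch below is not SSA Name or any KNIME node. It swaps in Python's standard `difflib.SequenceMatcher` similarity ratio to pair each name with its closest counterpart above a threshold. The function names and the 0.85 threshold are illustrative assumptions.

```python
from difflib import SequenceMatcher


def normalise(name: str) -> str:
    # Lower-case and collapse whitespace so trivial differences don't count.
    return " ".join(name.lower().split())


def fuzzy_match(left, right, threshold=0.85):
    """Pair each name in `left` with its most similar name in `right`.

    Returns (left_name, best_right_name, score) tuples for pairs whose
    similarity ratio is at least `threshold`; unmatched names are omitted.
    """
    matches = []
    for l in left:
        best, best_score = None, 0.0
        for r in right:
            score = SequenceMatcher(None, normalise(l), normalise(r)).ratio()
            if score > best_score:
                best, best_score = r, score
        if best is not None and best_score >= threshold:
            matches.append((l, best, round(best_score, 3)))
    return matches


if __name__ == "__main__":
    print(fuzzy_match(["ACME Pty Ltd"], ["Acme  Pty Ltd", "Zenith Holdings"]))
```

Comparing every pair is quadratic, which is fine for a sketch. Production matchers such as SSA Name use name-specific keys and indexing instead.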
  54. Key nodes/functionalities: Sorter, Column Reorder, Column Filter, Column Rename; Concatenate, Joiner, Reference Row Filter (anti-join); Missing Value; Math Formula, String Manipulation, Rule Engine, Java Snippet; GroupBy (aggregate, dedupe); Value Counter, Pivoting; looping; regular expressions/wildcards in various nodes
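Two of the nodes listed above have simple relational equivalents. The following is a minimal plain-Python sketch that assumes rows are dicts and uses hypothetical helper names. `anti_join` keeps rows whose key is absent from a reference table, which is the effect of Reference Row Filter when set to exclude matches. `dedupe` keeps the first row per key, as a GroupBy on the key with a first-value aggregation would.

```python
def anti_join(rows, reference, key):
    """Keep rows whose `key` value does NOT appear in the reference table."""
    ref_keys = {r[key] for r in reference}
    return [r for r in rows if r[key] not in ref_keys]


def dedupe(rows, key):
    """Keep only the first row seen for each `key` value, preserving order."""
    seen = set()
    out = []
    for r in rows:
        if r[key] not in seen:
            seen.add(r[key])
            out.append(r)
    return out


if __name__ == "__main__":
    clients = [{"id": 1}, {"id": 2}, {"id": 3}]
    audited = [{"id": 2}]
    print(anti_join(clients, audited, "id"))  # clients not yet audited
```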
  55. Data Preparation Example
  56. Case study 1. Officers fill in a questionnaire on the entity audited, one Excel spreadsheet per entity. Collate all the spreadsheets stored in a location, then massage the data to produce an analysis dataset with one row per entity. Key KNIME nodes/functionalities used: List Files; Table Row To Variable Loop Start, Loop End; Java Snippet.
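The List Files → loop → Loop End pattern in case study 1 amounts to three steps: enumerate the files, turn each one into a record, and collect the records. Below is a rough Python analogue. It assumes, hypothetically, that each questionnaire has been saved as a two-column `question,answer` CSV file named after the entity. The real workflow reads Excel files and uses Java Snippets, which this sketch does not reproduce.

```python
import csv
import tempfile
from pathlib import Path


def collate_questionnaires(folder):
    """Read every questionnaire CSV in `folder`; return one dict per entity.

    Hypothetical layout: each file holds `question,answer` rows and the
    file name (without extension) identifies the audited entity.
    """
    rows = []
    for path in sorted(Path(folder).glob("*.csv")):  # like List Files
        record = {"entity": path.stem}               # one loop iteration per file
        with path.open(newline="") as f:
            for question, answer in csv.reader(f):
                record[question] = answer            # one column per question
        rows.append(record)                          # like Loop End collecting rows
    return rows


if __name__ == "__main__":
    # Demo with two throwaway files standing in for the spreadsheets.
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "entity_a.csv").write_text("Q1,yes\nQ2,12\n")
        Path(tmp, "entity_b.csv").write_text("Q1,no\nQ2,7\n")
        for row in collate_questionnaires(tmp):
            print(row)
```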
  57. Questionnaire data for one client
  58. Overview of the KNIME flow
  59. Bring data to tabular form: within this metanode, there is one Java Snippet for each question in the questionnaire.
  60. Details of a Java Snippet
  61. Result of the metanode. To get a single record for a client, just take the last row of each “client block” (explained in the next slide).
  62. For each “client block”, aggregate the variables
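One way to read slides 61 and 62 is as follows. Rows for the same client sit in a consecutive block, and each row fills in some answers. Values are carried forward so that the block's last row is complete. Below is a minimal Python sketch of that idea. It assumes rows are dicts sorted by a hypothetical `client` column and that `None` marks a missing answer. `itertools.groupby` only groups consecutive rows, which matches the block structure.

```python
from itertools import groupby


def one_row_per_client(rows, key="client"):
    """Collapse each block of consecutive rows sharing `key` into one record.

    Within a block, the latest non-missing value of every column is carried
    forward, so the result equals the block's last (fully filled) row.
    """
    result = []
    for _, block in groupby(rows, key=lambda r: r[key]):  # consecutive rows only
        merged = {}
        for row in block:
            for col, value in row.items():
                if value is not None:  # carry the latest known value forward
                    merged[col] = value
        result.append(merged)
    return result


if __name__ == "__main__":
    rows = [
        {"client": "A", "q1": "yes", "q2": None},
        {"client": "A", "q1": None, "q2": 5},
        {"client": "B", "q1": "no", "q2": None},
    ]
    print(one_row_per_client(rows))
```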
  63. End result: 1000 spreadsheets in, 1000 rows out
  64. Case study 2 – Use of flow variables. Technique: input metadata rules into a file, then read them and convert them into flow variables. Example: reorder variables in a dataset as per the order in the data dictionary; we use the “Flow Variables” tab in the Column Reorder node's dialog to achieve this.
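The point of case study 2 is that the column order is data read from a file rather than typed into the node's manual tab. Below is a plain-Python sketch of the same idea. It assumes a hypothetical data dictionary with a `variable` column listing names in the desired order. Columns not in the dictionary are kept at the end rather than dropped; this is a design choice of the sketch, not necessarily what the KNIME node does.

```python
import csv
import io


def column_order_from_dictionary(dictionary_csv):
    """Read the desired variable order from a (hypothetical) data dictionary
    whose `variable` column lists names in order."""
    return [row["variable"] for row in csv.DictReader(io.StringIO(dictionary_csv))]


def reorder_columns(rows, order):
    """Reorder each row's columns to follow `order`; columns missing from the
    dictionary are appended at the end in their original order."""
    reordered = []
    for row in rows:
        first = [c for c in order if c in row]
        rest = [c for c in row if c not in order]
        reordered.append({c: row[c] for c in first + rest})
    return reordered


if __name__ == "__main__":
    dictionary = "variable,description\nid,Client id\nname,Client name\namount,Tax amount\n"
    data = [{"amount": 10, "extra": 1, "id": 7, "name": "X"}]
    print(reorder_columns(data, column_order_from_dictionary(dictionary)))
```

Changing the order then means editing the dictionary file, not reconfiguring the workflow.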
  65. Use of flow variables: use this tab; do not use the “manual” tab.
  66. KNIME wishlist! An offset function in some nodes, e.g. Rule Engine and Math Formula. An offset function gives the value of a variable in a previous row; e.g. in SPSS Clementine, @OFFSET(var,1) gives the value in the previous row. Note: within the Java Snippet this is readily achieved, since a variable retains its value until it is overwritten. Therefore we can conveniently first use the value populated from the previous row inside a formula, and then update it with the value from the current row so that it can be used in the next row.
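The Java Snippet workaround described above relies on state that survives from one row to the next. The same pattern appears below in Python as an illustrative sketch; the function names are hypothetical. A variable holds the previous row's value, is read first, and is then overwritten with the current value. This gives the equivalent of `@OFFSET(var,1)`.

```python
def with_previous(values):
    """Pair each value with the previous row's value (None for the first row)."""
    previous = None  # persists across rows, like a field kept between snippet calls
    out = []
    for current in values:
        out.append((current, previous))  # use the previous row's value first...
        previous = current               # ...then overwrite it for the next row
    return out


def change_from_previous(values):
    """Example formula using the offset: difference from the previous row."""
    return [None if prev is None else cur - prev for cur, prev in with_previous(values)]


if __name__ == "__main__":
    print(change_from_previous([5, 8, 6]))
```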
  67. Questions?
  68. Data Transformed: Chhitesh Shrestha
  69. Apache Spark on KNIME: Unleash the power of Big Data on Hadoop
  70. The Big Data Problem: Data Volume. 1. Storage is getting cheaper; 2. data sources are increasing; 3. thus, data is growing faster (YARN). But processing it is still a problem. Why?
  71. The Big Data Problem: Processing. Now, memory is cheaper.
  72. Why Apache Spark? Apache Spark is an open-source parallel processing framework that enables users to run large-scale data analytics across clustered computers. • Speed • Flexible programming platform • Generality • Runs everywhere
  73. Spark Components
  74. Spark Comparison: Calculation of an Average
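When partitions differ in size, an average cannot be computed by averaging the per-partition averages. A common approach in distributed engines such as Spark is to reduce each partition to a (sum, count) pair, merge the pairs, and divide once at the end. The sketch below simulates that in plain Python with lists standing in for partitions; it does not use Spark. In PySpark the same idea is roughly `rdd.map(lambda x: (x, 1)).reduce(lambda a, b: (a[0] + b[0], a[1] + b[1]))`.

```python
from functools import reduce


def partial_sum_count(partition):
    # Each worker reduces its own partition to a (sum, count) pair.
    return (sum(partition), len(partition))


def merge(a, b):
    # Pairs combine associatively, so they can be merged in any order.
    return (a[0] + b[0], a[1] + b[1])


def distributed_average(partitions):
    total, count = reduce(merge, (partial_sum_count(p) for p in partitions), (0, 0))
    return total / count  # divide once, after all partitions are merged


if __name__ == "__main__":
    parts = [[1, 2, 3], [4], [5, 6]]
    print(distributed_average(parts))
```

Averaging the three partition means here would give about 3.83 instead of the true mean, because the partitions have different sizes.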
  75. List of Spark Nodes
  76. Getting the data in and out of Spark: data into Spark; data out of Spark
  77. Statistics and Data Manipulation Nodes: statistics; data manipulation
  78. Mining Nodes: learners; predictors
  79. Other Nodes
  80. KNIME Spark Executor Architecture
  81. Currently Supported Hadoop and KNIME Versions. Hadoop versions: Hortonworks HDP 2.2 with Spark 1.2.x; Hortonworks HDP 2.3 with Spark 1.3.x; Cloudera CDH 5.3 with Spark 1.2.x; Cloudera CDH 5.4 with Spark 1.3.x. KNIME versions: KNIME Analytics Platform 3.1; KNIME Server 4.2
  82. Lots of talking… Let's view a demo
  83. Data Transformed: YOUR DATA. CLEARLY. 02 9956 3781
  84. Actian Vortex on Hadoop: 10-minute demo. Demonstration of Vortex, Dataflow & Vector; comparison between Actian Vortex & Cloudera Impala
  85. Actian Vector example: identical 150-million-transaction query, comparison between Actian Vector and Oracle DBMS