KNIME Meetup 2016-04-16

Sponsored by Data Transformed, the KNIME Meetup was a big success. The slides from Dan's, Tom's, Anand's and Chhitesh's presentations are below.
Agenda:
Registration & Networking
Keynote – Dan Cox, CEO of Data Transformed
KNIME & Harvest Analytics – Tom Park
Office of State Revenue Case Study – Anand Antony
Using Spark with KNIME – Chhitesh Shrestha
Networking & Drinks

  • The cloud is everywhere, and we will continue to see adoption at extreme volumes. Big data is driving a lot of the cloud's growth: revenues for the top 50 public cloud providers shot up 47% in Q4 of 2013 to $6.2B, according to Technology Business Research. Amazon Redshift and Google BigQuery are growing dramatically. Database players like Teradata are also jumping into the game.

    Snowflake
  • It has been suggested that 80% of an analyst’s time is spent on data prep, while only 20% is spent looking for insights. Enter the personal data cleansing tools focused on the analyst. Tools like Trifacta, Alteryx, Paxata and Informatica Rev are making data preparation easier, with less technology and infrastructure required to support it.

    KNIME
  • Some may think that the jury is still deliberating, but NoSQL is making a mark in the industry. NoSQL emerged to provide scale, flexibility, and the ability to leverage large sets of data faster. Companies like MarkLogic, Cassandra, Couchbase, and MongoDB are bringing innovation to the database market and are doing quite well with large production implementations in surprising places.
  • Whether you believe that Hadoop will take over current database architecture, or that there will be a mix of Hadoop and other styles of databases, one thing is clear: Hadoop is now a part of the big data architecture in many companies. The legacy data storage vendors have incorporated Hadoop into their architecture in one way or another. Some classical database providers, like Teradata, SAP, and HP, have embraced the market-leading Hadoop players. Others, like IBM, have built their own flavor of Hadoop. Spark and Impala continue to mature, putting more pressure on the traditional stack. In any case, Hadoop looks like it is here to stay and is synonymous with big data architectures.
  • The concept of a big data lake, a large body of data that exists in a natural or unrefined state, is in its early stages. This idea answers some fundamental questions around how to effectively store, manage and use the massive amounts of incoming data. The cutting-edge companies Google and Facebook have developed useful ways to leverage the data lake, but should be considered early adopters. As it is, the data lake is still a nascent concept, and we should expect to see advances in managing and securing the big data lake this year. And as Gartner points out, the data lake requires a new kind of management to be effective.
  • When new ways of doing things come about, they create a new ecosystem around them. The same holds true for big data. We have new ways to store data, clean data, add content to data, bring in social media, analyze machine data, do deep analysis on data and, of course, visualize data. Over the next year we will see some surprising changes in the current ecosystem. Specifically, we will see MPP (Massively Parallel Processing) databases play a different and less prominent role.

    Actian Matrix (better known as Amazon Redshift)
  • Your Ford Fusion sends 250GB of data back to Ford, who in turn lets you know that something is wrong with your car. It sounds like fantasy, but hardware and semiconductor companies are betting on it. Ford, GE, and Rolls-Royce (jet engines) are just a few examples of companies investing in IoT. In 2015, we will see greater use by manufacturers. Some technology companies like Cisco will create solutions around the concept to help manage the massive amounts of data.
  • Acquire, grow and retain customers:
    Who are your best customers and how can you keep them satisfied?  Where can you find more customers like them? 
    Big data holds the insights into who your customers are and what motivates them. Analysing big data can help you discover ways to improve customer interactions, add value and build relationships that last.
  • Optimise operations and reduce fraud:
    Are your operational processes and systems as efficient as they could be?  Could you reduce waste and fraud if you had real-time visibility into your business?  Adopting a big data and analytics strategy can help you plan, manage and maximise operations, supply chains and the use of infrastructure assets. Gain the insights you need to reduce costs, increase efficiencies and productivity, and limit threats.
  • Transform financial processes
    Do you have real-time access to reliable information about all aspects of your business?  Do you have the visibility, insight and control over financial performance to better measure, monitor and shape business outcomes?  Analysing all of your data, including big data, can drive enterprise agility and provide insights to help you make better decisions.
  • Manage risk
    How can you mitigate the financial and operational risks that could devastate your organisation?  How can you manage regulatory change and reduce the risk of non-compliance?  Proactively identifying, understanding and managing financial and operational risk can enable more risk-aware, confident decision making.
  • Create new business models
    Are your competitors making bigger strides in changing your industry or creating new markets than you?  Does your organisation’s culture support innovative thinking and exploration?  Explore strategic options for business growth, using new perspectives gained from exploiting big data and analytics.
  • Improve IT economics
    Is your existing IT infrastructure able to provide the insights that decision makers need?  Are you doing enough to protect your data centre and data from potential criminal activity or fraud?  Lead the creation of new value and agility for your business by optimising big data and analytics for faster insight at a lower cost.
  • Just as the business intelligence landscape has transformed to self-service data, so too must governance transform. Simple approaches like locking down all enterprise data won’t work any longer—nor will the approach of doing away with any process at all. Organizations will begin to investigate what governance means in a world of self-service analytics.

    In 2014 we saw organizations begin to analyze social data in earnest. In 2015, the leading edge will start to take advantage of these capabilities. Tracking conversations at scale on social media will let companies find out when a topic is starting to trend and what their customers are talking about. Social analytics will open the door to responsive product optimization.

    Today’s data analyst may be an operations manager, a supply chain executive or even a salesperson. New, easier-to-use technologies that provide browser-based analytics let people answer ad hoc business questions. Companies that recognize this as a strategic advantage will begin to support everyday analysts with data, tools and training to help them do what they’re doing.

    The consumerization of IT is no longer theoretical; it’s a fact. People use products that they enjoy using, and analytics software is no different. Companies whose products inspire and empower are seeing their communities flourish. And prospective customers will also look to the health of product communities as important proof points in crowded marketplaces.

    The last 10 years have seen a massive amount of innovation across the data space, resulting in mixed environments for everything from data storage to analytics to business applications. We won’t see a return to the age of monolithic systems. However, organizations are losing patience with multiple logins and clunky processes to move and manage data. Rapid integration leveraging simple interfaces is going to become the standard.

    In 2015, we’ll start to see the first major use of cloud analytics for on-premises data. Until now, cloud analytics has been used primarily for data in cloud apps. In 2015, companies will begin to choose the cloud when it makes sense for their business case, not only because the data is there.

    We are starting to see an age when data is interactive enough that it can become the backbone of a conversation. Now that people have speed-of-thought analytical tools, they can quickly analyze data, mash it up with other data and redesign it to create a new perspective. And as a result of these data conversations, organizations will get more insight from their data.

    The arrival of Vox and the continued ascendance of sites like fivethirtyeight.com will force more newsrooms to integrate data analytics into their online presence. This trend will have a spillover effect from the public sphere to organizations, encouraging companies that are lagging in analytics to get with the times.

    Workers are spending less time at their desks. But that doesn’t mean they should be less informed by data; in fact, they have a greater need for data than ever before. Mobile solutions for analytics emerged years ago and are finally reaching a level of maturity at which mobile workers really can do light analysis from the road. And the emphasis on mobile has forced vendors to offer more natural and intuitive interfaces across the board.

    Advances in graphical, intuitive modeling will mean that business users can begin to use predictive analytics without the need for extensive expert consultation or scripting. As self-service analytics becomes more mainstream, tasks such as forecasting and prediction will become more common, and a lot less painful.
  • Since graduating with a Master of Statistics, analytics has been a core theme in my 20+ year career. Using data to solve problems is a passion that drives me to seek out and apply technology innovatively. In the new digital world, I aim to be a champion and an evangelist for the principle of "evidence-based decision making".
    Currently Director, Risk Analytics, Deloitte Australia.
  • A Data Analyst with 15 years of experience (taxation: 10 years; data-driven marketing: 5 years). Experience across a spectrum of data analysis tasks (exploratory analysis, developing risk/predictive variables, predictive modelling, reporting). Well-developed programming skills in a range of data analysis software such as KNIME, SAS, and SPSS Clementine (IBM SPSS Modeler).
    He’s a highly regarded Data Analyst at OSR.
  • KNIME Meetup 2016-04-16

    1. Creating Insights at the Speed of Business. W. Daniel Cox, III, CPA, CMA, CFM, Chief Executive Officer
    2. WELCOME to Meet Up Group
    3. Energise Organisational Advantage through Awareness and Insight. Registration & Networking; Keynote – Dan Cox, CEO of Data Transformed; KNIME & Harvest Analytics – Tom Park; Office of State Revenue Case Study – Anand Antony; Using Spark with KNIME – Chhitesh Shrestha; Networking & Drinks
    4. Journey to Best in Class Analytics: We Help our Clients along this Path. (Chart of Value against Time.) Static: report and drill-down (Laggards). Reactive: monitor and alert (Followers). Proactive: discover and predict (Performers). Dynamic: analytics-enabled business processes (Innovators).
    5. YOUR DATA. CLEARLY. Source Your Data; Realise Data Value. Prepare Your Data: Data Preparation. Plan With Data: Budget/Planning. Visualise All Data: Visualisation.
    6. Data Transformed Skill Sets: Visualisation 30%, Budget Planning 20%, Data Preparation 50%. BUDGET PLANNING: Budgeting, Forecasting, Planning, Demand Planning, Workforce Management, Accounting, Financing, Cashflow, Sales Forecasting, Modelling, Campaign Forecasting. DATA PREPARATION: Data Governance, Data Quality, Master Data Management, Data Warehousing, Data Science, ETL Applications, Data Analytics, SQL Language, Python Language, Scripting, Database Management, Application Development, Database Development, Textual ETL, Text Analytics, Hadoop Ecosphere, Analytical Databases, Relational Databases, Microsoft Analysis Server, OLAP, OLTP, Multi-Dimensional Databases, Data Vault Architectures, Star-Schema Architectures, Data Marting. VISUALISATION: Dashboarding, Reporting, Charting, Location Analytics, Statistical Analytics, Data Analytics, Business Analysis, Story Telling, Semantic Layer, Presentation Layer, Collaboration.
    7. Actian: Fast, Industrialized, Open. Superior Big Data SQL with industrialized strength. (Chart of Performance, slow to fast, against Enterprise Readiness, immature to industrial strength; labels: Good Enough, Production Ready, Traditional, Operational, Open Source, Vortex.)
    8. Do YOU Have a BIG DATA Role
    9. Global Data Snapshot: 7,254,549,796 total world population; 3,035,749,340 internet users; 2,078,680,860 active social network users; 6,572,950,124 mobile subscribers
    10. Traditional systems under pressure. Challenges: constrains data to app; can’t manage new data; costly to scale. Traditional data: ERP, CRM, SCM. New data: clickstream, geolocation, web data, Internet of Things, docs and emails, server logs. Data volume: 2.8 zettabytes (2012), 44 zettabytes (2020), with a 12-zettabyte mark in between. (Chart labels: Business Value, Laggards, Industry Leaders.)
    11. Volume: Exponential Growth. Variety: New Data Types. Velocity: Time To Value. The Digital Floodgates have opened… and will never be turned off again
    12. Big Data equals Big Opportunity. (Infographic labels: Data Source & Type, Untouched Value, New Possibilities, Universal Access, Time To Value; figures shown: 88% of big data, $15 trillion, and a percentage of companies.)
    13. Trends for BIG DATA: In the Cloud
    14. Trends for BIG DATA: Personal ETL
    15. Trends for BIG DATA: NoSQL
    16. Trends for BIG DATA: Hadoop
    17. Trends for BIG DATA: Data Lake
    18. Trends for BIG DATA: Ecosystem
    19. Trends for BIG DATA: Internet of Things
    20. Big Data Trends: 1. Big Data in the Cloud; 2. Personal ETL; 3. NoSQL; 4. Hadoop; 5. Data Lakes; 6. Big Data Ecosystem; 7. Internet of Things
    21. BIG DATA is STILL just Data. It needs to be translated into Answers
    22. Acquire, Grow & Retain Customers: Who are your best customers and how can you keep them satisfied? Where can you find more customers like them? Big data holds the insights into who your customers are and what motivates them.
    23. Optimise Operations & Reduce Fraud: Are your operational processes and systems as efficient as they could be? Could you reduce waste and fraud if you had real-time visibility into your business? Adopting a big data and analytics strategy can help you plan, manage and maximise operations, supply chains and the use of infrastructure assets.
    24. Transform Financial Processes: Do you have real-time access to reliable information about all aspects of your business? Do you have the visibility, insight and control over financial performance to better measure, monitor and shape business outcomes? Analysing all of your data, including big data, can drive enterprise agility and provide insights to help you make better decisions.
    25. Manage Risk: How can you mitigate the financial and operational risks that could devastate your organisation? How can you manage regulatory change and reduce the risk of non-compliance? Proactively identifying, understanding and managing financial and operational risk can enable more risk-aware, confident decision making.
    26. Create New Business Models: Are your competitors making bigger strides in changing your industry or creating new markets than you? Does your organisation’s culture support innovative thinking and exploration? Explore strategic options for business growth, using new perspectives gained from exploiting big data and analytics.
    27. Improve IT Economics: Is your existing IT infrastructure able to provide the insights that decision makers need? Are you doing enough to protect your data centre and data from potential criminal activity or fraud? Lead the creation of new value and agility for your business by optimising big data and analytics for faster insight at a lower cost.
    28. Analytics Trends: 1. Data Governance; 2. Social Intelligence; 3. Analytics Organisation-Wide; 4. Community Collaboration; 5. Integration of Everything; 6. Cloud Analytics; 7. Conversational Data; 8. Journalism Data; 9. Mature Mobility; 10. Smart Analytics
    29. Areas BIG DATA is Helping: 1. Operations & Optimising; 2. Product Development; 3. Customer Experience; 4. Understanding and Targeting Customers
    30. Performance Examples: Actian is Helping These Companies Achieve Leadership. Digital Marketing: hyper-segmentation every hour. Banking: enterprise risk every 2 minutes. Retail: enterprise market basket analysis every minute. Defense: network intrusion models every second. Fraud: adjustments every nanosecond. Amazon Redshift – Actian Matrix: cloud-based, petabyte-scale data warehouse.
    31. The Value of Business Intelligence: Organisations competing with Analytics Substantially OUTPERFORM their peers by 220%
    32. Data Transformed
    33. Actian Vector: Example. https://youtu.be/dYTF5ZNioEI Identical 150 Million Transaction Query. Comparison between Actian Vector & Oracle DBMS
    34. Harvest Analytics. Tom Park
    35. Overview: KNIME & Big Data. Tom Park
    36. Gartner 2016 Magic Quadrant, Advanced Analytics Platforms. Niche Players (5): FICO, Lavastorm, Megaputer, Prognoz, Accenture. Leaders (5): SAS, IBM, KNIME, RapidMiner, Dell. Visionaries (4): Microsoft, Alteryx, Alpine Data Labs, Predixion. Challengers (2): SAP, Angoss.
    37. Changes from 2015 to 2016: Salford & TIBCO dropped due to not satisfying the visual composition requirement
    38. Main Big Data Technologies: NoSQL
    39. Big Data Architecture
    40. KNIME Big Data Extensions
    41. Future Trends
    42. Missing Ingredient to Success?
    43. www.dataroos.com
    44. Office of State Revenue. Anand Antony
    45. KNIME @ OSR. Anand Antony, Senior Data Analyst, Operations Analytics and Intelligence, Office of State Revenue. anandjantony@gmail.com Ph. 0414491765
    46. OSR: Who are we? • As NSW’s principal revenue agency, OSR administers state taxation and revenue for, and on behalf of, the people of NSW ◦ Payroll tax ◦ Land tax ◦ Duties ◦ Grants such as First Home Benefits
    47. Data Analytics Team: Who are we? • Operations Analytics & Intelligence is the analytics wing of the Operations Division in OSR ◦ Three teams: Business Intelligence, Data Analytics and Data Team • Data Analytics team consists of 10 analysts • Supports tax auditors by detecting possible non-compliant clients ◦ Via matching data from various sources and analysing them ◦ 60+ data sources
    48. Data Analytics Scenario - Past • Data matching, preparation and analysis ◦ SPSS Clementine, SAS Enterprise Guide • Data mining ◦ Salford Systems • Reporting/Dashboards ◦ Excel • Fuzzy data matching ◦ SSA Name (Informatica)
    49. Data Analytics Scenario - Current • Data matching, preparation and analysis ◦ KNIME (around 70% transitioned from Clementine/SAS) • Data mining ◦ Salford Systems ◦ Will be evaluating KNIME • Reporting/Dashboards ◦ Excel • Fuzzy data matching ◦ SSA Name (Informatica)
    50. Future: Unified Analytic & Data Management Platform (architecture diagram). Internal & external data sources. Governance: data governance, data quality, data matching, metadata management. Data lake: MapR Hadoop distribution (Vortex, MapR). Advanced data analytics: Actian/KNIME. Machine learning: H2O/Spark, Actian/KNIME. Datamart: on the fly / sandpit. Visualisation and presentation layer: Spotfire/Tableau/graph DBs.
    51. Why KNIME? • Enrich with coding via coding snippets ◦ Mostly Java Snippet at the moment • Start with canvas programming • Fast and easy learning curve for data scientists • Can tackle almost any analytic task
    52. KNIME - Having the best of both worlds! ◦ Canvas programming and coding
    53. What do we use KNIME for? • Pretty much for everything! (except reporting and data mining) ◦ Data reading (text files, databases, non-standard formats) ◦ Data merging (potentially fuzzy matching too in future) ◦ Data manipulation ◦ Creating new variables ◦ Data output ◦ Modelling (possibly in future)
    54. Key nodes/functionalities ◦ Sorter, Column Reorder, Column Filter, Column Rename ◦ Concatenate, Joiner, Reference Row Filter (anti-join) ◦ Missing Value ◦ Math Formula, String Manipulation, Rule Engine, Java Snippet ◦ GroupBy (aggregate, dedupe) ◦ Value Counter, Pivoting ◦ Looping ◦ Regular expressions/wildcards in various nodes
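
Two of the operations named on slide 54, the anti-join (Reference Row Filter) and aggregate/dedupe (GroupBy), can be illustrated outside KNIME. The following is a minimal plain-Python sketch of the two ideas, not KNIME code; the record layout, the sample data, and the function names `anti_join` and `dedupe_sum` are hypothetical and chosen only for illustration.

```python
def anti_join(rows, reference_rows, key):
    """Keep only rows whose key value does NOT appear in reference_rows."""
    reference_keys = {r[key] for r in reference_rows}
    return [r for r in rows if r[key] not in reference_keys]


def dedupe_sum(rows, key, value):
    """Return one row per key, summing the value column (GroupBy-style)."""
    totals = {}
    for r in rows:
        totals[r[key]] = totals.get(r[key], 0) + r[value]
    return [{key: k, value: v} for k, v in totals.items()]


# Hypothetical sample data: client records and a list of clients to exclude.
clients = [
    {"client_id": "A", "amount": 10},
    {"client_id": "B", "amount": 5},
    {"client_id": "A", "amount": 7},
    {"client_id": "C", "amount": 1},
]
already_audited = [{"client_id": "B"}]

not_audited = anti_join(clients, already_audited, "client_id")
summary = dedupe_sum(not_audited, "client_id", "amount")
```

Here `summary` holds one aggregated row for each client that is absent from the reference list.
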
    55. Data Preparation Example
    56. Case study 1 • Officers fill in a questionnaire on the entity audited: one Excel spreadsheet for one entity • Collate all the spreadsheets stored in a location • Massage the data to produce an analysis dataset with one row per entity • Key KNIME nodes/functionalities used ◦ List Files ◦ Table Row to Variable Loop Start, Loop End ◦ Java Snippet
    57. Questionnaire data for one client
    58. Overview of KNIME flow
    59. Bring data to tabular form: Within this Meta node, there is one Java Snippet for each question in the questionnaire
    60. Details of a Java Snippet
    61. Result of the Meta Node: To get a single record for a client, just take the last row of a “client block”! (Explained in the next slide.)
    62. For each “client block” aggregate the variables
    63. End result: 1000 spreadsheets, 1000 rows
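
The idea in case study 1 (fill in one answer at a time, then keep only the last row of each "client block") can be sketched in plain Python. This is an assumption about how the flow behaves, since the transcript does not show the workflow itself: it supposes each questionnaire arrives as a list of (question, answer) pairs and that answers accumulate row by row within a block. The function name `collate` and the sample data are hypothetical.

```python
def collate(blocks):
    """Build one record per client from per-question rows.

    blocks: list of (client_id, [(question, answer), ...]) in file order.
    Within a block, each row carries forward the answers seen so far,
    so the last row of the block is the complete record for that client.
    """
    result = []
    for client_id, qa_pairs in blocks:
        record = {"client_id": client_id}
        running_rows = []
        for question, answer in qa_pairs:
            record[question] = answer          # fill in one more answer
            running_rows.append(dict(record))  # snapshot of the row so far
        if running_rows:
            result.append(running_rows[-1])    # last row of the client block
    return result


# Hypothetical questionnaire data for two entities.
questionnaires = [
    ("E1", [("q1", "yes"), ("q2", "no")]),
    ("E2", [("q1", "no")]),
]
dataset = collate(questionnaires)
```

With 1000 questionnaires as input, `dataset` would contain 1000 rows, one per entity.
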
    64. Case study 2: Use of flow variables • Technique ◦ Input metadata rules into a file ◦ Read and convert into flow variables • Example ◦ Reorder variables in a dataset as per the order in the data dictionary ◦ We use the “Flow Variables” tab of the Column Reorder node to achieve this
    65. Use of flow variables: Use this tab. Do not use this “manual” tab.
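
The point of case study 2 is that the column order is driven by metadata (the data dictionary) rather than set by hand. A minimal plain-Python sketch of that idea follows; it does not use KNIME flow variables, and the function name `reorder_columns`, the sample columns, and the choice to append columns missing from the dictionary at the end are all assumptions made for illustration.

```python
def reorder_columns(rows, dictionary_order):
    """Reorder each row's columns to follow dictionary_order.

    Columns not listed in the dictionary are kept, appended at the end
    in their original order (an assumption for this sketch).
    """
    present = list(rows[0].keys()) if rows else []
    ordered = [c for c in dictionary_order if c in present]
    ordered += [c for c in present if c not in dictionary_order]
    return [{c: row[c] for c in ordered} for row in rows]


# Hypothetical data dictionary order, e.g. as read from a metadata file.
data_dictionary = ["client_id", "tax_type", "amount"]
raw = [{"amount": 120, "client_id": "A", "notes": "n/a", "tax_type": "land"}]

reordered = reorder_columns(raw, data_dictionary)
column_order = list(reordered[0].keys())
```

Changing the data dictionary changes the output order without editing the reordering logic.
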
    66. KNIME wishlist! • An offset function in some nodes, e.g. Rule Engine and Math Formula. An offset function gives the value of a variable in a previous row; e.g. in SPSS Clementine, @OFFSET(var,1) gives the value in the previous row. Note: within a Java Snippet this is readily achieved, since a variable retains its value until it is overwritten. We can therefore first use the value populated from the previous row inside a formula, and then update it with the value from the current row so that it can be used in the next row.
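
The "use the retained value first, then overwrite it" pattern described on slide 66 can be sketched in plain Python. This is an illustration of the pattern only, not Java Snippet code; the function name `with_previous` and the sample values are hypothetical.

```python
def with_previous(values, default=None):
    """Pair each value with the value from the previous row (offset of 1)."""
    previous = default                   # nothing precedes the first row
    out = []
    for current in values:
        out.append((current, previous))  # first use the retained value
        previous = current               # then update it for the next row
    return out


amounts = [3, 5, 8]
pairs = with_previous(amounts)

# Row-on-row change, skipping the first row, which has no previous value.
changes = [cur - prev for cur, prev in pairs if prev is not None]
```

The order of the two statements inside the loop matters: updating `previous` before using it would pair each row with itself.
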
    67. Questions?
    68. Data Transformed. Chhitesh Shrestha
    69. Apache Spark on KNIME: Unleash the power of Big Data on Hadoop
    70. The Big Data Problem: Data Volume. 1. Storage is getting cheaper. 2. Data sources are increasing. 3. Thus, data is growing faster. But processing it is still a problem. Why? (Diagram label: YARN.)
    71. The Big Data Problem: Processing. Now, as the memory is cheaper.
    72. Why Apache Spark? Apache Spark is an open source parallel processing framework that enables users to run large-scale data analytics across clustered computers. • Speed • Flexible with programming platform • Generality • Run Everywhere
    73. Spark Components
    74. Spark Comparison on Calculation of Average
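
The transcript does not preserve what slide 74 showed, so the following is not a reconstruction of it. It is a plain-Python sketch of one common way an average is computed in a parallel framework: each partition produces a partial (sum, count) pair, and the pairs are then combined. The partitions, the function names, and the absence of any Spark API are all assumptions of this sketch.

```python
from functools import reduce


def partial_sum_count(chunk):
    """Per-partition step: reduce one chunk to a (sum, count) pair."""
    return (sum(chunk), len(chunk))


def combine(a, b):
    """Merge two (sum, count) pairs; order and grouping do not matter."""
    return (a[0] + b[0], a[1] + b[1])


# Hypothetical data split across three partitions.
partitions = [[1, 2, 3], [4, 5], [6]]

total, count = reduce(combine, map(partial_sum_count, partitions))
average = total / count
```

Averaging the per-partition averages would give a wrong answer when partitions differ in size, which is why the sketch carries the counts along with the sums.
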
    75. List of Spark Nodes
    76. Getting the data in and out of Spark: Data into Spark; Data out of Spark
    77. Statistics and Data Manipulation Nodes: Statistics; Data Manipulation
    78. Mining Nodes: Learners; Predictors
    79. Other Nodes
    80. KNIME Spark Executor Architecture
    81. Current Supported Hadoop and KNIME Versions. Hadoop Versions • Hortonworks HDP 2.2 with Spark 1.2.x • Hortonworks HDP 2.3 with Spark 1.3.x • Cloudera CDH 5.3 with Spark 1.2.x • Cloudera CDH 5.4 with Spark 1.3.x. KNIME Versions • KNIME Analytics Platform 3.1 • KNIME Server 4.2
    82. Lots of talking… Let’s view a demo
    83. Data Transformed. YOUR DATA. CLEARLY. info@DataTransformed.com.au 02 9956 3781
    84. Actian Vortex on Hadoop: 10-minute demo. http://videos.actian.com/watch/6iEZqvJrEKL2btoqIDImcg Demonstration of Vortex, Dataflow & Vector. Comparison between Actian Vortex & Cloudera Impala
    85. Actian Vector: Example. https://youtu.be/dYTF5ZNioEI Identical 150 Million Transaction Query. Comparison between Actian Vector & Oracle DBMS
