DAMA Day NYC
April 19, 2016
Innovations in Data Governance, Architecture and Analytics
Robert Quinn
What are we covering
• Introduction
• Big Data defined
• Context of Big Data ‘Hype Cycle’
• Challenges – created by Big Data
• Opportunities – introduced by Big Data solutions
• Case studies
• Conclusions
tl;dr
• Graphs
• Streaming
• Schema on Read
  - DG
  - DQ
  - Analysis
• Cognitive Computing
Big Data – Common Definition
• The 3 Vs
  - Volume - amount of data
  - Velocity - speed of data in and out
  - Variety - range of data types and sources
“Big Data” is about the capacity to aggregate, cross-reference, utilize and manage complexity.
Variety is the primary ‘complicator’ for businesses facing big data challenges.
Big Data – ‘Original’ Definition
A cultural, technological, and scholarly phenomenon that rests on the interplay of:
• Technology: maximizing computation power and algorithmic accuracy to gather, analyze, link, and compare large and diverse data sets.
• Analysis: drawing on large data sets to identify patterns in order to make economic, social, technical, and legal claims.
• Mythology: the widespread belief that large data sets offer a higher form of intelligence and knowledge that can generate insights that were previously impossible, with the aura of truth, objectivity, and accuracy.
Big Numbers
Big data infrastructure, software, and services spend:
  - $16.6 billion in 2014
  - $41.5 billion in 2018 (CAGR of ~26%)
About 7x higher than the growth rate of the worldwide information and communication technology market.
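The ~26% growth rate can be checked from the two spend figures above. A minimal Python sketch; the function name `cagr` is illustrative and not from the talk, and the calculation assumes four compounding years (2014 to 2018):

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate as a fraction (0.26 means 26%)."""
    return (end_value / start_value) ** (1 / years) - 1


spend_2014 = 16.6  # $ billions, from the slide
spend_2018 = 41.5  # $ billions, from the slide

rate = cagr(spend_2014, spend_2018, years=2018 - 2014)
print(f"Implied CAGR: {rate:.1%}")  # prints "Implied CAGR: 25.7%"
```

The implied rate rounds to the ~26% quoted on the slide.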
Interest in Big Data, Hadoop
Perspective
Context
What else is happening in parallel to the Big Data craze:
• Open Data Movement and Data Monetization
• Cloud Computing, Open Source, Software as a Service
• Increased risk awareness (global financial crisis)
• Security, data breaches
• Ubiquitous broadband
• Advances in Machine Learning and AI
Challenges to existing DM approach
• Relational database management systems and desktop statistics and visualization packages often have difficulty handling big data.
• IT, DG, DQ "paradigms" have difficulty coping
  - Enterprise Data Warehouses - struggle with variety
  - ETL-based architectures - "limitations" have become more widely understood
  - Centralized DG and DQ - struggle with velocity and variety
Challenges - continued
• Existing User Tools/Approaches
  - Desktop solutions (e.g. Excel) - struggle with volume
  - Manual data cleansing - struggles with volume and variety
  - High risk of data loss
• Availability of capable/experienced resources
• Technical solutions have shorter and shorter half-lives
• Project Funding (Dev -> Test -> Production model)
  - Analytics work is by nature often throw-away experiments
Opportunities (Technologies)
• Alternatives to:
  - Relational data model (NoSQL)
  - Embedded SQL engine (data processing engines)
  - ETL architectures (wrangling, streaming)
• Mainstream availability of clustering and in-memory hardware/software solutions
• Availability of algorithms for dealing with text and other "unstructured" data has increased dramatically
• Products & Services that provide "out of box" Machine Learning capabilities
• Products & Services that provide "out of box" support for combining Analytics and Operational Capabilities
Hadoop
Spark vs. MapReduce
Spark Components
Spark Connectors
Spark Usage Survey
Machine Learning
Cognitive computing leverages machine learning and artificial intelligence to infer and predict, and offers tremendous potential to augment human expertise.
• ML development process
  - Goal determination (requirements, outcomes)
  - Data analysis (discovery and wrangling)
  - Model training
  - Evaluation
  - Deployment and Monitoring
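The five steps of the ML development process can be walked through on a toy problem. The sketch below is only an illustration: the data, the 80% accuracy target, and the single-threshold "model" are all assumptions made up for the example, loosely themed on member retention, and not taken from the talk.

```python
# Hypothetical toy data: (months_since_last_visit, churned) pairs.
# None marks a missing value that wrangling has to deal with.
raw = [(1, 0), (2, 0), (None, 1), (3, 0), (8, 1),
       (9, 1), (10, 1), (2, 0), (7, 1), (3, 0)]

# 1. Goal determination: predict churn with at least 80% accuracy (assumed target).
TARGET_ACCURACY = 0.8

# 2. Data analysis (discovery and wrangling): drop rows with missing values,
#    then hold out the last third of the rows for evaluation.
clean = [(x, y) for x, y in raw if x is not None]
split = len(clean) * 2 // 3
train, test = clean[:split], clean[split:]


def accuracy(threshold, rows):
    """Share of rows where 'x > threshold' agrees with the churn label."""
    return sum((x > threshold) == bool(y) for x, y in rows) / len(rows)


# 3. Model training: pick the threshold that best separates the training rows.
candidates = sorted({x for x, _ in train})
threshold = max(candidates, key=lambda t: accuracy(t, train))

# 4. Evaluation: measure accuracy on rows the model has not seen.
test_accuracy = accuracy(threshold, test)


# 5. Deployment and monitoring: expose a predict function and flag drift.
def predict(months_since_last_visit):
    return months_since_last_visit > threshold


def needs_retraining(recent_rows):
    """True when accuracy on recent labelled rows falls below the goal."""
    return accuracy(threshold, recent_rows) < TARGET_ACCURACY
```

In practice each step would use real tooling and far more data; the point is only that the goal set in step 1 is what evaluation and monitoring are later measured against.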
Opportunities (Process / Approaches)
• Collaboration capabilities appearing in Analytics / MDM
• API services for data quality, data enhancement
• Crowd Sourcing services
• Data as a service
• Explosion of research, books and courseware targeting analytics, big data architecture and solutions
Opportunities (DG and DQ)
• Analyst-driven data sourcing (Self Service Data Prep)
• Data Catalogs
• Transparent/repeatable sourcing and analysis
• Collaborative Governance (aka ‘Expert Sourcing’)
• Crowd Sourcing, consensus-based DQ
• DQ based on machine learning (aka ‘Data Curation’)
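One way to picture consensus-based DQ is a majority vote over reviewer judgments, with low-agreement cases escalated to an expert. This is a minimal sketch under assumptions: the `consensus` function, the 60% agreement threshold, and the record identifiers are hypothetical and do not describe any particular product.

```python
from collections import Counter


def consensus(votes, min_agreement=0.6):
    """Return the majority label if enough reviewers agree, else None.

    None signals that the item should go to an expert for review.
    """
    if not votes:
        return None
    label, count = Counter(votes).most_common(1)[0]
    return label if count / len(votes) >= min_agreement else None


# Hypothetical reviewer votes on whether two records describe the same entity.
votes_by_pair = {
    ("rec-1", "rec-2"): ["match", "match", "match", "no-match"],
    ("rec-3", "rec-4"): ["match", "no-match"],
}

decisions = {pair: consensus(votes) for pair, votes in votes_by_pair.items()}
```

Here the first pair reaches 75% agreement and is accepted as a match, while the second pair is split evenly and is left undecided for expert sourcing.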
DQ / DG + Cognitive Computing
A ‘TAMR’ view of DM
Data Wrangling / Data Prep
“Data preparation tools have emerged as a vital method for analysts to quickly source, blend, and wrangle data independent of enterprise architecture’s (EA) data management processes.” (Forrester)
• Features / Benefits
  - Agility (build and validate in a single process)
  - Repeatability / Transparency
  - Easy to use, with many ‘advanced’ features
  - Collaboration
  - Discovery, Cleaning, Enrichment, Publishing, ...
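Repeatability and transparency in data prep can be illustrated by expressing the wrangling as an ordered list of named steps and recording what each step did. The sketch below is an assumption-laden illustration in plain Python; the step names, the sample records, and the `run` helper are hypothetical and not features of any specific tool.

```python
def strip_whitespace(rows):
    """Cleaning: trim stray whitespace from every string field."""
    return [{k: v.strip() if isinstance(v, str) else v for k, v in r.items()}
            for r in rows]


def drop_missing_email(rows):
    """Cleaning: discard records that have no email address."""
    return [r for r in rows if r.get("email")]


def add_email_domain(rows):
    """Enrichment: derive a lower-cased domain from the email address."""
    return [{**r, "domain": r["email"].split("@")[-1].lower()} for r in rows]


# The recipe is data: it can be reviewed, versioned, and re-run unchanged.
STEPS = [strip_whitespace, drop_missing_email, add_email_domain]


def run(rows, steps=STEPS):
    """Apply each step in order and log (step name, rows in, rows out)."""
    log = []
    for step in steps:
        before = len(rows)
        rows = step(rows)
        log.append((step.__name__, before, len(rows)))
    return rows, log


# Hypothetical input records.
raw = [
    {"name": " Ann ", "email": "ann@Example.com "},
    {"name": "Bob", "email": ""},
]
clean, log = run(raw)
```

The log gives a simple audit trail (which step removed which rows), which is the kind of transparency the slide lists as a benefit.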
Case Studies
• Massive increase in data volume
• Machine Learning - Member Retention
• Sentiment Analysis - Improving Survey analysis
• Crowd Sourcing - Initial Match Evaluation and Merge
Conclusion
• Separate the mythology from the technology and approaches
• Leverage the hype of Big Data to make improvements
• Understand which of the 3 Vs you want to focus on
• The most important aspects are still
  - Business Goals
  - Culture
  - People
• Leverage
  - Open Source
  - Cloud Computing
  - SaaS
Thank You!
For additional information:
Robert Quinn
Solution Architect
info@fyisolutions.com or call 973.331.9050
Messaging and Streaming Frameworks
Spark Ecosystem
Machine Learning – Example Use Cases
