Innovations in Data Governance, Analytics and Machine Learning at DAMA Day NYC
1. DAMA Day NYC
April 19, 2016
Innovations in Data Governance, Architecture and Analytics
Robert Quinn
2. What are we covering
• Introduction
• Big Data defined
• Context of the Big Data ‘Hype Cycle’
• Challenges created by Big Data
• Opportunities introduced by Big Data solutions
• Case studies
• Conclusions
4. Big Data – Common Definition
• The 3 Vs
   - Volume: amount of data
   - Velocity: speed of data in and out
   - Variety: range of data types and sources
“Big Data” is about the capacity to aggregate, cross-reference, utilize, and manage complexity.
Variety is the primary ‘complicator’ for businesses facing big data challenges.
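To make the variety point concrete, here is a minimal sketch, assuming three hypothetical sources ("crm", "web", "csv") that describe the same kind of customer record with different field names, id types, and date formats. The mapping tables and field names are illustrative assumptions only; the point is that every new source adds mapping and conversion work before the data can be combined.

```python
from datetime import datetime

# Hypothetical per-source field mappings: source field name -> common field name.
FIELD_MAPS = {
    "crm": {"cust_id": "id", "full_name": "name", "signup": "joined"},
    "web": {"userId": "id", "displayName": "name", "createdAt": "joined"},
    "csv": {"ID": "id", "Name": "name", "Joined": "joined"},
}

# Hypothetical date formats used by each source.
DATE_FORMATS = {"crm": "%Y-%m-%d", "web": "%Y-%m-%dT%H:%M:%S", "csv": "%m/%d/%Y"}


def normalize(source: str, record: dict) -> dict:
    """Map one raw record onto the common schema {id, name, joined}."""
    mapping = FIELD_MAPS[source]
    out = {common: record[src_field]
           for src_field, common in mapping.items() if src_field in record}
    out["id"] = str(out["id"])  # ids arrive as ints or strings depending on source
    out["joined"] = datetime.strptime(out["joined"], DATE_FORMATS[source]).date()
    return out


raw_records = [
    ("crm", {"cust_id": 17, "full_name": "Ada Lovelace", "signup": "2015-03-02"}),
    ("web", {"userId": "18", "displayName": "Alan Turing", "createdAt": "2016-01-09T10:15:00"}),
    ("csv", {"ID": "19", "Name": "Grace Hopper", "Joined": "04/11/2016"}),
]

unified = [normalize(src, rec) for src, rec in raw_records]
for row in unified:
    print(row)
```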
5. Big Data – ‘Original’ Definition
A cultural, technological, and scholarly phenomenon that rests on the interplay of:
• Technology: maximizing computation power and algorithmic accuracy to gather, analyze, link, and compare large and diverse data sets.
• Analysis: drawing on large data sets to identify patterns in order to make economic, social, technical, and legal claims.
• Mythology: the widespread belief that large data sets offer a higher form of intelligence and knowledge that can generate insights that were previously impossible, with the aura of truth, objectivity, and accuracy.
6. Big Numbers
Big data infrastructure, software, and services spend:
• $16.6 billion in 2014
• $41.5 billion in 2018 (CAGR of ~26%)
That is about 7x the growth rate of the worldwide information and communication technology market.
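The ~26% figure can be checked with the standard compound annual growth rate formula, CAGR = (end / start)^(1 / years) - 1. Using the slide's two spend figures over the four years from 2014 to 2018, the ratio is exactly 2.5, which works out to roughly 25.7% per year. A minimal sketch:

```python
def cagr(start_value: float, end_value: float, years: float) -> float:
    """Compound annual growth rate between two values `years` apart."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Spend figures from the slide, in billions of US dollars.
rate = cagr(16.6, 41.5, 2018 - 2014)
print(f"CAGR 2014-2018: {rate:.1%}")
```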
9. Context
What else is happening in parallel to the Big Data craze:
• Open Data Movement and Data Monetization
• Cloud Computing, Open Source, Software as a Service
• Increased Risk Awareness (Global Financial Crisis)
• Security and Data Breaches
• Ubiquitous Broadband
• Advances in Machine Learning and AI
10. Challenges to Existing Data Management (DM) Approaches
• Relational database management systems and desktop statistics and visualization packages often have difficulty handling big data.
• IT, data governance (DG), and data quality (DQ) "paradigms" have difficulty coping:
   - Enterprise data warehouses struggle with variety
   - The limitations of ETL-based architectures have become more widely understood
   - Centralized DG and DQ struggle with velocity and variety
11. Challenges – continued
• Existing user tools and approaches
   - Desktop solutions (e.g., Excel) struggle with volume
   - Manual data cleansing struggles with volume and variety
   - High risk of data loss
• Limited availability of capable, experienced resources
• Technical solutions have ever-shorter half-lives
• Project funding built around a Dev -> Test -> Production model
   - Analytics work is often experimental and throw-away by nature
12. Opportunities (Technologies)
• Alternatives to:
   - The relational data model (NoSQL)
   - Embedded SQL engines (data processing engines)
   - ETL architectures (wrangling, streaming)
• Mainstream availability of clustering and in-memory hardware/software solutions
• Availability of algorithms for dealing with text and other "unstructured" data has increased dramatically
• Products and services that provide "out of the box" machine learning capabilities
• Products and services that provide "out of the box" support for combining analytics and operational capabilities
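Streaming, listed above as an alternative to ETL architectures, processes each record as it arrives instead of waiting for a scheduled batch load. The sketch below is a minimal illustration under stated assumptions: a hypothetical stream of (member_id, amount) events, with per-key count and running mean kept in memory. It is not tied to any particular streaming product, which would add partitioning, persistence, and fault tolerance.

```python
from collections import defaultdict
from typing import Iterable, Iterator, Tuple

Event = Tuple[str, float]  # hypothetical (key, amount) event


def running_stats(events: Iterable[Event]) -> Iterator[Tuple[str, int, float]]:
    """Update per-key count and mean one event at a time and emit the new state.

    Unlike a batch ETL job, an updated result is available after every event
    rather than only after the whole input has been loaded.
    """
    counts = defaultdict(int)
    sums = defaultdict(float)
    for key, amount in events:
        counts[key] += 1
        sums[key] += amount
        yield key, counts[key], sums[key] / counts[key]  # key, count, running mean


stream = [("m1", 10.0), ("m2", 4.0), ("m1", 20.0), ("m2", 6.0), ("m1", 30.0)]
for key, n, avg in running_stats(stream):
    print(f"{key}: {n} events, running mean {avg:.2f}")
```

Because the function is a generator, it works the same on a finite list (as here) or on an unbounded source such as a message-queue consumer.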
18. Machine Learning
Cognitive computing leverages machine learning and artificial intelligence to infer and predict, and offers tremendous potential to augment human expertise.
• ML development process
   - Goal determination (requirements, outcomes)
   - Data analysis (discovery and wrangling)
   - Model training
   - Evaluation
   - Deployment and monitoring
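The five steps of the ML development process can be sketched end to end in a few lines. This is a minimal illustration on synthetic, hypothetical member data; the nearest-centroid model, the 0.8 accuracy target, and the drift tolerance are illustrative assumptions chosen for brevity, not the approach used in any of the case studies.

```python
import math
import random
from statistics import mean

# Step 1, goal determination (hypothetical): flag members likely to lapse,
# and only deploy if held-out accuracy reaches an agreed threshold.
ACCURACY_TARGET = 0.8

# Synthetic member records: (visits_per_month, years_as_member, lapsed).
# None marks a missing value, as often found in real extracts.
random.seed(7)
raw = []
for _ in range(200):
    lapsed = random.random() < 0.5
    visits = random.gauss(2.0 if lapsed else 8.0, 1.5)
    years = random.gauss(1.0 if lapsed else 4.0, 1.0)
    raw.append((visits if random.random() > 0.05 else None, years, int(lapsed)))

# Step 2, data analysis and wrangling: drop incomplete rows, split train/test.
clean = [r for r in raw if None not in r]
random.shuffle(clean)
split = int(0.7 * len(clean))
train, test = clean[:split], clean[split:]


# Step 3, model training: a nearest-centroid classifier (one mean point per class).
def fit_centroids(rows):
    centroids = {}
    for label in (0, 1):
        points = [r[:2] for r in rows if r[2] == label]
        centroids[label] = (mean(p[0] for p in points), mean(p[1] for p in points))
    return centroids


def predict(centroids, features):
    """Return the class whose centroid is closest (Euclidean) to `features`."""
    return min(centroids, key=lambda label: math.dist(features, centroids[label]))


model = fit_centroids(train)

# Step 4, evaluation on data the model has not seen during training.
accuracy = sum(predict(model, r[:2]) == r[2] for r in test) / len(test)
print(f"held-out accuracy: {accuracy:.2f}")

# Step 5, deployment and monitoring: deploy only if the goal is met, then
# watch incoming data for drift (here, a shift in mean visits vs. training).
deployed = accuracy >= ACCURACY_TARGET
train_visits = mean(r[0] for r in train)


def drifted(new_rows, tolerance=2.0):
    """True if the mean of visits in `new_rows` moved more than `tolerance`."""
    return abs(mean(r[0] for r in new_rows) - train_visits) > tolerance
```

Evaluation against a goal agreed in step 1, rather than against whatever accuracy the model happens to reach, is what makes the deploy decision in step 5 auditable.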
19. Opportunities (Process / Approaches)
• Collaboration capabilities appearing in analytics and MDM tools
• API services for data quality and data enhancement
• Crowdsourcing services
• Data as a service
• An explosion of research, books, and courseware targeting analytics, big data architecture, and solutions
20. Opportunities (DG and DQ)
• Analyst-driven data sourcing (self-service data prep)
• Data catalogs
• Transparent, repeatable sourcing and analysis
• Collaborative governance (aka ‘expert sourcing’)
• Crowdsourcing, consensus-based DQ
• DQ-based machine learning (aka ‘data curation’)
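Consensus-based DQ can be as simple as collecting several independent answers to the same data-quality question and accepting a value only when enough reviewers agree, escalating the rest to an expert (the collaborative governance idea above). A minimal sketch, assuming hypothetical thresholds (at least 3 votes, at least 60% agreement) and a hypothetical "data steward" escalation path:

```python
from collections import Counter


def consensus(votes, min_votes=3, min_agreement=0.6):
    """Return (value, status) for one data-quality question."""
    if len(votes) < min_votes:
        return None, "needs more votes"
    value, count = Counter(votes).most_common(1)[0]
    if count / len(votes) >= min_agreement:
        return value, "accepted"
    return None, "escalate to data steward"


# Hypothetical crowd answers to "what is this record's correct country code?"
reviews = {
    "rec-001": ["US", "US", "US", "CA"],
    "rec-002": ["GB", "UK", "GB", "IE", "UK"],
    "rec-003": ["DE", "DE"],
}
for record_id, votes in reviews.items():
    print(record_id, consensus(votes))
```

Here rec-001 is accepted (3 of 4 agree), rec-002 is escalated because no answer reaches 60%, and rec-003 waits for more reviewers.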
23. Data Wrangling / Data Prep
“Data preparation tools have emerged as a vital method for analysts to quickly source, blend, and wrangle data independent of enterprise architecture’s (EA) data management processes.” (Forrester)
• Features / benefits
   - Agility (build and validate in a single process)
   - Repeatability / transparency
   - Easy to use, with many ‘advanced’ features
   - Collaboration
   - Discovery, cleaning, enrichment, publishing, ...
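Repeatability and transparency come from expressing the preparation as an ordered, named recipe of steps rather than as one-off manual edits. The sketch below is a minimal, tool-agnostic illustration in plain Python; the input rows, the region lookup table, and the step names are hypothetical, and commercial data prep tools capture the same idea through their own (different) interfaces. It covers cleaning, enrichment, and publishing.

```python
import csv
import io

# Hypothetical input extract.
RAW = [
    {"member": " Alice ", "state": "nj", "spend": "120"},
    {"member": "Bob", "state": "NY", "spend": ""},
    {"member": " Alice ", "state": "nj", "spend": "120"},  # duplicate row
]

# Hypothetical reference data used for enrichment.
REGION = {"NJ": "Northeast", "NY": "Northeast", "CA": "West"}


def clean(rows):
    """Trim whitespace, normalize state codes, drop exact duplicates."""
    seen, out = set(), []
    for r in rows:
        r = {k: v.strip() for k, v in r.items()}
        r["state"] = r["state"].upper()
        key = tuple(sorted(r.items()))
        if key not in seen:
            seen.add(key)
            out.append(r)
    return out


def enrich(rows):
    """Add a region column from the lookup table ('Unknown' if absent)."""
    return [{**r, "region": REGION.get(r["state"], "Unknown")} for r in rows]


def flag_missing(rows):
    """Mark rows with no spend value instead of silently dropping them."""
    return [{**r, "spend_missing": r["spend"] == ""} for r in rows]


# The recipe is an ordered list of named steps, so it can be reviewed,
# shared, and re-run unchanged on the next extract.
RECIPE = [("clean", clean), ("enrich", enrich), ("flag_missing", flag_missing)]


def run(recipe, rows):
    for name, step in recipe:
        rows = step(rows)
        print(f"{name}: {len(rows)} rows")  # simple audit trail
    return rows


def publish(rows) -> str:
    """Serialize the prepared rows as CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


result = run(RECIPE, RAW)
print(publish(result))
```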
24. Case Studies
• Massive increase in data volume
• Machine learning: member retention
• Sentiment analysis: improving survey analysis
• Crowdsourcing: initial match evaluation and merge
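For the sentiment analysis case study, the simplest technique for scoring free-text survey answers is lexicon-based: sum per-word polarity scores and flip the sign after a negation word. The sketch below assumes a tiny hypothetical lexicon and hypothetical responses; it illustrates the general technique only and is not the method used in the case study, which is not described here. Production work typically uses much larger, domain-tuned lexicons or trained models.

```python
import re

# Hypothetical, tiny sentiment lexicon (word -> polarity score).
LEXICON = {
    "great": 2, "helpful": 1, "easy": 1, "love": 2,
    "slow": -1, "confusing": -1, "rude": -2, "terrible": -2,
}
NEGATORS = {"not", "never", "no"}


def sentiment(text: str) -> int:
    """Sum lexicon scores, flipping the sign of a word preceded by a negator."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0
    for i, tok in enumerate(tokens):
        value = LEXICON.get(tok, 0)
        if i > 0 and tokens[i - 1] in NEGATORS:
            value = -value
        score += value
    return score


def label(score: int) -> str:
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"


# Hypothetical free-text survey answers.
responses = [
    "Staff were great and very helpful",
    "The website is slow and confusing",
    "Checkout was not easy",
    "Renewed my membership last week",
]
for r in responses:
    s = sentiment(r)
    print(f"{label(s):>8} ({s:+d}): {r}")
```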
25. Conclusion
• Separate the mythology from the technology and approaches
• Leverage the hype around Big Data to make improvements
• Understand which of the 3 Vs you want to focus on
• The most important aspects are still:
   - Business goals
   - Culture
   - People
• Leverage:
   - Open source
   - Cloud computing
   - SaaS
26. Thank You!
For additional information:
Robert Quinn
Solution Architect
info@fyisolutions.com or call 973.331.9050