ADV Slides: Data Curation for Artificial Intelligence Strategies

This webinar will focus on the promise AI holds for organizations in every industry and of every size, and on how to overcome today's challenges in preparing the organization for AI and planning AI applications.

The foundation for AI is data. You must have enough data to analyze and build models. Your data determines the depth of AI you can achieve — for example, statistical modeling, machine learning, or deep learning — and its accuracy. The increased availability of data is the single biggest contributor to the uptake in AI where it is thriving. Indeed, data’s highest use in the organization soon will be training algorithms. AI is providing a powerful foundation for impending competitive advantage and business disruption.

Published in: Data & Analytics

  1. 1. Data Curation for Artificial Intelligence Strategies Presented by: William McKnight President, McKnight Consulting Group @williammcknight www.mcknightcg.com (214) 514-1444
  2. 2. AI in Action • Enhance in-car navigation using computer vision • Reduce the cost of handling misplaced items • Improve call center experiences with chatbots • Improve financial fraud detection and reduce costly false positives • Automate paper-based, human-intensive processes and reduce document verification • Predict flight delays based on maintenance records and past flights, in order to reduce the cost associated with delays
  3. 3. What’s New is Deep Learning • AI: 1950s • Machine Learning: 2000s – supervised learning, unsupervised learning, reinforcement learning • Deep Learning: 2010s – Higher Predictive Accuracy – Can Analyze All Data Sets Deep Learning allows more complex problems to be tackled, and others to be solved with higher accuracy, with less cumbersome manual fine-tuning
  4. 4. AI Affects the Entire Organization • Strategic • Technical • Operational • Talent • Data
  5. 5. Where to Look for AI Opportunities • The products you make and the services you offer • The supply chain for those products and services • Business operations (hiring, procurement, after-sale service, etc.) • The intelligence used in determining and designing your product and service set • The intelligence used in the marketing/approval funnel for your products and services
  6. 6. AI is on the Data Maturity Spectrum • Maturity Level 4 (of 5) • Data Strategy: Data as asset in financial statements / executives; All development is within architecture; All in on AI • Architecture: EDW with DQ above standard; 3 & 5 year architecture plans • Technology: DI=streaming; Graph db for relationship data; Specialized analytic stores for workloads with requirements not suited for the EDW; EDW columnar; No ODS; minimal cubes; MDM – all functions for all major subject areas; Looking at GPU DBMS • Organization: Data Governance by subject area across all major subject areas; Organizational Change Management program is part of all projects; True Self-Service Business Intelligence; Chief Information Architect
  7. 7. AI Data • Governance and Quality • Curated, Most/All Data • At Scale, History • High Velocity • Integrated • Training Data Curation
  8. 8. Data to Collect • This is wide ranging, spanning all current data • eCommerce • ERP / CRM • IoT (e.g., Heavy Industry, Factory, Consumer, Health, Aircraft) – Equipment performance – Forecast breakdowns – Health risk • Publicly available (e.g., governmental) • Third party • Careful of overfitting
  9. 9. AI Data • Call center recordings and chat logs – content and data relationships as well as answers to questions • Streaming sensor data, historical maintenance records and search logs – use cases and user problems • Customer account data and purchase history – similarities in buyers and predict responses to offers • Email response metrics – processed with text content of offers to surface buyer segments • Product catalogs and data sheets – sources of attributes and attribute values • Public references – procedures, tool lists, and product associations • YouTube video content audio tracks – converted to text and mined for product associations • User website behaviors – correlated with offers and dynamic content • Sentiment analysis, user-generated content, social graph data, and other external data sources – mined and recombined to yield knowledge and user-intent signals
  10. 10. Example: Data for Predictive Maintenance • Structured Data – Time Series – Events – Graph • Unstructured Data – Text – Image – Sound
  11. 11. Where to put data for Machine Learning • Cloud Storage • DBMS • HDFS – optimized for sequential read/writes • Unstructured Data Stores • Text-based serializations (CSV, JSON) – for interoperability
  12. 12. AI Pattern 1. Hire/Grow Data Science 2. Uncouple AI from Organizational Constraints – While Conforming the Organization 3. Ideation 4. Compile Data! – Internal and External 5. Label Data 6. Build Model 7. Prototype 8. Iterate 9. Productionalize 10. Scale
  13. 13. Algorithm & Data Matching • Naive Bayes Classification • Ordinary Least Squares Regression • Logistic Regression • Try Multiple; Run Contests
  14. 14. AI Business Use Case Examples • Marketing – segmentation analysis, campaign effectiveness • Cybersecurity – proactive data collection and analysis of threats • Smart Cities – track vehicle movements, traffic data, environmental factors to optimize traffic lights, ensure smooth flow and manage tolling • Retail, Manufacturing – Supply flow, Customer flow • Oil and Gas – determine drilling patterns, ensure maximum utilization of assets, manage operational expenses, ensure safety, predictive maintenance • Life Sciences – study human genome (100s MB/person) for improving health Enterprise Data Domains • Customer • Employee • Partner • Patient • Supplier • Product • Bill of Materials • Assets • Equipment • Media • Agencies • Branches • Facilities • Franchises • Stores • Account • Certifications • Contracts • Financials • Policies
  15. 15. Temperature Management https://www.theguardian.com/sustainable-business/2017/feb/21/urban-heat-islands-cooling-things-down-with-trees-green-roads-and-fewer-cars
  16. 16. Large-scale Video Management https://arxiv.org/pdf/1712.01432.pdf
  17. 17. Satellite or Aerial Data https://medium.com/the-downlinq/car-localization-and-counting-with-overhead-imagery-an-interactive-exploration-9d5a029a596b
  18. 18. Corporate Requirements > Data • The split of the necessary AI/ML between the 'edge' of corporate users and the software itself is still to be determined • Math – floating point arithmetic, deep statistics, and linear algebra • GPUs • Python – easy to program and good enough – NumPy and pandas libraries are available • TensorFlow – adds a computational/symbolic graph to Python • R and MATLAB – optimized for math with features such as direct slice and dice of matrices and rich libraries to draw from • Java and Scala – work well with Hadoop and Spark respectively
  19. 19. Data Curation for Artificial Intelligence Strategies Presented by: William McKnight President, McKnight Consulting Group @williammcknight www.mcknightcg.com (214) 514-1444
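
Slide 13 lists Ordinary Least Squares Regression among the candidate algorithms and advises "Try Multiple; Run Contests", while slide 8 warns about overfitting. One way to read that advice is sketched below: fit each candidate on training data and compare them on held-out data. This is only an illustration, not the presenter's method. The synthetic data, the mean baseline, and the names `fit_mean`, `fit_ols`, and `run_contest` are all assumptions made for the example.

```python
import random


def fit_mean(xs, ys):
    """Baseline candidate: always predict the mean of the training targets."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_ols(xs, ys):
    """Ordinary least squares for a single feature, using the closed form."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def mse(model, xs, ys):
    """Mean squared error of a fitted model on the given data."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def run_contest(candidates, train, holdout):
    """Fit every candidate on train, score it on holdout, return (winner, scores)."""
    scores = {}
    for name, fit in candidates.items():
        model = fit(*train)
        scores[name] = mse(model, *holdout)
    winner = min(scores, key=scores.get)
    return winner, scores


# Synthetic data, made up for illustration: y = 3x + 2 plus Gaussian noise.
rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = [3 * x + 2 + rng.gauss(0, 1) for x in xs]

# Hold out the last 50 points so the score reflects unseen data.
train = (xs[:150], ys[:150])
holdout = (xs[150:], ys[150:])

winner, scores = run_contest({"mean_baseline": fit_mean, "ols": fit_ols}, train, holdout)
print(winner, scores)
```

Scoring on held-out rows rather than on the training rows is what makes the contest a guard against overfitting: a candidate that merely memorizes its training data gains nothing on data it has not seen.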

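
Slide 10 names time series and events as structured data for predictive maintenance, and slide 12 includes "Label Data" as a step before building a model. A minimal sketch of how those two data types could be combined into labels follows. The labeling rule (a reading is positive if a failure happens within a fixed horizon after it), the sample readings, and the function name `label_readings` are assumptions for illustration and do not come from the slides.

```python
from datetime import datetime, timedelta


def label_readings(readings, failure_times, horizon):
    """Attach a label to each (timestamp, value) reading.

    The label is 1 if any failure event falls within `horizon` after the
    reading's timestamp (inclusive at both ends), otherwise 0.
    """
    labeled = []
    for ts, value in readings:
        fails_soon = any(ts <= f <= ts + horizon for f in failure_times)
        labeled.append((ts, value, 1 if fails_soon else 0))
    return labeled


# Hypothetical hourly sensor readings and one failure event at 04:30.
start = datetime(2024, 1, 1)
readings = [(start + timedelta(hours=h), 70.0 + h) for h in range(6)]
failures = [start + timedelta(hours=4, minutes=30)]

labeled = label_readings(readings, failures, horizon=timedelta(hours=2))
labels = [label for _, _, label in labeled]
print(labels)  # [0, 0, 0, 1, 1, 0]
```

The horizon is a design choice: a longer horizon gives more positive examples and earlier warnings, at the cost of labeling readings that are only loosely related to the eventual failure.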