Bp presentation business intelligence and advanced data analytics september 25 2012 icpas, v2, 20130915
 

    Presentation Transcript

    • BUSINESS INTELLIGENCE & ADVANCED ANALYTICS The Search for Patterns, Waldo, and Black Swans Barrett Peterson, C.P.A. ICPAS Chicago Metro Chapter, September 25, 2013 ICPAS Metro Chapter Barrett Peterson September 25, 2013 1
    • WHY BUSINESS INTELLIGENCE? Information Good Data Good Analysis ICPAS Metro Chapter Barrett Peterson September 25, 2013 2
    • BIG DATA AND ANALYTICS - WHY PREDICTION and PATTERN IDENTIFICATION ICPAS Metro Chapter Barrett Peterson September 25, 2013 3
    • • Digitization Datafication • Correlation, more than causality • Reduced emphasis on sampling • “Messy” data usable for many applications, but not all BIG DATA AND ANALYTICS – CRITICAL ATTRIBUTES ICPAS Metro Chapter Barrett Peterson September 25, 2013 4
    • • Reduced privacy and handling “private” data • Over-reliance on, and over-confidence in, data and analysis • Currency – correlations can change over time • Predictions are hard to make, especially about the future. - Niels Bohr [Not Yogi Berra]. BIG DATA AND ANALYTICS - RISKS ICPAS Metro Chapter Barrett Peterson September 25, 2013 5
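The "currency" risk on this slide can be illustrated with a small calculation. The sketch below is only an illustration under assumed data: the monthly `ad_spend` and `unit_sales` figures are hypothetical, and the Pearson coefficient is computed by hand so that no external library is needed.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical monthly figures (not from the presentation).
ad_spend = [10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32]
unit_sales = [50, 54, 59, 63, 68, 72, 71, 69, 66, 64, 61, 58]

# The same pair of measures, correlated over two different periods.
early = pearson(ad_spend[:6], unit_sales[:6])   # first six months
late = pearson(ad_spend[6:], unit_sales[6:])    # last six months

print(f"first half: {early:+.2f}, second half: {late:+.2f}")
```

In this made-up series the relationship is strongly positive in the first half and strongly negative in the second, so a model fitted on the early months would mislead in the later ones.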
    • HISTORY AND BACKGROUND ICPAS Metro Chapter Barrett Peterson September 25, 2013 6
    • • Computer-based business intelligence is an idea that is middle-aged – about 40 years old. Previously described as: – Decision support systems [DSS] – Executive information systems [EIS] – Management information systems [MIS] A LITTLE BACKGROUND HISTORY A trip down memory lane ICPAS Metro Chapter Barrett Peterson September 25, 2013 7
    • • Internet Development – ARPANET and others – 1960s – Internet Protocols – 1982, presumably by Al Gore • IBM researcher Edgar Codd credited with development of relational database theory in 1970. • IBM’s Donald Chamberlin and Raymond Boyce developed Structured Query Language [SQL] in the early 1970s to manipulate and retrieve data from IBM’s early relational database management system • World Wide Web and 1st web browser invented by Tim Berners-Lee in 1990 by combining the internet, hypertext mark-up language, and Uniform Resource Locator [URL] system. The browser was later renamed Nexus. • Mosaic, designed by Marc Andreessen, was the forerunner of the first commercial web browser [Netscape]. • Development of big data enabling database designs and high speed processing during the last 15 years. A LITTLE BACKGROUND History Important Technology Inventions ICPAS Metro Chapter Barrett Peterson September 25, 2013 8
    • • Development of the primary infrastructure – Database design – Processing and Storage Hardware – Server Development and Massively Parallel Processing • Improved telecommunications speed • Hardware miniaturization, capacity, and speed – Memory [RAM] capacity – Storage capacity and transfer speed – Bus speed – Video processing capacity and speed • Increased hardware speed and capacity • Digital formats for sensors, cameras, RFID, and other data collection sources • Mobile computing • “Cloud” capability exploits many of these developments A LITTLE BACKGROUND History Drivers Enabling BI and Advanced Analytics ICPAS Metro Chapter Barrett Peterson September 25, 2013 9
    • • Analytics • Business Intelligence • Knowledge Management • Content Management • Data Mining • Big Data • Data Integration • Datafication • Gamification • Blob [Binary Large Object] A LITTLE BACKGROUND TERMINOLOGY A consultant’s collection of confusing names - a sampler ICPAS Metro Chapter Barrett Peterson September 25, 2013 10
    • • CPU speed and power – Moore’s law – Multi-core chips – Solid State Memory • Storage improvement and cost reduction – Greatly increased capacity – petabytes and more; IBM’s first hard drive in 1956 was 3.75MB – Greatly increased access/transfer speed – Greatly reduced cost • Data collection from a wide range of devices • Data communications – speed and volume • Database management techniques and software • Application speed and power A LITTLE BACKGROUND Drivers And Enablers of Big Data ICPAS Metro Chapter Barrett Peterson September 25, 2013 11
    • BUSINESS INTELLIGENCE AND ADVANCED ANALYTICS DEFINED ICPAS Metro Chapter Barrett Peterson September 25, 2013 12
    • A system composed of “computer” hardware, storage hardware, operating system, database software, file systems, and application software to: • Collect, “clean”, filter, “tag”, and integrate data • Store data [hardware and software] • Provide knowledge management, analytical, and presentation tools to translate data into decision-useful information TONIGHT’S CRITICAL DEFINITIONS Business Intelligence ICPAS Metro Chapter Barrett Peterson September 25, 2013 13
    • • Prehistoric – Mainframe Era – DSS, EIS, MIS – Hierarchical Master Data Files • The Current Era [Primarily] – Business Intelligence – Primarily “structured” data [data that can be represented in relational/dimensional tables or flat files], and BLOB [binary large object] formats – Analysis of “known”, defined patterns – Presented in tables, simple charts, and dashboards • Emerging – Big Data and Advanced Analytics – to discover new, changing, or variable patterns – A wide variety of “unstructured” digital data formats added to “structured” data – Emerging storage structures – “Exploratory” analytics – Zoomable User Interface [ZUIs] – Solid State Memory and Solid-State Drives TONIGHT’S CRITICAL DEFINITIONS Business Intelligence Generations ICPAS Metro Chapter Barrett Peterson September 25, 2013 14
    • THE HARDWARE AND SOFTWARE ELEMENTS OF BUSINESS INTELLIGENCE ICPAS Metro Chapter Barrett Peterson September 25, 2013 15
    • • Computer – CPU, Memory, and Operating System Software • Data Collection – Master Data Management – Collection Processes and Devices – Data Cleansing Processes and Software • Data Storage – Petabyte capable – Physical Devices and Storage Management Software – Data Management and Integration – Database Software Storage • Relational – Traditional ERP/Transaction systems • Dimensional – Traditional Data Warehouse, including associated BLOB • Distributed, Multiple Server Storage Systems • NoSQL [Not Only SQL] Distributed Operational Stores • Apache Hadoop for Highly Parallel Processing and certain Intensive Data Analytics Applications • DBMS System: Apache Cassandra; Amazon Dynamo • Middleware Software • High Speed Data Communications – Petaflop capable • Business Intelligence Application Software – OLAP, Dashboard, and Chart Reports – Statistical Analysis and Presentation Tools BUSINESS INTELLIGENCE ELEMENTS Principal Components for Maximum Application ICPAS Metro Chapter Barrett Peterson September 25, 2013 16
    • • Data Governance and Management – Uniform terminology – Uniform meaning – Uniform units of measure – Metadata • Data Structure and Attributes – Structured - Relational/Dimensional – Unstructured – Rate of change, context, and other attributes • Data Collection and Preparation – Filtering, particularly “Big Data”, and “tagging” – Extract, Transform, Load [ETL] for “structured” data • Data Base File Systems • Data Storage and Retrieval – Capacity – Access/Retrieval speed BUSINESS INTELLIGENCE ELEMENTS DATA ISSUES: THE CORNERSTONE ICPAS Metro Chapter Barrett Peterson September 25, 2013 17
    • • Metadata management – Business definitions, rules, sources – Technical attributes, such as type, scale, transformation methods – Processing requirements – filtering, tagging, ETL, aggregation, summarization • Data Definitions and data dictionaries – Name – Unit(s) of measure • Data collection and filtering or transforming requirements – Sources – internal and external – Context addition/filtering requirements • Data integration specifications – Multiple platforms and applications – Mapping to intermediate data marts • Privacy requirements – Personal Identifying data – Laws: HIPAA, Privacy Act BUSINESS INTELLIGENCE ELEMENTS MASTER DATA GOVERNANCE AND MANAGEMENT ICPAS Metro Chapter Barrett Peterson September 25, 2013 18
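A data dictionary of the kind listed on this slide (name, unit of measure, source, privacy flag) can be sketched in a few lines. Everything below is a hypothetical illustration: the `DataDefinition` class, the three entries, and the `check_record` helper are assumptions made for this example, not part of the presentation or of any particular product.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DataDefinition:
    """One data dictionary entry: uniform name, unit, and source."""
    name: str
    unit: str
    source: str
    contains_personal_data: bool = False

# Hypothetical entries, keyed by the agreed field name.
DATA_DICTIONARY = {
    d.name: d
    for d in [
        DataDefinition("net_sales", "USD", "ERP general ledger"),
        DataDefinition("units_shipped", "each", "warehouse system"),
        DataDefinition("customer_email", "text", "CRM",
                       contains_personal_data=True),
    ]
}

def check_record(record):
    """Return a list of governance problems found in one incoming record.

    `record` maps a field name to a (value, unit) pair.
    """
    problems = []
    for field_name, (_value, unit) in record.items():
        definition = DATA_DICTIONARY.get(field_name)
        if definition is None:
            problems.append(f"{field_name}: not in data dictionary")
        elif unit != definition.unit:
            problems.append(
                f"{field_name}: unit {unit!r}, expected {definition.unit!r}"
            )
    return problems

issues = check_record({"net_sales": (1200.0, "EUR"), "returns": (3, "each")})
print(issues)
```

The check flags a unit that differs from the agreed one and a field that was never defined, which are the two kinds of inconsistency that uniform terminology and units of measure are meant to prevent.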
    • • Data Structures – “Structured” Data, principally text and numbers capable of incorporation in relational or dimensional tables – “Unstructured” Data, not suitable for relational tables, many in newer data formats, including images • Big Data Attributes – Both “structured” and “unstructured” – The four major “Vs” of big data • Volume - huge • Velocity – fast changing, unlike structured • Variety – format and content • Variability – lacks the consistency, and perhaps precision, of structured data BUSINESS INTELLIGENCE ELEMENTS Data Structures and Attributes Are Critical Drivers ICPAS Metro Chapter Barrett Peterson September 25, 2013 19
    • • Content Structure – Traditional Financial Data – Numerical – Sign/Debit or Credit – Text Descriptions • Database Management Structures – Legacy Systems: Hierarchical and Network – Transaction Systems: Relational • Relations [Tables], Attributes [Columns], Instances [Rows] • Rules: no duplicate rows; single value for attributes – Warehouse Systems: Dimensional • Facts [data items, usually a dollar amount or unit count] • Measures – dollar or count for facts • Dimensions – groups of hierarchies and descriptors of various aspects or context for the facts/measures – Big Data Databases Unstructured • Microsoft Office and Similar File Formats • Photography and Art BUSINESS INTELLIGENCE ELEMENTS Data Structures IT Lingo ICPAS Metro Chapter Barrett Peterson September 25, 2013 20
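The dimensional vocabulary on this slide (facts, measures, dimensions) can be made concrete with a tiny warehouse-style query. This is a minimal sketch using Python's built-in `sqlite3` module; the table names `fact_sales` and `dim_product` and all of the rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension: descriptive context for the facts.
    CREATE TABLE dim_product (
        product_id INTEGER PRIMARY KEY,
        category   TEXT NOT NULL
    );
    -- Fact table: one row per sale, with a dollar measure.
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        sale_month TEXT NOT NULL,
        amount_usd REAL NOT NULL
    );
    INSERT INTO dim_product VALUES (1, 'Frozen'), (2, 'Frozen'), (3, 'Snacks');
    INSERT INTO fact_sales VALUES
        (1, '2013-08', 100.0), (2, '2013-08', 50.0),
        (3, '2013-08', 75.0),  (1, '2013-09', 25.0);
""")

# Summarize the measure (amount_usd) along one dimension (category).
totals = conn.execute("""
    SELECT d.category, SUM(f.amount_usd)
    FROM fact_sales AS f
    JOIN dim_product AS d ON d.product_id = f.product_id
    GROUP BY d.category
    ORDER BY d.category
""").fetchall()

print(totals)
```

Adding another dimension table, for example a calendar, would allow the same measure to be rolled up by quarter or year without changing the fact table.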
    • RELATIONAL TABLE ILLUSTRATION “Tuple” is borrowed from mathematics and set theory and is used in database design to refer to the attributes of an “item” or “value” [row], the subject or title of the table. Value examples include customers, vendors, orders, product SKUs Business Intelligence Elements ICPAS Metro Chapter Barrett Peterson September 25, 2013 21
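Since the illustration on this slide is an image that does not survive in the transcript, a small stand-in may help. The sketch below uses Python's built-in `sqlite3` module with a hypothetical `customer` table. It shows each row coming back as a tuple of attribute values, and a primary key rejecting a duplicate row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,  -- identifies the row
        name        TEXT NOT NULL,        -- one value per attribute
        city        TEXT NOT NULL
    )
""")
conn.execute("INSERT INTO customer VALUES (1, 'Acme Supply', 'Chicago')")

# Inserting the same row again violates the primary key constraint.
try:
    conn.execute("INSERT INTO customer VALUES (1, 'Acme Supply', 'Chicago')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Each fetched row is a Python tuple: (customer_id, name, city).
rows = conn.execute("SELECT customer_id, name, city FROM customer").fetchall()
print(rows, duplicate_rejected)
```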
    • BUSINESS INTELLIGENCE ELEMENTS MATH CAN BE COMPLICATED ICPAS Metro Chapter Barrett Peterson September 25, 2013 22
    • • Numbers and words/letters – Relational/Dimensional – Spreadsheets – Word Processing documents • Sound and Music • Photo • Video • Video Game • CAD Design • Graphical – PDF – Raster, Vector Graphics – Statistical Visualization • Scientific • Signal • XML [Web based mark-up formats] • Geo-Location • Web Logs BUSINESS INTELLIGENCE ELEMENTS DATA FILE TYPE CATEGORIES, ALMOST ENGLISH ICPAS Metro Chapter Barrett Peterson September 25, 2013 23
    • • Collection – Company transaction/ERP systems – Purchased, such as Nielsen, IRI – Vendor supplied, such as bank transactions – Sensor readings – Cameras – Mobile device traffic – Phones, Tablets • Filtering – Adding context such as date or location – Eliminating “chatter” from high volume data – Error correction • Aggregation & Integration BUSINESS INTELLIGENCE ELEMENTS DATA COLLECTION AND PREPARATION ICPAS Metro Chapter Barrett Peterson September 25, 2013 24
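The filtering and aggregation steps listed on this slide can be sketched as a short pipeline. The sensor readings, the `SENSOR_LOCATION` lookup, and the `prepare` function below are all hypothetical. Dropping failed reads is used here as a simple stand-in for error correction, and "chatter" is assumed to mean an unchanged repeat of a sensor's last value.

```python
from collections import defaultdict

# Hypothetical raw readings: (sensor_id, value); None marks a failed read.
raw = [
    ("dock-1", 20.1), ("dock-1", 20.1), ("dock-1", None),
    ("dock-2", 18.4), ("dock-1", 20.9), ("dock-2", 18.4), ("dock-2", 19.0),
]

# Hypothetical context to attach to each reading.
SENSOR_LOCATION = {"dock-1": "Chicago", "dock-2": "Joliet"}

def prepare(readings):
    """Drop failed reads and unchanged repeats, then add location context."""
    last_seen = {}
    for sensor_id, value in readings:
        if value is None:                      # failed read: discard
            continue
        if last_seen.get(sensor_id) == value:  # chatter: nothing changed
            continue
        last_seen[sensor_id] = value
        yield {
            "sensor": sensor_id,
            "value": value,
            "location": SENSOR_LOCATION[sensor_id],
        }

clean = list(prepare(raw))

# Aggregation: average of the cleaned readings per location.
by_location = defaultdict(list)
for row in clean:
    by_location[row["location"]].append(row["value"])
averages = {loc: sum(vals) / len(vals) for loc, vals in by_location.items()}

print(len(raw), "raw readings ->", len(clean), "after filtering")
print(averages)
```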
    • DATA COLLECTION - RFID RFID tag RFID tag reader ICPAS Metro Chapter Barrett Peterson September 25, 2013 25
    • DATA COLLECTION Various sensors Surveillance Camera ICPAS Metro Chapter Barrett Peterson September 25, 2013 26
    • DATA FILTERING AND CLEANSING IS IMPORTANT ICPAS Metro Chapter Barrett Peterson September 25, 2013 27
    • • Relational – SQL • Dimensional – SQL, OLAP • Binary Large Object [BLOB] – binary data, most often photos, video, audio, or PDF files • Massively Parallel-Processing [MPP] • Apache Hadoop Distributed File System [HDFS] – Java – Google File System [GFS], used solely by Google – Google MapReduce • Amazon S3 filesystem [used by Amazon] • NoSQL, MySQL • Storm • Resource Description Framework [RDF] Databases, like Big Data BUSINESS INTELLIGENCE ELEMENTS DATA BASE FILE SYSTEMS ICPAS Metro Chapter Barrett Peterson September 25, 2013 28
    • BUSINESS INTELLIGENCE ELEMENTS SELECT BIG DATA DATABASE MANAGEMENT SYSTEMS • Significant Originators – Google MapReduce – Google File System [GFS] – Amazon S3 filesystem • Continuing Developments – Apache Software Foundation • Apache Cassandra distributed database management system • Apache Hadoop software framework to support data-intensive distributed applications • Apache Hive, a data warehouse structure built on Hadoop • Pig - high level programming language for creating MapReduce programs with Hadoop – Significant to Technology Development • Facebook [uses MySQL as a DBMS system, with Memcache] • Yahoo • LinkedIn [Project Voldemort] ICPAS Metro Chapter Barrett Peterson September 25, 2013 29
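The MapReduce model named on this slide can be shown in miniature. The sketch below runs in a single Python process and does not use the Hadoop API. The function names `map_phase`, `shuffle`, and `reduce_phase` and the two sample documents are assumptions chosen for illustration. In a real cluster the map and reduce steps would run in parallel on many servers.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)

# Hypothetical input; each document could be mapped on a different server.
documents = ["big data gets bigger", "big data and analytics"]

mapped = [pair for doc in documents for pair in map_phase(doc)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())

print(counts)
```

Because each document is mapped independently and each key is reduced independently, the work can be spread across servers, which is the property that makes the model suitable for very large data sets.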
    • • Convergence aspect of mainframes and servers • Massively parallel, multiple server, distributed processing, in multiple data centers – grid computing • Multi-core, high capacity, lower power consumption CPUs • Memory servers for RAM employing DRAM comprised of Fully Buffered Dual In-line Memory Modules [FB-DIMM] • Solid state flash drive storage • Greatly improved, and less costly, hard drive storage BUSINESS INTELLIGENCE ELEMENTS COMPUTER HARDWARE CONSIDERATIONS ICPAS Metro Chapter Barrett Peterson September 25, 2013 30
    • BI CONFIGURATION SIZES Small – BI, but not Big Data capable Medium Large – IBM Sequoia At Livermore Labs ICPAS Metro Chapter Barrett Peterson September 25, 2013 31
    • • Data Storage Terminology – Memory – CPU direct connected, often called RAM – Storage – not directly connected to the CPU • Data Storage Device Types – Memory • DRAM-based • Flash memory-based Solid-State Drives [SSDs] – Storage • Hard Disk Drives [HDD] • Optical Drives – CDs, DVDs • Data Storage Systems – Direct Attached – Network Attached Storage [NAS] – Storage Area Network [SAN] – pNFS – Parallel Network File System BUSINESS INTELLIGENCE ELEMENTS DATA STORAGE HARDWARE/ SOFTWARE ICPAS Metro Chapter Barrett Peterson September 25, 2013 32
    • • Traditional Reporting Systems – ERP systems, including extract and presentation tools – Downloads to Excel and similar programs for analysis using functions and pivot tables • Presentation Tools • Specialized Analytics – IBM InfoSphere BigInsights and InfoSphere Streams – IBM Netezza – ParAccel Analytic Database – EMC Greenplum – SAS High Performance Computing – Information Builders WebFocus • Exploratory Tools, like IBM SPSS [originally Statistical Package for the Social Sciences] – Data mining with specialized algorithms – Statistical analysis and related charting software BUSINESS INTELLIGENCE ELEMENTS BI APPLICATION SOFTWARE ICPAS Metro Chapter Barrett Peterson September 25, 2013 33
    • • BI Reporting • Predictive Analytics • Data Exploration - correlation • Data Visualization - graphical • Instrumentation Analytics • Content Analytics • Web Analytics • Functional Applications • Industry Applications • Location Tracking BUSINESS INTELLIGENCE ELEMENTS ADVANCED ANALYTICS APPLICATION TYPES ICPAS Metro Chapter Barrett Peterson September 25, 2013 34
    • BUSINESS INTELLIGENCE ELEMENTS USE STATISTICAL TECHNIQUES APPROPRIATELY ICPAS Metro Chapter Barrett Peterson September 25, 2013 35
    • ALGORITHMS CAN BE TREACHEROUS DATA MODELS HAVE LIMITS ICPAS Metro Chapter Barrett Peterson September 25, 2013 36
    • BI AND ADVANCED ANALYTICS OUTPUT ILLUSTRATIONS ICPAS Metro Chapter Barrett Peterson September 25, 2013 37
    • EXAMPLES OF USES ICPAS Metro Chapter Barrett Peterson September 25, 2013 38
    • • Sales and Operations Planning • Financial Instruments Modeling • Production Control • Online Retail • Economics and Policy Development • Agriculture/Farming • Weather Analysis/Prediction • Environmental Impact Assessment • Healthcare Diagnosis and Records Management • Genomic Analytics and Pharmaceutical and Medical Research • Natural Resource Exploration • Research Physics • Road, Rail Traffic Management • Security Surveillance: Business, Government • Astronomy • Logistics Management, Including GPS Tracking • Electrical and Telecommunications Grids Mgmt • Social Media – Facebook, LinkedIn, Google+, Twitter, YouTube, Pinterest • TV shows – Star Trek, Person of Interest • Cloud Services – computing, Storage • Credit Scoring SELECTED EXAMPLES OF USES ICPAS Metro Chapter Barrett Peterson September 25, 2013 39
    • • Retail – Amazon – Dell – Delta Sonic Car Washes • Data Services – IBM – Google – Amazon • Financial Services • Manufacturing – McCain Foods – Frozen foods – Boeing • Transportation and Logistics – Logistics – UPS, FedEx – Rail – UP, CSX, TTX – Air – United, AMR, Southwest • Social Media – LinkedIn – Facebook • Government – NSA PRISM and Other tools – CIA – Palantir Software • Medicine and Health – Centers for Disease Control and Prevention (CDC) – J. Craig Venter Institute • Science – Livermore Labs SELECTED USERS ICPAS Metro Chapter Barrett Peterson September 25, 2013 40
    • • Technical Elements – Direct on-line access – Amazon specialized “Big Data” database – Distributed and extremely large data centers – Highly automated, high technology warehouses – High supplier [vendors] integration • User Benefits – Favorable prices – Suggested associated purchases – Individual interest advertising SELECTED EXAMPLES OF USE AMAZON ICPAS Metro Chapter Barrett Peterson September 25, 2013 41
    • • Technical Elements – Web driven order entry and custom purchase configuration – Tracking of sales correspondence with promotional offers – Supplier re-order integration • User Benefits – Ability to customize purchase – Reasonable cost – Prompt delivery SELECTED EXAMPLES OF USE DELL ICPAS Metro Chapter Barrett Peterson September 25, 2013 42
    • • Technical components – Shared component and assembly designs – More detailed quality specifications and product tolerances – Control of assembly schedule – “Real time” exchange of technical information – Dissemination of best practices • Customer benefits – Faster deliveries – Increased product quality – Reduced defects SELECTED EXAMPLES OF USE BOEING ICPAS Metro Chapter Barrett Peterson September 25, 2013 43
    • • Techniques employed – Collect cellphone and GPS signals, traffic cameras, and roadside sensors – Identify accidents, traffic jams, and road damage – Dispatch emergency vehicles – Update traffic websites – Send messages to drivers’ GPS devices and cellphones – Use supercomputers running the INRIX application • Benefits – Eliminates traffic congestion faster – More timely relief for accident victims – Facilitates road paving scheduling SELECTED EXAMPLES OF USE NEW JERSEY DEPARTMENT OF TRANSPORTATION ICPAS Metro Chapter Barrett Peterson September 25, 2013 44
    • • Technical Elements – General LinkedIn Structure • Personal Profile • Individual Connections • Groups • Company and Other Searches • Endorsements • Attached application partners – Slideshare, Owned by LinkedIn • User Benefits – Networking with professional contacts – Personal branding capabilities – Business Development – Job Search enhancement SELECTED EXAMPLES OF USE LINKEDIN ICPAS Metro Chapter Barrett Peterson September 25, 2013 45
    • LINKEDIN PROFILE PAGE SAMPLE ICPAS Metro Chapter Barrett Peterson September 25, 2013 46
    • Facebook Page Sample ICPAS Metro Chapter Barrett Peterson September 25, 2013 47
    • TRENDS • More, bigger, faster – big data gets bigger • Cloud services continue to expand • Mobile computing expands • Hadoop becomes more common • Interactive data visualization will expand • Social media type platforms will increase their prominence • Analytics skills demands will increase • Privacy Issues will become prominent ICPAS Metro Chapter Barrett Peterson September 25, 2013 48
    • RESOURCES • Books • Competing on Analytics, Davenport & Harris • Analytics at Work, Davenport, Harris, & Morison • The Data Asset, Fisher • Data Strategy, Adelman, Moss, & Abai • Big Data, Mayer-Schönberger & Cukier • Websites • The Data Warehousing Institute – tdwi.org • IBM data analytics: www.ibm.com, smarter planet ICPAS Metro Chapter Barrett Peterson September 25, 2013 49
    • SUMMARY WHY USE BI AND ADVANCED ANALYTICS INSIGHT FROM DATA ICPAS Metro Chapter Barrett Peterson September 25, 2013 50