
Data analysis trend 2015 2016 v071



New Trend: Big Data Analytics as a Service
The combination of 'data analysis' with big data, open source, and cloud computing opens up a new universe of opportunities at many levels and in many places.

Published in: Data & Analytics


  1. Big Data Analytics as a Service: The Evolution of the Data Analysis Market and Reading the Trends. 2015. 12. 10, Chunmk. (Strictly Confidential)
  2. A 10-Second Story
  3. A 10-Second Story: Data analysis is rooted in statistics, which has a long history. Statistics is said to have begun in ancient Egypt, which took periodic censuses for building the pyramids. Throughout history, statistics has played an important role for governments all across the world in creating censuses, which were used for various governmental planning activities (including, of course, taxation).
  4. Contents
     I. Evolution of Data Analysis
     II. Data Analysis System: 3 Pillars
     III. Big Data, Open Source, Cloud Computing, Data Analysis
     IV. New Era: Data Analysis, Chaos
     V. Data Consumer's Needs
     VI. New Trend: Citizen Data Scientist / Smart Data Discovery
  5. Understanding the Evolution of Data Analysis Systems: the evolution viewed from three perspectives (Database Management Technology; Development of Business Intelligence & Analytic Platform; Technologies and Packages for Statistical Processing)
     • 1960s, Routinization. Database: flat-file based (tape-based storage, batch reporting). BI, "Querying & Reporting": query modules and report generators (batch querying and reporting, report generators). Statistics, "Statistical Computation": niche statistical subroutines (social science, clinical trials, agriculture).
     • 1970s, Modularization. Database: navigational DBMS (RDBMS emerged in the late 1970s). BI, "Decision Support & Modeling": early DSS tools (commercial tools for building DSS). Statistics, "1st Gen Statistical Processing": statistical software (pharma and social science; SPSS and SAS incorporated).
     • 1980s, Abstraction. Database: relational DBMS (RDBMS solutions matured; personal databases for the PC). BI, "Analytical Processing": DSS and 4GL environments (4GL, EIS, spreadsheets, descriptive analytics). Statistics, "2nd Gen Statistical Processing": PC-based statistical packages (other industries; PC-based, graphics, expert systems).
     • 1990s, Scaling & Distribution. Database: distributed DBMS (distributed architecture, clustering). BI, "Enterprise Performance Management": data warehouse and BI (BI tool market grew rapidly; web-based analytics). Statistics, "Data Mining": early data mining tools (vendors and solutions).
     • 2000s, Specialization & Extension. Database: post-relational DBMS (unstructured data, non-relational data models, large-scale distributed data). BI, "Next Gen Data Processing": data processing and analytic platforms (large-scale data processing; unstructured and real-time analytics; big data analytics). Statistics, "Next Gen Data Processing": data processing and analytics platforms (open source R-based statistical platforms; NLP and text analysis).
     • Timeline annotations: AI hyped; ML started; new ML invented.
     * Compiled from Max Kanaskar's "BIG DATA TECHNOLOGY SERIES"
  6. Understanding the Evolution of Data Analysis Systems (three perspectives): Technologies and Packages for Statistical Processing
  7. Understanding the Evolution of Data Analysis Systems (three perspectives): Database Management Technology
  8. Understanding the Evolution of Data Analysis Systems (three perspectives): Development of Business Intelligence & Analytic Platform
  9. Data Analysis System: 3 Pillars. 1. The emergence of big data: the recent definition, characterized by the 5 Vs
     • VOLUME: data object size (small to big); data object quantity (few to many)
     • VARIETY: data sources (few to many); content types (few to many); structure types (structured to unstructured); semantic diversity (low to high)
     • VELOCITY: acquisition rate (slow to fast); update rate (slow to fast)
     • VERACITY: known data sources; provenance; data integrity; governance
     • VALUE: prescriptive, predictive, decisions, recommend, findings, objectives
     * NIST, 2014
     Big data: "too big (volume), arrives too fast (velocity), changes too fast (variability), contains too much noise (veracity), too diverse (variety) to be processed within a local computing structure using traditional approaches and techniques"
     * ISO, 2014
  10. Data Analysis System: 3 Pillars. Big data is no longer an emerging technology. In August 2015, big data was dropped from Gartner's Hype Cycle, while Machine Learning and Citizen Data Science* newly appeared (new trends in data analysis replaced big data). Chart annotation: Big Data was positioned here in 2014.
     * People on the business side who may have some data skills, possibly from a math or even social science degree.
  11. Data Analysis System: 3 Pillars. 2. The shift to cloud environments
     • The emerging power of the cloud is the ability to ask previously un-askable questions.
     • Cloud computing is a transformative force addressing size, speed, and scale, with a low cost of entry and very high potential benefits.
     • Application areas: large-scale image processing, sensor data correlation, social network analysis, encryption/decryption, data mining, simulations, and pattern recognition.
     * Source: Booz Allen Hamilton
  12. Data Analysis System: 3 Pillars. Massive Data Analytics and the Cloud
     Diagram labels: HDFS, commercial hardware, resilience, elasticity, scalability, multi-tenancy, virtualization.
     Data Cloud:
     • Computing architecture for large-scale data processing and analytics
     • Designed to operate at trillions of operations/day, petabytes of storage
     • Designed for performance, scale, and data processing
     • Characterized by run-time data models and simplified development models
     Utility Cloud:
     • Computing services for outsourced IT operations
     • Concurrent, independent, multi-tenant user population
     • Service offerings such as SaaS, PaaS, and IaaS
     • Characterized by data segmentation, hosted applications, low cost of ownership, and elasticity
     * Source: Booz Allen Hamilton
  13. Data Analysis System: 3 Pillars. Various cloud-based "as a Service" models: Data Analytics as a Service, Database as a Service, Storage as a Service, Backup as a Service, …, Insights as a Service
  14. Data Analysis System: 3 Pillars. Open source software has brought revolutionary disruption to the data analysis market. Traditional (proprietary software)
  15. Data Analysis System: 3 Pillars. Open source software has brought revolutionary disruption to the data analysis market. Types of open source software related to data analysis and big data (category: open source software, 50 in total):
     • Big Data Analysis Platforms and Tools: Hadoop, MapReduce, GridGain, HPCC, Storm
     • Databases/Data Warehouses: CouchDB, OrientDB, Terrastore, FlockDB, Hibari, Riak, Hypertable, BigData, Hive, InfoBright Community Edition, Infinispan, Redis, Cassandra, HBase, MongoDB, Neo4j
     • Business Intelligence: Talend, Jaspersoft, Palo BI Suite/Jedox, Pentaho, SpagoBI, KNIME, BIRT/Actuate
     • Data Mining: RapidMiner/RapidAnalytics, Mahout, Orange, Weka, jHepWork, KEEL, SPMF, Rattle
     • File systems (the transcript runs these into the Data Mining list with no category label): Gluster, Hadoop Distributed File System
     • Programming Languages: Pig/Pig Latin, R, ECL
     • Big Data Search: Lucene, Solr
     • Data Aggregation and Transfer: Sqoop, Flume, Chukwa
     • Miscellaneous Big Data Tools: Terracotta, Avro, Oozie, Zookeeper
     The Apache Foundation has about 230 projects as of October.
  16. Big Data, Open Source, Cloud Computing, Data Analysis. The combination of 'data analysis' and 'big data-open source-cloud computing' opens up a new universe of opportunities at many levels and in many places. Traditional Data Analysis versus the Data Analysis New Era:
     • Big data processing: slow processing (traditional); massive, fast, distributed processing (new era)
     • Computing power: scale up, on premise (traditional); scale out, off premise in the cloud (new era)
     • Software: proprietary software (traditional); open source software (new era)
     • Data: structured data (traditional); structured and unstructured data, graph data (new era)
     • Analysis: statistical analysis (traditional); ML, data mining, network analysis, text mining, etc. (new era)
     • Value: limited value and insight (traditional); quick discovery of knowledge and value (new era)
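  17. New Era: Data Analysis, Chaos. A variety of companies are each entering the market: SaaS specialists, traditional data analysis companies, BI companies, and others. Results of a survey of 10 SaaS vendors:
     • Specialist vendors focus on simple analysis and visualization
     • Emphasis on lightweight data analysis rather than large-volume data analysis
     • Services delivered in SaaS form
     • Except for two or three vendors, diverse analysis techniques are not applied
     • User-centered UI/UX
     • The MS, IBM, and Amazon services are singled out and compared separately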
  18. New Era: Data Analysis, Chaos. Cloud machine learning is intensifying new competition in the big data analytics market: a comparison of IBM Watson Analytics, Microsoft Azure ML, and Amazon ML
     • IBM Watson Analytics. Algorithms: decision tree, classification, correlation. Characteristics: couldn't handle enterprise-scale data; focused more on data visualization and exploration; uses natural language (plain English questions); automates some tasks.
     • Microsoft Azure ML. Algorithms: anomaly detection (2), classification (14), clustering (1), regression (8), feature selection (3), evaluate (3), score (4), train (4), statistical functions (7), text analytics (4). Characteristics: user-friendly GUI; requires knowledge of the characteristics of machine learning algorithms; targeted to developers, data scientists, and very advanced business users.
     • Amazon ML. Algorithms: binary classification (predicting one of two possible outcomes), multiclass classification (predicting one of more than two outcomes), regression (predicting a numeric value). Characteristics: narrower in scope; data acquisition is effortless; no infrastructure management required; does not require data science expertise.
     Key characteristics: the vendors are working to provide an easy user environment (GUI, no data scientist needed); big data processing is still inadequate.
  19. Data Consumer's Needs. Demand for a data analysis system that can be scaled at an economical cost, accessed easily anytime and anywhere, and used to handle diverse and massive data in order to discover insights and act on them.
     • Data consumer groups: C-level, LoB user, data scientist, data engineer
     • Data Analysis Use Case Framework (customers, product, organization): 360-degree customer view; understand the market; find new markets; personalized website/offering; improve service; co-create and innovate; reduce risk/fraud; better organize the company; understand the competition
     • Requirements: accessibility, easy to use, elastic, sharing, security, scalability, cost effective
     C-level: CEO, COO, CIO, CTO, CMO…; LoB: Line of Business
  20. New Trend: The Rise of the Citizen Data Scientist
     • Citizen Data Scientist (a term defined by Gartner): line of business; not a trained data scientist or developer; focused on business problems; driven to pull together the right data, now; iterative workflow (one question leads to the next); creates or generates models; not typically a member of an analytics …
     • Alexander Linden, Research Director at Gartner, predicts that through 2017, the number of "Citizen Data Scientists," i.e. analytical amateurs¹, will grow five times faster than the number of highly skilled Data Scientists.
     ¹ At the end of his 2007 classic, Competing on Analytics, Tom Davenport predicted the rise of "analytical amateurs."
  21. New Trend
     • 5-10%: Analytical Professionals (can create algorithms)
     • 15-20%: Analytical Semi-Professionals (can use visual tools, create simple models)
     • 70-80%: Analytical Amateurs (can use spreadsheets)
     Source: Tom Davenport, Competing on Analytics (2007)
  22. New Trend: Algorithm Marketplaces Are Bringing the App Economy to Analytics. Source: Gartner (October 2015)
  23. New Trend: Easier-to-use analytics tools: Smart data discovery. "Smart data discovery is a next-generation data discovery capability that provides insights from advanced analytics to business users or citizen data scientists without requiring them to have traditional data scientist expertise." Source: Gartner (June 2015)
  24. New Trend: Easier-to-use analytics tools: Smart data discovery. [Figure: Current Data Discovery Analytics Workflow versus Emerging Smart Data Discovery Analytics Workflow.] Source: Gartner (June 2015)
  25. New Trend: DAaaS functional elements: Business User, Citizen Data Scientist, Smart Data Discovery, Algorithms. "... make new sources of information accessible, consumable and meaningful to organizations of all sizes, even ones that don't have extensive advanced analytics skills or in-house resources." This material will continue to be updated monthly.
  26. Thank you.