Big Data Analytics:
      Beyond Beer and Diapers
      2012/2/22
      Kai Zhao @Teradata
      kingaim@gmail.com



  by Kai Zhao 2011.12
Disclaimer:
Any views or opinions presented in this article are solely those of the author and do NOT necessarily represent those of Teradata or other companies.
Content

Background:
    Traditional Business Intelligence(BI)
    What is Big Data
    What is Big Data Analytics
    Big Data Analytics: State of the Art
Big Data Analytics Technology Stack
    ETL/ELT/ETLT(Demo)
    MPP Data Warehouse
    Map Reduce
    NoSQL
    Web Service
    Data Analytics
    Data Visualization
    BI Tools(Demo)
Big Data Analytics Platform Architecture
Cloud computing is surging, business intelligence is on the rise, and the time for big data analytics has come.

Shared-nothing Massively Parallel Processing(MPP)
Petabyte Scaling
In-database Analytics
Traditional Business Intelligence(BI)
What is Big Data
Volume: The increase in data volumes within
enterprise systems is caused by transaction volumes
and other traditional data types, as well as by new
types of data. Too much volume is a storage issue,
but too much data is also a massive analysis issue.

Variety: IT leaders have always had an issue
translating large volumes of transactional
information into decisions — now there are more
types of information to analyze — mainly coming
from social media and mobile (context-aware).
Variety includes tabular data (databases),
hierarchical data, documents, e-mail, metering data,
video, still images, audio, stock ticker data, financial
transactions and more.

Velocity: This involves streams of data, structured
record creation, and availability for access and
delivery. Velocity means both how fast data is being
produced and how fast the data must be processed
to meet demand.
What is Big Data (cont.)
Broadly speaking, Big Data is generated by a number of sources, including:
Social Networking and Media: There are currently over 700 million Facebook users, 250 million Twitter users and 156
million public blogs. Each Facebook update, Tweet, blog post and comment creates multiple new data points, which may be
structured, semi-structured or unstructured, and are sometimes called Data Exhaust.
Mobile Devices: There are over 5 billion mobile phones in use worldwide. Each call, text and instant message is
logged as data. Mobile devices, particularly smart phones and tablets, also make it easier to use social media and use
other data-generating applications. Mobile devices also collect and transmit location data.
Internet Transactions: Billions of online purchases, stock trades and other transactions happen every day, including
countless automated transactions. Each creates a number of data points collected by retailers, banks, credit cards,
credit agencies and others.
Networked Devices and Sensors: Electronic devices of all sorts, including servers and other IT hardware, smart
energy meters and temperature sensors, all create semi-structured log data that record every action.
What is Big Data Analytics

See Video
    Big Data
    Visualization
Big Data Analytics: State of the Art

Acquisitions and Investments
Big Data Vendors and Their Products
Forrester Report
Gartner Report
Acquisitions and Investments

 Acquirer   Acquiree (est. date)   Date of Acq.   Deal
 Teradata   AsterData (2005)       2011.3.3       $0.263 billion
 HP         Vertica (2005)         2011.2.14      $1.2 billion
 IBM        Netezza (2000)         2010.11.11     $1.7 billion
 EMC        Greenplum (2003)       2010.7.6       $0.1~0.15 billion
 SAP        Sybase                 2010.5.12      $5.8 billion

 Summary: Traditional data warehouse vendors need Big Data analytics technology.


 Investee      Investment
 Cloudera      $76 million
 MapR          $29 million
 Hortonworks   $50 million
 Datameer      $10 million

 Summary: New Big Data analytics startups

                                               Source: http://www.leiphone.com/why-2012-the-year-of-hadoop.html
Big Data Vendors and Their Products




                        Source: http://wikibon.org/wiki/v/Big_Data:_Hadoop,_Business_Analytics_and_Beyond
Forrester Report
Hype Cycle




             Source: Gartner
Gartner Report: Hype Cycle 2011




                                  Source: Gartner
Big Data Analytics Technology Stack




Data Import

 Data Storage

   Data Computing

     Data Analytics

       XXX as a Service
ETL/ELT/ETLT


Extract – The process by which data is extracted from the data source
Transform – The transformation of the source data into a format relevant to the solution
Load – The loading of data into the warehouse
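The three stages above can be sketched in a few lines of Python. This is only a minimal illustration, not how any particular ETL tool works: the CSV content, the `sales` table and the function names are all assumptions made up for the example, and an in-memory SQLite database stands in for the warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical source data; a real source would be a file or an operational system.
SOURCE_CSV = """order_id,product,amount
1,beer,12.50
2,diapers,8.00
3,beer,bad-value
"""

def extract(text):
    """Extract: read raw rows from the data source."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: keep only valid rows, converted to the types the warehouse expects."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # drop rows whose amount cannot be parsed
        clean.append((int(row["order_id"]), row["product"].strip().lower(), amount))
    return clean

def load(conn, rows):
    """Load: write the transformed rows into the warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (order_id INTEGER, product TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()

# SQLite in memory stands in for the warehouse (an assumption for this sketch).
warehouse = sqlite3.connect(":memory:")
load(warehouse, transform(extract(SOURCE_CSV)))
loaded = warehouse.execute("SELECT order_id, product, amount FROM sales ORDER BY order_id").fetchall()
print(loaded)
```

Note that the transformation runs outside the database, before the load, so only cleaned data ever reaches the warehouse.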



This approach to data warehouse development is the traditional and widely accepted approach.
The following diagram illustrates each of the individual stages in the process.
ETL






                                  Source: Robert J Davenport ETL vs ELT A Subjective View
ETL
Strengths
      Development Time
      Designing from the output backwards ensures that only data relevant to the solution is extracted and processed,
      potentially reducing development, extract, and processing overhead; and therefore time.
      Targeted data
      Due to the targeted nature of the load process, the warehouse contains only data relevant to the presentation.
      Administration Overhead
      Reduced warehouse content simplifies the security regime implemented and hence the administration overhead.
      Tools Availability
      The large number of tools that implement ETL provides flexibility of approach and the opportunity to
      identify the most appropriate tool. The proliferation of tools has led to a competitive functionality war, which
      often results in loss of maintainability.
Weaknesses
      Flexibility
      Targeting only relevant data for output means that any future requirement needing data that was not
      included in the original design must be added to the ETL routines. Because the routines are tightly
      dependent on one another, this often leads to a need for fundamental re-design and development, which
      increases the time and costs involved.
      Hardware
      Most third party tools utilize their own engine to implement the ETL process. Regardless of the size of the
      solution this can necessitate the investment in additional hardware to implement the tool’s ETL engine.
      Skills Investment
      The use of third party tools to implement ETL processes compels the learning of new scripting languages.
      Learning Curve
      Implementing a third party tool that uses foreign processes and languages results in the learning curve that is
      implicit in all technologies new to an organization and can often lead to following blind alleys in their use due to
      lack of experience.
ELT


Whilst this approach to the implementation of a warehouse appears on the surface to be
similar to ETL, it differs in a number of significant ways.
The following diagram illustrates the process.
ELT
Strengths
Project Management
Being able to split the warehouse process into specific and isolated tasks enables a project to be designed on a smaller
task basis, so the project can be broken down into manageable chunks.
Flexible & Future Proof
In general, in an ELT implementation all data from the sources are loaded into the warehouse as part of the extract and
load process. This, combined with the isolation of the transformation process, means that future requirements can easily
be incorporated into the warehouse structure.
Risk minimization
Removing the close interdependencies between each stage of the warehouse build process enables the development
process to be isolated, and the individual process design can thus also be isolated. This provides an excellent platform for
change, maintenance and management.
Utilize Existing Hardware
In implementing ELT as a warehouse build process, the inherent tools provided with the database engine can be used.
Alternatively, the vast majority of the third party ELT tools available employ the use of the database engine’s capability
and hence the ELT process is run on the same hardware as the database engine underpinning the data warehouse, using
the existing hardware deployed.
Utilize Existing Skill sets
By using the functionality provided by the database engine, the existing investments in database skills are re-used to
develop the warehouse.
Weaknesses
Against the Norm
ELT is an emergent approach to data warehouse design and development. Whilst it has proven itself many times over
through its abundant use in implementations throughout the world, it does require a change in mentality and design
approach against traditional methods. To get the best from an ELT approach requires an open mind.
Tools Availability
Being an emergent technology approach, ELT suffers from a limited availability of tools.
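The ELT flow described above, in which all source data is loaded first and the transformation runs inside the database engine, might look like the following minimal sketch. The table names (`staging_sales`, `sales_by_product`) and the sample rows are hypothetical, and an in-memory SQLite database stands in for the warehouse engine.

```python
import sqlite3

# Hypothetical raw records, loaded exactly as they arrive from the source.
RAW_ROWS = [
    ("1", "Beer", "12.50"),
    ("2", "Diapers", "8.00"),
    ("3", "Beer", "7.50"),
]

db = sqlite3.connect(":memory:")

# Extract + Load: land everything in a staging table, untyped and untransformed.
db.execute("CREATE TABLE staging_sales (order_id TEXT, product TEXT, amount TEXT)")
db.executemany("INSERT INTO staging_sales VALUES (?, ?, ?)", RAW_ROWS)

# Transform: done later, inside the database engine, using plain SQL.
db.execute("""
    CREATE TABLE sales_by_product AS
    SELECT LOWER(product) AS product,
           SUM(CAST(amount AS REAL)) AS total_amount
    FROM staging_sales
    GROUP BY LOWER(product)
""")

totals = db.execute(
    "SELECT product, total_amount FROM sales_by_product ORDER BY product"
).fetchall()
print(totals)
```

Because the raw rows stay in the staging table, a new requirement can be met by adding another SQL transformation without touching the extract and load step.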
ETL Demo - Kettle


  Demo of Pentaho Kettle.
Map Reduce: Hadoop




                                                Comparing with MPP Data Warehouse.


            Source: http://www.capgemini.com/technology-blog/2012/01/what-is-hadoop/
Map Reduce: Hadoop


    [Diagram: Hadoop at the center, surrounded by:]
        Professional Service
        Enterprise-grade Distribution
        Subscription Service
        Cluster Management
        Data Integration with Hadoop
        Hadoop replacements: Teradata Aster/MongoDB
        Database OLTP
        EDW
        BI
MPP Data Warehouse



    Comparing MPP Data Warehouse with Hadoop stack.
    Draw a picture.
NoSQL
NoSQL/SQL/NewSQL
    [Landscape diagram]
    Non-Relational
        Analytics(OLAP): Hadoop
        NoSQL
            KeyValue: BDB, Voldemort, Tokyo Cabinet, Redis
            Document: CouchDB, MongoDB
            Graph: Neo4j
            Columnar: HBase, Cassandra
            Cloud Service: Amazon SimpleDB
    Relational
        Analytics(OLAP), SQL MPP: Teradata, IBM Netezza, EMC Greenplum, HP Vertica, Teradata Aster, VectorWise
        Operational(OLTP): Oracle, IBM DB2, SQL Server, MySQL, PostgreSQL, Ingres, Sybase, EnterpriseDB
        Cloud Service: Amazon RDS, SQL Azure
    Data Grid/Cache: Memcached
Web Service



   There are a lot of Web Services.
Data Analytics



    A lot of…..
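Data Visualization: It is VERY IMPORTANT to Attract Users




                             Source: 向怡宁, "Breaking the Mold: Visualizing Data and Information" (打破陈规-数据及信息的可视化)
Data Visualization: It is VERY IMPORTANT to Compete




                            Source: 向怡宁, "Breaking the Mold: Visualizing Data and Information" (打破陈规-数据及信息的可视化)
Data Visualization: It is VERY IMPORTANT for User Experience




                            Source: 向怡宁, "Breaking the Mold: Visualizing Data and Information" (打破陈规-数据及信息的可视化)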
BI Tools

BI Tools fall into three categories:
Query Tools
     A query tool is software set up for users to ask questions about the data. The user can
     search for patterns or details.
Multidimensional Analysis Tools
     A multidimensional analysis tool, also called an Online Analytical Processing (OLAP) tool,
     is software that allows the user to view the same data from different aspects.
     E.g.: Business Objects, Hyperion, Cognos, MicroStrategy, Pentaho, Microsoft Analysis Services
     and Palo OLAP Server.
Data Mining Tools
     A data mining tool is software that automatically searches data, seeking out ways that
     the data correlates with other data.
     E.g.: SPSS Clementine, Weka3, R and Apache Mahout.
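To make the data mining category concrete, here is a minimal sketch of the kind of correlation such a tool looks for, using the "beer and diapers" example from the title. The baskets are invented for illustration, and the functions are a hand-rolled computation of support, confidence and lift, not the API of any tool listed above.

```python
# Hypothetical shopping baskets, invented for illustration.
BASKETS = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"diapers", "milk"},
    {"beer", "chips"},
    {"milk", "bread"},
]

def support(items, baskets):
    """Fraction of baskets that contain every item in `items`."""
    return sum(1 for basket in baskets if items <= basket) / len(baskets)

def rule_metrics(antecedent, consequent, baskets):
    """Support, confidence and lift of the rule antecedent -> consequent.

    Assumes both item sets occur in at least one basket, so no division by zero.
    """
    both = support(antecedent | consequent, baskets)
    confidence = both / support(antecedent, baskets)
    lift = confidence / support(consequent, baskets)
    return both, confidence, lift

sup, conf, lift = rule_metrics({"diapers"}, {"beer"}, BASKETS)
print(f"support={sup:.2f} confidence={conf:.2f} lift={lift:.2f}")
```

A lift above 1 suggests that the two items appear together more often than they would if they were bought independently.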
BI Tools List




                Source: BI Tool Survey 2012 http://www.businessintelligencetoolbox.com/list-of-business-intelligence-bi-tools/
BI Tools: Gartner Evaluation




                               Business intelligence (BI) platforms
                               enable all types of users – from IT staff to
                               consultants to business users – to build
                               applications that help organizations learn
                               about and understand their business
BI Demo – JasperSoft iReport



   Demo Session: JasperSoft iReport
Big Data Analytics Platform Architecture
Any Questions?

More Related Content

What's hot

Data Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to MeshData Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to Mesh
Jeffrey T. Pollock
 
Traditional Data-warehousing / BI overview
Traditional Data-warehousing / BI overviewTraditional Data-warehousing / BI overview
Traditional Data-warehousing / BI overview
Nagaraj Yerram
 
The technology of the business data lake
The technology of the business data lakeThe technology of the business data lake
The technology of the business data lake
Capgemini
 
Teradata Aster: Big Data Discovery Made Easy
Teradata Aster: Big Data Discovery Made EasyTeradata Aster: Big Data Discovery Made Easy
Teradata Aster: Big Data Discovery Made Easy
TIBCO Spotfire
 
Etl elt simplified
Etl elt simplifiedEtl elt simplified
Etl elt simplified
Ramchandra Koty
 
2012 10 bigdata_overview
2012 10 bigdata_overview2012 10 bigdata_overview
2012 10 bigdata_overview
jdijcks
 
Who changed my data? Need for data governance and provenance in a streaming w...
Who changed my data? Need for data governance and provenance in a streaming w...Who changed my data? Need for data governance and provenance in a streaming w...
Who changed my data? Need for data governance and provenance in a streaming w...
DataWorks Summit
 
2009.10.22 S308460 Cloud Data Services
2009.10.22 S308460  Cloud Data Services2009.10.22 S308460  Cloud Data Services
2009.10.22 S308460 Cloud Data Services
Jeffrey T. Pollock
 
Cloud Based Data Warehousing and Analytics
Cloud Based Data Warehousing and AnalyticsCloud Based Data Warehousing and Analytics
Cloud Based Data Warehousing and Analytics
Seeling Cheung
 
Data warehouse architecture
Data warehouse architectureData warehouse architecture
Data warehouse architecture
pcherukumalla
 
VenkatSubbaReddy_Resume
VenkatSubbaReddy_ResumeVenkatSubbaReddy_Resume
VenkatSubbaReddy_Resume
Venkata SubbaReddy
 
Hadoop Boosts Profits in Media and Telecom Industry
Hadoop Boosts Profits in Media and Telecom IndustryHadoop Boosts Profits in Media and Telecom Industry
Hadoop Boosts Profits in Media and Telecom Industry
DataWorks Summit
 
Pervasive analytics through data & analytic centricity
Pervasive analytics through data & analytic centricityPervasive analytics through data & analytic centricity
Pervasive analytics through data & analytic centricity
Cloudera, Inc.
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)
James Serra
 
Big Data and BI Tools - BI Reporting for Bay Area Startups User Group
Big Data and BI Tools - BI Reporting for Bay Area Startups User GroupBig Data and BI Tools - BI Reporting for Bay Area Startups User Group
Big Data and BI Tools - BI Reporting for Bay Area Startups User Group
Scott Mitchell
 
Hadoop dev 01
Hadoop dev 01Hadoop dev 01
Hadoop dev 01
Vivian S. Zhang
 
Third Nature - Open Source Data Warehousing
Third Nature - Open Source Data WarehousingThird Nature - Open Source Data Warehousing
Third Nature - Open Source Data Warehousing
mark madsen
 
Hu Yoshida's Point of View: Competing In An Always On World
Hu Yoshida's Point of View: Competing In An Always On WorldHu Yoshida's Point of View: Competing In An Always On World
Hu Yoshida's Point of View: Competing In An Always On World
Hitachi Vantara
 
Hadoop World 2011: Big Data Architecture: Integrating Hadoop with Other Enter...
Hadoop World 2011: Big Data Architecture: Integrating Hadoop with Other Enter...Hadoop World 2011: Big Data Architecture: Integrating Hadoop with Other Enter...
Hadoop World 2011: Big Data Architecture: Integrating Hadoop with Other Enter...
Cloudera, Inc.
 
HPCC Systems - Open source, Big Data Processing & Analytics
HPCC Systems - Open source, Big Data Processing & AnalyticsHPCC Systems - Open source, Big Data Processing & Analytics
HPCC Systems - Open source, Big Data Processing & Analytics
HPCC Systems
 

What's hot (20)

Data Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to MeshData Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to Mesh
 
Traditional Data-warehousing / BI overview
Traditional Data-warehousing / BI overviewTraditional Data-warehousing / BI overview
Traditional Data-warehousing / BI overview
 
The technology of the business data lake
The technology of the business data lakeThe technology of the business data lake
The technology of the business data lake
 
Teradata Aster: Big Data Discovery Made Easy
Teradata Aster: Big Data Discovery Made EasyTeradata Aster: Big Data Discovery Made Easy
Teradata Aster: Big Data Discovery Made Easy
 
Etl elt simplified
Etl elt simplifiedEtl elt simplified
Etl elt simplified
 
2012 10 bigdata_overview
2012 10 bigdata_overview2012 10 bigdata_overview
2012 10 bigdata_overview
 
Who changed my data? Need for data governance and provenance in a streaming w...
Who changed my data? Need for data governance and provenance in a streaming w...Who changed my data? Need for data governance and provenance in a streaming w...
Who changed my data? Need for data governance and provenance in a streaming w...
 
2009.10.22 S308460 Cloud Data Services
2009.10.22 S308460  Cloud Data Services2009.10.22 S308460  Cloud Data Services
2009.10.22 S308460 Cloud Data Services
 
Cloud Based Data Warehousing and Analytics
Cloud Based Data Warehousing and AnalyticsCloud Based Data Warehousing and Analytics
Cloud Based Data Warehousing and Analytics
 
Data warehouse architecture
Data warehouse architectureData warehouse architecture
Data warehouse architecture
 
VenkatSubbaReddy_Resume
VenkatSubbaReddy_ResumeVenkatSubbaReddy_Resume
VenkatSubbaReddy_Resume
 
Hadoop Boosts Profits in Media and Telecom Industry
Hadoop Boosts Profits in Media and Telecom IndustryHadoop Boosts Profits in Media and Telecom Industry
Hadoop Boosts Profits in Media and Telecom Industry
 
Pervasive analytics through data & analytic centricity
Pervasive analytics through data & analytic centricityPervasive analytics through data & analytic centricity
Pervasive analytics through data & analytic centricity
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)
 
Big Data and BI Tools - BI Reporting for Bay Area Startups User Group
Big Data and BI Tools - BI Reporting for Bay Area Startups User GroupBig Data and BI Tools - BI Reporting for Bay Area Startups User Group
Big Data and BI Tools - BI Reporting for Bay Area Startups User Group
 
Hadoop dev 01
Hadoop dev 01Hadoop dev 01
Hadoop dev 01
 
Third Nature - Open Source Data Warehousing
Third Nature - Open Source Data WarehousingThird Nature - Open Source Data Warehousing
Third Nature - Open Source Data Warehousing
 
Hu Yoshida's Point of View: Competing In An Always On World
Hu Yoshida's Point of View: Competing In An Always On WorldHu Yoshida's Point of View: Competing In An Always On World
Hu Yoshida's Point of View: Competing In An Always On World
 
Hadoop World 2011: Big Data Architecture: Integrating Hadoop with Other Enter...
Hadoop World 2011: Big Data Architecture: Integrating Hadoop with Other Enter...Hadoop World 2011: Big Data Architecture: Integrating Hadoop with Other Enter...
Hadoop World 2011: Big Data Architecture: Integrating Hadoop with Other Enter...
 
HPCC Systems - Open source, Big Data Processing & Analytics
HPCC Systems - Open source, Big Data Processing & AnalyticsHPCC Systems - Open source, Big Data Processing & Analytics
HPCC Systems - Open source, Big Data Processing & Analytics
 

Viewers also liked

MongoDB in FS
MongoDB in FSMongoDB in FS
MongoDB in FS
MongoDB
 
Morning with MongoDB Paris 2012 - MongoDB Basic Concepts
Morning with MongoDB Paris 2012 - MongoDB Basic ConceptsMorning with MongoDB Paris 2012 - MongoDB Basic Concepts
Morning with MongoDB Paris 2012 - MongoDB Basic Concepts
MongoDB
 
BigFoot: Big Data For Every Organization
BigFoot: Big Data For Every OrganizationBigFoot: Big Data For Every Organization
BigFoot: Big Data For Every Organization
Matteo Dell'Amico
 
Technology Entrepreneurship Venture Lab 2012 beer buddy app
Technology Entrepreneurship Venture Lab 2012   beer buddy appTechnology Entrepreneurship Venture Lab 2012   beer buddy app
Technology Entrepreneurship Venture Lab 2012 beer buddy app
doc2005
 
Webinar: How Financial Organizations use MongoDB for Real-time Risk Managemen...
Webinar: How Financial Organizations use MongoDB for Real-time Risk Managemen...Webinar: How Financial Organizations use MongoDB for Real-time Risk Managemen...
Webinar: How Financial Organizations use MongoDB for Real-time Risk Managemen...
MongoDB
 
Mongodb introduction and_internal(simple)
Mongodb introduction and_internal(simple)Mongodb introduction and_internal(simple)
Mongodb introduction and_internal(simple)
Kai Zhao
 
MongoDB for Coder Training (Coding Serbia 2013)
MongoDB for Coder Training (Coding Serbia 2013)MongoDB for Coder Training (Coding Serbia 2013)
MongoDB for Coder Training (Coding Serbia 2013)
Uwe Printz
 
Pp glob bus11_abinbev_brewing
Pp glob bus11_abinbev_brewingPp glob bus11_abinbev_brewing
Pp glob bus11_abinbev_brewing
Lucas Abrantes
 
Performance Tuning and Optimization
Performance Tuning and OptimizationPerformance Tuning and Optimization
Performance Tuning and Optimization
MongoDB
 
Sql vs NoSQL
Sql vs NoSQLSql vs NoSQL
Sql vs NoSQL
RTigger
 
Beer industry
Beer industry Beer industry
Beer industry
Christian Adeler
 
Kylo为企业级的数据湖赋能 赵锴 kai_zhao_大数据_数据湖_datalake
Kylo为企业级的数据湖赋能 赵锴 kai_zhao_大数据_数据湖_datalakeKylo为企业级的数据湖赋能 赵锴 kai_zhao_大数据_数据湖_datalake
Kylo为企业级的数据湖赋能 赵锴 kai_zhao_大数据_数据湖_datalake
Kai Zhao
 
物联网IoT用例 赵锴_kaizhao_大数据_物联网_云计算2
物联网IoT用例 赵锴_kaizhao_大数据_物联网_云计算2物联网IoT用例 赵锴_kaizhao_大数据_物联网_云计算2
物联网IoT用例 赵锴_kaizhao_大数据_物联网_云计算2
Kai Zhao
 
DATA WAREHOUSING
DATA WAREHOUSINGDATA WAREHOUSING
DATA WAREHOUSING
King Julian
 
GE Predix 新手入门 赵锴 物联网_IoT
GE Predix 新手入门 赵锴 物联网_IoTGE Predix 新手入门 赵锴 物联网_IoT
GE Predix 新手入门 赵锴 物联网_IoT
Kai Zhao
 
Ways of Seeing Data: Towards a Critical Literacy for Data Visualisations as R...
Ways of Seeing Data: Towards a Critical Literacy for Data Visualisations as R...Ways of Seeing Data: Towards a Critical Literacy for Data Visualisations as R...
Ways of Seeing Data: Towards a Critical Literacy for Data Visualisations as R...
Jonathan Gray
 
Introduction to Data Warehousing
Introduction to Data WarehousingIntroduction to Data Warehousing
Introduction to Data Warehousing
Jason S
 
Visualising Data with Code
Visualising Data with CodeVisualising Data with Code
Visualising Data with Code
Ri Liu
 
Introduction to Big Data
Introduction to Big Data Introduction to Big Data
Introduction to Big Data
Srinath Perera
 
Analytics Trends 2016: The next evolution
Analytics Trends 2016: The next evolutionAnalytics Trends 2016: The next evolution
Analytics Trends 2016: The next evolution
Deloitte United States
 

Viewers also liked (20)

MongoDB in FS
MongoDB in FSMongoDB in FS
MongoDB in FS
 
Morning with MongoDB Paris 2012 - MongoDB Basic Concepts
Morning with MongoDB Paris 2012 - MongoDB Basic ConceptsMorning with MongoDB Paris 2012 - MongoDB Basic Concepts
Morning with MongoDB Paris 2012 - MongoDB Basic Concepts
 
BigFoot: Big Data For Every Organization
BigFoot: Big Data For Every OrganizationBigFoot: Big Data For Every Organization
BigFoot: Big Data For Every Organization
 
Technology Entrepreneurship Venture Lab 2012 beer buddy app
Technology Entrepreneurship Venture Lab 2012   beer buddy appTechnology Entrepreneurship Venture Lab 2012   beer buddy app
Technology Entrepreneurship Venture Lab 2012 beer buddy app
 
Webinar: How Financial Organizations use MongoDB for Real-time Risk Managemen...
Webinar: How Financial Organizations use MongoDB for Real-time Risk Managemen...Webinar: How Financial Organizations use MongoDB for Real-time Risk Managemen...
Webinar: How Financial Organizations use MongoDB for Real-time Risk Managemen...
 
Mongodb introduction and_internal(simple)
Mongodb introduction and_internal(simple)Mongodb introduction and_internal(simple)
Mongodb introduction and_internal(simple)
 
MongoDB for Coder Training (Coding Serbia 2013)
MongoDB for Coder Training (Coding Serbia 2013)MongoDB for Coder Training (Coding Serbia 2013)
MongoDB for Coder Training (Coding Serbia 2013)
 
Pp glob bus11_abinbev_brewing
Pp glob bus11_abinbev_brewingPp glob bus11_abinbev_brewing
Pp glob bus11_abinbev_brewing
 
Performance Tuning and Optimization
Performance Tuning and OptimizationPerformance Tuning and Optimization
Performance Tuning and Optimization
 
Sql vs NoSQL
Sql vs NoSQLSql vs NoSQL
Sql vs NoSQL
 
Beer industry
Beer industry Beer industry
Beer industry
 
Kylo为企业级的数据湖赋能 赵锴 kai_zhao_大数据_数据湖_datalake
Kylo为企业级的数据湖赋能 赵锴 kai_zhao_大数据_数据湖_datalakeKylo为企业级的数据湖赋能 赵锴 kai_zhao_大数据_数据湖_datalake
Kylo为企业级的数据湖赋能 赵锴 kai_zhao_大数据_数据湖_datalake
 
物联网IoT用例 赵锴_kaizhao_大数据_物联网_云计算2
物联网IoT用例 赵锴_kaizhao_大数据_物联网_云计算2物联网IoT用例 赵锴_kaizhao_大数据_物联网_云计算2
物联网IoT用例 赵锴_kaizhao_大数据_物联网_云计算2
 
DATA WAREHOUSING
DATA WAREHOUSINGDATA WAREHOUSING
DATA WAREHOUSING
 
GE Predix 新手入门 赵锴 物联网_IoT
GE Predix 新手入门 赵锴 物联网_IoTGE Predix 新手入门 赵锴 物联网_IoT
GE Predix 新手入门 赵锴 物联网_IoT
 
Ways of Seeing Data: Towards a Critical Literacy for Data Visualisations as R...
Ways of Seeing Data: Towards a Critical Literacy for Data Visualisations as R...Ways of Seeing Data: Towards a Critical Literacy for Data Visualisations as R...
Ways of Seeing Data: Towards a Critical Literacy for Data Visualisations as R...
 
Introduction to Data Warehousing
Introduction to Data WarehousingIntroduction to Data Warehousing
Introduction to Data Warehousing
 
Visualising Data with Code
Visualising Data with CodeVisualising Data with Code
Visualising Data with Code
 
Introduction to Big Data
Introduction to Big Data Introduction to Big Data
Introduction to Big Data
 
Analytics Trends 2016: The next evolution
Analytics Trends 2016: The next evolutionAnalytics Trends 2016: The next evolution
Analytics Trends 2016: The next evolution
 

Similar to Big data analytics beyond beer and diapers

What is a Data Warehouse and How Do I Test It?
What is a Data Warehouse and How Do I Test It?What is a Data Warehouse and How Do I Test It?
What is a Data Warehouse and How Do I Test It?
RTTS
 
Webinar future dataintegration-datamesh-and-goldengatekafka
Webinar future dataintegration-datamesh-and-goldengatekafkaWebinar future dataintegration-datamesh-and-goldengatekafka
Webinar future dataintegration-datamesh-and-goldengatekafka
Jeffrey T. Pollock
 
Why shift from ETL to ELT?
Why shift from ETL to ELT?Why shift from ETL to ELT?
Why shift from ETL to ELT?
HEXANIKA
 
ETL VS ELT.pdf
ETL VS ELT.pdfETL VS ELT.pdf
ETL VS ELT.pdf
BOSupport
 
ETL Technologies.pptx
ETL Technologies.pptxETL Technologies.pptx
ETL Technologies.pptx
Gaurav Bhatnagar
 
Bridging the Last Mile: Getting Data to the People Who Need It (APAC)
Bridging the Last Mile: Getting Data to the People Who Need It (APAC)Bridging the Last Mile: Getting Data to the People Who Need It (APAC)
Bridging the Last Mile: Getting Data to the People Who Need It (APAC)
Denodo
 
Data Integration for Big Data (OOW 2016, Co-Presented With Oracle)
Data Integration for Big Data (OOW 2016, Co-Presented With Oracle)Data Integration for Big Data (OOW 2016, Co-Presented With Oracle)
Data Integration for Big Data (OOW 2016, Co-Presented With Oracle)
Rittman Analytics
 
Designing modern dw and data lake
Designing modern dw and data lakeDesigning modern dw and data lake
Designing modern dw and data lake
punedevscom
 
A Comparitive Study Of ETL Tools
A Comparitive Study Of ETL ToolsA Comparitive Study Of ETL Tools
A Comparitive Study Of ETL Tools
Rhonda Cetnar
 
MODERN DATA PIPELINE
MODERN DATA PIPELINEMODERN DATA PIPELINE
MODERN DATA PIPELINE
IRJET Journal
 
2015 02 12 talend hortonworks webinar challenges to hadoop adoption
2015 02 12 talend hortonworks webinar challenges to hadoop adoption2015 02 12 talend hortonworks webinar challenges to hadoop adoption
2015 02 12 talend hortonworks webinar challenges to hadoop adoption
Hortonworks
 
Bridging the Last Mile: Getting Data to the People Who Need It
Bridging the Last Mile: Getting Data to the People Who Need ItBridging the Last Mile: Getting Data to the People Who Need It
Bridging the Last Mile: Getting Data to the People Who Need It
Denodo
 
Agile Business Intelligence
Agile Business IntelligenceAgile Business Intelligence
Agile Business Intelligence
David Portnoy
 
Etl techniques
Etl techniquesEtl techniques
Etl techniques
mahezabeenIlkal
 
Big data - what, why, where, when and how
Big data - what, why, where, when and howBig data - what, why, where, when and how
Big data - what, why, where, when and how
bobosenthil
 
Summary introduction to data engineering
Summary introduction to data engineeringSummary introduction to data engineering
Summary introduction to data engineering
Novita Sari
 
Optimizing your Modern Data Architecture - with Attunity, RCG Global Services...
Optimizing your Modern Data Architecture - with Attunity, RCG Global Services...Optimizing your Modern Data Architecture - with Attunity, RCG Global Services...
Optimizing your Modern Data Architecture - with Attunity, RCG Global Services...
Hortonworks
 
Fbdl enabling comprehensive_data_services
Fbdl enabling comprehensive_data_servicesFbdl enabling comprehensive_data_services
Fbdl enabling comprehensive_data_services
Cindy Irby
 
Flash session -streaming--ses1243-lon
Flash session -streaming--ses1243-lonFlash session -streaming--ses1243-lon
Flash session -streaming--ses1243-lon
Jeffrey T. Pollock
 
Streaming is a Detail
Streaming is a DetailStreaming is a Detail
Streaming is a Detail
HostedbyConfluent
 

Big data analytics beyond beer and diapers

  • 1. Big Data Analytics: Beyond Beer and Diapers 2012/2/22 Kai Zhao @Teradata kingaim@gmail.com by Kai Zhao 2011.12 Disclaimer: Any views or opinions presented in this article are solely those of the author and do NOT necessarily represent those of Teradata or other companies.
  • 2. Content
      Background:
        Traditional Business Intelligence (BI)
        What is Big Data
        What is Big Data Analytics
        Big Data Analytics: State of the Art
      Big Data Analytics Technology Stack:
        ETL/ELT/ETLT (Demo)
        MPP Data Warehouse
        Map Reduce
        NoSQL
        Web Service
        Data Analytics
        Data Visualization
        BI Tools (Demo)
      Big Data Analytics Platform Architecture
  • 3. Cloud computing is surging, business intelligence is on the rise, and big data analytics is imperative: it is time for BIG DATA.
      Shared-nothing Massively Parallel Processing (MPP)
      Petabyte Scaling
      In-database Analytics
  • 5. What is Big Data Volume: The increase in data volumes within enterprise systems is caused by transaction volumes and other traditional data types, as well as by new types of data. Too much volume is a storage issue, but too much data is also a massive analysis issue. Variety: IT leaders have always had an issue translating large volumes of transactional information into decisions — now there are more types of information to analyze — mainly coming from social media and mobile (context-aware). Variety includes tabular data (databases), hierarchical data, documents, e-mail, metering data, video, still images, audio, stock ticker data, financial transactions and more. Velocity: This involves streams of data, structured record creation, and availability for access and delivery. Velocity means both how fast data is being produced and how fast the data must be processed to meet demand.
  • 6. What is Big Data (cont.) Broadly speaking, Big Data is generated by a number of sources, including:
      Social Networking and Media: There are currently over 700 million Facebook users, 250 million Twitter users and 156 million public blogs. Each Facebook update, Tweet, blog post and comment creates multiple new data points (structured, semi-structured and unstructured), sometimes called Data Exhaust.
      Mobile Devices: There are over 5 billion mobile phones in use worldwide. Each call, text and instant message is logged as data. Mobile devices, particularly smart phones and tablets, also make it easier to use social media and other data-generating applications. Mobile devices also collect and transmit location data.
      Internet Transactions: Billions of online purchases, stock trades and other transactions happen every day, including countless automated transactions. Each creates a number of data points collected by retailers, banks, credit cards, credit agencies and others.
      Networked Devices and Sensors: Electronic devices of all sorts, including servers and other IT hardware, smart energy meters and temperature sensors, create semi-structured log data that record every action.
  • 7. What is Big Data Analytics See Video Big Data Visualization
  • 8. Big Data Analytics: State of the Art
      Acquisitions and Investments
      Big Data Vendors and Their Products
      Forrester Report
      Gartner Report
  • 9. Acquisitions and Investments
      Acquirer | Acquiree (est. date) | Date of acquisition | Deal
      Teradata | AsterData (2005) | 2011.3.3 | $0.263 billion
      HP | Vertica (2005) | 2011.2.14 | $1.2 billion
      IBM | Netezza (2000) | 2010.11.11 | $1.7 billion
      EMC | Greenplum (2003) | 2010.7.6 | $0.1~0.15 billion
      SAP | Sybase | 2010.5.12 | $5.8 billion
      Summary: Traditional data warehouse vendors need Big Data Analytics technology.

      Investee | Investment
      Cloudera | $76 million
      MapR | $29 million
      Hortonworks | $50 million
      Datameer | $10 million
      Summary: New Big Data Analytics startups
      Source: http://www.leiphone.com/why-2012-the-year-of-hadoop.html
  • 10. Big Data Vendors and Their Products Source: http://wikibon.org/wiki/v/Big_Data:_Hadoop,_Business_Analytics_and_Beyond
  • 12. Hype Cycle Source: Gartner
  • 13. Gartner Report: Hype Cycle 2011 Source: Gartner
  • 14. Big Data Analytics Technology Stack Data Import Data Storage Data Computing Data Analytics XXX as a Service
  • 15. ETL/ELT/ETLT
      Extract: The process by which data is extracted from the data source.
      Transform: The transformation of the source data into a format relevant to the solution.
      Load: The loading of data into the warehouse.
  • 16. ETL This approach to data warehouse development is the traditional and widely accepted approach. The following diagram illustrates each of the individual stages in the process. Source: Robert J Davenport ETL vs ELT A Subjective View
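As a rough illustration of the three ETL stages described above, here is a minimal Python sketch. It is not taken from the slides: the sample CSV data, the `sales` table, and the function names are hypothetical, and an in-memory SQLite database stands in for the warehouse. The point it tries to show is that in ETL the transformation runs outside the warehouse, before loading.

```python
import csv
import io
import sqlite3

# Hypothetical source data standing in for an export from an operational system.
SOURCE_CSV = """order_id,product,quantity,unit_price
1,beer,2,4.50
2,diapers,1,12.00
3,beer,1,4.50
"""


def extract(text):
    """Extract: read the raw rows from the data source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: keep only what the report needs and compute revenue per order."""
    return [
        (row["product"], int(row["quantity"]) * float(row["unit_price"]))
        for row in rows
    ]


def load(conn, records):
    """Load: write the already-transformed records into the warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (product TEXT, revenue REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", records)
    conn.commit()


def run_etl(conn, text=SOURCE_CSV):
    """Run extract, transform and load in order, then query the result."""
    load(conn, transform(extract(text)))
    return conn.execute(
        "SELECT product, SUM(revenue) FROM sales GROUP BY product ORDER BY product"
    ).fetchall()


if __name__ == "__main__":
    warehouse = sqlite3.connect(":memory:")  # assumption: SQLite as a toy warehouse
    print(run_etl(warehouse))
```

Note that `order_id` never reaches the warehouse in this sketch; only the targeted fields are loaded, which is the trade-off between targeted data and flexibility that ETL involves.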
  • 17. ETL
      Strengths
        Development Time: Designing from the output backwards ensures that only data relevant to the solution is extracted and processed, potentially reducing development, extract, and processing overhead, and therefore time.
        Targeted Data: Due to the targeted nature of the load process, the warehouse contains only data relevant to the presentation.
        Administration Overhead: Reduced warehouse content simplifies the security regime implemented and hence the administration overhead.
        Tools Availability: The large number of tools available that implement ETL provides flexibility of approach and the opportunity to identify the most appropriate tool. The proliferation of tools has led to a competitive functionality war, which often results in loss of maintainability.
      Weaknesses
        Flexibility: Targeting only relevant data for output means that any future requirement that needs data not included in the original design will have to be added to the ETL routines. Due to the tight dependency between the routines developed, this often leads to a need for fundamental re-design and development, which increases the time and costs involved.
        Hardware: Most third-party tools use their own engine to implement the ETL process. Regardless of the size of the solution, this can necessitate investment in additional hardware to run the tool's ETL engine.
        Skills Investment: The use of third-party tools to implement ETL processes compels the learning of new scripting languages.
        Learning Curve: Implementing a third-party tool that uses foreign processes and languages brings the learning curve implicit in all technologies new to an organization, and can often lead to following blind alleys in their use due to lack of experience.
  • 18. ELT Whilst this approach to the implementation of a warehouse appears on the surface to be similar to ETL, it differs in a number of significant ways. The following diagram illustrates the process.
  • 19. ELT
      Strengths
        Project Management: Being able to split the warehouse process into specific and isolated tasks enables a project to be designed on a smaller task basis, so the project can be broken down into manageable chunks.
        Flexible & Future Proof: In general, in an ELT implementation all data from the sources are loaded into the warehouse as part of the extract and load process. This, combined with the isolation of the transformation process, means that future requirements can easily be incorporated into the warehouse structure.
        Risk Minimization: Removing the close interdependencies between each stage of the warehouse build process enables the development process to be isolated, and the individual process design can thus also be isolated. This provides an excellent platform for change, maintenance and management.
        Utilize Existing Hardware: In implementing ELT as a warehouse build process, the inherent tools provided with the database engine can be used. Alternatively, the vast majority of third-party ELT tools rely on the database engine's capability, so the ELT process runs on the same hardware as the database engine underpinning the data warehouse.
        Utilize Existing Skill Sets: By using the functionality provided by the database engine, the existing investments in database skills are re-used to develop the warehouse.
      Weaknesses
        Against the Norm: ELT is an emergent approach to data warehouse design and development. Whilst it has proven itself many times over through its abundant use in implementations throughout the world, it does require a change in mentality and design approach compared with traditional methods. Getting the best from an ELT approach requires an open mind.
        Tools Availability: Being an emergent technology approach, ELT suffers from a limited availability of tools.
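The contrast with ETL can be sketched in a few lines of Python. This is an illustrative assumption rather than anything shown in the slides: the `raw_orders` and `sales_by_product` tables and the sample rows are hypothetical, and an in-memory SQLite database plays the role of the warehouse engine. All source data is loaded untouched, and the transformation then runs as SQL inside the database engine itself.

```python
import sqlite3

# Hypothetical raw rows, loaded exactly as they arrive from the source (all text).
RAW_ROWS = [
    ("1", "beer", "2", "4.50"),
    ("2", "diapers", "1", "12.00"),
    ("3", "beer", "1", "4.50"),
]


def extract_and_load(conn, rows):
    """Extract + Load: copy every source column into a raw staging table."""
    conn.execute(
        "CREATE TABLE raw_orders "
        "(order_id TEXT, product TEXT, quantity TEXT, unit_price TEXT)"
    )
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?, ?)", rows)


def transform_in_database(conn):
    """Transform: a separate, isolated step executed by the database engine."""
    conn.execute(
        """
        CREATE TABLE sales_by_product AS
        SELECT product,
               SUM(CAST(quantity AS INTEGER) * CAST(unit_price AS REAL)) AS revenue
        FROM raw_orders
        GROUP BY product
        """
    )


def run_elt(rows=RAW_ROWS):
    """Load first, transform afterwards, then read the derived table."""
    conn = sqlite3.connect(":memory:")  # assumption: SQLite as a toy warehouse
    extract_and_load(conn, rows)
    transform_in_database(conn)
    return conn.execute(
        "SELECT product, revenue FROM sales_by_product ORDER BY product"
    ).fetchall()


if __name__ == "__main__":
    print(run_elt())
```

Because `raw_orders` keeps every column, a later requirement (for example, revenue per order) would only need a new SQL transformation, not a change to the load step.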
  • 20. ETL Demo - Kettle Demo of Pentaho Kettle.
  • 21. Map Reduce: Hadoop Comparing with MPP Data Warehouse. Source: http://www.capgemini.com/technology-blog/2012/01/what-is-hadoop/
  • 22. Map Reduce: Hadoop [Diagram labels: Professional Service; Enterprise-grade Distribution; Hadoop Subscription Service; Database OLTP replacements: Teradata Aster/MongoDB; Hadoop Cluster Management; Data Integration with Hadoop; EDW; BI]
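For readers unfamiliar with the programming model behind Hadoop, the following is a minimal single-process sketch of MapReduce in Python, using the classic word-count example. It is not Hadoop code and does not use any Hadoop API; the function names are hypothetical, and the shuffle step that a real cluster performs across machines is simulated here with a dictionary.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input record."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key (done by the framework in Hadoop)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def map_reduce(documents):
    """Run map over every document, shuffle the pairs, then reduce each group."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(map_reduce(["beer and diapers", "beer and chips"]))
```

In a real cluster the map and reduce functions run in parallel on many nodes, each working on its own partition of the data; only the structure of the computation is shown here.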
  • 23. MPP Data Warehouse Comparing MPP Data Warehouse with Hadoop stack. Draw a picture.
  • 24. NoSQL
  • 25. NoSQL/SQL/NewSQL [Diagram of the database landscape. Category labels: Non-Relational; Relational; Analytics (OLAP); Operational (OLTP); SQL; MPP; NoSQL; KeyValue; Graph; Document; Columnar; Cloud Service; Data Grid/Cache. Products named: Teradata, IBM Netezza, EMC Greenplum, HP Vertica, Hadoop, Teradata Aster, VectorWise, Oracle, IBM DB2, SQL Server, MongoDB, Neo4j, Amazon RDS, SQL Azure, BDB, Amazon SimpleDB, Voldemort, Tokyo Cabinet, CouchDB, HBase, MySQL, PostgreSQL, Ingres, Sybase, EnterpriseDB, Cassandra, Redis, Memcached]
  • 26. Web Service There are a lot of Web Services.
  • 27. Data Analytics A lot of…..
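  • 28. Data Visualization: It is VERY IMPORTANT for Attracting Users Source: 打破陈规-数据及信息的可视化 (Breaking the Mold: Visualization of Data and Information), 向怡宁
  • 29. Data Visualization: It is VERY IMPORTANT for Competing Source: 打破陈规-数据及信息的可视化 (Breaking the Mold: Visualization of Data and Information), 向怡宁
  • 30. Data Visualization: It is VERY IMPORTANT for User Experience Source: 打破陈规-数据及信息的可视化 (Breaking the Mold: Visualization of Data and Information), 向怡宁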
  • 31. BI Tools
      BI tools fall into three categories:
        Query Tools: A query tool is software set up for users to ask questions about the data. The user can search for patterns or details.
        Multidimensional Analysis Tools: A multidimensional analysis tool, also called an Online Analytical Processing (OLAP) tool, is software that allows the user to view the same data from different aspects. E.g.: Business Objects, Hyperion, Cognos, MicroStrategy, Pentaho, Microsoft Analysis Services and Palo OLAP Server.
        Data Mining Tools: A data mining tool is software that automatically searches data, seeking out ways that the data correlates to other data. E.g.: SPSS Clementine, Weka3, R and Apache Mahout.
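To make the data mining category concrete, here is a small Python sketch of the kind of correlation search behind the "beer and diapers" story in the title: it counts how often pairs of items appear in the same basket (support) and how often one item is bought given another (confidence). The basket data and function names are made up for illustration, and real tools such as Weka or Mahout use far more scalable algorithms than this brute-force count.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets, invented purely for this example.
BASKETS = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"milk", "diapers"},
    {"beer", "chips"},
]


def pair_support(baskets):
    """Return the fraction of baskets that contain each pair of items."""
    counts = Counter()
    for basket in baskets:
        # Sort so that each pair is always counted under the same key.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    total = len(baskets)
    return {pair: n / total for pair, n in counts.items()}


def confidence(baskets, antecedent, consequent):
    """Return the share of baskets with `antecedent` that also hold `consequent`."""
    with_antecedent = [b for b in baskets if antecedent in b]
    if not with_antecedent:
        return 0.0
    return sum(consequent in b for b in with_antecedent) / len(with_antecedent)


if __name__ == "__main__":
    support = pair_support(BASKETS)
    print("support(beer, diapers):", support[("beer", "diapers")])
    print("confidence(diapers -> beer):", confidence(BASKETS, "diapers", "beer"))
```

A pair with high support and high confidence is a candidate rule worth showing to an analyst; whether it is actionable is a business question the tool cannot answer.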
  • 32. BI Tools List Source: BI Tool Survey 2012 http://www.businessintelligencetoolbox.com/list-of-business-intelligence-bi-tools/
  • 33. BI Tools: Gartner Evaluation Business intelligence (BI) platforms enable all types of users – from IT staff to consultants to business users – to build applications that help organizations learn about and understand their business
  • 34. BI Demo – JasperSoft iReport Demo Session: JasperSoft iReport
  • 35. Big Data Analytics Platform Architecture