June 2012




IBM Big Data
The Marriage of Hadoop and Data Warehousing


James Kobielus
Senior Program Director, Product Marketing, Big Data, IBM




                                                            © 2012 IBM Corporation
Hadoop and DW are fast being joined into a new platform paradigm: the Hadoop DW

Agenda




    §  Big Data: 3 Vs and myriad use cases
    §  Big Data: diverse workloads
    §  Big Data: emergence of the Hadoop DW




Scalability Imperative: 3 Vs Drive Big Data Everywhere

    Information from Everywhere, Radical Flexibility, Extreme Scalability

    §  Volume: 12 terabytes of Tweets created daily
    §  Velocity: 5 million trade events per second
    §  Variety: 100's of video feeds from surveillance cameras

More Business Use Cases for Big Data Across Enterprise




More Mission-Critical Apps Ride on Big Data Platforms

    §  Integrate and manage the full variety, velocity and volume of data
    §  Apply advanced analytics to information in its native form
    §  Visualize all available data for ad-hoc analysis and discovery
    §  Development environment for building new analytic applications
    §  Integrate and deploy applications with enterprise-grade availability, manageability, security, and performance

    Platform layers (diagram): Advanced Analytic Applications; Big Data Platform (process and analyze any type of data); Accelerators

    •  Analyze data in motion
    •  MapReduce / NoSQL
    •  Machine Learning
    •  Text Analytics
    •  Text Search
    •  Data Discovery
    •  Visualization and exploration
    •  Scalability
    •  Hardware acceleration
    •  Stream computing

Big Data: Business Crucible for Practical Data Science

    Cycle (diagram):
    §  Business and IT identify information sources available
    §  IT delivers a platform that enables creative exploration of all available data and content
    §  Business determines what questions to ask by exploring the data and relationships
    §  New insights drive integration to traditional technology

Big Data Initiatives: Fueled by Practical Data Science
                                      Analyze a Variety of Information
                                      Novel analytics on a broad set of mixed
                                      information that could not be analyzed before



                                      Analyze Information in Motion
                                      Streaming data analysis
                                      Large volume data bursts and ad-hoc analysis


                                      Analyze Extreme Volumes of Information
                                      Cost-efficiently process and analyze PBs of
                                      information
                                      Manage & analyze high volumes of structured,
                                      relational data


                                      Discover and Experiment
                                      Ad-hoc analytics, data discovery and
                                      experimentation



                                      Manage and Plan
                                      Enforce data structure, integrity and control to
                                      ensure consistency for repeatable queries

Big Data: Marriage of Established & Emerging Approaches

    Established Approach: structured, analytical, logical
    •  Platform: DW
    •  Traits: structured, repeatable, linear
    •  Traditional sources: transaction data, internal app data, mainframe data, OLTP system data, ERP data
    •  Examples: monthly sales reports, profitability analysis, customer surveys

    Emerging Approaches: creative, holistic thought, intuition
    •  Platform: Hadoop, etc.
    •  Traits: unstructured, exploratory, iterative
    •  New sources: web logs, social data, text data (emails), sensor data (images), RFID
    •  Examples: brand sentiment, product strategy, maximum asset utilization

    Enterprise Integration (center of diagram, between the two approaches)

Agenda




     §  Big Data: 3 Vs and myriad use cases
     §  Big Data: diverse workloads
     §  Big Data: emergence of the Hadoop DW




Continuous Social Media Monitoring and Analytics

    Data Set
    •  1.1B tweets
    •  5.7M blog and forum posts
    •  3.5M relevant messages
    •  97K referencing Product A
    •  18K referencing Product B

    Information extracted
    •  Buzz and sentiment
    •  Gender, location and occupation
    •  Fans
    •  Intent to purchase
    •  Specific attributes of products

Content mining, natural language processing, & classification

 §  How it works
     –  Parses text and detects meaning with extractors
     –  Understands the context in which the text is analyzed
     –  Hundreds of pre-built extractors for names, addresses, phone numbers, organizations, URLs, datetimes, etc.
 §  Accuracy
     –  Highly accurate in deriving meaning from complex text
 §  Performance
     –  AQL language optimized for MapReduce

 Example: unstructured text (document, email, etc.)
     "Football World Cup 2010, one team distinguished themselves well, losing to the eventual champions 1-0 in the Final. Early in the second half, Netherlands’ striker, Arjen Robben, had a breakaway, but the keeper for Spain, Iker Casillas made the save. Winger Andres Iniesta scored for Spain for the win."

 Classification and Insight: World Cup 2010 Highlights

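
The extractor idea on this slide can be illustrated with a minimal sketch. This is plain Python regular expressions, not AQL, and the extractor names and patterns (`EXTRACTORS`, `person`, `score`, `year`) are hypothetical simplifications chosen only to fit the World Cup sample text; real pre-built extractors would be far more general.

```python
import re

# Hypothetical, hand-written extractors for illustration only.
# These are NOT AQL rules; they are simple regexes tuned to the sample text.
EXTRACTORS = {
    # A four-digit year starting with 19 or 20.
    "year": re.compile(r"\b(?:19|20)\d{2}\b"),
    # A match score such as "1-0".
    "score": re.compile(r"\b\d+-\d+\b"),
    # A two-word capitalized name that follows a football role word.
    "person": re.compile(
        r"\b(?:striker|keeper for \w+|Winger),?\s+([A-Z][a-z]+ [A-Z][a-z]+)"
    ),
}

SAMPLE = (
    "Football World Cup 2010, one team distinguished themselves well, "
    "losing to the eventual champions 1-0 in the Final. Early in the second "
    "half, Netherlands' striker, Arjen Robben, had a breakaway, but the "
    "keeper for Spain, Iker Casillas made the save. Winger Andres Iniesta "
    "scored for Spain for the win."
)


def extract(text: str) -> dict:
    """Run every extractor over the text and collect matches by name."""
    return {name: pattern.findall(text) for name, pattern in EXTRACTORS.items()}


if __name__ == "__main__":
    for name, matches in extract(SAMPLE).items():
        print(name, matches)
```

Each extractor is independent, so adding an entity type means adding one pattern to the dictionary. Context-aware extraction, as the slide describes it, would need considerably more than role-word regexes.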
Entity Extraction and Integration




Statistical Analysis, Predictive Modeling, & Machine Learning

          Enables machine learning (ML) on massive datasets
           §  R and Matlab-like syntax for smooth adoption
           §  Optimizations to generate low-level execution plans
           §  Out-of-box and write-your-own analytic algorithms, e.g. regression, clustering,
               classification, pattern mining, ranking, etc.
           §  Scales to massively parallel clusters from 10s to 1000s of machines and from
               terabytes to petabytes

     What are people talking about in social media about a product?

Targeted E-Commerce and Next Best Action




Predictive Complex Event Processing




Intent and Sentiment Analysis

    Online flow: Data-in-motion analysis (Stream Computing and Analytics)
       Data Sources
       →  Data Ingest and Prep
       →  Text Analytics: Timely Insights
       →  Entity Analytics: Profile Resolution
       →  Predictive Analytics: Action Determination
       →  Timely Decisions (Dashboard)

    Offline flow: Data-at-rest analysis (Hadoop System and Analytics)
       Social Media and Enterprise Data
       →  Text Analytics
       →  Entity Analytics and Integration
       →  Comprehensive Social Media Customer Profiles
       →  Predictive Analytics
       →  Customer Models
       →  Reports

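
The online (data-in-motion) flow on this slide can be read as a chain of stages, each consuming the previous stage's output. The following is a toy sketch of that shape using Python generators. Every function name, the keyword rules in `INTENT_PHRASES`, the profile lookup, and the `send_offer` action are assumptions made for illustration; the slide does not describe how any stage is implemented.

```python
from typing import Iterable, Iterator, Optional

# Hypothetical keyword rules standing in for real text analytics.
INTENT_PHRASES = ("need a new", "should i buy")


def ingest_and_prep(raw: Iterable[dict]) -> Iterator[dict]:
    """Data ingest and prep: normalize whitespace, drop empty messages."""
    for msg in raw:
        text = " ".join(msg.get("text", "").split())
        if text:
            yield {"user": msg["user"].lower(), "text": text}


def text_analytics(msgs: Iterable[dict]) -> Iterator[dict]:
    """Text analytics: flag messages that look like purchase intent."""
    for msg in msgs:
        lowered = msg["text"].lower()
        msg["intent_to_buy"] = any(p in lowered for p in INTENT_PHRASES)
        yield msg


def entity_analytics(msgs: Iterable[dict], profiles: dict) -> Iterator[dict]:
    """Entity analytics: resolve the author to a known profile, if any."""
    for msg in msgs:
        profile: Optional[dict] = profiles.get(msg["user"])
        msg["profile"] = profile
        yield msg


def predictive_analytics(msgs: Iterable[dict]) -> Iterator[dict]:
    """Predictive analytics: choose an action from intent plus profile."""
    for msg in msgs:
        known = msg["profile"] is not None
        msg["action"] = "send_offer" if msg["intent_to_buy"] and known else "none"
        yield msg


def online_flow(raw: Iterable[dict], profiles: dict) -> list:
    """Chain the four stages in the order shown on the slide."""
    staged = ingest_and_prep(raw)
    staged = text_analytics(staged)
    staged = entity_analytics(staged, profiles)
    return list(predictive_analytics(staged))
```

Because each stage is a generator, messages pass through one at a time rather than in batches, which loosely mirrors the streaming character of the online flow. The offline flow would run similar stages over stored data instead.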
Agenda




     §  Big Data: 3 Vs and myriad use cases
     §  Big Data: diverse workloads
     §  Big Data: emergence of the Hadoop DW




Big Data: DW & Hadoop are Married in Spirit

    DW (word cloud): models, policies, metadata, aggregates, DQ, MDM, hubs, marts, cubes, ETL, databases, views, storage, memory, staging nodes, production tables, cache, in-database analytics, operational data stores

    §  Cloud-facing architectures
    §  Massively parallel processing
    §  In-database analytics
    §  Mixed workload management
    §  Hybrid storage layers

Hadoop is Core of Next-Gen Big Data DW


     §  Vendor-agnostic framework for
         massively parallel processing of
         advanced analytics against
         polystructured information
     §  Leverages extensible framework for
         building advanced analytics and data
         management functions
     §  Evolving rapidly in new directions
     §  Being commercialized and adopted
         rapidly in enterprises
     §  Vibrant open-source community and
         industry


Hadoop, DW, and other Databases Co-Exist in Big Data Ecosystem

    §  Hadoop & NoSQL: Big Data staging, ETL, and preprocessing tier
    §  DW RDBMS: Big Data SVOT (single version of the truth) and governance tier
    §  In-memory, columnar, OLAP: Big Data access and interaction tier

How Hadoop and DW Complement Each Other




Single Version of Big Data: Where Hadoop DW Will Excel

    Social media based 360-degree consumer profiles

    Timely Insights
    •  Intent to see a movie title, buy a product
    •  Current location

    Life Events
    •  Life-changing events: relocation, having a baby, getting married, getting divorced, buying a house

    Product Interests
    •  Personal preferences of products and services
    •  Product purchase history

    Personal Attributes
    •  Identifiers: name, address, age, gender
    •  Interests: sports, pets, cuisine…
    •  Life Cycle Status: marital, parental

    Relationships
    •  Personal relationships: family, friends and roommates…
    •  Business relationships: co-workers and work/interests network…

    Sample messages

    Monetizable intent to see a
    •  "Kinda feel like going to movies tonight… Any recommendations? @Texas Angelika Texas"
    •  "I don't think anyone understands how much I like watching movies. My 3rd trip to the threatre in 3 days."

    Monetizable intent to buy
    •  "I need a new digital camera for my food pictures, and recommendations around 300?"
    •  "What should I buy?? A mini laptop with Windows 7 OR a Apple MacBook!??!"

    Location announcements
    •  "I'm at Starbucks Parque Tezontle http://4sq.com/fYReSj"

    Life Events
    •  "College: Off to Standard for my MBA! Bbye chicago!"
    •  "Looks like we'll be moving to New Orleans sooner than I thought."

Hadoop DW Integration: What to Look For

     §  Hadoop distro functional depth
     §  EDW HDFS connector
     §  Software, appliance, and cloud form factors for Hadoop offerings
     §  Pluggable storage layer for Hadoop offerings
     §  Bundled data management and analytics offerings integrated with Hadoop solutions
     §  Modeling, management, acceleration, and optimization tools
     §  Real-time/low-latency capabilities integrated into Hadoop offerings
     §  Robust availability, security, and workload management tools integrated with Hadoop offerings
     §  And many more, focused on EDW-grade robustness, scalability, and flexibility!

Consider Big Data Platform Accelerators

    §  Telecommunications: CDR streaming analytics; deep network analytics
    §  Retail Customer Intelligence: customer behavior and lifetime value analysis
    §  Finance: streaming options trading; insurance and banking DW models
    §  Social Media Analytics: sentiment analytics, intent to purchase
    §  Public transportation: real-time monitoring and routing optimization
    §  Data mining: streaming statistical analysis

    •  Over 100 sample applications
    •  User Defined Toolkits
    •  Standard Toolkits
    •  Industry Data Models: Banking, Insurance, Telco, Healthcare, Retail

How Will You Do MDM on Your Hadoop DW?

     (A1) Unstructured Entity Integration (on BigInsights)
        –  Complex analytics to populate master data set
        –  Text Analytics: Rule language (AQL) for extracting entities, events, relationships from text and HTML documents
        –  Entity Integration: Rule language (HIL) to express & customize the integration, cleansing, and aggregation of the master entities
     (A2) Entity Repository (on MDM)
        –  BigInsights Bridge: Generation of the MDM model for public master entities, from the BigInsights model; and bulk-loading of master entities
        –  Query-based Application Development: Supports the generation of custom queries for individual applications

     Diagram labels:
        •  External data subscriptions (e.g., Acxiom)
        •  External public data sources (e.g., SEC/FDIC, Twitter, Blogs, Facebook)
        •  A1: Text Analytics and Entity Integration (BigInsights)
        •  A2: Relational tables with master entities (InfoSphere MDM with Extensions)
        •  Queries; Data services
        •  MDM DaaS Applications and Views
        •  Tooling based on entity model

     Entity-level query shown on the slide:

        select cik, Officers, Directors
        from Company
        where name = Citigroup

     SQL query shown on the slide (excerpt):

        SELECT *
        FROM
          (SELECT t2.CIK as CIK, t2.NAME as NAME, t2.IS_FORMER_OFFICER as IS_FORMER_OFFICER,
                  t2.IS_IMPORTANT_OFFICER as IS_IMPORTANT_OFFICER, t2.POSITION_NAME as POSITION_NAME,
                  tp.EARLIEST_DATE as EARLIEST_DATE, tp.IS_EARLIEST_EXACT as IS_EARLIEST_EXACT,
                  tp.LATEST_DATE as LATEST_DATE, tp.IS_LATEST_EXACT as IS_LATEST_EXACT
           FROM
             (SELECT t1.CIK as CIK, t1.NAME as NAME, t1.IS_FORMER_OFFICER as IS_FORMER_OFFICER,
                     t1.IS_IMPORTANT_OFFICER as IS_IMPORTANT_OFFICER, p.NAME as POSITION_NAME,
                     p.POSITIONSPK_ID as POSITIONSPK_ID
              FROM
                (SELECT o.CIK as CIK, o.NAME as NAME, o.IS_FORMER_OFFICER as IS_FORMER_OFFICER,
                        o.IS_IMPORTANT_OFFICER as IS_IMPORTANT_OFFICER, o.OFFICERSPK_ID as OFFICERSPK_ID
                 FROM DB2ADMIN.OFFICERS o
                 WHERE o.OFFICER_OF = 567830643756635868
                ) as t1
                left outer join DB2ADMIN.POSITIONS p on t1.OFFICERSPK_ID = p.POSITIONOF
             ) as t2
             left outer join D2ADMIN.RANGEOFKNOWNDATES tp
               on t2.POSITIONSPK_ID = tp.RANGE_OF_KNOWN_DATES_FOR_POS )
        UNION
           // ( OUTER UNION)
        …

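
To make the nested left outer joins in the query excerpt easier to follow, here is a small runnable sketch against an in-memory SQLite database. It is an approximation under stated assumptions: the table and column names are borrowed from the slide but simplified, the `DB2ADMIN` schema prefix is dropped, the company key `1` and all rows are made-up illustrative data, and SQLite stands in for whatever database sits behind InfoSphere MDM.

```python
import sqlite3

# In-memory database with a simplified, hypothetical version of the schema.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE OFFICERS (
        OFFICERSPK_ID INTEGER PRIMARY KEY, CIK TEXT, NAME TEXT, OFFICER_OF INTEGER
    );
    CREATE TABLE POSITIONS (
        POSITIONSPK_ID INTEGER PRIMARY KEY, NAME TEXT, POSITIONOF INTEGER
    );
    CREATE TABLE RANGEOFKNOWNDATES (
        RANGE_OF_KNOWN_DATES_FOR_POS INTEGER, EARLIEST_DATE TEXT, LATEST_DATE TEXT
    );
    -- Made-up rows: two officers of company 1, one officer of company 2.
    INSERT INTO OFFICERS VALUES (1, 'C1', 'Alice Example', 1);
    INSERT INTO OFFICERS VALUES (2, 'C1', 'Bob Example', 1);
    INSERT INTO OFFICERS VALUES (3, 'C2', 'Carol Other', 2);
    -- Only Alice has a recorded position, and that position has known dates.
    INSERT INTO POSITIONS VALUES (10, 'CEO', 1);
    INSERT INTO RANGEOFKNOWNDATES VALUES (10, '2008-01-01', '2011-12-31');
    """
)

# Same shape as the slide's query: officers of one company (t1), left-joined
# to positions (t2), left-joined to the date range known for each position.
QUERY = """
SELECT t2.NAME, t2.POSITION_NAME, tp.EARLIEST_DATE, tp.LATEST_DATE
FROM (SELECT t1.NAME AS NAME, p.NAME AS POSITION_NAME,
             p.POSITIONSPK_ID AS POSITIONSPK_ID
      FROM (SELECT o.NAME AS NAME, o.OFFICERSPK_ID AS OFFICERSPK_ID
            FROM OFFICERS o
            WHERE o.OFFICER_OF = ?) AS t1
      LEFT OUTER JOIN POSITIONS p ON t1.OFFICERSPK_ID = p.POSITIONOF) AS t2
LEFT OUTER JOIN RANGEOFKNOWNDATES tp
     ON t2.POSITIONSPK_ID = tp.RANGE_OF_KNOWN_DATES_FOR_POS
ORDER BY t2.NAME
"""

rows = conn.execute(QUERY, (1,)).fetchall()
for row in rows:
    print(row)
```

The left outer joins keep an officer in the result even when no position or date range has been recorded, so that officer's remaining columns come back as NULL (`None` in Python).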
IBM Big Data Platform

New analytic applications drive the requirements for a big data platform

   •  Integrate and manage the full variety, velocity and volume of data
   •  Apply advanced analytics to information in its native form
   •  Visualize all available data for ad-hoc analysis
   •  Development environment for building new analytic applications
   •  Workload optimization and scheduling
   •  Security and Governance

   Platform stack (diagram):
   §  Analytic Applications: BI / Reporting, Exploration / Visualization, Functional App, Industry App, Predictive Analytics, Content Analytics
   §  IBM Big Data Platform: Visualization & Discovery, Application Development, Systems Management
   §  Accelerators
   §  Hadoop System, Stream Computing, Data Warehouse
   §  Information Integration & Governance

Thank You!





More Related Content

What's hot

Big-Data Server Farm Architecture
Big-Data Server Farm Architecture Big-Data Server Farm Architecture
Big-Data Server Farm Architecture Jordan Chung
 
MongoDB IoT City Tour STUTTGART: Hadoop and future data management. By, Cloudera
MongoDB IoT City Tour STUTTGART: Hadoop and future data management. By, ClouderaMongoDB IoT City Tour STUTTGART: Hadoop and future data management. By, Cloudera
MongoDB IoT City Tour STUTTGART: Hadoop and future data management. By, ClouderaMongoDB
 
2014.07.11 biginsights data2014
2014.07.11 biginsights data20142014.07.11 biginsights data2014
2014.07.11 biginsights data2014Wilfried Hoge
 
Microsoft Azure Big Data Analytics
Microsoft Azure Big Data AnalyticsMicrosoft Azure Big Data Analytics
Microsoft Azure Big Data AnalyticsMark Kromer
 
Modern Data Management for Federal Modernization
Modern Data Management for Federal ModernizationModern Data Management for Federal Modernization
Modern Data Management for Federal ModernizationDenodo
 
Introduction to Microsoft Azure HD Insight by Dattatrey Sindhol
Introduction to Microsoft Azure HD Insight by Dattatrey Sindhol Introduction to Microsoft Azure HD Insight by Dattatrey Sindhol
Introduction to Microsoft Azure HD Insight by Dattatrey Sindhol HARMAN Services
 
Big Data: Technical Introduction to BigSheets for InfoSphere BigInsights
Big Data:  Technical Introduction to BigSheets for InfoSphere BigInsightsBig Data:  Technical Introduction to BigSheets for InfoSphere BigInsights
Big Data: Technical Introduction to BigSheets for InfoSphere BigInsightsCynthia Saracco
 
The Hive Think Tank - The Microsoft Big Data Stack by Raghu Ramakrishnan, CTO...
The Hive Think Tank - The Microsoft Big Data Stack by Raghu Ramakrishnan, CTO...The Hive Think Tank - The Microsoft Big Data Stack by Raghu Ramakrishnan, CTO...
The Hive Think Tank - The Microsoft Big Data Stack by Raghu Ramakrishnan, CTO...The Hive
 
Designing Data Pipelines for Automous and Trusted Analytics
Designing Data Pipelines for Automous and Trusted AnalyticsDesigning Data Pipelines for Automous and Trusted Analytics
Designing Data Pipelines for Automous and Trusted AnalyticsDataWorks Summit
 
A beginners guide to Cloudera Hadoop
A beginners guide to Cloudera HadoopA beginners guide to Cloudera Hadoop
A beginners guide to Cloudera HadoopDavid Yahalom
 
Extending Data Lake using the Lambda Architecture June 2015
Extending Data Lake using the Lambda Architecture June 2015Extending Data Lake using the Lambda Architecture June 2015
Extending Data Lake using the Lambda Architecture June 2015DataWorks Summit
 
Entity Resolution Service - Bringing Petabytes of Data Online for Instant Access
Entity Resolution Service - Bringing Petabytes of Data Online for Instant AccessEntity Resolution Service - Bringing Petabytes of Data Online for Instant Access
Entity Resolution Service - Bringing Petabytes of Data Online for Instant AccessDataWorks Summit
 
Big data on Azure for Architects
Big data on Azure for ArchitectsBig data on Azure for Architects
Big data on Azure for ArchitectsTomasz Kopacz
 
The 5 Keys to a Killer Data Lake
The 5 Keys to a Killer Data LakeThe 5 Keys to a Killer Data Lake
The 5 Keys to a Killer Data LakeDataWorks Summit
 
Continuous Data Ingestion pipeline for the Enterprise
Continuous Data Ingestion pipeline for the EnterpriseContinuous Data Ingestion pipeline for the Enterprise
Continuous Data Ingestion pipeline for the EnterpriseDataWorks Summit
 
Use dependency injection to get Hadoop *out* of your application code
Use dependency injection to get Hadoop *out* of your application codeUse dependency injection to get Hadoop *out* of your application code
Use dependency injection to get Hadoop *out* of your application codeDataWorks Summit
 

IBM Big Data: The Marriage of Hadoop and Data Warehousing

  • 1. IBM Big Data: The Marriage of Hadoop and Data Warehousing. James Kobielus, Senior Program Director, Product Marketing, Big Data, IBM. June 2012. © 2012 IBM Corporation
  • 2. Hadoop and DW are fast being joined into a new platform paradigm: the Hadoop DW
  • 3. Agenda: Big Data: 3 Vs and myriad use cases; Big Data: diverse workloads; Big Data: emergence of the Hadoop DW
  • 4. Agenda: Big Data: 3 Vs and myriad use cases; Big Data: diverse workloads; Big Data: emergence of the Hadoop DW
  • 5. Scalability Imperative: 3 Vs Drive Big Data Everywhere. Information from Everywhere; Radical Flexibility; Extreme Scalability. Volume: 12 terabytes of Tweets created daily. Velocity: 5 million trade events per second. Variety: 100s of video feeds from surveillance cameras.
  • 6. More Business Use Cases for Big Data Across Enterprise
  • 7. More Mission-Critical Apps Ride on Big Data Platforms. Layers shown: Advanced Analytic Applications; Big Data Platform (process and analyze any type of data); Accelerators. Integrate and manage the full variety, velocity and volume of data; apply advanced analytics to information in its native form; visualize all available data for ad-hoc analysis and discovery; development environment for building new analytic applications; integrate and deploy applications with enterprise-grade availability, manageability, security, and performance. Analyze data in motion; MapReduce / NoSQL; machine learning; text analytics; text search; data discovery; stream computing; visualization and exploration; scalability; hardware acceleration.
  • 8. Big Data: Business Crucible for Practical Data Science. Business and IT identify information sources available. IT delivers a platform that enables creative exploration of all available data and content. Business determines what questions to ask by exploring the data and relationships. New insights drive integration to traditional technology.
  • 9. Big Data Initiatives: Fueled by Practical Data Science. Analyze a Variety of Information: novel analytics on a broad set of mixed information that could not be analyzed before. Analyze Information in Motion: streaming data analysis; large-volume data bursts and ad-hoc analysis. Analyze Extreme Volumes of Information: cost-efficiently process and analyze PBs of information; manage and analyze high volumes of structured, relational data. Discover and Experiment: ad-hoc analytics, data discovery and experimentation. Manage and Plan: enforce data structure, integrity and control to ensure consistency for repeatable queries.
  • 10. Big Data: Marriage of Established & Emerging Approaches. Established Approach (DW): structured, analytical, logical; structured, repeatable, linear. Emerging Approaches (Hadoop, etc.): creative, holistic thought, intuition; unstructured, exploratory, iterative. The two are joined by enterprise integration. Traditional sources: transaction data, internal app data, mainframe data, OLTP system data, ERP data. New sources: web logs, social data, text data (emails), sensor data (images), RFID. Example analyses shown: monthly sales reports, profitability analysis, customer surveys, brand sentiment, product strategy, maximum asset utilization.
  • 11. Agenda: Big Data: 3 Vs and myriad use cases; Big Data: diverse workloads; Big Data: emergence of the Hadoop DW
  • 12. Continuous Social Media Monitoring and Analytics. Data set: 1.1B tweets; 5.7M blog and forum posts; 3.5M relevant messages; 97K referencing Product A; 18K referencing Product B. Information extracted: buzz and sentiment; gender, location and occupation; fans; intent to purchase; specific attributes of products.
  • 13. Content mining, natural language processing, & classification. How it works: parses text and detects meaning with extractors; understands the context in which the text is analyzed; hundreds of pre-built extractors for names, addresses, phone numbers, organizations, URL, Datetime, etc. Accuracy: highly accurate in deriving meaning from complex text. Performance: AQL language optimized for MapReduce. Example of unstructured text (document, email, etc.): "Football World Cup 2010, one team distinguished themselves well, losing to the eventual champions 1-0 in the Final. Early in the second half, Netherlands’ striker, Arjen Robben, had a breakaway, but the keeper for Spain, Iker Casillas made the save. Winger Andres Iniesta scored for Spain for the win." Classification and insight: World Cup 2010 Highlights.
  • 14. Entity Extraction and Integration
  • 15. Statistical Analysis, Predictive Modeling, & Machine Learning. Enables machine learning (ML) on massive datasets: R and Matlab-like syntax for smooth adoption; optimizations to generate low-level execution plans; out-of-box and write-your-own analytic algorithms, e.g. regression, clustering, classification, pattern mining, ranking, etc.; scale to massively parallel clusters from 10s to 1000s of machines and from terabytes to petabytes. What are people talking about in social media about a product?
  • 16. Targeted E-Commerce and Next Best Action
  • 17. Predictive Complex Event Processing
  • 18. Intent and Sentiment Analysis. Online flow (data-in-motion analysis): Data Sources feed Stream Computing and Analytics (Data Ingest and Prep; Text Analytics: Timely Insights; Entity Analytics: Profile Resolution; Predictive Analytics: Action Determination), producing Timely Decisions and a Dashboard. Offline flow (data-at-rest analysis): Hadoop System and Analytics (Social Media and Enterprise Data Integration; Text Analytics; Entity Analytics: Comprehensive Social Media Customer Profiles; Predictive Analytics: Customer Models), producing Reports.
  • 19. Agenda: Big Data: 3 Vs and myriad use cases; Big Data: diverse workloads; Big Data: emergence of the Hadoop DW
  • 20. Big Data: DW & Hadoop are Married in Spirit. Cloud-facing architectures; massively parallel processing; in-database analytics; mixed workload management; hybrid storage layers. DW diagram elements: models, policies, metadata, aggregates, DQ, MDM, hubs, marts, cubes, ETL, databases, views, storage, memory cache, staging nodes, production tables, in-database analytics, operational data stores.
  • 21. Hadoop is Core of Next-Gen Big Data DW. Vendor-agnostic framework for massively parallel processing of advanced analytics against polystructured information; leverages extensible framework for building advanced analytics and data management functions; evolving rapidly in new directions; being commercialized and adopted rapidly in enterprises; vibrant open-source community and industry.
  • 22. Hadoop, DW, and other Databases Co-Exist in Big Data Ecosystem. Technologies shown: Hadoop & NoSQL, DW, RDBMS, columnar, in-memory, OLAP. Tiers shown: Big Data staging, ETL, and preprocessing tier; Big Data SVOT and governance tier; Big Data access and interaction tier.
  • 23. How Hadoop and DW Complement Each Other
  • 24. Single Version of Big Data: Where Hadoop DW Will Excel. Social media based 360-degree consumer profiles. Timely Insights: intent to see a movie title, buy a product; current location. Life Events: life-changing events such as relocation, having a baby, getting married, getting divorced, buying a house. Products Interests: personal preferences of product and services; product purchase history. Personal Attributes: identifiers (name, address, age, gender); interests (sports, pets, cuisine…); life cycle status (marital, parental). Relationships: personal relationships (family, friends and roommates…); business relationships (co-workers and work/interests network…). Example posts, monetizable intent to see a movie: "Kinda feel like going to movies tonight… Any recommendations? @Texas Angelika Texas"; "I don't think anyone understands how much I like watching movies. My 3rd trip to the threatre in 3 days." Monetizable intent to buy: "I need a new digital camera for my food pictures, and recommendations around 300?"; "What should I buy?? A mini laptop with Windows 7 OR a Apple MacBook!??!" Life events: "College: Off to Standard for my MBA! Bbye chicago!"; "Looks like we'll be moving to New Orleans sooner than I thought." Location announcements: "I'm at Starbucks Parque Tezontle http://4sq.com/fYReSj"
  • 25. Hadoop DW Integration: What to Look For. Hadoop distro functional depth; EDW HDFS connector; software, appliance, and cloud form factors for Hadoop offerings; pluggable storage layer for Hadoop offerings; bundled data management and analytics offerings integrated with Hadoop solutions; modeling, management, acceleration, and optimization tools; real-time/low-latency capabilities integrated into Hadoop offerings; robust availability, security, and workload management tools integrated with Hadoop offerings; and many more, focused on EDW-grade robustness, scalability, and flexibility! DW diagram elements: models, policies, metadata, aggregates, DQ, MDM, hubs, marts, cubes, ETL, databases, views, storage, memory cache, staging nodes, production tables, in-database analytics, operational data stores.
  • 26. Consider Big Data Platform Accelerators. Telecommunications: CDR streaming analytics; deep network analytics. Retail Customer Intelligence: customer behavior and lifetime value analysis. Finance: streaming options trading; insurance and banking DW models. Social Media Analytics: sentiment analytics, intent to purchase. Public transportation: real-time monitoring and routing optimization. Data mining: streaming statistical analysis. Over 100 sample applications; User Defined Toolkits; Standard Toolkits; Industry Data Models (Banking, Insurance, Telco, Healthcare, Retail).
  • 27. How Will You Do MDM on Your Hadoop DW? (A1) Unstructured Entity Integration (on BigInsights): complex analytics to populate master data set. Text Analytics: rule language (AQL) for extracting entities, events, relationships from text and HTML documents. Entity Integration: rule language (HIL) to express and customize the integration, cleansing, and aggregation of the master entities. (A2) Entity Repository (on MDM). BigInsights Bridge: generation of the MDM model for master entities from the BigInsights model, and bulk-loading of master entities. Query-based Application Development: supports the generation of custom queries for individual applications. Diagram labels: external public data sources (e.g., SEC/FDIC, Twitter, Blogs, Facebook); external data subscriptions (e.g., Acxiom); BigInsights with Extensions (Text Analytics, Entity Integration); InfoSphere MDM (relational tables with master entities); MDM DaaS Applications and Views; Data services; Tooling; Queries based on entity model. Example entity query: select cik, Officers, Directors from Company where name = Citigroup. Generated SQL: SELECT * FROM (SELECT t2.CIK as CIK, t2.NAME as NAME, t2.IS_FORMER_OFFICER as IS_FORMER_OFFICER, t2.IS_IMPORTANT_OFFICER as IS_IMPORTANT_OFFICER, t2.POSITION_NAME as POSITION_NAME, tp.EARLIEST_DATE as EARLIEST_DATE, tp.IS_EARLIEST_EXACT as IS_EARLIEST_EXACT, tp.LATEST_DATE as LATEST_DATE, tp.IS_LATEST_EXACT as IS_LATEST_EXACT FROM (SELECT t1.CIK as CIK, t1.NAME as NAME, t1.IS_FORMER_OFFICER as IS_FORMER_OFFICER, t1.IS_IMPORTANT_OFFICER as IS_IMPORTANT_OFFICER, p.NAME as POSITION_NAME, p.POSITIONSPK_ID as POSITIONSPK_ID FROM (SELECT o.CIK as CIK, o.NAME as NAME, o.IS_FORMER_OFFICER as IS_FORMER_OFFICER, o.IS_IMPORTANT_OFFICER as IS_IMPORTANT_OFFICER, o.OFFICERSPK_ID as OFFICERSPK_ID FROM DB2ADMIN.OFFICERS o WHERE o.OFFICER_OF = 567830643756635868 ) as t1 left outer join DB2ADMIN.POSITIONS p on t1.OFFICERSPK_ID = p.POSITIONOF ) as t2 left outer join D2ADMIN.RANGEOFKNOWNDATES tp on t2.POSITIONSPK_ID = tp.RANGE_OF_KNOWN_DATES_FOR_POS ) UNION // ( OUTER UNION) …
  • 28. IBM Big Data Platform. New analytic applications drive the requirements for a big data platform: integrate and manage the full variety, velocity and volume of data; apply advanced analytics to information in its native form; visualize all available data for ad-hoc analysis; development environment for building new analytic applications; workload optimization and scheduling; security and governance. Analytic Applications: BI / Reporting, Exploration / Visualization, Functional App, Industry App, Predictive Analytics, Content Analytics. IBM Big Data Platform: Visualization & Discovery, Application Development, Systems Management; Accelerators; Hadoop System, Stream Computing, Data Warehouse; Information Integration & Governance.
  • 29. Thank You!
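
The generated SQL on slide 27 walks from officers to their positions and then to the known date ranges for each position, using left outer joins so that an officer with no recorded position still appears in the result. The sketch below illustrates only that join pattern. It uses Python's built-in sqlite3 module rather than DB2, and the table names, column names, and sample rows are simplified, hypothetical stand-ins, not the BigInsights or InfoSphere MDM schema.

```python
import sqlite3

# In-memory database with hypothetical, simplified tables (assumption:
# these are illustrative stand-ins for the OFFICERS, POSITIONS, and
# RANGEOFKNOWNDATES tables named on the slide).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE officers (officer_id INTEGER PRIMARY KEY, company_id INTEGER, name TEXT);
CREATE TABLE positions (position_id INTEGER PRIMARY KEY, officer_id INTEGER, title TEXT);
CREATE TABLE known_dates (position_id INTEGER, earliest TEXT, latest TEXT);

INSERT INTO officers VALUES (1, 100, 'A. Example'), (2, 100, 'B. Sample'), (3, 200, 'C. Other');
INSERT INTO positions VALUES (10, 1, 'CEO');
INSERT INTO known_dates VALUES (10, '2008-01-01', '2011-12-31');
""")


def officers_for_company(connection, company_id):
    """Return (name, title, earliest, latest) for every officer of a company.

    LEFT OUTER JOIN keeps officers that have no position row, and positions
    that have no date-range row; the missing columns come back as None.
    """
    query = """
        SELECT o.name, p.title, d.earliest, d.latest
        FROM officers o
        LEFT OUTER JOIN positions p ON p.officer_id = o.officer_id
        LEFT OUTER JOIN known_dates d ON d.position_id = p.position_id
        WHERE o.company_id = ?
        ORDER BY o.officer_id
    """
    return connection.execute(query, (company_id,)).fetchall()


if __name__ == "__main__":
    for row in officers_for_company(conn, 100):
        print(row)
```

The slide's query nests subselects (t1, t2) where this sketch chains the joins directly; the two forms express the same officer-to-position-to-dates traversal, and the flat form is used here only for readability.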