©2012 Sixth Sense Advisors, Inc. All Rights Reserved   1




INTEGRATING BIG DATA
Dataversity Webinar
Feb 7 2012




State of Data Today




A Growing Trend
 Expectations for BI are changing without anyone telling us

  Requirement     Expectations                    Reality
  Speed           Speed of the Internet           Speed = Infra + Arch + Design
  Accessibility   Accessibility of a Smartphone   BI Tool licenses & security
  Usability       iPad - Mobility                 Web Enabled BI Tool
  Availability    Google Search                   Data & Report Metadata
  Delivery        Speed of questions              Methodology & Signoff
  Data            Access to everything            Structured Data
  Scalability     Cloud (Amazon)                  Existing Infrastructure
  Cost            Cell phone or Free Wi-Fi        Millions



The Wisdom of Crowds


Data Deluge = Business Insights



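BIG Data
[Diagram: data sources arranged from Structured to Unstructured, in two columns, Current and New]
•  Current, from structured to unstructured
   •  ERP, CRM, SCM
   •  Content Management Systems
   •  Email, Call Center
   •  Documents, Contracts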




What’s so Big about Big Data

            Velocity
            Volume
            Variety
           Complexity
           Ambiguity


So you are about to start the Big Data Project
[Diagram: Tools, Data, Instructions, Output]




The Normal Way Results In ...




Image Source: Web




Why Big Data can Fail on the RDBMS?

[Diagram: new demands on the current data management platform (RDBMS + ETL + BI) and the outcome]
•  New data types, new volume, new analytics, new workload, new metadata
•  Outcome: poor performance, failed programs
•  Scalability; Sharding; ACID




BIG Data
•  Workload Demands
   •  Process dynamic data content
   •  Process unstructured data
   •  Systems that can scale up and scale out with high volume data
   •  Perform complex operations within reasonable response time
•  Infrastructure Requirements
   •  Scalable platform
   •  Database independence
   •  Fault tolerant architectures
   •  Low cost of acquisition and storage
   •  Supported by standard toolsets




Hadoop

Design Goals
•  System Shall Manage and Heal Itself
•  Performance Shall Scale Linearly
•  Compute Shall Move to Data
•  Simple Core, Modular and Extensible


Hadoop Differentiators

Schema-on-Write: RDBMS
•  Schema must be created before data is loaded.
•  An explicit load operation has to take place which transforms the data to the internal structure of the database.
•  New columns must be added explicitly before data for such columns can be loaded into the database.
•  Read is Fast.
•  Standards/Governance.

Schema-on-Read: Hadoop
•  Data is simply copied to the file store, no special transformation is needed.
•  A SerDe (Serializer/Deserializer) is applied during read time to extract the required columns.
•  New data can start flowing anytime and will appear retroactively once the SerDe is updated to parse them.
•  Load is Fast.
•  Evolving Schemas/Agility.
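
The schema-on-read idea can be sketched in a few lines of Python. This is a minimal illustration, not Hadoop or Hive code: the records, the column names and the `read_with_schema` helper are all hypothetical, and JSON parsing stands in for a SerDe.

```python
import json

# Raw events are stored exactly as they arrive: no load-time transformation.
RAW_LINES = [
    '{"user": "a1", "action": "click"}',
    '{"user": "b2", "action": "view", "device": "phone"}',
]

def read_with_schema(lines, columns):
    """Apply a read-time parser (a stand-in for a SerDe) that extracts the requested columns."""
    rows = []
    for line in lines:
        record = json.loads(line)
        # Fields absent from a record become None instead of failing the read.
        rows.append(tuple(record.get(col) for col in columns))
    return rows

# The first version of the reader only knows two columns.
v1 = read_with_schema(RAW_LINES, ["user", "action"])

# Updating the reader exposes "device" retroactively; the stored data is untouched.
v2 = read_with_schema(RAW_LINES, ["user", "action", "device"])
```

The stored lines never change between the two reads; only the column list handed to the reader does, which is the trade of fast loads for work at read time.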




Hadoop Known Limitations
•  Write-once model
•  A namespace with an extremely large number of files can exceed the
   Namenode’s capacity to maintain it, since the namespace is held in memory
•  Cannot be mounted by an existing OS
   •  Getting data in and out is tedious
   •  A virtual file system can solve the problem
•  HDFS does not implement / support
   •  User quotas
   •  Access permissions
   •  Hard or soft links
   •  Data balancing schemes
•  No periodic checkpoints
•  Namenode is single point of failure
   •  Automatic restart and failover to another machine not yet supported
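
The file-count limitation can be made concrete with a back-of-the-envelope estimate. The sketch below assumes roughly 150 bytes of Namenode memory per namespace object (file or block); that figure is a commonly quoted rule of thumb and an assumption here, not a number from these slides, and the function name is hypothetical.

```python
BYTES_PER_OBJECT = 150  # assumption: rough rule of thumb, not a measured value

def namenode_heap_bytes(num_files, blocks_per_file=1, bytes_per_object=BYTES_PER_OBJECT):
    """Estimate the memory needed to hold file and block entries in the Namenode."""
    objects = num_files * (1 + blocks_per_file)  # one file entry plus its block entries
    return objects * bytes_per_object

small_files = namenode_heap_bytes(100_000_000)  # 100 million single-block files
print(f"{small_files / 1e9:.0f} GB")
```

Under these assumptions the cost grows with the number of files rather than with the bytes stored, which is why many small files are harder on the Namenode than a few large ones.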

Hadoop Tips
•  Hadoop is useful
   •  When you must process lots of unstructured data
   •  When running batch jobs is acceptable
   •  When you have access to lots of cheap hardware
•  Hadoop is not useful
   •  For intense calculations with little or no data
   •  When your data is not self-contained
   •  When you need interactive results
•  Implementation
   •  Think big, start small
   •  Build on agile cycles
   •  Focus on the data, as you will always develop schema on write.
•  Available Optimizations
   •  Input to Maps
   •  Map only jobs
   •  Combiner
   •  Compression
   •  Speculation
   •  Fault Tolerance
   •  Buffer Size
   •  Parallelism (threads)
   •  Partitioner
   •  Reporter
   •  DistributedCache
   •  Task child environment settings
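
Of the optimizations listed, the combiner is the easiest to show in isolation. The following is a plain-Python simulation of a word count, not Hadoop API code; the function names and the two input splits are hypothetical. It compares how many records would cross the shuffle with and without map-side pre-aggregation.

```python
from collections import Counter, defaultdict

def map_phase(split):
    """Emit (word, 1) for every word in one input split."""
    return [(word, 1) for line in split for word in line.split()]

def combine(pairs):
    """Map-side pre-aggregation: sum counts per key before the shuffle."""
    local = Counter()
    for key, value in pairs:
        local[key] += value
    return list(local.items())

def reduce_phase(all_pairs):
    """Sum values per key across all map outputs."""
    totals = defaultdict(int)
    for key, value in all_pairs:
        totals[key] += value
    return dict(totals)

splits = [["big data big"], ["data big"]]  # two hypothetical input splits

plain = [pair for s in splits for pair in map_phase(s)]
combined = [pair for s in splits for pair in combine(map_phase(s))]

shuffled_without = len(plain)     # records sent to reducers without a combiner
shuffled_with = len(combined)     # records sent to reducers with a combiner
result = reduce_phase(combined)
```

The reducer output is identical either way; only the volume of intermediate data changes, and the saving grows with the number of repeated keys per split.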




Hadoop Tips
•  Troubleshooting
   •  Are your partitions uniform?
   •  Can you combine records at the map side?
   •  Are maps reading off a DFS block worth of data?
   •  Are you running a single reduce wave (unless the data size per reducer is too big)?
   •  Have you tried compressing intermediate data & final data?
   •  Are there buffer size issues?
   •  Do you see unexplained “long tails”?
   •  Are your CPU cores busy?
   •  Is at least one system resource being loaded?
•  Performance Tuning
   •  Increase the memory/buffer allocated to the tasks
   •  Increase the number of tasks that can be run in parallel
   •  Increase the number of threads that serve the map outputs
   •  Disable unnecessary logging
   •  Turn on speculation
   •  Run reducers in one wave as they tend to get expensive
   •  Tune the usage of DistributedCache, it can increase efficiency
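
The first troubleshooting question, whether partitions are uniform, can be checked offline on a sample of keys. This is a sketch under stated assumptions: the crc32-based partitioner is a stand-in and not Hadoop's own hash partitioner, and the key sample and function names are made up for illustration.

```python
from collections import Counter
import zlib

def partition(key, num_reducers):
    """Deterministic hash partitioner (crc32 keeps results stable across runs)."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers

def skew_ratio(keys, num_reducers):
    """Largest partition size divided by the mean partition size (1.0 is perfectly uniform)."""
    sizes = Counter(partition(k, num_reducers) for k in keys)
    mean = len(keys) / num_reducers
    return max(sizes.values()) / mean

# Hypothetical sample: one hot key dominates the input.
hot_keys = ["hot"] * 90 + [f"user{i}" for i in range(10)]
ratio = skew_ratio(hot_keys, num_reducers=4)
```

A ratio well above 1.0 means one reducer receives far more records than the average, which is one common source of the "long tails" mentioned in the list.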




NoSQL
•  Stands for Not Only SQL
•  Based on CAP Theorem
•  Usually do not require a fixed table schema nor do they
   use the concept of joins
•  All NoSQL offerings relax one or more of the ACID
   properties
•  NoSQL databases come in a variety of flavors
  •  XML (myXMLDB, Tamino, Sedna)
  •  Wide Column (Cassandra, HBase, Bigtable)
  •  Key/Value (Redis, Memcached with BerkeleyDB)
  •  Graph (Neo4j, InfoGrid)
  •  Document store (CouchDB, MongoDB)




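NoSQL Footprint
[Chart: Size (vertical axis) against Complexity (horizontal axis); categories run from large and simple to small and complex]
•  Key Value: Amazon Dynamo, Voldemort
•  Big Table: Google Bigtable, HBase, Cassandra
•  Doc Database: Lotus Notes
•  Graph: Graph Theory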




NoSQL
•  Access and Query
   •  RESTful interfaces (HTTP as an access API)
   •  Query languages other than SQL
      •  SPARQL - Query language for the Semantic Web
      •  Gremlin - the graph traversal language
      •  Sones Graph Query Language
   •  Data Manipulation / Query API
      •  The Google Bigtable Datastore API
      •  The Neo4j Traversal API
   •  Serialization Formats
      •  JSON
      •  Thrift
      •  Protocol Buffers
      •  RDF
•  Best Practices
   •  Design for data collection
   •  Plan the data store
   •  Organize by type and semantics
   •  Partition for performance
      •  Access and Query is run time dependent
   •  Horizontal scaling
   •  Memory Caching
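
To make the access pattern concrete, here is a minimal sketch of a key/value store that keeps documents as JSON strings. The `KeyValueStore` class is hypothetical and in-memory only; it is not the API of any product named on these slides, and it shows only the shape of the interface: opaque values addressed by key, no fixed schema and no joins.

```python
import json

class KeyValueStore:
    """Hypothetical in-memory store: serialized values addressed by key."""

    def __init__(self):
        self._data = {}

    def put(self, key, document):
        # Serialize on write so the store itself never inspects the structure.
        self._data[key] = json.dumps(document, sort_keys=True)

    def get(self, key):
        raw = self._data.get(key)
        return None if raw is None else json.loads(raw)

store = KeyValueStore()
store.put("user:1", {"name": "Ada", "tags": ["big data", "nosql"]})
store.put("user:2", {"name": "Lin", "city": "Austin"})  # different fields, no schema change
fetched = store.get("user:1")
```

Two records with different fields sit side by side without any schema change, which is the flexibility the "no fixed table schema" bullet refers to.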




Textual ETL Engine
Forest Rim Technology's Textual ETL Engine (TETLE) is an integration tool for turning text into structured
data that can be analyzed by standard analytical tools.

•  Textual ETL Engine provides a robust user interface to define rules (or patterns / keywords) to process unstructured or semi-structured data.
•  The rules engine encapsulates all the complexity and lets the user define simple phrases and keywords.
•  Easy to implement and easy to realize ROI.

•  Advantages
   •  Simple to use
   •  No MR or coding required for text analysis and mining
   •  Extensible by Taxonomy integration
   •  Works on standard and new databases
   •  Produces a highly columnar key-value store, ready for metadata integration
•  Disadvantages
   •  Not integrated with Hadoop as a rules interface
   •  Currently uses Sqoop for metadata interchange with Hadoop or NoSQL interfaces
   •  Current GA does not handle distributed processing outside the Windows platform
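
The general idea of rule-driven text processing can be sketched as follows. This is not the Textual ETL Engine or its interface: the rules, categories and the `extract_rows` function are hypothetical, and the example only shows how keyword rules can turn free text into key-value rows.

```python
import re

# Hypothetical rules: a keyword or phrase mapped to the category it signals.
RULES = {
    "refund": "billing",
    "late delivery": "logistics",
    "cancel": "churn_risk",
}

def extract_rows(doc_id, text, rules=RULES):
    """Return (doc_id, category, matched_phrase) rows for every rule found in the text."""
    rows = []
    lowered = text.lower()
    for phrase, category in rules.items():
        # Word boundaries avoid matching inside longer words.
        if re.search(r"\b" + re.escape(phrase) + r"\b", lowered):
            rows.append((doc_id, category, phrase))
    return rows

rows = extract_rows("email-001", "Late delivery again. I want a refund.")
```

Each output row has a fixed shape, so it can be loaded into a relational table or a key-value store and joined with other data on the document id.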




Integration
•  All RDBMS vendors today are supporting Hadoop or NoSQL as an integration or extension
   •  Oracle Exalytics / Big Data Appliance
   •  Teradata Aster Appliance
   •  EMC Greenplum Appliance
   •  IBM BigInsights
   •  Microsoft Windows Azure Integration
•  There are multiple providers of Hadoop distributions
   •  Cloudera
   •  Hortonworks
   •  Zettaset
•  Adapters from vendors to interface with Cloudera or Hortonworks distributions of Hadoop are available today.
   There are integration efforts to release Hadoop as an integral engine across the RDBMS vendor platforms

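Conceptual Solution Architecture
[Diagram: two data paths feeding shared metadata and downstream consumers]
•  OLTP, via ETL / ELT / CDC, into the Data Warehouse
•  BIG Data (Content, Email, Docs), via Textual ETL and / or MR / Ruby / Java (Hadoop), into the Big Data DW and Taxonomy
•  Metadata and MDM
•  Data Marts
•  Reporting, Analytics, Search, OLAP, Text Mining, Content Analytics, Knowledge Analytics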




Integration Tips
•  The key to the castle in integrating Big Data is metadata
•  Whatever the tool, technology and technique, if you do not
   know your metadata, your integration will fail
•  Semantic technologies and architectures will be the way to
   process and integrate the Big Data, much akin to Web 2.0
   models
•  Data quality for Big Data is a very questionable goal. To get
   some semblance of quality, taxonomies and ontologies can be
   of help
•  Third-party data providers also supply keywords, trending tags
   and scores; these can provide a lot of integration support
•  Writing business rules for Big Data can be very cumbersome
   and not all programs can be written in MapReduce
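
The role of a taxonomy in integration can be sketched with a small lookup. Everything here is hypothetical (the terms, the product keys and the `integrate` function); the point is only that free-text mentions need a shared key before they can be joined to structured reference data.

```python
# Hypothetical taxonomy: free-text terms mapped to a canonical product key.
TAXONOMY = {"widget": "PROD-01", "widgets": "PROD-01", "gizmo": "PROD-02"}

# Hypothetical structured reference data keyed by the same product key.
PRODUCTS = {"PROD-01": "Standard Widget", "PROD-02": "Deluxe Gizmo"}

def integrate(mentions):
    """Resolve free-text mentions to product names; unknown terms are returned separately."""
    matched, unmatched = [], []
    for term in mentions:
        key = TAXONOMY.get(term.lower())
        if key is None:
            unmatched.append(term)
        else:
            matched.append((term, key, PRODUCTS[key]))
    return matched, unmatched

matched, unmatched = integrate(["Widgets", "gizmo", "thingamajig"])
```

The unmatched list is as useful as the matched one: it shows where the taxonomy, and therefore the metadata, has gaps.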


Which Tool

  Application               Hadoop    NoSQL    Textual ETL
  Machine Learning          x         x
  Sentiments                x         x        x
  Text Processing           x         x        x
  Image Processing          x         x
  Video Analytics           x         x
  Log Parsing               x         x        x
  Collaborative Filtering   x         x        x
  Context Search                               x
  Email & Content                              x

Success Stories
 •  Machine learning & Recommendation Engines – Amazon,
      Orbitz
 •    CRM - Consumer Analytics, Metrics, Social Network
      Analytics, Churn, Sentiment, Influencer, Proximity
 •    Finance – Fraud, Compliance
 •    Telco – CDR, Fraud
 •    Healthcare – Provider / Patient analytics, fraud, proactive
      care
 •    Life Sciences – clinical analytics, physician outreach
 •    Pharma – Pharmacovigilance, clinical trials
 •    Insurance – fraud, geo-spatial
 •    Manufacturing – warranty analytics, supplier quality
      metrics




Data Science
[Diagram: a spectrum from Business Intelligence to Advanced Analytics]
•  Data Analytics: Content, Customer, Product, Behaviors, Optimization, Big Data Processing & ETL
•  Art & Science
•  Applied Science: User Interest Prediction, Inventory Prediction, Machine Learning, Pattern Mining, Advanced Regression Analysis

Challenges
 •  Resources Availability
 •  MR is hard to implement
 •  Speech to text
     •  Conversation context is often missing
     •  Quality of recording
     •  Accent issues
 •  Visual data tagging
     •  Images
     •  Text embedded within images
 •  Metadata is not available
 •  Data is not trusted
 •  Content management platform capabilities
 •  Ontologies Ambiguity
 •  Taxonomy Integration




Contact
•  Krish Krishnan
   rkrish1124@yahoo.com
       Twitter: @datagenius

More Related Content

What's hot

Snowflake essentials
Snowflake essentialsSnowflake essentials
Snowflake essentials
qureshihamid
 
Snowflake Automated Deployments / CI/CD Pipelines
Snowflake Automated Deployments / CI/CD PipelinesSnowflake Automated Deployments / CI/CD Pipelines
Snowflake Automated Deployments / CI/CD Pipelines
Drew Hansen
 
Elastic Data Warehousing
Elastic Data WarehousingElastic Data Warehousing
Elastic Data Warehousing
Snowflake Computing
 
Free Training: How to Build a Lakehouse
Free Training: How to Build a LakehouseFree Training: How to Build a Lakehouse
Free Training: How to Build a Lakehouse
Databricks
 
Date warehousing concepts
Date warehousing conceptsDate warehousing concepts
Date warehousing conceptspcherukumalla
 
Part 3 - Modern Data Warehouse with Azure Synapse
Part 3 - Modern Data Warehouse with Azure SynapsePart 3 - Modern Data Warehouse with Azure Synapse
Part 3 - Modern Data Warehouse with Azure Synapse
Nilesh Gule
 
Oracle RAC Virtualized - In VMs, in Containers, On-premises, and in the Cloud
Oracle RAC Virtualized - In VMs, in Containers, On-premises, and in the CloudOracle RAC Virtualized - In VMs, in Containers, On-premises, and in the Cloud
Oracle RAC Virtualized - In VMs, in Containers, On-premises, and in the Cloud
Markus Michalewicz
 
Data Warehouse Design and Best Practices
Data Warehouse Design and Best PracticesData Warehouse Design and Best Practices
Data Warehouse Design and Best Practices
Ivo Andreev
 
Hadoop File system (HDFS)
Hadoop File system (HDFS)Hadoop File system (HDFS)
Hadoop File system (HDFS)
Prashant Gupta
 
An overview of snowflake
An overview of snowflakeAn overview of snowflake
An overview of snowflake
Sivakumar Ramar
 
Moving to Databricks & Delta
Moving to Databricks & DeltaMoving to Databricks & Delta
Moving to Databricks & Delta
Databricks
 
Introduction to snowflake
Introduction to snowflakeIntroduction to snowflake
Introduction to snowflake
Sunil Gurav
 
Master the Multi-Clustered Data Warehouse - Snowflake
Master the Multi-Clustered Data Warehouse - SnowflakeMaster the Multi-Clustered Data Warehouse - Snowflake
Master the Multi-Clustered Data Warehouse - Snowflake
Matillion
 
Less01 architecture
Less01 architectureLess01 architecture
Less01 architectureAmit Bhalla
 
Data Warehouse Concepts | Data Warehouse Tutorial | Data Warehousing | Edureka
Data Warehouse Concepts | Data Warehouse Tutorial | Data Warehousing | EdurekaData Warehouse Concepts | Data Warehouse Tutorial | Data Warehousing | Edureka
Data Warehouse Concepts | Data Warehouse Tutorial | Data Warehousing | Edureka
Edureka!
 
Achieving Lakehouse Models with Spark 3.0
Achieving Lakehouse Models with Spark 3.0Achieving Lakehouse Models with Spark 3.0
Achieving Lakehouse Models with Spark 3.0
Databricks
 
Introduction SQL Analytics on Lakehouse Architecture
Introduction SQL Analytics on Lakehouse ArchitectureIntroduction SQL Analytics on Lakehouse Architecture
Introduction SQL Analytics on Lakehouse Architecture
Databricks
 
Changing the game with cloud dw
Changing the game with cloud dwChanging the game with cloud dw
Changing the game with cloud dw
elephantscale
 
Building a Logical Data Fabric using Data Virtualization (ASEAN)
Building a Logical Data Fabric using Data Virtualization (ASEAN)Building a Logical Data Fabric using Data Virtualization (ASEAN)
Building a Logical Data Fabric using Data Virtualization (ASEAN)
Denodo
 
Rise of the Data Cloud
Rise of the Data CloudRise of the Data Cloud
Rise of the Data Cloud
Kent Graziano
 

What's hot (20)

Snowflake essentials
Snowflake essentialsSnowflake essentials
Snowflake essentials
 
Snowflake Automated Deployments / CI/CD Pipelines
Snowflake Automated Deployments / CI/CD PipelinesSnowflake Automated Deployments / CI/CD Pipelines
Snowflake Automated Deployments / CI/CD Pipelines
 
Elastic Data Warehousing
Elastic Data WarehousingElastic Data Warehousing
Elastic Data Warehousing
 
Free Training: How to Build a Lakehouse
Free Training: How to Build a LakehouseFree Training: How to Build a Lakehouse
Free Training: How to Build a Lakehouse
 
Date warehousing concepts
Date warehousing conceptsDate warehousing concepts
Date warehousing concepts
 
Part 3 - Modern Data Warehouse with Azure Synapse
Part 3 - Modern Data Warehouse with Azure SynapsePart 3 - Modern Data Warehouse with Azure Synapse
Part 3 - Modern Data Warehouse with Azure Synapse
 
Oracle RAC Virtualized - In VMs, in Containers, On-premises, and in the Cloud
Oracle RAC Virtualized - In VMs, in Containers, On-premises, and in the CloudOracle RAC Virtualized - In VMs, in Containers, On-premises, and in the Cloud
Oracle RAC Virtualized - In VMs, in Containers, On-premises, and in the Cloud
 
Data Warehouse Design and Best Practices
Data Warehouse Design and Best PracticesData Warehouse Design and Best Practices
Data Warehouse Design and Best Practices
 
Hadoop File system (HDFS)
Hadoop File system (HDFS)Hadoop File system (HDFS)
Hadoop File system (HDFS)
 
An overview of snowflake
An overview of snowflakeAn overview of snowflake
An overview of snowflake
 
Moving to Databricks & Delta
Moving to Databricks & DeltaMoving to Databricks & Delta
Moving to Databricks & Delta
 
Introduction to snowflake
Introduction to snowflakeIntroduction to snowflake
Introduction to snowflake
 
Master the Multi-Clustered Data Warehouse - Snowflake
Master the Multi-Clustered Data Warehouse - SnowflakeMaster the Multi-Clustered Data Warehouse - Snowflake
Master the Multi-Clustered Data Warehouse - Snowflake
 
Less01 architecture
Less01 architectureLess01 architecture
Less01 architecture
 
Data Warehouse Concepts | Data Warehouse Tutorial | Data Warehousing | Edureka
Data Warehouse Concepts | Data Warehouse Tutorial | Data Warehousing | EdurekaData Warehouse Concepts | Data Warehouse Tutorial | Data Warehousing | Edureka
Data Warehouse Concepts | Data Warehouse Tutorial | Data Warehousing | Edureka
 
Achieving Lakehouse Models with Spark 3.0
Achieving Lakehouse Models with Spark 3.0Achieving Lakehouse Models with Spark 3.0
Achieving Lakehouse Models with Spark 3.0
 
Introduction SQL Analytics on Lakehouse Architecture
Introduction SQL Analytics on Lakehouse ArchitectureIntroduction SQL Analytics on Lakehouse Architecture
Introduction SQL Analytics on Lakehouse Architecture
 
Changing the game with cloud dw
Changing the game with cloud dwChanging the game with cloud dw
Changing the game with cloud dw
 
Building a Logical Data Fabric using Data Virtualization (ASEAN)
Building a Logical Data Fabric using Data Virtualization (ASEAN)Building a Logical Data Fabric using Data Virtualization (ASEAN)
Building a Logical Data Fabric using Data Virtualization (ASEAN)
 
Rise of the Data Cloud
Rise of the Data CloudRise of the Data Cloud
Rise of the Data Cloud
 

Viewers also liked

Exploring Artificial Intelligence in Museums
Exploring Artificial Intelligence in MuseumsExploring Artificial Intelligence in Museums
Exploring Artificial Intelligence in Museums
Brendan Ciecko
 
Cultural Heritage Insitutions and Big Data Collections
Cultural Heritage Insitutions and Big Data CollectionsCultural Heritage Insitutions and Big Data Collections
Cultural Heritage Insitutions and Big Data Collections
lljohnston
 
Planning for big data (lessons from cultural heritage)
Planning for big data (lessons from cultural heritage)Planning for big data (lessons from cultural heritage)
Planning for big data (lessons from cultural heritage)
Mia
 
Artificial Intelligence - A.I. Aggressive Technologies
Artificial Intelligence - A.I. Aggressive Technologies Artificial Intelligence - A.I. Aggressive Technologies
Artificial Intelligence - A.I. Aggressive Technologies
exouniversity
 
QlikView & Big Data
QlikView & Big DataQlikView & Big Data
QlikView & Big Data
Mischa van Werkhoven
 
Big Data Industry Insights 2015
Big Data Industry Insights 2015 Big Data Industry Insights 2015
Big Data Industry Insights 2015
Den Reymer
 
Introduction to big data and analytic eakasit patcharawongsakda
Introduction to big data and analytic eakasit patcharawongsakdaIntroduction to big data and analytic eakasit patcharawongsakda
Introduction to big data and analytic eakasit patcharawongsakda
BAINIDA
 
Gartner TOP 10 Strategic Technology Trends 2017
Gartner TOP 10 Strategic Technology Trends 2017Gartner TOP 10 Strategic Technology Trends 2017
Gartner TOP 10 Strategic Technology Trends 2017
Den Reymer
 

Viewers also liked (9)

Exploring Artificial Intelligence in Museums
Exploring Artificial Intelligence in MuseumsExploring Artificial Intelligence in Museums
Exploring Artificial Intelligence in Museums
 
Liam
LiamLiam
Liam
 
Cultural Heritage Insitutions and Big Data Collections
Cultural Heritage Insitutions and Big Data CollectionsCultural Heritage Insitutions and Big Data Collections
Cultural Heritage Insitutions and Big Data Collections
 
Planning for big data (lessons from cultural heritage)
Planning for big data (lessons from cultural heritage)Planning for big data (lessons from cultural heritage)
Planning for big data (lessons from cultural heritage)
 
Artificial Intelligence - A.I. Aggressive Technologies
Artificial Intelligence - A.I. Aggressive Technologies Artificial Intelligence - A.I. Aggressive Technologies
Artificial Intelligence - A.I. Aggressive Technologies
 
QlikView & Big Data
QlikView & Big DataQlikView & Big Data
QlikView & Big Data
 
Big Data Industry Insights 2015
Big Data Industry Insights 2015 Big Data Industry Insights 2015
Big Data Industry Insights 2015
 
Introduction to big data and analytic eakasit patcharawongsakda
Introduction to big data and analytic eakasit patcharawongsakdaIntroduction to big data and analytic eakasit patcharawongsakda
Introduction to big data and analytic eakasit patcharawongsakda
 
Gartner TOP 10 Strategic Technology Trends 2017
Gartner TOP 10 Strategic Technology Trends 2017Gartner TOP 10 Strategic Technology Trends 2017
Gartner TOP 10 Strategic Technology Trends 2017
 

Similar to Integrating Big Data Technologies

Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011
Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011
Hadoop in the Enterprise - Dr. Amr Awadallah @ Microstrategy World 2011
Cloudera, Inc.
 
Integrated Data Warehouse with Hadoop and Oracle Database
Integrated Data Warehouse with Hadoop and Oracle DatabaseIntegrated Data Warehouse with Hadoop and Oracle Database
Integrated Data Warehouse with Hadoop and Oracle Database
Gwen (Chen) Shapira
 
How Apache Hadoop is Revolutionizing Business Intelligence and Data Analytics...
How Apache Hadoop is Revolutionizing Business Intelligence and Data Analytics...How Apache Hadoop is Revolutionizing Business Intelligence and Data Analytics...
How Apache Hadoop is Revolutionizing Business Intelligence and Data Analytics...Amr Awadallah
 
Business Intelligence and Data Analytics Revolutionized with Apache Hadoop
Business Intelligence and Data Analytics Revolutionized with Apache HadoopBusiness Intelligence and Data Analytics Revolutionized with Apache Hadoop
Business Intelligence and Data Analytics Revolutionized with Apache Hadoop
Cloudera, Inc.
 
Blueprint for integrating big data analytics and bi
Blueprint for integrating big data analytics and biBlueprint for integrating big data analytics and bi
Blueprint for integrating big data analytics and biDataWorks Summit
 

Integrating Big Data Technologies

  • 1. INTEGRATING BIG DATA. Dataversity Webinar, Feb 7 2012. ©2012 Sixth Sense Advisors, Inc. All Rights Reserved
  • 2. State of Data Today
  • 3. A Growing Trend: expectations for BI are changing without anyone telling us
      Requirement   | Expectation                   | Reality
      Speed         | Speed of the Internet         | Speed = Infra + Arch + Design
      Accessibility | Accessibility of a smartphone | BI tool licenses & security
      Usability     | iPad, mobility                | Web-enabled BI tool
      Availability  | Google Search                 | Data & report metadata
      Delivery      | Speed of questions            | Methodology & signoff
      Data          | Access to everything          | Structured data
      Scalability   | Cloud (Amazon)                | Existing infrastructure
      Cost          | Cell phone or free Wi-Fi      | Millions
  • 4. The Wisdom of Crowds
  • 5. Data Deluge = Business Insights
  • 6. BIG Data (diagram: a spectrum running from Structured to Unstructured, with Current and New columns)
      • ERP, CRM, SCM
      • Content Management Systems
      • Email, Call Center
      • Documents, Contracts
  • 7. What’s so Big about Big Data
      • Velocity
      • Volume
      • Variety
      • Complexity
      • Ambiguity
  • 8. So you are about to start the Big Data Project (diagram labels: Tools, Output, Data, Instructions)
  • 9. The Normal Way Results In ... (image; source: web)
  • 10. Why Can Big Data Fail on the RDBMS?
      • New demands: new data types, new volume, new analytics, new workload, new metadata
      • Current data management platform: RDBMS + ETL + BI
      • Results: poor performance, failed programs
      • Underlying issues: scalability; sharding; ACID
  • 11. BIG Data
      • Workload Demands
          • Process dynamic data content
          • Process unstructured data
          • Systems that can scale up and scale out with high volume data and store
          • Perform complex operations within reasonable response time
      • Infrastructure Requirements
          • Scalable platform
          • Database independence
          • Fault tolerant architectures
          • Low cost of acquisition
          • Supported by standard toolsets
  • 12. Hadoop Design Goals
      • System Shall Manage and Heal Itself
      • Performance Shall Scale Linearly
      • Compute Shall Move to Data
      • Simple Core, Modular and Extensible
  • 13. Hadoop Differentiators
      • Schema-on-Write: RDBMS
          • Schema must be created before data is loaded.
          • An explicit load operation has to take place which transforms the data to the internal structure of the database.
          • New columns must be added explicitly before data for such columns can be loaded into the database.
          • Read is fast.
          • Standards/Governance.
      • Schema-on-Read: Hadoop
          • Data is simply copied to the file store; no special transformation is needed.
          • A SerDe (Serializer/Deserializer) is applied at read time to extract the required columns.
          • New data can start flowing anytime and will appear retroactively once the SerDe is updated to parse it.
          • Load is fast.
          • Evolving Schemas/Agility.
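The schema-on-read idea in the slide above can be sketched in a few lines of Python. This is only an illustration, not Hive's SerDe API: the record layout, the `read_with_schema` helper, and the column names are all assumptions made for the example. Raw lines are stored untouched, and the "schema" is just the list of columns the reader asks for.

```python
import json

# Raw records are stored exactly as they arrived; no schema was declared up front.
# (Hypothetical sample data for illustration.)
RAW_LINES = [
    '{"user": "a1", "action": "click"}',
    '{"user": "b2", "action": "view", "device": "phone"}',
]


def read_with_schema(lines, columns):
    """Apply a schema at read time: extract only the requested columns."""
    for line in lines:
        record = json.loads(line)
        # Fields missing from a record come back as None instead of failing.
        yield tuple(record.get(col) for col in columns)


# Version 1 of the reader knows two columns.
v1_rows = list(read_with_schema(RAW_LINES, ["user", "action"]))

# Updating the reader exposes "device" retroactively, without reloading the data.
v2_rows = list(read_with_schema(RAW_LINES, ["user", "action", "device"]))
```

The stored lines never change between the two reads; only the reader does, which is the trade the slide describes: loading is cheap, and the parsing cost moves to every read.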
  • 14. Hadoop Known Limitations
      • Write-once model
      • A namespace with an extremely large number of files exceeds the Namenode’s capacity to maintain
      • Cannot be mounted by existing OS
          • Getting data in and out is tedious
          • Virtual File System can solve problem
      • HDFS does not implement / support
          • User quotas
          • Access permissions
          • Hard or soft links
          • Data balancing schemes
      • No periodic checkpoints
      • Namenode is single point of failure
          • Automatic restart and failover to another machine not yet supported
  • 15. Hadoop Tips
      • Hadoop is useful
          • When you must process lots of unstructured data
          • When running batch jobs is acceptable
          • When you have access to lots of cheap hardware
      • Hadoop is not useful
          • For intense calculations with little or no data
          • When your data is not self-contained
          • When you need interactive results
      • Implementation
          • Think big, start small
          • Build on agile cycles
          • Focus on the data, as you will always develop schema on write.
      • Available Optimizations
          • Input to Maps
          • Map only jobs
          • Combiner
          • Compression
          • Speculation
          • Fault Tolerance
          • Buffer Size
          • Parallelism (threads)
          • Partitioner
          • Reporter
          • DistributedCache
          • Task child environment settings
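Of the optimizations listed above, the combiner is the easiest to show in miniature. The following is a plain-Python simulation of a word count, not Hadoop code; the function names and the two sample splits are assumptions made for the example. It compares how many (key, value) pairs would cross the shuffle with and without map-side pre-aggregation.

```python
from collections import Counter


def map_words(split):
    """Map step: emit one (word, 1) pair per word in an input split."""
    return [(word, 1) for line in split for word in line.split()]


def combine(pairs):
    """Combiner: pre-aggregate a single mapper's output before the shuffle."""
    totals = Counter()
    for word, count in pairs:
        totals[word] += count
    return sorted(totals.items())


def reduce_counts(pairs):
    """Reduce step: sum the counts for each word across all mappers."""
    totals = Counter()
    for word, count in pairs:
        totals[word] += count
    return dict(totals)


# Two hypothetical input splits, one per mapper.
splits = [["big data big"], ["data big"]]

raw = [map_words(split) for split in splits]
combined = [combine(pairs) for pairs in raw]

pairs_without_combiner = sum(len(pairs) for pairs in raw)
pairs_with_combiner = sum(len(pairs) for pairs in combined)
result = reduce_counts(pair for part in combined for pair in part)
```

A combiner is safe here because addition is associative and commutative; an operation without that property, such as a plain average, would need a different intermediate representation.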
  • 16. Hadoop Tips
      • Troubleshooting
          • Are your partitions uniform?
          • Can you combine records at the map side?
          • Are maps reading off a DFS block worth of data?
          • Are you running a single reduce wave (unless the data size per reducer is too big)?
          • Have you tried compressing intermediate data & final data?
          • Are there buffer size issues?
          • Do you see unexplained “long tails”?
          • Are your CPU cores busy?
          • Is at least one system resource being loaded?
      • Performance Tuning
          • Increase the memory/buffer allocated to the tasks
          • Increase the number of tasks that can be run in parallel
          • Increase the number of threads that serve the map outputs
          • Disable unnecessary logging
          • Turn on speculation
          • Run reducers in one wave as they tend to get expensive
          • Tune the usage of DistributedCache; it can increase efficiency
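The first troubleshooting question, whether partitions are uniform, can be checked offline on a sample of keys. The sketch below is an assumption-laden stand-in: it uses CRC32 modulo the partition count rather than Hadoop's own partitioner, and `partition_sizes` and `skew_ratio` are hypothetical helper names. A ratio near 1.0 suggests even partitions; a ratio approaching the partition count means one reducer is receiving almost everything, which is a common cause of the "long tails" mentioned above.

```python
import zlib
from collections import Counter


def partition_sizes(keys, num_partitions):
    """Count how many keys land in each partition under a hash partitioner.

    CRC32 is used here as a deterministic stand-in hash (an assumption);
    a real job would use whatever partitioner it is configured with.
    """
    counts = Counter(
        zlib.crc32(key.encode("utf-8")) % num_partitions for key in keys
    )
    return [counts.get(p, 0) for p in range(num_partitions)]


def skew_ratio(sizes):
    """Largest partition divided by the mean partition size."""
    mean = sum(sizes) / len(sizes)
    return max(sizes) / mean if mean else 0.0


# A single hot key sends every record to one partition.
hot_sizes = partition_sizes(["hot-key"] * 9, 3)
hot_skew = skew_ratio(hot_sizes)
```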
  • 17. NoSQL
      • Stands for Not Only SQL
      • Based on CAP Theorem
      • Usually do not require a fixed table schema, nor do they use the concept of joins
      • All NoSQL offerings relax one or more of the ACID properties
      • NoSQL databases come in a variety of flavors
          • XML (myXMLDB, Tamino, Sedna)
          • Wide Column (Cassandra, HBase, Big Table)
          • Key/Value (Redis, Memcached with BerkeleyDB)
          • Graph (neo4j, InfoGrid)
          • Document store (CouchDB, MongoDB)
  • 18. NoSQL Footprint (diagram: store categories plotted by Size against Complexity)
      • Key Value: Amazon Dynamo, Voldemort
      • Big Table: Google Big Table, HBase, Cassandra
      • Doc Database: Lotus Notes
      • Graph: Graph Theory
  • 19. NoSQL
      • Access and Query
          • RESTful interfaces (HTTP as an access API)
          • Query languages other than SQL
              • SPARQL - Query language for the Semantic Web
              • Gremlin - the graph traversal language
              • Sones Graph Query Language
          • Data Manipulation / Query API
              • The Google BigTable DataStore API
              • The Neo4j Traversal API
          • Serialization Formats
              • JSON
              • Thrift
              • ProtoBuffers
              • RDF
      • Best Practices
          • Design for data collection
          • Plan the data store
          • Organize by type and semantics
          • Partition for performance
          • Access and Query is run time dependent
          • Horizontal scaling
          • Memory Caching
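Several of the points above (key-based access, JSON serialization, partitioning, horizontal scaling) fit together in a small toy. The class below is a hypothetical in-memory sketch, not the API of any product named in the slides: each "shard" is just a dictionary, values are stored as JSON text, and the shard is chosen by hashing the key.

```python
import json
import zlib


class ShardedKeyValueStore:
    """Toy key/value store: values are JSON text, keys are hashed to shards."""

    def __init__(self, num_shards):
        # Each dict stands in for a separate node (an assumption of this sketch).
        self._shards = [{} for _ in range(num_shards)]

    def _shard_for(self, key):
        index = zlib.crc32(key.encode("utf-8")) % len(self._shards)
        return self._shards[index]

    def put(self, key, value):
        # Serialize on write so every shard holds plain text, not live objects.
        self._shard_for(key)[key] = json.dumps(value, sort_keys=True)

    def get(self, key):
        raw = self._shard_for(key).get(key)
        return None if raw is None else json.loads(raw)

    def shard_sizes(self):
        return [len(shard) for shard in self._shards]


store = ShardedKeyValueStore(num_shards=4)
store.put("user:1", {"name": "Ada", "tags": ["analytics"]})
fetched = store.get("user:1")
```

Note that there is no join and no fixed schema: the caller must know the key, and the shape of each value is whatever was written. Changing `num_shards` in this naive scheme would remap most keys, which is why production systems tend to use consistent hashing or a similar technique.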
  • 20. Textual ETL Engine
      Forest Rim Technology's Textual ETL Engine (TETLE) is an integration tool for turning text into a structure of data that can be analyzed by standard analytical tools
      • Textual ETL Engine provides a robust user interface to define rules (or patterns / keywords) to process unstructured or semi-structured data
      • The rules engine encapsulates all the complexity and lets the user define simple phrases and keywords
      • Easy to implement and easy to realize ROI
      Advantages
      • Simple to use
      • No MR or coding required for text analysis and mining
      • Extensible by taxonomy integration
      • Works on standard and new databases
      • Produces a highly columnar key-value store, ready for metadata integration
      Disadvantages
      • Not integrated with Hadoop as a rules interface
      • Currently uses Sqoop for metadata interchange with Hadoop or NoSQL interfaces
      • Current GA does not handle distributed processing outside the Windows platform
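The general idea of rule-driven text processing, where keyword and pattern rules turn free text into key-value rows, can be sketched in a few lines. This is not the Textual ETL Engine or its rule syntax; the rule names, regular expressions and sample email below are assumptions made for illustration.

```python
import re

# Hypothetical rules: each pairs an output key with a pattern to look for.
RULES = [
    ("invoice_number", re.compile(r"invoice\s*#?\s*(\d+)", re.IGNORECASE)),
    ("amount", re.compile(r"\$\s*(\d+(?:\.\d{2})?)")),
    ("signal", re.compile(r"\b(unhappy|delighted|refund)\b", re.IGNORECASE)),
]


def extract(doc_id, text):
    """Apply every rule to the text and return (doc_id, key, value) rows."""
    rows = []
    for key, pattern in RULES:
        for match in pattern.finditer(text):
            rows.append((doc_id, key, match.group(1).lower()))
    return rows


rows = extract(
    "email-001",
    "Customer is unhappy about Invoice #1042 for $99.50 and wants a refund.",
)
for row in rows:
    print(row)
```

The output is a narrow key-value table keyed by document, which can then be joined to structured data on the document or customer identifier.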
  • 21. Integration
      • All RDBMS vendors today are supporting Hadoop or NoSQL as an integration or extension
          • Oracle Exalytics / Big Data Appliance
          • Teradata Aster Appliance
          • EMC Greenplum Appliance
          • IBM BigInsights
          • Microsoft Windows Azure Integration
      • There are multiple providers of Hadoop distributions
          • Cloudera
          • Hortonworks
          • Zettaset
      • Adapters from vendors to interface with Cloudera or Hortonworks distributions of Hadoop are available today. There are integration efforts to release Hadoop as an integral engine across the RDBMS vendor platforms
  • 22. Conceptual Solution Architecture [Figure: architecture diagram. Labeled components: Metadata, MDM; OLTP; ETL, ELT, CDC; Data Warehouse, DataMarts, OLAP; Reporting, Analytics, Search, Text Mining, Content Analytics, Knowledge Analytics; Big Data Content (Email, Docs); Textual ETL and/or MR / Ruby / Java (Hadoop); Taxonomy; Big Data Textual DW]
  • 23. Integration Tips
      • The key to the castle in integrating Big Data is metadata
      • Whatever the tool, technology and technique, if you do not know your metadata, your integration will fail
      • Semantic technologies and architectures will be the way to process and integrate Big Data, much akin to Web 2.0 models
      • Data quality for Big Data is a very questionable goal. To get some semblance of quality, taxonomies and ontologies can help
      • Third-party data providers also supply keywords, trending tags and scores; these can provide a lot of integration support
      • Writing business rules for Big Data can be very cumbersome, and not all programs can be written in MapReduce
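As a small illustration of how a taxonomy can serve as integration metadata, the sketch below tags free-text feedback with canonical topics and attaches them to structured customer records. The taxonomy, customer table and feedback lines are all hypothetical; a real taxonomy would be far larger and would typically come from a domain source or a third-party provider.

```python
# Hypothetical taxonomy: raw term -> canonical topic.
TAXONOMY = {
    "refund": "billing",
    "invoice": "billing",
    "charge": "billing",
    "outage": "service",
    "slow": "service",
    "dropped": "service",
}


def tag(text):
    """Return the sorted canonical topics whose terms appear in the text."""
    words = {w.strip(".,!?").lower() for w in text.split()}
    return sorted({TAXONOMY[w] for w in words if w in TAXONOMY})


# Structured side: customer records keyed by customer id.
customers = {
    101: {"name": "Acme", "segment": "Enterprise"},
    102: {"name": "Birch", "segment": "SMB"},
}

# Unstructured side: (customer id, free text) pairs.
feedback = [
    (101, "Invoice was wrong, need a refund!"),
    (102, "Calls dropped during the outage."),
    (103, "Great product"),
]

# Integrate on the shared key, keeping only customers known to the warehouse.
integrated = [
    {"customer_id": cid, "segment": customers[cid]["segment"], "topics": tag(text)}
    for cid, text in feedback
    if cid in customers
]

for record in integrated:
    print(record)
```

The taxonomy does double duty here: it normalizes noisy vocabulary into a few topics, and those topics become the metadata that lets text be analyzed next to structured attributes such as segment.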
  • 24. Which Tool (columns: Application, Hadoop, NoSQL, Textual ETL)
      • Machine Learning: x x
      • Sentiments: x x x
      • Text Processing: x x x
      • Image Processing: x x
      • Video Analytics: x x
      • Log Parsing: x x x
      • Collaborative Filtering: x x x
      • Context Search: x
      • Email & Content: x
  • 25. Success Stories
      • Machine learning & recommendation engines: Amazon, Orbitz
      • CRM: consumer analytics, metrics, social network analytics, churn, sentiment, influencer, proximity
      • Finance: fraud, compliance
      • Telco: CDR, fraud
      • Healthcare: provider / patient analytics, fraud, proactive care
      • Life sciences: clinical analytics, physician outreach
      • Pharma: pharmacovigilance, clinical trials
      • Insurance: fraud, geo-spatial
      • Manufacturing: warranty analytics, supplier quality metrics
  • 26. Data Science [Figure: Data Science diagram. Labels: Data Analytics; Art & Science; Applied Science; Content, Customer, Product, Behaviors; User Interest Prediction, Inventory Prediction, Machine Learning, Pattern Mining, Optimization, Advanced Regression; Big Data Processing & ETL; Analysis; Business Intelligence; Advanced Analytics]
  • 27. Challenges
      • Resources availability
      • MR is hard to implement
      • Speech to text
          • Conversation context is often missing
          • Quality of recording
          • Accent issues
      • Visual data tagging
          • Images
          • Text embedded within images
      • Metadata is not available
      • Data is not trusted
      • Content management platform capabilities
      • Ontologies ambiguity
      • Taxonomy integration
  • 28. Contact • Krish Krishnan, rkrish1124@yahoo.com, Twitter: @datagenius