©2012 Sixth Sense Advisors, Inc. All Rights Reserved   1




INTEGRATING BIG DATA
Dataversity Webinar
Feb 7 2012




State of Data Today




A Growing Trend
 Expectations for BI are changing without anyone telling us

  Requirement     Expectations                    Reality
  Speed           Speed of the Internet           Speed = Infra + Arch + Design
  Accessibility   Accessibility of a smartphone   BI tool licenses & security
  Usability       iPad - mobility                 Web-enabled BI tool
  Availability    Google Search                   Data & report metadata
  Delivery        Speed of questions              Methodology & signoff
  Data            Access to everything            Structured data
  Scalability     Cloud (Amazon)                  Existing infrastructure
  Cost            Cell phone or free Wi-Fi        Millions



The Wisdom of Crowds


Data Deluge = Business Insights
  



BIG Data
[Figure: data sources arranged from Structured to Unstructured, under the columns Current and New]
•  ERP, CRM, SCM
•  Content Management Systems
•  Email, Call Center
•  Documents, Contracts




What’s so Big about Big Data

            Velocity
            Volume
            Variety
           Complexity
           Ambiguity


So you are about to start the Big Data Project

[Figure labels: Tools, Data, Instructions, Output]
  




The Normal Way Results In ...




Image Source: Web
  




Why Big Data Can Fail on the RDBMS

Current Data Management Platform (RDBMS + ETL + BI), faced with:
•  New data types
•  New volume
•  New analytics
•  New workload
•  New metadata

Outcome:
•  Poor performance
•  Failed programs

Scalability; Sharding; ACID
  




BIG Data
•  Workload Demands
   •  Process dynamic data content
   •  Process unstructured data
   •  Systems that can scale up and scale out with high volume data
   •  Perform complex operations within reasonable response time
•  Infrastructure Requirements
   •  Scalable platform
   •  Database independence
   •  Fault tolerant architectures
   •  Low cost of acquisition and storage
   •  Supported by standard toolsets




Hadoop


Design Goals
•  System shall manage and heal itself
•  Performance shall scale linearly
•  Compute shall move to data
•  Simple core, modular and extensible


Hadoop Differentiators

Schema-on-Write: RDBMS
•  Schema must be created before data is loaded.
•  An explicit load operation has to take place, which transforms the data to the internal structure of the database.
•  New columns must be added explicitly before data for such columns can be loaded into the database.
•  Read is fast.
•  Standards/Governance.

Schema-on-Read: Hadoop
•  Data is simply copied to the file store; no special transformation is needed.
•  A SerDe (Serializer/Deserializer) is applied at read time to extract the required columns.
•  New data can start flowing anytime and will appear retroactively once the SerDe is updated to parse it.
•  Load is fast.
•  Evolving Schemas/Agility.
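
A minimal sketch of the schema-on-read idea on this slide, written in plain Python rather than against Hadoop itself. The stored lines, the field names, and the `serde_v1` / `serde_v2` readers are hypothetical and only for illustration; the point is that data already in the store gains a new column once the reader is updated.

```python
# Raw records are copied into the store as-is; nothing is parsed at load time.
RAW_STORE = [
    "2012-02-07|alice|login|mobile",
    "2012-02-07|bob|purchase|web",
]

def serde_v1(line):
    """Hypothetical reader that extracts only the first three fields."""
    date, user, action = line.split("|")[:3]
    return {"date": date, "user": user, "action": action}

def serde_v2(line):
    """Updated reader: also exposes the fourth field as 'channel'."""
    record = serde_v1(line)
    parts = line.split("|")
    record["channel"] = parts[3] if len(parts) > 3 else None
    return record

def read_table(store, serde):
    """Apply the reader at query time (schema-on-read)."""
    return [serde(line) for line in store]

old_view = read_table(RAW_STORE, serde_v1)
new_view = read_table(RAW_STORE, serde_v2)  # same stored data, new column appears
```

Nothing in `RAW_STORE` was reloaded between the two reads; only the reader changed.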




Hadoop Known Limitations
•  Write-once model
•  A namespace with an extremely large number of files exceeds the
   NameNode’s capacity to maintain it
•  Cannot be mounted by an existing OS
   •  Getting data in and out is tedious
   •  A virtual file system can solve this problem
•  HDFS does not implement / support
   •  User quotas
   •  Access permissions
   •  Hard or soft links
   •  Data balancing schemes
•  No periodic checkpoints
•  NameNode is a single point of failure
   •  Automatic restart and failover to another machine not yet supported

   Hadoop Tips
•  Hadoop is useful
   •  When you must process lots of unstructured data
   •  When running batch jobs is acceptable
   •  When you have access to lots of cheap hardware
•  Hadoop is not useful
   •  For intense calculations with little or no data
   •  When your data is not self-contained
   •  When you need interactive results
•  Implementation
   •  Think big, start small
   •  Build on agile cycles
   •  Focus on the data, as you will always develop schema on write.
•  Available Optimizations
   •  Input to Maps
   •  Map only jobs
   •  Combiner
   •  Compression
   •  Speculation
   •  Fault Tolerance
   •  Buffer Size
   •  Parallelism (threads)
   •  Partitioner
   •  Reporter
   •  DistributedCache
   •  Task child environment settings
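
To illustrate the Combiner item in the optimization list, here is a small single-process sketch in Python. It is not Hadoop API code; the function names and the two input splits are assumptions made for the example. It shows that pre-aggregating on the map side sends fewer records to the reducer without changing the result.

```python
from collections import Counter, defaultdict

def map_phase(line):
    """Emit (word, 1) for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def combine(pairs):
    """Map-side combiner: pre-aggregate counts before the shuffle."""
    local = Counter()
    for word, count in pairs:
        local[word] += count
    return list(local.items())

def reduce_phase(shuffled):
    """Sum the (possibly pre-aggregated) counts per key."""
    totals = defaultdict(int)
    for word, count in shuffled:
        totals[word] += count
    return dict(totals)

# Two hypothetical input splits, one per map task.
splits = ["big data big insights", "big data deluge"]

without_combiner = [p for line in splits for p in map_phase(line)]
with_combiner = [p for line in splits for p in combine(map_phase(line))]

result_plain = reduce_phase(without_combiner)
result_combined = reduce_phase(with_combiner)
```

A combiner is only safe when the reduce operation is associative and commutative, as summing is here.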




 Hadoop Tips
•  Troubleshooting
   •  Are your partitions uniform?
   •  Can you combine records at the map side?
   •  Are maps reading off a DFS block worth of data?
   •  Are you running a single reduce wave (unless the data size per reducer is too big)?
   •  Have you tried compressing intermediate data & final data?
   •  Are there buffer size issues?
   •  Do you see unexplained “long tails”?
   •  Are your CPU cores busy?
   •  Is at least one system resource being loaded?
•  Performance Tuning
   •  Increase the memory/buffer allocated to the tasks
   •  Increase the number of tasks that can be run in parallel
   •  Increase the number of threads that serve the map outputs
   •  Disable unnecessary logging
   •  Turn on speculation
   •  Run reducers in one wave as they tend to get expensive
   •  Tune the usage of DistributedCache; it can increase efficiency
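
The first troubleshooting question, whether partitions are uniform, can be checked offline on a sample of keys. The sketch below is a rough Python illustration, assuming a simple hash-modulo partitioner; the key names and the `skew_ratio` metric are hypothetical choices, not something the slides prescribe.

```python
import zlib
from collections import Counter

def partition_for(key, num_reducers):
    """Hash partitioning: a stable hash of the key modulo the reducer count."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers

def partition_sizes(keys, num_reducers):
    """Count how many records each reducer would receive."""
    counts = Counter(partition_for(k, num_reducers) for k in keys)
    return [counts.get(i, 0) for i in range(num_reducers)]

def skew_ratio(sizes):
    """Largest partition divided by the mean; 1.0 means perfectly uniform."""
    mean = sum(sizes) / len(sizes)
    return max(sizes) / mean if mean else 0.0

# Hypothetical key samples: one well spread, one dominated by a single hot key.
uniform_keys = [f"user-{i}" for i in range(1000)]
skewed_keys = ["hot-key"] * 900 + [f"user-{i}" for i in range(100)]

uniform_skew = skew_ratio(partition_sizes(uniform_keys, 4))
hot_skew = skew_ratio(partition_sizes(skewed_keys, 4))
```

A ratio well above 1.0 suggests that one reducer will run much longer than the others, which is one common source of "long tails".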




NoSQL
•  Stands for Not Only SQL
•  Based on CAP Theorem
•  NoSQL stores usually do not require a fixed table schema, nor do they
   use the concept of joins
•  All NoSQL offerings relax one or more of the ACID
   properties
•  NoSQL databases come in a variety of flavors
  •  XML (myXMLDB, Tamino, Sedna)
  •  Wide Column (Cassandra, HBase, Bigtable)
  •  Key/Value (Redis, Memcached with BerkeleyDB)
  •  Graph (Neo4j, InfoGrid)
  •  Document store (CouchDB, MongoDB)




 NoSQL Footprint

[Figure: NoSQL categories plotted by Size (vertical axis) against Complexity (horizontal axis)]
•  Key Value: Amazon Dynamo, Voldemort
•  Big Table: Google Bigtable, HBase, Cassandra
•  Doc Database: Lotus Notes
•  Graph: Graph Theory




    NoSQL
•  Access and Query
    •  RESTful interfaces (HTTP as an access API)
    •  Query languages other than SQL
        •  SPARQL - Query language for the Semantic Web
        •  Gremlin - the graph traversal language
        •  Sones Graph Query Language
    •  Data Manipulation / Query API
        •  The Google BigTable DataStore API
        •  The Neo4j Traversal API
    •  Serialization Formats
        •  JSON
        •  Thrift
        •  ProtoBuffers
        •  RDF
•  Best Practices
    •  Design for data collection
    •  Plan the data store
    •  Organize by type and semantics
    •  Partition for performance
        •  Access and Query is run time dependent
    •  Horizontal scaling
    •  Memory Caching
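
Several of the practices on this slide (partitioning, horizontal scaling, memory caching, JSON serialization) can be shown together in a toy example. The Python sketch below is an assumption-laden illustration, not the API of any real NoSQL product: the `ShardedKeyValueStore` class, its shard count, and the sample key are all hypothetical.

```python
import json
import zlib

class ShardedKeyValueStore:
    """Toy key/value store: keys are spread over shards, values stored as JSON."""

    def __init__(self, num_shards):
        self.shards = [dict() for _ in range(num_shards)]
        self.cache = {}          # in-memory read cache
        self.backend_reads = 0   # counts reads that missed the cache

    def _shard(self, key):
        """Pick the shard for a key with a stable hash (horizontal partitioning)."""
        return self.shards[zlib.crc32(key.encode("utf-8")) % len(self.shards)]

    def put(self, key, document):
        self._shard(key)[key] = json.dumps(document)
        self.cache.pop(key, None)  # invalidate any stale cached copy

    def get(self, key):
        if key in self.cache:
            return self.cache[key]
        self.backend_reads += 1
        raw = self._shard(key).get(key)
        document = json.loads(raw) if raw is not None else None
        if document is not None:
            self.cache[key] = document
        return document

store = ShardedKeyValueStore(num_shards=3)
store.put("customer:42", {"name": "Ada", "segment": "retail"})
first = store.get("customer:42")
second = store.get("customer:42")  # served from the cache
```

Access is by key only, with no joins, which is why the choice of key and partitioning scheme has to follow the expected query pattern.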




     Textual ETL Engine
Forest Rim Technology's Textual ETL Engine (TETLE) is an integration tool for turning text into a structure of
data that can be analyzed by standard analytical tools.

•  Textual ETL Engine provides a robust user interface to define rules (or patterns / keywords) to process unstructured or semi-structured data.
•  The rules engine encapsulates all the complexity and lets the user define simple phrases and keywords.
•  Easy to implement and easy to realize ROI.

•  Advantages
   •  Simple to use
   •  No MR or coding required for text analysis and mining
   •  Extensible by taxonomy integration
   •  Works on standard and new databases
   •  Produces a highly columnar key-value store, ready for metadata integration
•  Disadvantages
   •  Not integrated with Hadoop as a rules interface
   •  Currently uses Sqoop for metadata interchange with Hadoop or NoSQL interfaces
   •  Current GA does not handle distributed processing outside Windows platform
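
As a rough illustration of rule-driven text processing in general, the Python sketch below turns keyword and phrase rules into structured rows. It does not represent the Textual ETL Engine's actual rule format or output; the `RULES` table, the categories, and the sample email are hypothetical.

```python
import re

# Hypothetical rules: keyword or phrase -> category it should be filed under.
RULES = {
    "refund": "billing",
    "late delivery": "logistics",
    "cancel": "churn_risk",
}

def apply_rules(doc_id, text, rules):
    """Return (doc_id, category, keyword, position) rows for every rule hit."""
    rows = []
    lowered = text.lower()
    for keyword, category in rules.items():
        pattern = r"\b" + re.escape(keyword) + r"\b"
        for match in re.finditer(pattern, lowered):
            rows.append((doc_id, category, keyword, match.start()))
    # Order rows by where the keyword occurs in the text.
    return sorted(rows, key=lambda row: row[3])

email = "I want a refund. The late delivery made me cancel my order."
rows = apply_rules("email-001", email, RULES)
```

Each row is a simple key-value style record that a standard database or BI tool could load and join to other metadata.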




Integration
•  All RDBMS vendors today are supporting Hadoop or NoSQL as
   an integration or extension
   •  Oracle Exalytics / Big Data Appliance
   •  Teradata Aster Appliance
   •  EMC Greenplum Appliance
   •  IBM BigInsights
   •  Microsoft Windows Azure Integration
•  There are multiple providers of Hadoop distributions
   •  Cloudera
   •  Hortonworks
   •  Zettaset
•  Adapters from vendors to interface with Cloudera or
   Hortonworks distributions of Hadoop are available today. There
   are integration efforts to release Hadoop as an integral engine
   across the RDBMS vendor platforms

Conceptual Solution Architecture
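[Figure: conceptual solution architecture; labels recovered from the diagram]
•  Metadata, MDM
•  OLTP, via ETL / ELT / CDC, into the Data Warehouse
•  BIG Data (Content, Email, Docs), via Textual ETL and / or MR / Ruby / Java (Hadoop), into the Big Data DW
•  Taxonomy
•  Data Marts
•  Reporting, Analytics, Search, OLAP, Text Mining, Content Analytics, Knowledge Analytics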




Integration Tips
•  The key to the castle in integrating Big Data is metadata
•  Whatever the tool, technology and technique, if you do not
   know your metadata, your integration will fail
•  Semantic technologies and architectures will be the way to
   process and integrate the Big Data, much akin to Web 2.0
   models
•  Data quality for Big Data is a very questionable goal. To get
   some semblance of quality, taxonomies and ontologies can be
   of help
•  3rd party data providers also provide keywords, trending tags
   and scores; these can provide a lot of integration support
•  Writing business rules for Big Data can be very cumbersome
   and not all programs can be written in MapReduce


Which Tool


  Application               Hadoop    NoSQL    Textual ETL
  Machine Learning            x         x
  Sentiments                  x         x          x
  Text Processing             x         x          x
  Image Processing            x         x
  Video Analytics             x         x
  Log Parsing                 x         x          x
  Collaborative Filtering     x         x          x
  Context Search                                   x
  Email & Content                                  x

Success Stories
 •  Machine learning & Recommendation Engines – Amazon,
      Orbitz
 •    CRM - Consumer Analytics, Metrics, Social Network
      Analytics, Churn, Sentiment, Influencer, Proximity
 •    Finance – Fraud, Compliance
 •    Telco – CDR, Fraud
 •    Healthcare – Provider / Patient analytics, fraud, proactive
      care
 •    Life sciences – clinical analytics, physician outreach
 •    Pharma – Pharmacovigilance, clinical trials
 •    Insurance – fraud, geo-spatial
 •    Manufacturing – warranty analytics, supplier quality
      metrics




Data Science

[Figure: data science as "Art & Science", spanning Business Intelligence and Advanced Analytics]

Data Analytics
•  Content
•  Customer
•  Product
•  Behaviors
•  Optimization
•  Big Data Processing & ETL

Applied Science
•  User interest prediction
•  Inventory prediction
•  Machine learning
•  Pattern mining
•  Advanced regression analysis

Challenges
 •  Resources Availability
 •  MR is hard to implement
 •  Speech to text
     •  Conversation context is often missing
     •  Quality of recording
     •  Accent issues
 •  Visual data tagging
     •  Images
     •  Text embedded within images
 •  Metadata is not available
 •  Data is not trusted
 •  Content management platform capabilities
 •  Ontologies Ambiguity
 •  Taxonomy Integration




Contact
•  Krish Krishnan
   rkrish1124@yahoo.com
       Twitter: @datagenius
