Driving Business Transformations with Big Data Analytics

DAMA SouthWest Ohio
September 13, 2012
  




Key Business Trends

•  Mega Trends
   •  Socialization
   •  Collaboration
   •  Gamification
   •  Mobile

•  Micro Trends
   •  Micro-Segmentation
   •  Advanced Analytics
  
Crowdsourcing & Collaboration

GoldCorp
•  $575,000 prize money
•  400Mb data
•  55,000 acres

Within 1 month:
•  More than 1000 virtual prospectors
•  50 countries
•  110 new targets, 50% previously unidentified
•  80% yielded gold

Within a few years:
•  From a $100 million company into a $9 billion juggernaut

Copyright 2012 Sixth Sense Advisors Inc
  
Collaboration & Gamification
  




  
Gamification
Peer 2 Peer Collaboration
Crowdsourcing
  	
  
Game Changer

•  To move from competitor to leader and create an undisputed market presence, companies need to create new and vibrant business models
•  These business models need a lot of research, ideation and execution (read: Data, Data and more Data)
•  Companies that can harvest data efficiently and effectively will emerge as the winners of the Game, ultimately changing the Game.
  
What Does It Take
  




A Growing Trend

Expectations for BI are changing without anyone telling us

Requirement     Expectations                    Reality
Speed           Speed of the Internet           Speed = Infra + Arch + Design
Accessibility   Accessibility of a Smartphone   BI Tool licenses & security
Usability       iPad - Mobility                 Web Enabled BI Tool
Availability    Google Search                   Data & Report Metadata
Delivery        Speed of questions              Methodology & Signoff
Data            Access to everything            Structured Data
Scalability     Cloud (Amazon)                  Existing Infrastructure
Cost            Cell phone or Free WIFI         Millions
  
Long Tail

[Figure: The Old Way (Pareto Principle, Control or 80/20 rule), with the top 20% marked, compared with The New Way (with a bigger, longer tail) when Web 2.0 is applied.]
Source: http://en.wikipedia.org/wiki/The_Long_Tail
  
2008 US Presidential Elections




     $32 million raised from 275,000 people
              who gave $100 or less
  
Long Tail Example

Web 2.0 significantly increases total value contributed/received by aggregating the long tail of smaller value donors.

[Figure: long tail curve with the top 20% marked. Head: high $ value donors, small constellation. Tail: low $ value donors, larger constellation.]
Source: http://en.wikipedia.org/wiki/The_Long_Tail
  
Brand Management
  
Big Data
  




The Buzz
  
Data Disruptions

[Figure: Porter Competitive Model]
  
State of Data Today
  
Future of Data
  
Big Data

Big Data can be defined as data that grows in volume, velocity, variety and complexity at an unprecedented pace. That growth and complexity make it hard to capture, store, manage, analyze and visualize the data with the typical BI tool stack.
  
Tapping into the data

Business                                   Infrastructure
Structured data used today                 Today we do Big or Small compute with
                                           Small and Large structured data sets

Big Data existing across the enterprise    Big Data will mean Big or Small compute
that can be made available to business     with Big data sets, not always available
                                           in structured or semi-structured formats
  
Analytics

•  Analytics is the key visualization technique for analyzing and monetizing Big Data
•  The field of analytics is resurging with the advent of Big Data
   •  Social Analytics
   •  Sensor Analytics
   •  Text Analytics
   •  Deep Data Mining
•  Analytics needs metadata for integration
•  Applications
   •  Fraud Detection
   •  Campaign Optimization
   •  Demand and Supply Optimization
   •  Forecast Optimization
  
What’s so Big about Big Data

•  Velocity
•  Volume
•  Variety
•  Complexity
•  Ambiguity
  
What do we collect

•  Facebook has an average of 30 billion pieces of content added every month
•  YouTube receives 24 hours of video every minute
•  5 billion mobile phones in use in 2010
•  A leading retailer in the UK collects 1.5 billion pieces of information to adjust prices and promotions
•  Amazon.com: 30% of sales come from its recommendation engine
•  A Boeing jet engine produces 20TB/hour for engineers to examine in real time to make improvements
  
Potential Business Insights

•  Trends
•  Brand Identity & Management
•  Consumer Education
•  Competitive Intelligence
•  Micro-Targeting
•  Leverage “Crowdsourcing” driven innovation to better products and services (DELL, InnoCentive (SAP, P&G))
•  eDiscovery (Legal trends and patterns, financial fraud)

•  Pharmaceutical Companies
   •  Patient Education
   •  Physician Enriched Content Management
   •  Reduce Clinical Trial Cycles and Errors
   •  Pharmacovigilance
•  Financial
   •  Fraud
   •  Customer Management
•  Manufacturing
   •  Supply chain optimization
   •  Track & Trace
   •  Compliance
  
Why DWBI Fails Repeatedly

[Figure: Business Value (vertical axis) against Time (horizontal axis). After a Business Situation occurs, value is lost across three successive delays: Data Latency (until data is ready), Analysis Latency (until information is available) and Decision Latency (until a decision is made). Together these make up the action time, or action distance.]

Lost value = Sum(Latencies) + Opportunity Cost

Base graph courtesy of Dr. Richard Hackathorn
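
The lost-value relationship on this slide can be worked through with numbers. The sketch below is illustrative only: it assumes latencies are measured in hours and converts them to money with a constant, hypothetical `value_lost_per_hour` rate. The class name, the rate and all figures are assumptions, not values from the presentation.

```python
from dataclasses import dataclass


@dataclass
class Latencies:
    """Delays (in hours) between a business situation and the action taken."""
    data: float       # situation occurs -> data is ready
    analysis: float   # data is ready -> information is available
    decision: float   # information is available -> decision is made

    def action_time(self) -> float:
        """Total elapsed time before the business acts: Sum(Latencies)."""
        return self.data + self.analysis + self.decision


def lost_value(lat: Latencies, value_lost_per_hour: float, opportunity_cost: float) -> float:
    """Sum(Latencies), priced at an assumed hourly rate, plus opportunity cost."""
    return lat.action_time() * value_lost_per_hour + opportunity_cost


# Hypothetical numbers, purely for illustration.
nightly_batch = Latencies(data=24.0, analysis=8.0, decision=16.0)
near_real_time = Latencies(data=1.0, analysis=2.0, decision=16.0)

print(nightly_batch.action_time())                   # 48.0
print(lost_value(nightly_batch, 500.0, 10_000.0))    # 34000.0
print(lost_value(near_real_time, 500.0, 10_000.0))   # 19500.0
```

In this toy comparison the decision latency is unchanged, so the whole saving comes from shortening the data and analysis latencies.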
  
The Data Landscape

[Figure: Transactional Systems -> ODS -> Enterprise Datawarehouse -> Datamarts & Analytical Databases -> Reports, Dashboards, Analytic Models, Other Applications. Several parallel chains of Transactional Systems, ODS and Datamarts are shown, with Data Transformation spanning the flow.]
  
ACID Kills

•  Atomic – All of the work in a transaction completes (commit) or none of it completes
•  Consistent – A transaction transforms the database from one consistent state to another consistent state. Consistency is defined in terms of constraints.
•  Isolated – The results of any changes made during a transaction are not visible until the transaction has committed.
•  Durable – The results of a committed transaction survive failures
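
The atomic and consistent properties can be seen in a few lines of code. This is a minimal sketch using Python's built-in `sqlite3` module. The `accounts` table, the `transfer` helper and the balances are hypothetical, chosen only to show a constraint violation rolling back a half-finished transaction.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` between accounts; both updates commit or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
            # The CHECK constraint rejects an overdraft, which undoes the credit above.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    return conn.execute("SELECT balance FROM accounts ORDER BY id").fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()

print(transfer(conn, 1, 2, 500), balances(conn))  # False [(100,), (50,)]
print(transfer(conn, 1, 2, 30), balances(conn))   # True [(70,), (80,)]
```

The failed transfer leaves both balances untouched even though the credit had already been applied inside the transaction. That guarantee is what the NoSQL systems discussed later relax in exchange for scale.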

  
BIG Data Scenarios: Examples

To: Bob.Collins@bankwithus.com

Dear Mr. Collins,

This email is in reference to my bank account which has been efficiently handled by your bank for more than five years. There has been no problem till date until last week the situation went out of the hand.

I have deposited one of my high amount cheque to my bank account no: 65656512 which was to be credited same day but due to your staff carelessness it wasn’t done and because of this negligence my reputation in the market has been tarnished. Furthermore I had issued one payment cheque to the party which was showing bounced due to “Insufficient balance” just because my cheque didn’t make on time.

My relationship with your bank has matured with the time and it’s a shame to tell you about this kind of services are not acceptable when it is question of somebody’s reputation. I hope you got my point and I am attaching a copy of the same for further rapid procedures and remit into my account in a day.

Yours sincerely

Daniel Carter

Ph: 564-009-2311
  
BIG Data Text Example

•  We will often imply additional information in spoken language by the way we place stress on words.

•  The sentence "I never said she stole my money" demonstrates the importance stress can play in a sentence, and thus the inherent difficulty a natural language processor can have in parsing it. The stressed word in each reading is marked with asterisks.
   •  "*I* never said she stole my money" - Someone else said it, but I didn't.
   •  "I *never* said she stole my money" - I simply didn't ever say it.
   •  "I never *said* she stole my money" - I might have implied it in some way, but I never explicitly said it.
   •  "I never said *she* stole my money" - I said someone took it; I didn't say it was she.
   •  "I never said she *stole* my money" - I just said she probably borrowed it.
   •  "I never said she stole *my* money" - I said she stole someone else's money.
   •  "I never said she stole my *money*" - I said she stole something, but not my money.

•  Depending on which word the speaker places the stress, this sentence could have several distinct meanings.
  
Example Source: Wikipedia

Pattern Detection

Clustering Techniques
   •  K-Means
   •  Maximin
   •  Agglomerative
   •  Divisive
   •  Regression

Classification Techniques
   •  Naive Bayes
   •  Neural Networks
      •  Back Propagation
      •  Recursively Splitting
   •  K-Nearest Neighbor
   •  Minimum Distance

Reduction Techniques
   •  Backward Elimination
   •  Forward Selection
   •  Attribute Removal
   •  Principal Components

Utilities
   •  Accuracy Measures
   •  Range Filters
   •  K-Fold Cross Validation
   •  Merge & Subset
   •  Vector Magnitude

Examples
   •  Text: OCR, Machine, Digital
   •  Face recognition, verification, retrieval
   •  Fingerprint recognition
   •  Speech recognition
   •  Medical diagnosis: X-Ray, EKG analysis
   •  Machine diagnostics data
   •  Geological data
   •  Automated Target Recognition (ATR)
   •  Image segmentation and analysis (recognition from aerial or satellite photographs)
  
So you are about to start the Big Data Project

[Figure labels: Tools, Data, instructions, Output]
  
The Normal Way Results In ...
  
Performance

Re-engineering a Ferrari engine into a Yugo does not make the fastest race car.

   + New Data Types
   + New Volume
   + New Analytics
   + New Data Retention
   + New Data Workloads
  
BIG Data

•  Workload Demands
   •  Process dynamic data content
   •  Process unstructured data
   •  Systems that can scale up and scale out with high volume data
   •  Perform complex operations within reasonable response time

•  Infrastructure Needs
   •  Scalable platform
   •  Database independence
   •  Fault Tolerance
   •  Supported by standard toolsets
  
Data Warehouse Appliance

•  A Data Warehouse (DW) Appliance is an integrated set of servers, storage, OS, database and interconnect specifically preconfigured and tuned for the rigors of data warehousing.
•  DW appliances offer an attractive price / performance value proposition and are frequently a fraction of the cost of traditional data warehouse solutions.

Typical features:
•  High Availability
•  Standard SQL Interface
•  Advanced Compression
•  MPP
•  Leverages existing BI, ETL and OLTP investments
•  Hadoop & MapReduce Interface / Embedded
•  Minimal disk I/O bottleneck; simultaneously load & query
•  Auto Database Management
  
Hadoop
  
Hadoop & RDBMS Analogy

RDBMS: Sports car
•  refined
•  has a lot of features
•  accelerates very fast
•  pricey
•  expensive to maintain

Hadoop: Cargo train
•  rough
•  missing a lot of luxury
•  slow to accelerate
•  carries almost anything
•  moves a lot of stuff very efficiently

* Original slide author: Amr Awadallah, Cloudera
  
NoSQL

•  Stands for Not Only SQL
•  Based on CAP Theorem / BASE
•  NoSQL stores usually do not require a fixed table schema, nor do they use the concept of joins
•  All NoSQL offerings relax one or more of the ACID properties
•  Scalable replication and distribution
   •  Potentially thousands of machines
   •  Potentially distributed around the world
•  Queries need to return answers quickly
•  Mostly query, few updates
•  Asynchronous Inserts & Updates
•  NoSQL databases come in a variety of flavors
   •  XML (myXMLDB, Tamino, Sedna)
   •  Wide Column (Cassandra, HBase, Big Table)
   •  Key/Value (Redis, Memcached with BerkeleyDB)
   •  Graph (neo4j, InfoGrid)
   •  Document store (CouchDB, MongoDB)
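
What "asynchronous inserts & updates" and relaxed ACID properties mean in practice can be shown with a toy model. The sketch below is not the API of any product named on this slide. `AsyncStore`, `Replica` and the `cart:42` key are hypothetical, and replication is simulated by an explicit `replicate()` call rather than a background process.

```python
from collections import deque


class Replica:
    """One node holding a full copy of the key/value data."""

    def __init__(self):
        self.data = {}


class AsyncStore:
    """Toy key/value store: writes hit the primary, replicas catch up later."""

    def __init__(self, n_replicas: int = 2):
        self.primary = Replica()
        self.replicas = [Replica() for _ in range(n_replicas)]
        self.pending = deque()  # replication log not yet applied to replicas

    def put(self, key, value):
        """Acknowledge the write immediately; replication happens later."""
        self.primary.data[key] = value
        self.pending.append((key, value))

    def get(self, key, replica: int = 0):
        """Read from a replica, which may still be stale."""
        return self.replicas[replica].data.get(key)

    def replicate(self):
        """Drain the log so every replica converges on the primary's state."""
        while self.pending:
            key, value = self.pending.popleft()
            for r in self.replicas:
                r.data[key] = value


store = AsyncStore()
store.put("cart:42", ["book"])
print(store.get("cart:42"))  # None: the replica has not caught up yet
store.replicate()
print(store.get("cart:42"))  # ['book']: replicas are now consistent
```

The stale first read is the price of acknowledging writes before they reach every node. For query-heavy workloads with few updates, that trade is often acceptable.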
  
NoSQL Footprint

[Figure: NoSQL stores plotted by Size (vertical axis) against Complexity (horizontal axis): Amazon Dynamo, HBase, Voldemort, Google Big Table, Cassandra, Lotus Notes, Graph Theory]
  
Map Reduce

•  Technique for indexing and searching large data volumes
•  Two Phases, Map and Reduce
   •  Map
      •  Extract sets of Key-Value pairs from underlying data
      •  Potentially in Parallel on multiple machines
   •  Reduce
      •  Merge and sort sets of Key-Value pairs
      •  Results may be useful for other searches
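
The two phases can be sketched as a single-process word count. This only illustrates the pattern and does not use the Hadoop API. The function names and sample documents are made up for the example, and the grouping step that a real framework performs between the phases is written out by hand.

```python
from collections import defaultdict
from itertools import chain


def map_phase(doc_id, text):
    """Map: extract (key, value) pairs from one input record."""
    return [(word.lower(), 1) for word in text.split()]


def shuffle(pairs):
    """Group values by key; a real framework does this between the phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: merge all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    # Each map call is independent, so these could run in parallel on many machines.
    mapped = chain.from_iterable(map_phase(i, t) for i, t in documents.items())
    grouped = shuffle(mapped)
    # Sorting the keys mirrors the "merge and sort" step of the reduce phase.
    return dict(reduce_phase(k, grouped[k]) for k in sorted(grouped))


docs = {1: "big data big analytics", 2: "data data everywhere"}
print(word_count(docs))
# {'analytics': 1, 'big': 2, 'data': 3, 'everywhere': 1}
```

The reduced output is itself a set of key-value pairs, so it can feed another map step or serve as an index for later searches.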
  
Textual ETL Engine

Forest Rim Technology's Textual ETL Engine (TETLE) is an integration tool for turning text into a structure of data that can be analyzed by standard analytical tools.

•  Textual ETL Engine provides a robust user interface to define rules (or patterns / keywords) to process unstructured or semi-structured data.
•  The rules engine encapsulates all the complexity and lets the user define simple phrases and keywords
•  Easy to implement and easy to realize ROI

•  Advantages
   •  Simple to use
   •  No MR or Coding required for text analysis and mining
   •  Extensible by Taxonomy integration
   •  Works on standard and new databases
   •  Produces a highly columnar key-value store, ready for metadata integration

•  Disadvantages
   •  Not integrated with Hadoop as a rules interface
   •  Currently uses Sqoop for metadata interchange with Hadoop or NoSQL interfaces
   •  Current GA does not handle distributed processing outside Windows platform
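
To make the idea of rules and keywords concrete, here is a small sketch of rule-driven text extraction. It is not Forest Rim's engine or its rule syntax. The `RULES` and `KEYWORDS` tables and the `extract` function are hypothetical, and the input is an abbreviated excerpt of the sample customer email shown earlier in this deck.

```python
import re

# Hypothetical rules: a field name mapped to a pattern with one capture group.
RULES = {
    "account_no": re.compile(r"account\s+no:\s*(\d+)", re.IGNORECASE),
    "phone": re.compile(r"Ph:\s*([\d-]+)"),
}

# Hypothetical keyword lists: a tag is attached when any phrase appears.
KEYWORDS = {
    "complaint": ["carelessness", "negligence", "not acceptable"],
    "payment_issue": ["bounced", "insufficient balance"],
}


def extract(text: str) -> dict:
    """Turn free text into key/value pairs using the rules above."""
    row = {}
    for field, pattern in RULES.items():
        match = pattern.search(text)
        if match:
            row[field] = match.group(1)
    lowered = text.lower()
    for tag, phrases in KEYWORDS.items():
        hits = [p for p in phrases if p in lowered]
        if hits:
            row[tag] = hits
    return row


email = (
    "I have deposited one of my high amount cheque to my bank account no: 65656512 "
    "which was to be credited same day but due to your staff carelessness it wasn't done. "
    "I had issued one payment cheque which was showing bounced due to "
    '"Insufficient balance". Ph: 564-009-2311'
)
print(extract(email))
```

The resulting dictionary is a flat key-value record that could be loaded into a relational table or a key-value store and joined to structured account data.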
  
Integration

• All RDBMS vendors today are supporting Hadoop or NoSQL as an integration or extension
    • Oracle Exalytics / Big Data Appliance
    • Teradata Aster Appliance
    • EMC Greenplum Appliance
    • IBM BigInsights
    • Microsoft Windows Azure Integration

• There are multiple providers of Hadoop distributions
    • Cloudera
    • Hortonworks
    • Hadapt
    • Zettaset
    • IBM

• Adapters from vendors to interface with the Cloudera or Hortonworks distributions of Hadoop are available today. There are integration efforts to release Hadoop as an integral engine across the RDBMS vendor platforms
  



  
Conceptual Solution Architecture

[Diagram. Labels: OLTP; ETL, ELT, CDC; Data Warehouse; DataMarts; Metadata; MDM; Big Data content (email, docs); Textual ETL; Taxonomy; Big Data DW; And / Or]
  




  
Which Tool

Application                Hadoop    NoSQL    Textual ETL
Machine Learning             x         x
Sentiments                   x         x          x
Text Processing              x         x          x
Image Processing             x         x
Video Analytics              x         x
Log Parsing                  x         x          x
Collaborative Filtering      x         x          x
Context Search                                    x
Email & Content                                   x
  



  
Integration Tips

• The key to the castle in integrating Big Data is metadata
• Whatever the tool, technology and technique, if you do not know your metadata, your integration will fail
• Semantic technologies and architectures will be the way to process and integrate Big Data, much akin to Web 2.0 models
• Data quality for Big Data is a very questionable goal. To get some semblance of quality, taxonomies and ontologies can be of help
• Third-party data providers also supply keywords, trending tags and scores; these can provide a lot of integration support
• Writing business rules for Big Data can be very cumbersome, and not all programs can be written in MapReduce


  
Success Stories

• Machine learning & Recommendation Engines – Amazon, Orbitz
• CRM – Consumer Analytics, Metrics, Social Network Analytics, Churn, Sentiment, Influencer, Proximity
• Finance – Fraud, Compliance
• Telco – CDR, Fraud
• Healthcare – Provider / Patient analytics, fraud, proactive care
• Lifesciences – clinical analytics, physician outreach
• Pharma – Pharmacovigilance, clinical trials
• Insurance – fraud, geo-spatial
• Manufacturing – warranty analytics, supplier quality metrics
  

  
Big Data Challenges

• Integration with the EDW is still an open issue – Big Data reduces to small metrics, and this translates into the current-state issues faced with EDW data
• Big Data requires a lot of taxonomy processing, especially in content-related search
• Several applications need high-performing memory architectures because the data is compute intensive – for example, image processing of brain scans
• Technology is improving by the day, but integration and deployment are becoming equally complex
  

  
Data Science

[Diagram: Art & Science. Labels: Data Analytics; Content; Customer; Product; Behaviors; Optimization; Big Data Processing & ETL; Business Intelligence; Advanced Analytics]

Business Analysts, Data Analysts, Metadata Architects and Data Architects are all at some evolutionary stage of becoming a Data Scientist
  
  
Summary

• With effective use of Big Data and Analytics, you can:
    • Drive successful business transformations
    • Create an agile environment for business decision processes
    • Use the Data Warehouse for the analytical processes it was originally designed for
    • Create predictive insights
    • Practically "mine (explore)" any data from any source
    • Create powerful dashboards from near-real-time data
    • Reduce risk
    • Increase competitiveness
  
Contact

Krish Krishnan
rkrish1124@yahoo.com
Twitter - @datagenius
  




  
