DAMA Presentation


Krish Krishnan's PowerPoint presentation from the 9/13/12 joint CBIG & DAMA event

  1. Driving Business Transformations with Big Data Analytics
     DAMA SouthWest Ohio, September 13, 2012
  2. Key Business Trends
     • Mega Trends: Socialization, Collaboration, Gamification, Mobile
     • Micro Trends: Micro-Segmentation, Advanced Analytics
  3. Crowdsourcing & Collaboration: GoldCorp
     • The challenge: $575,000 prize money, 400 MB of data, 55,000 acres
     • Within 1 month:
       • More than 1000 virtual prospectors
       • 50 countries
       • 110 new targets, 50% previously unidentified
       • 80% yielded gold
     • Within a few years: from a $100 million company into a $9 billion juggernaut
     Copyright Sixth Sense Advisors Inc. 2012
  4. Collaboration & Gamification
  5. Gamification
  6. Peer 2 Peer Collaboration
  7. Crowdsourcing
  8. Game Changer
     • To go from being a competitor to being a leader with an undisputed market presence, companies need to create new and vibrant business models
     • These business models need a lot of research, ideation and execution (read: Data, Data and more Data)
     • Companies that can harvest data efficiently and effectively will emerge as the winners of the Game, ultimately changing the Game
  9. What Does It Take
  10. A Growing Trend
      Expectations for BI are changing without anyone telling us

      Requirement   | Expectations                  | Reality
      Speed         | Speed of the Internet         | Speed = Infra + Arch + Design
      Accessibility | Accessibility of a Smartphone | BI Tool licenses & security
      Usability     | iPad - Mobility               | Web Enabled BI Tool
      Availability  | Google Search                 | Data & Report Metadata
      Delivery      | Speed of questions            | Methodology & Signoff
      Data          | Access to everything          | Structured Data
      Scalability   | Cloud (Amazon)                | Existing Infrastructure
      Cost          | Cell phone or Free WIFI       | Millions
  11. Long Tail
      • The Old Way (Pareto Principle, Control or 80/20 rule)
      • The New Way (with a bigger, longer tail), when Web 2.0 is applied
      • Source: http://en.wikipedia.org/wiki/The_Long_Tail
  12. 2008 US Presidential Elections
      • $32 million raised from 275,000 people who gave $100 or less
  13. Long Tail Example
      • Web 2.0 significantly increases total value contributed/received by aggregating the long tail of smaller value donors
      • Figure labels: high $ value donors, small constellation (20%); low $ value donors, larger constellation
      • Source: http://en.wikipedia.org/wiki/The_Long_Tail
  14. Brand Management
  15. Big Data
  16. The Buzz
  17. Data Disruptions: Porter Competitive Model
  18. State of Data Today
  19. Future of Data
  20. Big Data
      Big Data can be defined as data that grows in volume, velocity, variety and complexity at an unprecedented pace. This growth and complexity present challenges for capture, storage, management, analysis and visualization using the typical BI tool stack.
  21. Tapping into the data
      • Business
        • Structured data used today
        • Big Data existing across the enterprise that can be made available to business
      • Infrastructure
        • Today we do Big or Small compute with Small and Large structured data sets
        • Big Data will mean Big or Small compute with Big data sets, not always available in structured or semi-structured formats
  22. Analytics
      • Analytics is the key visualization technique for analyzing and monetizing Big Data
      • The field of analytics is resurging with the advent of Big Data
        • Social Analytics
        • Sensor Analytics
        • Text Analytics
        • Deep Data Mining
      • Analytics needs metadata for integration
      • Applications
        • Fraud Detection
        • Campaign Optimization
        • Demand and Supply Optimization
        • Forecast Optimization
  23. What's so Big about Big Data
      • Velocity
      • Volume
      • Variety
      • Complexity
      • Ambiguity
  24. What do we collect
      • Facebook has an average of 30 billion pieces of content added every month
      • YouTube receives 24 hours of video every minute
      • 5 billion mobile phones were in use in 2010
      • A leading retailer in the UK collects 1.5 billion pieces of information to adjust prices and promotions
      • Amazon.com: 30% of sales come from its recommendation engine
      • A Boeing jet engine produces 20 TB/hour for engineers to examine in real time to make improvements
  25. Potential Business Insights
      • Trends
      • Brand Identity & Management
      • Consumer Education
      • Competitive Intelligence
      • Micro-Targeting
      • Leverage "Crowdsourcing" driven innovation to better products and services (DELL, Innocentive (SAP, P&G))
      • eDiscovery (legal trends and patterns, financial fraud)
      • Pharmaceutical Companies
        • Patient Education
        • Physician Enriched Content Management
        • Reduce Clinical Trial Cycles and Errors
        • Pharmacovigilance
      • Financial
        • Fraud
        • Customer Management
      • Manufacturing
        • Supply chain optimization
        • Track & Trace
        • Compliance
  26. Why DWBI Fails Repeatedly
      • Lost value = Sum (Latencies) + Opportunity Cost
      • [Figure: business value plotted against time. After the business situation occurs, data latency runs until the data is ready, analysis latency until the information is available, and decision latency until the decision is made. Together these make up the action time (or action distance), during which value is lost.]
      • Base graph courtesy of Dr. Richard Hackathorn
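The formula on slide 26 can be read as simple arithmetic. The sketch below is one possible reading, not Dr. Hackathorn's model: it assumes value decays at a constant rate per hour of action time, and the names `Latencies`, `lost_value`, `value_decay_per_hour` and `opportunity_cost` are hypothetical, introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Latencies:
    """The three delays named on the slide, in hours (illustrative units)."""
    data_hours: float      # business situation occurs -> data is ready
    analysis_hours: float  # data is ready -> information is available
    decision_hours: float  # information is available -> decision is made

    def action_time(self) -> float:
        # Action time (or action distance) is the sum of the latencies.
        return self.data_hours + self.analysis_hours + self.decision_hours


def lost_value(latencies: Latencies,
               value_decay_per_hour: float,
               opportunity_cost: float) -> float:
    # Assumption: value is lost linearly with action time. The slide only
    # states "Lost value = Sum (Latencies) + Opportunity Cost".
    return latencies.action_time() * value_decay_per_hour + opportunity_cost


if __name__ == "__main__":
    nightly_batch = Latencies(data_hours=24, analysis_hours=8, decision_hours=4)
    print(nightly_batch.action_time())
    print(lost_value(nightly_batch, value_decay_per_hour=100.0,
                     opportunity_cost=500.0))
```

Under this reading, shortening any one of the three latencies shortens the action time, and with it the value lost.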
  27. The Data Landscape
      [Diagram labels: Transactional Systems and Other Systems, each with an ODS; Data Transformation; Enterprise Datawarehouse; Datamarts & Analytical Databases; Reports; Dashboards; Analytic Models; Applications]
  28. ACID Kills
      • Atomic: all of the work in a transaction completes (commit) or none of it completes
      • Consistent: a transaction transforms the database from one consistent state to another consistent state. Consistency is defined in terms of constraints.
      • Isolated: the results of any changes made during a transaction are not visible until the transaction has committed
      • Durable: the results of a committed transaction survive failures
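The atomicity and consistency properties on slide 28 can be seen in a few lines with Python's built-in `sqlite3` module. This is a minimal sketch under assumed names: the `accounts` table, the `CHECK (balance >= 0)` constraint and the helper functions are illustrative and do not come from the presentation.

```python
import sqlite3


def make_bank() -> sqlite3.Connection:
    """Create an in-memory database with one consistency constraint."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        " name TEXT PRIMARY KEY,"
        " balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                     [("alice", 100), ("bob", 50)])
    conn.commit()
    return conn


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> bool:
    """Move money in one transaction: both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst))
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src))
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected an overdraft, so the credit to dst
        # made earlier in the same transaction was rolled back as well.
        return False


def balances(conn: sqlite3.Connection) -> dict:
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))


if __name__ == "__main__":
    bank = make_bank()
    print(transfer(bank, "alice", "bob", 30), balances(bank))
    print(transfer(bank, "alice", "bob", 500), balances(bank))
```

The second transfer fails on the debit, and the rollback also undoes the credit that had already been applied. Guarantees of this kind are what NoSQL systems relax in exchange for scale.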
  29. BIG Data Scenarios: Examples
      To: Bob.Collins@bankwithus.com

      Dear Mr. Collins,

      This email is in reference to my bank account which has been efficiently handled by your bank for more than five years. There has been no problem till date until last week the situation went out of the hand.

      I have deposited one of my high amount cheque to my bank account no: 65656512 which was to be credited same day but due to your staff carelessness it wasn't done and because of this negligence my reputation in the market has been tarnished. Furthermore I had issued one payment cheque to the party which was showing bounced due to "Insufficient balance" just because my cheque didn't make on time.

      My relationship with your bank has matured with the time and it's a shame to tell you about this kind of services are not acceptable when it is question of somebody's reputation. I hope you got my point and I am attaching a copy of the same for further rapid procedures and remit into my account in a day.

      Yours sincerely
      Daniel Carter
      Ph: 564-009-2311
  30. BIG Data Text Example
      • We will often imply additional information in spoken language by the way we place stress on words.
      • The sentence "I never said she stole my money" demonstrates the importance stress can play in a sentence, and thus the inherent difficulty a natural language processor can have in parsing it.
        • "I never said she stole my money" (stress on "I"): Someone else said it, but I didn't.
        • "I NEVER said she stole my money": I simply didn't ever say it.
        • "I never SAID she stole my money": I might have implied it in some way, but I never explicitly said it.
        • "I never said SHE stole my money": I said someone took it; I didn't say it was she.
        • "I never said she STOLE my money": I just said she probably borrowed it.
        • "I never said she stole MY money": I said she stole someone else's money.
        • "I never said she stole my MONEY": I said she stole something, but not my money.
      • Depending on which word the speaker stresses, this sentence could have several distinct meanings.
      • Example source: Wikipedia
  31. Pattern Detection
      • Clustering Techniques: K-Means, Maximin, Agglomerative, Divisive, Regression
      • Utilities: Accuracy Measures, Range Filters, K-Fold Cross Validation, Merge & Subset, Vector Magnitude
      • Classification Techniques: Naive Bayes, Neural Networks, Back Propagation, Recursively Splitting, K-Nearest Neighbor, Minimum Distance
      • Reduction Techniques: Backward Elimination, Forward Selection, Attribute Removal, Principal Components
      • Examples
        • Text: OCR, Machine, Digital
        • Face recognition, verification, retrieval
        • Fingerprint recognition
        • Speech recognition
        • Medical diagnosis: X-Ray, EKG analysis
        • Machine diagnostics data
        • Geological data
        • Automated Target Recognition (ATR)
        • Image segmentation and analysis (recognition from aerial or satellite photographs)
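To make one of the clustering techniques listed on slide 31 concrete, here is a minimal K-Means sketch in plain Python. It is an illustration only: the starting centroids are supplied by the caller rather than chosen at random, the iteration count is fixed, and the function names are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def kmeans(points, centroids, iterations=10):
    """Alternate the two K-Means steps for a fixed number of iterations.

    Returns the final centroids and the clusters from the last assignment.
    """
    centroids = list(centroids)
    clusters = []
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [mean_point(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters


if __name__ == "__main__":
    data = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
    final_centroids, final_clusters = kmeans(data, [(0, 0), (10, 10)])
    print(final_centroids)
    print(final_clusters)
```

Production libraries add random restarts and convergence checks. The sketch keeps only the assign-then-update loop that defines the technique.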
  32. So you are about to start the Big Data Project
      [Figure labels: Tools; Output; Data instructions]
  33. The Normal Way Results In ...
  34. Performance
      Re-engineering a Ferrari engine into a Yugo does not make the fastest race car.
      + New Data Types
      + New Volume
      + New Analytics
      + New Data Retention
      + New Data Workloads
  35. BIG Data
      • Workload Demands
        • Process dynamic data content
        • Process unstructured data
        • Systems that can scale up and scale out with high volume data
        • Perform complex operations within reasonable response time
      • Infrastructure Needs
        • Scalable platform
        • Database independence
        • Fault Tolerance
        • Supported by standard toolsets
  36. Data Warehouse Appliance
      • A Data Warehouse (DW) Appliance is an integrated set of servers, storage, OS, database and interconnect specifically preconfigured and tuned for the rigors of data warehousing.
      • DW appliances offer an attractive price / performance value proposition and are frequently a fraction of the cost of traditional data warehouse solutions.
      • Features
        • High Availability
        • Standard SQL Interface
        • Advanced Compression
        • MPP
        • Leverages existing BI, ETL and OLTP investments
        • Hadoop & MapReduce Interface / Embedded
        • Minimal disk I/O bottleneck; simultaneously load & query
        • Auto Database Management
  37. Hadoop
  38. Hadoop & RDBMS Analogy
      • RDBMS: sports car
        • refined
        • has a lot of features
        • accelerates very fast
        • pricey
        • expensive to maintain
      • Hadoop: cargo train
        • rough
        • missing a lot of luxury
        • slow to accelerate
        • carries almost anything
        • moves a lot of stuff very efficiently
      • Original slide author: Amr Awadallah, Cloudera
  39. NoSQL
      • Stands for Not Only SQL
      • Based on CAP Theorem / BASE
      • NoSQL databases usually do not require a fixed table schema, nor do they use the concept of joins
      • All NoSQL offerings relax one or more of the ACID properties
      • Scalable replication and distribution
        • Potentially thousands of machines
        • Potentially distributed around the world
      • Queries need to return answers quickly
      • Mostly query, few updates
      • Asynchronous Inserts & Updates
      • NoSQL databases come in a variety of flavors
        • XML (myXMLDB, Tamino, Sedna)
        • Wide Column (Cassandra, HBase, Bigtable)
        • Key/Value (Redis, Memcached with Berkeley DB)
        • Graph (Neo4j, InfoGrid)
        • Document store (CouchDB, MongoDB)
  40. NoSQL Footprint
      [Chart: Size plotted against Complexity. Systems shown: Amazon Dynamo, HBase, Voldemort, Google Bigtable, Lotus Notes, Cassandra, Graph Theory]
  41. Map Reduce
      • Technique for indexing and searching large data volumes
      • Two phases, Map and Reduce
        • Map
          • Extract sets of Key-Value pairs from underlying data
          • Potentially in parallel on multiple machines
        • Reduce
          • Merge and sort sets of Key-Value pairs
          • Results may be useful for other searches
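The two phases on slide 41 can be sketched in a single Python process using the classic word-count example. This is not Hadoop code: the function names are hypothetical, and the grouping step (often called the shuffle) that a real framework performs across machines is simulated here with a dictionary.

```python
from collections import defaultdict


def map_phase(document: str):
    """Map: extract (key, value) pairs from the underlying data."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group values by key and sort the keys.

    In a real cluster this step moves data between machines.
    """
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(groups):
    """Reduce: merge each key's values into a single result."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(documents):
    # Each document could be mapped on a different machine; here the map
    # calls simply run one after another.
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return reduce_phase(shuffle(pairs))


if __name__ == "__main__":
    print(word_count(["big data", "Big analytics"]))
```

Because each `map_phase` call depends only on its own document, the map work can be spread across machines, which is the property the slide highlights.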
  42. Textual ETL Engine
      Forest Rim Technology's Textual ETL Engine (TETLE) is an integration tool for turning text into a structure of data that can be analyzed by standard analytical tools.
      • Textual ETL Engine provides a robust user interface to define rules (or patterns / keywords) to process unstructured or semi-structured data
      • The rules engine encapsulates all the complexity and lets the user define simple phrases and keywords
      • Easy to implement and easy to realize ROI
      • Advantages
        • Simple to use
        • No MR or coding required for text analysis and mining
        • Extensible by taxonomy integration
        • Works on standard and new databases
        • Produces a highly columnar key-value store, ready for metadata integration
      • Disadvantages
        • Not integrated with Hadoop as a rules interface
        • Currently uses Sqoop for metadata interchange with Hadoop or NoSQL interfaces
        • Current GA does not handle distributed processing outside the Windows platform
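The general idea of keyword rules turning free text into key-value rows can be sketched as follows. This is not the Textual ETL Engine's interface or algorithm, which the slides do not show. The rule categories, keywords and row layout are all hypothetical, loosely modeled on the complaint email from slide 29.

```python
import re

# Hypothetical rules: category -> keywords to look for (illustrative only).
RULES = {
    "complaint": ["carelessness", "negligence", "not acceptable"],
    "payment": ["cheque", "bounced", "insufficient balance"],
}


def extract(doc_id: str, text: str, rules=RULES):
    """Return (doc_id, category, keyword, offset) rows, ordered by offset."""
    rows = []
    lowered = text.lower()
    for category, keywords in rules.items():
        for keyword in keywords:
            # re.escape treats the keyword as literal text, not a pattern.
            for match in re.finditer(re.escape(keyword), lowered):
                rows.append((doc_id, category, keyword, match.start()))
    return sorted(rows, key=lambda row: row[3])


if __name__ == "__main__":
    for row in extract("email-1", "The cheque bounced due to negligence."):
        print(row)
```

Rows of this shape can be loaded into an ordinary table and joined to structured data. That is the kind of output the slide describes as ready for metadata integration.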
  43. Integration
      • All RDBMS vendors today support Hadoop or NoSQL as an integration or extension
        • Oracle Exalytics / Big Data Appliance
        • Teradata Aster Appliance
        • EMC Greenplum Appliance
        • IBM BigInsights
        • Microsoft Windows Azure Integration
      • There are multiple providers of Hadoop distributions
        • Cloudera
        • Hortonworks
        • Hadapt
        • Zettaset
        • IBM
      • Adapters from vendors to interface with Cloudera or Hortonworks distributions of Hadoop are available today. There are integration efforts to release Hadoop as an integral engine across the RDBMS vendor platforms.
  44. Conceptual Solution Architecture
      [Diagram labels: Metadata; MDM; OLTP; ETL; ELT; CDC; Data Warehouse; DataMarts; Big Data; Big Data DW; Content; Email; Docs; Textual ETL; Taxonomy; And / Or]
  45. Which Tool
      Columns: Application, then Hadoop, NoSQL, Textual ETL (marked with "x")
      • Machine Learning: x x
      • Sentiments: x x x
      • Text Processing: x x x
      • Image Processing: x x
      • Video Analytics: x x
      • Log Parsing: x x x
      • Collaborative Filtering: x x x
      • Context Search: x
      • Email & Content: x
  46. Integration Tips
      • The key to the castle in integrating Big Data is metadata
      • Whatever the tool, technology and technique, if you do not know your metadata, your integration will fail
      • Semantic technologies and architectures will be the way to process and integrate Big Data, much akin to Web 2.0 models
      • Data quality for Big Data is a very questionable goal. To get some semblance of quality, taxonomies and ontologies can be of help
      • Third-party data providers also supply keywords, trending tags and scores, which can provide a lot of integration support
      • Writing business rules for Big Data can be very cumbersome, and not all programs can be written in MapReduce
  47. Success Stories
      • Machine learning & Recommendation Engines: Amazon, Orbitz
      • CRM: Consumer Analytics, Metrics, Social Network Analytics, Churn, Sentiment, Influencer, Proximity
      • Finance: Fraud, Compliance
      • Telco: CDR, Fraud
      • Healthcare: Provider / Patient analytics, fraud, proactive care
      • Lifesciences: clinical analytics, physician outreach
      • Pharma: Pharmacovigilance, clinical trials
      • Insurance: fraud, geo-spatial
      • Manufacturing: warranty analytics, supplier quality metrics
  48. Big Data Challenges
      • Integration with the EDW is still an open issue: Big Data reduces to small metrics, and this translates into the current-state issues faced with EDW data
      • Big Data requires a lot of taxonomy processing, especially in content-related search
      • Several applications need high-performing memory architectures because the data is compute intensive, for example image processing of brain scans
      • Technology is improving by the day, but integration and deployment are becoming equally complex
  49. Data Science
      [Diagram labels: Art & Science; Data Analytics; Content; Customer Behaviors; Product Optimization; Big Data Processing & ETL; Business Intelligence; Advanced Analytics]
      Business Analysts, Data Analysts, Metadata Architects and Data Architects are all in some evolutionary stage of a Data Scientist
  50. Summary
      With effective use of Big Data and Analytics, you can:
      • Drive successful business transformations
      • Create an agile environment for business decision processes
      • Use the Data Warehouse for the analytical processes it was originally designed for
      • Create predictive insights
      • Practically "mine (explore)" any data from any source
      • Create powerful dashboards from near real time data
      • Reduce risk
      • Increase competitiveness
  51. Contact
      Krish Krishnan
      rkrish1124@yahoo.com
      Twitter: @datagenius