Apache hadoop bigdata-in-banking


Published in: Technology


  1. Apache Hadoop and the Big Data Opportunity in Banking
     A webcast from Hortonworks & Tresata
     © Hortonworks Inc. 2011   © tresata all rights reserved
  2. Presenters
     •  Arun C. Murthy
        − Co-founder of Hortonworks
        − Lead of NextGen MapReduce in Apache Hadoop
        − Long-time contributor and committer to Apache Hadoop
     •  Abhishek (Abhi) Mehta
        − Co-founder of Tresata
        − Creator of first Hadoop-powered big data & analytics platform for financial industry data
  3. Agenda
     •  What is Apache Hadoop?
     •  Creating Value from Big Data
     •  Tresata and Hadoop for Banking
     •  Future of Hadoop
  4. What is Apache Hadoop?
     A set of open source projects owned by the Apache Software Foundation that transforms commodity computers and network into a distributed service
     •  HDFS – Stores petabytes of data reliably
     •  MapReduce – Allows huge distributed computations
     Key Attributes
     •  Reliable and redundant – Doesn’t slow down or lose data even as hardware fails
     •  Very powerful – Harnesses huge clusters, supports best of breed analytics
     •  Scalable – scales linearly to handle “big data” volumes
     •  Cost-effective – runs on commodity machines & network
     •  Simple and flexible APIs – enabling a large ecosystem of solution providers
  5. Core Apache Hadoop Projects
     [Diagram: layered stack, top to bottom]
     •  Programming Languages: Pig (Data Flow), Hive (SQL)
     •  Computation: MapReduce (Distributed Programming Framework)
     •  Table Storage: HCatalog (Meta Data), HBase (Columnar Storage)
     •  Object Storage: HDFS (Hadoop Distributed File System)
     •  Alongside the stack: HMS (Management), Zookeeper (Coordination)
     Legend: Core Apache Hadoop, Related Apache Projects
  6. Apache Hadoop: Why is it Transformational?
     Data Deluge (growth faster than Moore’s law)
     •  Economist: Only 5% of generated data is structured
     •  Gartner: Data growth is the biggest data center hardware infrastructure challenge for large enterprises
     •  Forrester: Four Vs - Volume, Velocity, Variety, Variability
     •  Hundreds of exabytes of data per year!
     Source: IDC Digital Universe Study, May 2011
  7. Apache Hadoop: Why is it Transformational?
     New way of thinking about your data:
     •  Current
        − What data do I keep?
        − What reports do I run?
        − Sample – Store – Extrapolate → What-If scenarios → Variable insights
     •  New dawn
        − Store everything (viable and economical) – Process whenever!
        − Test every what-if situation, prove every hypothesis… do it in a timely manner!
        − No more down-sampled data
  8. 5 Ways to Create Value from Big Data
     Source: McKinsey & Company report. Big data: The next frontier for innovation, competition, and productivity. May 2011.
  9. Crossing the Chasm
     Disruption: Data
     Geoffrey A. Moore*
  10. Typical Applications & Early Adopters
      [Word cloud] data analytics, analyzing web logs, advertising optimization, machine learning, mail anti-spam, text mining, web search, content optimization, customer trend analysis, ad selection, video & audio processing, data mining, user interest prediction, social media
  11. Big Data Has Reached Every Market Sector
      Digital data is personal, everywhere, increasingly accessible, and will continue to grow exponentially
      Source: McKinsey & Company report. Big data: The next frontier for innovation, competition, and productivity. May 2011.
  12. Big Data Value Creation Opportunities
      Financial Services
      •  Detect fraud
      •  Model and manage risk
      •  Improve debt recovery rates
      •  Personalize banking/insurance products
      Healthcare
      •  Optimal treatment pathways
      •  Remote patient monitoring
      •  Predictive modeling for new drugs
      •  Personalized medicine
      Retail
      •  In-store behavior analysis
      •  Cross selling
      •  Optimize pricing, placement, design
      •  Optimize inventory and distribution
      Web / Social / Mobile
      •  Location-based marketing
      •  Social segmentation
      •  Sentiment analysis
      •  Price comparison services
      Manufacturing
      •  Design to value
      •  Crowd-sourcing
      •  “Digital factory” for lean manufacturing
      •  Improve service via product sensor data
      Government
      •  Reduce fraud
      •  Segment populations, customize action
      •  Support open data initiatives
      •  Automate decision making
  13. Why can you bank on Hortonworks?
      •  Architects of Hadoop since big bang, circa 2006
      •  Real world experience supporting the largest Hadoop install in the world:
         − 50,000 node footprint
         − Over 200PB of data
         − 24x7 service
         − Billions of ad dollars
      •  We’ve taken the 3am calls to fix stuff when it breaks!
  14. Tresata = Hadoop for Banking
  15. What We Believe
      1.  Data will reboot the financial service industry
      2.  Data growth is viral…existing tools can’t keep up
      3.  The economics to store, process, analyze and visualize all of your data makes it a ‘no-brainer’ to do so
      4.  Big Data capabilities are needed to address business problems
  16. What McKinsey Said
      [Figure: bubble chart] The ease of capturing big data’s value, and the magnitude of its potential, vary across sectors. Example: US economy. Size of bubble indicates relative contribution to GDP.
      Axes: Big data: ease-of-capture index (Low to High) against Big data: value potential index (Low to High)
      Sectors shown: Utilities; Health care providers; Computers and other electronic products; Natural resources; Information; Manufacturing; Finance and insurance; Professional services; Transportation and warehousing; Real estate; Accommodation and food; Management of companies; Construction; Wholesale trade; Administrative services; Retail trade; Other services; Educational services; Government; Arts and entertainment
      Excerpts from the accompanying text (partial): “… the accessibility of data sources, and the degree to which managers make data-driven decisions.” “… sharpen their data skills, even low-ranking sectors (by our gauges of value potential and data capture), such as construction and education, could see their fortunes change.”
      Source: McKinsey Global Institute, October 2011
  17. The Opportunity
      1.  1-5% of data in a financial institution is analyzed
      2.  Not all data is stored
      3.  Top-down macros cannot be implemented or acted on
      4.  New approaches to data & analytics are essential
      Source: tresata research
  18. The Business Problems
      1.  Storage & retrieval – archive data on disk
      2.  Consumer behavior analysis – as it happens
      3.  Modeling & Analytics – model entire data sets…
      4.  Single View Of Customer – structured & unstructured data
  19. Hadoop in Banking Today
      1.  Early adopter, like any other technology trend
          a.  Interest is global, not just limited to the US
          b.  Approved for use by most technology teams & architects
          c.  Proofs of concept ongoing at most financial services institutions
      2.  Broad agreement that:
          a.  Hadoop will be the Big Data operating system
          b.  A Hadoop-powered data processing & analytics platform will spur rapid adoption
          c.  Need to apply to business problems with revenue/profit impact
  20. Elephants in the Room
      1.  Resources
          a.  Talent – how many MapReduce developers can we get?
          b.  Applications – data, analytics and business problems are the same, so why should each institution build its own Hadoop applications to run the same processes?
          c.  Essentials – how to manage security, provisioning, & performance
      2.  Implementation
          a.  Integration – fit with existing technology infrastructures & business processes
          b.  Support – experience with managing thousands of nodes/servers
          c.  Open Source – ‘out of the box’ applicability
  21. How Tresata Changes the Game…
      1.  Our Application
          a.  FS for Hadoop – fully built on Hadoop. Store, process, analyze and visualize, leveraging the full power of Hadoop
          b.  Processing Pipeline – automated ingestion, cleaning, de-duping, matching engine built for financial data
          c.  Analytics – massively parallel scoring and algorithm containers codified by business problem
      2.  Tame the elephants (making it work at scale)
  22. Tresata Case Study
      A.  Client Business Problem
          i.  Problem – Process data and score for >30 MM client applications
          ii.  Data Sources – 23 separate data sources, multiple time series
          iii.  Raw Variables – 100 variables per client per data source
          iv.  Current state – expensive legacy platform, algorithms developed on sub-samples, unable to scale algorithms to full data set
      B.  Tresata Solution
          i.  Data Engine – automated data import, cleaning, matching, scoring
          ii.  Compute Engine – algorithms process & score >30MM in minutes
          iii.  Integration – work with existing tools and processes
          iv.  Scalable deployment – Big Data as a Service delivery
  23. Tresata Big Data Application
  24. Tresata Analytics
  25. Tresata + Hortonworks
      1.  Commitment to train
          a.  Series of webinars
          b.  Hadoop training tailored for financial services
          c.  Dedicated training programs (tailored for client needs)
      2.  Commitment to support
          a.  Production scale support model (Tresata certified for Financial Svcs)
          b.  Meet and/or exceed industry standards on distro
  26. Where Hadoop is Going
  27. The Future
      •  Hadoop is going mainstream
         − Amazon, Microsoft, Oracle, EMC, NetApp, etc.
      •  Apache Hadoop is covering new ground (Hadoop .Next)
         − Much of the development led by Hortonworks
         − More scale, more performance
         − Other paradigms than MapReduce for data processing
         − Enhanced operability and management (Ambari)
         − Metadata management (HCatalog)
  28. Technology Roadmap
      Hadoop.Now – Making Apache Hadoop Accessible (Q4 2011)
      •  Release the most widely deployed version of Hadoop ever (0.20.205)
      •  Release directly usable code via Apache (RPMs, .debs…)
      •  Frequent sustaining releases off of the stable branches
      Hadoop.Next – Next Generation Apache Hadoop (2012; alphas starting late 2011)
      •  Address key product gaps (HBase support, HA, Management…)
      •  Enable community & partner innovation via modular architecture & open APIs
      •  Work with community to define integrated stack
  29. Next Steps
      •  Engage with Hortonworks & Tresata
         − www.hortonworks.com
         − www.tresata.com
      •  Additional Webcast Series
         − Reference Architecture for Hadoop in Financial Services - Nov 15
         − Hadoop in Financial Services DEEP DIVE - Dec 6
         − www.hortonworks.com/webcasts
      •  Hortonworks party @ Hadoop World
         − Tuesday Nov 8 at 7pm
         − Inc Lounge at the Time Hotel
         − RSVP: hortonworks.eventbrite.com
  30. Questions?
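
The slides describe MapReduce only as a framework that "allows huge distributed computations". As a rough illustration of the idea, the sketch below runs the three classic phases (map, shuffle, reduce) in a single Python process. It is not the Hadoop API: the function names, the sample records, and the choice of summing transaction amounts per account are all assumptions made for this example. On a real cluster each phase would run in parallel across many machines.

```python
from collections import defaultdict

# Hypothetical sample records: (account_id, transaction_amount_in_cents).
TRANSACTIONS = [
    ("acct-1", 1250),
    ("acct-2", 300),
    ("acct-1", 750),
    ("acct-3", 4000),
    ("acct-2", 200),
]


def map_phase(records):
    """Map: emit one (key, value) pair per input record."""
    for account_id, amount in records:
        yield account_id, amount


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(groups):
    """Reduce: collapse each key's list of values into one result."""
    return {key: sum(values) for key, values in groups.items()}


def total_per_account(records):
    """Chain the three phases to total the amounts for each account."""
    return reduce_phase(shuffle_phase(map_phase(records)))


if __name__ == "__main__":
    for account, total in sorted(total_per_account(TRANSACTIONS).items()):
        print(account, total)
```

Because the map step handles each record independently and the reduce step handles each key independently, both can be spread over many nodes, which is the property the "scales linearly" claim on slide 4 relies on.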
