Big Data in small words

A crisp introduction to Big Data: what it is, how it can be used and, more importantly, what it is not.

Transcript

  • 1. ADELTECH: BIG DATA RELOADED
    Big Data: A perspective
  • 2. What is it?
      • Data that exceeds the storage and processing capacity of conventional database systems

        Traditional paradigm              | Big Data paradigm
        ----------------------------------|---------------------------------------------------
        Structured data (usually tables)  | Unstructured or semi-structured data
        Relational DB                     | Hadoop, Cassandra or another appropriate system
        Analysis & reporting              | Models & insights
        Answers: What happened? Why?      | What might happen? How do I react/evolve?

      • Big data has always been around. What changed recently to reshape the ecosystem:
          • Inexpensive commodity hardware
          • The MapReduce paradigm and open-source software, which divide complex problems into small chunks that can be run on this hardware
          • Cloud architecture, which makes it accessible to everyone
    © Adeltech 2012
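The MapReduce idea from slide 2 (divide a problem into small chunks that can be processed independently, then combine the partial results) can be sketched in plain Python. This is only an illustration under stated assumptions: it runs in one process, the function names `map_phase`, `shuffle`, `reduce_phase` and `word_count` are hypothetical, and it does not use Hadoop or any other framework's API.

```python
from collections import defaultdict
from itertools import chain


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(chunks):
    # Each chunk is mapped independently; on a real cluster these calls
    # would run on different machines (an assumption of this sketch).
    mapped = chain.from_iterable(map_phase(chunk) for chunk in chunks)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


chunks = ["big data is big", "data about data"]
counts = word_count(chunks)
print(counts)
```

Because each call to `map_phase` only sees its own chunk, the chunks could be handed to separate commodity machines; only the shuffle step needs to bring results for the same key together.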
  • 3. How do we define it?
      • Three terms describe how "big data" differs from "data" (it usually has one or more of these attributes):
          • Volume: log data from systems (ERP, CRM), sensors or social networks
          • Velocity: transactions on a worldwide financial network
          • Variety: a pharma company analyzing medical records, claims data, FDA data, etc. together
  • 4. What is it NOT?
      • IT IS NOT a magic bullet for all your data issues
      • IT IS NOT a million-dollar system-wide framework: "Think big, start small". Although software vendors such as IBM, SAP and Google have their own expensive big data offerings, a proof of concept (POC) can be started using simple open-source solutions
      • IT IS NOT a new-fangled technology needed only by huge organizations that deal with TBs and PBs of data, like NASA, Google or Facebook. It can be used by most companies, irrespective of domain or market size, to gain more insights from existing data (unknown unknowns) or to combine more sources of data for joint analysis (multiple data sets might reveal relationships and correlations when combined)
      • IT IS NOT a single technology that needs to be installed on expensive customized servers. It's a paradigm that uses innovative algorithms on off-the-shelf commodity hardware
      • IT IS NOT a one-size-fits-all solution for large amounts of data. "Data scientists" need to look at use cases and apply domain expertise to figure out which algorithms and technology can be used for a particular problem
  • 5. Typical Applications
      • Fraud and money laundering detection: Since the worldwide web of money transfers spans geographies and involves numerous people, banks and financial intermediaries, it is impossible to track the transfers and derive insights using conventional tools
      • Marketing: This is a natural use case for big data technology: analyzing target populations in terms of gender, geography, socioeconomic factors and a host of other factors, some of which might not be directly apparent
      • Science and technology: Research has become extremely data-intensive in the last few years. The LHC at CERN produces 13 TB of data every day, most of which is discarded because it can't be analyzed at that rate by existing technology. Similarly, NASA's Hubble telescope and ground-based radio telescopes churn out data faster than it can be stored or processed. Big data can help make sense of all this
      • Service industries, like airlines and mobile telephony: To keep track of consumer behavior and derive business intelligence so that marketing dollars can be focused in the right direction
      • Hiring: For rank-and-file jobs, the hiring boss at many companies, such as Xerox (using Evolv), IBM (using Kenexa) and Oracle, is now an algorithm
  • 6. The Process
      • Drive-train approach for Big Data projects
          • Define the objective, in concrete terms
          • Identify data sources (levers); be creative here
          • Collect and clean data (technology play)
          • Create models (iterate), using maths, science and business knowledge
          • Iterate till the desired result is achieved
    Image from Big Data Now (2012 Strata Conf.)
  • 7. The Engagement Model
      • "Think Big, Start Small"
      • Start with a POC
          • using open-source software and small amounts of data (representative sampling should be done thoughtfully)
      • Apply algorithms to gain insights
      • Scale the models: test and implement
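Slide 7 asks for a small but representative data sample for the POC. One common way to get one is stratified sampling, sketched below with only the Python standard library. The deck does not prescribe a sampling method, so the technique, the `stratified_sample` helper and the `region`/`amount` fields are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def stratified_sample(records, key, fraction, seed=0):
    """Draw roughly `fraction` of the records from each group defined by
    `key`, keeping at least one record per group. Assumes 0 < fraction <= 1."""
    rng = random.Random(seed)  # fixed seed so the POC sample is reproducible
    groups = defaultdict(list)
    for record in records:
        groups[key(record)].append(record)

    sample = []
    for members in groups.values():
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical data set: 90 records from one region, 10 from another.
records = (
    [{"region": "EU", "amount": i} for i in range(90)]
    + [{"region": "APAC", "amount": i} for i in range(10)]
)
poc_sample = stratified_sample(records, key=lambda r: r["region"], fraction=0.1)
print(len(poc_sample))
```

Sampling within each group keeps small groups (here the 10 APAC records) from disappearing, which a plain random 10% draw over the whole data set cannot guarantee.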
  • 8. THANKS FOR YOUR TIME
    Visit www.adeltech.com for more details