IS-4082, Real-Time insight in Big Data – Even faster using HSA, by Norbert Heusser

Presentation IS-4082 by Norbert Heusser at the AMD Developer Summit (APU13) November 11-13, 2013

  1. REAL-TIME INSIGHT IN BIG DATA: EVEN FASTER USING HSA
  2. AGENDA
     • What are Big Data and ParStream
     • Technical architecture
     • HSA usage
     REAL-TIME INSIGHT IN BIG DATA | November 19, 2013 | CONFIDENTIAL
  3. What are Big Data and ParStream
  4. WHAT IS BIG DATA? COMMON SENSE FROM WIKIPEDIA
     "Big data is a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications. The challenges include capture, curation, storage, search, sharing, analysis and visualization."
  5. WHAT BIG DATA IS NOT: A COMMON MISTAKE
     Big Data is NOT storage of large datasets.
  6. REAL-TIME IN BIG DATA IS A TWO-DIMENSIONAL PROBLEM
     • Continuous, extremely fast data load and availability
     • Sub-second response times
  7. ANALYTICS LANDSCAPE: BIG DATA ANALYTICS REQUIRES NEW TECHNOLOGICAL SOLUTIONS
     [Figure: analytics approaches (OLTP, reporting, OLAP, complex event processing, stream analytics, real-time analytics, batch analytics, MapReduce batches (NoSQL), massively parallel (MPP) real-time analytics, and the ParStream in-memory DB) placed by lag/response time (from under 1..10 ms up to 1 h) against data volume (gigabyte, terabyte, petabyte), grouped into operational data vs. big data and operations vs. analytics]
  8. PARSTREAM IS A UNIQUE PRODUCT: PARSTREAM EMPOWERS CUSTOMERS TO REALIZE NEW BUSINESS OPPORTUNITIES EVOLVING WITH BIG DATA
     • Analyze and filter billions of records
     • Query data structures with 1000's of columns
     • Get answers in milliseconds without cubes
     • Execute 1000's of concurrent queries
     [Diagram: column store, high-performance index, in-memory technology, high-speed import, clustering, scalability, real-time queries]
  9. Technical Architecture
  10. ARCHITECTURE BUILDING BLOCKS: PARSTREAM IS THE BIG DATA ANALYTICS PLATFORM BASED ON A UNIQUE HIGH-PERFORMANCE COMPRESSED INDEX
      • Columnar storage
      • In-memory technology
      • Shared-nothing architecture
      • Standard interfaces: SQL/JDBC/ODBC, C++ UDF API
      • User-defined functions
      • Unique high-performance compressed index
      [Diagram: real-time analytics engine built on a compressed index, shared-nothing MPP, fast columnar storage, partitioning, and in-memory & disc technology]
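The slide names columnar storage and a compressed index but does not say how ParStream's index is laid out. As a hedged illustration only, the Python sketch below keeps one column on its own and builds a generic run-length-compressed bitmap index over it. The function names and the run-length scheme are assumptions for illustration, not ParStream's HPCI format.

```python
from itertools import groupby

def rle_encode(bits):
    """Run-length encode a list of 0/1 values as (bit, run_length) pairs."""
    return [(bit, len(list(run))) for bit, run in groupby(bits)]

def rle_decode(runs):
    """Expand (bit, run_length) pairs into the row ids whose bit is set."""
    row_ids, pos = [], 0
    for bit, length in runs:
        if bit:
            row_ids.extend(range(pos, pos + length))
        pos += length
    return row_ids

def build_bitmap_index(column):
    """Build one run-length-compressed bitmap per distinct value of a column."""
    index = {}
    for value in set(column):
        bits = [1 if v == value else 0 for v in column]
        index[value] = rle_encode(bits)
    return index

# A single column stored on its own (columnar layout); long runs compress well.
country = ["DE"] * 4 + ["US"] * 3 + ["DE"] * 2 + ["FR"]
index = build_bitmap_index(country)

print(index["DE"])              # [(1, 4), (0, 3), (1, 2), (0, 1)]
print(rle_decode(index["DE"]))  # [0, 1, 2, 3, 7, 8]
```

An equality filter such as `country = 'DE'` then reads only the compressed bitmap for that value instead of scanning every row of the column.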
  11. PARALLEL ARCHITECTURE: PARSTREAM OVERCOMES LIMITATIONS OF TRADITIONAL DW ARCHITECTURES
      • Standard DW architecture (nightly batch import):
        - Long query runtime
        - Frequent full table scans
        - Data is at least 1 day old
      • ParStream architecture (parallel import, queries via HPCI):
        - Each query uses multiple processor cores
        - Query execution using compressed indices
        - Continuous import assures timeliness of data
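The contrast between one long scan and a query spread over several cores can be sketched as a fan-out/merge over partitions. In this minimal Python sketch, which assumes the table is already split into partitions, each partition is filtered and summed independently and the partial results are then merged. All names are hypothetical. Python threads only show the structure, because the GIL prevents a real multi-core speedup for pure-Python code. The compressed-index step is also left out.

```python
from concurrent.futures import ThreadPoolExecutor

def partial_sum(partition, threshold):
    """Scan one partition; return (matching row count, sum of matching values)."""
    matches = [v for v in partition if v > threshold]
    return len(matches), sum(matches)

def parallel_filtered_sum(partitions, threshold, workers=4):
    """Fan the filter+sum out over partitions, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(lambda p: partial_sum(p, threshold), partitions))
    count = sum(c for c, _ in partials)
    total = sum(s for _, s in partials)
    return count, total

partitions = [[5, 12, 7], [20, 1], [15, 3, 9]]
# Continuous import: a newly loaded partition is simply appended and is
# visible to the next query without waiting for a nightly batch.
partitions.append([30, 2])
print(parallel_filtered_sum(partitions, threshold=8))  # (5, 86)
```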
  12. TRADITIONAL DATABASE QUERY EXECUTION: STATIC QUERY EXECUTION
      [Diagram: SQL statement → Parser → parsed statement → Optimizer/Planner → execution plan → Executor]
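The static pipeline on this slide can be sketched as three functions chained in the order parser → optimizer/planner → executor. This is a toy built on stated assumptions. It accepts only one hypothetical statement shape, `SELECT SUM(col) FROM table WHERE col > n`. The point it illustrates is that the plan is fixed before execution starts and is never revised while the query runs.

```python
import re

def parse(sql):
    """Parser: SQL statement -> parsed statement (a dict)."""
    m = re.fullmatch(r"SELECT SUM\((\w+)\) FROM (\w+) WHERE (\w+) > (\d+)", sql.strip())
    if m is None:
        raise ValueError("unsupported statement")
    agg_col, table, filter_col, bound = m.groups()
    return {"table": table, "agg": agg_col, "filter": (filter_col, int(bound))}

def plan(stmt):
    """Optimizer/planner: parsed statement -> execution plan, fixed up front."""
    col, bound = stmt["filter"]
    return [("scan", stmt["table"]), ("filter", col, bound), ("sum", stmt["agg"])]

def execute(plan_steps, tables):
    """Executor: run the plan exactly as planned, with no re-planning."""
    rows = []
    for step in plan_steps:
        if step[0] == "scan":
            rows = tables[step[1]]
        elif step[0] == "filter":
            _, col, bound = step
            rows = [r for r in rows if r[col] > bound]
        elif step[0] == "sum":
            return sum(r[step[1]] for r in rows)
    raise ValueError("plan has no aggregate step")

tables = {"t": [{"x": 5}, {"x": 12}, {"x": 20}]}
print(execute(plan(parse("SELECT SUM(x) FROM t WHERE x > 8")), tables))  # 32
```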
  13. MODULAR EXECUTION TREE: ATOMIC OPERATIONS COMBINED USING QUEUES
      • Parsed query descriptions are transformed into execution trees
      • The optimizer distributes execution operations to the available hardware
      • Data locality and current load are used for allocation
      • During query execution, the optimizer can re-allocate if beneficial
      • The optimizer continuously refines allocation based on past queries
      • Flow-based execution control
      • Each ExecNode processes blocks of data
      • Data transfer between nodes using queues
      [Diagram: execution tree with four parallel fetch → calc → filter → aggregation branches feeding one aggregate node, followed by sort]
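Flow-based execution with queues can be sketched in Python under the following assumption: each ExecNode is a thread that reads blocks from an input queue, transforms them, and writes the results to an output queue. The tree mirrors the slide, with one fetch → calc → filter → aggregation branch per partition, a single merging aggregate, and then sort. The calc and filter functions, the `END` marker and all other names are hypothetical. The optimizer's hardware allocation is not modelled.

```python
import queue
import threading

END = None  # end-of-stream marker; real blocks are lists or dicts, never None

def exec_node(fn, in_q, out_q):
    """Generic ExecNode: apply fn to each block from in_q and forward the result."""
    def run():
        while (block := in_q.get()) is not END:
            out_q.put(fn(block))
        out_q.put(END)  # propagate end-of-stream downstream
    worker = threading.Thread(target=run)
    worker.start()
    return worker

def partial_agg(block):
    """Aggregation node: per-block partial SUM grouped by key."""
    sums = {}
    for key, value in block:
        sums[key] = sums.get(key, 0) + value
    return sums

def run_query(partitions):
    merge_q = queue.Queue()
    workers = []
    for blocks in partitions:  # one branch per partition
        fetch_q, calc_q, filter_q = queue.Queue(), queue.Queue(), queue.Queue()
        workers.append(exec_node(lambda b: [(k, v * 2) for k, v in b], fetch_q, calc_q))        # calc
        workers.append(exec_node(lambda b: [(k, v) for k, v in b if v > 4], calc_q, filter_q))  # filter
        workers.append(exec_node(partial_agg, filter_q, merge_q))                                # aggregation
        for block in blocks:  # fetch: feed the branch one block at a time
            fetch_q.put(block)
        fetch_q.put(END)
    merged, open_branches = {}, len(partitions)
    while open_branches:  # aggregate: merge partial results until every branch ends
        item = merge_q.get()
        if item is END:
            open_branches -= 1
            continue
        for key, value in item.items():
            merged[key] = merged.get(key, 0) + value
    for worker in workers:
        worker.join()
    return sorted(merged.items())  # sort

partitions = [
    [[("a", 1), ("b", 3)], [("a", 4)]],  # partition 0: two blocks
    [[("b", 2), ("c", 5)]],              # partition 1: one block
]
print(run_query(partitions))  # [('a', 8), ('b', 6), ('c', 10)]
```

Because every node only talks to its neighbours through queues, any single node could in principle be moved to a different compute resource without changing the rest of the tree.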
  14. HSA Usage
  15. ARCHITECTURE ALLOWS USAGE OF DIFFERENT PROCESSING UNITS: ANY PART OF THE QUERY MAY BE EXECUTED INDIVIDUALLY
      • Each atomic operation may be processed using any available compute resource
      • Dynamic workload assignment during query execution
      • Overall workload management ensures optimal resource usage
      [Diagram: the same execution tree as in slide 13]
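Dynamic workload assignment can be approximated with a pull model. Atomic operations wait in a shared queue, and whichever compute resource is idle takes the next one. In this hedged Python sketch the resource names (`cpu0`, `cpu1`, `gpu0`) are only labels on ordinary threads. Real dispatch to a GPU through HSA is not shown, and which resource runs which task varies from run to run.

```python
import queue
import threading

def run_on_resources(tasks, resources):
    """Each idle resource pulls the next (task_id, fn, arg) until none are left."""
    todo = queue.Queue()
    for task in tasks:
        todo.put(task)
    results, lock = {}, threading.Lock()

    def worker(name):
        while True:
            try:
                task_id, fn, arg = todo.get_nowait()
            except queue.Empty:
                return  # nothing left to pull: this resource goes idle
            value = fn(arg)
            with lock:
                results[task_id] = (name, value)

    threads = [threading.Thread(target=worker, args=(name,)) for name in resources]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

def filter_sum(block):
    """Hypothetical atomic operation: sum of the values greater than 2."""
    return sum(v for v in block if v > 2)

blocks = [[1, 2, 3], [4, 5], [0, 9]]
tasks = [(i, filter_sum, block) for i, block in enumerate(blocks)]
results = run_on_resources(tasks, ["cpu0", "cpu1", "gpu0"])
for task_id in sorted(results):
    print(task_id, results[task_id])  # the resource label differs between runs
```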
  16. PROBLEMS USING TRADITIONAL GPU COMPUTE UNITS: THE TRANSFER AND COMMUNICATION PROBLEM
      • Target scenario: real-time big data
        - Processing huge amounts of data
        - Dynamically changing data
        - Interactive response time
      • Part of the data fixed in GPU memory
        - Input data transferred once via PCI during loading
        - Result transferred via PCI during execution
      • Data resident in main memory
        - Computational task offloaded to the GPU
        - Transfer in and out via PCI during execution
      • Global data needs to be transferred to the GPU too
      • Global data needs to be synchronized
      • Latency caused by blockwise processing
      • Different programming models
      [Diagram: execution tree branches (fetch, calc, filter, aggregation)]
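The transfer problem can be made concrete with a back-of-the-envelope model. Offloading a block costs the PCI transfer in, the GPU compute time, and the transfer of the result back out. The Python sketch below uses purely hypothetical figures: an assumed effective bus bandwidth of 8 GB/s and assumed compute times. None of these numbers come from the talk.

```python
def offload_time_s(bytes_in, bytes_out, gpu_compute_s, bus_bytes_per_s):
    """Time to offload one block: copy in, compute on the GPU, copy the result out."""
    return bytes_in / bus_bytes_per_s + gpu_compute_s + bytes_out / bus_bytes_per_s

def worth_offloading(cpu_compute_s, **offload_args):
    """Offload only if the GPU path, transfers included, beats staying on the CPU."""
    return offload_time_s(**offload_args) < cpu_compute_s

# Hypothetical numbers: 1 GB block, 1 MB result, 8 GB/s effective bus bandwidth.
BUS = 8e9
discrete = dict(bytes_in=1e9, bytes_out=1e6, gpu_compute_s=0.02, bus_bytes_per_s=BUS)
print(offload_time_s(**discrete))         # roughly 0.145 s, mostly transfer time
print(worth_offloading(0.1, **discrete))  # False

# If no copy is needed (data already visible to the GPU), only compute remains.
shared = dict(discrete, bytes_in=0, bytes_out=0)
print(worth_offloading(0.1, **shared))    # True
```

With these assumed figures the GPU kernel itself is five times faster than the CPU path, yet the copies alone (0.125 s) exceed the whole CPU time. The second case is the idealised no-copy situation. Real shared-memory systems still have synchronization and scheduling costs that this model ignores.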
  17. HSA SOLVES ALL OUR PROBLEMS
      • No data transfer required
      • Shared page table support
      • Coherent memory regions
      • User-level command queueing
      • Hardware scheduling
      • Bolt allows a uniform programming model
  18. DISCLAIMER & ATTRIBUTION
      The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors.
      The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of such revisions or changes.
      AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION.
      AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
      ATTRIBUTION
      © 2013 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo and combinations thereof are trademarks of Advanced Micro Devices, Inc. in the United States and/or other jurisdictions. SPEC is a registered trademark of the Standard Performance Evaluation Corporation (SPEC). Other names are for informational purposes only and may be trademarks of their respective owners.
