Taking DataFlow Management to the Edge with Apache NiFi/MiNiFi

Introduction to Apache NiFi and MiNiFi with Raspberry Pi Temperature/Humidity Demo.

  1. Taking DataFlow Management to the Edge with Apache NiFi/MiNiFi
     Bryan Bende – Software Engineer @Hortonworks
     Future of Data NY – December 5th 2016
  2. Agenda
     • Problem Definition
     • Introduction to Apache NiFi
     • Introduction to Apache MiNiFi
     • Demo!!
     • Q&A
     © Hortonworks Inc. 2011 – 2016. All Rights Reserved
  3. About Me
     • Software Engineer @ Hortonworks
     • Apache NiFi PMC & Committer
     • Working with NiFi since 2011
     • Recent focus on integrations with Hadoop ecosystem
     • bbende@hortonworks.com / Twitter @bbende / bryanbende.com
     • Bethpage Class of 2001!
  4. The Problem
  5. It starts out so simple…
     Team 2: "Hey! We have some important data to send you!"
     Team 1: "Cool! Your data is really important to us!"
     This should be easy, right?...
  6. But what about formats & protocols?
     Team 2: "We can publish Avro records to a Kafka topic, does that work?"
     Team 1: "Oh, well we have a REST service that accepts JSON…"
  7. And what about security & authentication?
     Team 2: "Hmm, what about security? We can authenticate via Kerberos."
     Team 1: "Sorry, we only support 2-Way TLS with certificates."
  8. And what about all these devices at the edge?
     Team 2: "We also need to grab data from all these devices, how are we going to do that?"
  9. And What About…
     • Organizational Politics (my data)
     • Brittle Connectivity
     • Firewalls/Security Domains
     • Partnerships bring new data / need different formats
     • Data has to be masked for compliance purposes
     • Where is this data even from?
     • Data is in that other system – I need it over here
     • Bandwidth between those sites is limited
     • My Big Data system needs it in this other better/faster/stronger format
     • What schema is that from?
     • It needs to be enriched first!
     • No, not that reference set – this one!
     • I didn't even know that system existed
  10. Ok, so let's fix this
      • Enterprise Architecture – Standardize on…
        • …a format
        • …a schema (one that can evolve)
        • …a protocol
        • …an ontology
      But now…
      • Standard schema becomes complex
      • Hard to agree on common changes
      • Some teams stuck on older versions
      • Productivity starts slowing…
  11. Something to ponder – the disconnect is healthy
      • Having Corporate Standards is a good thing.
      • Innovation is a good thing.
      Innovation often does not follow the Corporate Standard.
  12. What is Dataflow Management?
  13. Dataflow Management
      The systematic process by which data is acquired from all producers and delivered to all consumers
  14. Dataflow Management Considerations
      • Promote Loosely Coupled Systems
        • Types of coupling: Format, Schema, Protocol, Priority, Size, Interest, …
      • Promote Highly Cohesive Systems
        • Producers should focus on production (not the intricacies of consumption)
        • Consumers should focus on storage or processing (not the details of production)
      • Provide Provenance
        • The who/what/when/where/why of data
        • Inter and Intra Process Latency
        • Enable enterprise version control for data
  15. Dataflow Management Considerations
      • Empower Understanding and Interaction
        • Ability to see the flow, safely and quickly iterate and experiment
        • Breaking production is bad – so too is not being able to evolve fast enough
      • Secure
        • Bridge between security domains
        • Data Plane (transport)
        • Control Plane (C&C, Monitoring)
      • Self Service
        • Centralized teams – hard to scale – slow turnaround times
        • Centralized systems – multi-tenant management works
  16. The role of messaging systems
      • Reduce variables: Fix protocol, Data Size, Provide Buffering
      • Historically not very fast or replayable: Apache Kafka solved that
      • Strong solution within a controlled domain
      • But numerous challenges remain
        • Topics do not separate key concerns between producer and consumer pairs, such as
          § Authorization
          § Format
          § Schema
          § Interest
          § Prioritization
        • Flow control
  17. Introduction to Apache NiFi
  18. The NSA Years
      • Created in 2006
      • Improved over eight years
      • Simple initial vision – Visio for real-time dataflow management
      • Key Lessons Learned
        • What scale means – down, up, and out
        • The fearsome force known as Compliance Requirements
        • The power of provenance!
        • Operational best-practices and anti-patterns
      • NSA donated the codebase to the ASF in late 2014
  19. NiFi Key Features
      • Guaranteed delivery
      • Data buffering
        - Backpressure
        - Pressure release
      • Prioritized queuing
      • Flow specific QoS
        - Latency vs. throughput
        - Loss tolerance
      • Data provenance
      • Recovery/recording a rolling log of fine-grained history
      • Visual command and control
      • Flow templates
      • Pluggable/multi-role security
      • Designed for extension
      • Clustering
  20. NiFi Core Concepts
      FBP Term           | NiFi Term          | Description
      Information Packet | FlowFile           | Each object moving through the system.
      Black Box          | FlowFile Processor | Performs the work, doing some combination of data routing, transformation, or mediation between systems.
      Bounded Buffer     | Connection         | The linkage between processors, acting as queues and allowing various processes to interact at differing rates.
      Scheduler          | Flow Controller    | Maintains the knowledge of how processes are connected, and manages the threads and allocations thereof which all processes use.
      Subnet             | Process Group      | A set of processes and their connections, which can receive and send data via ports. A process group allows creation of an entirely new component simply by composition of its components.
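To make the FBP mapping concrete, here is a minimal Python sketch of three of these concepts: a FlowFile as attributes plus content, a Connection as a bounded buffer, and a processor as a function from FlowFile to FlowFile. The names `FlowFile`, `Connection`, and `update_attribute` are illustrative assumptions, not NiFi's Java API. Real NiFi processors are Java components that work through a session.

```python
import uuid
from collections import deque
from dataclasses import dataclass, field


@dataclass
class FlowFile:
    """Information packet: key/value attributes plus a content payload."""
    content: bytes
    attributes: dict = field(default_factory=dict)

    def __post_init__(self):
        # Give every FlowFile a unique 'uuid' attribute unless one is already set.
        self.attributes.setdefault("uuid", str(uuid.uuid4()))


class Connection:
    """Bounded buffer that lets two processors run at different rates."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._queue = deque()

    def offer(self, flowfile):
        """Queue the FlowFile, or return False if the buffer is full."""
        if len(self._queue) >= self.capacity:
            return False
        self._queue.append(flowfile)
        return True

    def poll(self):
        """Remove and return the oldest FlowFile, or None if empty."""
        return self._queue.popleft() if self._queue else None


def update_attribute(flowfile, **new_attributes):
    """'Black box' processor: returns a new FlowFile with extra attributes."""
    # Copy the attribute map so the input FlowFile is left unchanged.
    merged = {**flowfile.attributes, **new_attributes}
    return FlowFile(flowfile.content, merged)
```

This sketch simplifies how a full connection behaves. Here, a full connection refuses new FlowFiles. NiFi instead stops scheduling the upstream processor, as the Back-Pressure slide describes.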
  21. Visual Command & Control
      • Drag & drop processors to build a flow
      • Start, stop, & configure components in real-time
      • View errors & corresponding messages
      • View statistics & health of the dataflow
      • Create shareable templates of common flows
  22. Provenance/Lineage
      • Tracks data at each point as it flows through the system
      • Records, indexes, and makes events available for display
      • Handles fan-in/fan-out, i.e. merging and splitting data
      • View attributes and content at given points in time
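Lineage across fan-in and fan-out can be illustrated with a toy, in-memory provenance store. Each event names the FlowFile it concerns and that FlowFile's parents. This model is for illustration only. NiFi's actual provenance records and its Lucene-indexed repository are richer: for example, its FORK and JOIN events carry parent and child UUIDs. The event type strings are borrowed from NiFi's vocabulary, but the classes are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ProvenanceEvent:
    event_type: str            # e.g. "CREATE", "JOIN", "FORK", "SEND"
    flowfile_uuid: str         # the FlowFile this event is about
    parent_uuids: list = field(default_factory=list)  # empty unless derived


class ProvenanceRepository:
    """In-memory event store that can answer 'where did this come from?'."""

    def __init__(self):
        self._events_by_uuid = {}

    def record(self, event):
        self._events_by_uuid.setdefault(event.flowfile_uuid, []).append(event)

    def events_for(self, flowfile_uuid):
        return list(self._events_by_uuid.get(flowfile_uuid, []))

    def lineage(self, flowfile_uuid):
        """Return the FlowFile's uuid plus the uuids of all of its ancestors."""
        seen, stack = set(), [flowfile_uuid]
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            # Walk back through parents; this covers both merges and splits.
            for event in self._events_by_uuid.get(current, []):
                stack.extend(event.parent_uuids)
        return seen
```

Recording a merge (two parents, one child) followed by a split (one parent, two children) lets `lineage` trace any split piece back to both original inputs.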
  23. Prioritization
      • Configure a prioritizer per connection
      • Determine what is important for your data – time based, arrival order, importance of a data set
      • Funnel many connections down to a single connection to prioritize across data sets
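A connection with pluggable prioritizers behaves like a priority queue ordered by whatever key each prioritizer extracts. The sketch below is loosely modeled on NiFi's built-in prioritizers: first-in-first-out, oldest-first, and one driven by a `priority` attribute. The function names and the `created` attribute are hypothetical, and FlowFiles are plain attribute dicts for brevity. When several prioritizers are chained on one connection, their keys are compared in order, and arrival order breaks any remaining ties.

```python
import heapq
import itertools


def first_in_first_out(flowfile, arrival):
    return arrival


def oldest_first(flowfile, arrival):
    # Hypothetical 'created' attribute holding the FlowFile's creation time.
    return flowfile["created"]


def priority_attribute(flowfile, arrival):
    # Lower numbers come out first; FlowFiles without a priority go last.
    return flowfile.get("priority", float("inf"))


class PrioritizedConnection:
    """Connection whose queue order is decided by a chain of prioritizers."""

    def __init__(self, *prioritizers):
        self._prioritizers = prioritizers or (first_in_first_out,)
        self._heap = []
        self._arrivals = itertools.count()

    def enqueue(self, flowfile):
        arrival = next(self._arrivals)
        # Earlier prioritizers in the chain dominate; arrival order breaks ties.
        key = tuple(p(flowfile, arrival) for p in self._prioritizers)
        heapq.heappush(self._heap, (key, arrival, flowfile))

    def dequeue(self):
        return heapq.heappop(self._heap)[2] if self._heap else None

    def __len__(self):
        return len(self._heap)
```

To funnel several upstream connections into one `PrioritizedConnection`, each source calls `enqueue`. The ordering then applies across all data sets rather than within each one.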
  24. Back-Pressure
      • Configure back-pressure per connection
      • Based on number of FlowFiles or total size of FlowFiles
      • Upstream processor no longer scheduled to run until below threshold
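The scheduling rule on this slide can be simulated in a few lines. In the sketch below, a connection reports itself full when either its FlowFile count or its queued byte total reaches a threshold. A toy scheduler skips the upstream processor on any tick where the connection is full, while a slower downstream consumer drains the queue. The names and thresholds are assumptions for illustration.

```python
from collections import deque


class BackPressureConnection:
    """Queue with FlowFile-count and byte-size back-pressure thresholds."""

    def __init__(self, max_count, max_bytes):
        self.max_count = max_count
        self.max_bytes = max_bytes
        self.queue = deque()
        self.queued_bytes = 0

    def is_full(self):
        # Reaching either threshold is enough to apply back-pressure.
        return (len(self.queue) >= self.max_count
                or self.queued_bytes >= self.max_bytes)

    def put(self, content):
        self.queue.append(content)
        self.queued_bytes += len(content)

    def take(self):
        content = self.queue.popleft()
        self.queued_bytes -= len(content)
        return content


def run_scheduler(source, connection, ticks, drain_every):
    """Toy scheduler: upstream may run each tick, downstream every `drain_every` ticks.

    Returns (times upstream ran, times upstream was skipped due to back-pressure).
    """
    ran = skipped = 0
    for tick in range(ticks):
        if connection.is_full():
            skipped += 1              # upstream not scheduled this tick
        else:
            connection.put(next(source))
            ran += 1
        if tick % drain_every == 0 and connection.queue:
            connection.take()         # slower downstream consumer
    return ran, skipped
```

The check happens before the upstream processor runs, so in this sketch the thresholds act as soft limits. A single run that emits a lot of data can push the queue past them.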
  25. Latency vs. Throughput
      • Choose between lower latency or higher throughput on each processor
      • Higher throughput allows the framework to batch together all operations for the selected amount of time for improved performance
      • The processor developer determines whether to support this by using the @SupportsBatching annotation
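The trade-off comes down to amortizing a fixed per-commit cost. The cost model below is an assumption for illustration, not how NiFi's scheduler is implemented. Each batch pays a fixed commit cost once, so larger batches lower the total time but make the first FlowFile in each batch wait longer. Here, `batch_size` stands in for however many FlowFiles fit in the selected amount of time.

```python
def simulate_batches(num_flowfiles, batch_size, per_item_ms, commit_ms):
    """Return (total_ms, worst_wait_ms) when committing every `batch_size` FlowFiles.

    Assumed cost model: each FlowFile costs `per_item_ms` of work and each commit
    costs a fixed `commit_ms`, paid once per batch.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    total_ms = 0
    worst_wait_ms = 0
    remaining = num_flowfiles
    while remaining > 0:
        batch = min(batch_size, remaining)
        batch_ms = batch * per_item_ms + commit_ms
        total_ms += batch_ms
        # The first FlowFile in a batch is not committed until the whole batch is.
        worst_wait_ms = max(worst_wait_ms, batch_ms)
        remaining -= batch
    return total_ms, worst_wait_ms


for size in (1, 5, 25, 100):
    print(size, simulate_batches(100, size, per_item_ms=1, commit_ms=10))
```

Take 100 FlowFiles, 1 ms of work each, and a 10 ms commit:
- Committing one at a time costs 1100 ms in total, with an 11 ms worst-case wait.
- Batches of 25 cost only 140 ms in total, but the worst-case wait grows to 35 ms.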
  26. Security
      • Control Plane
        – Pluggable authentication
          • 2-Way TLS/SSL, LDAP, Kerberos
        – Pluggable authorization with multi-tenancy
          • NiFi Policy Based Authorizer
          • Apache Ranger Authorizer
        – Audit trail of all user actions
      • Data Plane
        – Optional 2-Way TLS/SSL between cluster nodes
        – Optional 2-Way TLS/SSL on Site-To-Site connections (NiFi-to-NiFi)
        – Encryption/Decryption of data through processors
        – Provenance for audit trail of data
  27. Extensibility
      • Built from the ground up with extensions in mind
      • Service-loader pattern for…
        • Processors
        • Controller Services
        • Reporting Tasks
      • Extensions packaged as NiFi Archives (NARs)
        • Deploy to NiFi lib directory and restart
        • Provides ClassLoader isolation
        • Same model as standard components
  28. Architecture - Standalone
      [Diagram: an OS/Host runs a JVM that contains the Web Server and the Flow Controller (Processor 1 … Extension N). These are backed by the FlowFile, Content, and Provenance Repositories on Local Storage.]
      • FlowFile Repository
        – Write Ahead Log
        – State of every FlowFile
        – Pointers to content repository (pass-by-reference)
      • Content Repository
        – FlowFile content
        – Copy-on-write
      • Provenance Repository
        – Write Ahead Log + Lucene Indexes
        – Store & search lineage events
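Pass-by-reference and copy-on-write can be shown with a toy model of the repositories. The sketch makes these assumptions:
- A FlowFile record holds only a pointer to its content claim.
- Cloning copies that pointer rather than the bytes.
- Modifying content writes a new claim and leaves the old one intact.
- Every state change is appended to a write-ahead log before it is applied.

The class and function names are hypothetical, and everything lives in memory rather than on local storage.

```python
from dataclasses import dataclass, replace


class ContentRepository:
    """Content is written once into a claim and never modified afterwards."""

    def __init__(self):
        self._claims = {}

    def write(self, data):
        claim_id = len(self._claims)
        self._claims[claim_id] = bytes(data)
        return claim_id

    def read(self, claim_id):
        return self._claims[claim_id]


@dataclass(frozen=True)
class FlowFileRecord:
    uuid: str
    content_claim: int        # pointer into the content repository, not the bytes


class FlowFileRepository:
    """Appends each change to a write-ahead log before applying it."""

    def __init__(self):
        self.wal = []
        self.state = {}

    def update(self, record):
        self.wal.append(("UPDATE", record.uuid, record.content_claim))  # log first
        self.state[record.uuid] = record                                # then apply


def clone(record, new_uuid):
    # Pass-by-reference: the clone shares the claim, so no content is copied.
    return replace(record, uuid=new_uuid)


def modify_content(record, content_repo, transform):
    # Copy-on-write: transformed bytes go into a new claim; the old claim is untouched.
    new_bytes = transform(content_repo.read(record.content_claim))
    return replace(record, content_claim=content_repo.write(new_bytes))
```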
  29. Architecture - Cluster
      [Diagram: three nodes, each running the standalone stack (OS/Host, JVM, Web Server, Flow Controller, Processor 1 … Extension N, and FlowFile/Content/Provenance Repositories on Local Storage), coordinated through ZooKeeper.]
      • Same dataflow on each node, data partitioned across cluster
      • Access the UI from any node
      • ZooKeeper for auto-election of Cluster Coordinator & Primary Node
      • Cluster Coordinator receives heartbeats from other nodes, manages joining/disconnecting
      • Primary Node for scheduling processors on a single node
  30. Site-To-Site
      • Direct communication between two NiFi instances
      • Push to Input Port on receiver, or Pull from Output Port on source
      • Communicate between clusters, standalone instances, or both
      • Handles load balancing and reliable delivery
      • Secure connections using certificates (optional)
      • Communicate over TCP or HTTP
  31. Site-To-Site Push Model
      • Source connects Remote Process Group to Input Port on destination
      • Site-To-Site takes care of load balancing across the nodes in the cluster
      [Diagram: Standalone NiFi (RPG) → Input Port on NiFi Cluster Node 1, Node 2, and Node 3]
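One way to picture the load balancing is a client that sends each batch to whichever input-port node currently reports the fewest queued FlowFiles. This selection rule is simplified and hypothetical. NiFi's Site-To-Site client has its own, more involved peer-selection logic.

```python
def distribute(batch_sizes, queued_per_node):
    """Assign each batch to the node with the fewest queued FlowFiles.

    `queued_per_node` maps a node name to its current queue depth; ties go to
    the alphabetically first node. Returns (assignments, final queue depths).
    """
    queued = dict(queued_per_node)   # copy so the caller's dict is not mutated
    assignments = []
    for size in batch_sizes:
        node = min(queued, key=lambda name: (queued[name], name))
        queued[node] += size
        assignments.append(node)
    return assignments, queued


nodes = {"node1": 0, "node2": 5, "node3": 0}
print(distribute([10, 10, 10], nodes))
```

Here the already-busy node2 only receives data once the two idle nodes have caught up to it.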
  32. Site-To-Site Pull Model
      • Destination connects Remote Process Group to Output Port on the source
      • If the source were a cluster, each destination node would pull from every node in that cluster
      [Diagram: RPG on NiFi Cluster Node 1, Node 2, and Node 3 → Output Port on Standalone NiFi]
  33. Introduction to Apache MiNiFi
  34. Apache MiNiFi
      • Sub-project of Apache NiFi
      • Created to more effectively collect data at the edge
      • Smaller footprint, run where the JVM can't
      • Design & Deploy vs. Command & Control
  35. MiNiFi Distributions
      • Java
        – <40MB binary distribution
        – Requires Java 1.8
        – More feature complete
        – Targeted for any systems that can run a JVM (e.g. servers, Raspberry Pi)
      • C++
        – 600KB code size and static data ~50KB
        – Dynamic heap of ~1MB based on use-case
        – Targeted for resource-constrained environments (e.g. edge IoT devices)
      • Both use the same config format and NiFi terminology
      Different focuses depending on requirements
  36. MiNiFi Java
      [Diagram: MiNiFi = NiFi Framework + Components; NiFi = NiFi Framework + User Interface + Components]
  37. MiNiFi Java
      • Uses same NAR structure as NiFi
      • Use any NAR from NiFi with MiNiFi Java
      • NiFi standard processors are bundled by default
        – TailFile
        – UpdateAttribute
        – Route on content and attributes
        – PutEmail
        – ….
  38. MiNiFi C++
      • Initial set of processors
        – TailFile
        – GetFile
        – GenerateFlowFile
        – LogAttribute
        – ListenSyslog
      • Site to Site Client implementation in C++ for talking to NiFi instances
  39. Design & Deploy
      Same approach for Java & C++…
      1. Design a flow in NiFi UI
      2. Export template to XML file
      3. Run MiNiFi Toolkit to convert NiFi template to MiNiFi YAML
      4. Deploy config.yml to MiNiFi instances
      Initially targeting flows like…
      1. GetFile/TailFile
      2. Routing Decision
      3. Site-To-Site Back to core NiFi
  40. Simple config.yml
      Tail a rolling file -> Site to Site
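The first half of that flow, tailing a rolling file, hinges on remembering how far into the file has already been read. Below is a minimal sketch of that bookkeeping, assuming a rollover shows up as the file shrinking. `FileTailer` is hypothetical and is not MiNiFi code.

NiFi's TailFile processor keeps more state than this. It can also use a rolling filename pattern to pick up data left in rolled files. This sketch does neither, so it misses a rollover whenever the new file is already larger than the old offset.

```python
import os


class FileTailer:
    """Minimal tailer: remembers a byte offset and detects truncation/rollover."""

    def __init__(self, path):
        self.path = path
        self.offset = 0

    def poll(self):
        """Return complete new lines appended since the last poll."""
        if not os.path.exists(self.path):
            return []
        if os.path.getsize(self.path) < self.offset:
            # File shrank: assume it was rolled over and start from the top.
            self.offset = 0
        with open(self.path, "rb") as f:
            f.seek(self.offset)
            data = f.read()
        # Only consume up to the last newline so partial lines wait for the next poll.
        end = data.rfind(b"\n") + 1
        self.offset += end
        return data[:end].splitlines()
```

In a flow like the one on this slide, each returned line (or batch of lines) would become a FlowFile handed to Site-To-Site.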
  41. MiNiFi Command and Control
      • Design Flow at a centralized place, deploy on the edge
      • Version control of flows
        – Align with NiFi SDLC work
      • Agent status monitoring
      • Bi-directional command and control
      Currently a feature proposal, initial version being architected
      https://cwiki.apache.org/confluence/display/MINIFI/MiNiFi+Command+and+Control
  42. Demo!
  43. Demo Scenario
      [Diagram: two Raspberry Pis, each with a Temp/Humidity Sensor and running MiNiFi Java → site-to-site → NiFi → Solr → Banana]
  44. Questions?
  45. Learn more and join us!
      Apache NiFi site: http://nifi.apache.org
      Subproject MiNiFi site: http://nifi.apache.org/minifi/
      Subscribe to and collaborate at: dev@nifi.apache.org, users@nifi.apache.org
      Submit Ideas or Issues: https://issues.apache.org/jira/browse/NIFI, https://issues.apache.org/jira/browse/MINIFI
      Follow us on Twitter: @apachenifi
  46. Thank you!
