FOSS4G In The Cloud: Using Open Source to build Cloud based Spatial Infrastructure

Using Open Source and Cloud Computing principles, these slides walk through the architectural patterns for building scalable cloud services. The second part of the presentation focuses on profiling common geolocation tasks like importing large datasets and rendering map tiles.

Published in: Technology, Business

  1. FOSS4G in the Cloud
     Mohamed Sayed, mohamed@fossworx.org
     Version 092013
     License: CC-BY-SA
  2. Agenda
     • Disclaimers
     • Goals/Motives
     • The historical path to ‘Cloud Computing’
     • ‘Definition’ of cloud computing
     • FOSS4G in Cloud use cases
     • AWS: Components and Services
     • Building for the cloud
       – Architectural patterns for Cloud Services
       – Cultural changes
       – Processes changes
       – Things to remember
     • Common FOSS4G tasks in AWS
       – Importing OSM data into PostGIS
       – Mod_tile/Mapnik
       – GWC/GeoServer
     • Questions?
  3. Disclaimers
     • The work presented was funded personally and done during my vacation. All opinions are my own and not my employer's.
     • I am not affiliated with AWS in any way other than being a customer. I choose them when that choice makes sense and would use others where applicable.
     • This is still a work in progress. YMMV.
  4. Goals/Motives
     • Goals
       – We will learn or validate some ideas.
       – Get some feedback on what to do next.
       – Help save someone time/money/frustration.
       – Raise awareness about some risks.
     • Motives
       – The new disruption is in data and the services around it. We (Open Source people) should not miss out on that, and I believe I can help.
  5. Path to Cloud Computing (diagram)
     • Hardware Changes: Multicore Support, NPT/EPT, I/O Offloading
     • Virtualization: KVM/Xen, Solaris Zones, VMWare/Parallels, Storage/Network Virtualization
     • Mobile Computing: Smart Phones, Tablets, MultiScreen
  6. Cloud Computing definition (IMHO)
     • Cloud computing is a computing paradigm composed of abstractions, a set of primitives, and a set of interfaces and tools to drive those abstractions and primitives. The abstractions and primitives need not be new in themselves, but their combination and impact are what create ‘The Cloud’ culture.
  7. (Diagram) Layers: Foundation, Primitives, Abstractions
     • Compute, Storage, Network
     • Image, Volumes, Snapshots, Autoscale
     • Tools, APIs, Config Management
  8. Example “High level” Architecture: OpenStack
  9. In reality, it sorta looks like this
  10. AWS as a Public Cloud
  11. FOSS4G Use Cases
     • Disaster Recovery/Backup
     • Static, logic-free web publishing
     • Online FOSS4G as a Service
     • Data transformation jobs
     • Content curation and batch processes
  12. Example FOSS4G AWS Use Case: Static publishing blueprint
  13. How to Build your Cloud Infrastructure
  14. Architectural Patterns
     • The Cookie Cutter/Soloist.
     • The Centrist.
     • The Replicator.
     • The Masters of Colonies.
  15. CAP: Cookie Cutter
  16. The Cookie Cutter/Soloist
     • Pros:
       – Simple.
       – Scales horizontally w/load.
       – Localized failure impact.
     • Cons:
       – Poor support for write-oriented services.
       – Coarse-grained scalability.
       – Node capacity has vertical scalability issues.
  17. CAP – The Centrist
  18. The Centrist
     • Pros:
       – Scales at components level.
       – Moderate complexity up to middle range load.
       – Faster/easier fault isolation/detection.
       – Data stores Master/Slave is a well studied concept.
     • Cons:
       – Central data store becomes more critical/bottleneck.
       – Multi-region deployments suffer from latency.
       – Vertical scaling characteristics pronounced on the data store.
  19. CAP – The Replicator
  20. The Replicator
     • Pros:
       – Scales at components level.
       – Improved read performance.
       – Better Disaster Recovery.
       – Well suited for multi-region deployments.
     • Cons:
       – Writes are still central.
       – Added complexity.
       – Increased bandwidth requirements.
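The Replicator's trade-off (reads spread over copies, writes still central) can be sketched in a few lines. This is only an illustration under assumptions: the slide's diagram is not in the transcript, so the class `ReplicatedStore`, its methods, and the use of plain dicts as stand-ins for database nodes are all hypothetical.

```python
from itertools import cycle


class ReplicatedStore:
    """Hypothetical sketch: one central primary for writes, N replicas for reads."""

    def __init__(self, replica_count):
        if replica_count < 1:
            raise ValueError("need at least one replica")
        self.primary = {}  # every write lands here ("writes are still central")
        self.replicas = [{} for _ in range(replica_count)]
        self._next_replica = cycle(range(replica_count))

    def write(self, key, value):
        # Writes go to the primary only; replicas lag until replicate() runs.
        self.primary[key] = value

    def replicate(self):
        # Copy the primary's state to every replica (the bandwidth cost).
        for replica in self.replicas:
            replica.update(self.primary)

    def read(self, key):
        # Round-robin reads across replicas (the read-performance gain).
        index = next(self._next_replica)
        return self.replicas[index].get(key)
```

A read issued between `write()` and `replicate()` returns `None`, which is the replication lag this kind of layout has to tolerate.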
  21. Masters of Colonies
  22. CAP – Master of Colonies
     • Pros:
       – Improved write performance.
       – Decompose large data sets into smaller ones.
       – Faster data iterations.
       – Good disaster recovery strategy.
     • Cons:
       – Complex!
       – Weak/varying support by various data stores.
       – High maintenance overhead.
  23. Cultural Changes
     • Get stakeholders' buy-in early.
     • Build a full ownership culture.
     • Adopt an agile approach.
     • Encourage prototyping and experimentation.
     • Automation as a way of life.
  24. Processes Changes
     • Software Architecture:
       – Know the floor, and the ceiling.
       – Be as stateless as possible.
       – Graceful failure response.
       – Good logging as a way of life.
     • Release Engineering:
       – The VM as an artifact
       – Automation
       – Versioning
       – Snapshot
     • Automation:
       – Configuration management
       – Orchestration
       – Auto-scaling
  25. Things to remember
     • Review any legal implications.
     • Use the cloud primitives.
     • Pay attention to security: security groups, encrypted data at rest, etc.
     • Clean up old stuff.
     • Things fail: don’t fight it, just handle it.
     • You will not get it right the first time, but things should look good on the 3rd iteration. (Read The Mythical Man-Month.)
  26. FOSS4G in AWS: Performance/Architecture Evaluation
     • Tools used:
       – Siege
       – Sar
       – Oprofile
       – R/AWK/Python/Ruby
     • PostgreSQL queries log.
     • Test client -> Target server as separate nodes.
  27. OSM Data into AWS
     • Setup 1
       – M1.Large (2 cores)
       – Standard EBS
       – EU-West region
     • Setup 2
       – M1.Large
       – Provisioned EBS: 8000 IOPS
       – EU-West region
     • Setup 3
       – Hi1.4xlarge
       – SSD drive
       – EU-West region
     • Setup 4
       – M2.2xlarge
       – EU-West region
       – Ephemeral drives
  28. Importing OSM data into AWS: Testing the water
  29. Importing OSM data into AWS: Testing the water some more
  30. Enough Water Testing: Importing Planet to SSD
     • Guess how long it took to finish.
  31. Importing Planet into AWS using SSD
     • It only took 35 hours!
     • Disk utilization: ~250 GB
     • Guess what was the first thing I did when it finished?
  32. Importing Planet into AWS
     • I made a copy of course :)
     • Create a RAID 0 set
     • Create LVM on top of RAID 0
     • Kick off data copy
     • Guess how long it took.
  33. Importing Planet into AWS
     • It only took 2.5 hours.
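As a rough worked example of what the numbers on slides 31 to 33 imply, the copy rate can be computed directly. This is a back-of-the-envelope sketch that assumes the roughly 250 GB figure is decimal gigabytes and that the 2.5-hour copy moved the whole volume; neither is stated on the slides, and the function name is made up for illustration.

```python
def copy_throughput_mb_per_s(size_gb, hours):
    """Average copy rate in MB/s, assuming decimal units (1 GB = 1000 MB)."""
    return size_gb * 1000 / (hours * 3600)


# ~250 GB copied in 2.5 hours (assumed full-volume copy).
rate = copy_throughput_mb_per_s(250, 2.5)
print(f"{rate:.1f} MB/s")  # prints "27.8 MB/s"

# Importing took 35 hours, so cloning is this many times faster than re-importing.
print(35 / 2.5)  # prints 14.0
```

On these assumptions, cloning a finished database is about 14 times faster than importing again, which is the point of making the copy first.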
  34. Data Import in AWS: OSM full planet
  35. Profiling OSM2PGSQL
     • Data sets used
     • Links/Ways/nodes of each set
     • Time
  36. Data import notes
     • Create the DB on SSD and clone to EBS:
       – Use case: quickly import the data but make it persistent.
       – Full planet volume takes 2-2.5 hours.
     • Create Provisioned EBS and clone to SSD:
       – Use case: need very fast runtime access.
       – Full planet volume takes 5.4 hours.
     • Can we get a summary of OSM primitives per dump and for the full planet as part of the PBF?
  37. Data Import in AWS: Lessons learned
     • It is not only the disk.
     • Risk on multiple levels:
       – Dev teams can’t possibly be testing to their full potential (in the data context).
       – Evident in outdated/incorrect documentation for bootstrapping.
  38. Rendering – mod_tile/Mapnik
     • Apache module + a Unix daemon.
     • The Apache module uses a process model; renderd is multithreaded.
     • The Apache module sends a command to renderd over a Unix socket.
     • The renderer fetches the data and writes it out.
     • Non-cached data will:
       – Fail on first attempt (return 404)
       – Pass on second attempt (~600 msec)
     • Cached data is served in < 10 msec.
     • Very SQL chatty.
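A client can absorb the "404 on the first attempt, success on the second" behaviour from slide 38 by retrying. The following is a minimal sketch, not part of mod_tile: the function names, the retry count, and the 0.6 s pause (taken loosely from the ~600 msec figure) are assumptions.

```python
import time
import urllib.error
import urllib.request


def http_fetch(url):
    """Return (status, body) for a URL, mapping HTTP errors to status codes."""
    try:
        with urllib.request.urlopen(url) as response:
            return response.status, response.read()
    except urllib.error.HTTPError as err:
        return err.code, b""


def fetch_tile_with_retry(fetch, url, attempts=3, delay_s=0.6):
    """Fetch a tile, retrying while the server answers 404 (tile not rendered yet)."""
    for attempt in range(attempts):
        status, body = fetch(url)
        if status == 200:
            return body
        if status != 404:
            # Anything other than "not rendered yet" is treated as a real error.
            raise RuntimeError(f"unexpected HTTP status {status} for {url}")
        if attempt + 1 < attempts:
            time.sleep(delay_s)  # give the renderer time to write the tile
    raise TimeoutError(f"tile still missing after {attempts} attempts: {url}")
```

Usage would look like `fetch_tile_with_retry(http_fetch, tile_url)`, where `tile_url` points at your own tile server. Passing the fetch function in keeps the retry logic testable without a network.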
  39. Renderd Threads Profiling
  40. Renderd Profiling
  41. Renderd Profiling
  42. Renderd Profiling
  43. Renderd Profiling
  44. Rendering – GeoServer/GWC
     • Single layer, zoom level (ZL) 15, RAM disk: 100 tiles/sec.
     • Truncation is very slow. Please version your published layers.
     • Standalone GWC offers a much better scalability model.
     • Possible race conditions in threads writing tiles.
     • Didn’t hit the getAlphaTile() issue.
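To put 100 tiles/sec in context, here is a small worked calculation. It assumes a standard XYZ (web mercator) grid with 4^z tiles at zoom level z and a full-world seed at that single zoom level; the slide does not say what extent was actually rendered, so treat the result as an upper bound for illustration only.

```python
def tiles_at_zoom(zoom):
    """Number of tiles covering the world at one zoom level on an XYZ grid."""
    return 4 ** zoom


def seed_days(zoom, tiles_per_sec):
    """Days needed to render every tile of one zoom level at a constant rate."""
    return tiles_at_zoom(zoom) / tiles_per_sec / 86400


print(tiles_at_zoom(15))          # prints 1073741824
print(round(seed_days(15, 100)))  # prints 124
```

Under these assumptions a single node would need roughly four months for zoom level 15 alone, which is why seeding is normally restricted to a region or spread across nodes.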
  45. GWC/GeoServer in AWS: Example deployment
  46. Cost?
     • Screenshot of my account activity
  47. Released artifacts: Snapshots of OSM data in flat PGSQL
     • 2 drives:
       – snap-f9affde6
       – snap-ffaffde0
     • To use:
       – Create a volume based on each snapshot.
       – Activate the array with mdadm (RAID 0, 2 drives).
       – pvscan, vgscan, vgchange, lvscan
       – Installing mdadm and rebooting should work on most machines to do this for you automagically.
       – Mount the volume on your PGDATA path.
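The restore steps on slide 47 can be written down as an ordered command plan. This sketch only builds the commands and does not run them. The AWS CLI `create-volume` call is an assumption about tooling (the slide names no tool), and the availability zone, logical volume device, and PGDATA path passed in are hypothetical placeholders. Attaching the new volumes to an instance is left out because the volume IDs are only known after creation.

```python
def restore_plan(snapshot_ids, availability_zone, lv_device, pgdata_path):
    """Build the ordered commands to restore the RAID 0 + LVM snapshot pair."""
    plan = []
    # 1. One EBS volume per snapshot (attach them to the instance afterwards).
    for snapshot_id in snapshot_ids:
        plan.append([
            "aws", "ec2", "create-volume",
            "--snapshot-id", snapshot_id,
            "--availability-zone", availability_zone,
        ])
    # 2. On the instance: assemble the RAID 0 array from the attached drives.
    plan.append(["mdadm", "--assemble", "--scan"])
    # 3. Rediscover and activate the LVM layers on top of the array.
    plan.append(["pvscan"])
    plan.append(["vgscan"])
    plan.append(["vgchange", "-ay"])
    plan.append(["lvscan"])
    # 4. Mount the logical volume where PostgreSQL expects its data.
    plan.append(["mount", lv_device, pgdata_path])
    return plan


if __name__ == "__main__":
    # Snapshot IDs are from the slide; the other three values are placeholders.
    for command in restore_plan(
        ["snap-f9affde6", "snap-ffaffde0"],
        "eu-west-1a",
        "/dev/vg0/pgdata",
        "/var/lib/postgresql",
    ):
        print(" ".join(command))
```

Keeping the plan as data makes it easy to review before running anything as root, and the real device name should be taken from the `lvscan` output rather than guessed.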
  48. Backlog
     • Geocoding testing with Twofish and GISGraphy
     • OSRM profiling
     • Suggestions?
  49. Many thanks to
     • Geofabrik for compiling all those sets/formats.
     • FOSS4G2013 for this opportunity.
     • And THANK YOU
