Beam Me Up: Voyaging into Big Data

More engineering organizations than ever are dealing with big data. The long times required to process big datasets slow down development cycles and delay analysis. Apache Beam pipelines distribute processing across many workers, reducing the time it takes to transform large datasets. Creating an effective Beam pipeline requires following best practices and using the specialized data structures Beam introduces. In this talk, I’ll share strategies and lessons learned from scaling Apache Beam pipelines to handle ever-increasing workloads.
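The idea of spreading an element-wise transform across workers can be sketched in plain Python. This is only a model of the concept, not Beam code: the `normalize` function, the sample records, and the thread pool standing in for Beam workers are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def normalize(record):
    """Element-wise transform: the result depends only on this one record."""
    return record.strip().lower()


def run_sequential(records):
    # One worker handles every element in turn.
    return [normalize(r) for r in records]


def run_distributed(records, num_workers=4):
    # Several workers each handle elements independently.
    # Executor.map returns results in input order.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        return list(pool.map(normalize, records))


# Hypothetical sample data.
records = ["  Alpha", "BETA ", " Gamma "]
sequential = run_sequential(records)
distributed = run_distributed(records)
```

Because `normalize` never looks at any other element, both paths produce the same result. That independence is what makes this kind of work easy to split across many workers.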


  1. Beam Me Up: Voyaging Into Big Data. Michele Titolo, Senior Software Engineer, Square. @micheletitolo
  2. Big Data Is All Around Us
  3. Apache Beam Is One Of The Available Tools
  4. (no text)
  5. Benefits At Square
  6. 24x Faster, 10x Uploads, <$100 Backfill
  7. What We Will Cover: What is Beam, How to build a pipeline, Tips and Gotchas
  8. What Is Beam
  9. Abstraction
  10. Built For Parallelism
  11. [diagram: Time, 1 2 3]
  12. [diagram: Time, 1 2 3]
  13. Highly Scalable
  14. Sits On Top Of Or Adjacent To Other Tools
  15. Portable
  16. BIG Not Just for Data
  17. Building Beam Pipelines
  18. Runners, Pipeline Code
  19. Runners, Pipeline Code
  20. Executor. Don’t Build Yourself
  21. (no text)
  22. (no text)
  23. https://beam.apache.org/documentation/runners/capability-matrix/
  24. [diagram: 1 2 3]
  25. [diagram: 1 2 3 0 0> 1]
  26. [diagram: 1 2 3 > 1]
  27. [diagram: 1 2 3 > 1]
  28. Deployment https://beam.apache.org/documentation/runners/capability-matrix/
  29. Runners, Pipeline Code
  30. Pipelines Are Defined Solely In Code
  31. Java, Python, Go
  32. No Explicit Dependency Graph
  33. Create A Pipeline Object
  34. Initial Data https://beam.apache.org/documentation/io/built-in/
  35. Use A Small Dataset To Test
  36. Run And Test Locally
  37. Collections and Transformations
  38. PCollection, DoFn & PTransform
  39. PCollections Are Kind Of Like Arrays
  40. Must Be Uniform
  41. Transformations Applied To Entire PCollection
  42. Contents
  43. Inputs And Outputs Must Serialize To Disk
  44. KV: one key hash
  45. Composite Objects
  46. GroupByKey [diagram]
  47. CoGroupByKey [diagram: A, B]
  48. DoFn & PTransform
  49. Most Of The Code Is In These
  50. DoFn
  51. Process PCollection 1 Element at a Time
  52. PTransform
  53. Single Input And Output Type
  54. Side Inputs
  55. Built In Transformations
  56. Flatten, Combine, Partition
  57. Statistics: Count, Mean, Max Etc
  58. Metrics
  59. Outputs
  60. [diagram: Pipeline of PCollections and DoFns, 1 2 3] https://beam.apache.org/get-started/wordcount-example/
  61. Tips And Gotchas
  62. Input
  63. [diagram: Input, worker 1, worker 2]
  64. [diagram: Input, many workers]
  65. [diagram: Input, worker]
  66. Keep Transformations Small And Simple
  67. [diagram: A B C3 B C2 Time A B C1 A]
  68. [diagram: 3 B C2 Time A B C1 A RESHUFFLE D E F]
  69. Smaller -> Resilient
  70. Input
  71. Input
  72. Something Will Probably Go Wrong
  73. [diagram: 1 2 3]
  74. [diagram: 1 2 3]
  75. [diagram: 1 2]
  76. No Dead Letter Queue
  77. [diagram]
  78. [diagram: Partition]
  79. Idempotency
  80. Intermediate State Goes Away After Finish
  81. (no text)
  82. API Rate Limits
  83. Multiple Of The Same Pipeline Can Be Running
  84. In Summary
  85. Beam Is A General Purpose Tool
  86. Adaptable To Many Scenarios
  87. Easy To Get Started
  88. Significantly Improved Some ETLs
  89. Questions?
  90. Photo Credits: https://unsplash.com/photos/MShiKyjGhck, https://unsplash.com/photos/DByY8MbE9OE, https://unsplash.com/photos/fR47SivxkSM, https://unsplash.com/photos/m3TYLFI_mDo
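
The slides name KV pairs, GroupByKey, CoGroupByKey, and "No Dead Letter Queue" without showing code. The plain-Python sketch below models those ideas. It is not Beam's API and not necessarily what the talk demonstrated: the helper names (`group_by_key`, `co_group_by_key`, `parse_amounts`) and the sample data are hypothetical. Routing failed elements to a separate output is one common workaround, and treating it as the slide's point is an assumption.

```python
from collections import defaultdict


def group_by_key(pairs):
    """Model of GroupByKey: (key, value) pairs -> {key: [values]}."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def co_group_by_key(**tagged_pairs):
    """Model of CoGroupByKey: join several keyed collections by key.

    Each key maps to a dict with one list of values per input tag.
    A tag with no values for that key gets an empty list.
    """
    tags = list(tagged_pairs)
    result = {}
    for tag, pairs in tagged_pairs.items():
        for key, value in pairs:
            per_key = result.setdefault(key, {t: [] for t in tags})
            per_key[tag].append(value)
    return result


def parse_amounts(raw_pairs):
    """Parse values one element at a time, keeping failures instead of crashing.

    Returns (good, dead_letters); dead_letters is a hand-rolled stand-in
    for a dead letter queue.
    """
    good, dead_letters = [], []
    for key, raw in raw_pairs:
        try:
            good.append((key, int(raw)))
        except ValueError as err:
            dead_letters.append((key, raw, str(err)))
    return good, dead_letters


# Hypothetical sample data.
raw = [("ann", "5"), ("bob", "oops"), ("ann", "7")]
emails = [("ann", "ann@example.com"), ("bob", "bob@example.com")]

payments, dead_letters = parse_amounts(raw)
grouped = group_by_key(payments)
joined = co_group_by_key(payments=payments, emails=emails)
```

This model keeps values in input order. A distributed runner generally does not guarantee the order of values within a group, so real pipeline code should not depend on it.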
