Big Data Little Tests (John Heintz)

  1. Big Data Little Tests • John Heintz, Founder, Gist Labs • Technical Consultant, Cutter Consortium • john@gistlabs.com • @jheintz • http://gistlabs.com
  2. About John Heintz • Developer since 1995 • Agilist since 1999 • Founded Gist Labs in 2008 • Developer, Mentor, Consultant • Intuitive, Abstract, Precise • Kool-Aid I've drunk: Agile/Lean/Kanban, OO, TDD, REST, Mentoring, Craftsmanship, Emergent/Progressive Design, InnovationGames®, Systems and Complexity Theory (© 2012 Gist Labs, LLC)
  3. My Goals for You • Demystify test automation for Big Data • Provide executable examples
  4. What you shouldn't expect… • Only a bare-bones introduction to Big Data concepts • No performance tuning
  5. Simple Code, Config • I kept things as simple and clear as possible • Java, JUnit4 • Maven… okay, maybe not simple :-)
  6. Mostly Code • Remember the Law of Two Feet • If code isn't what you were looking for, I completely respect you finding a better use of your time :)
  7. • Everything is available from http://gistlabs.com/2012/08/big-data-little-tests/ • The entire command script is posted there, so you can take notes knowing you'll have it later
  8. My Soapboxes… These are topics I'll repeat myself on • Fast test execution • One-click build
  9. Big Data • Too much • Too fast • Not trivially structured
  10. Map Reduce • Map: turn each input record into zero or more intermediate key/value pairs • Reduce: combine all values that share a key into an output • Both phases can run in parallel • Crude, but massive
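The map/reduce model on this slide can be sketched in plain Java without Hadoop. This is a minimal single-process word count, not Hadoop's `Mapper`/`Reducer` API; the class and method names are illustrative only. The grouping step between the two phases stands in for Hadoop's shuffle and sort.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// A toy, single-process model of MapReduce (not Hadoop's API):
// map emits (word, 1) pairs, a "shuffle" groups values by key,
// and reduce sums each key's values.
class WordCountModel {

    // Map phase: one input line -> zero or more (key, value) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                out.add(Map.entry(word, 1));
            }
        }
        return out;
    }

    // Reduce phase: all values for one key -> a single output value.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Driver: map every line, group by key (the shuffle), then reduce each group.
    static Map<String, Integer> run(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> pair : map(line)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("big data", "little tests", "big tests")));
    }
}
```

Running `main` prints `{big=2, data=1, little=1, tests=2}`. Because `map` and `reduce` are plain functions, each can be checked with tiny hand-made inputs long before a cluster is involved.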
  11. CAP Theorem (a distributed system cannot guarantee all three at once) • Consistency • Availability • Partition Tolerance
  12. Big Data Ecosystem • Hadoop: a giant among giants (tons of projects on this platform!) • Cassandra: feels like a weird RDBMS • Riak: an elegant key/value/search store • MongoDB: document store
  13. Let's Run Some Code
  14. Hadoop Tests
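One common way to keep Hadoop tests fast is to pull the record-handling logic out of the framework class and test it as a plain function. The sketch below assumes a hypothetical access-log format and a hypothetical `StatusCodeMapperLogic` class; it is not taken from the talk's code and uses no Hadoop types, so it runs in milliseconds without a cluster.

```java
import java.util.Optional;

// Hypothetical mapper logic kept free of Hadoop types so it can be
// unit-tested in milliseconds; a real Hadoop Mapper would delegate to it.
class StatusCodeMapperLogic {

    // Parses a hypothetical access-log line "date method path status" and
    // returns the HTTP status as the map output key, or empty if malformed.
    static Optional<String> statusKey(String line) {
        String[] fields = line.trim().split("\\s+");
        if (fields.length != 4 || !fields[3].matches("\\d{3}")) {
            return Optional.empty(); // skip bad records instead of failing the job
        }
        return Optional.of(fields[3]);
    }

    // A "little test": small, hand-crafted records covering the edge cases.
    public static void main(String[] args) {
        check(statusKey("2012-08-01 GET /index.html 200").equals(Optional.of("200")), "well-formed line");
        check(statusKey("2012-08-01 GET /missing 404").equals(Optional.of("404")), "not-found line");
        check(!statusKey("garbage").isPresent(), "malformed line is skipped");
        check(!statusKey("2012-08-01 GET /x abc").isPresent(), "non-numeric status is skipped");
        System.out.println("all mapper checks passed");
    }

    private static void check(boolean condition, String name) {
        if (!condition) {
            throw new AssertionError("failed: " + name);
        }
    }
}
```

A real `Mapper` would call `statusKey` and write the result to its context; framework-level wiring still deserves a separate, slower integration test.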
  15. Riak Tests
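For Riak (or any remote store), one way to get fast tests is to write application code against a small interface and substitute an in-memory fake in unit tests. The `KeyValueStore` interface, `InMemoryKeyValueStore` and `VisitCounter` below are hypothetical and are not part of any Riak client library.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical storage abstraction: production code would implement this
// with a real Riak client; unit tests use the in-memory fake below.
interface KeyValueStore {
    void put(String bucket, String key, String value);
    Optional<String> get(String bucket, String key);
}

// In-memory fake: no network, no cluster, so tests run in milliseconds.
// Assumes bucket names contain no '/', since it is used to build map keys.
class InMemoryKeyValueStore implements KeyValueStore {
    private final Map<String, String> data = new HashMap<>();

    private static String id(String bucket, String key) {
        return bucket + "/" + key;
    }

    @Override
    public void put(String bucket, String key, String value) {
        data.put(id(bucket, key), value);
    }

    @Override
    public Optional<String> get(String bucket, String key) {
        return Optional.ofNullable(data.get(id(bucket, key)));
    }
}

// Hypothetical application code under test, written against the interface.
class VisitCounter {
    private final KeyValueStore store;

    VisitCounter(KeyValueStore store) {
        this.store = store;
    }

    // Reads the current count (0 if absent), increments it, and stores it back.
    int recordVisit(String page) {
        int next = store.get("visits", page).map(Integer::parseInt).orElse(0) + 1;
        store.put("visits", page, Integer.toString(next));
        return next;
    }
}
```

A fake cannot reproduce Riak-specific behavior such as eventual consistency or conflicting sibling values, so the same code should still be exercised against a real node in a slower integration suite.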
  16. Other Frameworks • CassandraUnit: https://github.com/jsevellec/cassandra-unit • PigUnit, for unit-testing Apache Pig scripts that run on Hadoop: http://pig.apache.org/docs/r0.8.1/pigunit.html
  17. Code Questions? • Fast test execution? • One-click build?
  18. What about Big Tests? • Real test data • Realistic cluster
  19. Real Test Data • My favorite strategy is to: • Develop with small, crafted data • Build/test the same way • Run another test on top of real prod data
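The strategy on this slide can be sketched as one test body fed by two data sources: a small crafted fixture by default, and a production extract when one is supplied. The `test.data` system property, the comma-separated record format and the validity rule are all assumptions made for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

// Sketch: one test body, two data sources. By default it checks a small
// hand-crafted fixture; setting the hypothetical "test.data" system property
// to a production extract reruns the same checks on real data.
class RecordSanityTest {

    // Hypothetical record format: user,date,count
    static final List<String> CRAFTED = List.of(
            "user1,2012-08-01,42",
            "user2,2012-08-02,0",
            "user3,2012-08-03,7");

    static List<String> loadRecords() throws IOException {
        String path = System.getProperty("test.data");
        return path == null ? CRAFTED : Files.readAllLines(Paths.get(path));
    }

    // The invariant checked on every record: three fields and a non-negative count.
    static boolean isValid(String record) {
        String[] f = record.split(",");
        if (f.length != 3) {
            return false;
        }
        try {
            return Integer.parseInt(f[2].trim()) >= 0;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        List<String> records = loadRecords();
        long bad = records.stream().filter(r -> !isValid(r)).count();
        System.out.println(records.size() + " records checked, " + bad + " invalid");
        if (bad > 0) {
            throw new AssertionError(bad + " invalid records");
        }
    }
}
```

Run it as-is for the fast crafted-data check, or with something like `java -Dtest.data=/path/to/prod-extract.csv RecordSanityTest` (a placeholder path) to apply the same invariant to real data without changing the build.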
  20. [Diagram: development-to-production environments. Labels: Developers, Developer Sandboxes, Version Control, Continuous Integration Servers, Continuous Deployment Servers, Build, Test1, Test2 and Staging clusters, Production. Underlying infrastructure: Virtual vs Physical Servers, Private vs Public Cloud, Self-service Provisioning, Network Infrastructure, Storage Infrastructure.]
  21. Realistic Cluster • Use a CI/DevOps environment • Virtualize, "X as a Service" • Virtual Machines • Virtual Infrastructure (Network, Storage)
  22. Jenkins CI Server • Master/slave clusters • Plugins for Hadoop and VMware • http://jenkins-ci.org/
  23. Big Questions?
  24. Thank you! • Everything available from: http://gistlabs.com/2012/08/big-data-little-tests/ • John Heintz, @jheintz, http://gistlabs.com
