Data Breaking Bad at Berlin Buzzwords

Talk given by Michael Hausenblas - Chief Data Engineer EMEA at MapR Technologies. Berlin Buzzwords 2013, Open Stage Talk

Transcript

  • 1. Data Breaking Bad. Michael Hausenblas, MapR Technologies. Berlin Buzzwords 2013, Open Stage Talk. Friday, 7 June 13
  • 2. Nope. Not this one.
  • 4. Things you can influence / things that affect you. Try to focus on the things you can influence.
  • 5. The awkward moment when I open the data I got from a customer
  • 6. Data ≠ information ≠ insights, aka crap in, crap out: http://techcrunch.com/2012/11/25/the-big-data-fallacy-data-%E2%89%A0-information-%E2%89%A0-insights/
  • 7. Some examples …
  • 8. • Encöding hell • Schema? Sure, I fax you a screenshot • Dupes and other fakes • Sampling Friday, 7 June 13
  • 9. Encöding hell application-specific encodings • URL encoding • HTML encoding • Database escaping non-ASCII? a%20percent-encoded%20string%20as%20of%20RFC%203986 a <strong>HTML</strong> encoded string Friday, 7 June 13
  • 10. Encöding hell: • Use Unicode • Use Unicode • Use Unicode (see http://www.swedishfika.com/2010/01/19/escaping-from-encoding-hell/)
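One way to read "Use Unicode" in code: decode bytes to Unicode text once, where data enters the system, then undo application-level encodings and normalise. A sketch under stated assumptions: the source byte encoding is known (UTF-8 by default), and the field really is both URL- and HTML-encoded. Applying a decoder a field was never encoded with can corrupt legitimate text (a literal "%20" would become a space), so in practice the steps should follow what is documented for each source. The helper name `to_clean_unicode` is hypothetical.

```python
import html
import unicodedata
from urllib.parse import unquote


def to_clean_unicode(raw: bytes, encoding: str = "utf-8") -> str:
    """Hypothetical helper: bytes in, normalised Unicode text out.

    Assumes the producer's byte encoding is known and that the field is
    both percent-encoded and HTML-encoded; drop steps that do not apply.
    """
    text = raw.decode(encoding)                # 1. bytes -> str, once, at the boundary
    text = unquote(text)                       # 2. undo URL (percent-) encoding
    text = html.unescape(text)                 # 3. undo entities such as &ouml; or &#246;
    return unicodedata.normalize("NFC", text)  # 4. one canonical form per character


print(to_clean_unicode(b"a%20percent-encoded%20string%20as%20of%20RFC%203986"))
print(to_clean_unicode(b"Enc&ouml;ding hell"))
# The same word saved as Latin-1 by another system, decoded with the right codec:
print(to_clean_unicode("Encöding".encode("latin-1"), encoding="latin-1"))
# What happens with the wrong codec: UTF-8 bytes read as Latin-1.
print("Encöding".encode("utf-8").decode("latin-1"))  # EncÃ¶ding
```

The NFC step matters because "ö" can arrive either as one code point or as "o" plus a combining diaeresis; without normalisation the two compare unequal.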
  • 11. • Encöding hell • Schema? Sure, I fax you a screenshot • Dupes and other fakes • Sampling Friday, 7 June 13
  • 12. Schema? Sure, I fax you a screenshot
  • 13. Schema? Sure, I fax you a screenshot • There is a need for proper, formal documentation • For humans and machines • Basis for validation: automate!
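A minimal illustration of "for humans and machines": a single schema definition that drives both automated validation and generated plain-text documentation. The field names and rules (`customer_id`, `email`, `country`) are hypothetical; in practice a standard schema language such as JSON Schema or Avro would usually take the place of this hand-rolled version.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional


@dataclass(frozen=True)
class Field:
    """One field of a machine-readable schema that also carries human docs."""
    name: str
    type_: type
    doc: str
    required: bool = True
    check: Optional[Callable[[Any], bool]] = None  # extra value-level rule


# Hypothetical schema for a customer export; names and rules are illustrative.
CUSTOMER_SCHEMA = [
    Field("customer_id", int, "unique positive integer id", check=lambda v: v > 0),
    Field("email", str, "contact address", check=lambda v: "@" in v),
    Field("country", str, "ISO 3166-1 alpha-2 code", required=False,
          check=lambda v: len(v) == 2 and v.isupper()),
]


def validate(record: Dict[str, Any], schema: List[Field]) -> List[str]:
    """For machines: return a list of problems (an empty list means valid)."""
    problems = []
    for f in schema:
        if f.name not in record:
            if f.required:
                problems.append(f"missing required field '{f.name}'")
            continue
        value = record[f.name]
        if not isinstance(value, f.type_):
            problems.append(
                f"'{f.name}': expected {f.type_.__name__}, got {type(value).__name__}")
        elif f.check is not None and not f.check(value):
            problems.append(f"'{f.name}': value {value!r} fails check")
    # Fields the schema does not know about are reported, not silently kept.
    known = {f.name for f in schema}
    for extra in sorted(set(record) - known):
        problems.append(f"unexpected field '{extra}'")
    return problems


def describe(schema: List[Field]) -> str:
    """For humans: render the same schema as plain-text documentation."""
    lines = []
    for f in schema:
        optional = "" if f.required else ", optional"
        lines.append(f"{f.name} ({f.type_.__name__}{optional}): {f.doc}")
    return "\n".join(lines)


print(describe(CUSTOMER_SCHEMA))
print(validate({"customer_id": "42", "email": "nope", "fax": "screenshot"}, CUSTOMER_SCHEMA))
```

Keeping docs and checks in one place means the documentation cannot drift from what is actually validated, which is the failure mode of a faxed screenshot.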
  • 14. • Encöding hell • Schema? Sure, I fax you a screenshot • Dupes and other fakes • Sampling Friday, 7 June 13
  • 15. Dupes and other fakes
  • 16. Dupes and other fakes
  • 17. Dupes and other fakes • Use plots to get an overview • Watch out for outliers • Try to establish the source of errors and fix it • Document (in any case)
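The slide recommends plots; simple numeric checks can complement them when a dataset is too large to eyeball. A sketch of two such checks, neither taken from the talk: grouping rows by a normalised key to surface likely duplicates, and flagging outliers with the modified z-score based on the median absolute deviation (MAD), using the common 0.6745 factor and 3.5 cutoff. Field names and sample rows are hypothetical. Flagged rows are candidates to investigate and document, not to delete automatically.

```python
import statistics
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def find_duplicates(rows: List[Dict[str, str]],
                    key_fields: Sequence[str]) -> Dict[Tuple[str, ...], List[int]]:
    """Group row indices by a normalised key; groups of 2+ rows are likely dupes."""
    groups: Dict[Tuple[str, ...], List[int]] = defaultdict(list)
    for i, row in enumerate(rows):
        # Collapse whitespace and case so "jane  doe " matches "Jane Doe".
        key = tuple(" ".join(row[f].split()).casefold() for f in key_fields)
        groups[key].append(i)
    return {k: idx for k, idx in groups.items() if len(idx) > 1}


def mad_outliers(values: Sequence[float], threshold: float = 3.5) -> List[int]:
    """Indices whose modified z-score, 0.6745 * |x - median| / MAD, exceeds threshold."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:  # most values are identical; the score is undefined
        return []
    return [i for i, v in enumerate(values) if 0.6745 * abs(v - med) / mad > threshold]


rows = [
    {"name": "Jane Doe", "email": "jane@example.com"},
    {"name": "jane  doe ", "email": "JANE@example.com"},
    {"name": "John Roe", "email": "john@example.com"},
]
print(find_duplicates(rows, ("name", "email")))  # {('jane doe', 'jane@example.com'): [0, 1]}
print(mad_outliers([10, 11, 9, 10, 12, 10, 950]))  # [6]
```

The median-based score is used instead of a mean/standard-deviation z-score because a single extreme value (like 950 here) inflates the standard deviation and can hide itself.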
  • 18. • Encöding hell • Schema? Sure, I fax you a screenshot • Dupes and other fakes • Sampling Friday, 7 June 13
  • 19. Sampling • "My data is too big. I can't check it all." • "Why don't you sample, then?"
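One common answer to "why don't you sample, then?" when the data does not fit in memory, or its size is unknown up front, is reservoir sampling (Algorithm R): a single pass keeps a uniform random sample of k items. The talk does not prescribe a method; this is a standard-library sketch, and the `seed` parameter is an assumption added so that a sample can be reproduced and documented.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int, seed: Optional[int] = None) -> List[T]:
    """Uniform random sample of k items from a stream of unknown length (Algorithm R)."""
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item i (0-based) is kept with probability k / (i + 1),
            # replacing a uniformly chosen slot.
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


print(reservoir_sample(range(100_000), 5, seed=1))
```

Because it accepts any iterable, an open file object can be passed directly to sample lines without loading the whole file; memory use stays proportional to k.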
  • 20. http://mortardata.com/
  • 22. Go and buy this book. Now.