Da
Michael Hausenblas, MapR Technologies
Berlin Buzzwords 2013, Open Stage Talk
Friday, 7 June 13
Nope. Not this one.
things you can influence
things that affect you
try and focus on this stuff
The awkward moment when I open the data I got from a customer
http://techcrunch.com/2012/11/25/the-big-data-fallacy-data-%E2%89%A0-information-%E2%89%A0-insights/
aka crap in, crap out
Some examples …
• Encöding hell
• Schema? Sure, I fax you a screenshot
• Dupes and other fakes
• Sampling
Encöding hell
application-specific encodings
• URL encoding
• HTML encoding
• Database escaping
non-ASCII?
a%20percent-encoded%20string%20as%20of%20RFC%203986
a <strong>HTML</strong> encoded string
• Use Unicode
• Use Unicode
• Use Unicode
Encöding hell
http://www.swedishfika.com/2010/01/19/escaping-from-encoding-hell/
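A minimal sketch of undoing application-specific encodings so that everything downstream sees plain Unicode. It uses only the Python standard library; the helper names and the sample strings are made up for illustration and are not from the talk.

```python
import html
from urllib.parse import unquote


def decode_bytes(raw: bytes, encoding: str = "utf-8") -> str:
    """Turn raw bytes into a Unicode string, failing loudly on bad input."""
    # errors="strict" is the default: a wrong encoding raises UnicodeDecodeError
    # instead of silently producing garbage.
    return raw.decode(encoding)


def undo_url_encoding(text: str) -> str:
    """Reverse percent-encoding (RFC 3986); bytes are read as UTF-8."""
    return unquote(text)


def undo_html_encoding(text: str) -> str:
    """Replace HTML character references such as &ouml; with the real character."""
    return html.unescape(text)


url_sample = "a%20percent-encoded%20string%20as%20of%20RFC%203986"
print(undo_url_encoding(url_sample))
print(undo_html_encoding("Enc&ouml;ding hell"))
print(decode_bytes("Encöding hell".encode("utf-8")))
```

Decoding once, at the boundary where data enters the pipeline, keeps the rest of the code free of encoding special cases.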
• Encöding hell
• Schema? Sure, I fax you a screenshot
• Dupes and other fakes
• Sampling
Friday, 7 June 13
Schema? Sure, I fax you a screenshot
• There is a need for proper, formal documentation
• For humans and machines
• Basis for validation: automate!
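One way a machine-readable schema can drive automated validation, sketched with the standard library only. The field names and the `SCHEMA` layout are hypothetical; a real project would more likely use an established schema language than this hand-rolled check.

```python
# Hypothetical schema: field name -> (expected type, required?)
SCHEMA = {
    "customer_id": (int, True),
    "email": (str, True),
    "signup_date": (str, False),
}


def validate(record: dict, schema: dict = SCHEMA) -> list:
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    for field, (expected_type, required) in schema.items():
        if field not in record:
            if required:
                errors.append(f"missing required field: {field}")
            continue
        value = record[field]
        if not isinstance(value, expected_type):
            errors.append(
                f"{field}: expected {expected_type.__name__}, "
                f"got {type(value).__name__}"
            )
    # Fields the schema does not know about are reported as well.
    for field in sorted(record.keys() - schema.keys()):
        errors.append(f"unexpected field: {field}")
    return errors


print(validate({"customer_id": 1, "email": "a@example.org"}))
print(validate({"customer_id": "1", "fax": "screenshot.png"}))
```

Because the same `SCHEMA` object can be printed for humans and executed by the check, documentation and validation cannot drift apart.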
• Encöding hell
• Schema? Sure, I fax you a screenshot
• Dupes and other fakes
• Sampling
Friday, 7 June 13
Dupes and other fakes
• Use plots to get an overview
• Watch out for outliers
• Try to establish source for errors and fix
• Document (in any case)
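Before plotting, a quick numeric pass can already surface exact duplicates and suspicious values. The sketch below counts repeated rows and flags outliers with the common 1.5 × IQR rule; that rule is an assumption chosen for illustration, not something the talk prescribes, and the sample data is invented.

```python
import statistics
from collections import Counter


def find_duplicates(rows):
    """Map each row that occurs more than once to its number of occurrences."""
    counts = Counter(rows)
    return {row: n for row, n in counts.items() if n > 1}


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]


# Invented sample data: rows must be hashable, so tuples are used.
rows = [("alice", 30), ("bob", 25), ("alice", 30)]
ages = [10, 11, 12, 13, 14, 15, 500]

print(find_duplicates(rows))
print(iqr_outliers(ages))
```

Flagged values are candidates for a closer look, not automatic deletions; tracing them back to their source is still a manual step.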
• Encöding hell
• Schema? Sure, I fax you a screenshot
• Dupes and other fakes
• Sampling
Friday, 7 June 13
• My data is too big. I can’t check it all.
• Why don’t you sample, then?
Sampling
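When the data is too big to check in full, a uniform random sample can be drawn in a single pass without knowing the total size up front. The sketch below uses reservoir sampling (Algorithm R); the talk only says "sample", so the choice of technique and the function name are assumptions.

```python
import random


def reservoir_sample(stream, k, seed=None):
    """Pick k items uniformly at random from an iterable of unknown length."""
    rng = random.Random(seed)
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            sample.append(item)
        else:
            # Item i replaces a reservoir slot with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                sample[j] = item
    return sample


# Usage: sample 10 records from a stream of a million without loading it all.
picked = reservoir_sample(range(1_000_000), 10, seed=7)
print(len(picked))
```

Memory use stays at k items regardless of input size, which is what makes this workable for data that does not fit on one machine's disk or in RAM.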
http://mortardata.com/
Go and buy this book. Now.
Data Breaking Bad at Berlin Buzzwords

Talk given by Michael Hausenblas - Chief Data Engineer EMEA at MapR Technologies. Berlin Buzzwords 2013, Open Stage Talk

Published in: Technology
Transcript of "Data Breaking Bad at Berlin Buzzwords"

  1. Da Michael Hausenblas, MapR Technologies. Berlin Buzzwords 2013, Open Stage Talk. Friday, 7 June 13
  2. Nope. Not this one.
  3. (no text)
  4. things you can influence; things that affect you; try and focus on this stuff
  5. The awkward moment when I open the data I got from a customer
  6. http://techcrunch.com/2012/11/25/the-big-data-fallacy-data-%E2%89%A0-information-%E2%89%A0-insights/ aka crap in, crap out
  7. Some examples …
  8. • Encöding hell • Schema? Sure, I fax you a screenshot • Dupes and other fakes • Sampling
  9. Encöding hell: application-specific encodings • URL encoding • HTML encoding • Database escaping. non-ASCII? a%20percent-encoded%20string%20as%20of%20RFC%203986 a <strong>HTML</strong> encoded string
  10. Encöding hell • Use Unicode • Use Unicode • Use Unicode. http://www.swedishfika.com/2010/01/19/escaping-from-encoding-hell/
  11. • Encöding hell • Schema? Sure, I fax you a screenshot • Dupes and other fakes • Sampling
  12. Schema? Sure, I fax you a screenshot
  13. Schema? Sure, I fax you a screenshot • There is a need for proper, formal documentation • For humans and machines • Basis for validation: automate!
  14. • Encöding hell • Schema? Sure, I fax you a screenshot • Dupes and other fakes • Sampling
  15. Dupes and other fakes
  16. Dupes and other fakes
  17. Dupes and other fakes • Use plots to get an overview • Watch out for outliers • Try to establish source for errors and fix • Document (in any case)
  18. • Encöding hell • Schema? Sure, I fax you a screenshot • Dupes and other fakes • Sampling
  19. Sampling • My data is too big. I can’t check it all. • Why don’t you sample, then?
  20. http://mortardata.com/
  21. (no text)
  22. Go and buy this book. Now.