Data Breaking Bad at Berlin Buzzwords

Talk given by Michael Hausenblas - Chief Data Engineer EMEA at MapR Technologies. Berlin Buzzwords 2013, Open Stage Talk

Transcript

  • 1. Data Breaking Bad. Michael Hausenblas, MapR Technologies. Berlin Buzzwords 2013, Open Stage Talk. Friday, 7 June 13
  • 2. Nope. Not this one.
  • 4. things you can influence; things that affect you; try and focus on this stuff
  • 5. The awkward moment when I open the data I got from a customer
  • 6. http://techcrunch.com/2012/11/25/the-big-data-fallacy-data-%E2%89%A0-information-%E2%89%A0-insights/ aka crap in, crap out
  • 7. Some examples …
  • 8.
      • Encöding hell
      • Schema? Sure, I fax you a screenshot
      • Dupes and other fakes
      • Sampling
  • 9. Encöding hell
      • application-specific encodings
          • URL encoding
          • HTML encoding
          • Database escaping
      • non-ASCII?
      • a%20percent-encoded%20string%20as%20of%20RFC%203986
      • a <strong>HTML</strong> encoded string
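The two application-specific encodings named on slide 9 can be tried out with Python's standard library. This is a minimal sketch, not something shown in the talk: the variable names and the `decode_layers` helper are illustrative, and the order in which the layers are undone is an assumption about how the data was wrapped.

```python
import html
from urllib.parse import quote, unquote

# Strings taken from slide 9; the variable names are illustrative only.
plain = "a percent-encoded string as of RFC 3986"
markup = "a <strong>HTML</strong> encoded string"

url_encoded = quote(plain)          # percent-encode for use in a URL
html_encoded = html.escape(markup)  # escape &, <, > and quotes for HTML


def decode_layers(value: str) -> str:
    """Undo URL encoding first, then HTML escaping (assumed layering order)."""
    return html.unescape(unquote(value))


print(url_encoded)
print(html_encoded)
```

Data that has passed through several systems often carries more than one of these layers, so it helps to record which layers were applied and in what order before decoding.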
  • 10. Encöding hell
      • Use Unicode
      • Use Unicode
      • Use Unicode
      • http://www.swedishfika.com/2010/01/19/escaping-from-encoding-hell/
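One way to read the "Use Unicode" advice on slide 10 is to decode bytes exactly once, at the boundary, with the correct codec. The sketch below assumes UTF-8 input and uses a hypothetical helper, `read_text`; it also shows what the familiar garbling looks like when UTF-8 bytes are misread as Latin-1.

```python
import unicodedata


def read_text(raw: bytes, encoding: str = "utf-8") -> str:
    """Decode bytes once and normalize to NFC so equal-looking text compares equal."""
    return unicodedata.normalize("NFC", raw.decode(encoding))


word = "Enc\u00f6ding"  # "Encöding" with a precomposed o-umlaut
utf8_bytes = word.encode("utf-8")

# Decoding with the wrong codec does not fail, it silently garbles the text.
garbled = utf8_bytes.decode("latin-1")

# The damage is reversible only while the wrong codec is known and lossless.
repaired = garbled.encode("latin-1").decode("utf-8")

print(garbled, "->", repaired)
```

Normalizing to NFC is a design choice for this sketch: it makes a decomposed "o" plus combining diaeresis compare equal to the precomposed character.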
  • 11.
      • Encöding hell
      • Schema? Sure, I fax you a screenshot
      • Dupes and other fakes
      • Sampling
  • 12. Schema? Sure, I fax you a screenshot
  • 13. Schema? Sure, I fax you a screenshot
      • There is a need for proper, formal documentation
      • For humans and machines
      • Basis for validation: automate!
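Slide 13 asks for a schema that both humans and machines can read and that serves as a basis for automated validation. A minimal sketch of that idea follows; the `SCHEMA` fields and the `validate` function are hypothetical and stand in for whatever formal schema language a project actually uses.

```python
from typing import Any

# Hypothetical schema: field name -> expected Python type.
SCHEMA = {"user_id": int, "email": str, "signup_date": str}


def validate(record: dict[str, Any], schema: dict[str, type]) -> list[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    for field, expected in schema.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif not isinstance(record[field], expected):
            actual = type(record[field]).__name__
            errors.append(f"{field}: expected {expected.__name__}, got {actual}")
    # Fields the schema does not know about are reported too.
    for field in sorted(record.keys() - schema.keys()):
        errors.append(f"unexpected field: {field}")
    return errors


print(validate({"user_id": "1", "email": "a@example.org"}, SCHEMA))
```

Because the schema is plain data, the same dictionary can be rendered as documentation and run against every incoming batch.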
  • 14.
      • Encöding hell
      • Schema? Sure, I fax you a screenshot
      • Dupes and other fakes
      • Sampling
  • 15. Dupes and other fakes
  • 16. Dupes and other fakes
  • 17. Dupes and other fakes
      • Use plots to get an overview
      • Watch out for outliers
      • Try to establish source for errors and fix
      • Document (in any case)
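Slide 17 recommends plots for spotting duplicates and outliers. As a complement that needs no plotting library, the sketch below counts exact duplicates and flags values outside the interquartile-range fences. The IQR rule and the function names are choices made for this example, not something the slides prescribe.

```python
import statistics
from collections import Counter


def find_duplicates(rows):
    """Map each row that occurs more than once to its number of occurrences."""
    counts = Counter(rows)
    return {row: n for row, n in counts.items() if n > 1}


def iqr_outliers(values, k=1.5):
    """Return values lying more than k * IQR below Q1 or above Q3."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]


print(find_duplicates(["a", "b", "a", "c", "b", "a"]))
print(iqr_outliers([10, 11, 12, 11, 10, 12, 11, 500]))
```

Rows must be hashable (for example tuples or strings) for the duplicate count to work, and flagged values still need a human to decide whether they are errors or genuine extremes.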
  • 18.
      • Encöding hell
      • Schema? Sure, I fax you a screenshot
      • Dupes and other fakes
      • Sampling
  • 19. Sampling
      • My data is too big. I can’t check it all.
      • Why don’t you sample, then?
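Slide 19 suggests sampling when the data is too big to check in full. The slides do not say how to sample; one common option for data that can only be streamed once is reservoir sampling (Algorithm R), sketched here with a hypothetical `reservoir_sample` function.

```python
import random
from typing import Iterable, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(items: Iterable[T], k: int, seed: Optional[int] = None) -> list[T]:
    """Draw a uniform random sample of up to k items in a single pass."""
    rng = random.Random(seed)
    reservoir: list[T] = []
    for i, item in enumerate(items):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Keep the new item with probability k / (i + 1).
            j = rng.randint(0, i)  # both ends inclusive
            if j < k:
                reservoir[j] = item
    return reservoir


sample = reservoir_sample(range(1_000_000), k=5, seed=42)
print(sample)
```

The function never holds more than k items in memory, so it works on files or streams of unknown length; a fixed seed makes the sample reproducible for later checks.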
  • 20. http://mortardata.com/
  • 22. Go and buy this book. Now.
