Data Breaking Bad at Berlin Buzzwords

Talk given by Michael Hausenblas - Chief Data Engineer EMEA at MapR Technologies. Berlin Buzzwords 2013, Open Stage Talk

    Data Breaking Bad at Berlin Buzzwords: Presentation Transcript

    • Data Breaking Bad. Michael Hausenblas, MapR Technologies. Berlin Buzzwords 2013, Open Stage Talk. Friday, 7 June 2013
    • Nope. Not this one.
    • Things you can influence; things that affect you. Try and focus on this stuff.
    • The awkward moment when I open the data I got from a customer
    • http://techcrunch.com/2012/11/25/the-big-data-fallacy-data-%E2%89%A0-information-%E2%89%A0-insights/ aka crap in, crap out
    • Some examples …
        • Encöding hell
        • Schema? Sure, I fax you a screenshot
        • Dupes and other fakes
        • Sampling
    • Encöding hell application-specific encodings • URL encoding • HTML encoding • Database escaping non-ASCII? a%20percent-encoded%20string%20as%20of%20RFC%203986 a <strong>HTML</strong> encoded string Friday, 7 June 13
    • • Use Unicode • Use Unicode • Use Unicode Encöding hell http://www.swedishfika.com/2010/01/19/escaping-from-encoding-hell/ Friday, 7 June 13
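
The encoding layers named on the slides can be peeled apart with the Python standard library. The following is only a minimal sketch, not something shown in the talk: it decodes the percent-encoded example string, HTML-escapes the markup example, and shows what happens when UTF-8 bytes are misread as Latin-1. The variable names are illustrative assumptions.

```python
import html
from urllib.parse import quote, unquote

# URL (percent) encoding as described in RFC 3986.
url_encoded = "a%20percent-encoded%20string%20as%20of%20RFC%203986"
plain = unquote(url_encoded)   # decode %XX escapes (UTF-8 by default)
roundtrip = quote(plain)       # encode again; spaces become %20

# HTML encoding: markup characters are replaced by character references.
html_text = "a <strong>HTML</strong> encoded string"
html_encoded = html.escape(html_text)

# "Use Unicode": keep text as str internally and be explicit about the
# byte encoding at the boundaries. Decoding UTF-8 bytes with the wrong
# codec silently produces mojibake instead of raising an error.
utf8_bytes = "Encöding".encode("utf-8")
mojibake = utf8_bytes.decode("latin-1")              # wrong codec
repaired = mojibake.encode("latin-1").decode("utf-8")  # undo the mistake

if __name__ == "__main__":
    print(plain)
    print(html_encoded)
    print(mojibake, "->", repaired)
```

Knowing which layer was applied last matters: a string that has been percent-encoded and then HTML-escaped has to be unescaped in the reverse order.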
    • • Encöding hell • Schema? Sure, I fax you a screenshot • Dupes and other fakes • Sampling Friday, 7 June 13
    • Schema? Sure, I fax you a screenshot Friday, 7 June 13
    • Schema? Sure, I fax you a screenshot • There is a need for proper, formal documentation • For humans and machines • Basis for validation—automate! Friday, 7 June 13
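
To illustrate a schema that both humans and machines can read, and that can drive automated validation, here is a small standard-library sketch. The field names (`customer_id`, `email`, `signup_date`) and the dictionary-based schema format are hypothetical and chosen for illustration only; the talk does not prescribe a schema language.

```python
# Hypothetical schema: field name -> expected Python type.
SCHEMA = {
    "customer_id": int,
    "email": str,
    "signup_date": str,
}


def validate_record(record, schema):
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    # Check every field the schema declares.
    for field, expected in schema.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif not isinstance(record[field], expected):
            actual = type(record[field]).__name__
            errors.append(f"{field}: expected {expected.__name__}, got {actual}")
    # Report fields the schema does not know about.
    for field in record:
        if field not in schema:
            errors.append(f"unexpected field: {field}")
    return errors


if __name__ == "__main__":
    good = {"customer_id": 1, "email": "a@example.org", "signup_date": "2013-06-07"}
    bad = {"customer_id": "1", "email": "a@example.org"}
    print(validate_record(good, SCHEMA))
    print(validate_record(bad, SCHEMA))
```

Because the schema is plain data, the same definition could be rendered as documentation and run against every incoming batch.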
    • Dupes and other fakes
        • Use plots to get an overview
        • Watch out for outliers
        • Try to establish the source of errors and fix it
        • Document (in any case)
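
As a numeric complement to plotting, duplicates and outliers can also be flagged programmatically. The sketch below is an assumption about how one might do this, not a method from the talk: it counts repeated records with `collections.Counter` and flags values outside the common 1.5 × IQR fences. The function names and the sample values are made up for illustration.

```python
from collections import Counter
from statistics import quantiles


def find_duplicates(records):
    """Map each record that occurs more than once to its number of occurrences."""
    counts = Counter(records)
    return {record: n for record, n in counts.items() if n > 1}


def find_outliers(values, factor=1.5):
    """Return values outside [Q1 - factor*IQR, Q3 + factor*IQR]."""
    q1, _median, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - factor * iqr, q3 + factor * iqr
    return [v for v in values if v < low or v > high]


if __name__ == "__main__":
    print(find_duplicates(["a", "b", "a", "c", "b", "a"]))
    print(find_outliers([10, 12, 11, 13, 12, 11, 500]))
```

A flagged value is only a candidate: whether it is an error or a genuine extreme still has to be traced back to its source and documented.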
    • Sampling
        • My data is too big. I can’t check it all.
        • Why don’t you sample, then?
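
The slide does not say how to sample. One common option when the data does not fit in memory is reservoir sampling (Algorithm R), which draws a uniform random sample of fixed size in a single pass. The sketch below is an illustration under that assumption; `reservoir_sample` and its parameters are names introduced here.

```python
import random


def reservoir_sample(stream, k, seed=None):
    """Draw up to k items uniformly at random from an iterable in one pass."""
    rng = random.Random(seed)
    sample = []
    for index, item in enumerate(stream):
        if index < k:
            # Fill the reservoir with the first k items.
            sample.append(item)
        else:
            # Keep the new item with probability k / (index + 1).
            j = rng.randint(0, index)  # both ends inclusive
            if j < k:
                sample[j] = item
    return sample


if __name__ == "__main__":
    # Works on any iterable, e.g. lines of a file too large to load at once.
    picked = reservoir_sample(range(1_000_000), k=5, seed=42)
    print(picked)
```

The sampled records can then be checked by hand or run through the same validation and outlier checks as a full data set.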
    • http://mortardata.com/
    • Go and buy this book. Now.