So you’re a big data and distributed systems “expert”, you’ve collected 500 billion data points, thrown it into sci-lib-of-the-week, you’re using Hadoop, backing onto those cool AWS GPU instances, let it grind away for days and its spit out the answer to life the universe and everything. But is it really better than a coin toss?
How do you validate whether your data analysis algorithm works? Are you learning a solution to your problems or just the data you already have? What problems can you encounter when analysing your data? How do you solve them, and what can you do easily under the time pressures of a business environment?