Hive London Meetup: Interactive (functional) programming with Apache Hive and F#

Talk given on 31.10.2013 by Matthew Moloney, the founder of Tsunami (tsunami.io). Matthew previously worked on Big Data tooling at eBay and Microsoft and is particularly interested in functional programming and machine learning.

Abstract: Many common mistakes in Hive programming are preventable, and they waste both user time and cluster time. Matt will present an interface that not only prevents these mistakes but also gives you helpful hints while you're typing.
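The abstract does not say which mistakes the interface catches or how. As a rough illustration of the general idea (checking a query against table metadata before it ever reaches the queue, and offering a hint while you type), here is a minimal Python sketch. It is not Tsunami's implementation. The catalog contents, the `CATALOG`/`PARTITIONS` dictionaries, and the `check_select` function are all hypothetical; a real tool would read the schema from the Hive metastore. The two checks (unknown names and a missing partition filter) are illustrative choices made for this sketch.

```python
import difflib
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical catalog: table -> {column: Hive type}.
# A real tool would load this from the Hive metastore rather than hard-code it.
CATALOG: Dict[str, Dict[str, str]] = {
    "page_views": {"user_id": "bigint", "url": "string", "dt": "string"},
}

# Hypothetical partition columns per table.
PARTITIONS: Dict[str, List[str]] = {"page_views": ["dt"]}


@dataclass
class Problem:
    kind: str
    detail: str


def _suggest(name: str, candidates: List[str]) -> str:
    """Return a 'did you mean' hint for the closest known name, if any."""
    matches = difflib.get_close_matches(name, candidates, n=1)
    return f"; did you mean '{matches[0]}'?" if matches else ""


def check_select(table: str, columns: List[str],
                 where_columns: Optional[List[str]] = None) -> List[Problem]:
    """Validate a simple SELECT locally, before it is submitted to the cluster."""
    where_columns = where_columns or []
    schema = CATALOG.get(table)
    if schema is None:
        return [Problem("unknown_table", table + _suggest(table, list(CATALOG)))]
    problems = []
    # Every referenced column must exist in the table's schema.
    for col in columns + where_columns:
        if col not in schema:
            problems.append(Problem("unknown_column", col + _suggest(col, list(schema))))
    # Scanning a partitioned table without filtering on a partition column
    # reads every partition, which is expensive on a shared cluster.
    parts = PARTITIONS.get(table, [])
    if parts and not any(p in where_columns for p in parts):
        problems.append(Problem("full_scan", f"no filter on partition column(s) {parts}"))
    return problems


if __name__ == "__main__":
    for p in check_select("page_views", ["user_id", "ulr"]):
        print(p.kind, p.detail)
```

Both problems in the usage example (the misspelled `ulr` and the missing `dt` filter) are reported in milliseconds on the user's machine instead of after a long queue wait.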


Transcript

  • 1. Big Data Scripting. Social: @tsunamiide; tsunami.io; Earthquake Enterprises
  • 2. Founder of Lift Analytics (F# IDE); Big Data / Machine Learning applied researcher; Business Intelligence; Process Engineering
  • 3. The two main tasks on the Big Data pipeline are getting more data and finding more factors. [Diagram: Get More Data; Find More Factors; Machine Learning; A/B Testing]
  • 4. Data science is an inherently exploratory pursuit. Over 50% of the queries you write will only be executed once, and very few queries are worth saving. Exploration involves writing a whole lot of new code and then throwing most of it away. There are no QA teams and no test suites, so there are many more opportunities to make mistakes.
  • 5. Most queries are exploratory in nature. The decision on which query to run next often depends on the result of the previous query. Queries are often very short (e.g. 5 minutes), while queues are often very long (e.g. 2 hours). Mistakes waste a lot of your time.
  • 6. Clusters are a shared resource. A simple mistake may kill a big job after 12 hours of cluster processing time. Mistakes waste everyone's time.
  • 7. Write access to the cluster is democratized. Metadata is often kept as tribal knowledge, in team wikis, and in files sent by email.
  • 8. Users will start to share datasets among themselves without any formal agreements or dependency management. Other people can now break your code.
  • 9. There are many more chances to make mistakes: they have become easier to make, and they are far more costly. Most of the time they are not even your fault, and there is nothing you could reasonably have done to prevent them.
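Slide 8 notes that when datasets are shared without dependency management, other people can break your code. One lightweight mitigation, sketched here purely as an assumption and not as something shown in the talk, is to record the schema a query was written against and compare it with the live schema before rerunning. The `schema_diff` function and the `Schema` alias are hypothetical names; schemas are plain `{column: Hive type}` dictionaries.

```python
from typing import Dict, List

# Hypothetical representation: column name -> Hive type string.
Schema = Dict[str, str]


def schema_diff(expected: Schema, current: Schema) -> List[str]:
    """List changes between the schema a query was written against and the live one."""
    changes = []
    # Columns the query relied on that were removed or retyped upstream.
    for col, typ in expected.items():
        if col not in current:
            changes.append(f"dropped column {col}")
        elif current[col] != typ:
            changes.append(f"column {col} changed type {typ} -> {current[col]}")
    # Columns someone else added since the query was written.
    for col in current:
        if col not in expected:
            changes.append(f"added column {col}")
    return changes


if __name__ == "__main__":
    saved = {"user_id": "int", "url": "string"}
    live = {"user_id": "bigint", "referrer": "string"}
    for change in schema_diff(saved, live):
        print(change)
```

An empty result means the shared dataset still matches what the query expects; anything else can be surfaced before the job is queued rather than after it fails.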
