This document discusses Apache Drill, an open source SQL query engine for analyzing large-scale datasets across data sources such as Hadoop and HBase. Drill provides low-latency interactive queries on large datasets. The document explains Drill's architecture, data model, query processing, and extensibility features, and how it integrates with the Hadoop ecosystem. It encourages readers to get involved with the Apache Drill community.
No graphic changes.
Note for bullet changes:
Open Source -- Community consensus
API
Available for all distributions --
Likely to support these
Could add HiveQL and more as well. Could even be clever and support HiveQL to MR or Drill based upon query
Pig as well
Pluggability
Data format
Query language
Something 6-9 months alpha quality
Community driven, I can’t speak for project
MapRFS gives better chunk size control
NFS support may make small test drivers easier
Unified namespace will allow multi-cluster access
Might even have Drill component that autoformats data
Read only model
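One way to picture the pluggability noted above is a registry keyed by name, with one registry for data formats and one for query languages. The Python sketch below is a hypothetical illustration only, not Drill's design: `PluginRegistry`, `register`, `get`, the toy CSV reader, and the toy `SELECT` parser are all invented for the example.

```python
# Hypothetical sketch of pluggable formats and query languages; not Drill code.
from typing import Callable, Dict, List


class PluginRegistry:
    """Maps a plugin name to a callable, e.g. a format reader or a parser."""

    def __init__(self, kind: str) -> None:
        self.kind = kind
        self._plugins: Dict[str, Callable] = {}

    def register(self, name: str, plugin: Callable) -> None:
        self._plugins[name] = plugin

    def get(self, name: str) -> Callable:
        if name not in self._plugins:
            raise KeyError(f"no {self.kind} plugin named {name!r}")
        return self._plugins[name]


formats = PluginRegistry("data format")
languages = PluginRegistry("query language")


def read_csv_lines(lines: List[str]) -> List[dict]:
    """Toy reader: first line is the header, remaining lines are rows."""
    header = lines[0].split(",")
    return [dict(zip(header, line.split(","))) for line in lines[1:]]


def parse_select(query: str) -> List[str]:
    """Toy parser: accepts only 'SELECT a, b' and returns the column names."""
    keyword, _, columns = query.partition(" ")
    if keyword.upper() != "SELECT":
        raise ValueError("only SELECT is supported in this sketch")
    return [c.strip() for c in columns.split(",")]


# Each format or language is added without touching the others.
formats.register("csv", read_csv_lines)
languages.register("sql-like", parse_select)

rows = formats.get("csv")(["id,name", "1,ann", "2,bob"])
columns = languages.get("sql-like")("SELECT name")
result = [{c: row[c] for c in columns} for row in rows]
```

The point of the sketch is only that the reader and the parser are looked up by name, so a HiveQL or Pig front end, or an additional storage format, could in principle be registered alongside the existing ones.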
Protocol buffers are conceptual data model
Will support multiple data models
Will have to define a way to explain data format (filtering, fields, etc.)
Schema-less will have perf penalty
HBase will be one format
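The data-model notes above can be illustrated with a minimal Python sketch. It is not Drill code: nested dictionaries stand in for protocol-buffer-style records, and the helper names `project` and `discover_fields` are hypothetical. With a declared schema the field paths are known before any data is read; without one, the records themselves must be scanned to find them, which is one plausible source of the schema-less performance penalty.

```python
# Illustrative only: nested dicts stand in for protocol-buffer-style records.
# All names here (project, discover_fields) are hypothetical, not Drill APIs.

records = [
    {"user": {"id": 1, "country": "US"}, "clicks": 3},
    {"user": {"id": 2}, "clicks": 5, "referrer": "search"},
]


def project(record, path):
    """Return the value at a dotted field path, or None if it is absent."""
    value = record
    for key in path.split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


def discover_fields(rows, prefix=""):
    """Schema-less case: walk every record to learn which field paths exist."""
    paths = set()
    for row in rows:
        for key, value in row.items():
            path = prefix + key
            if isinstance(value, dict):
                paths |= discover_fields([value], path + ".")
            else:
                paths.add(path)
    return paths


# With a declared schema the field list is known before any data is read.
declared_schema = ["user.id", "user.country", "clicks"]
known = [[project(r, p) for p in declared_schema] for r in records]

# Without one, a full pass over the data is needed just to find the fields.
discovered = sorted(discover_fields(records))
```

Note that the second record has a `referrer` field the declared schema never mentions; only the discovery pass finds it, at the cost of touching every record.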
Note: we already have a partially built execution engine
Example query that Drill should support
Need to talk more here about what Dremel does
Be prepared for Apache questions
Committer vs committee vs contributor
If can’t answer question, ask them to answer and contribute
Lisa - Need landing page
References to paper and such at end