How to get started in Big Data without Big Costs - StampedeCon 2016

Looking to implement Hadoop but haven’t pulled the trigger yet? You are not alone. Many companies have heard the hype about how Hadoop can solve the challenges presented by big data, but few have actually implemented it. What’s preventing them from taking the plunge? Can it be done in small steps to ensure project success?

This session will discuss some of the items to consider when getting started with Hadoop and how to decide whether to move to the de facto big data platform. Starting small can be a good approach when your company is learning the basics and deciding what direction to take. There is no need to invest large amounts of time and money up front if a proof of concept is all you aim to provide. Using well-known datasets on virtual machines is a low-cost, low-effort way to find out whether your big data journey will succeed with Hadoop.

  1. Dipping Your Toe Into Hadoop: How to get started in Big Data without Big Costs
     • Bobby Dewitt, VP, Systems Architect, Aisle411
     • StampedeCon 2016
  2. My Background
     • Oracle, MySQL, and PostgreSQL DBA with 15 years of experience
     • Led database, infrastructure, and business intelligence teams to deliver highly available data systems
     • Currently responsible for the design, implementation, and operational availability of infrastructure and systems at Aisle411
  3. Aisle411
     • Digitizing the indoor world
     • Indoor maps, positioning, and analytics
     • Asset and customer tracking within locations
     • Using augmented reality to make indoor solutions more interactive
     • Small company, big data
  4. RDBMS Versus Hadoop
     • Relational databases
        • Highly structured data
        • Good for transactional and operational systems
        • Difficult to scale out across many servers
        • Hardware failures can be disastrous
     • Hadoop
        • Semistructured or unstructured data
        • Good for batch and bulk processing as well as analytic systems
        • Simple to scale out by adding nodes
        • Hardware failures are handled seamlessly because HDFS replicates data across nodes
  5. Hadoop Adoption
     • Still not a reality for many companies
     • Major barriers include:
        • Lack of skilled employees
        • Getting value out of the investment
        • Constant changes to the ecosystem
  6. Kick the Tires
     • Play around with it
     • A Hadoop cluster can reside on a single machine
     • Pre-loaded virtual machines
     • Install on EC2 or another cloud VM
  7. What Data Should I Use?
     • Stick with what you know
     • Choose a dataset that is not specific to your company
     • Try documented examples and use cases
  8. Example Datasets
     • Apache web server logs
     • Twitter feeds
     • Stock market prices
     • Census data
     • Sports statistics
     • Song data
  9. Apache Web Log Data
     • Many online resources
     • Potentially large dataset
     • Real business value
     • Combine with other data sources
  10. From Batch to Streaming
      • Initial testing done with a batch load using HDFS tools
      • Set up streaming to provide near real-time updates
      • Used several Hadoop components:
         • HDFS
         • Flume
         • Morphlines
         • Avro
         • Hive
         • Impala
  11. Quick Wins
      • Get data into HDFS
      • Get data into Hive or Impala
      • Stream live data
      • Combine with other data sources
      • Create pretty graphs and charts
  12. Costs
      • Start small with a data puddle
      • Use virtual machines, not the big appliance
      • Research and experimentation time may be the biggest cost
  13. Where Am I?
      • Evaluate your initial trials
      • Is Hadoop everything you thought it would be?
      • Do you have a real business need to use it?
      • Can you migrate any existing data or processes?
  14. Training
      • Hortonworks University
      • MapR Academy
      • Cloudera quick start tutorials
      • Online classes through Coursera, edX, and others
      • Conferences like StampedeCon
  15. Hadoop Is Not For Everyone
      • Your "big data" may not be big enough
      • Security and tooling still need work
      • Skills are being learned, but not quickly enough
  16. Thank You
      • Questions?
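
Slide 4 describes Hadoop as good for batch and bulk processing. Hadoop's original batch engine, MapReduce, splits a job into a map phase that emits key/value pairs, a shuffle that groups those pairs by key, and a reduce phase that aggregates each group. The following minimal sketch imitates the three phases in plain Python on one machine, counting HTTP status codes in a few made-up log lines. It illustrates the programming model only and uses no Hadoop API; it assumes Common Log Format lines, where the status code is the second-to-last field.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical sample lines in Apache Common Log Format.
SAMPLE_LOG = [
    '127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326',
    '10.0.0.5 - - [10/Oct/2000:13:56:01 -0700] "GET /index.html HTTP/1.0" 200 512',
    '10.0.0.7 - - [10/Oct/2000:13:57:12 -0700] "GET /missing.html HTTP/1.0" 404 209',
]


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Emit (status_code, 1) for every line; assumes the status is the
    second-to-last whitespace-separated field (Common Log Format)."""
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:
            yield fields[-2], 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group emitted values by key, as the framework does between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


if __name__ == "__main__":
    counts = reduce_phase(shuffle_phase(map_phase(SAMPLE_LOG)))
    print(counts)  # {'200': 2, '404': 1}
```

On a real cluster the map and reduce functions run in parallel on many nodes and the framework performs the shuffle over the network; the single-process version only shows how the data flows between the phases.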
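
Slide 9 recommends Apache web server logs as a first dataset. Before those logs can be queried from Hive or Impala, each line has to be split into fields. A minimal sketch, assuming the default Apache Combined Log Format (`%h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-agent}i"`); it also accepts Common Log Format lines, which lack the last two fields. The regular expression is illustrative and does not handle escaped quotes inside the request or user-agent strings.

```python
import re
from datetime import datetime
from typing import Optional

# Combined Log Format; the trailing referer/user-agent group is optional so
# that Common Log Format lines also match.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
    r'(?: "(?P<referer>[^"]*)" "(?P<agent>[^"]*)")?'
)


def parse_line(line: str) -> Optional[dict]:
    """Return a dict of typed fields, or None if the line does not match."""
    match = LOG_PATTERN.fullmatch(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    # Apache timestamps look like 10/Oct/2000:13:55:36 -0700.
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    record["status"] = int(record["status"])
    # Apache writes "-" when no response body was sent.
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record


if __name__ == "__main__":
    example = (
        '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
        '"GET /apache_pb.gif HTTP/1.0" 200 2326 '
        '"http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"'
    )
    print(parse_line(example))
```

Lines that return `None` are worth counting rather than silently dropping, since a high rejection rate usually means the server uses a custom `LogFormat`.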
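
Slide 10 says initial testing used a batch load with the HDFS command-line tools. A minimal sketch of that step, wrapping the standard `hdfs dfs -mkdir -p` and `hdfs dfs -put -f` commands with Python's `subprocess` module. The local log paths and the `/data/raw/weblogs` target directory are hypothetical. The wrapper defaults to a dry run that only prints the commands, so it can be tried on a machine without Hadoop installed.

```python
import subprocess
from typing import List, Sequence


def build_batch_load_commands(local_files: Sequence[str], hdfs_dir: str) -> List[List[str]]:
    """Create the target directory, then copy each local file into it.
    -p creates missing parent directories; -f overwrites files that already exist."""
    commands = [["hdfs", "dfs", "-mkdir", "-p", hdfs_dir]]
    for local_file in local_files:
        commands.append(["hdfs", "dfs", "-put", "-f", local_file, hdfs_dir])
    return commands


def batch_load(local_files: Sequence[str], hdfs_dir: str, dry_run: bool = True) -> List[List[str]]:
    """Print the commands (dry run) or execute them, stopping at the first failure."""
    commands = build_batch_load_commands(local_files, hdfs_dir)
    for command in commands:
        if dry_run:
            print(" ".join(command))
        else:
            # check=True raises CalledProcessError if a command exits non-zero.
            subprocess.run(command, check=True)
    return commands


if __name__ == "__main__":
    # Hypothetical paths for illustration only.
    batch_load(
        ["/var/log/apache2/access.log", "/var/log/apache2/access.log.1"],
        "/data/raw/weblogs",
    )
```

Passing each command as a list rather than a single shell string avoids quoting problems with unusual file names.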
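
Slide 10 also describes moving from the batch load to streaming for near real-time updates. In the talk this was built with Flume, Morphlines, Avro, Hive, and Impala, none of which appear in the sketch below. It only illustrates the underlying difference: a batch job recomputes a metric over the whole file, while a streaming consumer updates the same metric as each line arrives. The metric (hits per minute) and the sample lines are hypothetical, and the code assumes Apache-style timestamps in square brackets.

```python
from collections import Counter
from typing import Iterable

# Hypothetical sample lines in Apache Common Log Format.
SAMPLE_LINES = [
    '127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /a HTTP/1.0" 200 100',
    '127.0.0.1 - - [10/Oct/2000:13:55:59 -0700] "GET /b HTTP/1.0" 200 100',
    '10.0.0.5 - - [10/Oct/2000:13:56:01 -0700] "GET /c HTTP/1.0" 404 50',
]


def minute_bucket(line: str) -> str:
    """Return 'dd/Mon/yyyy:HH:MM' from the bracketed timestamp (time zone ignored)."""
    timestamp = line.split("[", 1)[1].split("]", 1)[0]  # 10/Oct/2000:13:55:36 -0700
    return timestamp[:17]


def batch_hits_per_minute(lines: Iterable[str]) -> Counter:
    """Batch style: scan every line and build the counts from scratch."""
    return Counter(minute_bucket(line) for line in lines)


class StreamingHitsPerMinute:
    """Streaming style: keep running counts and update them one event at a time."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def update(self, line: str) -> None:
        self.counts[minute_bucket(line)] += 1


if __name__ == "__main__":
    stream = StreamingHitsPerMinute()
    for line in SAMPLE_LINES:  # in practice, new lines would arrive from a log shipper
        stream.update(line)
        print(dict(stream.counts))  # the metric is current after every event
    # Both approaches agree once the stream has seen the whole batch.
    assert stream.counts == batch_hits_per_minute(SAMPLE_LINES)
```

The streaming version never rereads old lines, so the cost of each update stays constant as the log grows; keeping the raw files in HDFS still lets the batch version recompute everything if the metric's definition changes.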
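
Slide 11 lists combining web log data with other data sources as a quick win. In Hive or Impala that is a SQL join; the following sketch shows the same idea, a left outer join followed by a group-by, as an in-memory hash join in Python. The page-to-category lookup table and the sample hits are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Hypothetical reference data from another system: URL path -> category.
PAGE_CATEGORIES: Dict[str, str] = {
    "/maps/store-12": "maps",
    "/offers/weekly": "promotions",
}

# Hypothetical (client_ip, path) pairs already extracted from web logs.
HITS = [
    ("10.0.0.1", "/maps/store-12"),
    ("10.0.0.2", "/maps/store-12"),
    ("10.0.0.3", "/offers/weekly"),
    ("10.0.0.4", "/about"),
]


def hits_by_category(hits: Iterable[Tuple[str, str]], categories: Dict[str, str]) -> Counter:
    """Left outer join hits to categories on path, then count hits per category.
    The dict acts as the build side of a hash join; unmatched paths become 'unknown'."""
    totals: Counter = Counter()
    for _client_ip, path in hits:
        totals[categories.get(path, "unknown")] += 1
    return totals


if __name__ == "__main__":
    for category, count in hits_by_category(HITS, PAGE_CATEGORIES).most_common():
        print(f"{category}: {count}")
```

Keeping unmatched rows as "unknown" instead of dropping them (as an inner join would) makes gaps in the reference data visible in the result.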
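
Slide 15 warns that your "big data" may not be big enough, and slide 12 suggests starting with a small "data puddle". A quick back-of-the-envelope estimate helps with both judgments. The sketch multiplies daily data volume by retention and by the HDFS replication factor (3 by default), then adds free-space headroom. The 25% headroom and the example volume are assumptions, and compression and intermediate job output are ignored.

```python
def raw_hdfs_capacity_tb(daily_gb: float, retention_days: int,
                         replication: int = 3, headroom: float = 0.25) -> float:
    """Estimate the raw disk (decimal TB) needed to keep `retention_days` of data in HDFS.

    replication: copies HDFS stores of each block (the dfs.replication default is 3).
    headroom: fraction of raw capacity to keep free (an assumption, not a Hadoop setting).
    """
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be in [0, 1)")
    stored_gb = daily_gb * retention_days * replication
    return stored_gb / (1 - headroom) / 1000


if __name__ == "__main__":
    # Hypothetical: 2 GB of logs per day, kept for one year.
    print(f"{raw_hdfs_capacity_tb(2, 365):.2f} TB")  # 2 * 365 * 3 / 0.75 / 1000 -> 2.92 TB
```

If the estimate comes out at a few terabytes, that is a useful prompt to revisit whether Hadoop is needed at all or whether the existing relational database would still do the job.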