BigData Meets the Federal Data Center


BigData Meets the Federal Data Center: an overview of NoSQL solutions to data challenges (e.g., Hadoop, HBase, MongoDB, Cassandra, Redis). Also includes a vignette on the Google Prediction API.

Published in: Technology, Business


  1. BigData Meets the Federal Data Center: Practical Solutions for Wicked Problems
     Abe Usher, CCHP, CISSP – Chief Technology Officer
  2. whoami
     - CTO at HumanGeo Group
     - Former Google Engineer
     - Former officer USA, USAF
     - Cloudera Certified Hadoop Professional
     For the past three years, I’ve been focused on BigData analytics for special operations.
  3. Outline
     Warm up
     - Trends
     - Challenges
     - Architectures & Patterns
     - Solutions
     - Action plan for decision makers
     - Homework for data engineers
     Fun stuff
  4. What is Cloud?
     Cloud Computing defined: “Delivery of computing as a service”
  5. Big Trends
     Michael Driscoll
  6. More Trends
     - BigData is now cool (not just geeky)
     - There is an explosion of open source technology for BigData
     - Available cloud technologies are significantly changing our society
  7. Common Challenges
     - Federal organizations face declining IT budgets
     - Legacy systems not engineered for BigData
     - Creating value from data is hard
  8. Possible Solutions
     - Federal organizations face declining IT budgets: outsource and eliminate non-core activities
     - Legacy systems not engineered for BigData: augment your enterprise with open source tools & platforms
     - Creating value from data is hard: apply transparency to enable crowdsourcing
  9. Where are we? Where do we want to go?
     - Data processing “State of the Art”: “We have great intentions, but it is a big mess.” “Don’t look behind the curtain.”
  10. Where are we? Where do we want to go?
      - Data processing “State of the Art”: “We have great intentions, but it is a big mess.” “Don’t look behind the curtain.”
      - A Better (elusive) Future: “The right tool for the right problem.” “Outsource/eliminate things outside of core competencies.”
  11. Pattern 1: Outsource Infrastructure & Apps
      - “The Enterprise”
      - “The Cloud”
      - Just-in-time Servers
      - Email & Calendar
      - Travel Coordination
  12. Pattern 2: Consolidate Data and Analyze It
      Diagram labels: “The Enterprise Today”, “Future Enterprise”, Redis, MongoDB, Hadoop
      1. Incrementally adopt BigData tools as you evolve your Enterprise
      2. Maintain parallel capabilities if necessary
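One way to read "maintain parallel capabilities" is a dual-write arrangement: every write goes to both the legacy system and the new store, and reads are switched over once the two agree. The sketch below is only an illustration of that idea under stated assumptions. `DualWriteStore` is a hypothetical name, and plain dicts stand in for the legacy system and for a store such as Redis or MongoDB.

```python
class DualWriteStore:
    """Hypothetical facade: write to both stores, read from one of them."""

    def __init__(self, legacy, modern, read_from_modern=False):
        self.legacy = legacy
        self.modern = modern
        self.read_from_modern = read_from_modern

    def put(self, key, value):
        # Every write lands in both systems so neither falls behind.
        self.legacy[key] = value
        self.modern[key] = value

    def get(self, key):
        # Reads come from whichever system is currently trusted.
        source = self.modern if self.read_from_modern else self.legacy
        return source[key]

    def mismatches(self):
        # Keys whose values differ (or exist on one side only).
        keys = set(self.legacy) | set(self.modern)
        return sorted(k for k in keys if self.legacy.get(k) != self.modern.get(k))


# Assumption: dicts stand in for the real legacy and new data stores.
legacy_db, new_db = {}, {}
store = DualWriteStore(legacy_db, new_db)
store.put("report:1", {"status": "draft"})

# Flip reads to the new store only after the two sides agree.
if not store.mismatches():
    store.read_from_modern = True
print(store.get("report:1"))
```

Checking `mismatches()` before flipping `read_from_modern` keeps the legacy system available as a fallback during the transition.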
  13. Vignette 1
      Tame massive streaming data in 5 minutes or less.
  14. Recipe 1: MongoDB tames Twitter
      Directions
      1. Combine ingredients on laptop
      2. Run cURL command to grab Twitter sprinkler
      3. Pipe data into MongoDB & ElasticSearch (ES)
      4. Search & Analyze
      Ingredients
      - Twitter account
      - cURL utility
      - MongoDB
      - ElasticSearch (optional)
      - Laptop
  15. Recipe 1: MongoDB tames Twitter
      Simple steps:
      1. Run a MongoDB server
      2. Run the script to capture Twitter from the command line*
      3. View the results in MongoVUE
      Inspired by Eliot Horowitz
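The "pipe data into Mongo" step can be sketched roughly as follows. This is not the script from the talk. It assumes the stream arrives as newline-delimited JSON (one tweet per line), and the names `parse_tweets`, `batches`, and `store_stream` are hypothetical. Any object with an `insert_many` method works as the collection, for example a PyMongo collection.

```python
import json


def parse_tweets(lines):
    """Yield one dict per line of newline-delimited JSON, skipping bad lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # blank lines carry no data
        try:
            doc = json.loads(line)
        except json.JSONDecodeError:
            continue  # ignore truncated or malformed records
        if isinstance(doc, dict):
            yield doc


def batches(items, size):
    """Group an iterable into lists of at most `size` items."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def store_stream(lines, collection, batch_size=500):
    """Insert parsed tweets in batches; return how many were stored."""
    stored = 0
    for batch in batches(parse_tweets(lines), batch_size):
        collection.insert_many(batch)  # one round trip per batch
        stored += len(batch)
    return stored
```

With the third-party `pymongo` driver installed and a local server running, a call such as `store_stream(sys.stdin, MongoClient()["twitter"]["tweets"])` would consume whatever the cURL command pipes in. The database and collection names here are placeholders.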
  16. Recipe 1: MongoDB tames Twitter
      Why MongoDB:
      - Incredibly easy to set up
      - Fast data inserts (> 20,000 per second or 1,728,000,000 per day)
      - Horizontal scaling as data grows
      - Pluggable compression with Snappy
      Get the code!
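The per-day figure on this slide is the per-second rate multiplied by the number of seconds in a day. A short check, where `inserts_per_day` is a hypothetical helper name:

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def inserts_per_day(inserts_per_second):
    """Scale a sustained per-second insert rate to a full day."""
    return inserts_per_second * SECONDS_PER_DAY


print(f"{inserts_per_day(20_000):,}")  # 1,728,000,000
```

The result assumes the rate is sustained for all 24 hours.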
  17. Vignette 2
      Ask Google to solve your problems
  18. Google Prediction API
  19. Recipe 2: Google categorizes languages
      Directions
      1. Upload data to be categorized or predicted
      2. Ask Google to do prediction
      3. Evaluate results
      4. Repeat the process
      Ingredients
      - Google account
      - cURL utility
      - Raw data in multiple languages
      - Laptop
  20. Recipe 2: Google categorizes languages
      Get the code!
  21. Recipe 2: Results
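The train, predict, evaluate loop of Recipe 2 can be tried locally before involving any hosted service. The sketch below does not call the Google Prediction API. It swaps in a small character-trigram classifier as a stand-in, and the function names and sample sentences are all illustrative assumptions.

```python
from collections import Counter


def trigrams(text):
    """Count lowercase character trigrams, padded so word edges count too."""
    padded = f"  {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def train(examples):
    """Build one trigram profile per label from (label, text) pairs."""
    profiles = {}
    for label, text in examples:
        profiles.setdefault(label, Counter()).update(trigrams(text))
    return profiles


def predict(profiles, text):
    """Return the label whose profile overlaps most with the text's trigrams."""
    grams = trigrams(text)

    def score(label):
        profile = profiles[label]
        total = sum(profile.values())  # normalise for training-set size
        return sum(n * profile[g] for g, n in grams.items()) / total

    return max(profiles, key=score)


def accuracy(profiles, labelled):
    """Fraction of (label, text) pairs that the profiles classify correctly."""
    hits = sum(predict(profiles, text) == label for label, text in labelled)
    return hits / len(labelled)


# Illustrative training data: label first, then the raw text.
training = [
    ("English", "the quick brown fox jumps over the lazy dog"),
    ("English", "this is the house that we saw in the morning"),
    ("Spanish", "el rápido zorro marrón salta sobre el perro perezoso"),
    ("Spanish", "esta es la casa que vimos por la mañana"),
]
held_out = [
    ("English", "the dog is in the house"),
    ("Spanish", "el perro está en la casa"),
]

profiles = train(training)
print(predict(profiles, "the dog is in the house"))
print(accuracy(profiles, held_out))
```

Holding back a few labelled examples for `accuracy` mirrors the "evaluate results, repeat the process" steps: add more training data and re-run until the score is acceptable.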
  22. More Google Prediction API Ideas
      Capabilities*
      - Language detection / categorization
      - Spam detection
      - Recommendation system (e.g. Netflix, Amazon)
      - Customer sentiment analysis
      - Document / email classification
      - Suspicious activity identification
      - Purchase predictions
      - Predict driver behavior and optimize vehicle control systems**
      “Between the Google Prediction API and our own research, we are discovering ways to make information work for the driver and help deliver optimal vehicle performance.” – Ryan McGee, Technical Expert, Ford Research and Innovation
  23. Action Plan for Decision Makers
      - Experiment with Cloud-sourcing:
      - Inventory your data and your systems
      - Join a Meetup to get informed
      - Take a risk*
  24. Homework for Data Engineers
      - Understand Google MapReduce:
      - Experiment with NoSQL:
      - Ask Google to predict the future:
      - Take the cloud for a test drive:
      - Try something, fail fast
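As a starting point for the MapReduce homework, the programming model can be tried in a single process before touching a cluster. The following is a toy word count, not Hadoop or Google's implementation, and the function names are illustrative.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce over an iterable of documents."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = word_count(["big data meets the data center", "the cloud meets big data"])
print(counts["data"])  # 3
```

In a real cluster the map and reduce calls run on many machines and the shuffle moves data between them. Here all three phases run in one process so the data flow is easy to follow.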
  25. On-line
      - Twitter: @abeusher
      - E-mail:
      - Web:
      - Facebook:
      - This presentation:
  26. BACKUP
  27. HumanGeo Group
      What is HumanGeo: HumanGeo is an innovative software & data analysis company that enables Big Data Analytics for Decision Support, Social Media Monitoring, Market Intelligence, and Business Intelligence for Corporations and Government.
      We maximize the value of your data
      - Aggregating data sources and indicators
      - Monitoring trends and opportunities
      - Augmenting and enriching data with influence and sentiment indicators
      Tier One special operations intelligence and technology experts with Google experience and agility