
Thailand Hadoop Big Data Challenge #1

Thailand Hadoop Big Data Challenge #1: 13-15 March 2015

  1. Thailand Hadoop Big Data Challenge #1, 13-15 March 2015
  2. Special thanks to Amazon Web Services for providing AWS credits to run the EMR Hadoop cluster
  3. Schedule
       13 March
         – 16.00-18.00: Workshop / demo on Big Data analytics using Amazon EMR
         – 18.00: Registration opens for those interested in running the cluster for 30 hours; account access to Amazon EMR will be given
       14 March
         – 06.00: Amazon EMR cluster opens
         – Participants will discuss online / via social media
       15 March (at EGA Office)
         – 12.00: Amazon EMR cluster closes
         – 13.00: Each competitor presents their results
         – 15.30: Winner announcement
  4. Architecture Overview of Amazon EMR
  5. Hadoop Cluster for the challenge
       – 10 AWS m3.xlarge EC2 servers, each with 4 vCPUs, 15 GB memory and 80 GB SSD storage
       – A sample data set with more than 10 million records will be provided
  6. Challenge rules
       – Competitors can analyse the sample data with Hive, Pig or Map/Reduce
       – Competitors can also use their own large data set
       – The winner will be the competitor with the best innovation / result from the analytics
       – Those who would just like to try using the cluster are also welcome
  7. Judging Criteria
       – Complexity of the problem & data set: 30%
       – Benefit to society: 20%
       – Innovation: 30%
       – Presentation: 20%
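
The four weights above add up to 100%. The slides do not say how the judges combine their marks, so the following is only a minimal sketch of one plausible reading: each criterion is marked on an assumed 0-100 scale and the marks are combined as a weighted sum. The function name `total_score`, the short criterion keys and the example marks are hypothetical.

```python
# Weights in percent, taken from the judging criteria slide.
WEIGHTS = {
    "complexity": 30,    # complexity of the problem & data set
    "benefit": 20,       # benefit to society
    "innovation": 30,
    "presentation": 20,
}


def total_score(marks):
    """Combine per-criterion marks (assumed to be on a 0-100 scale) into one weighted total."""
    missing = set(WEIGHTS) - set(marks)
    if missing:
        raise ValueError(f"missing marks for: {sorted(missing)}")
    return sum(WEIGHTS[name] * marks[name] for name in WEIGHTS) / 100


# Hypothetical marks for one team, for illustration only.
example = {"complexity": 80, "benefit": 70, "innovation": 90, "presentation": 60}
print(total_score(example))
```

With equal marks on every criterion the total equals that mark, which is a quick sanity check that the weights are normalised.
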
  8. Judges
       – Assoc. Prof. Dr. Jirapun Daengdej
       – Mr. Danairat Thanabodithammachari
       – Dr. Thanachart Numnonda
       – Ms. Nantawan Wongkachonkitti
  9. Awards
       – The overall winner will receive an Apple TV
       – Two winners will be selected for two free training courses:
           – Big Data using Hadoop Workshop, 30-31 March 2015
           – Business Intelligence Design and Process, 18-20 and 25-26 May 2015
       – Starbucks Card, 200 Baht
  10. EMR Cluster Setup (this will be done by IMC Institute)
  11. Big Data Hadoop on Amazon EMR – Hands On Workshop (Thanachart Numnonda, thanachart@imcinstitute.com, Feb 2015): Select EMR
  12. Creating a cluster in EMR
  13. Creating a cluster in EMR (cont.): name the cluster and specify the log folder
  14. Creating a cluster in EMR (cont.): leave the Software Configuration as default
  15. Creating a cluster in EMR (cont.): leave the Hardware Configuration as default; choose an existing EC2 key pair
  16. Creating a cluster in EMR (cont.): leave the other settings as default; select Create Cluster
  17. EMR Cluster Details: note the Master public DNS. To see how to connect to the master node using SSH, click SSH
  18. Running the cluster
  19. Set Up an SSH Tunnel to the Master Node
       – See the instructions at http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-ssh-tunnel.html
  20. SSH Instruction for Mac/Linux
  21. SSH Instruction for Windows
  22. Connect to the master node
  23. Launch the Hue Web Interface
       – Set up an SSH tunnel to the master node; see the instructions at http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-ssh-tunnel.html
       – Configure proxy settings to view websites; see the instructions at http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-connect-master-node-proxy.html
  24. Launch the Hue Web Interface (cont.): http://master-public-dns-name:8888/
  26. Web Interfaces Hosted on the EMR Cluster
  27. Running Hive Demo
  28. MovieLens Data (http://grouplens.org/datasets/movielens/)
       – MovieLens 10M (http://files.grouplens.org/datasets/movielens/ml-10m.zip)
           – ratings.dat
           – users.dat
           – movies.dat
  29. Transfer Data to Hadoop Cluster: wget http://files.grouplens.org/datasets/movielens/ml-10m.zip
  30. Change data format
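
The transcript does not preserve how the data format was changed on this slide. The MovieLens `.dat` files separate fields with `::`, and the next slide uploads a `movies.csv`, so the step was presumably a delimiter conversion. The following is a minimal sketch of one way to do that, not the presenter's actual command; the function name `dat_to_csv` is hypothetical. It uses Python's `csv` module so that titles containing commas are quoted rather than split into extra columns.

```python
import csv
import io


def dat_to_csv(src, dst, sep="::"):
    """Copy '::'-separated records from src to dst as CSV rows; return the row count."""
    writer = csv.writer(dst, lineterminator="\n")
    count = 0
    for line in src:
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        writer.writerow(line.split(sep))
        count += 1
    return count


# Two records in the MovieID::Title::Genres layout of movies.dat.
sample = io.StringIO(
    "1::Toy Story (1995)::Adventure|Animation|Children|Comedy|Fantasy\n"
    "11::American President, The (1995)::Comedy|Drama|Romance\n"
)
out = io.StringIO()
rows = dat_to_csv(sample, out)
print(out.getvalue(), end="")
```

For a real file, the same function can be called with `open("movies.dat", encoding="utf-8")` as the source and `open("movies.csv", "w", newline="", encoding="utf-8")` as the destination. Note that a Hive table declared with a plain comma delimiter will not interpret the quotes, so a CSV-aware table definition, or a separator character that never occurs in the data, may be the better choice.
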
  31. Upload Data to Amazon S3: hadoop fs -put movies.csv s3://imcinstitute/data
  32. Running Hive from CLI
  33. Running Hive from Hue
  34. Running Example: https://github.com/myui/hivemall/wiki/MovieLens-Dataset
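
The queries run in the Hive demo are not preserved in the transcript. As a rough illustration of the kind of group-by aggregation such a demo might compute, the sketch below works out the rating count and mean rating per movie locally in Python, which can be handy for checking a cluster result against a small extract. It assumes the `UserID::MovieID::Rating::Timestamp` layout of `ratings.dat`; the function name `average_ratings` and the sample rows are made up for illustration.

```python
from collections import defaultdict


def average_ratings(lines, sep="::"):
    """Return {movie_id: (rating_count, mean_rating)} from ratings.dat-style lines."""
    totals = defaultdict(lambda: [0, 0.0])  # movie_id -> [count, sum of ratings]
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        _user_id, movie_id, rating, _timestamp = line.split(sep)
        entry = totals[int(movie_id)]
        entry[0] += 1
        entry[1] += float(rating)
    return {mid: (n, total / n) for mid, (n, total) in totals.items()}


# Made-up rows for illustration, not taken from the real data set.
sample = [
    "1::122::5::838985046",
    "2::122::3::868245777",
    "2::185::4.5::868245777",
]
for movie_id, (n, mean) in sorted(average_ratings(sample).items()):
    print(movie_id, n, mean)
```

On the cluster the same result would come from grouping the ratings table by movie ID and taking a count and an average, with Hive distributing the work across the nodes.
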
  35. Data Challenge
  36. Flight Details Data: http://stat-computing.org/dataexpo/2009/the-data.html
  37. Data Description
  38. Snapshot of Dataset
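
The data description and snapshot on these slides were images, so their text is not in the transcript. As a hedged starting point for exploring the flight data, the sketch below computes the mean arrival delay per carrier from CSV rows. It assumes the files have a header row with `UniqueCarrier` and `ArrDelay` columns and mark missing delays as `NA`; check this against the data description before relying on it. The function name `mean_arrival_delay` and the sample rows are hypothetical.

```python
import csv
import io
from collections import defaultdict


def mean_arrival_delay(rows):
    """Return {carrier: mean ArrDelay in minutes}, skipping rows with a missing delay."""
    sums = defaultdict(lambda: [0, 0])  # carrier -> [flights counted, total delay]
    for row in rows:
        delay = row["ArrDelay"]
        if delay in ("", "NA"):
            continue  # cancelled or diverted flights have no arrival delay
        entry = sums[row["UniqueCarrier"]]
        entry[0] += 1
        entry[1] += int(delay)
    return {carrier: total / n for carrier, (n, total) in sums.items()}


# Made-up rows with only the columns this sketch needs.
sample = io.StringIO(
    "Year,Month,UniqueCarrier,ArrDelay\n"
    "2008,1,WN,-14\n"
    "2008,1,WN,2\n"
    "2008,1,XE,NA\n"
    "2008,1,XE,30\n"
)
result = mean_arrival_delay(csv.DictReader(sample))
print(result)
```

A single year of this data is small enough to test locally this way; the full multi-year set is where Hive, Pig or Map/Reduce on the cluster becomes worthwhile.
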
  39. Register for the challenge
  40. Registration
       – Provide your name, organization, mobile number and e-mail address
       – On-site registration at 17.00, 13 March
       – E-mail: contact@imcinstitute.com
       – Facebook message to Thanachart Numnonda
       – Your username, password, key and public DNS will be sent to your e-mail by 06.00, 14 March
  41. On-line communication
       – Facebook Group: Hadoop-Thailand
       – Line group
       – Facebook message
       – E-mail to contact@imcinstitute.com
  42. www.facebook.com/imcinstitute
  43. Thank you
       – thanachart@imcinstitute.com
       – www.facebook.com/imcinstitute
       – www.slideshare.net/imcinstitute
       – www.thanachart.org
