An introduction to Apache Mahout

An introduction to Apache Mahout: what is it and
how does it work? What is machine intelligence?
How can Mahout be installed and tested on Hadoop?


Transcript

  • 1. Apache Mahout
      ● What is it?
      ● How does it work?
      ● Machine Learning
      ● Algorithms
      ● Install
    www.semtech-solutions.co.nz info@semtech-solutions.co.nz
  • 2. Mahout – What is it?
      ● Machine learning
      ● For large data
      ● Based on Hadoop
      ● But can work on a non-Hadoop cluster
      ● Scalable
      ● Licensed under the Apache License
  • 3. Mahout – How does it work?
      ● Uses Hadoop MapReduce
      ● Has many supplied algorithms
      ● Supports four use cases
          – Recommendation mining
          – Clustering
          – Classification
          – Frequent itemset mining
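The MapReduce pattern behind these use cases can be sketched in a few lines. This is a minimal, single-process illustration and not Mahout or Hadoop code: the basket data and the function names (map_phase, shuffle, reduce_phase) are hypothetical. It counts how often pairs of items appear together, which is the basic step in frequent itemset mining.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy data: each record is one shopping basket.
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
]

def map_phase(basket):
    """Emit (item_pair, 1) for every pair of distinct items in one basket."""
    for pair in combinations(sorted(set(basket)), 2):
        yield pair, 1

def shuffle(mapped):
    """Group emitted values by key, as the framework does between the phases."""
    groups = defaultdict(list)
    for key, value in mapped:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Sum the counts collected for one key."""
    return key, sum(values)

# Run the three steps in sequence; on a cluster the map and reduce
# calls would be spread across many machines.
mapped = [kv for basket in baskets for kv in map_phase(basket)]
pair_counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())

for pair, count in sorted(pair_counts.items()):
    print(pair, count)
```

Because each basket is mapped independently and each key is reduced independently, both phases can run in parallel, which is what makes the approach suitable for large data.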
  • 4. Mahout – Machine Learning
    Machine learning: what does it mean?
      ● A branch of artificial intelligence
      ● Systems that learn from data
      ● Classify data after learning
      ● Learn on training data sets
      ● Generalisation
          – the ability to classify unseen data sets after learning
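The learn-then-generalise idea can be shown with a very small classifier. This is a sketch under stated assumptions, not one of Mahout's algorithms: it uses a nearest-centroid rule, and the training points, labels, and function names are made up for illustration.

```python
import math

def train(samples):
    """Learn one centroid (mean point) per label from labelled training data."""
    sums, counts = {}, {}
    for point, label in samples:
        acc = sums.setdefault(label, [0.0] * len(point))
        for i, x in enumerate(point):
            acc[i] += x
        counts[label] = counts.get(label, 0) + 1
    return {
        label: tuple(s / counts[label] for s in acc)
        for label, acc in sums.items()
    }

def classify(model, point):
    """Assign an unseen point to the label whose centroid is closest."""
    return min(model, key=lambda label: math.dist(model[label], point))

# Hypothetical training set: (features, label)
training = [
    ((1.0, 1.0), "small"),
    ((1.5, 2.0), "small"),
    ((8.0, 8.0), "large"),
    ((9.0, 9.5), "large"),
]

model = train(training)

# Generalisation: these points were not in the training set.
print(classify(model, (2.0, 1.0)))
print(classify(model, (7.0, 9.0)))
```

The model never sees the last two points during training, yet it can still label them. That is the generalisation step described on the slide.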
  • 5. Mahout – Algorithms
    Some of the available algorithms (among many others)
      – Collaborative filtering
          ● Narrow sense: make predictions about a user's interests by collecting preferences from many users
          ● General sense: multi-agent collaboration for information filtering
      – Mean shift clustering
          ● Mode seeking, used for visual tracking
      – Parallel frequent pattern mining
          ● Find item sets that frequently occur together
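Collaborative filtering in the narrow sense can be sketched as follows. This is an illustrative user-based variant with cosine similarity, chosen as an assumption for the example; it is not Mahout's recommender API, and the ratings and names are hypothetical.

```python
import math

# Hypothetical preference data: user -> {item: rating}
ratings = {
    "ann": {"a": 5, "b": 4, "c": 1},
    "bob": {"a": 4, "b": 5, "d": 5},
    "cat": {"c": 5, "d": 1, "e": 4},
}

def cosine(u, v):
    """Cosine similarity between two users' rating dictionaries."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[i] * v[i] for i in shared)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v)

def recommend(user, ratings):
    """Rank items the user has not rated by similarity-weighted average rating."""
    scores, weights = {}, {}
    for other, prefs in ratings.items():
        if other == user:
            continue
        sim = cosine(ratings[user], prefs)
        if sim <= 0:
            continue
        for item, rating in prefs.items():
            if item in ratings[user]:
                continue  # only predict for unseen items
            scores[item] = scores.get(item, 0.0) + sim * rating
            weights[item] = weights.get(item, 0.0) + sim
    predicted = {item: scores[item] / weights[item] for item in scores}
    return sorted(predicted, key=predicted.get, reverse=True)

print(recommend("ann", ratings))
```

Here "ann" rates items much like "bob" does, so bob's high rating for item "d" outweighs cat's low one and "d" is ranked first.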
  • 6. Mahout – Install
    So how do we install Mahout and test it?
      – Install Maven
          ● sudo apt-get install maven3
      – Install Apache Mahout
          ● You will need Subversion installed
          ● svn co http://svn.apache.org/repos/asf/mahout/trunk
          ● Go to the directory containing the pom.xml file
              – mvn install ## in ./trunk
    Full details are available in the Mahout install guide on our web site shop
  • 7. Mahout – Test Install
    So let us run a test
      ● cd $MAHOUT_HOME/examples/bin
      ● ./build-reuters.sh
      ● Choose option 1, kmeans clustering
      ● Should finish with the output shown on the next slide
  • 8. Mahout – Test Install
        cd $MAHOUT_HOME/examples/bin ; ./build-reuters.sh

        Please call cluster-reuters.sh directly next time. This file is going away.
        Please select a number to choose the corresponding clustering algorithm
        1. kmeans clustering
        2. fuzzykmeans clustering
        3. lda clustering
        Enter your choice : 1
        ok. You chose 1 and we'll use kmeans Clustering
        .................................
        Inter-Cluster Density: NaN
        Intra-Cluster Density: 0.0
        CDbw Inter-Cluster Density: NaN
        CDbw Intra-Cluster Density: NaN
        CDbw Separation: NaN
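For readers unfamiliar with the kmeans option chosen in the test, the algorithm can be sketched as below. This is a minimal single-machine illustration and not Mahout's implementation: it assumes 2-D points, Euclidean distance, and hand-picked starting centroids, all of which are hypothetical choices made for the example.

```python
import math

def kmeans(points, centroids, max_iter=100):
    """Alternate assignment and centroid update until the centroids stop moving."""
    clusters = []
    for _ in range(max_iter):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: math.dist(p, centroids[i]),
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        new_centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return centroids, clusters

# Hypothetical data: two obvious groups of points.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0),
          (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, clusters = kmeans(points, [(1.0, 1.0), (8.0, 8.0)])

for centroid, cluster in zip(centroids, clusters):
    print(centroid, cluster)
```

The Reuters example applies the same idea to vectors built from news articles rather than to 2-D points.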
  • 9. Contact Us
      ● Feel free to contact us at
          – www.semtech-solutions.co.nz
          – info@semtech-solutions.co.nz
      ● We offer IT project consultancy
      ● We are happy to hear about your problems
      ● You pay only for the hours you need to solve your problems