Using Jython To Prototype Mahout Code

Presentation to the DC BigData meetup #2 on using Mahout via jython

  • First we built a travel booking tool.
    Then we integrated it with expense and built reporting.
    Then we went back and built the trip data storage subsystem to handle increased volumes of data.
    Now we are trying to put the combined travel and expense data into Hadoop to do analysis and leverage the knowledge of our customers for their benefit.
  • So Mahout looked like it might be a good way to bootstrap our efforts around building recommendations. If nothing else, it might be a fast path to v1 while we write more specialized algorithms tuned to our specific data sets as a v2.
  • It’s very cool: Jim Hugunin started both projects as tests: jython to see if the jvm would be faster than python’s vm, and IronPython to “prove” the CLR was slow compared to e.g. the JVM (it wasn’t).
    Yeah, jython’s definitely on the cutting edge with python 2.5 support.
  • Mahout appears to be a good system for doing recommendation engines. We need to find out how good, and what its strengths and limitations are.

    I do know some java; enough to do some light recreational Android programming. But not only do I know python, the data scientist who will actually determine the optimal factors to build our recommendation engine on knows python. She also doesn’t know java (yet?). So I have a tool that the team is familiar with.

    Just building Mahout so I could test it out was painful enough. It requires maven2 to build, but since this is an existing project it was all configured for me to just build after downloading. But I still find it painful to watch maven work.

    I shuddered at the thought of having to actually do the maven setup for a new project that would have to be built.

    Most importantly here, what you end up with when you make Mahout accessible via jython is a rapid prototyping/testing/experimentation tool for building out Mahout code. We’ve taken out the ceremony. That’s all.

    When you’re done figuring out what you need to do, you could then move to compiled java for speed.

    But, for many/most applications, you can probably stop at the jython version. The actual Mahout processing is the serious limiting factor here, not the jython code. My suspicion is that there’s far more performance to be gained optimizing the actual Mahout implementation than moving the jython code (which is compiled to jvm bytecode by the time it runs) to java/scala/clojure.
  • The single largest chunk of my time was actually spent trying to decide what jars I had to append to my jython path, followed by really grokking the jython path/import stuff.

    As you can see, after enough time I just punted on the jar dependencies. Every single jar is on the path, although I only import from the ones I need. It is worth some research into jython to see whether I’m adding any overhead beyond a longer search path, such as opening/inspecting the jars. I suspect not.

    Now, if you knew maven, it might take less time to start a new project and get it up than I would take, but once *I* was done, every subsequent jython script takes almost no time to set up, and the project is ready to run as soon as you’ve saved your source code.

    We can work without having to either build a new app for every experiment, or build in some way to control which experiment runs in some ever-growing app.
  • I haven’t really tested either PyCharm or PyDev to do these things. Someone else can do *that* lightning talk at a later meetup.
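
The notes above mention that most of the time went into working out which jars were needed. One way to narrow that down, offered here only as a sketch and not as anything from the talk, is to treat each jar as the zip archive it is and look for class files under a given package. The function name `jars_providing` and its arguments are hypothetical, and the code uses only the standard `glob`, `os` and `zipfile` modules, with syntax kept simple enough that it should also run under jython 2.5.

```python
import glob
import os
import zipfile


def jars_providing(package, jar_dir):
    """Return the names of the jars in jar_dir that hold classes under package.

    package is a dotted Java package name, e.g. "org.apache.mahout.cf.taste".
    This helper is illustrative; it is not part of Mahout or jython.
    """
    # Class files live in a jar under the package name with "/" separators.
    prefix = package.replace(".", "/") + "/"
    matches = []
    for jar in sorted(glob.glob(os.path.join(jar_dir, "*.jar"))):
        archive = zipfile.ZipFile(jar)
        try:
            for name in archive.namelist():
                if name.startswith(prefix) and name.endswith(".class"):
                    matches.append(os.path.basename(jar))
                    break  # one hit is enough to list this jar
        finally:
            archive.close()
    return matches
```

Calling `jars_providing("org.apache.mahout.cf.taste", os.environ["MAHOUT_JAR_DIR"])` would list the candidate jars for the recommender classes. It does not resolve transitive dependencies, so putting every jar on the path remains the lower-effort route.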

Using Jython To Prototype Mahout Code: Presentation Transcript

  • Using Jython To Prototype Mahout Code
    Jonathan Altman
    Principal Engineer, Concur
    Twitter: @async_io
  • Who Am I and What Do I Do?
    • By day: principal engineer at Concur
    • Architect of high-volume travel booking site
    • Architect of travel data model for itinerary storage/expense integration
    • Currently team lead for effort to leverage our travel and spend data into an effective recommendation engine
  • What is Mahout?
    • Java library of pre-built implementations of various machine learning tasks
    • Recommenders: collaborative filtering
    • Clustering: grouping things by similarity
    • Classification: analysis of a corpus for clustering
    • Intended to run against Hadoop-based data sets
    • http://mahout.apache.org/
  • What is jython?
    • Implementation of python that runs against the jvm
    • Has full access to any well-behaved java library
    • Started in 1997 by Jim Hugunin, who also later did IronPython for the .Net CLR
    • Version 2.5.2 mirrors python 2.5
    • http://www.jython.org/
  • Why Do This?
    • I needed to evaluate Mahout’s suitability as the toolkit for our travel recommender system
    • I am not primarily a java dev (yet?), and I don’t know how to create a maven project
    • But I do know python
    • Fastest way between 2 points is a straight line
    • Step 1: adapt sample code from “Mahout In Action” to jython
  • How Do I Do This?

        import glob
        import os
        import sys

        # Add Mahout jars to jython's path
        sys.path.append(os.environ.get("MAHOUT_CORE"))
        for jar in glob.glob(os.environ.get("MAHOUT_JAR_DIR") + "/*.jar"):
            sys.path.append(jar)

        # import classes from Mahout jar...
        from java.io import File
        from org.apache.mahout.cf.taste.impl.model.file import *
        # Bunch of imports deleted

        def main():
            # and we are using the imported FileDataModel
            model = FileDataModel(File(sys.argv[1]))
  • What Did We Learn?
    • About 3 hours to port first “Mahout In Action” example to jython
    • 3 minutes to port the second
    • Includes learning how to import jars into python
    • And building a nice loop to punt on jar dependency management :-)
    • Increases ability to experiment with ideas in Mahout by reducing ceremony
  • Want Some Extra Stuff?
    • Python IDEs that work with jython:
      – PyCharm (JetBrains)
      – PyDev (Eclipse add-on)
      – WingIDE (no debugger)
    • Ported GroupLens 100k data set example from section 2.5 of “Mahout In Action” is at https://gist.github.com/1041033
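
The slides show only the path setup and the first line of `main()`. As a rough illustration of where such a port might go next, here is a minimal sketch of a user-based recommender built from Mahout's Taste classes. It is not the author's gist. The helper names `add_jars` and `recommend` and their default arguments are hypothetical. The Mahout imports are deferred into `recommend` so that the module still loads where Mahout is absent; `recommend` itself works only under jython with the Mahout jars already on `sys.path`.

```python
import glob
import os
import sys


def add_jars(jar_dir, path=None):
    """Append every *.jar in jar_dir to path (sys.path by default).

    Returns the list of jars that were newly added. Hypothetical helper that
    wraps the loop from the slide; under jython this makes the classes in
    those jars importable.
    """
    if path is None:
        path = sys.path
    added = []
    for jar in sorted(glob.glob(os.path.join(jar_dir, "*.jar"))):
        if jar not in path:
            path.append(jar)
            added.append(jar)
    return added


def recommend(ratings_csv, user_id, how_many=1, neighbours=2):
    """Return (item_id, estimated_preference) pairs for user_id.

    Assumes jython, with the Mahout jars already added via add_jars().
    """
    # Deferred imports: these resolve only under jython with Mahout available.
    from java.io import File
    from org.apache.mahout.cf.taste.impl.model.file import FileDataModel
    from org.apache.mahout.cf.taste.impl.neighborhood import NearestNUserNeighborhood
    from org.apache.mahout.cf.taste.impl.recommender import GenericUserBasedRecommender
    from org.apache.mahout.cf.taste.impl.similarity import PearsonCorrelationSimilarity

    model = FileDataModel(File(ratings_csv))
    similarity = PearsonCorrelationSimilarity(model)
    neighbourhood = NearestNUserNeighborhood(neighbours, similarity, model)
    recommender = GenericUserBasedRecommender(model, neighbourhood, similarity)
    return [(item.getItemID(), item.getValue())
            for item in recommender.recommend(user_id, how_many)]
```

Under jython, a script would call `add_jars(os.environ["MAHOUT_JAR_DIR"])` first, append the `MAHOUT_CORE` jar as the slide does, and then call `recommend(sys.argv[1], 1)`. Keeping the path setup in its own function means each new experiment script only has to import it.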