
Using Jython To Prototype Mahout Code

Presentation to the DC BigData meetup #2 on using Mahout via jython



1. Using Jython To Prototype Mahout Code
Jonathan Altman, Principal Engineer, Concur
Twitter: @async_io
2. Who Am I and What Do I Do?
• By day: principal engineer at Concur
• Architect of a high-volume travel booking site
• Architect of the travel data model for itinerary storage/expense integration
• Currently team lead for an effort to turn our travel and spend data into an effective recommendation engine
3. What is Mahout?
• Java library of pre-built implementations of various machine learning tasks
• Recommenders: collaborative filtering
• Clustering: grouping things by similarity
• Classification: assigning items to known categories, learned from labeled training examples
• Many of its algorithms are built to run against Hadoop-based data sets
• http://mahout.apache.org/
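To make "collaborative filtering" concrete, here is a plain-Python sketch of the user-based variety: score each pair of users by the Pearson correlation of their shared ratings, keep the nearest N users as a neighborhood, and rank items the target user has not rated by a similarity-weighted average of the neighbors' ratings. This only illustrates the idea; it is not Mahout's implementation and will not reproduce Mahout's scores. The function names `pearson` and `recommend` and the toy ratings are hypothetical.

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation of two users' ratings over the items both rated."""
    common = [item for item in a if item in b]
    n = len(common)
    if n < 2:
        return 0.0  # too little overlap to correlate
    xs = [a[item] for item in common]
    ys = [b[item] for item in common]
    mean_x = sum(xs) / float(n)
    mean_y = sum(ys) / float(n)
    cov = sum([(x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)])
    var_x = sum([(x - mean_x) ** 2 for x in xs])
    var_y = sum([(y - mean_y) ** 2 for y in ys])
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant rater has no defined correlation
    return cov / sqrt(var_x * var_y)

def recommend(prefs, user, n_neighbors=2, how_many=1):
    """User-based CF: rank unseen items by similarity-weighted neighbor ratings."""
    others = [(pearson(prefs[user], prefs[o]), o) for o in prefs if o != user]
    others.sort(reverse=True)  # most similar users first
    scores, weights = {}, {}
    for sim, other in others[:n_neighbors]:
        if sim <= 0:
            continue  # skip uncorrelated or anti-correlated neighbors
        for item, rating in prefs[other].items():
            if item not in prefs[user]:
                scores[item] = scores.get(item, 0.0) + sim * rating
                weights[item] = weights.get(item, 0.0) + sim
    ranked = [(scores[item] / weights[item], item) for item in scores]
    ranked.sort(reverse=True)
    return ranked[:how_many]

# Hypothetical toy ratings: {user id: {item id: rating}}
prefs = {
    1: {101: 5.0, 102: 3.0, 103: 2.5},
    2: {101: 2.0, 102: 2.5, 103: 5.0, 104: 2.0},
    3: {101: 4.5, 102: 3.5, 103: 2.0, 105: 4.0},
}
print(recommend(prefs, 1))  # [(4.0, 105)]
```

User 3 rates items much like user 1 does, so user 3's rating of item 105 drives the recommendation; user 2 rates in the opposite direction and is ignored. The code sticks to Python 2.5-era syntax, so the same idea can be tried from Jython before reaching for Mahout's classes.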
4. What is Jython?
• Implementation of Python that runs on the JVM
• Has full access to any well-behaved Java library
• Started in 1997 by Jim Hugunin, who later also created IronPython for the .NET CLR
• Version 2.5.2 mirrors Python 2.5
• http://www.jython.org/
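A small sketch of what "full access to Java libraries" looks like: under Jython, Java packages import with ordinary Python `import` syntax. The helper names `running_on_jvm` and `jvm_version` are hypothetical; the check relies on Jython reporting a `sys.platform` that starts with "java", so the same file also loads under CPython and simply skips the Java import there.

```python
import sys

def running_on_jvm():
    """Jython's sys.platform starts with 'java' followed by the JVM version."""
    return sys.platform.startswith("java")

def jvm_version():
    """Return the JVM's version string under Jython, or None elsewhere."""
    if not running_on_jvm():
        return None
    # Under Jython, Java packages import just like Python modules.
    from java.lang import System
    return System.getProperty("java.version")

print(jvm_version())  # prints None under CPython
```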
5. Why Do This?
• I needed to evaluate Mahout’s suitability as the toolkit for our travel recommender system
• I am not primarily a Java dev (yet?), and I don’t know how to create a Maven project
• But I do know Python
• Fastest way between 2 points is a straight line
• Step 1: adapt sample code from “Mahout In Action” to Jython
6. How Do I Do This?

```python
import glob
import os
import sys

from java.io import File

# Add Mahout jars to jython's path
sys.path.append(os.environ["MAHOUT_CORE"])
for jar in glob.glob(os.path.join(os.environ["MAHOUT_JAR_DIR"], "*.jar")):
    sys.path.append(jar)

# import classes from Mahout jar…
from org.apache.mahout.cf.taste.impl.model.file import FileDataModel
# Bunch of imports deleted

def main():
    # and we are using the imported FileDataModel
    model = FileDataModel(File(sys.argv[1]))

if __name__ == "__main__":
    main()
```
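A sketch of where the loaded `FileDataModel` might go next: wiring it into a user-based recommender with Mahout's Taste classes (Pearson similarity, a nearest-N user neighborhood, `GenericUserBasedRecommender`). It assumes the Mahout jars are already on `sys.path` via the jar loop in the snippet above, and that the data file holds `userID,itemID,preference` lines. The function name `top_recommendations` and the script name `recommend.py` are hypothetical, and this is not the author's ported code. The Mahout imports sit inside the function only so the file still loads where Mahout is absent.

```python
import sys

def top_recommendations(data_file, user_id, how_many=1, neighbors=2):
    """Run a Mahout user-based recommender from Jython (Mahout jars on sys.path)."""
    # Imported here so this module still loads where Mahout is unavailable.
    from java.io import File
    from org.apache.mahout.cf.taste.impl.model.file import FileDataModel
    from org.apache.mahout.cf.taste.impl.neighborhood import NearestNUserNeighborhood
    from org.apache.mahout.cf.taste.impl.recommender import GenericUserBasedRecommender
    from org.apache.mahout.cf.taste.impl.similarity import PearsonCorrelationSimilarity

    model = FileDataModel(File(data_file))
    similarity = PearsonCorrelationSimilarity(model)
    neighborhood = NearestNUserNeighborhood(neighbors, similarity, model)
    recommender = GenericUserBasedRecommender(model, neighborhood, similarity)
    # recommend() returns a java.util.List of RecommendedItem objects,
    # which Jython lets us iterate like a Python list.
    return [(item.getItemID(), item.getValue())
            for item in recommender.recommend(user_id, how_many)]

if __name__ == "__main__":
    if len(sys.argv) != 3:
        print("usage: jython recommend.py <ratings.csv> <user-id>")
    else:
        for item_id, score in top_recommendations(sys.argv[1], int(sys.argv[2])):
            print("%d %.3f" % (item_id, score))
```

Keeping the Mahout wiring in one function makes it easy to swap in a different similarity or neighborhood class from the Jython prompt and rerun, which is the kind of low-ceremony experimenting this approach is for.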
7. What Did We Learn?
• About 3 hours to port the first “Mahout In Action” example to Jython
• 3 minutes to port the second
• The 3 hours included learning how to import jars into Jython
• And building a nice loop to punt on jar dependency management :-)
• Jython makes it easier to experiment with ideas in Mahout by reducing ceremony
8. Want Some Extra Stuff?
• Python IDEs that work with Jython:
  – PyCharm (JetBrains)
  – PyDev (Eclipse add-on)
  – WingIDE (no debugger)
• Ported GroupLens 100k data set example from section 2.5 of “Mahout In Action” is at https://gist.github.com/1041033
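Before pointing Mahout at the GroupLens 100k ratings, a quick plain-Python sanity check of the file can save a confusing debugging session. This sketch assumes the data set's `u.data` layout (tab-separated user id, item id, rating, timestamp); the function name `summarize_ratings` and the sample lines are hypothetical.

```python
def summarize_ratings(lines):
    """Count distinct users, distinct items and ratings in tab-separated lines.

    Assumes the GroupLens 100k u.data layout:
    user id <TAB> item id <TAB> rating <TAB> timestamp
    """
    users, items, count = set(), set(), 0
    for line in lines:
        fields = line.strip().split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        users.add(int(fields[0]))
        items.add(int(fields[1]))
        count += 1
    return len(users), len(items), count

# Hypothetical sample lines; for the real file, pass open("u.data") instead.
sample = ["1\t10\t4\t0", "1\t20\t3\t0", "2\t10\t5\t0", ""]
print(summarize_ratings(sample))  # (2, 2, 3)
```

Per the data set's README, the full 100k file should come out to 943 users, 1,682 items and 100,000 ratings.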
