Use dependency injection to get Hadoop out of your application code

Opower @ Hadoop Summit North America
June 27, 2013
Eric Chang
Technology Lead, Data Services
Opower

Agenda
1. Problem Statement
2. Solution
3. Example
4. Opower case study
5. Wrap up

Problem Statement
“Hadoop is hard, let’s go shopping!”,
or Effective Separation of Concerns in
Hadoop

Problem Statement
» Why Separation of Concerns?
• Integration/migration of existing code
• Allows for code re-use
• Allows for different levels of expertise
• Greater testability
» Hadoop doesn’t do Separation of Concerns
• Serialization, input/output formats, and partitioning are not portable
• Provides little guidance or out-of-the-box functionality for integrating code components (existing or new)

Solution
Dependency Injection, or
“Don’t call us, we’ll call you”

Solution: DI, illustrated
[Figure: aRealtimeCallFromTheWeb() { businessService.run() }. Inside the IoC container, BizServiceImpl (<<BusinessService>>) depends on <<ReadDAO>> and <<WriteDAO>>, which are bound to RealtimeReadDAO and RealtimeWriteDAO. Both DAOs talk to the Realtime DataStore.]

Solution: DI, illustrated
[Figure: reduce(key, values, context) { businessService.run() }. Inside the IoC container, BizServiceImpl (<<BusinessService>>) depends on <<ReadDAO>> and <<WriteDAO>>, which are now bound to ValuesBackedReadDAO and ContextBackedWriteDAO.]
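
The two "DI, illustrated" slides can be sketched in plain Java. This is a minimal illustration under assumptions, not code from the talk: the interface and class names come from the slides, but the method signatures (readAll, write), the uppercase placeholder logic, the List that stands in for the MR context, and the hand-wiring that stands in for the IoC container are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Interface names follow the slides; method signatures are hypothetical.
interface ReadDAO { Iterable<String> readAll(); }
interface WriteDAO { void write(String result); }
interface BusinessService { void run(); }

// Business logic depends only on the two DAO contracts.
final class BizServiceImpl implements BusinessService {
    private final ReadDAO readDAO;
    private final WriteDAO writeDAO;

    BizServiceImpl(ReadDAO readDAO, WriteDAO writeDAO) {
        this.readDAO = readDAO;
        this.writeDAO = writeDAO;
    }

    @Override
    public void run() {
        for (String record : readDAO.readAll()) {
            writeDAO.write(record.toUpperCase()); // placeholder "business logic"
        }
    }
}

// Reduce-side read DAO: serves the values handed to reduce().
final class ValuesBackedReadDAO implements ReadDAO {
    private final Iterable<String> values;
    ValuesBackedReadDAO(Iterable<String> values) { this.values = values; }
    @Override public Iterable<String> readAll() { return values; }
}

// Reduce-side write DAO: a List plays the role of the MR context here.
final class ContextBackedWriteDAO implements WriteDAO {
    private final List<String> context;
    ContextBackedWriteDAO(List<String> context) { this.context = context; }
    @Override public void write(String result) { context.add(result); }
}

final class DiIllustrated {
    // Hand-wiring in place of an IoC container: only the DAO bindings would
    // differ between the realtime setup and the reduce() setup.
    static List<String> runInReduce(Iterable<String> values) {
        List<String> context = new ArrayList<>();
        BusinessService service = new BizServiceImpl(
                new ValuesBackedReadDAO(values), new ContextBackedWriteDAO(context));
        service.run();
        return context;
    }

    public static void main(String[] args) {
        System.out.println(runInReduce(Arrays.asList("a", "b")));
    }
}
```

BizServiceImpl never learns which pair of DAOs it was given, which is the point of the slides: swapping the bindings moves the same logic from a web call into a reducer.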

Example
Small-batch, Artisanal WordCount
-> Petabyte-scale WordCount*
*healthy suspension of disbelief required
refs http://wiki.apache.org/hadoop/WordCount

Example: Artisanal WordCount
» You live in a borough of NYC and have a beard
» You’ve built a great business around counting words, one at
a time, in small, handcrafted batches in linear O(n) time
» You receive files from customers and run your simple but
effective code
» You had the foresight to know that some day you would need to
scale up, so you created a properly componentized
architecture:
• Domain objects
• Data access layer
• Service layer (application logic)

[Class diagram]
WordCountDTO: word : String, count : int
<<WordCountDAO>>: getWords() : Iterable<String>, writeWordCount(count : WordCountDTO); implemented by ArtisanalWordCountDAO
<<WordCountService>>: countWord(word : String); implemented by WordCountServiceImpl
Flow:
1. Retrieve words
2. Count words
3. Write count

» Core business logic: WordCountServiceImpl.countWord()
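public void countWord(String word) {
    // Scan every word the DAO provides and count exact matches.
    int wordCount = 0;
    for (String nextWord : wordCountDAO.getWords()) {
        if (nextWord.equals(word)) {
            ++wordCount;
        }
    }
    // Hand the result back to the DAO; the service never touches storage directly.
    WordCountDTO wordCountDTO = new WordCountDTO(word, wordCount);
    wordCountDAO.writeWordCount(wordCountDTO);
}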
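
Because countWord() reaches storage only through WordCountDAO, it can be exercised without files or Hadoop, which is the "greater testability" claimed earlier. The following is a minimal sketch, assuming the DTO and DAO shapes from the class diagram. InMemoryWordCountDAO, the accessor names, and the constructor injection are hypothetical and not taken from the talk.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class WordCountDTO {
    private final String word;
    private final int count;
    WordCountDTO(String word, int count) { this.word = word; this.count = count; }
    String getWord() { return word; }   // hypothetical accessor names
    int getCount() { return count; }
}

interface WordCountDAO {
    Iterable<String> getWords();
    void writeWordCount(WordCountDTO count);
}

interface WordCountService { void countWord(String word); }

// Same logic as the slide, with the DAO passed in through the constructor.
final class WordCountServiceImpl implements WordCountService {
    private final WordCountDAO wordCountDAO;
    WordCountServiceImpl(WordCountDAO wordCountDAO) { this.wordCountDAO = wordCountDAO; }

    @Override
    public void countWord(String word) {
        int wordCount = 0;
        for (String nextWord : wordCountDAO.getWords()) {
            if (nextWord.equals(word)) {
                ++wordCount;
            }
        }
        wordCountDAO.writeWordCount(new WordCountDTO(word, wordCount));
    }
}

// Hypothetical test double: serves words from a list and records what is written.
final class InMemoryWordCountDAO implements WordCountDAO {
    private final List<String> words;
    final List<WordCountDTO> written = new ArrayList<>();
    InMemoryWordCountDAO(List<String> words) { this.words = words; }
    @Override public Iterable<String> getWords() { return words; }
    @Override public void writeWordCount(WordCountDTO count) { written.add(count); }
}

final class InMemoryWordCountDemo {
    // Runs the service against the in-memory DAO and returns the single result written.
    static WordCountDTO count(String word, String... input) {
        InMemoryWordCountDAO dao = new InMemoryWordCountDAO(Arrays.asList(input));
        new WordCountServiceImpl(dao).countWord(word);
        return dao.written.get(0);
    }

    public static void main(String[] args) {
        WordCountDTO dto = count("hat", "bat", "cat", "hat", "mat", "hat");
        System.out.println(dto.getWord() + ": " + dto.getCount()); // prints "hat: 2"
    }
}
```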

» IoC configuration (Google Guice)
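import com.google.inject.AbstractModule;

public class WordCountGuiceModule extends AbstractModule {
    ...
    @Override
    protected void configure() {
        // The service implementation is fixed; only the DAO instance varies per environment.
        bind(WordCountService.class).to(WordCountServiceImpl.class);
        bind(WordCountDAO.class).toInstance(this.wordCountDAO);
    }
}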

» Artisanal WordCount wiring and execution
// Bind the file-backed DAO, then let Guice build the service around it.
WordCountDAO wordCountDAO = new ArtisanalWordCountDAO(inFile, outFile);
WordCountService wordCountService = Guice.createInjector(
        new WordCountGuiceModule(wordCountDAO)
    ).getInstance(WordCountService.class);
for (String word : getWordsToCount()) {
    wordCountService.countWord(word);
}
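
The slides do not show ArtisanalWordCountDAO itself. One way a file-backed DAO could look is sketched below, purely as an assumption: the Path constructor arguments, the whitespace-separated input format, the "word<TAB>count" output format, the accessor names, and the roundTrip helper are all hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class WordCountDTO {
    private final String word;
    private final int count;
    WordCountDTO(String word, int count) { this.word = word; this.count = count; }
    String getWord() { return word; }   // hypothetical accessor names
    int getCount() { return count; }
}

interface WordCountDAO {
    Iterable<String> getWords();
    void writeWordCount(WordCountDTO count);
}

// Hypothetical file-backed DAO: reads whitespace-separated words from inFile
// and appends each result to outFile as "word<TAB>count".
final class ArtisanalWordCountDAO implements WordCountDAO {
    private final Path inFile;
    private final Path outFile;

    ArtisanalWordCountDAO(Path inFile, Path outFile) {
        this.inFile = inFile;
        this.outFile = outFile;
    }

    @Override
    public Iterable<String> getWords() {
        List<String> words = new ArrayList<>();
        try {
            for (String line : Files.readAllLines(inFile, StandardCharsets.UTF_8)) {
                for (String token : line.trim().split("\\s+")) {
                    if (!token.isEmpty()) {
                        words.add(token);
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return words;
    }

    @Override
    public void writeWordCount(WordCountDTO count) {
        String line = count.getWord() + "\t" + count.getCount();
        try {
            Files.write(outFile, Collections.singletonList(line), StandardCharsets.UTF_8,
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

final class ArtisanalDaoDemo {
    // Writes text to a temp file, counts one word through the DAO contract,
    // and returns the lines that ended up in the output file.
    static List<String> roundTrip(String text, String word) {
        try {
            Path in = Files.createTempFile("words", ".txt");
            Path out = Files.createTempFile("counts", ".txt");
            Files.write(in, text.getBytes(StandardCharsets.UTF_8));
            WordCountDAO dao = new ArtisanalWordCountDAO(in, out);
            int matches = 0;
            for (String w : dao.getWords()) {
                if (w.equals(word)) {
                    matches++;
                }
            }
            dao.writeWordCount(new WordCountDTO(word, matches));
            return Files.readAllLines(out, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```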

[Figure: artisanalWordCount() { wordCountService.countWord(“hat”) }. Inside the IoC container, WordCountServiceImpl's <<WordCountDAO>> is bound to ArtisanalWordCountDAO, which reads the input words: bat, cat, hat, mat, hat, sat, rat, …]

Example: Petabyte WordCount
» Indie days are over: petabytes of words!
» O(n) won’t cut it
» Hadoop to the rescue. You partition by word in your map
phase. Your reduce method looks like:
public void reduce(Text key, Iterable<IntWritable> values, Context context)
» MapReduceWordCountDAO fulfills the WordCountDAO
contract (more on this later)
» WordCountDTOs are written to an MR context and collected

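[Figure: inside reduce(), wordCountService.countWord(key.toString()) runs against the IoC container, where WordCountServiceImpl's <<WordCountDAO>> is bound to MapReduceWordCountDAO. The input words (bat, cat, hat, mat, hat, sat, cat, …) arrive grouped by key: bat: <1>, cat: <1,1>, hat: <1,1>.]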
» Petabyte WordCount wiring and execution
public void reduce(Text key, Iterable<IntWritable> values, Context ctx) {
    // Same module and service as the artisanal version; only the DAO implementation changes.
    MapReduceWordCountDAO wordCountDAO = new MapReduceWordCountDAO(key, values, ctx);
    WordCountService wordCountService = Guice.createInjector(
            new WordCountGuiceModule(wordCountDAO)
        ).getInstance(WordCountService.class);
    wordCountService.countWord(key.toString());
}
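
The slides defer the details of MapReduceWordCountDAO. One way it could satisfy the WordCountDAO contract is sketched below under stated assumptions: plain Java types stand in for Hadoop's Text, IntWritable, and Context so the example runs without Hadoop on the classpath; the mapper is assumed to emit a count of 1 per occurrence with no combiner; and the accessor names and the ReduceDemo helper are hypothetical. This is not the implementation from the talk.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.BiConsumer;

final class WordCountDTO {
    private final String word;
    private final int count;
    WordCountDTO(String word, int count) { this.word = word; this.count = count; }
    String getWord() { return word; }   // hypothetical accessor names
    int getCount() { return count; }
}

interface WordCountDAO {
    Iterable<String> getWords();
    void writeWordCount(WordCountDTO count);
}

// String, Iterable<Integer>, and BiConsumer stand in for Text,
// Iterable<IntWritable>, and the reducer Context.
final class MapReduceWordCountDAO implements WordCountDAO {
    private final String key;
    private final Iterable<Integer> values;
    private final BiConsumer<String, Integer> context;

    MapReduceWordCountDAO(String key, Iterable<Integer> values,
                          BiConsumer<String, Integer> context) {
        this.key = key;
        this.values = values;
        this.context = context;
    }

    // Lazily replays the reduce key once per grouped value, so the unchanged
    // countWord() loop arrives at the right total without buffering the values.
    @Override
    public Iterable<String> getWords() {
        return () -> {
            final Iterator<Integer> it = values.iterator();
            return new Iterator<String>() {
                @Override public boolean hasNext() { return it.hasNext(); }
                @Override public String next() { it.next(); return key; }
            };
        };
    }

    // Forwards the result to the "context" instead of a file.
    @Override
    public void writeWordCount(WordCountDTO count) {
        context.accept(count.getWord(), count.getCount());
    }
}

final class ReduceDemo {
    // Mirrors what reduce() plus countWord() would do, using only the DAO contract.
    static List<String> reduce(String key, Iterable<Integer> values) {
        List<String> emitted = new ArrayList<>();
        WordCountDAO dao = new MapReduceWordCountDAO(
                key, values, (word, count) -> emitted.add(word + ": " + count));
        int matches = 0;
        for (String w : dao.getWords()) {
            if (w.equals(key)) {
                matches++;
            }
        }
        dao.writeWordCount(new WordCountDTO(key, matches));
        return emitted;
    }

    public static void main(String[] args) {
        System.out.println(reduce("cat", Arrays.asList(1, 1))); // prints "[cat: 2]"
    }
}
```

In a real reducer the values iterable can be traversed only once. That is enough here, because countWord() calls getWords() a single time per key.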

Opower case study: Bill Projection
» Opower in 5 bullet points
• We
• Help
• People like you & me
• Reduce
• Your energy usage
» … by working with utility companies to analyze energy
usage and provide actionable insights
» One of the ways we do this is via Bill Projection

» How it works
• Retrieve energy usage (kWh, therms)
• Forecast usage
• Apply rates to project costs
[Figure: rates are fed into the Rate Engine; example projected cost shown: $30.]

» DI used to employ the same code components for batch
and in-process, synchronous (real-time) calculations
[Figure: the same Bill Projection code components serve both batch M/R calculations and in-process calculations. Other labels shown: curated data inputs, results validation, and the delivery channels web, email, sms, ivr.]

[Figure: batch path. map() { billForecastService.forecast() }, with HBase as the data source. Inside the Spring IoC container, BillForecastServiceImpl (<<BillForecastService>>) depends on <<UsageDAO>>, bound to MRUsageDAO, and on <<RateEngine>>, bound to RateEngineImpl.]
[Figure: in-process path. calculateBillProjection() { billForecastService.forecast() }. Inside the Spring IoC container, BillForecastServiceImpl (<<BillForecastService>>) depends on <<UsageDAO>>, bound to HBaseUsageDAO, which reads from HBase, and on <<RateEngine>>, bound to RateEngineImpl.]
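
The shape of the Bill Projection wiring can be sketched in plain Java. Only the interface names (BillForecastService, UsageDAO, RateEngine) and forecast() come from the slides. Every signature, the naive average-based forecast, the flat per-kWh rate, and the sample numbers are assumptions made for illustration, and none of it reflects Opower's actual calculation.

```java
import java.util.Arrays;
import java.util.List;

// Interface names follow the slides; every signature below is hypothetical.
interface UsageDAO { List<Double> getDailyUsageKwh(); }
interface RateEngine { double costOf(double kwh); }
interface BillForecastService { double forecast(); }

// Placeholder rate logic: a flat price per kWh.
final class FlatRateEngine implements RateEngine {
    private final double pricePerKwh;
    FlatRateEngine(double pricePerKwh) { this.pricePerKwh = pricePerKwh; }
    @Override public double costOf(double kwh) { return kwh * pricePerKwh; }
}

final class BillForecastServiceImpl implements BillForecastService {
    private final UsageDAO usageDAO;
    private final RateEngine rateEngine;
    private final int daysInBillingPeriod;

    BillForecastServiceImpl(UsageDAO usageDAO, RateEngine rateEngine,
                            int daysInBillingPeriod) {
        this.usageDAO = usageDAO;
        this.rateEngine = rateEngine;
        this.daysInBillingPeriod = daysInBillingPeriod;
    }

    @Override
    public double forecast() {
        // 1. Retrieve energy usage.
        List<Double> daily = usageDAO.getDailyUsageKwh();
        if (daily.isEmpty()) {
            return 0.0;
        }
        // 2. Forecast usage (naive: average daily usage times days in the period).
        double total = 0.0;
        for (double kwh : daily) {
            total += kwh;
        }
        double projectedKwh = (total / daily.size()) * daysInBillingPeriod;
        // 3. Apply rates to project costs.
        return rateEngine.costOf(projectedKwh);
    }
}

final class BillProjectionDemo {
    // The service and rate engine stay the same; only the UsageDAO is swapped.
    static double project(UsageDAO usageDAO) {
        return new BillForecastServiceImpl(usageDAO, new FlatRateEngine(0.25), 30).forecast();
    }

    public static void main(String[] args) {
        // Batch side: usage already delivered to map() as input.
        UsageDAO fromMapInput = () -> Arrays.asList(4.0, 4.0, 4.0);
        // In-process side: usage fetched on demand from a store (a fixed list here).
        UsageDAO fromStore = () -> Arrays.asList(3.0, 5.0);
        System.out.println(project(fromMapInput));
        System.out.println(project(fromStore));
    }
}
```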

» Benefits of DI solution
• We were able to reuse the pre-Hadoop Rate Engine code component
• Calculations can be applied in batch and/or in real-time
• Good test coverage

Wrap up
» Dependency Injection + Hadoop gives you
• Separation of Concerns
• Batch and real-time calculations using the same code
» Some limitations
• Requires code that is already sufficiently componentized
• Assumes domain classes can survive MR partitioning
• Somebody still has to know MR
» Opower employs DI + Hadoop to serve up Bill Projections
using a mixed batch + real-time workflow

Wrap up
» Questions?
Eric Chang
Technical Lead, Data Services
Opower
eric@opower.com
http://www.linkedin.com/in/ericgchang
Artisanal WordCount example:
https://github.com/opower/artisanal-word-count
