Opower @ Hadoop Summit North America
Use Dependency Injection to
get Hadoop out of your
application code
June 27, 2013
Eric Chang
Technology Lead, Data Services
Opower
Agenda
1. Problem Statement
2. Solution
3. Example
4. Opower case study
5. Wrap up
Problem Statement
“Hadoop is hard, let’s go shopping!”, or Effective Separation of Concerns in Hadoop
Problem Statement
» Why Separation of Concerns?
• Integration/migration of existing code
• Allows for code re-use
• Allows for different levels of expertise
• Greater testability
» Hadoop doesn’t do Separation of Concerns
• Serialization, input/output formats, and partitioning are not portable
• Provides little guidance or out-of-the-box functionality for integrating code components (existing or new)
Solution
Dependency Injection, or “Don’t call us, we’ll call you”
Solution: DI, illustrated (real-time)
aRealtimeCallFromTheWeb() {
    businessService.run()
}
[Diagram: an IoC container manages <<BusinessService>> BizServiceImpl; its <<ReadDAO>> and <<WriteDAO>> dependencies are bound to RealtimeReadDAO and RealtimeWriteDAO, both backed by a Realtime DataStore]
Solution: DI, illustrated (MapReduce)
reduce(key, values, context) {
    businessService.run()
}
[Diagram: the same IoC container and BizServiceImpl, but <<ReadDAO>> is now bound to a ValuesBackedReadDAO and <<WriteDAO>> to a ContextBackedWriteDAO, both scoped to the reduce() call]
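As a concrete (if minimal) sketch of what those reduce-scoped adapters might look like in Java: the class names come from the diagram, but the interface methods (readValues(), write()) and the element types are invented for illustration and are not from the deck.

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Illustrative stand-ins for the diagram's <<ReadDAO>>/<<WriteDAO>>.
interface ReadDAO { List<Integer> readValues(); }
interface WriteDAO { void write(String key, int value); }

// Read side: an ephemeral pass-through over the reducer's values iterable.
class ValuesBackedReadDAO implements ReadDAO {
    private final Iterable<IntWritable> values;

    ValuesBackedReadDAO(Iterable<IntWritable> values) { this.values = values; }

    @Override
    public List<Integer> readValues() {
        List<Integer> result = new ArrayList<>();
        for (IntWritable v : values) {
            result.add(v.get());  // business code only ever sees plain ints
        }
        return result;
    }
}

// Write side: a front-end to the reducer's Context (i.e. the output format).
class ContextBackedWriteDAO implements WriteDAO {
    private final Reducer<Text, IntWritable, Text, IntWritable>.Context context;

    ContextBackedWriteDAO(Reducer<Text, IntWritable, Text, IntWritable>.Context context) {
        this.context = context;
    }

    @Override
    public void write(String key, int value) {
        try {
            context.write(new Text(key), new IntWritable(value));
        } catch (IOException | InterruptedException e) {
            throw new RuntimeException(e);
        }
    }
}

The key point is that each adapter is constructed inside reduce() and discarded afterward, so the business service only ever sees plain Java types.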
Example
Small-batch, Artisanal WordCount -> Petabyte-scale WordCount*
*healthy suspension of disbelief required
ref: http://wiki.apache.org/hadoop/WordCount
Example: Artisanal WordCount
» You live in a borough of NYC and have a beard
» You’ve built a great business around counting words, one at a time, in small, handcrafted batches in linear O(n) time
» You receive files from customers and run your simple but effective code
» You had the foresight to know that some day you would need to scale up, so you created a properly componentized architecture:
• Domain objects
• Data access layer
• Service layer (application logic)
Example: Artisanal WordCount
[Class diagram]
WordCountDTO { word : String; count : int }
<<WordCountService>> countWord(word : String), implemented by WordCountServiceImpl
<<WordCountDAO>> getWords() : Iterable<String>, writeWordCount(count : WordCountDTO), implemented by ArtisanalWordCountDAO
Flow: 1. Retrieve words, 2. Count words, 3. Write count
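Spelled out as Java, the contracts on this diagram might look as follows; the accessor names on WordCountDTO are an assumption, since the slide only shows the fields.

// Domain object and contracts as named on the slide; the constructor and
// getters are a plausible reading of the diagram, not shown in the deck.
class WordCountDTO {
    private final String word;
    private final int count;

    WordCountDTO(String word, int count) {
        this.word = word;
        this.count = count;
    }

    String getWord() { return word; }
    int getCount() { return count; }
}

interface WordCountDAO {
    Iterable<String> getWords();
    void writeWordCount(WordCountDTO count);
}

interface WordCountService {
    void countWord(String word);
}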
Example: Artisanal WordCount
» Core business logic: WordCountServiceImpl.countWord()
public void countWord(String word) {
    int wordCount = 0;
    for (String nextWord : wordCountDAO.getWords()) {
        if (nextWord.equals(word)) {
            ++wordCount;
        }
    }
    WordCountDTO wordCountDTO = new WordCountDTO(word, wordCount);
    wordCountDAO.writeWordCount(wordCountDTO);
}
Example: Artisanal WordCount
» IoC configuration (Google Guice)
public class WordCountGuiceModule extends AbstractModule {
    ...
    @Override
    protected void configure() {
        bind(WordCountService.class)
            .to(WordCountServiceImpl.class);
        bind(WordCountDAO.class)
            .toInstance(this.wordCountDAO);
    }
}
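The “...” on the slide elides the module’s fields and constructor. Given that the wiring code on the next slide constructs the module with a DAO, a plausible completion is the following (an assumption, not confirmed by the deck):

import com.google.inject.AbstractModule;

// Assumed completion of the elided "...": the module is handed whichever
// DAO implementation the environment calls for, so one module serves both
// the artisanal and the MapReduce wirings.
public class WordCountGuiceModule extends AbstractModule {
    private final WordCountDAO wordCountDAO;

    public WordCountGuiceModule(WordCountDAO wordCountDAO) {
        this.wordCountDAO = wordCountDAO;
    }

    @Override
    protected void configure() {
        bind(WordCountService.class).to(WordCountServiceImpl.class);
        bind(WordCountDAO.class).toInstance(this.wordCountDAO);
    }
}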
Example: Artisanal WordCount
» Artisanal WordCount wiring and execution
WordCountDAO wordCountDAO =
    new ArtisanalWordCountDAO(inFile, outFile);
WordCountService wordCountService =
    Guice.createInjector(new WordCountGuiceModule(wordCountDAO))
         .getInstance(WordCountService.class);

for (String word : getWordsToCount()) {
    wordCountService.countWord(word);
}
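For completeness, here is a minimal sketch of what ArtisanalWordCountDAO could look like, consistent with the speaker notes (it re-scans the input file on every getWords() call). The real class lives in the linked artisanal-word-count repo; this version, including its whitespace tokenization and tab-separated output, is a guess.

import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;

// Sketch of the file-backed DAO: linear scan on every read, append on write.
class ArtisanalWordCountDAO implements WordCountDAO {
    private final File inFile;
    private final File outFile;

    ArtisanalWordCountDAO(File inFile, File outFile) {
        this.inFile = inFile;
        this.outFile = outFile;
    }

    @Override
    public Iterable<String> getWords() {
        // Re-reads the whole customer file on each call: O(n), small batches.
        try {
            List<String> words = new ArrayList<>();
            for (String line : Files.readAllLines(inFile.toPath())) {
                for (String word : line.split("\\s+")) {
                    if (!word.isEmpty()) {
                        words.add(word);
                    }
                }
            }
            return words;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public void writeWordCount(WordCountDTO count) {
        // Appends "word<TAB>count" to the output file.
        try (FileWriter writer = new FileWriter(outFile, true)) {
            writer.write(count.getWord() + "\t" + count.getCount() + "\n");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}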
Example: Artisanal WordCount
artisanalWordCount() {
    wordCountService.countWord(“hat”)
}
[Diagram: the IoC container binds <<WordCountService>> to WordCountServiceImpl and <<WordCountDAO>> to the file-backed ArtisanalWordCountDAO; the input file contains: bat, cat, hat, mat, hat, sat, rat, …]
Example: Petabyte WordCount
» Indie days are over: petabytes of words!
» O(n) won’t cut it
» Hadoop to the rescue. You partition by word in your map phase. Your reduce method looks like:
public void reduce(Text key, Iterable<IntWritable> values, Context context)
» MapReduceWordCountDAO fulfills the WordCountDAO contract (more on this later; a sketch follows below)
» WordCountDTOs are written to an MR context and collected
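The deck never shows MapReduceWordCountDAO itself; a reconstruction consistent with the slides might look like this. It assumes the map phase emits (word, 1) pairs with no combiner, so replaying the key once per value lets the unchanged countWord() logic arrive at the right total.

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Reconstruction: the reducer's (key, values) pair is the read side, the
// Context is the write side. Lifetime is scoped to a single reduce() call.
class MapReduceWordCountDAO implements WordCountDAO {
    private final Text key;
    private final Iterable<IntWritable> values;
    private final Reducer<Text, IntWritable, Text, IntWritable>.Context context;

    MapReduceWordCountDAO(Text key,
                          Iterable<IntWritable> values,
                          Reducer<Text, IntWritable, Text, IntWritable>.Context context) {
        this.key = key;
        this.values = values;
        this.context = context;
    }

    @Override
    public Iterable<String> getWords() {
        // The map phase emitted (word, 1), so one value = one occurrence;
        // replay the key once per value for the service's equality count.
        List<String> words = new ArrayList<>();
        for (IntWritable ignored : values) {
            words.add(key.toString());
        }
        return words;
    }

    @Override
    public void writeWordCount(WordCountDTO count) {
        try {
            context.write(new Text(count.getWord()), new IntWritable(count.getCount()));
        } catch (IOException | InterruptedException e) {
            throw new RuntimeException(e);
        }
    }
}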
Example: Petabyte WordCount
reduce(key, values, context) {
    wordCountService.countWord(key.toString())
}
[Diagram: the same IoC container now binds <<WordCountDAO>> to MapReduceWordCountDAO; raw input words (bat, cat, hat, mat, hat, sat, cat, …) arrive at the reducer grouped by the shuffle as bat: <1>, cat: <1,1>, hat: <1,1>, …]
Example: Petabyte WordCount
» Petabyte WordCount wiring and execution
public void reduce(Text key, Iterable<IntWritable> values, Context ctx) {
    MapReduceWordCountDAO wordCountDAO =
        new MapReduceWordCountDAO(key, values, ctx);
    WordCountService wordCountService =
        Guice.createInjector(new WordCountGuiceModule(wordCountDAO))
             .getInstance(WordCountService.class);
    wordCountService.countWord(key.toString());
}
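The map side isn’t shown in the deck; the canonical WordCount mapper would supply the partition-by-word step:

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Standard WordCount-style mapper: emit (word, 1) so the framework groups
// all occurrences of a word into a single reduce() call.
public class WordCountMapper
        extends Mapper<LongWritable, Text, Text, IntWritable> {

    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        StringTokenizer tokenizer = new StringTokenizer(value.toString());
        while (tokenizer.hasMoreTokens()) {
            word.set(tokenizer.nextToken());
            context.write(word, ONE);
        }
    }
}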
Opower case study: Bill Projection
» Opower in 5 bullet points
• We
• Help
• People like you & me
• Reduce
• Your energy usage
» … by working with utility companies to analyze energy usage and provide actionable insights
» One of the ways we do this is via Bill Projection
Opower case study: Bill Projection
» How it works
• Retrieve energy usage (kWh, therms)
• Forecast usage
• Apply rates to project costs
[Diagram: usage history flows into the Rate Engine, which applies rates to produce a projected bill, e.g. $30]
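To make the three steps concrete, here is a hedged sketch of the contracts they imply. All signatures and the naive averaging forecast are invented for illustration; the deck only names the components.

import java.util.List;

// Illustrative contracts for retrieve / forecast / apply-rates.
interface UsageDAO {
    List<Double> getUsage(long customerId);   // historical kWh or therms
}

interface RateEngine {
    double applyRates(double forecastUsage);  // usage -> projected cost
}

interface BillForecastService {
    double forecast(long customerId);
}

class BillForecastServiceImpl implements BillForecastService {
    private final UsageDAO usageDAO;
    private final RateEngine rateEngine;

    BillForecastServiceImpl(UsageDAO usageDAO, RateEngine rateEngine) {
        this.usageDAO = usageDAO;
        this.rateEngine = rateEngine;
    }

    @Override
    public double forecast(long customerId) {
        // 1. Retrieve usage; 2. forecast it (a naive average here);
        // 3. apply rates to project the cost.
        List<Double> usage = usageDAO.getUsage(customerId);
        double forecastUsage = usage.stream()
                .mapToDouble(Double::doubleValue)
                .average()
                .orElse(0.0);
        return rateEngine.applyRates(forecastUsage);
    }
}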
Opower case study: Bill Projection
» DI used to employ the same code components for batch and in-process, synchronous (real-time) calculations
[Diagram: curated data inputs feed batch M/R calculations; web, email, SMS, and IVR channels drive in-process calculations; both paths run the same Bill Projection code components, with results validation downstream]
Opower case study: Bill Projection
[Diagram: batch path. map() reads from HBase; inside reduce(key, values, context) { billForecastService.forecast() }, a Spring IoC container wires <<BillForecastService>> BillForecastServiceImpl to an MR-backed <<UsageDAO>> (MRUsageDAO) and a <<RateEngine>> (RateEngineImpl), with a MapReduceDAO writing results back through the context]
Opower case study: Bill Projection
calculateBillProjection() {
    billForecastService.forecast()
}
[Diagram: real-time path. The same Spring IoC container wires BillForecastServiceImpl to an HBase-backed <<UsageDAO>> (HBaseUsageDAO) and the same <<RateEngine>> (RateEngineImpl), reading directly from HBase]
Opower case study: Bill Projection
» Benefits of DI solution
• Able to reuse the pre-Hadoop Rate Engine code component
• Calculations can be applied in batch and/or in real-time
• Good test coverage
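In Spring terms, the real-time side of this swap might be configured roughly as below. This is a sketch under assumptions: the bean method names and stub bodies are invented, and the batch side would instead build a short-lived context per reduce() call around an MR-backed UsageDAO, as in the Guice example earlier.

import java.util.Collections;
import java.util.List;

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.context.annotation.Profile;

// Stubs standing in for the implementations named in the deck.
class RateEngineImpl implements RateEngine {
    @Override
    public double applyRates(double forecastUsage) {
        return forecastUsage * 0.12;  // placeholder rate, purely illustrative
    }
}

class HBaseUsageDAO implements UsageDAO {
    @Override
    public List<Double> getUsage(long customerId) {
        return Collections.emptyList();  // a real version would scan HBase
    }
}

@Configuration
public class BillProjectionConfig {

    @Bean
    public RateEngine rateEngine() {
        // The same pre-Hadoop component serves both batch and real-time.
        return new RateEngineImpl();
    }

    @Bean
    @Profile("realtime")
    public UsageDAO usageDAO() {
        // Only the data-access binding changes between environments.
        return new HBaseUsageDAO();
    }

    @Bean
    public BillForecastService billForecastService(UsageDAO usageDAO,
                                                   RateEngine rateEngine) {
        return new BillForecastServiceImpl(usageDAO, rateEngine);
    }
}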
Wrap up
» Dependency Injection + Hadoop gives you
• Separation of Concerns
• Batch and real-time calculations using the same code
» Some limitations
• Assumes code is sufficiently componentized
• Assumes domain classes can survive MR partitioning
• Somebody still has to know MR
» Opower employs DI + Hadoop to serve up Bill Projections using a mixed batch + real-time workflow
Wrap up
» Questions?
Eric Chang
Technology Lead, Data Services
Opower
eric@opower.com
http://www.linkedin.com/in/ericgchang
Artisanal WordCount example:
https://github.com/opower/artisanal-word-count
Editor's Notes
1. Quick public service announcement: there was a small typo on the printed schedule that listed this session as "user dependency injection," when in fact I'll be talking about *using* dependency injection. So if you are coming to hear about user dependency injection, I'm sorry to disappoint, but you're in the wrong room. Maybe if there's some time at the end, we can brainstorm on what user dependency injection might be. In any case, my name is Eric Chang and I'm the technology lead of the data services team at Opower.
- We build code infrastructure on top of Hadoop and HBase.
- We're also practitioners solving problems for Opower's customers. These are pretty interesting problems that I'll talk about later on.
- As practitioners, we use the tools at hand to solve problems, and in our case that has historically included dependency injection.
Now, dependency injection isn't a mainline topic at Hadoop Summit and is in fact a pretty well established approach, but I'm hoping to convince you that what's old can be new again when applied in the right way in your Hadoop infrastructure.
2. "Hadoop is hard." Of course, I'm being tongue in cheek, but there is a broader point here, namely the principle of separation of concerns. Relying on that principle, I'll make the bold claim that there are parts of your code whose focus is *not* Hadoop interactions. For these parts of your code, Hadoop should be "hard"/"not my concern" or, even better, entirely invisible. The most salient example of this is core application/business logic that should be focused entirely on higher-level business functionality, not Hadoop plumbing. So the challenge posed is: how do we build an effective separation of concerns and deploy Hadoop-agnostic code to our cluster?
3. Let's dive a little deeper into the justification for Separation of Concerns to frame this discussion. Why does one need a separation of concerns? While quite a few Hadoop deployments are greenfield from a code perspective, you might find yourself in the position (as we did) of having to migrate existing code components to Hadoop. Irrespective of whether your code is brand new or legacy code, separation of concerns via good code componentization enables re-use in interesting ways, as we'll see later in the talk. Keeping some parts of your code blissfully ignorant of MapReduce plumbing allows for more focus within your organization: you can have developers with a focus on core business logic who aren't distracted by Hadoop plumbing, and you can have developers with a focus on Hadoop infrastructure who aren't bogged down in the details of complex business logic. Finally, while there are testing frameworks like MRUnit, by definition a test that ends at the map() or reduce() method boundary is fairly coarse grained. Code componentization lends itself better to more granular (and lighter weight) testing.
4. The solution to our separation of concerns problem is dependency injection. It provides a means by which we can have application logic that doesn't explicitly configure its dependencies but instead interacts with abstractions (interfaces). Implementations of these interfaces are managed by an Inversion of Control container such as Spring or Guice. A main application injects appropriate implementations of these interfaces into an Inversion of Control container, then retrieves and invokes methods on managed code components.
5. Even if you are already familiar with Dependency Injection, or DI for short, these next two slides help set the groundwork for the rest of the concepts in this talk. Let's start by illustrating use of code components in a traditional real-time access pattern, and then show how they can be adapted to Hadoop. We start with our application/business logic, which is defined by the BusinessService interface and implemented by the BizServiceImpl code component that you see in the lower right hand corner. The service interacts with a ReadDAO interface that describes a means of retrieving domain objects. DAO in this case is shorthand for the Data Access Object design pattern, which is used to encapsulate access to an underlying data store. The service also interacts with a WriteDAO to save domain objects. A disclaimer for those of you familiar with the DAO access pattern: bifurcating read and write operations into separate interfaces isn't something we would normally do in production code, but it's helpful for the purposes of this illustration. The service, along with its associated DAOs, is managed by an Inversion of Control container. Since this is a real-time context, we'll inject implementations of both DAOs that are backed by a real-time data store such as HBase. At runtime, a servlet container processes a user request and delegates to the aRealtimeCallFromTheWeb method, which:
- Requests an instance of the business service managed by the IoC container.
- Invokes the run() method on the service to execute our core application logic.
Let's walk through that again: our business service interacts with read and write interfaces; we configure our container to return realtime read and write DAOs that are backed by a realtime data store; when a realtime request is made, the business service is invoked and pulls the data it needs from the realtime store through these DAOs. As you can see, Dependency Injection allows calling code to interact with interfaces instead of concrete implementations. We'll see how this works to our advantage on the next slide.
6. Now let's take the same code components and see how DI can support re-use of that code in a Hadoop MapReduce context. Say you're writing code that executes in the reduce phase of a MapReduce job. Remember that our application code, represented by BizServiceImpl, is supposed to remain entirely ignorant of Hadoop. One way to think of the parameters to a reduce method: the keys and values are a data source, and the context is a data sink, a front end to an output format of some sort. Using these generalizations, we can construct a ReadDAO that provides a data source for domain objects; the domain objects are built from the keys and values passed into the reduce method and provided to the ReadDAO during its construction, before it is injected into the IoC container. Similarly, we inject a ContextBackedWriteDAO that uses the Reducer.Context as a data sink for any domain objects. The reduce method provides the appropriate DAOs to the IoC container at runtime, and then invokes the same method on an instance of the business service managed by the container. Digging a little deeper, all of this is made possible by short-lived IoC containers whose DAOs have a lifetime scoped to the reduce() method call. Compare this to the realtime case, where we have a persistent data store like HBase: there, the DAOs are long-lived because their backing store is long-lived. In the MapReduce case, the data store is the values iterable, which has new content on each subsequent invocation of reduce(). It wouldn't make sense to keep piling values into the same DAO instance, so instead we make it an ephemeral pass-through to the values iterable and discard it after each call to reduce(). A sketch of this reduce-scoped wiring follows.
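Reusing the interfaces from the previous sketch, here is one plausible shape for that reduce-scoped wiring. The adapter class names come from the slide's diagram; their bodies, and the Text-based generic types, are assumptions for illustration.

import com.google.inject.AbstractModule;
import com.google.inject.Guice;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Adapts the reducer's key/values into a domain-object data source.
class ValuesBackedReadDAO implements ReadDAO {
    private final List<String> records = new ArrayList<>();
    ValuesBackedReadDAO(Text key, Iterable<Text> values) {
        // Copy eagerly: Hadoop reuses the Text instances it hands us.
        for (Text value : values) records.add(key + "\t" + value);
    }
    @Override public Iterable<String> read() { return records; }
}

// Adapts the reducer context into a data sink fronting the output format.
class ContextBackedWriteDAO implements WriteDAO {
    private final Reducer<Text, Text, Text, Text>.Context context;
    ContextBackedWriteDAO(Reducer<Text, Text, Text, Text>.Context context) {
        this.context = context;
    }
    @Override public void write(String result) {
        try {
            context.write(new Text(result), new Text(""));
        } catch (IOException | InterruptedException e) {
            throw new RuntimeException(e);
        }
    }
}

class BizReducer extends Reducer<Text, Text, Text, Text> {
    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        // Ephemeral DAOs and container, scoped to this single reduce() call.
        ReadDAO readDAO = new ValuesBackedReadDAO(key, values);
        WriteDAO writeDAO = new ContextBackedWriteDAO(context);
        Guice.createInjector(new AbstractModule() {
            @Override protected void configure() {
                bind(BusinessService.class).to(BizServiceImpl.class);
                bind(ReadDAO.class).toInstance(readDAO);
                bind(WriteDAO.class).toInstance(writeDAO);
            }
        }).getInstance(BusinessService.class).run(); // same call as the realtime path
    }
}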
7. To demonstrate more concretely how dependency injection can help you with Hadoop, we'll take the traditional WordCount example and flip it on its head. Following recent Hollywood trends, we'll tell the WordCount origin story, from before it was a massively parallel, petabyte-scale example of how to use Hadoop. Don't worry if you're not familiar with WordCount; just imagine that there was a time when people used to count words one at a time, in small, artisanal batches. As the disclaimer says, this is a somewhat contrived example, but it's useful for the sake of illustration.
8. Imagine you live in a borough of NYC and have a beard, and that you've built a great business around counting words in small, artisanal batches, in linear time. You get files from your customers and process them, one at a time, using your elegantly simple code. But you knew you'd have to scale up at some point, so you componentized your code, as we'll see on the next slide.
9. Let's look at the way the code has been decomposed, starting with the domain-specific data transfer object you see here, the WordCountDTO. The transfer object pattern decouples our business logic from details like storage formats and persistence-layer implementation. As you can see, we are only concerned with capturing two items: the word, and the number of occurrences of that word in the input file provided to us. Our core application logic is encapsulated by the WordCountService interface and its countWord() method. For this example we'll have only one implementation of the interface, WordCountServiceImpl, which retrieves words from a provided WordCountDAO interface by calling getWords(), counts them, and writes the results back as WordCountDTOs through the WordCountDAO API by calling writeWordCount(). In the artisanal case, we'll be using an ArtisanalWordCountDAO implementation. As its name implies, on every call to getWords() this DAO opens the input file provided by your customer, scans it in its entirety, and returns all the words to the caller. Maybe it's a little smart and does some caching, but in general it's a linear-time implementation, because we're solving the problem at hand and dealing with small batches.
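Rendered as Java for concreteness, that contract might look like the sketch below; the accessor names are assumptions, everything else follows the decomposition just described.

// Domain transfer object: decouples business logic from storage details.
class WordCountDTO {
    private final String word;
    private final int count;
    WordCountDTO(String word, int count) {
        this.word = word;
        this.count = count;
    }
    String getWord() { return word; }
    int getCount() { return count; }
}

// Data access contract: a source of words and a sink for results.
interface WordCountDAO {
    Iterable<String> getWords();
    void writeWordCount(WordCountDTO count);
}

// Core business operation, ignorant of any particular data store.
interface WordCountService {
    void countWord(String word);
}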
10. Let's look at the implementation of WordCountServiceImpl. For every word provided to the countWord() method, it asks the WordCountDAO for the words from the word store. For each match in the returned list, it increments a counter. A WordCountDTO is then constructed with the results and written back through the WordCountDAO. This class is DAO-implementation-agnostic and has no knowledge of backing storage formats: it's just pure business logic.
11. Here we see how we configure the Inversion of Control container (in this case, Google Guice). Don't worry if you aren't familiar with this API; the main takeaway is that we always return ("bind") a WordCountServiceImpl when an implementor of the WordCountService interface is requested, while allowing any implementation of WordCountDAO to be injected. In other words, the WordCountService implementation stays constant, but we can vary the WordCountDAO implementation that gets returned.
12. The ArtisanalWordCount main class builds an ArtisanalWordCountDAO from a provided input file and a target output file. It injects the ArtisanalWordCountDAO into a Guice module, and then asks the module for an implementation of the WordCountService. We know the module will return the WordCountServiceImpl implementation we just reviewed, with the ArtisanalWordCountDAO injected. Let's assume there is a getWordsToCount() method that determines which words we're interested in counting, perhaps via command-line arguments or some other input file. For each word we're supposed to count, we call the countWord() method on the service we retrieved; a sketch follows.
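A minimal sketch of that main class, assuming the module's elided constructor accepts the DAO instance, and treating getWordsToCount() as the hypothetical helper mentioned above:

import com.google.inject.Guice;
import com.google.inject.Injector;
import java.io.File;
import java.util.Arrays;
import java.util.List;

class ArtisanalWordCount {
    public static void main(String[] args) {
        File input = new File(args[0]);
        File output = new File(args[1]);
        WordCountDAO dao = new ArtisanalWordCountDAO(input, output);

        // Hand the concrete DAO to the module; the module binds the service.
        Injector injector = Guice.createInjector(new WordCountGuiceModule(dao));
        WordCountService service = injector.getInstance(WordCountService.class);

        for (String word : getWordsToCount(args)) {
            service.countWord(word);
        }
    }

    // Hypothetical helper: which words does the customer care about?
    static List<String> getWordsToCount(String[] args) {
        return Arrays.asList(args).subList(2, args.length);
    }
}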
13. Here we represent the artisanal implementation using our DI illustration from earlier. The WordCountServiceImpl service is managed by the IoC container and is injected with an artisanal implementation of the DAO that reads a file one line at a time. The artisanalWordCount() method is a single-threaded batch process that invokes methods on the WordCountService interface to calculate word counts.
14. Fast forward a bit, and imagine the artisanal days are over: you now have customers who want petabytes of words counted, and linear time won't cut it. It just so happens that your boss used to work at Yahoo and suggests you look into Hadoop. You study MapReduce a bit, figure out how to partition words in your map phase, and arrive at the classic WordCount reduce method. You then build a MapReduceWordCountDAO that fulfills the WordCountDAO API contract and supports writing calculated WordCountDTOs back to the MR context, to be collected in a TextOutputFormat.
15. So how does DI give us the best of both worlds, letting us keep our small-batch code roots but apply them at scale? As mentioned earlier, we start by partitioning our input files by word and emitting a count for each word found. By the time we're ready to enter the reduce phase, we have keys that are words and values that are occurrence counts; so far, we're no different from the classic Hadoop WordCount example. Here's where we fork to employ some code re-use via DI. Breaking down the parameters to reduce(): the key is the word; the values are a list of IntWritables, one for each occurrence of the word; and the context fronts a TextOutputFormat. Going back to the principles called out earlier, the keys and values are our data source and the context is our data sink. The MapReduceWordCountDAO is constructed with the values, which it can sum to know how many times the word occurred in the file. Its getWords() method then merely "echoes" back the word the correct number of times. Additionally, MapReduceWordCountDAO writes a provided WordCountDTO to the Reducer context, which is written out to the TextOutputFormat at job completion. We wire the MapReduceWordCountDAO into the IoC container, and since it satisfies the WordCountDAO API contract, we can use the same WordCountServiceImpl that we used in the artisanal flow. We call countWord() and re-use the same core logic to count word occurrences.
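One plausible shape for that DAO, sketched under the behavior just described; the class name is from the talk, but the body is an illustrative guess.

import java.io.IOException;
import java.util.Collections;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

class MapReduceWordCountDAO implements WordCountDAO {
    private final String word;
    private final int occurrences;
    private final Reducer<Text, IntWritable, Text, IntWritable>.Context context;

    MapReduceWordCountDAO(Text key, Iterable<IntWritable> values,
                          Reducer<Text, IntWritable, Text, IntWritable>.Context context) {
        this.word = key.toString();
        int sum = 0;
        for (IntWritable v : values) sum += v.get();  // total occurrences of this word
        this.occurrences = sum;
        this.context = context;
    }

    @Override
    public Iterable<String> getWords() {
        return Collections.nCopies(occurrences, word); // "echo" the word the right number of times
    }

    @Override
    public void writeWordCount(WordCountDTO dto) {
        try {
            // The context fronts the job's TextOutputFormat.
            context.write(new Text(dto.getWord()), new IntWritable(dto.getCount()));
        } catch (IOException | InterruptedException e) {
            throw new RuntimeException(e);
        }
    }
}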
16. Here's how it's all wired up. In the reduce method, we construct an instance of a MapReduceWordCountDAO using the key, values, and context provided to the reduce method. We construct a Guice module with this DAO, and then ask it for the word count service so we can invoke the countWord() business method.
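In Java, that reduce method might look like this minimal sketch, again assuming the module's constructor takes the DAO:

import com.google.inject.Guice;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context) {
        // Per-call DAO and injector; then the same business call as the artisanal flow.
        WordCountDAO dao = new MapReduceWordCountDAO(key, values, context);
        Guice.createInjector(new WordCountGuiceModule(dao))
             .getInstance(WordCountService.class)
             .countWord(key.toString());
    }
}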
17. Utility companies provide us with smart-meter reads, as well as definitions of the rates used to calculate costs. We use a forecasting algorithm to project usage, then apply rates, when provided by the utility company, to calculate a projected cost. The resulting calculation is made available via multiple channels: the web, as you see here, but also push channels such as email and SMS. Shameless plug: if you are a PGE customer, you can see this by clicking on the My Usage top-level tab, then the My Dashboard sub-tab. One item I'd like to draw your attention to on this slide: this was one of the first user-facing features we built on Hadoop at Opower. The Rate Engine was an existing, fairly complex set of code components already in use at Opower in non-Hadoop application stacks, and we wanted to preserve its existing uses outside of Hadoop. So the question was: how do we maintain the existing legacy use cases of the Rate Engine while integrating it into Hadoop?
18. Dependency injection lets us employ the same code components in batch and in realtime. We can use the same code in batch workflows for push channels like SMS and email, and re-use that same code base in realtime channels such as the web. One other thing to highlight is that this code componentization also gives us a testing story. Because the core components are platform-agnostic, and because we can inject whatever data sources and sinks we'd like, we can provide curated test data as input and assert on the results posted to our test data sinks. All this means we can take a two-pronged integration-testing approach: finer-grained tests using curated inputs and outputs in a realtime context, and coarser-grained tests that exercise flows of data in a MapReduce container. The sketch below shows what the finer-grained flavor might look like.
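A hypothetical JUnit sketch of that finer-grained testing story, using an in-memory DAO; none of this is from the talk's actual test suite, and it again assumes the module's constructor takes the DAO.

import static org.junit.Assert.assertEquals;

import com.google.inject.Guice;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import org.junit.Test;

public class WordCountServiceTest {
    // In-memory DAO: curated input words, captured output DTOs.
    static class InMemoryWordCountDAO implements WordCountDAO {
        final List<String> words;
        final List<WordCountDTO> written = new ArrayList<>();
        InMemoryWordCountDAO(List<String> words) { this.words = words; }
        @Override public Iterable<String> getWords() { return words; }
        @Override public void writeWordCount(WordCountDTO dto) { written.add(dto); }
    }

    @Test
    public void countsOccurrencesOfTheRequestedWord() {
        InMemoryWordCountDAO dao =
            new InMemoryWordCountDAO(Arrays.asList("brooklyn", "beard", "brooklyn"));
        WordCountService service =
            Guice.createInjector(new WordCountGuiceModule(dao))
                 .getInstance(WordCountService.class);

        service.countWord("brooklyn");

        assertEquals(1, dao.written.size());
        assertEquals("brooklyn", dao.written.get(0).getWord());
        assertEquals(2, dao.written.get(0).getCount());
    }
}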
19. The entry point to our bill-projection calculations is the bill forecast service. It interacts with our Rate Engine, the pre-Hadoop code component. Both the Rate Engine and the forecast service access a DAO to retrieve usage, and all code components are managed in a Spring IoC container. In a Hadoop/MR context, we map over usage stored in HBase and collect usage, grouped by customer, in the reduce phase. There, we construct DAOs at runtime to feed usage to the bill-projection components, relying on the data source/sink principle and using the values parameter as the data source. What I don't show here is the DAO that writes to the context as a data sink; in our case the context is a TableOutputFormat that maps back to HBase. Again, our application logic in the BillForecastService doesn't need to know about any of these details. These DAOs transparently serve up data to the bill forecast and Rate Engine code components; no changes to the existing Rate Engine were necessary. Once it has set up the Spring container, the reduce phase calls forecast().
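A heavily simplified sketch of that reduce-phase wiring, using Spring's programmatic registration APIs. BillForecastService, forecast(), and the values-as-data-source idea come from the talk; the DAO shape, generic types, config class, and stub bodies below are all illustrative guesses.

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.springframework.context.annotation.AnnotationConfigApplicationContext;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

interface UsageReadDAO { Iterable<Double> getUsage(); }   // hypothetical usage source
interface BillForecastService { void forecast(); }        // entry point named in the talk

// Values-backed DAO: serves up a single customer's usage readings.
class ValuesBackedUsageDAO implements UsageReadDAO {
    private final List<Double> readings = new ArrayList<>();
    ValuesBackedUsageDAO(Iterable<Text> rows) {
        for (Text row : rows) readings.add(Double.parseDouble(row.toString())); // toy parsing
    }
    @Override public Iterable<Double> getUsage() { return readings; }
}

@Configuration
class BillForecastConfig {
    @Bean
    BillForecastService billForecastService(UsageReadDAO dao) {
        return () -> System.out.println("forecast over " + dao.getUsage()); // stand-in logic
    }
}

public class BillForecastReducer extends Reducer<Text, Text, Text, Text> {
    @Override
    protected void reduce(Text customer, Iterable<Text> usageRows, Context context)
            throws IOException, InterruptedException {
        // Short-lived Spring container, scoped to this reduce() call.
        AnnotationConfigApplicationContext ctx = new AnnotationConfigApplicationContext();
        ctx.getBeanFactory().registerSingleton("usageReadDAO", new ValuesBackedUsageDAO(usageRows));
        ctx.register(BillForecastConfig.class);
        ctx.refresh();

        ctx.getBean(BillForecastService.class).forecast();
        ctx.close(); // discard the container along with its ephemeral DAOs
    }
}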
20. We've covered batch calculations; however, depending on the scale of the utility company and the frequency with which we want to update bill projections, we also support realtime calculations. In this case, everything stays the same on the right-hand side, but the invoking code is driven by a web request, represented here by the calculationBillProjection() method. HBase-backed DAOs are wired in to pull usage from HBase on a per-customer basis, and these DAOs are injected into the Spring container. The BillForecastService remains unaware that it's being invoked in a realtime context, of where it's getting its reads, and of where it's writing its results.
21. Somebody still has to know MR: this is not about turnkey MapReduce, it's about separation of concerns. We know we're not alone in having existing code bases that drive a successful business and need a story for transitioning to Hadoop. For those of you who find yourselves in a situation similar to ours, I hope I've provided a few insights, and maybe given you another approach to consider as you face the challenge of taking something that's already successful and deploying it at scale on Hadoop.
22. That wraps it up. I've posted the source code of the illustrative artisanal WordCount example to GitHub in case you're curious. Thanks for your time. Any questions?