SlideShare a Scribd company logo
1 of 58
Download to read offline
{

    "_id" : ObjectId("4dcd3ebc9278000000005158"),

    "timestamp" : ISODate("2011-05-13T14:22:46.777Z"),

    "binary" : BinData(0,""),

    "string" : "abc",

    "number" : 3,

    "subobj" : {"subA": 1, "subB": 2 },

    "array" : [1, 2, 3],

    "dbref" : [_id1, _id2, _id3]

                        padding

}
{   db.coll.find({"string": "abc"});
db.coll.find({ "string" : /^a.*$/i });
    "_id" : ObjectId("4dcd3ebc9278000000005158"),

    "timestamp" : ISODate("2011-05-13T14:22:46.777Z"),
                   db.coll.find({"subobj.subA": 1});
                db.coll.find({"subobj.subB": {$exists: true} });
    "binary" : BinData(0,""),

    "string" : "abc",              db.coll.find({"number": 3});
                               db.coll.find({"number": {$gt: 1}});
    "number" : 3,

    "subobj" : {"subA": 1, "subB": 2 },

    "array" : [1, 2, 3],
                         db.coll.find({"array": {$all:[1, 2]} });
    "dbref" : [_id1, _id2, _id3]
                        db.coll.find({"array": {$in:[2, 4, 6]} });
                             padding

}
{

    "_id" : ObjectId("4dcd3ebc9278000000005158"),

    "timestamp" : ISODate("2011-05-13T14:22:46.777Z"),
          { $set : {"string": "def"} }

    "binary" : BinData(0,""), { $inc : {"number": 1} }

    "string" : "def",
                          { $pull : {"subobj": {"subB": 2 } } }
    "number" : 4,

    "subobj" : {"subA": 1, "subB": 2 },

    "array" : [1, 2, 3, 4, 5, 6],

    "dbref"$addToSet : { "array" : { $each : [ 4 , 5 , 6 ] } } }
         { : [_id1, _id2, _id3]


    "newkey" : "In-place"

}                              { $set : {"newkey": "In-place"} }
ScientificPython
def mapper(key, value):
   for word in value.split(): yield word,1
def reducer(key, values):
   yield key,sum(values)
if __name__ == "__main__":
   import dumbo
   dumbo.run(mapper, reducer)



dumbo start wordcount.py 
       -hadoop /path/to/hadoop 
       -input wc_input.txt 
       -output wc_output
[2011-07-01 12:01:48,447]
db.collection.insert(
 {hour:0,
   userId:”1234”,
   actionType:”login”,}
);
m = function(){
     this.tags.forEach{
          function(z) {
              emit(z, {count: 1});
          }
     };
};
r = function(key, values) {
     var total=0;
     for (i=0, i<values.length, i++)
          total += values[i].count;
     return { count : total };
}
res=db.things.mapReduce(m,!r);
#                              finalize
Examples
                Conclusions and Future Work


 Party Solutions




                                    Motivation
                                  Architecture
                                    Examples
                  Conclusions and Future Work


ummary of Features


 Hadoop-based: same limitations as Streaming (Dumbo) and
                       Streaming Jython Pydoop
 Jython (Happy), except for ease of use
           C/C++ Ext       Yes       No     Yes
 Other implementations: good if you have your own cluster
         Standard Lib      Full     Partial   Full
     Hadoop is the most widespread implementation
            MR API         No*       Full   Partial
           Java-like FW                   No           Yes            Yes
               HDFS                       No
                                  Leo, Zanetti
                                                       Yes            Yes
                                                 Pydoop: a Python MapReduce and HDFS API for Hadoop



     (*) you can only write the map and reduce parts as executable scripts.
Motivation
                               Architecture
                                 Examples
               Conclusions and Future Work


Hadoop Pipes



                                                      Communication with Java
                                                      framework via persistent
                                                      sockets
                                                      The C++ app provides a
                                                      factory used by the framework
                                                      to create MR components
                                                      Providing Mapper and
                                                      Reducer is mandatory




                               Leo, Zanetti   Pydoop: a Python MapReduce and HDFS API for Hadoop
Motivation
                              Architecture
                                Examples
              Conclusions and Future Work


Integration of Pydoop with C++


                                             Integration with Pipes:
                                                 Method calls flow from the
                                                 framework through the C++ and the
                                                 Pydoop API, ultimately reaching
                                                 user-defined methods
                                                 Results are wrapped by Boost and
                                                 returned to the framework
                                             Integration with HDFS:
                                                 Function calls initiated by Pydoop
                                                 Results wrapped and returned as
                                                 Python objects to the app
gawk '
 BEGIN{ reducenum='$REDUCE_NUM'; }
  { userid=$7; key=$8; }
  key ~ /a{GetLoginBonus}/ { incrby(userid,key,$9,a); next;}
  key ~ /a{SideJob}/          { incrby(userid,key,$11,a); next;}
  key ~ /a{CleanMyShop}/      { hincr(userid,key,$9,a); next; }
  key ~ /(GetAvatarPart|ChangeP|ChangeWakuwakuP|ChangeKonergy)/
                                { incrbydiff(userid,key,$9,a); next; }
 ...‘ $IN

# for reducer1 (such as “userid % reducenum == 0”)
# command userid key value
MULTI
HINCRBY 1111 a{ChangeGreed} 3
HINCRBY 1111 a{GianEvent} 7
HINCRBY 1111 a{TeamChallenge} 5
HINCRBY 2222 a{Battle} 3
HINCRBY 2222 a{ChangeMoney} 3
...
EXEC
MongoDB & Hadoop: Flexible Hourly Batch Processing Model
MongoDB & Hadoop: Flexible Hourly Batch Processing Model
MongoDB & Hadoop: Flexible Hourly Batch Processing Model

More Related Content

What's hot

Hadoop Pig: MapReduce the easy way!
Hadoop Pig: MapReduce the easy way!Hadoop Pig: MapReduce the easy way!
Hadoop Pig: MapReduce the easy way!Nathan Bijnens
 
Hadoop operations basic
Hadoop operations basicHadoop operations basic
Hadoop operations basicHafizur Rahman
 
Hadoop Interview Question and Answers
Hadoop  Interview Question and AnswersHadoop  Interview Question and Answers
Hadoop Interview Question and Answerstechieguy85
 
Debugging & Tuning in Spark
Debugging & Tuning in SparkDebugging & Tuning in Spark
Debugging & Tuning in SparkShiao-An Yuan
 
Hadoop & HDFS for Beginners
Hadoop & HDFS for BeginnersHadoop & HDFS for Beginners
Hadoop & HDFS for BeginnersRahul Jain
 
InfluxDB IOx Tech Talks: Intro to the InfluxDB IOx Read Buffer - A Read-Optim...
InfluxDB IOx Tech Talks: Intro to the InfluxDB IOx Read Buffer - A Read-Optim...InfluxDB IOx Tech Talks: Intro to the InfluxDB IOx Read Buffer - A Read-Optim...
InfluxDB IOx Tech Talks: Intro to the InfluxDB IOx Read Buffer - A Read-Optim...InfluxData
 
Implementation of k means algorithm on Hadoop
Implementation of k means algorithm on HadoopImplementation of k means algorithm on Hadoop
Implementation of k means algorithm on HadoopLamprini Koutsokera
 
Hadoop interview questions
Hadoop interview questionsHadoop interview questions
Hadoop interview questionsKalyan Hadoop
 
Big Data Analysis With RHadoop
Big Data Analysis With RHadoopBig Data Analysis With RHadoop
Big Data Analysis With RHadoopDavid Chiu
 
Big data interview questions and answers
Big data interview questions and answersBig data interview questions and answers
Big data interview questions and answersKalyan Hadoop
 
Introduction to HDFS and MapReduce
Introduction to HDFS and MapReduceIntroduction to HDFS and MapReduce
Introduction to HDFS and MapReduceUday Vakalapudi
 
Easy deployment & management of cloud apps
Easy deployment & management of cloud appsEasy deployment & management of cloud apps
Easy deployment & management of cloud appsDavid Cunningham
 
Developing and Deploying Apps with the Postgres FDW
Developing and Deploying Apps with the Postgres FDWDeveloping and Deploying Apps with the Postgres FDW
Developing and Deploying Apps with the Postgres FDWJonathan Katz
 
Hadoop 20111117
Hadoop 20111117Hadoop 20111117
Hadoop 20111117exsuns
 
Apache Hadoop MapReduce Tutorial
Apache Hadoop MapReduce TutorialApache Hadoop MapReduce Tutorial
Apache Hadoop MapReduce TutorialFarzad Nozarian
 

What's hot (20)

Hadoop Pig: MapReduce the easy way!
Hadoop Pig: MapReduce the easy way!Hadoop Pig: MapReduce the easy way!
Hadoop Pig: MapReduce the easy way!
 
Hadoop operations basic
Hadoop operations basicHadoop operations basic
Hadoop operations basic
 
mesos-devoxx14
mesos-devoxx14mesos-devoxx14
mesos-devoxx14
 
Hadoop Interview Question and Answers
Hadoop  Interview Question and AnswersHadoop  Interview Question and Answers
Hadoop Interview Question and Answers
 
Debugging & Tuning in Spark
Debugging & Tuning in SparkDebugging & Tuning in Spark
Debugging & Tuning in Spark
 
Hadoop & HDFS for Beginners
Hadoop & HDFS for BeginnersHadoop & HDFS for Beginners
Hadoop & HDFS for Beginners
 
Hadoop-Introduction
Hadoop-IntroductionHadoop-Introduction
Hadoop-Introduction
 
InfluxDB IOx Tech Talks: Intro to the InfluxDB IOx Read Buffer - A Read-Optim...
InfluxDB IOx Tech Talks: Intro to the InfluxDB IOx Read Buffer - A Read-Optim...InfluxDB IOx Tech Talks: Intro to the InfluxDB IOx Read Buffer - A Read-Optim...
InfluxDB IOx Tech Talks: Intro to the InfluxDB IOx Read Buffer - A Read-Optim...
 
Implementation of k means algorithm on Hadoop
Implementation of k means algorithm on HadoopImplementation of k means algorithm on Hadoop
Implementation of k means algorithm on Hadoop
 
Hadoop interview questions
Hadoop interview questionsHadoop interview questions
Hadoop interview questions
 
Hadoop Interview Questions and Answers
Hadoop Interview Questions and AnswersHadoop Interview Questions and Answers
Hadoop Interview Questions and Answers
 
Big Data Analysis With RHadoop
Big Data Analysis With RHadoopBig Data Analysis With RHadoop
Big Data Analysis With RHadoop
 
Scala+data
Scala+dataScala+data
Scala+data
 
Hadoop - Introduction to mapreduce
Hadoop -  Introduction to mapreduceHadoop -  Introduction to mapreduce
Hadoop - Introduction to mapreduce
 
Big data interview questions and answers
Big data interview questions and answersBig data interview questions and answers
Big data interview questions and answers
 
Introduction to HDFS and MapReduce
Introduction to HDFS and MapReduceIntroduction to HDFS and MapReduce
Introduction to HDFS and MapReduce
 
Easy deployment & management of cloud apps
Easy deployment & management of cloud appsEasy deployment & management of cloud apps
Easy deployment & management of cloud apps
 
Developing and Deploying Apps with the Postgres FDW
Developing and Deploying Apps with the Postgres FDWDeveloping and Deploying Apps with the Postgres FDW
Developing and Deploying Apps with the Postgres FDW
 
Hadoop 20111117
Hadoop 20111117Hadoop 20111117
Hadoop 20111117
 
Apache Hadoop MapReduce Tutorial
Apache Hadoop MapReduce TutorialApache Hadoop MapReduce Tutorial
Apache Hadoop MapReduce Tutorial
 

Viewers also liked

Intro to HDFS and MapReduce
Intro to HDFS and MapReduceIntro to HDFS and MapReduce
Intro to HDFS and MapReduceRyan Tabora
 
Compose Async with RxJS
Compose Async with RxJSCompose Async with RxJS
Compose Async with RxJSKyung Yeol Kim
 
[H3 2012] 내컴에선 잘되던데? - vagrant로 서버와 동일한 개발환경 꾸미기
[H3 2012] 내컴에선 잘되던데? - vagrant로 서버와 동일한 개발환경 꾸미기[H3 2012] 내컴에선 잘되던데? - vagrant로 서버와 동일한 개발환경 꾸미기
[H3 2012] 내컴에선 잘되던데? - vagrant로 서버와 동일한 개발환경 꾸미기KTH, 케이티하이텔
 
Meteor 0.3.6 Preview
Meteor 0.3.6 PreviewMeteor 0.3.6 Preview
Meteor 0.3.6 PreviewJuntai Park
 
Mongo Web Apps: OSCON 2011
Mongo Web Apps: OSCON 2011Mongo Web Apps: OSCON 2011
Mongo Web Apps: OSCON 2011rogerbodamer
 
React in Native Apps - Meetup React - 20150409
React in Native Apps - Meetup React - 20150409React in Native Apps - Meetup React - 20150409
React in Native Apps - Meetup React - 20150409Minko3D
 
Ionic adventures - Hybrid Mobile App Development rocks
Ionic adventures - Hybrid Mobile App Development rocksIonic adventures - Hybrid Mobile App Development rocks
Ionic adventures - Hybrid Mobile App Development rocksJuarez Filho
 
React JS and why it's awesome
React JS and why it's awesomeReact JS and why it's awesome
React JS and why it's awesomeAndrew Hull
 
System webpack-jspm
System webpack-jspmSystem webpack-jspm
System webpack-jspmJesse Warden
 
MongoDB and Hadoop: Driving Business Insights
MongoDB and Hadoop: Driving Business InsightsMongoDB and Hadoop: Driving Business Insights
MongoDB and Hadoop: Driving Business InsightsMongoDB
 
MongoDB Days Silicon Valley: MongoDB and the Hadoop Connector
MongoDB Days Silicon Valley: MongoDB and the Hadoop ConnectorMongoDB Days Silicon Valley: MongoDB and the Hadoop Connector
MongoDB Days Silicon Valley: MongoDB and the Hadoop ConnectorMongoDB
 
Mongo db and hadoop driving business insights - final
Mongo db and hadoop   driving business insights - finalMongo db and hadoop   driving business insights - final
Mongo db and hadoop driving business insights - finalMongoDB
 
Groovy overview, DSLs and ecosystem - Mars JUG - 2010
Groovy overview, DSLs and ecosystem - Mars JUG - 2010Groovy overview, DSLs and ecosystem - Mars JUG - 2010
Groovy overview, DSLs and ecosystem - Mars JUG - 2010Guillaume Laforge
 
Groovy Ecosystem - JFokus 2011 - Guillaume Laforge
Groovy Ecosystem - JFokus 2011 - Guillaume LaforgeGroovy Ecosystem - JFokus 2011 - Guillaume Laforge
Groovy Ecosystem - JFokus 2011 - Guillaume LaforgeGuillaume Laforge
 
groovy DSLs from beginner to expert
groovy DSLs from beginner to expertgroovy DSLs from beginner to expert
groovy DSLs from beginner to expertPaul King
 
Sharding
ShardingSharding
ShardingMongoDB
 
Mongo DB 완벽가이드 - 4장 쿼리하기
Mongo DB 완벽가이드 - 4장 쿼리하기Mongo DB 완벽가이드 - 4장 쿼리하기
Mongo DB 완벽가이드 - 4장 쿼리하기JangHyuk You
 

Viewers also liked (20)

Intro to HDFS and MapReduce
Intro to HDFS and MapReduceIntro to HDFS and MapReduce
Intro to HDFS and MapReduce
 
Compose Async with RxJS
Compose Async with RxJSCompose Async with RxJS
Compose Async with RxJS
 
[H3 2012] 내컴에선 잘되던데? - vagrant로 서버와 동일한 개발환경 꾸미기
[H3 2012] 내컴에선 잘되던데? - vagrant로 서버와 동일한 개발환경 꾸미기[H3 2012] 내컴에선 잘되던데? - vagrant로 서버와 동일한 개발환경 꾸미기
[H3 2012] 내컴에선 잘되던데? - vagrant로 서버와 동일한 개발환경 꾸미기
 
Meteor 0.3.6 Preview
Meteor 0.3.6 PreviewMeteor 0.3.6 Preview
Meteor 0.3.6 Preview
 
Mongo Web Apps: OSCON 2011
Mongo Web Apps: OSCON 2011Mongo Web Apps: OSCON 2011
Mongo Web Apps: OSCON 2011
 
React in Native Apps - Meetup React - 20150409
React in Native Apps - Meetup React - 20150409React in Native Apps - Meetup React - 20150409
React in Native Apps - Meetup React - 20150409
 
Ionic adventures - Hybrid Mobile App Development rocks
Ionic adventures - Hybrid Mobile App Development rocksIonic adventures - Hybrid Mobile App Development rocks
Ionic adventures - Hybrid Mobile App Development rocks
 
React JS and why it's awesome
React JS and why it's awesomeReact JS and why it's awesome
React JS and why it's awesome
 
Grid FS
Grid FSGrid FS
Grid FS
 
Angular2 ecosystem
Angular2 ecosystemAngular2 ecosystem
Angular2 ecosystem
 
System webpack-jspm
System webpack-jspmSystem webpack-jspm
System webpack-jspm
 
MongoDB and Hadoop: Driving Business Insights
MongoDB and Hadoop: Driving Business InsightsMongoDB and Hadoop: Driving Business Insights
MongoDB and Hadoop: Driving Business Insights
 
MongoDB Days Silicon Valley: MongoDB and the Hadoop Connector
MongoDB Days Silicon Valley: MongoDB and the Hadoop ConnectorMongoDB Days Silicon Valley: MongoDB and the Hadoop Connector
MongoDB Days Silicon Valley: MongoDB and the Hadoop Connector
 
Mongo db and hadoop driving business insights - final
Mongo db and hadoop   driving business insights - finalMongo db and hadoop   driving business insights - final
Mongo db and hadoop driving business insights - final
 
Groovy overview, DSLs and ecosystem - Mars JUG - 2010
Groovy overview, DSLs and ecosystem - Mars JUG - 2010Groovy overview, DSLs and ecosystem - Mars JUG - 2010
Groovy overview, DSLs and ecosystem - Mars JUG - 2010
 
Hadoop to spark-v2
Hadoop to spark-v2Hadoop to spark-v2
Hadoop to spark-v2
 
Groovy Ecosystem - JFokus 2011 - Guillaume Laforge
Groovy Ecosystem - JFokus 2011 - Guillaume LaforgeGroovy Ecosystem - JFokus 2011 - Guillaume Laforge
Groovy Ecosystem - JFokus 2011 - Guillaume Laforge
 
groovy DSLs from beginner to expert
groovy DSLs from beginner to expertgroovy DSLs from beginner to expert
groovy DSLs from beginner to expert
 
Sharding
ShardingSharding
Sharding
 
Mongo DB 완벽가이드 - 4장 쿼리하기
Mongo DB 완벽가이드 - 4장 쿼리하기Mongo DB 완벽가이드 - 4장 쿼리하기
Mongo DB 완벽가이드 - 4장 쿼리하기
 

Similar to MongoDB & Hadoop: Flexible Hourly Batch Processing Model

Introduction to Hadoop Developer Training Webinar
Introduction to Hadoop Developer Training WebinarIntroduction to Hadoop Developer Training Webinar
Introduction to Hadoop Developer Training WebinarCloudera, Inc.
 
Hadoop 101 for bioinformaticians
Hadoop 101 for bioinformaticiansHadoop 101 for bioinformaticians
Hadoop 101 for bioinformaticiansattilacsordas
 
Hadoop online training course
Hadoop online  training courseHadoop online  training course
Hadoop online training courseKamal A
 
Introduction to Sqoop Aaron Kimball Cloudera Hadoop User Group UK
Introduction to Sqoop Aaron Kimball Cloudera Hadoop User Group UKIntroduction to Sqoop Aaron Kimball Cloudera Hadoop User Group UK
Introduction to Sqoop Aaron Kimball Cloudera Hadoop User Group UKSkills Matter
 
Using GPUs to Handle Big Data with Java
Using GPUs to Handle Big Data with JavaUsing GPUs to Handle Big Data with Java
Using GPUs to Handle Big Data with JavaTim Ellison
 
EuroPython 2015 - Big Data with Python and Hadoop
EuroPython 2015 - Big Data with Python and HadoopEuroPython 2015 - Big Data with Python and Hadoop
EuroPython 2015 - Big Data with Python and HadoopMax Tepkeev
 
Behm Shah Pagerank
Behm Shah PagerankBehm Shah Pagerank
Behm Shah Pagerankgothicane
 
Playing with Hadoop (NPW2013)
Playing with Hadoop (NPW2013)Playing with Hadoop (NPW2013)
Playing with Hadoop (NPW2013)Søren Lund
 
Cascading on starfish
Cascading on starfishCascading on starfish
Cascading on starfishFei Dong
 
Processing massive amount of data with Map Reduce using Apache Hadoop - Indi...
Processing massive amount of data with Map Reduce using Apache Hadoop  - Indi...Processing massive amount of data with Map Reduce using Apache Hadoop  - Indi...
Processing massive amount of data with Map Reduce using Apache Hadoop - Indi...IndicThreads
 
Big Data Europe: Simplifying Development and Deployment of Big Data Applications
Big Data Europe: Simplifying Development and Deployment of Big Data ApplicationsBig Data Europe: Simplifying Development and Deployment of Big Data Applications
Big Data Europe: Simplifying Development and Deployment of Big Data ApplicationsBigData_Europe
 
Serverless GraphQL for Product Developers
Serverless GraphQL for Product DevelopersServerless GraphQL for Product Developers
Serverless GraphQL for Product DevelopersSashko Stubailo
 
Introduction to Cloud Computing with Google Cloud
Introduction to Cloud Computing with Google CloudIntroduction to Cloud Computing with Google Cloud
Introduction to Cloud Computing with Google Cloudwesley chun
 
Hadoop MapReduce Fundamentals
Hadoop MapReduce FundamentalsHadoop MapReduce Fundamentals
Hadoop MapReduce FundamentalsLynn Langit
 
Datascience Training with Hadoop, Python Machine Learning & Scala, Spark
Datascience Training with Hadoop, Python Machine Learning & Scala, SparkDatascience Training with Hadoop, Python Machine Learning & Scala, Spark
Datascience Training with Hadoop, Python Machine Learning & Scala, SparkSequelGate
 

Similar to MongoDB & Hadoop: Flexible Hourly Batch Processing Model (20)

January 2011 HUG: Pig Presentation
January 2011 HUG: Pig PresentationJanuary 2011 HUG: Pig Presentation
January 2011 HUG: Pig Presentation
 
Introduction to Hadoop Developer Training Webinar
Introduction to Hadoop Developer Training WebinarIntroduction to Hadoop Developer Training Webinar
Introduction to Hadoop Developer Training Webinar
 
mapReduce.pptx
mapReduce.pptxmapReduce.pptx
mapReduce.pptx
 
Hadoop 101 for bioinformaticians
Hadoop 101 for bioinformaticiansHadoop 101 for bioinformaticians
Hadoop 101 for bioinformaticians
 
Hadoop online training course
Hadoop online  training courseHadoop online  training course
Hadoop online training course
 
Introduction to Sqoop Aaron Kimball Cloudera Hadoop User Group UK
Introduction to Sqoop Aaron Kimball Cloudera Hadoop User Group UKIntroduction to Sqoop Aaron Kimball Cloudera Hadoop User Group UK
Introduction to Sqoop Aaron Kimball Cloudera Hadoop User Group UK
 
Using GPUs to Handle Big Data with Java
Using GPUs to Handle Big Data with JavaUsing GPUs to Handle Big Data with Java
Using GPUs to Handle Big Data with Java
 
EuroPython 2015 - Big Data with Python and Hadoop
EuroPython 2015 - Big Data with Python and HadoopEuroPython 2015 - Big Data with Python and Hadoop
EuroPython 2015 - Big Data with Python and Hadoop
 
Behm Shah Pagerank
Behm Shah PagerankBehm Shah Pagerank
Behm Shah Pagerank
 
Playing with Hadoop (NPW2013)
Playing with Hadoop (NPW2013)Playing with Hadoop (NPW2013)
Playing with Hadoop (NPW2013)
 
Cascading on starfish
Cascading on starfishCascading on starfish
Cascading on starfish
 
Processing massive amount of data with Map Reduce using Apache Hadoop - Indi...
Processing massive amount of data with Map Reduce using Apache Hadoop  - Indi...Processing massive amount of data with Map Reduce using Apache Hadoop  - Indi...
Processing massive amount of data with Map Reduce using Apache Hadoop - Indi...
 
Hadoop ecosystem
Hadoop ecosystemHadoop ecosystem
Hadoop ecosystem
 
Big Data Europe: Simplifying Development and Deployment of Big Data Applications
Big Data Europe: Simplifying Development and Deployment of Big Data ApplicationsBig Data Europe: Simplifying Development and Deployment of Big Data Applications
Big Data Europe: Simplifying Development and Deployment of Big Data Applications
 
Serverless GraphQL for Product Developers
Serverless GraphQL for Product DevelopersServerless GraphQL for Product Developers
Serverless GraphQL for Product Developers
 
Introduction to Cloud Computing with Google Cloud
Introduction to Cloud Computing with Google CloudIntroduction to Cloud Computing with Google Cloud
Introduction to Cloud Computing with Google Cloud
 
Data Science
Data ScienceData Science
Data Science
 
Hadoop ecosystem
Hadoop ecosystemHadoop ecosystem
Hadoop ecosystem
 
Hadoop MapReduce Fundamentals
Hadoop MapReduce FundamentalsHadoop MapReduce Fundamentals
Hadoop MapReduce Fundamentals
 
Datascience Training with Hadoop, Python Machine Learning & Scala, Spark
Datascience Training with Hadoop, Python Machine Learning & Scala, SparkDatascience Training with Hadoop, Python Machine Learning & Scala, Spark
Datascience Training with Hadoop, Python Machine Learning & Scala, Spark
 

More from Takahiro Inoue

Treasure Data × Wave Analytics EC Demo
Treasure Data × Wave Analytics EC DemoTreasure Data × Wave Analytics EC Demo
Treasure Data × Wave Analytics EC DemoTakahiro Inoue
 
トレジャーデータとtableau実現する自動レポーティング
トレジャーデータとtableau実現する自動レポーティングトレジャーデータとtableau実現する自動レポーティング
トレジャーデータとtableau実現する自動レポーティングTakahiro Inoue
 
Tableauが魅せる Data Visualization の世界
Tableauが魅せる Data Visualization の世界Tableauが魅せる Data Visualization の世界
Tableauが魅せる Data Visualization の世界Takahiro Inoue
 
トレジャーデータのバッチクエリとアドホッククエリを理解する
トレジャーデータのバッチクエリとアドホッククエリを理解するトレジャーデータのバッチクエリとアドホッククエリを理解する
トレジャーデータのバッチクエリとアドホッククエリを理解するTakahiro Inoue
 
20140708 オンラインゲームソリューション
20140708 オンラインゲームソリューション20140708 オンラインゲームソリューション
20140708 オンラインゲームソリューションTakahiro Inoue
 
トレジャーデータ流,データ分析の始め方
トレジャーデータ流,データ分析の始め方トレジャーデータ流,データ分析の始め方
トレジャーデータ流,データ分析の始め方Takahiro Inoue
 
オンラインゲームソリューション@トレジャーデータ
オンラインゲームソリューション@トレジャーデータオンラインゲームソリューション@トレジャーデータ
オンラインゲームソリューション@トレジャーデータTakahiro Inoue
 
事例で学ぶトレジャーデータ 20140612
事例で学ぶトレジャーデータ 20140612事例で学ぶトレジャーデータ 20140612
事例で学ぶトレジャーデータ 20140612Takahiro Inoue
 
トレジャーデータ株式会社について(for all Data_Enthusiast!!)
トレジャーデータ株式会社について(for all Data_Enthusiast!!)トレジャーデータ株式会社について(for all Data_Enthusiast!!)
トレジャーデータ株式会社について(for all Data_Enthusiast!!)Takahiro Inoue
 
この Visualization がすごい2014 〜データ世界を彩るツール6選〜
この Visualization がすごい2014 〜データ世界を彩るツール6選〜この Visualization がすごい2014 〜データ世界を彩るツール6選〜
この Visualization がすごい2014 〜データ世界を彩るツール6選〜Takahiro Inoue
 
Treasure Data Intro for Data Enthusiast!!
Treasure Data Intro for Data Enthusiast!!Treasure Data Intro for Data Enthusiast!!
Treasure Data Intro for Data Enthusiast!!Takahiro Inoue
 
Hadoop and the Data Scientist
Hadoop and the Data ScientistHadoop and the Data Scientist
Hadoop and the Data ScientistTakahiro Inoue
 
MongoDB: Intro & Application for Big Data
MongoDB: Intro & Application  for Big DataMongoDB: Intro & Application  for Big Data
MongoDB: Intro & Application for Big DataTakahiro Inoue
 
An Introduction to Fluent & MongoDB Plugins
An Introduction to Fluent & MongoDB PluginsAn Introduction to Fluent & MongoDB Plugins
An Introduction to Fluent & MongoDB PluginsTakahiro Inoue
 
An Introduction to Tinkerpop
An Introduction to TinkerpopAn Introduction to Tinkerpop
An Introduction to TinkerpopTakahiro Inoue
 
An Introduction to Neo4j
An Introduction to Neo4jAn Introduction to Neo4j
An Introduction to Neo4jTakahiro Inoue
 
The Definition of GraphDB
The Definition of GraphDBThe Definition of GraphDB
The Definition of GraphDBTakahiro Inoue
 
Large-Scale Graph Processing〜Introduction〜(完全版)
Large-Scale Graph Processing〜Introduction〜(完全版)Large-Scale Graph Processing〜Introduction〜(完全版)
Large-Scale Graph Processing〜Introduction〜(完全版)Takahiro Inoue
 
Large-Scale Graph Processing〜Introduction〜(LT版)
Large-Scale Graph Processing〜Introduction〜(LT版)Large-Scale Graph Processing〜Introduction〜(LT版)
Large-Scale Graph Processing〜Introduction〜(LT版)Takahiro Inoue
 

More from Takahiro Inoue (20)

Treasure Data × Wave Analytics EC Demo
Treasure Data × Wave Analytics EC DemoTreasure Data × Wave Analytics EC Demo
Treasure Data × Wave Analytics EC Demo
 
トレジャーデータとtableau実現する自動レポーティング
トレジャーデータとtableau実現する自動レポーティングトレジャーデータとtableau実現する自動レポーティング
トレジャーデータとtableau実現する自動レポーティング
 
Tableauが魅せる Data Visualization の世界
Tableauが魅せる Data Visualization の世界Tableauが魅せる Data Visualization の世界
Tableauが魅せる Data Visualization の世界
 
トレジャーデータのバッチクエリとアドホッククエリを理解する
トレジャーデータのバッチクエリとアドホッククエリを理解するトレジャーデータのバッチクエリとアドホッククエリを理解する
トレジャーデータのバッチクエリとアドホッククエリを理解する
 
20140708 オンラインゲームソリューション
20140708 オンラインゲームソリューション20140708 オンラインゲームソリューション
20140708 オンラインゲームソリューション
 
トレジャーデータ流,データ分析の始め方
トレジャーデータ流,データ分析の始め方トレジャーデータ流,データ分析の始め方
トレジャーデータ流,データ分析の始め方
 
オンラインゲームソリューション@トレジャーデータ
オンラインゲームソリューション@トレジャーデータオンラインゲームソリューション@トレジャーデータ
オンラインゲームソリューション@トレジャーデータ
 
事例で学ぶトレジャーデータ 20140612
事例で学ぶトレジャーデータ 20140612事例で学ぶトレジャーデータ 20140612
事例で学ぶトレジャーデータ 20140612
 
トレジャーデータ株式会社について(for all Data_Enthusiast!!)
トレジャーデータ株式会社について(for all Data_Enthusiast!!)トレジャーデータ株式会社について(for all Data_Enthusiast!!)
トレジャーデータ株式会社について(for all Data_Enthusiast!!)
 
この Visualization がすごい2014 〜データ世界を彩るツール6選〜
この Visualization がすごい2014 〜データ世界を彩るツール6選〜この Visualization がすごい2014 〜データ世界を彩るツール6選〜
この Visualization がすごい2014 〜データ世界を彩るツール6選〜
 
Treasure Data Intro for Data Enthusiast!!
Treasure Data Intro for Data Enthusiast!!Treasure Data Intro for Data Enthusiast!!
Treasure Data Intro for Data Enthusiast!!
 
Hadoop and the Data Scientist
Hadoop and the Data ScientistHadoop and the Data Scientist
Hadoop and the Data Scientist
 
MongoDB: Intro & Application for Big Data
MongoDB: Intro & Application  for Big DataMongoDB: Intro & Application  for Big Data
MongoDB: Intro & Application for Big Data
 
An Introduction to Fluent & MongoDB Plugins
An Introduction to Fluent & MongoDB PluginsAn Introduction to Fluent & MongoDB Plugins
An Introduction to Fluent & MongoDB Plugins
 
An Introduction to Tinkerpop
An Introduction to TinkerpopAn Introduction to Tinkerpop
An Introduction to Tinkerpop
 
An Introduction to Neo4j
An Introduction to Neo4jAn Introduction to Neo4j
An Introduction to Neo4j
 
The Definition of GraphDB
The Definition of GraphDBThe Definition of GraphDB
The Definition of GraphDB
 
Large-Scale Graph Processing〜Introduction〜(完全版)
Large-Scale Graph Processing〜Introduction〜(完全版)Large-Scale Graph Processing〜Introduction〜(完全版)
Large-Scale Graph Processing〜Introduction〜(完全版)
 
Large-Scale Graph Processing〜Introduction〜(LT版)
Large-Scale Graph Processing〜Introduction〜(LT版)Large-Scale Graph Processing〜Introduction〜(LT版)
Large-Scale Graph Processing〜Introduction〜(LT版)
 
Advanced MongoDB #1
Advanced MongoDB #1Advanced MongoDB #1
Advanced MongoDB #1
 

Recently uploaded

Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesBoston Institute of Analytics
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024The Digital Insurer
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsRoshan Dwivedi
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 

Recently uploaded (20)

Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 

MongoDB & Hadoop: Flexible Hourly Batch Processing Model

  • 1.
  • 2.
  • 3.
  • 4.
  • 5.
  • 6. { "_id" : ObjectId("4dcd3ebc9278000000005158"), "timestamp" : ISODate("2011-05-13T14:22:46.777Z"), "binary" : BinData(0,""), "string" : "abc", "number" : 3, "subobj" : {"subA": 1, "subB": 2 }, "array" : [1, 2, 3], "dbref" : [_id1, _id2, _id3] padding }
  • 7. { db.coll.find({"string": "abc"}); db.coll.find({ "string" : /^a.*$/i }); "_id" : ObjectId("4dcd3ebc9278000000005158"), "timestamp" : ISODate("2011-05-13T14:22:46.777Z"), db.coll.find({"subobj.subA": 1}); db.coll.find({"subobj.subB": {$exists: true} }); "binary" : BinData(0,""), "string" : "abc", db.coll.find({"number": 3}); db.coll.find({"number": {$gt: 1}}); "number" : 3, "subobj" : {"subA": 1, "subB": 2 }, "array" : [1, 2, 3], db.coll.find({"array": {$all:[1, 2]} }); "dbref" : [_id1, _id2, _id3] db.coll.find({"array": {$in:[2, 4, 6]} }); padding }
  • 8. { "_id" : ObjectId("4dcd3ebc9278000000005158"), "timestamp" : ISODate("2011-05-13T14:22:46.777Z"), { $set : {"string": "def"} } "binary" : BinData(0,""), { $inc : {"number": 1} } "string" : "def", { $pull : {"subobj": {"subB": 2 } } } "number" : 4, "subobj" : {"subA": 1, "subB": 2 }, "array" : [1, 2, 3, 4, 5, 6], "dbref"$addToSet : { "array" : { $each : [ 4 , 5 , 6 ] } } } { : [_id1, _id2, _id3] "newkey" : "In-place" } { $set : {"newkey": "In-place"} }
  • 9.
  • 10.
  • 11.
  • 12.
  • 13.
  • 14.
  • 15.
  • 17.
  • 18.
  • 19.
  • 20.
  • 21. def mapper(key, value): for word in value.split(): yield word,1 def reducer(key, values): yield key,sum(values) if __name__ == "__main__": import dumbo dumbo.run(mapper, reducer) dumbo start wordcount.py -hadoop /path/to/hadoop -input wc_input.txt -output wc_output
  • 22.
  • 23.
  • 24.
  • 25.
  • 26.
  • 27.
  • 28.
  • 29.
  • 30.
  • 31.
  • 32.
  • 33.
  • 34.
  • 36.
  • 37.
  • 38.
  • 39.
  • 40. db.collection.insert( {hour:0, userId:”1234”, actionType:”login”,} );
  • 41.
  • 42. m = function(){ this.tags.forEach{ function(z) { emit(z, {count: 1}); } }; }; r = function(key, values) { var total=0; for (i=0, i<values.length, i++) total += values[i].count; return { count : total }; } res=db.things.mapReduce(m,!r); # finalize
  • 43.
  • 44.
  • 45.
  • 46.
  • 47.
  • 48.
  • 49. Examples Conclusions and Future Work Party Solutions Motivation Architecture Examples Conclusions and Future Work ummary of Features Hadoop-based: same limitations as Streaming (Dumbo) and Streaming Jython Pydoop Jython (Happy), except for ease of use C/C++ Ext Yes No Yes Other implementations: good if you have your own cluster Standard Lib Full Partial Full Hadoop is the most widespread implementation MR API No* Full Partial Java-like FW No Yes Yes HDFS No Leo, Zanetti Yes Yes Pydoop: a Python MapReduce and HDFS API for Hadoop (*) you can only write the map and reduce parts as executable scripts.
  • 50. Motivation Architecture Examples Conclusions and Future Work Hadoop Pipes Communication with Java framework via persistent sockets The C++ app provides a factory used by the framework to create MR components Providing Mapper and Reducer is mandatory Leo, Zanetti Pydoop: a Python MapReduce and HDFS API for Hadoop
  • 51. Motivation Architecture Examples Conclusions and Future Work Integration of Pydoop with C++ Integration with Pipes: Method calls flow from the framework through the C++ and the Pydoop API, ultimately reaching user-defined methods Results are wrapped by Boost and returned to the framework Integration with HDFS: Function calls initiated by Pydoop Results wrapped and returned as Python objects to the app
  • 52.
  • 53.
  • 54.
  • 55. gawk ' BEGIN{ reducenum='$REDUCE_NUM'; } { userid=$7; key=$8; } key ~ /a{GetLoginBonus}/ { incrby(userid,key,$9,a); next;} key ~ /a{SideJob}/ { incrby(userid,key,$11,a); next;} key ~ /a{CleanMyShop}/ { hincr(userid,key,$9,a); next; } key ~ /(GetAvatarPart|ChangeP|ChangeWakuwakuP|ChangeKonergy)/ { incrbydiff(userid,key,$9,a); next; } ...‘ $IN # for reducer1 (such as “userid % reducenum == 0”) # command userid key value MULTI HINCRBY 1111 a{ChangeGreed} 3 HINCRBY 1111 a{GianEvent} 7 HINCRBY 1111 a{TeamChallenge} 5 HINCRBY 2222 a{Battle} 3 HINCRBY 2222 a{ChangeMoney} 3 ... EXEC