Design of a DSL by Ruby
 for heavy computations
over map-reduce clusters


     the 37th Grace seminar
         16th June, 2010

          Koichi Fujikawa
     Cirius Technologies, Inc.
Today's Agenda
 Background
 Problem
 Approach
 My Project
 Conclusion
Background
         Where are we in the world?
We Live in the "Big Data" Era
 World-wide web page data (text only) is estimated at
 400TB (at one point in time).
   Some web service companies (Google,
   Yahoo, etc.) have to process this data for
   their business, but...
 A typical HDD reads data at about 50MB/sec. At
 that rate, one machine would need roughly 2,200
 hours (more than 90 days) to read the whole web
 data set (400TB).
 We need parallel processing and a parallel file system.
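
 The estimate on this slide can be checked with a few lines of Ruby. This is
 only a sketch: it assumes decimal units (1TB = 10^12 bytes) and a sustained
 50MB/sec, and the 1,000-machine case is a hypothetical that assumes perfectly
 linear speed-up, which real clusters do not reach.

```ruby
# Back-of-the-envelope read time for the figures on the slide.
# Assumption: decimal units (1 TB = 10**12 bytes, 1 MB = 10**6 bytes).
TOTAL_BYTES   = 400 * 10**12   # 400 TB of web text
BYTES_PER_SEC = 50 * 10**6     # 50 MB/sec sequential read from one disk

# Hours needed to read total_bytes when the work is split evenly
# across `machines` disks (ideal linear scaling, an assumption).
def read_hours(total_bytes, bytes_per_sec, machines = 1)
  total_bytes.to_f / (bytes_per_sec * machines) / 3600
end

one = read_hours(TOTAL_BYTES, BYTES_PER_SEC)
puts format("1 machine:     %.0f hours (%.0f days)", one, one / 24)
# prints "1 machine:     2222 hours (93 days)"

many = read_hours(TOTAL_BYTES, BYTES_PER_SEC, 1000)
puts format("1000 machines: %.1f hours", many)
# prints "1000 machines: 2.2 hours"
```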
MapReduce
 MapReduce is one of the parallel skeletons
 Popularized by Google's paper (2004)
 MapReduce has two phases
    Map phase: transforms a key and value into
    another (key and) value
    Reduce phase: aggregates and calculates
    over the values for one key
 Each record is processed by the map phase first and
 then by the reduce phase
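
 The two phases can be sketched in plain Ruby, in memory and on one machine.
 This is not Hadoop code: the function names (map_record, reduce_values,
 run_map_reduce) and the sample log lines are made up for illustration, and
 the grouping step stands in for the shuffle that a real cluster performs
 between the phases.

```ruby
# Map phase: turn one input record into zero or more [key, value] pairs.
def map_record(line)
  line.split.map { |word| [word.downcase, 1] }
end

# Reduce phase: combine all values that share one key.
def reduce_values(key, values)
  [key, values.sum]
end

# Run map over every record, group the pairs by key, then reduce each group.
def run_map_reduce(records)
  pairs   = records.flat_map { |record| map_record(record) }
  grouped = pairs.group_by(&:first)
  grouped.map { |key, kvs| reduce_values(key, kvs.map(&:last)) }.to_h
end

counts = run_map_reduce(["GET /index.html", "GET /about.html", "POST /index.html"])
p counts
# => {"get"=>2, "/index.html"=>2, "/about.html"=>1, "post"=>1}
```

 Even in this toy form the programmer has to decide which part of the job
 belongs in map and which in reduce.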
Hadoop
 Hadoop is an open source clone of Google
 MapReduce hosted by the Apache Software Foundation
 Big web service providers (Yahoo, Facebook,
 etc.) contribute to this project actively.
 Large developer and user community all
 over the world (including Japan)
    Hadoop Conference Japan 2009
    Hadoop source code reading events
Problem
          What issues do we face?
Programming Model
 Most programmers and engineers are not
 familiar with the "MapReduce" model, so it is
 difficult to try out and use
    Especially splitting the work into Map and Reduce
 There are no established "patterns of
 MapReduce programming" because the
 technology is still immature among engineers.
 Each of us has to discover them individually, which
 is difficult and time-consuming.
Programming Language
 Hadoop is written in Java, so
 programmers need to write the Map and Reduce
 procedures in Java.
   Java is a strongly typed, compiled language.
   Some web service engineers don't like such
   languages.
 That is no problem if the code is fixed and
 complete, but I doubt it is suitable for ad hoc
 prototyping and easy querying.
   MapReduce jobs depend on what users want to
   get, so I think flexibility is important.
Approach
           How do we resolve it?
Hide complexity of MapReduce
 I found that the description of a MapReduce job can
 be simpler in some specific cases (e.g. log
 analysis).
 In such cases (and almost all Hadoop usage today
 is log analysis), it would be nice if
 programmers could write the description without
 having to think about MapReduce!
DSL approach by Ruby
 For this description, I created a DSL for each
 specific usage.
   The log analysis DSL is a reference
   implementation that I prepared.
 As the DSL runtime environment for Hadoop, I
 chose Ruby and JRuby, a Ruby
 runtime that runs on the JVM.
   Ruby is a very flexible, reusable object-oriented
   language, so it is very easy to create
   a DSL processor.
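
 To show why Ruby makes a DSL processor easy to build, here is a minimal
 sketch of an internal DSL for log analysis built on instance_eval. It is NOT
 the syntax of the framework described in this talk: the class LogAnalysis and
 the keywords column and count_by are hypothetical names invented for this
 example, and the job runs in memory instead of on Hadoop.

```ruby
# Hypothetical mini-DSL for illustration only; every name here is invented.
class LogAnalysis
  def initialize
    @columns   = {}   # column name => position in a whitespace-separated line
    @count_key = nil  # column whose values are counted
  end

  # DSL keyword: give a name to the field at the given position.
  def column(name, index)
    @columns[name] = index
  end

  # DSL keyword: count how often each value of the named column occurs.
  def count_by(name)
    @count_key = name
  end

  # Evaluate the block with a new LogAnalysis as self, so the keywords
  # above can be written without an explicit receiver.
  def self.define(&block)
    analysis = new
    analysis.instance_eval(&block)
    analysis
  end

  # Map and reduce are derived from the declarations, not written by the user.
  def map(line)
    fields = line.split
    [[fields[@columns.fetch(@count_key)], 1]]
  end

  def reduce(key, values)
    [key, values.sum]
  end

  def run(lines)
    lines.flat_map { |line| map(line) }
         .group_by(&:first)
         .map { |key, pairs| reduce(key, pairs.map(&:last)) }
         .to_h
  end
end

job = LogAnalysis.define do
  column :method, 0
  column :path, 1
  count_by :path
end

result = job.run(["GET /index.html", "GET /about.html", "POST /index.html"])
p result
# => {"/index.html"=>2, "/about.html"=>1}
```

 The user writes only the three declarative lines inside the block; the split
 into Map and Reduce stays hidden inside the DSL processor.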
My project
             What do I do?
Hadoop Papyrus
 DSL framework for Hadoop by JRuby
   We can write log analysis code in
   only several lines.
 Open source (Apache License), same as
 Hadoop
   Hosted on GitHub
 Distributed via the common Ruby archive site
   RubyGems.org
 Supported by IPA mitoh 2009
DEMO
Conclusion
             What have we achieved so far?
On the way to a big challenge
 We need a parallel processing method to
 handle massive web-scale data.
 MapReduce and Hadoop are good tools,
 but...
   It is difficult to describe Map and Reduce
   Writing Java is irritating for some people :-)
 Hadoop Papyrus provides the key!
   Ruby-based DSL framework for Hadoop
   You can write Map and Reduce at once
Questions?
 Thank you very much!
  Twitter ID: @fujibee
