SlideShare a Scribd company logo
1 of 19
MapReduce@DirectI

     amkiray: ramki.g@directi.com
    uvdhray: dhruv.m@directi.com
Lets start with an example…

 access.log
 timestamp,url,response_code,response_time


 products.dat
 date, product_id, price
Requirement:

    Number of requests in the last 30 days.




    $> ls –rt *.log | tail -30 | xargs “wc –l”
Requirement:

    Busiest 30 minutes in last 30 days.





    $> ls –rt   *.log| tail -30 | xargs “./count_30min.sh“
Requirement:

    Number of failed buy requests for products worth

    more than $30 in the last 30 days .

    Import data to an RDBMS;

    SELECT COUNT(*) FROM logs, products
    WHERE GET_REQUEST_TYPE(logs.url)=„BUY‟
    AND
    GET_PRODUCT_ID(logs.url)=products.product_id
    AND product.price>30
    AND DATE(log.timestamp) = products.date;
Now gimme the number of
                  
                      failed buy requests for products
                      worth more than $30 in the
It might take a
                      last 1 Year.
     while!
2 days later….




               On its way!
             Inserting data
            into database…
5 days later…

                 $> mysqladmin processlist
           +-----------------------------------+
           |   Query   |   Copy to Temp Table |
           +-----------------------------------+

                       May be its Joining!!
                     Or may be its dead…
         Or may be my replacement will see the result .. 
Hadoop    god@internal.directi.com
Go use
         bloody!
But Why?!
    Distributed data processing cluster

    A distributed file system

    Data location sensitive Task scheduler

    MapReduce paradigm

    Handles Parallelizable and distributable tasks

    Failover capability

    Web based monitoring capabilities

    More value for your time!

Where to start??

    MapReduce:

    Q: Sum of all squares of a=[1,2,3,4,3,2,7]

    Simple….




    fold( map(a, square()),                          sum())



      You can do that in any functional programming language….
Now do it for an array
   of 100 million
  elements…….
This is where Hadoop comes in..
    Distributed File System : HDFS

    Distributed Computation/Task Processing: Hadoop

    Name Node + Data Node

    Task Tracker + Job Tracker

Task Tracker
How MapReduce Works…
An example
     Word Count


hadoop jar contrib/streaming/hadoop-0.*-streaming.jar
      -jobconf mapred.data.field.separator=quot;,quot;
      -input 'wc.eg.in'
      -output 'wc.eg.out'
      -mapper 'wc -w'
      -reducer quot;awk „{ sum+=$1 } END{ print sum}‟quot;




    Task Tracker: http://cae5.internal.directi.com:50030/jobtracker.jsp
    Name Node: http://cae2.internal.directi.com:50070/dfshealth.jsp
Pig and Pig Latin
    A procedural language for MapReduce operations.


logs = LOAD 'access.logs' USING PigStorage(',') AS
         (ts:int, URL:chararray, resp:chararray, resp_time:int);

products = LOAD 'products.dat' USING PigStorage(',') AS
         (date:int, pid:int, price:int);

l1 = FOREACH logs GENERATE GetDate(ts) as req_date,
         GetProductID(URL) as prod_id,
         GetRequestType(URL) as rtype, resp, resp_time;

j1 = JOIN l1 BY prod_id, products by pid;

j2 = FILTER j1 BY req_date==date AND price>30.0F;

j3 = GROUP j2 ALL;

j4 = FOREACH j3 GENERATE COUNT(j2);

DUMP j4
Q&A

More Related Content

What's hot

Clickhouse Capacity Planning for OLAP Workloads, Mik Kocikowski of CloudFlare
Clickhouse Capacity Planning for OLAP Workloads, Mik Kocikowski of CloudFlareClickhouse Capacity Planning for OLAP Workloads, Mik Kocikowski of CloudFlare
Clickhouse Capacity Planning for OLAP Workloads, Mik Kocikowski of CloudFlareAltinity Ltd
 
PostgreSQL Meetup Berlin at Zalando HQ
PostgreSQL Meetup Berlin at Zalando HQPostgreSQL Meetup Berlin at Zalando HQ
PostgreSQL Meetup Berlin at Zalando HQPostgreSQL-Consulting
 
Realtime Analytics Using MongoDB, Python, Gevent, and ZeroMQ
Realtime Analytics Using MongoDB, Python, Gevent, and ZeroMQRealtime Analytics Using MongoDB, Python, Gevent, and ZeroMQ
Realtime Analytics Using MongoDB, Python, Gevent, and ZeroMQRick Copeland
 
Unified Data Platform, by Pauline Yeung of Cisco Systems
Unified Data Platform, by Pauline Yeung of Cisco SystemsUnified Data Platform, by Pauline Yeung of Cisco Systems
Unified Data Platform, by Pauline Yeung of Cisco SystemsAltinity Ltd
 
Raw system logs processing with hive
Raw system logs processing with hiveRaw system logs processing with hive
Raw system logs processing with hiveArpit Patil
 
Monitoring MySQL with OpenTSDB
Monitoring MySQL with OpenTSDBMonitoring MySQL with OpenTSDB
Monitoring MySQL with OpenTSDBGeoffrey Anderson
 
PostgreSQL Administration for System Administrators
PostgreSQL Administration for System AdministratorsPostgreSQL Administration for System Administrators
PostgreSQL Administration for System AdministratorsCommand Prompt., Inc
 
GCPUG meetup 201610 - Dataflow Introduction
GCPUG meetup 201610 - Dataflow IntroductionGCPUG meetup 201610 - Dataflow Introduction
GCPUG meetup 201610 - Dataflow IntroductionSimon Su
 
Webinar: Secrets of ClickHouse Query Performance, by Robert Hodges
Webinar: Secrets of ClickHouse Query Performance, by Robert HodgesWebinar: Secrets of ClickHouse Query Performance, by Robert Hodges
Webinar: Secrets of ClickHouse Query Performance, by Robert HodgesAltinity Ltd
 
ClickHouse Deep Dive, by Aleksei Milovidov
ClickHouse Deep Dive, by Aleksei MilovidovClickHouse Deep Dive, by Aleksei Milovidov
ClickHouse Deep Dive, by Aleksei MilovidovAltinity Ltd
 
Google App Engine BeCamp 2008
Google App Engine BeCamp 2008Google App Engine BeCamp 2008
Google App Engine BeCamp 2008atonse
 
Upgrading To The New Map Reduce API
Upgrading To The New Map Reduce APIUpgrading To The New Map Reduce API
Upgrading To The New Map Reduce APITom Croucher
 
ClickHouse and the Magic of Materialized Views, By Robert Hodges and Altinity...
ClickHouse and the Magic of Materialized Views, By Robert Hodges and Altinity...ClickHouse and the Magic of Materialized Views, By Robert Hodges and Altinity...
ClickHouse and the Magic of Materialized Views, By Robert Hodges and Altinity...Altinity Ltd
 
ClickHouse Features for Advanced Users, by Aleksei Milovidov
ClickHouse Features for Advanced Users, by Aleksei MilovidovClickHouse Features for Advanced Users, by Aleksei Milovidov
ClickHouse Features for Advanced Users, by Aleksei MilovidovAltinity Ltd
 
Our Story With ClickHouse at seo.do
Our Story With ClickHouse at seo.doOur Story With ClickHouse at seo.do
Our Story With ClickHouse at seo.doMetehan Çetinkaya
 
ClickHouse tips and tricks. Webinar slides. By Robert Hodges, Altinity CEO
ClickHouse tips and tricks. Webinar slides. By Robert Hodges, Altinity CEOClickHouse tips and tricks. Webinar slides. By Robert Hodges, Altinity CEO
ClickHouse tips and tricks. Webinar slides. By Robert Hodges, Altinity CEOAltinity Ltd
 
20181116 Massive Log Processing using I/O optimized PostgreSQL
20181116 Massive Log Processing using I/O optimized PostgreSQL20181116 Massive Log Processing using I/O optimized PostgreSQL
20181116 Massive Log Processing using I/O optimized PostgreSQLKohei KaiGai
 
Troubleshooting PostgreSQL Streaming Replication
Troubleshooting PostgreSQL Streaming ReplicationTroubleshooting PostgreSQL Streaming Replication
Troubleshooting PostgreSQL Streaming ReplicationAlexey Lesovsky
 

What's hot (20)

Clickhouse Capacity Planning for OLAP Workloads, Mik Kocikowski of CloudFlare
Clickhouse Capacity Planning for OLAP Workloads, Mik Kocikowski of CloudFlareClickhouse Capacity Planning for OLAP Workloads, Mik Kocikowski of CloudFlare
Clickhouse Capacity Planning for OLAP Workloads, Mik Kocikowski of CloudFlare
 
PostgreSQL Meetup Berlin at Zalando HQ
PostgreSQL Meetup Berlin at Zalando HQPostgreSQL Meetup Berlin at Zalando HQ
PostgreSQL Meetup Berlin at Zalando HQ
 
Realtime Analytics Using MongoDB, Python, Gevent, and ZeroMQ
Realtime Analytics Using MongoDB, Python, Gevent, and ZeroMQRealtime Analytics Using MongoDB, Python, Gevent, and ZeroMQ
Realtime Analytics Using MongoDB, Python, Gevent, and ZeroMQ
 
Unified Data Platform, by Pauline Yeung of Cisco Systems
Unified Data Platform, by Pauline Yeung of Cisco SystemsUnified Data Platform, by Pauline Yeung of Cisco Systems
Unified Data Platform, by Pauline Yeung of Cisco Systems
 
Raw system logs processing with hive
Raw system logs processing with hiveRaw system logs processing with hive
Raw system logs processing with hive
 
Monitoring MySQL with OpenTSDB
Monitoring MySQL with OpenTSDBMonitoring MySQL with OpenTSDB
Monitoring MySQL with OpenTSDB
 
PostgreSQL Administration for System Administrators
PostgreSQL Administration for System AdministratorsPostgreSQL Administration for System Administrators
PostgreSQL Administration for System Administrators
 
GCPUG meetup 201610 - Dataflow Introduction
GCPUG meetup 201610 - Dataflow IntroductionGCPUG meetup 201610 - Dataflow Introduction
GCPUG meetup 201610 - Dataflow Introduction
 
Pgcenter overview
Pgcenter overviewPgcenter overview
Pgcenter overview
 
Webinar: Secrets of ClickHouse Query Performance, by Robert Hodges
Webinar: Secrets of ClickHouse Query Performance, by Robert HodgesWebinar: Secrets of ClickHouse Query Performance, by Robert Hodges
Webinar: Secrets of ClickHouse Query Performance, by Robert Hodges
 
ClickHouse Deep Dive, by Aleksei Milovidov
ClickHouse Deep Dive, by Aleksei MilovidovClickHouse Deep Dive, by Aleksei Milovidov
ClickHouse Deep Dive, by Aleksei Milovidov
 
Google App Engine BeCamp 2008
Google App Engine BeCamp 2008Google App Engine BeCamp 2008
Google App Engine BeCamp 2008
 
Upgrading To The New Map Reduce API
Upgrading To The New Map Reduce APIUpgrading To The New Map Reduce API
Upgrading To The New Map Reduce API
 
ClickHouse and the Magic of Materialized Views, By Robert Hodges and Altinity...
ClickHouse and the Magic of Materialized Views, By Robert Hodges and Altinity...ClickHouse and the Magic of Materialized Views, By Robert Hodges and Altinity...
ClickHouse and the Magic of Materialized Views, By Robert Hodges and Altinity...
 
ClickHouse Features for Advanced Users, by Aleksei Milovidov
ClickHouse Features for Advanced Users, by Aleksei MilovidovClickHouse Features for Advanced Users, by Aleksei Milovidov
ClickHouse Features for Advanced Users, by Aleksei Milovidov
 
Our Story With ClickHouse at seo.do
Our Story With ClickHouse at seo.doOur Story With ClickHouse at seo.do
Our Story With ClickHouse at seo.do
 
ClickHouse tips and tricks. Webinar slides. By Robert Hodges, Altinity CEO
ClickHouse tips and tricks. Webinar slides. By Robert Hodges, Altinity CEOClickHouse tips and tricks. Webinar slides. By Robert Hodges, Altinity CEO
ClickHouse tips and tricks. Webinar slides. By Robert Hodges, Altinity CEO
 
20181116 Massive Log Processing using I/O optimized PostgreSQL
20181116 Massive Log Processing using I/O optimized PostgreSQL20181116 Massive Log Processing using I/O optimized PostgreSQL
20181116 Massive Log Processing using I/O optimized PostgreSQL
 
Troubleshooting PostgreSQL Streaming Replication
Troubleshooting PostgreSQL Streaming ReplicationTroubleshooting PostgreSQL Streaming Replication
Troubleshooting PostgreSQL Streaming Replication
 
Prashant de-ny-project-s1
Prashant de-ny-project-s1Prashant de-ny-project-s1
Prashant de-ny-project-s1
 

Similar to MapReduce@DirectI

Buildingsocialanalyticstoolwithmongodb
BuildingsocialanalyticstoolwithmongodbBuildingsocialanalyticstoolwithmongodb
BuildingsocialanalyticstoolwithmongodbMongoDB APAC
 
Xadoop - new approaches to data analytics
Xadoop - new approaches to data analyticsXadoop - new approaches to data analytics
Xadoop - new approaches to data analyticsMaxim Grinev
 
Data Analytics Service Company and Its Ruby Usage
Data Analytics Service Company and Its Ruby UsageData Analytics Service Company and Its Ruby Usage
Data Analytics Service Company and Its Ruby UsageSATOSHI TAGOMORI
 
Why and How Powershell will rule the Command Line - Barcamp LA 4
Why and How Powershell will rule the Command Line - Barcamp LA 4Why and How Powershell will rule the Command Line - Barcamp LA 4
Why and How Powershell will rule the Command Line - Barcamp LA 4Ilya Haykinson
 
Hadoop Introduction
Hadoop IntroductionHadoop Introduction
Hadoop IntroductionSNEHAL MASNE
 
Daniel Sikar: Hadoop MapReduce - 06/09/2010
Daniel Sikar: Hadoop MapReduce - 06/09/2010 Daniel Sikar: Hadoop MapReduce - 06/09/2010
Daniel Sikar: Hadoop MapReduce - 06/09/2010 Skills Matter
 
Hadoop mapreduce user_group_daniel_sikar_presentation_06.09.2010
Hadoop mapreduce user_group_daniel_sikar_presentation_06.09.2010Hadoop mapreduce user_group_daniel_sikar_presentation_06.09.2010
Hadoop mapreduce user_group_daniel_sikar_presentation_06.09.2010Skills Matter
 
Dynamic Tracing of your AMP web site
Dynamic Tracing of your AMP web siteDynamic Tracing of your AMP web site
Dynamic Tracing of your AMP web siteSriram Natarajan
 
Aws Quick Dirty Hadoop Mapreduce Ec2 S3
Aws Quick Dirty Hadoop Mapreduce Ec2 S3Aws Quick Dirty Hadoop Mapreduce Ec2 S3
Aws Quick Dirty Hadoop Mapreduce Ec2 S3Skills Matter
 
SF Big Analytics meetup : Hoodie From Uber
SF Big Analytics meetup : Hoodie  From UberSF Big Analytics meetup : Hoodie  From Uber
SF Big Analytics meetup : Hoodie From UberChester Chen
 
R the unsung hero of Big Data
R the unsung hero of Big DataR the unsung hero of Big Data
R the unsung hero of Big DataDhafer Malouche
 
Hadoop Summit Amsterdam 2014: Capacity Planning In Multi-tenant Hadoop Deploy...
Hadoop Summit Amsterdam 2014: Capacity Planning In Multi-tenant Hadoop Deploy...Hadoop Summit Amsterdam 2014: Capacity Planning In Multi-tenant Hadoop Deploy...
Hadoop Summit Amsterdam 2014: Capacity Planning In Multi-tenant Hadoop Deploy...Sumeet Singh
 
Everything is Awesome - Cutting the Corners off the Web
Everything is Awesome - Cutting the Corners off the WebEverything is Awesome - Cutting the Corners off the Web
Everything is Awesome - Cutting the Corners off the WebJames Rakich
 
Big data week presentation
Big data week presentationBig data week presentation
Big data week presentationJoseph Adler
 
Hive dirty/beautiful hacks in TD
Hive dirty/beautiful hacks in TDHive dirty/beautiful hacks in TD
Hive dirty/beautiful hacks in TDSATOSHI TAGOMORI
 
10 things I learned building Nomad packs
10 things I learned building Nomad packs10 things I learned building Nomad packs
10 things I learned building Nomad packsBram Vogelaar
 

Similar to MapReduce@DirectI (20)

03 pig intro
03 pig intro03 pig intro
03 pig intro
 
Buildingsocialanalyticstoolwithmongodb
BuildingsocialanalyticstoolwithmongodbBuildingsocialanalyticstoolwithmongodb
Buildingsocialanalyticstoolwithmongodb
 
Xadoop - new approaches to data analytics
Xadoop - new approaches to data analyticsXadoop - new approaches to data analytics
Xadoop - new approaches to data analytics
 
Data Analytics Service Company and Its Ruby Usage
Data Analytics Service Company and Its Ruby UsageData Analytics Service Company and Its Ruby Usage
Data Analytics Service Company and Its Ruby Usage
 
Why and How Powershell will rule the Command Line - Barcamp LA 4
Why and How Powershell will rule the Command Line - Barcamp LA 4Why and How Powershell will rule the Command Line - Barcamp LA 4
Why and How Powershell will rule the Command Line - Barcamp LA 4
 
Hadoop Introduction
Hadoop IntroductionHadoop Introduction
Hadoop Introduction
 
Daniel Sikar: Hadoop MapReduce - 06/09/2010
Daniel Sikar: Hadoop MapReduce - 06/09/2010 Daniel Sikar: Hadoop MapReduce - 06/09/2010
Daniel Sikar: Hadoop MapReduce - 06/09/2010
 
Hadoop mapreduce user_group_daniel_sikar_presentation_06.09.2010
Hadoop mapreduce user_group_daniel_sikar_presentation_06.09.2010Hadoop mapreduce user_group_daniel_sikar_presentation_06.09.2010
Hadoop mapreduce user_group_daniel_sikar_presentation_06.09.2010
 
Dynamic Tracing of your AMP web site
Dynamic Tracing of your AMP web siteDynamic Tracing of your AMP web site
Dynamic Tracing of your AMP web site
 
Gur1009
Gur1009Gur1009
Gur1009
 
Osd ctw spark
Osd ctw sparkOsd ctw spark
Osd ctw spark
 
Aws Quick Dirty Hadoop Mapreduce Ec2 S3
Aws Quick Dirty Hadoop Mapreduce Ec2 S3Aws Quick Dirty Hadoop Mapreduce Ec2 S3
Aws Quick Dirty Hadoop Mapreduce Ec2 S3
 
Handout3o
Handout3oHandout3o
Handout3o
 
SF Big Analytics meetup : Hoodie From Uber
SF Big Analytics meetup : Hoodie  From UberSF Big Analytics meetup : Hoodie  From Uber
SF Big Analytics meetup : Hoodie From Uber
 
R the unsung hero of Big Data
R the unsung hero of Big DataR the unsung hero of Big Data
R the unsung hero of Big Data
 
Hadoop Summit Amsterdam 2014: Capacity Planning In Multi-tenant Hadoop Deploy...
Hadoop Summit Amsterdam 2014: Capacity Planning In Multi-tenant Hadoop Deploy...Hadoop Summit Amsterdam 2014: Capacity Planning In Multi-tenant Hadoop Deploy...
Hadoop Summit Amsterdam 2014: Capacity Planning In Multi-tenant Hadoop Deploy...
 
Everything is Awesome - Cutting the Corners off the Web
Everything is Awesome - Cutting the Corners off the WebEverything is Awesome - Cutting the Corners off the Web
Everything is Awesome - Cutting the Corners off the Web
 
Big data week presentation
Big data week presentationBig data week presentation
Big data week presentation
 
Hive dirty/beautiful hacks in TD
Hive dirty/beautiful hacks in TDHive dirty/beautiful hacks in TD
Hive dirty/beautiful hacks in TD
 
10 things I learned building Nomad packs
10 things I learned building Nomad packs10 things I learned building Nomad packs
10 things I learned building Nomad packs
 

More from Directi Group

Hr coverage directi 2012
Hr coverage directi 2012Hr coverage directi 2012
Hr coverage directi 2012Directi Group
 
MDI - Mandevian Knights
MDI - Mandevian KnightsMDI - Mandevian Knights
MDI - Mandevian KnightsDirecti Group
 
FMS - Riders on the Storm
FMS - Riders on the StormFMS - Riders on the Storm
FMS - Riders on the StormDirecti Group
 
ISB - Beirut Film Fiesta
ISB - Beirut Film FiestaISB - Beirut Film Fiesta
ISB - Beirut Film FiestaDirecti Group
 
Great Lakes - Synergy
Great Lakes - SynergyGreat Lakes - Synergy
Great Lakes - SynergyDirecti Group
 
Great Lakes - Fabulous Four
Great Lakes - Fabulous FourGreat Lakes - Fabulous Four
Great Lakes - Fabulous FourDirecti Group
 
IIM C - Baker Street
IIM C - Baker StreetIIM C - Baker Street
IIM C - Baker StreetDirecti Group
 
Directi Case Study Contest - Team idate from MDI Gurgaon
Directi Case Study Contest -  Team idate from MDI GurgaonDirecti Case Study Contest -  Team idate from MDI Gurgaon
Directi Case Study Contest - Team idate from MDI GurgaonDirecti Group
 
Directi Case Study Contest - Relationships Matter from ISB Hyderabad
Directi Case Study Contest - Relationships Matter from ISB HyderabadDirecti Case Study Contest - Relationships Matter from ISB Hyderabad
Directi Case Study Contest - Relationships Matter from ISB HyderabadDirecti Group
 
Directi Case Study Contest - Team Goodfellas from ISB Hyderabad
Directi Case Study Contest - Team Goodfellas from ISB HyderabadDirecti Case Study Contest - Team Goodfellas from ISB Hyderabad
Directi Case Study Contest - Team Goodfellas from ISB HyderabadDirecti Group
 
Directi Case Study Contest- Team Joka warriors from IIM C
Directi Case Study Contest- Team Joka warriors from IIM CDirecti Case Study Contest- Team Joka warriors from IIM C
Directi Case Study Contest- Team Joka warriors from IIM CDirecti Group
 
Directi Case Study Contest - Team Alkaline Jazz from IIFT
Directi Case Study Contest - Team Alkaline Jazz from IIFTDirecti Case Study Contest - Team Alkaline Jazz from IIFT
Directi Case Study Contest - Team Alkaline Jazz from IIFTDirecti Group
 
Directi Case Study Contest - Singles 360 by Team Awesome from IIM A
Directi Case Study Contest - Singles 360 by Team Awesome from IIM ADirecti Case Study Contest - Singles 360 by Team Awesome from IIM A
Directi Case Study Contest - Singles 360 by Team Awesome from IIM ADirecti Group
 
Directi On Campus- Engineering Presentation - 2011-2012
Directi On Campus- Engineering Presentation - 2011-2012Directi On Campus- Engineering Presentation - 2011-2012
Directi On Campus- Engineering Presentation - 2011-2012Directi Group
 
Directi On Campus- Engineering Presentation
Directi On Campus- Engineering PresentationDirecti On Campus- Engineering Presentation
Directi On Campus- Engineering PresentationDirecti Group
 
Directi On Campus- Engineering Presentation
Directi On Campus- Engineering PresentationDirecti On Campus- Engineering Presentation
Directi On Campus- Engineering PresentationDirecti Group
 
Directi On Campus- Engineering Presentation
Directi On Campus- Engineering PresentationDirecti On Campus- Engineering Presentation
Directi On Campus- Engineering PresentationDirecti Group
 

More from Directi Group (20)

Hr coverage directi 2012
Hr coverage directi 2012Hr coverage directi 2012
Hr coverage directi 2012
 
IIM L - ConArtists
IIM L - ConArtistsIIM L - ConArtists
IIM L - ConArtists
 
MDI - Mandevian Knights
MDI - Mandevian KnightsMDI - Mandevian Knights
MDI - Mandevian Knights
 
ISB - Pikturewale
ISB - PikturewaleISB - Pikturewale
ISB - Pikturewale
 
FMS - Riders on the Storm
FMS - Riders on the StormFMS - Riders on the Storm
FMS - Riders on the Storm
 
IIM L - Inferno
IIM L - InfernoIIM L - Inferno
IIM L - Inferno
 
ISB - Beirut Film Fiesta
ISB - Beirut Film FiestaISB - Beirut Film Fiesta
ISB - Beirut Film Fiesta
 
Great Lakes - Synergy
Great Lakes - SynergyGreat Lakes - Synergy
Great Lakes - Synergy
 
Great Lakes - Fabulous Four
Great Lakes - Fabulous FourGreat Lakes - Fabulous Four
Great Lakes - Fabulous Four
 
IIM C - Baker Street
IIM C - Baker StreetIIM C - Baker Street
IIM C - Baker Street
 
Directi Case Study Contest - Team idate from MDI Gurgaon
Directi Case Study Contest -  Team idate from MDI GurgaonDirecti Case Study Contest -  Team idate from MDI Gurgaon
Directi Case Study Contest - Team idate from MDI Gurgaon
 
Directi Case Study Contest - Relationships Matter from ISB Hyderabad
Directi Case Study Contest - Relationships Matter from ISB HyderabadDirecti Case Study Contest - Relationships Matter from ISB Hyderabad
Directi Case Study Contest - Relationships Matter from ISB Hyderabad
 
Directi Case Study Contest - Team Goodfellas from ISB Hyderabad
Directi Case Study Contest - Team Goodfellas from ISB HyderabadDirecti Case Study Contest - Team Goodfellas from ISB Hyderabad
Directi Case Study Contest - Team Goodfellas from ISB Hyderabad
 
Directi Case Study Contest- Team Joka warriors from IIM C
Directi Case Study Contest- Team Joka warriors from IIM CDirecti Case Study Contest- Team Joka warriors from IIM C
Directi Case Study Contest- Team Joka warriors from IIM C
 
Directi Case Study Contest - Team Alkaline Jazz from IIFT
Directi Case Study Contest - Team Alkaline Jazz from IIFTDirecti Case Study Contest - Team Alkaline Jazz from IIFT
Directi Case Study Contest - Team Alkaline Jazz from IIFT
 
Directi Case Study Contest - Singles 360 by Team Awesome from IIM A
Directi Case Study Contest - Singles 360 by Team Awesome from IIM ADirecti Case Study Contest - Singles 360 by Team Awesome from IIM A
Directi Case Study Contest - Singles 360 by Team Awesome from IIM A
 
Directi On Campus- Engineering Presentation - 2011-2012
Directi On Campus- Engineering Presentation - 2011-2012Directi On Campus- Engineering Presentation - 2011-2012
Directi On Campus- Engineering Presentation - 2011-2012
 
Directi On Campus- Engineering Presentation
Directi On Campus- Engineering PresentationDirecti On Campus- Engineering Presentation
Directi On Campus- Engineering Presentation
 
Directi On Campus- Engineering Presentation
Directi On Campus- Engineering PresentationDirecti On Campus- Engineering Presentation
Directi On Campus- Engineering Presentation
 
Directi On Campus- Engineering Presentation
Directi On Campus- Engineering PresentationDirecti On Campus- Engineering Presentation
Directi On Campus- Engineering Presentation
 

Recently uploaded

GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
 
Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Hyundai Motor Group
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxnull - The Open Security Community
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 

Recently uploaded (20)

GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptxVulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
 
The transition to renewables in India.pdf
The transition to renewables in India.pdfThe transition to renewables in India.pdf
The transition to renewables in India.pdf
 

MapReduce@DirectI

  • 1. MapReduce@DirectI amkiray: ramki.g@directi.com uvdhray: dhruv.m@directi.com
  • 2. Lets start with an example… access.log timestamp,url,response_code,response_time products.dat date, product_id, price
  • 3. Requirement:  Number of requests in the last 30 days.  $> ls –rt *.log | tail -30 | xargs “wc –l”
  • 4. Requirement:  Busiest 30 minutes in last 30 days.  $> ls –rt *.log| tail -30 | xargs “./count_30min.sh“
  • 5. Requirement:  Number of failed buy requests for products worth  more than $30 in the last 30 days . Import data to an RDBMS; SELECT COUNT(*) FROM logs, products WHERE GET_REQUEST_TYPE(logs.url)=„BUY‟ AND GET_PRODUCT_ID(logs.url)=products.product_id AND product.price>30 AND DATE(log.timestamp) = products.date;
  • 6. Now gimme the number of  failed buy requests for products worth more than $30 in the It might take a last 1 Year. while!
  • 7. 2 days later…. On its way! Inserting data into database…
  • 8. 5 days later… $> mysqladmin processlist +-----------------------------------+ | Query | Copy to Temp Table | +-----------------------------------+ May be its Joining!! Or may be its dead… Or may be my replacement will see the result .. 
  • 9. Hadoop god@internal.directi.com Go use bloody!
  • 10. But Why?! Distributed data processing cluster  A distributed file system  Data location sensitive Task scheduler  MapReduce paradigm  Handles Parallelizable and distributable tasks  Failover capability  Web based monitoring capabilities  More value for your time! 
  • 11. Where to start?? MapReduce:  Q: Sum of all squares of a=[1,2,3,4,3,2,7]  Simple….  fold( map(a, square()), sum()) You can do that in any functional programming language….
  • 12. Now do it for an array of 100 million elements…….
  • 13. This is where Hadoop comes in.. Distributed File System : HDFS  Distributed Computation/Task Processing: Hadoop  Name Node + Data Node  Task Tracker + Job Tracker 
  • 14.
  • 17. An example Word Count  hadoop jar contrib/streaming/hadoop-0.*-streaming.jar -jobconf mapred.data.field.separator=quot;,quot; -input 'wc.eg.in' -output 'wc.eg.out' -mapper 'wc -w' -reducer quot;awk „{ sum+=$1 } END{ print sum}‟quot; Task Tracker: http://cae5.internal.directi.com:50030/jobtracker.jsp Name Node: http://cae2.internal.directi.com:50070/dfshealth.jsp
  • 18. Pig and Pig Latin A procedural language for MapReduce operations.  logs = LOAD 'access.logs' USING PigStorage(',') AS (ts:int, URL:chararray, resp:chararray, resp_time:int); products = LOAD 'products.dat' USING PigStorage(',') AS (date:int, pid:int, price:int); l1 = FOREACH logs GENERATE GetDate(ts) as req_date, GetProductID(URL) as prod_id, GetRequestType(URL) as rtype, resp, resp_time; j1 = JOIN l1 BY prod_id, products by pid; j2 = FILTER j1 BY req_date==date AND price>30.0F; j3 = GROUP j2 ALL; j4 = FOREACH j3 GENERATE COUNT(j2); DUMP j4
  • 19. Q&A