SlideShare a Scribd company logo

Spark at-hackthon8jan2014

1 of 30
Download to read offline
Spark @ Hackathon
Madhukara Phatak
Zinnia Systems
@madhukaraphatak

1
Hackathon

“ Hackathon is a programming ritual ,
usually done in night, where
programmers solve problems overnight
which may take years of time otherwise”
-Anonymous

2
Where / When
Dec 6 and 7 th , 2013
At BigData Conclave
Hosted by Flutura
Solving real world problem(s) in 24 hours
No restriction on tools to be used
Deliverables : Results, Working code and Visualization

3
What?
Predict the global energy demand for next year using the
energy usage data available for last four years, in order to
enable utility companies , effectively handle the energy
demand.
Every minute usage data collected from smart meters
From 2008- 2012
Around 133 mb uncompressed
2,75,000 records
Around 25,000 missing data records

4
Questions to be answered
What would be the energy consumption for the next day?
What would be week wise Energy consumption for the next
one year?
What would be household’s peak time load (Peak time is
between 7 AM to 10 AM) for the next month.
During Weekdays
During Weekends
Assuming there was full day of outage, Calculate the
Revenue Loss for a particular day next year by finding the
Average Revenue per day (ARPD) of the household using
given tariff plan
Can you identify the device usage patterns?
5
Who
Four member team from Zinnia Systems
Chose Spark/Scala for solving these problems
Every one except me ,were new to Scala and Spark
Able to solve the problems on time
Won first prize at Hackathon

6
Ad

Recommended

Behm Shah Pagerank
Behm Shah PagerankBehm Shah Pagerank
Behm Shah Pagerankgothicane
 
Mapreduce Algorithms
Mapreduce AlgorithmsMapreduce Algorithms
Mapreduce AlgorithmsAmund Tveit
 
High-level Programming Languages: Apache Pig and Pig Latin
High-level Programming Languages: Apache Pig and Pig LatinHigh-level Programming Languages: Apache Pig and Pig Latin
High-level Programming Languages: Apache Pig and Pig LatinPietro Michiardi
 
PyCascading for Intuitive Flow Processing with Hadoop (gabor szabo)
PyCascading for Intuitive Flow Processing with Hadoop (gabor szabo)PyCascading for Intuitive Flow Processing with Hadoop (gabor szabo)
PyCascading for Intuitive Flow Processing with Hadoop (gabor szabo)PyData
 
Scalable Algorithm Design with MapReduce
Scalable Algorithm Design with MapReduceScalable Algorithm Design with MapReduce
Scalable Algorithm Design with MapReducePietro Michiardi
 

More Related Content

What's hot

Introduction to Map Reduce
Introduction to Map ReduceIntroduction to Map Reduce
Introduction to Map ReduceApache Apex
 
Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...
Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...
Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...Spark Summit
 
Introduction into scalable graph analysis with Apache Giraph and Spark GraphX
Introduction into scalable graph analysis with Apache Giraph and Spark GraphXIntroduction into scalable graph analysis with Apache Giraph and Spark GraphX
Introduction into scalable graph analysis with Apache Giraph and Spark GraphXrhatr
 
Introduction to Map-Reduce
Introduction to Map-ReduceIntroduction to Map-Reduce
Introduction to Map-ReduceBrendan Tierney
 
Map Reduce
Map ReduceMap Reduce
Map Reduceschapht
 
LocationTech Projects
LocationTech ProjectsLocationTech Projects
LocationTech ProjectsJody Garnett
 
How LinkedIn Uses Scalding for Data Driven Product Development
How LinkedIn Uses Scalding for Data Driven Product DevelopmentHow LinkedIn Uses Scalding for Data Driven Product Development
How LinkedIn Uses Scalding for Data Driven Product DevelopmentSasha Ovsankin
 
Sparse matrix computations in MapReduce
Sparse matrix computations in MapReduceSparse matrix computations in MapReduce
Sparse matrix computations in MapReduceDavid Gleich
 
A Scaleable Implementation of Deep Learning on Spark -Alexander Ulanov
A Scaleable Implementation of Deep Learning on Spark -Alexander UlanovA Scaleable Implementation of Deep Learning on Spark -Alexander Ulanov
A Scaleable Implementation of Deep Learning on Spark -Alexander UlanovSpark Summit
 
A Graph-Based Method For Cross-Entity Threat Detection
 A Graph-Based Method For Cross-Entity Threat Detection A Graph-Based Method For Cross-Entity Threat Detection
A Graph-Based Method For Cross-Entity Threat DetectionJen Aman
 
Ehtsham Elahi, Senior Research Engineer, Personalization Science and Engineer...
Ehtsham Elahi, Senior Research Engineer, Personalization Science and Engineer...Ehtsham Elahi, Senior Research Engineer, Personalization Science and Engineer...
Ehtsham Elahi, Senior Research Engineer, Personalization Science and Engineer...MLconf
 
HDFS-HC: A Data Placement Module for Heterogeneous Hadoop Clusters
HDFS-HC: A Data Placement Module for Heterogeneous Hadoop ClustersHDFS-HC: A Data Placement Module for Heterogeneous Hadoop Clusters
HDFS-HC: A Data Placement Module for Heterogeneous Hadoop ClustersXiao Qin
 
Flare: Scale Up Spark SQL with Native Compilation and Set Your Data on Fire! ...
Flare: Scale Up Spark SQL with Native Compilation and Set Your Data on Fire! ...Flare: Scale Up Spark SQL with Native Compilation and Set Your Data on Fire! ...
Flare: Scale Up Spark SQL with Native Compilation and Set Your Data on Fire! ...Databricks
 
Introduction to MapReduce Data Transformations
Introduction to MapReduce Data TransformationsIntroduction to MapReduce Data Transformations
Introduction to MapReduce Data Transformationsswooledge
 
Resilient Distributed Datasets
Resilient Distributed DatasetsResilient Distributed Datasets
Resilient Distributed DatasetsGabriele Modena
 
Map reduce in Hadoop
Map reduce in HadoopMap reduce in Hadoop
Map reduce in Hadoopishan0019
 

What's hot (20)

Introduction to Map Reduce
Introduction to Map ReduceIntroduction to Map Reduce
Introduction to Map Reduce
 
Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...
Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...
Time-evolving Graph Processing on Commodity Clusters: Spark Summit East talk ...
 
Introduction into scalable graph analysis with Apache Giraph and Spark GraphX
Introduction into scalable graph analysis with Apache Giraph and Spark GraphXIntroduction into scalable graph analysis with Apache Giraph and Spark GraphX
Introduction into scalable graph analysis with Apache Giraph and Spark GraphX
 
Unit 2 part-2
Unit 2 part-2Unit 2 part-2
Unit 2 part-2
 
Introduction to Map-Reduce
Introduction to Map-ReduceIntroduction to Map-Reduce
Introduction to Map-Reduce
 
Map Reduce
Map ReduceMap Reduce
Map Reduce
 
Map Reduce
Map ReduceMap Reduce
Map Reduce
 
LocationTech Projects
LocationTech ProjectsLocationTech Projects
LocationTech Projects
 
How LinkedIn Uses Scalding for Data Driven Product Development
How LinkedIn Uses Scalding for Data Driven Product DevelopmentHow LinkedIn Uses Scalding for Data Driven Product Development
How LinkedIn Uses Scalding for Data Driven Product Development
 
Sparse matrix computations in MapReduce
Sparse matrix computations in MapReduceSparse matrix computations in MapReduce
Sparse matrix computations in MapReduce
 
A Scaleable Implementation of Deep Learning on Spark -Alexander Ulanov
A Scaleable Implementation of Deep Learning on Spark -Alexander UlanovA Scaleable Implementation of Deep Learning on Spark -Alexander Ulanov
A Scaleable Implementation of Deep Learning on Spark -Alexander Ulanov
 
A Graph-Based Method For Cross-Entity Threat Detection
 A Graph-Based Method For Cross-Entity Threat Detection A Graph-Based Method For Cross-Entity Threat Detection
A Graph-Based Method For Cross-Entity Threat Detection
 
Ehtsham Elahi, Senior Research Engineer, Personalization Science and Engineer...
Ehtsham Elahi, Senior Research Engineer, Personalization Science and Engineer...Ehtsham Elahi, Senior Research Engineer, Personalization Science and Engineer...
Ehtsham Elahi, Senior Research Engineer, Personalization Science and Engineer...
 
HDFS-HC: A Data Placement Module for Heterogeneous Hadoop Clusters
HDFS-HC: A Data Placement Module for Heterogeneous Hadoop ClustersHDFS-HC: A Data Placement Module for Heterogeneous Hadoop Clusters
HDFS-HC: A Data Placement Module for Heterogeneous Hadoop Clusters
 
Flare: Scale Up Spark SQL with Native Compilation and Set Your Data on Fire! ...
Flare: Scale Up Spark SQL with Native Compilation and Set Your Data on Fire! ...Flare: Scale Up Spark SQL with Native Compilation and Set Your Data on Fire! ...
Flare: Scale Up Spark SQL with Native Compilation and Set Your Data on Fire! ...
 
Introduction to MapReduce Data Transformations
Introduction to MapReduce Data TransformationsIntroduction to MapReduce Data Transformations
Introduction to MapReduce Data Transformations
 
Resilient Distributed Datasets
Resilient Distributed DatasetsResilient Distributed Datasets
Resilient Distributed Datasets
 
Map reduce in Hadoop
Map reduce in HadoopMap reduce in Hadoop
Map reduce in Hadoop
 
Tutorial5
Tutorial5Tutorial5
Tutorial5
 
Map Reduce
Map ReduceMap Reduce
Map Reduce
 

Similar to Spark at-hackthon8jan2014

Apache spark-melbourne-april-2015-meetup
Apache spark-melbourne-april-2015-meetupApache spark-melbourne-april-2015-meetup
Apache spark-melbourne-april-2015-meetupNed Shawa
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache SparkVincent Poncet
 
Apache Spark Introduction.pdf
Apache Spark Introduction.pdfApache Spark Introduction.pdf
Apache Spark Introduction.pdfMaheshPandit16
 
A look under the hood at Apache Spark's API and engine evolutions
A look under the hood at Apache Spark's API and engine evolutionsA look under the hood at Apache Spark's API and engine evolutions
A look under the hood at Apache Spark's API and engine evolutionsDatabricks
 
Stefano Baghino - From Big Data to Fast Data: Apache Spark
Stefano Baghino - From Big Data to Fast Data: Apache SparkStefano Baghino - From Big Data to Fast Data: Apache Spark
Stefano Baghino - From Big Data to Fast Data: Apache SparkCodemotion
 
Big Data Analytics with Apache Spark
Big Data Analytics with Apache SparkBig Data Analytics with Apache Spark
Big Data Analytics with Apache SparkMarcoYuriFujiiMelo
 
Fast Data Analytics with Spark and Python
Fast Data Analytics with Spark and PythonFast Data Analytics with Spark and Python
Fast Data Analytics with Spark and PythonBenjamin Bengfort
 
Introduction to Spark - DataFactZ
Introduction to Spark - DataFactZIntroduction to Spark - DataFactZ
Introduction to Spark - DataFactZDataFactZ
 
Apache spark sneha challa- google pittsburgh-aug 25th
Apache spark  sneha challa- google pittsburgh-aug 25thApache spark  sneha challa- google pittsburgh-aug 25th
Apache spark sneha challa- google pittsburgh-aug 25thSneha Challa
 
Big Data LDN 2018: PROJECT HYDROGEN: UNIFYING AI WITH APACHE SPARK
Big Data LDN 2018: PROJECT HYDROGEN: UNIFYING AI WITH APACHE SPARKBig Data LDN 2018: PROJECT HYDROGEN: UNIFYING AI WITH APACHE SPARK
Big Data LDN 2018: PROJECT HYDROGEN: UNIFYING AI WITH APACHE SPARKMatt Stubbs
 
Cleveland Hadoop Users Group - Spark
Cleveland Hadoop Users Group - SparkCleveland Hadoop Users Group - Spark
Cleveland Hadoop Users Group - SparkVince Gonzalez
 
Energy analytics with Apache Spark workshop
Energy analytics with Apache Spark workshopEnergy analytics with Apache Spark workshop
Energy analytics with Apache Spark workshopQuantUniversity
 
Boston Spark Meetup event Slides Update
Boston Spark Meetup event Slides UpdateBoston Spark Meetup event Slides Update
Boston Spark Meetup event Slides Updatevithakur
 
Briefing on the Modern ML Stack with R
 Briefing on the Modern ML Stack with R Briefing on the Modern ML Stack with R
Briefing on the Modern ML Stack with RDatabricks
 
Spark's Role in the Big Data Ecosystem (Spark Summit 2014)
Spark's Role in the Big Data Ecosystem (Spark Summit 2014)Spark's Role in the Big Data Ecosystem (Spark Summit 2014)
Spark's Role in the Big Data Ecosystem (Spark Summit 2014)Databricks
 
Multiplaform Solution for Graph Datasources
Multiplaform Solution for Graph DatasourcesMultiplaform Solution for Graph Datasources
Multiplaform Solution for Graph DatasourcesStratio
 

Similar to Spark at-hackthon8jan2014 (20)

Apache spark-melbourne-april-2015-meetup
Apache spark-melbourne-april-2015-meetupApache spark-melbourne-april-2015-meetup
Apache spark-melbourne-april-2015-meetup
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache Spark
 
Apache Spark Introduction.pdf
Apache Spark Introduction.pdfApache Spark Introduction.pdf
Apache Spark Introduction.pdf
 
A look under the hood at Apache Spark's API and engine evolutions
A look under the hood at Apache Spark's API and engine evolutionsA look under the hood at Apache Spark's API and engine evolutions
A look under the hood at Apache Spark's API and engine evolutions
 
Stefano Baghino - From Big Data to Fast Data: Apache Spark
Stefano Baghino - From Big Data to Fast Data: Apache SparkStefano Baghino - From Big Data to Fast Data: Apache Spark
Stefano Baghino - From Big Data to Fast Data: Apache Spark
 
Big Data Analytics with Apache Spark
Big Data Analytics with Apache SparkBig Data Analytics with Apache Spark
Big Data Analytics with Apache Spark
 
Fast Data Analytics with Spark and Python
Fast Data Analytics with Spark and PythonFast Data Analytics with Spark and Python
Fast Data Analytics with Spark and Python
 
Introduction to Spark - DataFactZ
Introduction to Spark - DataFactZIntroduction to Spark - DataFactZ
Introduction to Spark - DataFactZ
 
Apache spark sneha challa- google pittsburgh-aug 25th
Apache spark  sneha challa- google pittsburgh-aug 25thApache spark  sneha challa- google pittsburgh-aug 25th
Apache spark sneha challa- google pittsburgh-aug 25th
 
Big Data LDN 2018: PROJECT HYDROGEN: UNIFYING AI WITH APACHE SPARK
Big Data LDN 2018: PROJECT HYDROGEN: UNIFYING AI WITH APACHE SPARKBig Data LDN 2018: PROJECT HYDROGEN: UNIFYING AI WITH APACHE SPARK
Big Data LDN 2018: PROJECT HYDROGEN: UNIFYING AI WITH APACHE SPARK
 
Cleveland Hadoop Users Group - Spark
Cleveland Hadoop Users Group - SparkCleveland Hadoop Users Group - Spark
Cleveland Hadoop Users Group - Spark
 
Apache Spark
Apache SparkApache Spark
Apache Spark
 
Energy analytics with Apache Spark workshop
Energy analytics with Apache Spark workshopEnergy analytics with Apache Spark workshop
Energy analytics with Apache Spark workshop
 
Scala+data
Scala+dataScala+data
Scala+data
 
Boston Spark Meetup event Slides Update
Boston Spark Meetup event Slides UpdateBoston Spark Meetup event Slides Update
Boston Spark Meetup event Slides Update
 
Briefing on the Modern ML Stack with R
 Briefing on the Modern ML Stack with R Briefing on the Modern ML Stack with R
Briefing on the Modern ML Stack with R
 
Spark 101
Spark 101Spark 101
Spark 101
 
Spark's Role in the Big Data Ecosystem (Spark Summit 2014)
Spark's Role in the Big Data Ecosystem (Spark Summit 2014)Spark's Role in the Big Data Ecosystem (Spark Summit 2014)
Spark's Role in the Big Data Ecosystem (Spark Summit 2014)
 
As simple as Apache Spark
As simple as Apache SparkAs simple as Apache Spark
As simple as Apache Spark
 
Multiplaform Solution for Graph Datasources
Multiplaform Solution for Graph DatasourcesMultiplaform Solution for Graph Datasources
Multiplaform Solution for Graph Datasources
 

Recently uploaded

Q1 Memory Fabric Forum: Memory Processor Interface 2023, Focus on CXL
Q1 Memory Fabric Forum: Memory Processor Interface 2023, Focus on CXLQ1 Memory Fabric Forum: Memory Processor Interface 2023, Focus on CXL
Q1 Memory Fabric Forum: Memory Processor Interface 2023, Focus on CXLMemory Fabric Forum
 
H3 Platform CXL Solution_Memory Fabric Forum.pptx
H3 Platform CXL Solution_Memory Fabric Forum.pptxH3 Platform CXL Solution_Memory Fabric Forum.pptx
H3 Platform CXL Solution_Memory Fabric Forum.pptxMemory Fabric Forum
 
Q1 Memory Fabric Forum: Intel Enabling Compute Express Link (CXL)
Q1 Memory Fabric Forum: Intel Enabling Compute Express Link (CXL)Q1 Memory Fabric Forum: Intel Enabling Compute Express Link (CXL)
Q1 Memory Fabric Forum: Intel Enabling Compute Express Link (CXL)Memory Fabric Forum
 
21ST CENTURY LITERACY FROM TRADITIONAL TO MODERN
21ST CENTURY LITERACY FROM TRADITIONAL TO MODERN21ST CENTURY LITERACY FROM TRADITIONAL TO MODERN
21ST CENTURY LITERACY FROM TRADITIONAL TO MODERNRonnelBaroc
 
Enhancing Productivity and Insight A Tour of JDK Tools Progress Beyond Java 17
Enhancing Productivity and Insight  A Tour of JDK Tools Progress Beyond Java 17Enhancing Productivity and Insight  A Tour of JDK Tools Progress Beyond Java 17
Enhancing Productivity and Insight A Tour of JDK Tools Progress Beyond Java 17Ana-Maria Mihalceanu
 
Q1 Memory Fabric Forum: XConn CXL Switches for AI
Q1 Memory Fabric Forum: XConn CXL Switches for AIQ1 Memory Fabric Forum: XConn CXL Switches for AI
Q1 Memory Fabric Forum: XConn CXL Switches for AIMemory Fabric Forum
 
Zi-Stick UBS Dongle ZIgbee from Aeotec manual
Zi-Stick UBS Dongle ZIgbee from  Aeotec manualZi-Stick UBS Dongle ZIgbee from  Aeotec manual
Zi-Stick UBS Dongle ZIgbee from Aeotec manualDomotica daVinci
 
How we think about an advisor tech stack
How we think about an advisor tech stackHow we think about an advisor tech stack
How we think about an advisor tech stackSummit
 
My sample product research idea for you!
My sample product research idea for you!My sample product research idea for you!
My sample product research idea for you!KivenRaySarsaba
 
"The Transformative Power of AI and Open Challenges" by Dr. Manish Gupta, Google
"The Transformative Power of AI and Open Challenges" by Dr. Manish Gupta, Google"The Transformative Power of AI and Open Challenges" by Dr. Manish Gupta, Google
"The Transformative Power of AI and Open Challenges" by Dr. Manish Gupta, GoogleISPMAIndia
 
2024 February Patch Tuesday
2024 February Patch Tuesday2024 February Patch Tuesday
2024 February Patch TuesdayIvanti
 
Microsoft Azure News - Feb 2024
Microsoft Azure News - Feb 2024Microsoft Azure News - Feb 2024
Microsoft Azure News - Feb 2024Daniel Toomey
 
Dynamical systems simulation in Python for science and engineering
Dynamical systems simulation in Python for science and engineeringDynamical systems simulation in Python for science and engineering
Dynamical systems simulation in Python for science and engineeringMassimo Talia
 
Bringing nullability into existing code - dammit is not the answer.pptx
Bringing nullability into existing code - dammit is not the answer.pptxBringing nullability into existing code - dammit is not the answer.pptx
Bringing nullability into existing code - dammit is not the answer.pptxMaarten Balliauw
 
Enhancing SaaS Performance: A Hands-on Workshop for Partners
Enhancing SaaS Performance: A Hands-on Workshop for PartnersEnhancing SaaS Performance: A Hands-on Workshop for Partners
Enhancing SaaS Performance: A Hands-on Workshop for PartnersThousandEyes
 
My self introduction to know others abut me
My self  introduction to know others abut meMy self  introduction to know others abut me
My self introduction to know others abut meManoj Prabakar B
 
The Future of Product, by Founder & CEO, Product School
The Future of Product, by Founder & CEO, Product SchoolThe Future of Product, by Founder & CEO, Product School
The Future of Product, by Founder & CEO, Product SchoolProduct School
 
Building Products That Think- Bhaskaran Srinivasan & Ashish Gupta
Building Products That Think- Bhaskaran Srinivasan & Ashish GuptaBuilding Products That Think- Bhaskaran Srinivasan & Ashish Gupta
Building Products That Think- Bhaskaran Srinivasan & Ashish GuptaISPMAIndia
 
Power of 2024 - WITforce Odyssey.pptx.pdf
Power of 2024 - WITforce Odyssey.pptx.pdfPower of 2024 - WITforce Odyssey.pptx.pdf
Power of 2024 - WITforce Odyssey.pptx.pdfkatalinjordans1
 

Recently uploaded (20)

5 Tech Trend to Notice in ESG Landscape- 47Billion
5 Tech Trend to Notice in ESG Landscape- 47Billion5 Tech Trend to Notice in ESG Landscape- 47Billion
5 Tech Trend to Notice in ESG Landscape- 47Billion
 
Q1 Memory Fabric Forum: Memory Processor Interface 2023, Focus on CXL
Q1 Memory Fabric Forum: Memory Processor Interface 2023, Focus on CXLQ1 Memory Fabric Forum: Memory Processor Interface 2023, Focus on CXL
Q1 Memory Fabric Forum: Memory Processor Interface 2023, Focus on CXL
 
H3 Platform CXL Solution_Memory Fabric Forum.pptx
H3 Platform CXL Solution_Memory Fabric Forum.pptxH3 Platform CXL Solution_Memory Fabric Forum.pptx
H3 Platform CXL Solution_Memory Fabric Forum.pptx
 
Q1 Memory Fabric Forum: Intel Enabling Compute Express Link (CXL)
Q1 Memory Fabric Forum: Intel Enabling Compute Express Link (CXL)Q1 Memory Fabric Forum: Intel Enabling Compute Express Link (CXL)
Q1 Memory Fabric Forum: Intel Enabling Compute Express Link (CXL)
 
21ST CENTURY LITERACY FROM TRADITIONAL TO MODERN
21ST CENTURY LITERACY FROM TRADITIONAL TO MODERN21ST CENTURY LITERACY FROM TRADITIONAL TO MODERN
21ST CENTURY LITERACY FROM TRADITIONAL TO MODERN
 
Enhancing Productivity and Insight A Tour of JDK Tools Progress Beyond Java 17
Enhancing Productivity and Insight  A Tour of JDK Tools Progress Beyond Java 17Enhancing Productivity and Insight  A Tour of JDK Tools Progress Beyond Java 17
Enhancing Productivity and Insight A Tour of JDK Tools Progress Beyond Java 17
 
Q1 Memory Fabric Forum: XConn CXL Switches for AI
Q1 Memory Fabric Forum: XConn CXL Switches for AIQ1 Memory Fabric Forum: XConn CXL Switches for AI
Q1 Memory Fabric Forum: XConn CXL Switches for AI
 
Zi-Stick UBS Dongle ZIgbee from Aeotec manual
Zi-Stick UBS Dongle ZIgbee from  Aeotec manualZi-Stick UBS Dongle ZIgbee from  Aeotec manual
Zi-Stick UBS Dongle ZIgbee from Aeotec manual
 
How we think about an advisor tech stack
How we think about an advisor tech stackHow we think about an advisor tech stack
How we think about an advisor tech stack
 
My sample product research idea for you!
My sample product research idea for you!My sample product research idea for you!
My sample product research idea for you!
 
"The Transformative Power of AI and Open Challenges" by Dr. Manish Gupta, Google
"The Transformative Power of AI and Open Challenges" by Dr. Manish Gupta, Google"The Transformative Power of AI and Open Challenges" by Dr. Manish Gupta, Google
"The Transformative Power of AI and Open Challenges" by Dr. Manish Gupta, Google
 
2024 February Patch Tuesday
2024 February Patch Tuesday2024 February Patch Tuesday
2024 February Patch Tuesday
 
Microsoft Azure News - Feb 2024
Microsoft Azure News - Feb 2024Microsoft Azure News - Feb 2024
Microsoft Azure News - Feb 2024
 
Dynamical systems simulation in Python for science and engineering
Dynamical systems simulation in Python for science and engineeringDynamical systems simulation in Python for science and engineering
Dynamical systems simulation in Python for science and engineering
 
Bringing nullability into existing code - dammit is not the answer.pptx
Bringing nullability into existing code - dammit is not the answer.pptxBringing nullability into existing code - dammit is not the answer.pptx
Bringing nullability into existing code - dammit is not the answer.pptx
 
Enhancing SaaS Performance: A Hands-on Workshop for Partners
Enhancing SaaS Performance: A Hands-on Workshop for PartnersEnhancing SaaS Performance: A Hands-on Workshop for Partners
Enhancing SaaS Performance: A Hands-on Workshop for Partners
 
My self introduction to know others abut me
My self  introduction to know others abut meMy self  introduction to know others abut me
My self introduction to know others abut me
 
The Future of Product, by Founder & CEO, Product School
The Future of Product, by Founder & CEO, Product SchoolThe Future of Product, by Founder & CEO, Product School
The Future of Product, by Founder & CEO, Product School
 
Building Products That Think- Bhaskaran Srinivasan & Ashish Gupta
Building Products That Think- Bhaskaran Srinivasan & Ashish GuptaBuilding Products That Think- Bhaskaran Srinivasan & Ashish Gupta
Building Products That Think- Bhaskaran Srinivasan & Ashish Gupta
 
Power of 2024 - WITforce Odyssey.pptx.pdf
Power of 2024 - WITforce Odyssey.pptx.pdfPower of 2024 - WITforce Odyssey.pptx.pdf
Power of 2024 - WITforce Odyssey.pptx.pdf
 

Spark at-hackthon8jan2014

  • 1. Spark @ Hackathon Madhukara Phatak Zinnia Systems @madhukaraphatak 1
  • 2. Hackathon “ Hackathon is a programming ritual , usually done in night, where programmers solve problems overnight which may take years of time otherwise” -Anonymous 2
  • 3. Where / When Dec 6 and 7 th , 2013 At BigData Conclave Hosted by Flutura Solving real world problem(s) in 24 hours No restriction on tools to be used Deliverables : Results, Working code and Visualization 3
  • 4. What? Predict the global energy demand for next year using the energy usage data available for last four years, in order to enable utility companies , effectively handle the energy demand. Every minute usage data collected from smart meters From 2008- 2012 Around 133 mb uncompressed 2,75,000 records Around 25,000 missing data records 4
  • 5. Questions to be answered What would be the energy consumption for the next day? What would be week wise Energy consumption for the next one year? What would be household’s peak time load (Peak time is between 7 AM to 10 AM) for the next month. During Weekdays During Weekends Assuming there was full day of outage, Calculate the Revenue Loss for a particular day next year by finding the Average Revenue per day (ARPD) of the household using given tariff plan Can you identify the device usage patterns? 5
  • 6. Who Four member team from Zinnia Systems Chose Spark/Scala for solving these problems Every one except me ,were new to Scala and Spark Able to solve the problems on time Won first prize at Hackathon 6
  • 7. Why Spark? Faster prototyping Excellent in memory performance Uses AKKA Able to run 2.5 million concurrent processing in 1GB RAM Easy to debug Excellent integration in IntelliJ and Eclipse Little to code - 500 lines of code Something new to try at Hackathon 7
  • 8. Solution Uses only core spark Uses Geometric Brownian motion algorithm for prediction Complete code Available at Github https://github.com/zinniasystems/spark-energy-prediction Under Apache license Blog series http://zinniasystems.com/blog/2013/12/12/ predicting-global-energy-demand-using-spark-part-1/ 8
  • 9. Embrace Scala Scala is a JVM language in which spark is implemented. Though Spark gives Java API and Python API Scala feels more natural. If you are coming from Pig , You feel at home in Scala API. Spark source base is small. So knowing Scala helps you peek at spark source whenever possible. Excellent REPL support. 9
  • 10. Go functional Spark encourages you to use functional idioms over object oriented one's Some of the functional idioms available are Closure Function chaining Lazy evaluation Ex : Standard deviation Sum( (xi – Mean)*(xi – Mean)) Map/Reduce way Map calculates (xi – Mean)*(xi – Mean) Reduce does the sum 10
  • 11. Spark way private def standDeviation(inputRDD: RDD[Double], mean: Double): Double = { val sum = inputRDD.map(value => { (value - mean) * (value - mean) }).reduce((firstValue, secondValue) => { firstValue + secondValue })} private def standDeviation(inputRDD: RDD[Double],mean: Double): Double = { val sum =0; inputRDD.map(value => sum+=(value-mean)*(value-mean))} 11
  • 12. Use Tuples Map/Reduce is restricted to Key,Value pairs Representing data like Grouped data is too difficult in Key,Value pairs Writable is too much work to develop/maintain There was Tuple Map/Reduce effort some point of time Spark (Scala) has built in. 12
  • 13. Tuple Example Aggregating data over hour def hourlyAggregator(inputRDD: RDD[Record]): RDD[((String, Long), Record)] The resulting RDD has tuple as key which combines date and hour of the day. These tuples can be sent as input to other functions Aggregating data over Day def dailyAggregation(inputRDD: RDD[((String, Long), Record)]): RDD[(Date, Record)] 13
  • 14. Use Lazy evaluation Map/Reduce does not embrace lazy evaluation. Output of every job has to be written to HDFS HDFS is only way to share data in Map/Reduce Spark differs Every operation, other than actions, are lazy evaluated Only write critical data to Disk, other intermediate data cache in Memory Be careful when you use actions. Try to delay calling of actions as late possible. Refer Main.scala 14
  • 16. Spark @ Hackathon Madhukara Phatak Zinnia Systems @madhukaraphatak 1
  • 17. Hackathon “ Hackathon is a programming ritual , usually done in night, where programmers solve problems overnight which may take years of time otherwise” -Anonymous 2
  • 18. Where / When Dec 6 and 7 th , 2013 At BigData Conclave Hosted by Flutura Solving real world problem(s) in 24 hours No restriction on tools to be used Deliverables : Results, Working code and Visualization 3
  • 19. What? Predict the global energy demand for next year using the energy usage data available for last four years, in order to enable utility companies , effectively handle the energy demand. Every minute usage data collected from smart meters From 2008- 2012 Around 133 mb uncompressed 2,75,000 records Around 25,000 missing data records 4
  • 20. Questions to be answered What would be the energy consumption for the next day? What would be week wise Energy consumption for the next one year? What would be household’s peak time load (Peak time is between 7 AM to 10 AM) for the next month. During Weekdays During Weekends Assuming there was full day of outage, Calculate the Revenue Loss for a particular day next year by finding the Average Revenue per day (ARPD) of the household using given tariff plan Can you identify the device usage patterns? 5
  • 21. Who Four member team from Zinnia Systems Chose Spark/Scala for solving these problems Every one except me ,were new to Scala and Spark Able to solve the problems on time Won first prize at Hackathon 6
  • 22. Why Spark? Faster prototyping Excellent in memory performance Uses AKKA Able to run 2.5 million concurrent processing in 1GB RAM Easy to debug Excellent integration in IntelliJ and Eclipse Little to code - 500 lines of code Something new to try at Hackathon 7
  • 23. Solution Uses only core spark Uses Geometric Brownian motion algorithm for prediction Complete code Available at Github https://github.com/zinniasystems/spark-energy-prediction Under Apache license Blog series http://zinniasystems.com/blog/2013/12/12/ predicting-global-energy-demand-using-spark-part-1/ 8
  • 24. Embrace Scala Scala is a JVM language in which spark is implemented. Though Spark gives Java API and Python API Scala feels more natural. If you are coming from Pig , You feel at home in Scala API. Spark source base is small. So knowing Scala helps you peek at spark source whenever possible. Excellent REPL support. 9
  • 25. Go functional Spark encourages you to use functional idioms over object oriented one's Some of the functional idioms available are Closure Function chaining Lazy evaluation Ex : Standard deviation Sum( (xi – Mean)*(xi – Mean)) Map/Reduce way Map calculates (xi – Mean)*(xi – Mean) Reduce does the sum 10
  • 26. Spark way private def standDeviation(inputRDD: RDD[Double], mean: Double): Double = { val sum = inputRDD.map(value => { (value - mean) * (value - mean) }).reduce((firstValue, secondValue) => { firstValue + secondValue })} private def standDeviation(inputRDD: RDD[Double],mean: Double): Double = { val sum =0; inputRDD.map(value => sum+=(value-mean)*(value-mean))} 11 The code is available in EnergyUsagePrediction.scala
  • 27. Use Tuples Map/Reduce is restricted to Key,Value pairs Representing data like Grouped data is too difficult in Key,Value pairs Writable is too much work to develop/maintain There was Tuple Map/Reduce effort some point of time Spark (Scala) has built in. 12
  • 28. Tuple Example Aggregating data over hour def hourlyAggregator(inputRDD: RDD[Record]): RDD[((String, Long), Record)] The resulting RDD has tuple as key which combines date and hour of the day. These tuples can be sent as input to other functions Aggregating data over Day def dailyAggregation(inputRDD: RDD[((String, Long), Record)]): RDD[(Date, Record)] 13
  • 29. Use Lazy evaluation Map/Reduce does not embrace lazy evaluation. Output of every job has to be written to HDFS HDFS is only way to share data in Map/Reduce Spark differs Every operation, other than actions, are lazy evaluated Only write critical data to Disk, other intermediate data cache in Memory Be careful when you use actions. Try to delay calling of actions as late possible. Refer Main.scala 14