© 2014 MapR Technologies

Meet Spark

The Spark software stack includes a core data-processing engine, an interface for interactive querying, Spark Streaming for streaming data analysis, and growing libraries for machine learning and graph analysis. Spark is quickly establishing itself as a leading environment for fast, iterative in-memory and streaming analysis. This talk gives an introduction to the Spark stack, explains how Spark achieves lightning-fast results, and shows how it complements Apache Hadoop.

Published in: Technology, Education

  • My approach to this presentation
    Not an API presentation
    Great documentation and examples
    Lots of presentations of this variety
    Architecture presentation – Use case study.
    What unique features facilitate workloads.
    What is here and coming for new workloads.
     
  • Ad hoc queries with Shark
    An early use case.
    Not usually the first in production.
    MapR has a few.
    In the long run...
  • Log file enrichment
    Streaming API
    Leveraging near-time resolution for enrichment.
    Not the same as Storm: it's batch
    Hooks to other messaging tools: ZeroMQ, Kafka, etc.
    Sliding window features
    NoSQL capabilities
    Access to tables via the HBase API
    Access to in-memory RDDs
  • SQL Mixing with Machine Learning
    “ETL for the math nerd”
    R and Python
    Current access is through Shark
    Spark SQL will drive toward native SQL
  • Recommendation Engine
    Provides all aspects
    ETL
    Vector/Matrix Generation Training
    Near time recommendation
  • Pharma companies like ADAM; academia is behind
    A few graph use cases are in development; none are deployed, and none are on GraphX
    MLlib and Mahout may join forces
    PySpark, according to Databricks, is some of the most active code
    SparkR
    BlinkDB: time-limited queries; it lives in a separate GitHub repository and will be merged into the main Spark branch
    A couple of OEM vendors for Spark; they are covered on the Databricks site
  • MapR has a very large and robust, growing ecosystem of partners. This is important for you because you have existing investments and relationships with other technologies which need to work well with MapR, integrate easily, and allow you to create a differentiated set of technologies.

    (highlight key partners which are important to your customer)
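
The log file enrichment notes above mention micro-batch processing (batch rather than Storm-style per-event handling), lookups against reference tables, and sliding windows. A minimal plain-Python sketch of that idea follows. It deliberately uses no Spark API, and the names `BLACKLIST`, `GEOIP`, `enrich`, and `sliding_window_counts`, along with the sample records, are hypothetical illustrations rather than anything taken from the talk.

```python
from collections import Counter, deque

# Hypothetical reference data standing in for the blacklist and GeoIP lookups.
BLACKLIST = {"10.0.0.66"}
GEOIP = {"10.0.0.1": "US", "10.0.0.66": "BR"}


def enrich(record):
    """Return a copy of a log record with lookup fields attached."""
    ip = record["ip"]
    return {
        **record,
        "country": GEOIP.get(ip, "unknown"),
        "blacklisted": ip in BLACKLIST,
    }


def sliding_window_counts(batches, window=2):
    """Yield blacklisted-hit counts per IP over the last `window` micro-batches."""
    recent = deque(maxlen=window)  # the oldest batch drops out automatically
    for batch in batches:
        recent.append([enrich(r) for r in batch])
        counts = Counter(
            r["ip"] for b in recent for r in b if r["blacklisted"]
        )
        yield dict(counts)


# Three hypothetical micro-batches of log records.
batches = [
    [{"ip": "10.0.0.1"}, {"ip": "10.0.0.66"}],
    [{"ip": "10.0.0.66"}],
    [{"ip": "10.0.0.1"}],
]
window_counts = list(sliding_window_counts(batches, window=2))
```

Each element of `window_counts` covers the current micro-batch plus the one before it, so a blacklisted address seen in consecutive batches shows a rising count and then falls off as the window slides past it.
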
  • Meet Spark

    1. © 2014 MapR Technologies
    2. Agenda
       • Introductions
       • Log file enrichment
       • ETL with ML
       • Recommendation Engine
       • Ad hoc SQL queries
       • The Future case
    3. Who is Mike Emerick?
       My bio, the highlights.
       Architect for MapR for 2.5 years.
       “Creative hours at Workshop 88.”
    4. Approach to this presentation
       1. No API discussion
       2. Architecture features and utilization
       3. Use cases... and why Spark?
    5. Spark at 10,000 feet
       • Fundamentally, Spark is an MPP (massively parallel processing) engine.
       • Can use many storage subsystems. (Great for development)
       • RDD, Accumulators, Broadcast.
       • MapReduce +.
       • The Apache Spark site has great resources on architecture and API.
    6. Use case: SQL Queries
       • “Interactive SQL on Hadoop...”
       • How does Spark make this easier?
         – Native Hive QL (SQL 93 ish)
         – In memory and from disk
         – Usually the first thought...
       • Spark SQL
    7. [Architecture diagram. Labels: Firewall logs, WAP logs, Application logs, Spark Streaming, Compromised accounts, M7 Persistent NoSQL, Persistent Datastore, Spark SQL, Presentation, Monitoring, Spark NoSQL Tools, Tableau, QlikView, Spark MLlib, Reporting, Data Products, Web Portal, Content Management, MapR Data Platform, MicroStrategy, Known Exploits, Blacklists, GeoIP]
    8. Use case: Log file enrichment
       • Why enrich my log data..?
       • This is not Storm; it is batch
         – Similar to the HBase Async API..
       • How does Spark make this easier?
         – Streaming API
         – Sliding windows
         – SQL Hive/Shark
           • Connect to HBase
         – NoSQL connectors
           • HBase
    9. [Architecture diagram, repeated from slide 7]
    10. Use case: SQL mixing with ML
        • Why are folks doing this..?
        • How does Spark make this easier?
          – Native machine learning: MLlib
          – Access to near-time ad hoc SQL queries
          – R and SQL in the same place
          – Bigger than in-memory, faster than MR
    11. [Architecture diagram, repeated from slide 7]
    12. Use case: Recommendation Engine
        • It is a recommendation engine...
        • How does Spark make this easier?
          – ETL and enrichment
          – MLlib makes it easy to import data.
          – MLlib training in the same cluster
          – NoSQL ad hoc serves recommendations
          – Dynamic
    13. [Architecture diagram, repeated from slide 7]
    14. Use cases build in complexity
        • Adoption follows a curve of complexity
          – Ingestion and query
          – Ingestion, enrichment, query
          – Ingestion, enrichment, machine learning, query
          – Ingestion, enrichment, machine learning, serving recommendations
          – .....
        • Spark is flattening the curve
        • Why?
          – One framework
          – Less data movement
          – Access to preferred language
    15. Future state: ~ in the year 2000
        • ADAM – Genomics
        • GraphX – Graph is near...
        • MLlib – Look for lots of work here
        • PySpark – Fastest evolving
        • SparkR – Just getting started
        • BlinkDB – ~ Queries
        • OEM...
    16. Business Services
        MapR is hiring in Chicago
        Apache Drill Beta this summer
        Happy National Making Day!
        Check out W88 for Hadoop classes
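
The recommendation engine slide lists ETL, training, and serving as stages that can live in one framework. The plain-Python sketch below walks through those three stages on toy data. It is only an illustration under stated assumptions: simple item co-occurrence counting stands in for MLlib training (it is not the MLlib algorithm), and `events`, `build_baskets`, `train`, and `recommend` are hypothetical names that do not come from the talk.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical (user, item) events, as they might look after ETL and enrichment.
events = [
    ("ann", "router"), ("ann", "switch"),
    ("bob", "router"), ("bob", "switch"), ("bob", "firewall"),
    ("cat", "router"), ("cat", "firewall"),
]


def build_baskets(events):
    """ETL stage: group the items each user interacted with."""
    baskets = defaultdict(set)
    for user, item in events:
        baskets[user].add(item)
    return baskets


def train(baskets):
    """Training stand-in: count how often each pair of items shares a user."""
    cooc = defaultdict(lambda: defaultdict(int))
    for items in baskets.values():
        for a, b in combinations(sorted(items), 2):
            cooc[a][b] += 1
            cooc[b][a] += 1
    return cooc


def recommend(cooc, baskets, user, k=1):
    """Serving stage: rank unseen items by co-occurrence with the user's items."""
    seen = baskets[user]
    scores = defaultdict(int)
    for item in seen:
        for other, n in cooc[item].items():
            if other not in seen:
                scores[other] += n
    # Highest score first; ties broken by item name for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:k]]


baskets = build_baskets(events)
model = train(baskets)
suggestions = recommend(model, baskets, "ann")
```

With this toy data, `suggestions` is `["firewall"]`, because `firewall` is the only item `ann` has not seen and it co-occurs with both of her items.
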