Mastering MapReduce Series, Session I: MapReduce for Big Data Management and Analysis
Curt Monash, Monash Research
Steve Wooledge, Aster Data
Peter Pawlowski, Aster Data
Eric Friedman, Aster Data
October 15th, 2009
Topics
Aster Data Overview
SQL-MapReduce
Example SQL-MapReduce applications
SQL-MapReduce Syntax/example
Q&A
Aster Data: Creating the Next-Generation Data Management System
Founded in 2005 to revolutionize data processing & management of very large data volumes
Founding team innovated on the ‘big data’ problem at Stanford University and was joined by big data experts from Google, Oracle, and Microsoft
Aster’s first commercial product, nCluster, has been in market since 2007. Customers include MySpace, LinkedIn, Coremetrics, Akamai, and others.
Since 2008, innovated on Google’s well-known MapReduce framework to transform data processing. Created patent-pending SQL-MapReduce (In-Database MapReduce)
Example Data-Driven Applications: Large Data Volumes and Analytics-Intensive
Service Personalization (e.g. telco)
Graph analysis
Consumer segmentation
Consumer buying patterns and consumer behavior
Click-stream analysis
Compliance & Regulatory Reporting
Predictive and granular forecasting
Trend analysis and modeling
Credit and Risk management
Fraud detection
Cross-platform ad and event attribution
Cross-platform media affinity analysis
Improving Computation Push-Down
[Diagram labels: Cycle Time = Seconds to Minutes; BI Reports; Server; Data Mining Workload; Common SQL Queries: aggregation, sub-sets & samples; MPP Database]
Confidential and proprietary. Copyright © 2009 Aster Data Systems
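The push-down idea on this slide can be sketched with a small, hypothetical example. It uses Python's built-in sqlite3 module purely as a stand-in for a database (it is not an MPP system and not Aster nCluster), and the table name, column names, and helper functions are assumptions made up for illustration. The first function pulls every row to the client and aggregates there; the second pushes the aggregation into the database so only the per-user totals travel back.

```python
import sqlite3


def build_db():
    """Create an in-memory table of (user_id, amount) rows. Illustrative data only."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE clicks (user_id INTEGER, amount REAL)")
    rows = [(1, 10.0), (1, 5.0), (2, 7.5), (3, 1.0), (3, 2.0), (3, 3.0)]
    conn.executemany("INSERT INTO clicks VALUES (?, ?)", rows)
    return conn


def totals_client_side(conn):
    """Extract every row and aggregate in the application (no push-down)."""
    totals = {}
    for user_id, amount in conn.execute("SELECT user_id, amount FROM clicks"):
        totals[user_id] = totals.get(user_id, 0.0) + amount
    return totals


def totals_pushed_down(conn):
    """Let the database aggregate; only one row per user is returned."""
    query = "SELECT user_id, SUM(amount) FROM clicks GROUP BY user_id"
    return dict(conn.execute(query))


conn = build_db()
client_totals = totals_client_side(conn)
pushed_totals = totals_pushed_down(conn)
```

Both functions produce the same totals; the difference is how many rows cross the boundary between the database and the application, which is the cost that push-down aims to reduce.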
Aster’s Solution: A Massively Parallel Data Warehouse With the Unique Ability to Embed Applications
Deeper, Faster Analytics on Big Data
[Architecture diagram: Aster nCluster System. Labels, from top to bottom:]
Key Classes of Applications: Leading BI Tools; Custom Java Applications; Custom .NET Applications; Packaged Analytic Apps; Other Applications (C, C++, Perl, Python…)
Aster’s SQL-MapReduce or Standard Interfaces
Unified Interface: SQL, SQL-MapReduce
High Volume, Fast Querying. Industry-leading WLM: 300+ Concurrent Workloads
Dynamic Workload Manager (WLM)
Embedded Parallelized Apps (execute within the DB): .NET App, Java App, Packaged App, Other Apps
MPP Data Warehouse with Incremental Scaling (scale by function)
Massively-Parallel Data Store
Commodity Hardware
Aster SQL-MapReduce (SQL-MR): Bring your applications to the data
“Data-Applications” Development Platform
Rich portfolio of supported languages: Java, .NET, Python, Ruby, Perl, C++, R and more
Use SQL to develop rich data apps
Expressive flexibility
Reusability across applications and reports
Full Tilt Poker: Fraud Detection
The second largest online poker site in the world
Objective: Improve fraud analytics and stop revenue leakage
Before: Separate Java-based fraud detection applications ran once a week
Java-based program ran the data mining on extracted data
Algorithm had to be oversimplified due to performance limitations
Fraud was detected too late or not at all
After: Store and analyze all data in one location…the Aster database with SQL-MapReduce
Enriched fraud algorithm is now catching previously undetected fraud
Query performance improved by 60x (90 mins down to 90 secs)
Aster’s Patent-Pending SQL-MapReduce: Enables faster, easier, and more powerful analytics
SQL-MapReduce framework (for developers to create and extend)
Flexible: MapReduce expressiveness, languages, polymorphism
Performance: Massive parallelization, computational push-down
Availability: Fault isolation, resource management
Powerful SQL-MR functions (for analysts to consume)
Deep insights: Unlimited analytical power at your disposal
Ease of use: Simply plug in to the SQL you know and love
The Power of Aster’s SQL-MapReduce Framework
1. Write: Write a SQL-MR function in Java, C, etc.
2. Install: Install inside Aster nCluster
3. Use and Reuse: Invoke SQL-MR function from SQL
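To make the write-once, invoke-many idea more concrete, here is a minimal Python sketch of a function that is applied independently to each partition of a table. It is not Aster's SQL-MR API: the names `sessionize` and `run_partitioned`, the (user_id, timestamp) row layout, and the 60-second timeout are all assumptions chosen for illustration, and the loop over partitions runs sequentially where a parallel database would spread partitions across nodes.

```python
from collections import defaultdict


def sessionize(rows, timeout):
    """Hypothetical per-partition function.

    Takes one user's rows, already ordered by timestamp, and appends a
    session id that increments whenever the gap between consecutive
    timestamps exceeds `timeout`.
    """
    out = []
    session_id = 0
    prev_ts = None
    for user_id, ts in rows:
        if prev_ts is not None and ts - prev_ts > timeout:
            session_id += 1
        out.append((user_id, ts, session_id))
        prev_ts = ts
    return out


def run_partitioned(rows, partition_key, order_key, fn, **kwargs):
    """Group rows by key, order each group, and apply `fn` to each group.

    Each group is independent of the others, which is what would allow a
    parallel system to process them on different nodes.
    """
    partitions = defaultdict(list)
    for row in rows:
        partitions[partition_key(row)].append(row)
    result = []
    for key in sorted(partitions):
        ordered = sorted(partitions[key], key=order_key)
        result.extend(fn(ordered, **kwargs))
    return result


# Illustrative click data: (user_id, timestamp in seconds).
clicks = [(1, 0), (2, 5), (1, 30), (1, 200), (2, 400)]
sessions = run_partitioned(
    clicks,
    partition_key=lambda r: r[0],
    order_key=lambda r: r[1],
    fn=sessionize,
    timeout=60,
)
```

The same `sessionize` function can be reused with any table that supplies a partition key and an ordering key, which mirrors the "Use and Reuse" step on the slide.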

Connector Corner: Automate dynamic content and events by pushing a button
DianaGray10
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
Elena Simperl
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
Product School
 

Recently uploaded (20)

Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
 

Mastering MapReduce: MapReduce for Big Data Management and Analysis

  • 1. Mastering MapReduce Series, Session I: MapReduce for Big Data Management and Analysis. Curt Monash, Monash Research; Steve Wooledge, Aster Data; Peter Pawlowski, Aster Data; Eric Friedman, Aster Data. October 15th, 2009
  • 2. Topics: Aster Data overview; SQL-MapReduce; example SQL-MapReduce applications; SQL-MapReduce syntax/example; Q&A
  • 3. Aster Data: Creating the Next-Generation Data Management System. Founded in 2005 to revolutionize data processing and management of very large data volumes. The founding team worked on the ‘big data’ problem at Stanford University and was joined by big data experts from Google, Oracle, and Microsoft. Aster’s first commercial product, nCluster, has been on the market since 2007. Customers include MySpace, LinkedIn, Coremetrics, Akamai, and others. Since 2008, Aster has built on Google’s well-known MapReduce framework to transform data processing, creating the patent-pending SQL-MapReduce (In-Database MapReduce).
  • 8. Consumer buying patterns and consumer behavior
  • 11. Predictive and granular forecasting
  • 13. Credit and Risk management
  • 15. Cross-platform ad and event attribution
  • 17. Improving Computation Push-Down: Cycle Time = Seconds to Minutes. Diagram labels: BI Reports Server; Data Mining Workload; Common SQL Queries (aggregation, sub-sets & samples); MPP Database. Confidential and proprietary. Copyright © 2009 Aster Data Systems
  • 18. Aster’s Solution: A Massively Parallel Data Warehouse With the Unique Ability to Embed Applications. Deeper, Faster Analytics on Big Data. Key classes of applications: leading BI tools; custom Java applications; custom .NET applications; packaged analytic apps; other applications (C, C++, Perl, Python…). Aster nCluster System, reached through Aster’s SQL-MapReduce or standard interfaces. Diagram callouts, top to bottom: 6. Unified interface (SQL, SQL-MapReduce). 5. High-volume, fast querying (industry-leading WLM: 300+ concurrent workloads). 4. Dynamic Workload Manager (WLM). 3. Embedded parallelized apps (.NET app, Java app, packaged app, other apps) that execute within the DB; MPP data warehouse with incremental scaling (scale by function). 2. Massively parallel data store. 1. Commodity hardware.
  • 19. Aster SQL-MapReduce (SQL-MR): Bring your applications to the data. “Data-Applications” development platform. Rich portfolio of supported languages – Java, .NET, Python, Ruby, Perl, C++, R, and more. Use SQL to develop rich data apps. Expressive flexibility. Reusability across applications and reports.
  • 21. Java-based program ran the data mining on extracted data
  • 22. Algorithm had to be oversimplified due to performance limitations
  • 24. Enriched fraud algorithm is now catching previously undetected fraud
  • 25. Query performance improved by 60x (90 mins down to 90 secs)
  • 26. Aster’s Patent-Pending SQL-MapReduce: enables faster, easier, and more powerful analytics. SQL-MapReduce framework (for developers to create and extend): Flexible: MapReduce expressiveness, languages, polymorphism. Performance: massive parallelization, computational push-down. Availability: fault isolation, resource management. Powerful SQL-MR functions (for analysts to consume): Deep insights: unlimited analytical power at your disposal. Ease of use: simply plug in to the SQL you know and love. The Power of Aster’s SQL-MapReduce Framework: (1) Write: write a SQL-MR function in Java, C, etc. (2) Install: install inside Aster nCluster. (3) Use and reuse: invoke the SQL-MR function from SQL.
  • 35. Expensive HW & maintenance. Best of both worlds! Traditional Database
  • 36. MapReduce Applications. Behavioral analytics (CRM): sequential pattern analysis (e.g., up-sell/cross-sell), spam/bot analysis, sessionization analysis. Risk & fraud analysis: consumer credit scoring/default risk, market risk/VaR, operational risk, etc.; fraud detection. Graph analysis: social network “connectedness” (e.g., SSSP, APSP, etc.). Text analysis: tokenization (e.g., word count classification), natural language processing. Statistical analysis (machine learning): linear regression, k-means clustering, R Project algorithms.
  • 37. Aster’s SQL-MapReduce Library: Pre-packaged (SDK), SQL-MR APIs, and documentation. Pre-packaged SQL-MR sample functions: nPath – complex sequential analysis for time-series and behavioral pattern analysis; SSSP – single-source shortest path, a graph algorithm useful for fraud and segmentation analysis; Sessionize – session categorization based on a sequence of clicks within a specified timeout; Approximate percentiles – ultra-fast percentile (or N-tile) statistical distribution analysis; Linear regression – statistical technique used to predict values based on a set of related variables; Tokenize – text analysis that splits strings into words, categorizes them, and does a word count.
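The Sessionize description above (grouping a sequence of clicks into sessions by a timeout) can be sketched in plain Python. This is only an illustration of the idea, not Aster's implementation: the function name `sessionize`, the `(user_id, timestamp)` row shape, and the sample data are all assumptions made for the example.

```python
from collections import defaultdict

def sessionize(rows, timeout):
    """Assign a per-user session number to each (user_id, timestamp) click.

    A new session starts whenever the gap since the user's previous click
    exceeds `timeout`. Hypothetical helper, not the Aster function itself.
    """
    # Partition the clicks by user.
    by_user = defaultdict(list)
    for user_id, ts in rows:
        by_user[user_id].append(ts)

    out = []
    for user_id in sorted(by_user):
        session_id = 0
        prev = None
        # Walk each user's clicks in time order.
        for ts in sorted(by_user[user_id]):
            if prev is not None and ts - prev > timeout:
                session_id += 1
            out.append((user_id, ts, session_id))
            prev = ts
    return out

# Hypothetical sample clicks: (user_id, timestamp in seconds).
clicks = [("u1", 0), ("u1", 20), ("u1", 100), ("u2", 5), ("u1", 110)]
sessions = sessionize(clicks, timeout=30)
```

Each user's clicks are handled independently, which is why this kind of function parallelizes naturally when the data is partitioned by user.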
  • 39. Requires dozens of SQL queries every N minutes (dozens of times per day)
  • 41. Significantly simpler code: <100 lines vs. 1000 lines
  • 42. Single pass over data for optimal performance. Source: Avinash Kaushik, Occam’s Razor, Nov ‘08
  • 44. Running data mining and statistical analysis on multi-TB system
  • 47. Single pass over large-scale data
  • 48. 100 lines of code down to 12
  • 49. Significant SQL optimization: Minimal SQL code, greater performance via parallel execution
  • 50. Cycle time reduction: Significant resource savings in both time and utilization
  • 52. What is Aster nPath? nPath is a SQL-MR function included with nCluster. nPath enables analysis of ordered data: clickstream data, financial transaction data, user interaction data, anything of a time-series nature. It leverages the power of the SQL-MR framework to transcend SQL’s limitations with respect to ordered data.
  • 53. Example: Analyzing a Clickstream. Business question: how many distinct users (1) start at the home page, (2) click on an auction, (3) view the seller’s profile, and (4) bid on the item? Available data: a database table clicks, populated with web log data, that has columns user_id, timestamp, and page_type.
  • 54. The nPath query: SELECT count(distinct user_id) FROM nPath( ON clicks PARTITION BY user_id ORDER BY timestamp MODE(OVERLAPPING) PATTERN('H.A.P.B') SYMBOLS( page_type = 'home' AS H, page_type = 'auction' AS A, page_type = 'profile' AS P, page_type = 'bid' AS B) RESULT(first(user_id of H) as user_id) ); (1) Partition: Form groups by user_id. (2) Order: Sort each group by timestamp.
  • 55. The nPath query: SELECT count(distinct user_id) FROM nPath( ON clicks PARTITION BY user_id ORDER BY timestamp MODE(OVERLAPPING) PATTERN('H.A.P.B') SYMBOLS( page_type = 'home' AS H, page_type = 'auction' AS A, page_type = 'profile' AS P, page_type = 'bid' AS B) RESULT(first(user_id of H) as user_id) ); (3a) Match: Define a set of symbols. (3b) Match: Define the subsequences of interest via regex.
  • 56. The nPath query: SELECT count(distinct user_id) FROM nPath( ON clicks PARTITION BY user_id ORDER BY timestamp MODE(OVERLAPPING) PATTERN('H.A.P.B') SYMBOLS( page_type = 'home' AS H, page_type = 'auction' AS A, page_type = 'profile' AS P, page_type = 'bid' AS B) RESULT(first(user_id of H) as user_id) ); (4) Compute aggregates over matched subsequences.
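The four steps annotated on the query (partition, order, match, aggregate) can be emulated in a few lines of Python to show what the query computes. This is a rough sketch, not how nCluster executes nPath. It assumes that the dots in the pattern mean "immediately followed by", it does not model MODE(OVERLAPPING) (which does not change a distinct-user count), and the sample rows and the function name `count_matching_users` are made up for illustration.

```python
import re
from collections import defaultdict

# Hypothetical sample rows shaped like the clicks table:
# (user_id, timestamp, page_type).
clicks = [
    (1, 10, "home"), (1, 11, "auction"), (1, 12, "profile"), (1, 13, "bid"),
    (2, 10, "home"), (2, 11, "auction"), (2, 12, "bid"),
    (3, 5, "search"), (3, 6, "home"), (3, 7, "auction"),
    (3, 8, "profile"), (3, 9, "bid"),
]

# (3a) Symbols: map each page_type of interest to a one-letter symbol.
SYMBOLS = {"home": "H", "auction": "A", "profile": "P", "bid": "B"}
# (3b) Pattern: H, then A, then P, then B on consecutive rows (assumed reading).
PATTERN = re.compile("HAPB")

def count_matching_users(rows):
    # (1) Partition: group rows by user_id.
    partitions = defaultdict(list)
    for user_id, ts, page_type in rows:
        partitions[user_id].append((ts, page_type))

    matched = set()
    for user_id, events in partitions.items():
        # (2) Order: sort each partition by timestamp.
        events.sort(key=lambda e: e[0])
        # Rows that match no symbol become "_" so they interrupt a sequence.
        encoded = "".join(SYMBOLS.get(page, "_") for _, page in events)
        if PATTERN.search(encoded):
            # (4) Aggregate: keep each user once, like count(distinct user_id).
            matched.add(user_id)
    return len(matched)

result = count_matching_users(clicks)
```

With the sample rows, users 1 and 3 complete the home, auction, profile, bid sequence and user 2 skips the profile view, so only two users are counted.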
  • 57. Market Basket Analysis Example. Question: detect customers that purchase the same category of items in three market baskets in a row with total value > $150
  • 58. Two Methods – Same Answer: Multi-pass Nested Sub-selects vs. Single Pass SQL-MR nPath Query. Values shown on the slide: 5187 17769 3542 1889 5753 2001 156 193 2521 156 1416 75194 75194 10411 27355
  • 59. Demo – Market Basket Analysis (1M Rows)
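The market basket question can be answered in a single ordered pass per customer, which is the point of the nPath approach. The Python sketch below illustrates that logic only. It makes several assumptions not stated on the slides: one row per basket, a single category per basket, and "total value" read as the combined value of the three consecutive baskets. The name `customers_with_streak` and the sample data are hypothetical.

```python
from collections import defaultdict

# Hypothetical rows: (customer_id, basket_time, category, basket_value).
baskets = [
    ("c1", 1, "books", 60.0), ("c1", 2, "books", 50.0), ("c1", 3, "books", 55.0),
    ("c2", 1, "books", 60.0), ("c2", 2, "toys", 80.0), ("c2", 3, "books", 70.0),
    ("c3", 1, "toys", 20.0), ("c3", 2, "toys", 30.0), ("c3", 3, "toys", 40.0),
]

def customers_with_streak(rows, run_length=3, min_total=150.0):
    # Partition baskets by customer.
    by_customer = defaultdict(list)
    for customer_id, ts, category, value in rows:
        by_customer[customer_id].append((ts, category, value))

    hits = []
    for customer_id, items in by_customer.items():
        # Order each customer's baskets by time.
        items.sort(key=lambda r: r[0])
        # Slide a window of `run_length` consecutive baskets.
        for start in range(len(items) - run_length + 1):
            window = items[start:start + run_length]
            same_category = len({cat for _, cat, _ in window}) == 1
            total = sum(val for _, _, val in window)
            if same_category and total > min_total:
                hits.append(customer_id)
                break  # one qualifying streak is enough for this customer
    return sorted(hits)

streak_customers = customers_with_streak(baskets)
```

In the sample data only c1 qualifies: c2 switches category in the middle basket, and c3 stays in one category but the three baskets total only 90.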
  • 60. Summary: Bringing MapReduce to Big Data Management. Aster’s MPP data warehouse + SQL-MapReduce
  • 61. Upcoming Webcast: Mastering MapReduce Part II. Save the date: December 3rd. MapReduce resources (http://www.asterdata.com/mapreduce/index.php): recorded application use-cases; code samples and tutorials. DBMS2 on MapReduce (http://www.dbms2.com/category/parallelization/mapreduce/). Aster’s SQL-MapReduce (http://www.asterdata.com/product/mapreduce.php and http://www.asterdata.com/blog/index.php/category/mapreduce/). TDWI technical whitepaper. Contact us: hello@asterdata.com, Steve.wooledge@asterdata.com. Thank You!