Copyright © 2010, SAS Institute Inc. All rights reserved.
When is the right time for real-time?
Copyright © 2012, SAS Institute Inc. All rights reserved.
Profile
An analytics specialist with extensive experience in risk analytics, marketing analytics and social media analytics, and in developing new techniques and models to achieve business objectives. Former SAS Global Forum Section Chair.
Real time…?
• Do I need real time capabilities?
• What is real time for me?
• Right time vs real time
• Free lunch? No way
• Open Source = Free Lunch?
• Real Time, Right Time, Robust Returns
Do I need real time capabilities?
The need for real time capabilities is driven by the business:
• Model
• Environment
• Impact
Model
Type of Model: Business Examples
Price Models: Buying Club, One-stop, low-price shopping, Fee for advertising, Razor and blade
Convenience Models: One-stop, convenient shopping, Comprehensive offering, Instant gratification
Commodity-Plus Models: Low-price reliable commodity, Mass customised commodity, Service-wrapped commodity
Experience Models: Experience selling, Cool brands
Channel Models: Channel maximisation, Quality selling, Value-added reseller
Intermediary Models: Market aggregation, Open market-making, Multi-party market aggregation
Trust Models: Trusted operations, Trusted product leadership, Trusted service leadership
Innovation Models: Incomparable products, Incomp
Linder and Cantrell’s (2000) categorization of business models
Environment
Impact (Courtesy of Alfrid from The Hobbit)
Where a little gold coin has better effect than a Wizard’s
wand….
https://www.youtube.com/watch?v=-fcJm1Slk2E
What is real time for me?
In real time analytics, "real time" is defined by the interval between receiving the data, performing the analysis, and taking the relevant action.
Time scale: Milliseconds, Seconds, Minutes, Hours
Example applications: Algorithmic Trading, Fraud Detection, E-Commerce, Location Based Marketing, Online Retailers
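The definition above can be made concrete with a small sketch that measures the interval from data reception to action and names the time scale it falls into. The tier names come from the slide; the cut-off values and the function name response_tier are assumptions chosen for illustration, not part of the original deck.

```python
from datetime import datetime, timedelta

# Tier names are taken from the slide; the cut-off values are assumptions.
TIERS = [
    (timedelta(seconds=1), "Milliseconds"),
    (timedelta(minutes=1), "Seconds"),
    (timedelta(hours=1), "Minutes"),
]


def response_tier(received_at: datetime, acted_at: datetime) -> str:
    """Classify the interval between data reception and the action taken."""
    elapsed = acted_at - received_at
    if elapsed < timedelta(0):
        raise ValueError("the action cannot precede data reception")
    for limit, name in TIERS:
        if elapsed < limit:
            return name
    return "Hours"


start = datetime(2024, 1, 1, 12, 0, 0)  # hypothetical timestamp
print(response_tier(start, start + timedelta(milliseconds=40)))  # Milliseconds
print(response_tier(start, start + timedelta(minutes=5)))        # Minutes
```

Which tier is acceptable depends on the business case, so the thresholds should be set per application rather than taken from this sketch.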
Right Time Vs Real Time
Even with real time capabilities, what matters more is getting the timing right.
[Figure: Right Time vs Real Time]
Real Time Marketing - Mall
When should I target? Pre-Store, In-Store, Post-Store
What should I target?
• Pre-Store: Target the approaching customer with an offer from a shop 10 meters away.
• In-Store: Target an in-store customer with a store-based offer.
• Post-Store: Offer a return purchase coupon.
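The mall example can be sketched as a simple rule that picks an offer type from the customer's stage. This is only an illustration: the Stage enum, the choose_offer function and the offer labels are hypothetical names, and the 10-meter radius is the only value taken from the slide.

```python
from enum import Enum
from typing import Optional


class Stage(Enum):
    PRE_STORE = "pre-store"
    IN_STORE = "in-store"
    POST_STORE = "post-store"


NEARBY_METERS = 10.0  # distance mentioned on the slide


def choose_offer(stage: Stage, distance_to_shop_m: Optional[float] = None) -> Optional[str]:
    """Return a hypothetical offer label for the customer's stage, or None."""
    if stage is Stage.PRE_STORE:
        # Only target an approaching customer when a shop is close enough.
        if distance_to_shop_m is not None and distance_to_shop_m <= NEARBY_METERS:
            return "offer from nearby shop"
        return None
    if stage is Stage.IN_STORE:
        return "store-based offer"
    if stage is Stage.POST_STORE:
        return "return purchase coupon"
    raise ValueError(f"unknown stage: {stage!r}")


print(choose_offer(Stage.PRE_STORE, distance_to_shop_m=8.0))
print(choose_offer(Stage.POST_STORE))
```

Returning None for a distant pre-store customer reflects the right-time idea: sending no offer can be better than sending one too early.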
Free Lunch? No Way
[Slide text truncated; legible fragment: "… models in a single, centrally managed intuitive interface."]
Open Source = Free Lunch?
Hardware Provisioning
A common question received by Spark developers is how to configure hardware for it. While the right hardware will depend on the situation, we make the following
recommendations.
Storage Systems
Because most Spark jobs will likely have to read input data from an external storage system (e.g. the Hadoop File System, or HBase), it is important to place Spark as close to this system as possible. We recommend the following:
• If at all possible, run Spark on the same nodes as HDFS. The simplest way is to set up a Spark standalone mode cluster on the same nodes, and configure Spark and Hadoop’s memory and CPU usage to avoid interference (for Hadoop, the relevant options are mapred.child.java.opts for the per-task memory and mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum for number of tasks). Alternatively, you can run Hadoop and Spark on a common cluster manager like Mesos or Hadoop YARN.
• If this is not possible, run Spark on different nodes in the same local-area network as HDFS.
• For low-latency data stores like HBase, it may be preferable to run computing jobs on different nodes than the storage system to avoid interference.
Local Disks
While Spark can perform a lot of its computation in memory, it still uses local disks to store data that doesn’t fit in RAM, as well as to preserve intermediate output
between stages. We recommend having 4-8 disks per node, configured without RAID (just as separate mount points). In Linux, mount the disks with the noatime option
to reduce unnecessary writes. In Spark, configure the spark.local.dir variable to be a comma-separated list of the local disks. If you are running HDFS, it’s fine to use the
same disks as HDFS.
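As a rough illustration of the spark.local.dir recommendation, the sketch below builds the comma-separated value from a list of mount points and prints it as a line for spark-defaults.conf. The mount paths, the "spark" subdirectory and the helper name spark_local_dir are assumptions; only the property name and the comma-separated format come from the text above.

```python
def spark_local_dir(mount_points, subdir="spark"):
    """Join per-disk scratch directories into a spark.local.dir value."""
    if not mount_points:
        raise ValueError("at least one mount point is required")
    # One scratch directory per physical disk, no RAID, as the text recommends.
    return ",".join(f"{m.rstrip('/')}/{subdir}" for m in mount_points)


# Hypothetical mount points for a node with four separate disks.
disks = [f"/mnt/disk{i}" for i in range(1, 5)]
print(f"spark.local.dir {spark_local_dir(disks)}")
```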
Memory
In general, Spark can run well with anywhere from 8 GB to hundreds of gigabytes of memory per machine. In all cases, we recommend allocating only at most 75% of
the memory for Spark; leave the rest for the operating system and buffer cache.
How much memory you will need will depend on your application. To determine how much your application uses for a certain dataset size, load part of your dataset in a
Spark RDD and use the Storage tab of Spark’s monitoring UI (http://<driver-node>:4040) to see its size in memory. Note that memory usage is greatly affected by
storage level and serialization format – see the tuning guide for tips on how to reduce it.
Finally, note that the Java VM does not always behave well with more than 200 GB of RAM. If you purchase machines with more RAM than this, you can run multiple worker JVMs per node. In Spark’s standalone mode, you can set the number of workers per node with the SPARK_WORKER_INSTANCES variable in conf/spark-env.sh, and the number of cores per worker with SPARK_WORKER_CORES.
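The memory guidance above amounts to simple arithmetic, sketched here under stated assumptions: at most 75% of RAM goes to Spark, and that share is split across enough worker JVMs to keep each at or below 200 GB. The function name plan_workers and the even split of cores are illustrative choices. SPARK_WORKER_MEMORY is a standalone-mode setting that the quoted text does not mention and is included here as an assumption.

```python
import math

SPARK_FRACTION = 0.75  # "at most 75% of the memory for Spark"
MAX_JVM_GB = 200       # the JVM "does not always behave well" above this


def plan_workers(ram_gb: int, cores: int) -> dict:
    """Suggest conf/spark-env.sh values for one standalone-mode node."""
    spark_gb = int(ram_gb * SPARK_FRACTION)
    # Use enough worker JVMs that none exceeds the 200 GB ceiling.
    instances = max(1, math.ceil(spark_gb / MAX_JVM_GB))
    return {
        "SPARK_WORKER_INSTANCES": instances,
        "SPARK_WORKER_CORES": max(1, cores // instances),
        "SPARK_WORKER_MEMORY": f"{spark_gb // instances}g",
    }


# Hypothetical node with 512 GB of RAM and 32 cores.
for key, value in plan_workers(ram_gb=512, cores=32).items():
    print(f"{key}={value}")
```

For the 512 GB node this yields two workers of 192 GB each, leaving a quarter of the RAM for the operating system and buffer cache.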
Network
In our experience, when the data is in memory, a lot of Spark applications are network-bound. Using a 10 Gigabit or higher network is the best way to make these
applications faster. This is especially true for “distributed reduce” applications such as group-bys, reduce-bys, and SQL joins. In any given application, you can see how
much data Spark shuffles across the network from the application’s monitoring UI (http://<driver-node>:4040).
CPU Cores
Spark scales well to tens of CPU cores per machine because it performs minimal sharing between threads. You should likely provision at least 8-16 cores per machine. Depending on the CPU cost of your workload, you may also need more: once data is in memory, most applications are either CPU- or network-bound.
The hardware for 50TB storage is SGD$XXXXXXX. Plus, you need a team of 5 Hadoop Specialists. What about the deployment period?
Real Time, Right Time, Robust Returns
• Several retail space companies in Singapore have started reaping the benefits of real time analytics.
• Telcos have been progressing in this area, using geo-fencing technology to market relevant offers.
• Banks have also started offering some form of real time marketing.
• Most companies have reported gains from deploying real time technology.
Thank you!

Right time Vs real time
