Your SlideShare is downloading. ×
0
The Impact of Cloud Computing on Predictive Analytics 7-29-09 v5
The Impact of Cloud Computing on Predictive Analytics 7-29-09 v5
The Impact of Cloud Computing on Predictive Analytics 7-29-09 v5
The Impact of Cloud Computing on Predictive Analytics 7-29-09 v5
The Impact of Cloud Computing on Predictive Analytics 7-29-09 v5
The Impact of Cloud Computing on Predictive Analytics 7-29-09 v5
The Impact of Cloud Computing on Predictive Analytics 7-29-09 v5
The Impact of Cloud Computing on Predictive Analytics 7-29-09 v5
The Impact of Cloud Computing on Predictive Analytics 7-29-09 v5
The Impact of Cloud Computing on Predictive Analytics 7-29-09 v5
The Impact of Cloud Computing on Predictive Analytics 7-29-09 v5
The Impact of Cloud Computing on Predictive Analytics 7-29-09 v5
The Impact of Cloud Computing on Predictive Analytics 7-29-09 v5
The Impact of Cloud Computing on Predictive Analytics 7-29-09 v5
The Impact of Cloud Computing on Predictive Analytics 7-29-09 v5
The Impact of Cloud Computing on Predictive Analytics 7-29-09 v5
The Impact of Cloud Computing on Predictive Analytics 7-29-09 v5
The Impact of Cloud Computing on Predictive Analytics 7-29-09 v5
The Impact of Cloud Computing on Predictive Analytics 7-29-09 v5
The Impact of Cloud Computing on Predictive Analytics 7-29-09 v5
The Impact of Cloud Computing on Predictive Analytics 7-29-09 v5
The Impact of Cloud Computing on Predictive Analytics 7-29-09 v5
The Impact of Cloud Computing on Predictive Analytics 7-29-09 v5
The Impact of Cloud Computing on Predictive Analytics 7-29-09 v5
The Impact of Cloud Computing on Predictive Analytics 7-29-09 v5
The Impact of Cloud Computing on Predictive Analytics 7-29-09 v5
The Impact of Cloud Computing on Predictive Analytics 7-29-09 v5
The Impact of Cloud Computing on Predictive Analytics 7-29-09 v5
Upcoming SlideShare
Loading in...5
×

Thanks for flagging this SlideShare!

Oops! An error has occurred.

×
Saving this for later? Get the SlideShare app to save on your phone or tablet. Read anywhere, anytime – even offline.
Text the download link to your phone
Standard text messaging rates apply

The Impact of Cloud Computing on Predictive Analytics 7-29-09 v5

2,348

Published on

This is a talk I gave in San Diego on July 29, 2009 explaining some of the impact and some of the opportunities of cloud computing on predictive analytics.

This is a talk I gave in San Diego on July 29, 2009 explaining some of the impact and some of the opportunities of cloud computing on predictive analytics.

Published in: Technology
1 Comment
5 Likes
Statistics
Notes
  • Good Post
       Reply 
    Are you sure you want to  Yes  No
    Your message goes here
No Downloads
Views
Total Views
2,348
On Slideshare
0
From Embeds
0
Number of Embeds
2
Actions
Shares
0
Downloads
187
Comments
1
Likes
5
Embeds 0
No embeds

Report content
Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
No notes for slide

Transcript

  1. From Data to Decisions: New Strategies for Deploying Analytics Using Clouds <br />Robert Grossman<br />Open Data Group<br />July 29, 2009 <br />
  2. Analytic Strategy<br />Overview<br />Analytics <br />Analytic Infrastructure<br />Cloud computing has changed analytic infrastructure and enabled new classes of analytic algorithms. It’s time to rethink your analytic strategy.<br />
  3. Part 1Quick Review of Clouds<br />3<br />
  4. What is a Cloud?<br />Clouds provide on-demand resources or services over a network, often the Internet, with the scale and reliability of a data center.<br />No standard definition.<br />Cloud architectures are not new.<br />What is new:<br />Scale<br />Ease of use<br />Pricing model.<br />4<br />
  5. 5<br />Scale is new.<br />
  6. Elastic, Usage Based Pricing Is New<br />6<br />costs the same as<br />1 computer in a rack for 120 hours<br />120 computers in three racks for 1 hour<br /><ul><li> Elastic, usage based pricing turns capex into opex.
  7. Clouds can manage surges in computing needs.</li></li></ul><li>Simplicity Offered By the Cloud is New<br />7<br />+<br />.. and you have a computer ready to work.<br />A new programmer can develop a program to process a container full of data with less than day of training using MapReduce.<br />
  8. Two Types of Clouds<br />On-demand resources & services over a network at the scale of a data center<br />On-demand computing instances (IaaS)<br />IaaS: Amazon EC2, S3, etc.; Eucalyptus<br />supports many Web 2.0 applications/users<br />On-demand cloud services for large data cloud applications (PaaS for large data clouds)<br />GFS/MapReduce/Bigtable, Hadoop, Sector, …<br />Manage and compute with large data (say 10+ TB)<br />8<br />
  9. Cloud Architectures – How Do You Fill a Data Center?<br />on-demand computing capacity<br />App<br />App<br />App<br />App<br />App<br />on-demand computing instances<br />Cloud Data Services (BigTable, etc.) <br />Quasi-relational Data Services<br />App<br />App<br />…<br />Cloud Compute Services (MapReduce & Generalizations)<br />App<br />App<br />App<br />App<br />App<br />Cloud Storage Services<br />
  10. What is Analytic Infrastructure ...<br />10<br />Part 2 <br />… and why you should care.<br />
  11. What is Analytics?<br />Short Definition<br />Using data to make decisions.<br />Longer Definition<br />Using data to take actions and make decisions using models that are empirically derived and statistically valid. <br />It is important to understand the difference between reporting and analytics.<br />11<br />
  12. 12<br />Direct Marketing Models<br />Risk Models<br />Online Models<br />
  13. What is the Size of Your Data?<br />Small<br />Fits into memory<br />Medium<br />Too large for memory<br />But fits into a database<br />N.B. databases are designed for safe writing of rows<br />Large<br />To large for a database<br />But can use specialized file system (column-wise)<br />Or storage cloud (Google File System, Hadoop DFS)<br />13<br />
  14. (Very Simplified) Architectural View<br />14<br />Model Producer<br />PMML<br />Model<br />Data<br />The Predictive Model Markup Language (PMML) is an XML language for statistical and data mining models (www.dmg.org).<br />With PMML, it is easy to move models between applications and platforms.<br />
  15. (Simplified) Architectural View<br />15<br />algorithms to estimate models<br />Model Producer<br />Data<br />Data Pre-processing<br />features<br />PMML also supports XML elements to describe data preprocessing.<br />PMML<br />Model<br />
  16. Three Important Interfaces<br />16<br />Modeling Environment<br />2<br />1<br />1<br />Model Producer<br />Data<br />Data Pre-processing<br />PMML<br />Model<br />Deployment Environment <br />2<br />PMML<br />Model<br />3<br />3<br />1<br />Model Consumer<br />Post Processing<br />data<br />actions<br />scores<br />
  17. Actually, This is a Typically a Component in a Workflow<br />17<br />
  18. With the proper analytic infrastructure, cloud computing can be used for data preprocessing, for scoring, for producing models, and as a platform for other services in the analytic infrastructure.<br />18<br />
  19. Cloud Programming Models for Working With Large Data<br />19<br />Part 3 <br />
  20. Map-Reduce Example<br />Both input & output are (key, value) pairs<br />Input is file with one document per record<br />User specifies map function<br />key = document URL<br />Value = terms that document contains<br />“it”, 1“was”, 1“the”, 1“best”, 1<br />(“doc cdickens”,“it was the best of times”)<br />map<br />
  21. Example (cont’d)<br />MapReduce library gathers together all pairs with the same key value (shuffle/sort phase)<br />The user-defined reduce function combines all the values associated with the same key<br />key = “it”values = 1, 1<br />“it”, 2“was”, 2“best”, 1“worst”, 1<br />key = “was”values = 1, 1<br />reduce<br />key = “best”values = 1<br />key = “worst”values = 1<br />
  22. Using Clouds for Scoring (Model Consumers)<br />22<br />Part 4<br />
  23. What is a Statistical/Data Mining Model?<br />Infrastructure<br />Inputs: data attributes, mining attributes<br />Outputs, targets<br />Transformations<br />Segmented models, ensembles of models<br />Models that are part of a standard<br />Trees, SVMs, neural networks, cluster models, etc.<br />In this case, only need to specify parameters<br />Arbitrary models<br />e.g. arbitrary code that takes inputs to outputs<br />23<br />
  24. From an Architectural Viewpoint<br />In an operational environment in which models are being deployed, it may be useful to “Just so no to viewing models as arbitrary code”<br />The deployment can be much shorter if a scoring engine reads a PMML file instead of integrating a new piece of code containing a model.<br />24<br />
  25. Model Producers/Consumers in Clouds<br />Model Consumers take analytic models and use them to score data<br />Very easy to deploy in a cloud<br />Deploy a scoring engine in a cloud and then simply read PMML files<br />Very easy to scale up with cloud surges<br />Model Producers take data and produce models<br />Data parallel applications can be ported to clouds.<br />Others require weighing several factors.<br />25<br />
  26. 26<br />Modeling can be done in-house.<br />Sometimes it makes sense to the pre-processing in the cloud, especially if the data is there.<br />Model Producer<br />Data<br />Data Pre-processing<br />PMML<br />Model<br />PMML<br />Model<br />Scoring engine deployed in a cloud. <br />Model Consumer<br />Post Processing<br />data<br />actions<br />scores<br />
  27. Summary<br />
  28. For More Information<br />Contact information: Robert Grossman<br />blog.rgrossman.com<br />www.rgrossman.com<br />28<br />www.opendatagroup.com<br />

×