Rethinking classical approaches to analysis and predictive modeling

Synopsis:
The speaker will address the need to rethink classical approaches to analysis and predictive modeling. He will examine "iterative analytics" and extremely fine-grained segmentation down to a single customer, ultimately building one model per customer, or millions of predictive models, delivering on the promise of a "segment of one". The speaker will also address the speed at which all this has to work to maintain a competitive advantage for innovative businesses.

Speaker:
Afshin Goodarzi, Chief Analyst, 1010data

A veteran of analytics, Goodarzi has led several teams in designing, building and delivering predictive analytics and business analytical products to a diverse set of industries. Prior to joining 1010data, Goodarzi was the Managing Director of Mortgage at Equifax, responsible for the creation of new data products and supporting analytics for the financial industry. Previously, he led the development of various classes of predictive models aimed at the mortgage industry during his tenure at Loan Performance (Core Logic). Earlier, he worked at BlackRock, the research center for NYNEX (present-day Verizon), and Norkom Technologies. Goodarzi's publications span the fields of data mining, data visualization, optimization and artificial intelligence.

Sponsor:
1010Data [ http://1010data.com ]
Microsoft NERD [ http://microsoftnewengland.com ]
Cognizeus [ http://cognizeus.com ]

Published in: Technology

Transcript of "Rethinking classical approaches to analysis and predictive modeling"

  1. Predictive Analytics on a Big Data Scale! Afshin Goodarzi, afshin@1010data.com, April 2014
  2. About 1010data
     •  Founded in 2000
     •  Based in NYC
     •  Big Data analytics platform in the cloud
     •  Library of pre-built analytical applications
     •  Speed, power and flexibility second to none
  3. We Host/Analyze 14+ Trillion Rows of Data
     •  All quotes and trades since 2003 on NYSE are done on 1010data
     •  All mortgages ever issued are analyzed on 1010data
     •  Nearly all real-estate transactions are completed on 1010data
     •  All data for ~35,000 retail outlets across the US are analyzed on 1010data
     Big Data - Granular Data - Time series Data
  4. A Typical BI Technology Stack
     Diagram labels: Administrators; Data Sources; ETL; Inter-Enterprise Users; EDW; Data Cubes/Marts; Reporting / Visualization; Analysis / Modeling
  5. The Stack Has Fallen!
  6. The Analytics Continuum & A Single Version of the Truth
  7. Intuitive Access to Unlimited Amounts of Data
     Diagram labels: Partner Data; 3rd Party Data; 1010data Cloud; Corporate Data; 425,369,127,325 Rows!
  8. The code: Chart 1
     Labels on the slide: Query; Chart Spec

     <layout background_="white" border_="1" height_="525" name="candlestick_layout" relpos_="0,50" width_="650">
       <widget base_="nyse.trades.hist.all" class_="graphics" invmode_="hide" name="candlestick" relpos_="25,25" update_="manual" width_="600">
         <sel value="between(date;'{@startdate}';'{@enddate}')"/>
         <sel value="(symbol='{@symbol}')"/>
         <tabu label="Candle Stick" breaks="date">
           <break col="date" sort="up"/>
           <tcol source="prc" fun="wavg" name="vwap" weight="vol" label="VWAP"/>
           <tcol source="prc" fun="hi" name="high" label="High"/>
           <tcol source="prc" fun="lo" name="low" label="Low"/>
           <tcol source="prc" fun="first" name="open" label="Open"/>
           <tcol source="prc" fun="last" name="close" label="Close"/>
         </tabu>
         <graphspec>
           <chart type="candlestick" title="Candlestick Chart for {@symbol}">
             <axes xlabel="Date" ylabel="Trading Price"/>
           </chart>
         </graphspec>
       </widget>
       <widget class_="button" name="candlestick_refresh" relpos_="475,475" submit_="candlestick" text_="Refresh" type_="submit"/>
       <widget class_="field" label_="Choose Symbol:" name="symbol_input" relpos_="125,475" value_="@symbol"/>
     </layout>
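The tabulation in slide 8 groups trades by date and derives a volume-weighted average price along with the high, low, first (open) and last (close) price. A minimal Python sketch of that same aggregation, outside 1010data, might look like the following. The function name `daily_candles`, the tuple layout and the sample trades are assumptions made for illustration; they are not part of the 1010data query language or the speaker's code.

```python
def daily_candles(trades):
    """Aggregate trades into one candle per date.

    trades: iterable of (date, prc, vol) tuples, already in time order
    within each date (assumed), with positive total volume per date.
    Returns a dict keyed by date, in ascending date order.
    """
    groups = {}
    for date, prc, vol in trades:
        groups.setdefault(date, []).append((prc, vol))

    candles = {}
    for date in sorted(groups):  # mirrors sort="up" on the date break
        rows = groups[date]
        prices = [p for p, _ in rows]
        total_vol = sum(v for _, v in rows)
        candles[date] = {
            "vwap": sum(p * v for p, v in rows) / total_vol,  # weighted average
            "high": max(prices),
            "low": min(prices),
            "open": prices[0],    # first trade of the day
            "close": prices[-1],  # last trade of the day
        }
    return candles


# Hypothetical sample trades (date, price, volume); not real NYSE data.
sample = [
    ("2014-04-01", 10.0, 100),
    ("2014-04-01", 12.0, 300),
    ("2014-04-01", 11.0, 100),
    ("2014-04-02", 11.5, 200),
]
candles = daily_candles(sample)
```

Each resulting record carries the five values a candlestick chart needs for one date, which is what the `<tabu>` block hands to the chart specification.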
  9. Predictive Analytics on a Big Data Scale!
     Big Data mandated analytics and predictive modeling - an example:
     The larger data sets have mandated more rigorous sampling strategies, as traditional systems have not kept up with the computational needs of predictive analytic solutions on Big Data.
     •  Can we use all but a small holdout set in predictive modeling?
     •  What are the challenges?
     •  What is an approach that works?
     •  Are the results any good?
     •  Is this solution only applicable to one industry?
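The first question on slide 9, using all but a small holdout set, can be illustrated with a deterministic split. The sketch below is one possible way to do it, assuming each record has a stable identifier; the hashing scheme, the 1% fraction and the names `is_holdout` and `split` are illustrative assumptions, not something described in the talk.

```python
import hashlib


def is_holdout(record_id, holdout_fraction=0.01):
    """Assign a record to the holdout set based on a hash of its id.

    Hashing makes the assignment repeatable across runs without
    storing a random seed or a membership list.
    """
    digest = hashlib.sha256(str(record_id).encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 0x100000000  # uniform-ish value in [0, 1)
    return bucket < holdout_fraction


def split(records, key, holdout_fraction=0.01):
    """Return (train, holdout); train keeps everything not held out."""
    train, holdout = [], []
    for record in records:
        target = holdout if is_holdout(key(record), holdout_fraction) else train
        target.append(record)
    return train, holdout


# Hypothetical records with integer ids.
records = [{"id": i} for i in range(10000)]
train, holdout = split(records, key=lambda r: r["id"])
```

Because membership depends only on the identifier, the same records land in the holdout set every time the model is re-estimated, which keeps successive iterations comparable.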
  10. Common Predictive Modeling Approach
      •  CPU intensive & error prone steps:
         »  Data selection
         »  IV to DV relationship
         »  Transformations
         »  Sampling and validation
         »  Model estimation
         »  Model testing
         »  Repeat
      http://onlinepubs.trb.org/onlinepubs/nchrp/cd-22/v2chapter5.html
  11. “One Segment” => “A Segment of One”
      “Any customer can have a car painted any color that he wants so long as it is black.”
      re: the Model-T in 1909 (from My Life and Work, Henry Ford, 1922, Chap. 4, p. 71)
  12. Harry Truman displays a copy of the Chicago Daily Tribune newspaper that erroneously reported the election of Thomas Dewey in 1948. Truman’s narrow victory embarrassed pollsters, members of his own party, and the press who had predicted a Dewey landslide.
  13. Build A 30 Day Shopping List For Each Loyal Shopper at a Retail Chain

      Shopper     SKU      Probability of purchase in the next 30 days
      A. Smith    12345    90%
      A. Smith    23567    85%
      A. Smith    ….
      A. Smith    87996    30%

      Data sources: POS; Loyalty; Econ; House prices; Mortgage Rates; BLS - Unemployment; Inventory
      With Permission from A&P
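To make the shape of the slide 13 output concrete, here is a deliberately simple sketch that scores every shopper and SKU pair separately. It estimates the purchase probability as a Laplace-smoothed share of past 30-day windows in which the shopper bought the SKU. This frequency estimate is a stand-in chosen for illustration and is not the model used in the talk; the function name, the window encoding and the shopper "B. Jones" are hypothetical.

```python
from collections import defaultdict


def shopping_list(purchases, n_windows, top_n=3):
    """Rank SKUs per shopper by estimated 30-day purchase probability.

    purchases: iterable of (shopper, sku, window_index), where
    window_index identifies a past 30-day window in range(n_windows).
    """
    windows_bought = defaultdict(set)
    for shopper, sku, window in purchases:
        windows_bought[(shopper, sku)].add(window)

    per_shopper = defaultdict(list)
    for (shopper, sku), windows in windows_bought.items():
        # Laplace smoothing keeps estimates away from exactly 0 or 1.
        prob = (len(windows) + 1) / (n_windows + 2)
        per_shopper[shopper].append((sku, prob))

    # Highest probability first; SKU breaks ties for a stable order.
    return {
        shopper: sorted(items, key=lambda item: (-item[1], item[0]))[:top_n]
        for shopper, items in per_shopper.items()
    }


# Hypothetical purchase history over four past 30-day windows.
history = [
    ("A. Smith", "12345", 0), ("A. Smith", "12345", 1),
    ("A. Smith", "12345", 2), ("A. Smith", "12345", 3),
    ("A. Smith", "23567", 1), ("A. Smith", "23567", 3),
    ("B. Jones", "12345", 2),
]
lists = shopping_list(history, n_windows=4)
```

The same SKU gets a different score for each shopper, which is the point of the slide: the list is built per individual rather than per segment.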
  14. If The Shopper Bought “It” Before Will They Buy “It” Again?
      •  Classical modeling: variables as either positively or negatively correlated with target
      •  Shoppers don’t behave the same!
      •  The demographics attributes have distributions for each variable!
  15. Subscribers are “A Segment Of One”!
  16. All sources of Prepay as analyzed in 1989
      Diagram labels: D, R, M, I
      Factors: Interest Rates; House prices; Unemployment; Loan Age; Cost of option; Regional economy
      http://www.freeusandworldmaps.com/html/US_Counties/US_Counties.html
      http://www.tradingeconomics.com/united-states/unemployment-rate
      http://www.wfa.gov/
      http://www.richmondfed.org/banking/markets_trends_and_statistics/trends/pdf/delinquency_and_foreclosure_rates.pdf
  17. Quality Measures: Lift => AUC
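Slide 17 names two quality measures without defining them. As a reminder of what each one computes, the sketch below uses the standard textbook definitions: lift compares the positive rate among the top-scored fraction with the overall rate, and AUC is the probability that a randomly chosen positive outscores a randomly chosen negative. The function names and the sample labels and scores are assumptions for illustration, and the pairwise AUC loop is quadratic, so it only suits small examples.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def lift_at(labels, scores, fraction):
    """Positive rate in the top-scored fraction divided by the base rate.

    Assumes at least one positive label, so the base rate is non-zero.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    k = max(1, int(round(len(ranked) * fraction)))
    top_rate = sum(y for _, y in ranked[:k]) / k
    base_rate = sum(labels) / len(labels)
    return top_rate / base_rate


# Hypothetical outcomes (1 = event happened) and model scores.
labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
model_auc = auc(labels, scores)
top_lift = lift_at(labels, scores, fraction=0.2)
```

Lift depends on the cutoff fraction chosen, while AUC summarizes ranking quality over all cutoffs at once.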
  18. Fine vs. Coarse: Cash flows
  19. InQuery analytics – User Defined Group Functions
      •  User defined
         −  KNN
         −  Naïve Bayes
         −  ARCH/AR
         −  PCA
         −  Kernel
         −  Decision Tree
         −  Logistics trees
         −  FFT
         −  Etc.
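The idea of a user-defined group function on slide 19, applying the same modeling routine to every group inside a query, can be sketched in plain Python. The example fits one AR(1) coefficient per customer by least squares without an intercept. The helper `group_apply`, the function `ar1_coefficient` and the sample series are hypothetical and only illustrate the pattern; they do not reflect how 1010data implements it.

```python
from collections import defaultdict


def group_apply(rows, key, value, func):
    """Collect values per group, then apply a user-supplied function to each."""
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(value(row))
    return {group: func(values) for group, values in groups.items()}


def ar1_coefficient(series):
    """Least-squares AR(1) coefficient with no intercept: x[t] ~ phi * x[t-1].

    Assumes at least two observations and a non-zero lagged series.
    """
    num = sum(cur * prev for prev, cur in zip(series, series[1:]))
    den = sum(prev * prev for prev in series[:-1])
    return num / den


# Hypothetical per-customer time series, already in time order.
rows = [
    ("cust_a", 8.0), ("cust_a", 4.0), ("cust_a", 2.0), ("cust_a", 1.0),
    ("cust_b", 1.0), ("cust_b", 3.0), ("cust_b", 9.0),
]
models = group_apply(
    rows,
    key=lambda r: r[0],
    value=lambda r: r[1],
    func=ar1_coefficient,
)
```

Swapping `ar1_coefficient` for another routine (a nearest-neighbour scorer, a small decision tree, and so on) leaves the grouping logic unchanged, which is what makes one model per customer practical to express.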
  20. Questions?
