PMML – vendor coverage
(diagram labels: ETL; data prep; predictive model; data sources; end uses) – Lingual: DW → ANSI SQL; Pattern: SAS, R, etc. → PMML; business logic in Java, Clojure...
Cascading: Workflow Abstraction – section divider (word count flow diagram: Document Collection → Tokenize → Scrub token → HashJo...
Palo Alto is quite a pleasant place • temperate weather • lots of parks, enormous trees • great coffeehouses • walkable downto...
1. Open Data about municipal infrastructure (GIS data: trees, roads, parks) ✚ 2. Big Data about where people like to walk (sma...
The City of Palo Alto recently began to support Open Data to give the local community greater visibility into how their city...
discovery – GIS about trees in Palo Alto
Geographic_Information,,,"Tree: 29 site 2 at 203 ADDISON AV, on ADDISON AV 44 from pl"," Private: -1 Tree ID: 29Street_Nam...
(defn parse-gis [line]
  "leverages parse-csv for complex CSV format in GIS export"
  (first (csv/parse-csv line)))

(defn etl-g...
discovery (ad-hoc queries get refined into composable predicates): Identifier: 474; Tree ID: 412; Tree: 412 site 1 at 115 HAWTHORN...
discovery (curate valuable metadata)
(defn get-trees [src trap tree_meta]
  "subquery to parse/filter the tree data"
  (<- [?blurb ?tree_id ?situs ?tree_site ?species...
# run analysis and visualization in R
library(ggplot2)
dat_folder <- '~/src/concur/CoPA/out/tree'
data <- read.table(file=past...
discovery – sweetgum – analysis of the tree data:
discovery – (flow diagram labels: GIS export; Regex parse-gis; src; Scrub species; Geohash; Regex parse-tree; tree; Tree Metadata; Join; Failure Traps; Estimate height; M)...
modeling – 9q9jh0: a geohash with 6-digit resolution approximates a 5-block square, centered at lat 37.445, lng -122.162
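The 6-character cell above can be checked with a minimal geohash encoder: bisect the latitude and longitude ranges, interleave the resulting bits (longitude first), and pack 5 bits per base-32 character. A sketch for illustration only; a production workflow would use a geohash library.

```java
// Minimal geohash encoder. Even-numbered bits refine longitude,
// odd-numbered bits refine latitude; every 5 bits become one
// character from the standard geohash base-32 alphabet.
public class Geohash {
    private static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

    public static String encode(double lat, double lon, int precision) {
        double[] latRange = { -90, 90 }, lonRange = { -180, 180 };
        StringBuilder hash = new StringBuilder();
        boolean evenBit = true;  // start by refining longitude
        int bit = 0, idx = 0;
        while (hash.length() < precision) {
            double[] range = evenBit ? lonRange : latRange;
            double value = evenBit ? lon : lat;
            double mid = (range[0] + range[1]) / 2;
            idx <<= 1;
            if (value >= mid) { idx |= 1; range[0] = mid; } else { range[1] = mid; }
            evenBit = !evenBit;
            if (++bit == 5) { hash.append(BASE32.charAt(idx)); bit = 0; idx = 0; }
        }
        return hash.toString();
    }

    public static void main(String[] args) {
        // the cell from the slide: roughly a 5-block square in Palo Alto
        System.out.println(encode(37.445, -122.162, 6));  // 9q9jh0
    }
}
```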
Each road in the GIS export is listed as a block between two cross roads, and each may have multiple road segments to repres...
9q9jh0 – Filter trees which are too far away to provide shade. Calculate a sum of moments for tree height × distance, as a...
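A minimal sketch of that filter-then-sum step, assuming the "sum of moments" is Σ height × distance over trees within a cutoff radius; the Tree type, the 25 m cutoff, and the sample values are illustrative, not taken from the talk.

```java
import java.util.*;

// Shade estimate for one road segment: drop trees beyond a cutoff
// distance, then sum height * distance moments over the remainder.
public class ShadeScore {
    public static class Tree {
        public final double height, distance;  // meters; distance from the road segment
        public Tree(double height, double distance) {
            this.height = height;
            this.distance = distance;
        }
    }

    public static double sumMoments(List<Tree> trees, double maxDistance) {
        return trees.stream()
            .filter(t -> t.distance <= maxDistance)   // Filter distance: too far to shade
            .mapToDouble(t -> t.height * t.distance)  // moment = height x distance
            .sum();                                   // Sum moment
    }

    public static void main(String[] args) {
        List<Tree> trees = Arrays.asList(new Tree(15, 5), new Tree(9, 20), new Tree(30, 80));
        System.out.println(sumMoments(trees, 25.0));  // 15*5 + 9*20 = 255.0
    }
}
```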
(defn get-shade [trees roads]
  "subquery to join tree and road estimates, maximize for shade"
  (<- [?road_name ?geohash ?road_...
(flow diagram labels: tree; Join; Calculate distance; shade; Filter height; Sum moment; R; Estimate traffic; R; road; Filter distance; M; M; Filter sum_moment) (flow diagram, s...
(defn get-gps [gps_logs trap]
  "subquery to aggregate and rank GPS tracks per user"
  (<- [?uuid ?geohash ?gps_count ?recent_vi...
Recommenders often combine multiple signals, via weighted averages, to rank personalized results: • GPS of person ∩ road seg...
‣ addr: 115 HAWTHORNE AVE ‣ lat/lng: 37.446, -122.168 ‣ geohash: 9q9jh0 ‣ tree: 413 site 2 ‣ species: Liquidambar styraciflua ‣ ...
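The weighted-average ranking described above can be sketched generically: each candidate road segment carries normalized signals, and the recommender sorts by their weighted sum. The signal names and weights here are hypothetical, not the talk's actual model.

```java
import java.util.*;

// Rank candidates by a weighted sum of per-candidate signals.
public class Recommender {
    // weighted sum; a signal missing from a candidate contributes 0
    public static double score(Map<String, Double> signals, Map<String, Double> weights) {
        double s = 0;
        for (Map.Entry<String, Double> w : weights.entrySet())
            s += w.getValue() * signals.getOrDefault(w.getKey(), 0.0);
        return s;
    }

    // highest-scoring candidate first
    public static List<String> rank(Map<String, Map<String, Double>> candidates,
                                    Map<String, Double> weights) {
        List<String> ids = new ArrayList<>(candidates.keySet());
        ids.sort(Comparator.comparingDouble(
            (String id) -> score(candidates.get(id), weights)).reversed());
        return ids;
    }

    public static void main(String[] args) {
        Map<String, Double> weights = Map.of("gps_visits", 0.5, "shade", 0.3, "low_traffic", 0.2);
        Map<String, Map<String, Double>> segs = new HashMap<>();
        segs.put("HAWTHORNE_AVE", Map.of("gps_visits", 0.9, "shade", 0.8, "low_traffic", 0.9));
        segs.put("UNIVERSITY_AVE", Map.of("gps_visits", 0.9, "shade", 0.2, "low_traffic", 0.1));
        System.out.println(rank(segs, weights));  // [HAWTHORNE_AVE, UNIVERSITY_AVE]
    }
}
```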
references… Enterprise Data Workflows with Cascading, O'Reilly, 2013 – amazon.com/dp/1449358721
blog, dev community, code/wiki/gists, maven repo, commercial products, career opportunities: cascading.org zest.to/group11 git...
Functional programming for optimization problems in Big Data


Published on

Enterprise Data Workflows with Cascading.

Silicon Valley Cloud Computing Meetup talk at Cloud Tech IV, 4/20 2013
http://www.meetup.com/cloudcomputing/events/111082032/

Published in: Technology

Transcript of "Functional programming for optimization problems in Big Data"

1. Copyright ©2013, Concurrent, Inc. – Paco Nathan, Concurrent, Inc., San Francisco, CA, @pacoid – "Functional programming for optimization problems in Big Data"
2. Cascading: Workflow Abstraction – (flow diagram: Document Collection → Tokenize [Regex token] → Scrub token → HashJoin Left [Stop Word List, RHS] → GroupBy token → Count → Word Count; M/R) – agenda: Machine Data; Cascading; Sample Code; A Little Theory…; Workflows; Open Data Example
3. Q3 1997: inflection point – Four independent teams were working toward horizontal scale-out of workflows based on commodity hardware. This effort prepared the way for huge Internet successes in the 1997 holiday season… AMZN, EBAY, Inktomi (YHOO Search), then GOOG. MapReduce and the Apache Hadoop open source stack emerged from this.
4. Circa 1996: pre-inflection point – (diagram labels: RDBMS; Stakeholder; SQL Query; result sets; Excel pivot tables; PowerPoint slide decks; Web App; Customers; transactions; Product strategy; Engineering requirements; BI Analysts; optimized code)
5. Circa 1996: pre-inflection point – (same diagram) "Throw it over the wall"
6. Circa 2001: post- big ecommerce successes – (diagram labels: RDBMS; SQL Query; result sets; recommenders + classifiers; Web Apps; customer transactions; Algorithmic Modeling; Logs; event history; aggregation; dashboards; Product; Engineering; UX; Stakeholder; Customers; DW; ETL; Middleware; servlets; models)
7. Circa 2001: post- big ecommerce successes – (same diagram) "Data products"
8. Circa 2013: clusters everywhere – Use Cases Across Topologies – (diagram labels: Workflow; RDBMS; near time; batch; services; transactions, content; social interactions; Web Apps, Mobile, etc.; History; Data Products; Customers; Log Events; In-Memory Data Grid; Hadoop, etc.; Cluster Scheduler; Prod; Eng; DW; s/w dev; data science; discovery + modeling; Planner; Ops; dashboard metrics; business process; optimized capacity; taps; Data Scientist; App Dev; Ops; Domain Expert; introduced capability; existing SDLC)
9. Circa 2013: clusters everywhere – (same diagram) "Optimizing topologies"
10. references… Leo Breiman, "Statistical Modeling: The Two Cultures", Statistical Science, 2001 – bit.ly/eUTh9L
11. references…
Amazon: "Early Amazon: Splitting the website" – Greg Linden – glinden.blogspot.com/2006/02/early-amazon-splitting-website.html
eBay: "The eBay Architecture" – Randy Shoup, Dan Pritchett – addsimplicity.com/adding_simplicity_an_engi/2006/11/you_scaled_your.html – addsimplicity.com.nyud.net:8080/downloads/eBaySDForum2006-11-29.pdf
Inktomi (YHOO Search): "Inktomi's Wild Ride" – Eric Brewer (0:05:31 ff) – youtube.com/watch?v=E91oEn1bnXM
Google: "Underneath the Covers at Google" – Jeff Dean (0:06:54 ff) – youtube.com/watch?v=qsan-GQaeyk – perspectives.mvdirona.com/2008/06/11/JeffDeanOnGoogleInfrastructure.aspx
12. Cascading: Workflow Abstraction – (word count flow diagram, as on slide 2) – agenda: Machine Data; Cascading; Sample Code; A Little Theory…; Workflows; Open Data Example
13. Cascading – origins: API author Chris Wensel worked as a system architect at an Enterprise firm well-known for many popular data products. Wensel was following the Nutch open source project, where Hadoop started. Observation: it would be difficult to find Java developers to write complex Enterprise apps in MapReduce – a potential blocker for leveraging new open source technology.
14. Cascading – functional programming: Key insight: MapReduce is based on functional programming, going back to LISP in the 1970s. Apache Hadoop use cases are mostly about data pipelines, which are functional in nature. To ease staffing problems as "Main Street" Enterprise firms began to embrace Hadoop, Cascading was introduced in late 2007 as a new Java API to implement functional programming for large-scale data workflows:
• leverages JVM and Java-based tools without any need to create new languages
• allows programmers who have J2EE expertise to leverage the economics of Hadoop clusters
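The pipeline character described above can be seen in miniature with plain Java streams: each stage is a pure transformation, so stages compose and parallelize naturally. A hedged sketch on a single JVM, not the Cascading API; the token regex and stop-word scrub stage mirror the deck's word-count workflow.

```java
import java.util.*;
import java.util.stream.*;

// A data pipeline as a composition of pure functions:
// tokenize -> scrub -> group -> count.
public class Pipeline {
    public static Map<String, Long> tokenCounts(List<String> docs, Set<String> stopWords) {
        return docs.stream()
            // Tokenize: split each document on spaces and punctuation
            .flatMap(doc -> Arrays.stream(doc.toLowerCase().split("[ \\[\\](),.]+")))
            // Scrub: drop empty tokens and stop words
            .filter(token -> !token.isEmpty() && !stopWords.contains(token))
            // GroupBy token + Count (TreeMap keeps the output deterministic)
            .collect(Collectors.groupingBy(t -> t, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        Map<String, Long> wc = tokenCounts(
            Arrays.asList("rain shadow rain", "the shadow"),
            new HashSet<>(Arrays.asList("the")));
        System.out.println(wc);  // {rain=2, shadow=2}
    }
}
```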
15. Cascading – functional programming:
• Twitter, eBay, LinkedIn, Nokia, YieldBot, uSwitch, etc., have invested in open source projects atop Cascading, used for their large-scale production deployments
• new case studies for Cascading apps are mostly based on domain-specific languages (DSLs) in JVM languages which emphasize functional programming: Cascalog in Clojure (2010) github.com/nathanmarz/cascalog/wiki; Scalding in Scala (2012) github.com/twitter/scalding/wiki
"Why Adopting the Declarative Programming Practices Will Improve Your Return from Technology", Dan Woods, 2013-04-17, Forbes: forbes.com/sites/danwoods/2013/04/17/why-adopting-the-declarative-programming-practices-will-improve-your-return-from-technology/
16. Cascading – definitions: (workflow diagram labels: Hadoop Cluster; source tap; sink tap; trap tap; customer profile DBs; Customer Prefs; logs; Logs; Data Workflow; Cache; Customers; Support; Web App; Reporting; Analytics Cubes; Modeling; PMML)
• a pattern language for Enterprise Data Workflows
• simple to build, easy to test, robust in production
• design principles ⟹ ensure best practices at scale
17. Cascading – usage: (workflow diagram, as on slide 16)
• Java API, DSLs in Scala, Clojure, Jython, JRuby, Groovy, ANSI SQL
• ASL 2 license, GitHub src, http://conjars.org
• 5+ yrs production use, multiple Enterprise verticals
18. Cascading – integrations: (workflow diagram, as on slide 16)
• partners: Microsoft Azure, Hortonworks, Amazon AWS, MapR, EMC, SpringSource, Cloudera
• taps: Memcached, Cassandra, MongoDB, HBase, JDBC, Parquet, etc.
• serialization: Avro, Thrift, Kryo, JSON, etc.
• topologies: Apache Hadoop, tuple spaces, local mode
19. Cascading – deployments:
• case studies: Climate Corp, Twitter, Etsy, Williams-Sonoma, uSwitch, Airbnb, Nokia, YieldBot, Square, Harvard, Factual, etc.
• use cases: ETL, marketing funnel, anti-fraud, social media, retail pricing, search analytics, recommenders, eCRM, utility grids, telecom, genomics, climatology, agronomics, etc.
20. Cascading – deployments: (same case studies and use cases as above) the workflow abstraction addresses:
• staffing bottleneck
• system integration
• operational complexity
• test-driven development
21. Cascading: Workflow Abstraction – (word count flow diagram, as on slide 2) – agenda: Machine Data; Cascading; Sample Code; A Little Theory…; Workflows; Open Data Example
22. The Ubiquitous Word Count
Definition: count how often each word appears in a collection of text documents

    void map (String doc_id, String text):
      for each word w in segment(text):
        emit(w, "1");

    void reduce (String word, Iterator group):
      int count = 0;
      for each pc in group:
        count += Int(pc);
      emit(word, String(count));

This simple program provides an excellent test case for parallel processing, since it:
• requires a minimal amount of code
• demonstrates use of both symbolic and numeric values
• shows a dependency graph of tuples as an abstraction
• is not many steps away from useful search indexing
• serves as a "Hello World" for Hadoop apps
Any distributed computing framework which can run Word Count efficiently in parallel at scale can handle much larger and more interesting compute problems. (conceptual flow diagram: Document Collection → Tokenize → GroupBy token → Count → Word Count; M/R)
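The map/reduce pseudocode on this slide runs essentially verbatim on a single JVM once a driver supplies the shuffle. In this sketch, segment() is assumed to be whitespace splitting, and the run() driver that groups emitted pairs by key is illustrative, standing in for Hadoop's shuffle phase.

```java
import java.util.*;

// map emits (word, "1") pairs, the shuffle groups them by key,
// and reduce sums each group -- the three phases Hadoop distributes.
public class WordCountMR {
    static void map(String docId, String text, List<String[]> emitted) {
        for (String w : text.split("\\s+"))
            if (!w.isEmpty()) emitted.add(new String[] { w, "1" });
    }

    static String reduce(String word, Iterator<String> group) {
        int count = 0;
        while (group.hasNext()) count += Integer.parseInt(group.next());
        return String.valueOf(count);
    }

    // drive the phases: map -> shuffle (group by key) -> reduce
    public static Map<String, String> run(Map<String, String> docs) {
        List<String[]> emitted = new ArrayList<>();
        docs.forEach((id, text) -> map(id, text, emitted));
        Map<String, List<String>> groups = new TreeMap<>();
        for (String[] kv : emitted)
            groups.computeIfAbsent(kv[0], k -> new ArrayList<>()).add(kv[1]);
        Map<String, String> out = new TreeMap<>();
        groups.forEach((w, g) -> out.put(w, reduce(w, g.iterator())));
        return out;
    }

    public static void main(String[] args) {
        System.out.println(run(Map.of("doc1", "a b a")));  // {a=2, b=1}
    }
}
```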
23. word count – conceptual flow diagram: Document Collection → Tokenize → GroupBy token → Count → Word Count (1 map, 1 reduce; 18 lines of code: gist.github.com/3900702) – cascading.org/category/impatient
24. word count – Cascading app in Java

    String docPath = args[ 0 ];
    String wcPath = args[ 1 ];
    Properties properties = new Properties();
    AppProps.setApplicationJarClass( properties, Main.class );
    HadoopFlowConnector flowConnector = new HadoopFlowConnector( properties );
    // create source and sink taps
    Tap docTap = new Hfs( new TextDelimited( true, "\t" ), docPath );
    Tap wcTap = new Hfs( new TextDelimited( true, "\t" ), wcPath );
    // specify a regex to split "document" text lines into a token stream
    Fields token = new Fields( "token" );
    Fields text = new Fields( "text" );
    RegexSplitGenerator splitter = new RegexSplitGenerator( token, "[ \\[\\](),.]" );
    // only returns "token"
    Pipe docPipe = new Each( "token", text, splitter, Fields.RESULTS );
    // determine the word counts
    Pipe wcPipe = new Pipe( "wc", docPipe );
    wcPipe = new GroupBy( wcPipe, token );
    wcPipe = new Every( wcPipe, Fields.ALL, new Count(), Fields.ALL );
    // connect the taps, pipes, etc., into a flow
    FlowDef flowDef = FlowDef.flowDef().setName( "wc" )
      .addSource( docPipe, docTap )
      .addTailSink( wcPipe, wcTap );
    // write a DOT file and run the flow
    Flow wcFlow = flowConnector.connect( flowDef );
    wcFlow.writeDOT( "dot/wc.dot" );
    wcFlow.complete();
25. word count – generated flow diagram (from the DOT file): Hfs[TextDelimited[doc_id, text]][data/rain.txt] → Each(token)[RegexSplitGenerator[decl:token]] → (map/reduce boundary) → GroupBy(wc)[by:[token]] → Every(wc)[Count[decl:count]] → Hfs[TextDelimited[token, count]][output/wc]
26. word count – Cascalog / Clojure

    (ns impatient.core
      (:use [cascalog.api]
            [cascalog.more-taps :only (hfs-delimited)])
      (:require [clojure.string :as s]
                [cascalog.ops :as c])
      (:gen-class))

    (defmapcatop split [line]
      "reads in a line of string and splits it by regex"
      (s/split line #"[\[\](),.)\s]+"))

    (defn -main [in out & args]
      (?<- (hfs-delimited out)
           [?word ?count]
           ((hfs-delimited in :skip-header? true) _ ?line)
           (split ?line :> ?word)
           (c/count ?count)))

    ; Paul Lam
    ; github.com/Quantisan/Impatient
27. word count – Cascalog / Clojure: github.com/nathanmarz/cascalog/wiki
• implements Datalog in Clojure, with predicates backed by Cascading – for a highly declarative language
• run ad-hoc queries from the Clojure REPL – approx. 10:1 code reduction compared with SQL
• composable subqueries, used for test-driven development (TDD) practices at scale
• Leiningen build: simple, no surprises, in Clojure itself
• more new deployments than other Cascading DSLs – Climate Corp is largest use case: 90% Clojure/Cascalog
• has a learning curve, limited number of Clojure developers
• aggregators are the magic, and those take effort to learn
28. word count – Scalding / Scala

    import com.twitter.scalding._

    class WordCount(args : Args) extends Job(args) {
      Tsv(args("doc"), ('doc_id, 'text), skipHeader = true)
        .read
        .flatMap('text -> 'token) {
          text : String => text.split("[ \\[\\](),.]")
        }
        .groupBy('token) { _.size('count) }
        .write(Tsv(args("wc"), writeHeader = true))
    }
29. word count – Scalding / Scala: github.com/twitter/scalding/wiki
• extends the Scala collections API so that distributed lists become "pipes" backed by Cascading
• code is compact, easy to understand
• nearly 1:1 between elements of conceptual flow diagram and function calls
• extensive libraries are available for linear algebra, abstract algebra, machine learning – e.g., Matrix API, Algebird, etc.
• significant investments by Twitter, Etsy, eBay, etc.
• great for data services at scale
• less learning curve than Cascalog
30. Cascading: Workflow Abstraction – (word count flow diagram, as on slide 2) – agenda: Machine Data; Cascading; Sample Code; A Little Theory…; Workflows; Open Data Example
31. workflow abstraction – pattern language: Cascading uses a "plumbing" metaphor in the Java API to define workflows out of familiar elements: Pipes, Taps, Tuple Flows, Filters, Joins, Traps, etc. (word count flow diagram, as on slide 2) Data is represented as flows of tuples. Operations within the flows bring functional programming aspects into Java. In formal terms, this provides a pattern language.
32. references…
pattern language: a structured method for solving large, complex design problems, where the syntax of the language promotes the use of best practices – amazon.com/dp/0195019199
design patterns: the notion originated in consensus negotiation for architecture, later applied in OOP software engineering by the "Gang of Four" – amazon.com/dp/0201633612
33. workflow abstraction – literate programming: Cascading workflows generate their own visual documentation: flow diagrams. In formal terms, flow diagrams leverage a methodology called literate programming. Provides intuitive, visual representations for apps – great for cross-team collaboration.
34. references… Don Knuth, Literate Programming, Univ of Chicago Press, 1992 – literateprogramming.com/ – "Instead of imagining that our main task is to instruct a computer what to do, let us concentrate rather on explaining to human beings what we want a computer to do."
35. workflow abstraction – business process: Following the essence of literate programming, Cascading workflows provide statements of business process. This recalls a sense of business process management for Enterprise apps (think BPM/BPEL for Big Data). Cascading creates a separation of concerns between business process and implementation details (Hadoop, etc.). This is especially apparent in large-scale Cascalog apps: "Specify what you require, not how to achieve it." By virtue of the pattern language, the flow planner then determines how to translate business process into efficient, parallel jobs at scale.
36. references… Edgar Codd, "A relational model of data for large shared data banks", Communications of the ACM, 1970 – dl.acm.org/citation.cfm?id=362685 – Rather than arguing between SQL vs. NoSQL… structured vs. unstructured data frameworks… this approach focuses on what apps do: the process of structuring data.
37. workflow abstraction – functional relational programming: The combination of functional programming, pattern language, DSLs, literate programming, business process, etc., traces back to the original definition of the relational model (Codd, 1970) prior to SQL. Cascalog, in particular, implements more of what Codd intended for a "data sublanguage" and is considered to be close to a full implementation of the functional relational programming paradigm defined in: Moseley & Marks, 2006, "Out of the Tar Pit" – goo.gl/SKspn
38. workflow abstraction – functional relational programming: (as above) several theoretical aspects converge into software engineering practices which minimize the complexity of building and maintaining Enterprise data workflows
39. "A kind of Cambrian explosion" (source: National Geographic) – algorithmic modeling + machine data + curation, metadata + Open Data → evolution of feedback loops; internet of things + complex analytics → accelerated evolution, additional feedback loops
40. A Thought Exercise: Consider that when a company like Caterpillar moves into data science, they won't be building the world's next search engine or social network. They will be optimizing supply chain, optimizing fuel costs, automating data feedback loops integrated into their equipment… Operations Research – crunching amazing amounts of data. A $50B company, in a $250B market segment. Upcoming: tractors as drones – guided by complex, distributed data apps.
41. Alternatively… climate.com
42. Two Avenues to the App Layer (scale vs. complexity):
• Enterprise: must contend with complexity at scale everyday… incumbents extend current practices and infrastructure investments – using J2EE, ANSI SQL, SAS, etc. – to migrate workflows onto Apache Hadoop while leveraging existing staff
• Start-ups: crave complexity and scale to become viable… new ventures move into Enterprise space to compete using relatively lean staff, while leveraging sophisticated engineering practices, e.g., Cascalog and Scalding
43. Cascading: Workflow Abstraction – (word count flow diagram, as on slide 2) – agenda: Machine Data; Cascading; Sample Code; A Little Theory…; Workflows; Open Data Example
44. Anatomy of an Enterprise app – Definition: a typical Enterprise workflow crosses through multiple departments and frameworks… (diagram labels: ETL; data prep; predictive model; data sources; end uses)
45. Anatomy of an Enterprise app – (same diagram) system integration
46. Cascading workflows – taps: (workflow diagram, as on slide 16)
• taps integrate other data frameworks, as tuple streams
• these are "plumbing" endpoints in the pattern language
• sources (inputs), sinks (outputs), traps (exceptions)
• text delimited, JDBC, Memcached, HBase, Cassandra, MongoDB, etc.
• data serialization: Avro, Thrift, Kryo, JSON, etc.
• extend a new kind of tap in just a few lines of Java
schema and provenance get derived from analysis of the taps
47. Anatomy of an Enterprise app – (diagram labels: ETL; data prep; predictive model; data sources; end uses) ANSI SQL for ETL
48. Cascading workflows – ANSI SQL: (workflow diagram, as on slide 16)
• collab with Optiq – industry-proven code base
• ANSI SQL parser/optimizer atop the Cascading flow planner
• JDBC driver to integrate into existing tools and app servers
• relational catalog over a collection of unstructured data
• SQL shell prompt to run queries
• enable analysts without retraining on Hadoop, etc.
• transparency for Support, Ops, Finance, et al.
a language for queries – not a database, but ANSI SQL as a DSL for workflows
49. Lingual – shell prompt, catalog: cascading.org/lingual
50. Lingual – queries: cascading.org/lingual
51. Lingual – connecting Hadoop and R

    # load the JDBC package
    library(RJDBC)

    # set up the driver
    drv <- JDBC("cascading.lingual.jdbc.Driver",
      "~/src/concur/lingual/lingual-local/build/libs/lingual-local-1.0.0-wip-dev-jdbc.jar")

    # set up a database connection to a local repository
    connection <- dbConnect(drv,
      "jdbc:lingual:local;catalog=~/src/concur/lingual/lingual-examples/tables;schema=EMPLOYEES")

    # query the repository: in this case the MySQL sample database (CSV files)
    df <- dbGetQuery(connection,
      "SELECT * FROM EMPLOYEES.EMPLOYEES WHERE FIRST_NAME = 'Gina'")
    head(df)

    # use R functions to summarize and visualize part of the data
    df$hire_age <- as.integer(as.Date(df$HIRE_DATE) - as.Date(df$BIRTH_DATE)) / 365.25
    summary(df$hire_age)

    library(ggplot2)
    m <- ggplot(df, aes(x=hire_age))
    m <- m + ggtitle("Age at hire, people named Gina")
    m + geom_histogram(binwidth=1, aes(y=..density.., fill=..count..)) + geom_density()
52. Lingual – connecting Hadoop and R

    > summary(df$hire_age)
       Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
      20.86   27.89   31.70   31.61   35.01   43.92

cascading.org/lingual
53. Anatomy of an Enterprise app – (diagram labels: ETL; data prep; predictive model; data sources; end uses) J2EE for business logic
54. Cascading workflows – business logic: (word count flow diagram, as on slide 2)
55. Anatomy of an Enterprise app – (diagram labels: ETL; data prep; predictive model; data sources; end uses) SAS for predictive models
56. Pattern – model scoring: (workflow diagram, as on slide 16)
• migrate workloads: SAS, Teradata, etc., exporting predictive models as PMML
• great open source tools – R, Weka, KNIME, Matlab, RapidMiner, etc.
• integrate with other libraries – Matrix API, etc.
• leverage PMML as another kind of DSL
cascading.org/pattern
57. ## train a RandomForest model
f <- as.formula("as.factor(label) ~ .")
fit <- randomForest(f, data_train, ntree=50)

## test the model on the holdout test set
print(fit$importance)
print(fit)

predicted <- predict(fit, data)
data$predicted <- predicted
confuse <- table(pred = predicted, true = data[,1])
print(confuse)

## export predicted labels to TSV
write.table(data, file=paste(dat_folder, "sample.tsv", sep="/"),
 quote=FALSE, sep="\t", row.names=FALSE)

## export RF model to PMML
saveXML(pmml(fit), file=paste(dat_folder, "sample.rf.xml", sep="/"))

Pattern – create a model in R
58. public class Main {
  public static void main( String[] args ) {
    String pmmlPath = args[ 0 ];
    String ordersPath = args[ 1 ];
    String classifyPath = args[ 2 ];
    String trapPath = args[ 3 ];

    Properties properties = new Properties();
    AppProps.setApplicationJarClass( properties, Main.class );
    HadoopFlowConnector flowConnector = new HadoopFlowConnector( properties );

    // create source and sink taps
    Tap ordersTap = new Hfs( new TextDelimited( true, "\t" ), ordersPath );
    Tap classifyTap = new Hfs( new TextDelimited( true, "\t" ), classifyPath );
    Tap trapTap = new Hfs( new TextDelimited( true, "\t" ), trapPath );

    // define a "Classifier" model from PMML to evaluate the orders
    ClassifierFunction classFunc = new ClassifierFunction( new Fields( "score" ), pmmlPath );
    Pipe classifyPipe = new Each( new Pipe( "classify" ), classFunc.getInputFields(), classFunc, Fields.ALL );

    // connect the taps, pipes, etc., into a flow
    FlowDef flowDef = FlowDef.flowDef().setName( "classify" )
     .addSource( classifyPipe, ordersTap )
     .addTrap( classifyPipe, trapTap )
     .addSink( classifyPipe, classifyTap );

    // write a DOT file and run the flow
    Flow classifyFlow = flowConnector.connect( flowDef );
    classifyFlow.writeDOT( "dot/classify.dot" );
    classifyFlow.complete();
  }
}

Pattern – score a model, within an app
59. Pattern – score a model, using a pre-defined Cascading app
(flow diagram: Customer Orders, Classify with PMML Model, Scored Orders, GroupBy token, Count, Assert, Confusion Matrix, Failure Traps; M/R)
cascading.org/pattern
60. PMML – vendor coverage
61. Anatomy of an Enterprise app
Cascading allows multiple departments to integrate their workflow components into one app, one JAR file:
• Lingual: DW → ANSI SQL
• Pattern: SAS, R, etc. → PMML
• business logic in Java, Clojure, Scala, etc.
• source taps for Cassandra, JDBC, Splunk, etc.
• sink taps for Memcached, HBase, MongoDB, etc.
(pipeline: ETL, data prep, predictive model, data sources, end uses)
62. Cascading: Workflow Abstraction
(flow diagram: Document Collection, Tokenize, Scrub token, Stop Word List, HashJoin Left/RHS, GroupBy token, Count, Word Count; M/R)
Outline: Machine Data, Cascading, Sample Code, A Little Theory…, Workflows, Open Data Example
63. Palo Alto is quite a pleasant place
• temperate weather
• lots of parks, enormous trees
• great coffeehouses
• walkable downtown
• not particularly crowded

On a nice summer day, who wants to be stuck indoors on a phone call?
Instead, take it outside – go for a walk.

An example open source project: github.com/Cascading/CoPA/wiki
64. 1. Open Data about municipal infrastructure (GIS data: trees, roads, parks)
✚
2. Big Data about where people like to walk (smartphone GPS logs)
✚
3. some curated metadata (which surfaces the value)

4. personalized recommendations:
"Find a shady spot on a summer day in which to walk near downtown Palo Alto. While on a long conference call. Sipping a latte or enjoying some fro-yo."

(flow diagram: Document Collection, Tokenize, Scrub token, Stop Word List, HashJoin Left/RHS, GroupBy token, Count, Word Count; M/R)
65. The City of Palo Alto recently began to support Open Data to give the local community greater visibility into how their city government operates.
This effort is intended to encourage students, entrepreneurs, local organizations, etc., to build new apps which contribute to the public good.
paloalto.opendata.junar.com/dashboards/7576/geographic-information/

discovery
66. GIS about trees in Palo Alto:

discovery
67. Geographic_Information,,,"Tree: 29 site 2 at 203 ADDISON AV, on ADDISON AV 44 from pl"," Private: -1 Tree ID: 29
Street_Name: ADDISON AV Situs Number: 203 Tree Site: 2 Species: Celtis australis
Source: davey tree Protected: Designated: Heritage: Appraised Value:
Hardscape: None Identifier: 40 Active Numeric: 1 Location Feature ID: 13872
Provisional: Install Date: ","37.4409634615283,-122.15648458861,0.0 ","Point"

"Wilkie Way from West Meadow Drive to Victoria Place"," Sequence: 20 Street_Name: Wilkie Way From Street PMMS: West Meadow Drive To Street PMMS: Victoria Place Street ID: 598 (Wilkie Wy, Palo Alto) From Street ID PMMS: 689 To Street ID PMMS: 567 Year Constructed: 1950 Traffic Count: 596 Traffic Index: residential local Traffic Class: local residential Traffic Date: 08/24/90 Paving Length: 208 Paving Width: 40 Paving Area: 8320 Surface Type: asphalt concrete Surface Thickness: 2.0 Base Type Pvmt: crusher run base Base Thickness: 6.0 Soil Class: 2 Soil Value: 15 Curb Type: Curb Thickness: Gutter Width: 36.0 Book: 22 Page: 1 District Number: 18 Land Use PMMS: 1 Overlay Year: 1990 Overlay Thickness: 1.5 Base Failure Year: 1990 Base Failure Thickness: 6 Surface Treatment Year: Surface Treatment Type: Alligator Severity: none Alligator Extent: 0 Block Severity: none Block Extent: 0 Longitude and Transverse Severity: none Longitude and Transverse Extent: 0 Ravelling Severity: none Ravelling Extent: 0 Ridability Severity: none Trench Severity: none Trench Extent: 0 Rutting Severity: none Rutting Extent: 0 Road Performance: UL (Urban Local) Bike Lane: 0 Bus Route: 0 Truck Route: 0 Remediation: Deduct Value: 100 Priority: Pavement Condition: excellent Street Cut Fee per SqFt: 10.00 Source Date: 6/10/2009 User Modified By: mnicols Identifier System: 21410 ","-122.1249640794,37.4155803115645,0.0 -122.124661859039,37.4154224594993,0.0 -122.124587720719,37.4153758330704,0.0 -122.12451895942,37.4153242300888,0.0 -122.124456098457,37.4152680432944,0.0 -122.124399616238,37.4152077003122,0.0 -122.124374937753,37.4151774433318,0.0 ","Line"

discovery
(unstructured data…)
68. (defn parse-gis [line]
  "leverages parse-csv for complex CSV format in GIS export"
  (first (csv/parse-csv line)))

(defn etl-gis [gis trap]
  "subquery to parse data sets from the GIS source tap"
  (<- [?blurb ?misc ?geo ?kind]
      (gis ?line)
      (parse-gis ?line :> ?blurb ?misc ?geo ?kind)
      (:trap (hfs-textline trap))))

discovery
(specify what you require, not how to achieve it… 80/20 rule of data prep cost)
69. discovery
(ad-hoc queries get refined into composable predicates)

Identifier: 474
Tree ID: 412
Tree: 412 site 1 at 115 HAWTHORNE AV
Tree Site: 1
Street_Name: HAWTHORNE AV
Situs Number: 115
Private: -1
Species: Liquidambar styraciflua
Source: davey tree
Hardscape: None
37.446001565119,-122.167713417554,0.0
Point
70. discovery
(curate valuable metadata)
71. (defn get-trees [src trap tree_meta]
  "subquery to parse/filter the tree data"
  (<- [?blurb ?tree_id ?situs ?tree_site
       ?species ?wikipedia ?calflora ?avg_height
       ?tree_lat ?tree_lng ?tree_alt ?geohash]
      (src ?blurb ?misc ?geo ?kind)
      (re-matches #"^\s+Private.*Tree ID.*" ?misc)
      (parse-tree ?misc :> _ ?priv ?tree_id ?situs ?tree_site ?raw_species)
      ((c/comp s/trim s/lower-case) ?raw_species :> ?species)
      (tree_meta ?species ?wikipedia ?calflora ?min_height ?max_height)
      (avg ?min_height ?max_height :> ?avg_height)
      (geo-tree ?geo :> _ ?tree_lat ?tree_lng ?tree_alt)
      (read-string ?tree_lat :> ?lat)
      (read-string ?tree_lng :> ?lng)
      (geohash ?lat ?lng :> ?geohash)
      (:trap (hfs-textline trap))))

discovery

?blurb       Tree: 412 site 1 at 115 HAWTHORNE AV, on HAWTHORNE AV 22
?tree_id     412
?situs       115
?tree_site   1
?species     liquidambar styraciflua
?wikipedia   http://en.wikipedia.org/wiki/Liquidambar_styraciflua
?calflora    http://calflora.org/cgi-bin/species_query.cgi?where-calre
?avg_height  27.5
?tree_lat    37.446001565119
?tree_lng    -122.167713417554
?tree_alt    0.0
?geohash     9q9jh0
72. # run analysis and visualization in R
library(ggplot2)

dat_folder <- "~/src/concur/CoPA/out/tree"
data <- read.table(file=paste(dat_folder, "part-00000", sep="/"),
 sep="\t", quote="", na.strings="NULL", header=FALSE, encoding="UTF8")

summary(data)

t <- head(sort(table(data$V5), decreasing=TRUE), n=20)
trees <- as.data.frame.table(t)
colnames(trees) <- c("species", "count")

m <- ggplot(data, aes(x=V8))
m <- m + ggtitle("Estimated Tree Height (meters)")
m + geom_histogram(aes(y = ..density.., fill = ..count..)) + geom_density()

par(mar = c(7, 4, 4, 2) + 0.1)
plot(trees, xaxt="n", xlab="")
axis(1, labels=FALSE)
text(1:nrow(trees), par("usr")[3] - 0.25, srt=45, adj=1,
 labels=trees$species, xpd=TRUE)
grid(nx=nrow(trees))

discovery
73. discovery
analysis of the tree data: (charts of species counts and height distribution; most frequent species: sweetgum)
74. discovery
(flow diagram, GIS tree: GIS export, Regex parse-gis, src, Scrub species, Geohash, Regex parse-tree, tree, Tree Metadata, Join, Estimate height, Failure Traps; M/R)
75. modeling
9q9jh0 – a geohash with 6-digit resolution approximates a 5-block square, centered at lat: 37.445, lng: -122.162
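The geohash in the slide above can be reproduced with a few lines of Python. This is a minimal sketch (not from the talk) of the standard geohash algorithm: interleave longitude and latitude bisection bits, then pack each 5 bits into one base32 character. Six characters (30 bits) give a cell roughly 0.6 km tall and 1 km wide at this latitude.

```python
# Minimal geohash encoder using the standard base32 alphabet.
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash_encode(lat, lng, precision=6):
    lat_lo, lat_hi = -90.0, 90.0
    lng_lo, lng_hi = -180.0, 180.0
    bits = []
    even = True  # even bit positions refine longitude, odd refine latitude
    while len(bits) < precision * 5:
        if even:
            mid = (lng_lo + lng_hi) / 2
            if lng >= mid:
                bits.append(1); lng_lo = mid
            else:
                bits.append(0); lng_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits.append(1); lat_lo = mid
            else:
                bits.append(0); lat_hi = mid
        even = not even
    # pack each group of 5 bits into one base32 character
    chars = []
    for i in range(0, len(bits), 5):
        n = 0
        for b in bits[i:i+5]:
            n = (n << 1) | b
        chars.append(BASE32[n])
    return "".join(chars)

print(geohash_encode(37.445, -122.162))  # -> 9q9jh0, downtown Palo Alto
```

Narrowing the interval 30 times in this way is exactly what makes geohash prefixes useful as join keys in the queries that follow: points in the same cell share the same 6-character string.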
76. modeling
Each road in the GIS export is listed as a block between two cross roads, and each may have multiple road segments to represent turns:

" -122.161776959558,37.4518836690781,0.0"   ( lat0, lng0, alt0 )
" -122.161390381489,37.4516410983794,0.0"   ( lat1, lng1, alt1 )
" -122.160786011735,37.4512589903357,0.0"   ( lat2, lng2, alt2 )
" -122.160531178368,37.4510977281699,0.0"   ( lat3, lng3, alt3 )

NB: segments in the raw GIS have the order of geo coordinates scrambled: (lng, lat, alt)
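A small Python sketch (the function name is hypothetical, not from the talk) of parsing one such segment string and unscrambling each triple from the raw (lng, lat, alt) order into (lat, lng, alt):

```python
def parse_segment(field):
    """Split a GIS segment field into (lat, lng, alt) tuples.

    The raw export stores each point as "lng,lat,alt", so we swap
    the first two values while converting to floats.
    """
    points = []
    for triple in field.split():
        lng, lat, alt = map(float, triple.split(","))
        points.append((lat, lng, alt))
    return points

pts = parse_segment("-122.161776959558,37.4518836690781,0.0 "
                    "-122.161390381489,37.4516410983794,0.0")
# pts[0] is now in (lat, lng, alt) order
```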
77. modeling
Filter trees which are too far away to provide shade. Calculate a sum of moments for tree height × distance, as an estimator for shade:
(diagram: geohash cell 9q9jh0; trees too far from the road segment are marked X)
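The estimator can be sketched in plain Python, under assumed names: for each road point, sum height/distance "moments" over nearby trees. The 2.0 m height cutoff, 25 m radius, and 200000.0 scaling constant mirror the get-shade Cascalog subquery in the deck; the haversine formula stands in for the talk's tree-distance helper.

```python
from math import radians, sin, cos, asin, sqrt

def tree_distance(lat1, lng1, lat2, lng2):
    """Great-circle distance in meters (haversine)."""
    lat1, lng1, lat2, lng2 = map(radians, (lat1, lng1, lat2, lng2))
    a = sin((lat2 - lat1) / 2) ** 2 \
        + cos(lat1) * cos(lat2) * sin((lng2 - lng1) / 2) ** 2
    return 2 * 6371000 * asin(sqrt(a))

def tree_metric(road_lat, road_lng, trees, radius=25.0, scale=200000.0):
    """Sum of height/distance moments over trees within one block.

    trees: iterable of (avg_height, lat, lng) tuples.
    """
    total = 0.0
    for height, lat, lng in trees:
        if height <= 2.0:            # skip trees no taller than people
            continue
        d = tree_distance(road_lat, road_lng, lat, lng)
        if 0 < d <= radius:          # closer, taller trees shade more
            total += height / d
    return total / scale             # scale factor based on the median
```

Dividing height by distance makes the estimator fall off for distant trees, and the radius filter drops trees that cannot shade the segment at all.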
78. (defn get-shade [trees roads]
  "subquery to join tree and road estimates, maximize for shade"
  (<- [?road_name ?geohash ?road_lat ?road_lng
       ?road_alt ?road_metric ?tree_metric]
      (roads ?road_name _ _ _
       ?albedo ?road_lat ?road_lng ?road_alt ?geohash
       ?traffic_count _ ?traffic_class _ _ _ _)
      (road-metric
       ?traffic_class ?traffic_count ?albedo :> ?road_metric)
      (trees _ _ _ _ _ _ _
       ?avg_height ?tree_lat ?tree_lng ?tree_alt ?geohash)
      (read-string ?avg_height :> ?height)
      ;; limit to trees which are higher than people
      (> ?height 2.0)
      (tree-distance
       ?tree_lat ?tree_lng ?road_lat ?road_lng :> ?distance)
      ;; limit to trees within a one-block radius (not meters)
      (<= ?distance 25.0)
      (/ ?height ?distance :> ?tree_moment)
      (c/sum ?tree_moment :> ?sum_tree_moment)
      ;; magic number 200000.0 used to scale tree moment
      ;; based on median
      (/ ?sum_tree_moment 200000.0 :> ?tree_metric)))

modeling
79. modeling
(flow diagram, shade: tree, road, Join, Calculate distance, Filter height, Filter distance, Estimate traffic, Sum moment, Filter sum_moment; M/R)
80. (defn get-gps [gps_logs trap]
  "subquery to aggregate and rank GPS tracks per user"
  (<- [?uuid ?geohash ?gps_count ?recent_visit]
      (gps_logs
       ?date ?uuid ?gps_lat ?gps_lng ?alt ?speed ?heading
       ?elapsed ?distance)
      (read-string ?gps_lat :> ?lat)
      (read-string ?gps_lng :> ?lng)
      (geohash ?lat ?lng :> ?geohash)
      (c/count :> ?gps_count)
      (date-num ?date :> ?visit)
      (c/max ?visit :> ?recent_visit)))

modeling

?uuid                             ?geohash ?gps_count ?recent_visit
cf660e041e994929b37cc5645209c8ae  9q8yym   7          1972376866448
342ac6fd3f5f44c6b97724d618d587cf  9q9htz   4          1972376690969
32cc09e69bc042f1ad22fc16ee275e21  9q9hv3   3          1972376670935
342ac6fd3f5f44c6b97724d618d587cf  9q9hv3   3          1972376691356
342ac6fd3f5f44c6b97724d618d587cf  9q9hwn   13         1972376690782
342ac6fd3f5f44c6b97724d618d587cf  9q9hwp   58         1972376690965
482dc171ef0342b79134d77de0f31c4f  9q9jh0   15         1972376952532
b1b4d653f5d9468a8dd18a77edcc5143  9q9jh0   18         1972376945348
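The same per-user aggregation as the get-gps subquery can be sketched in Python (names and the simplified three-field log rows are assumptions, not from the talk): group GPS points by (uuid, geohash), counting points and keeping the most recent visit timestamp.

```python
from collections import defaultdict

def aggregate_tracks(gps_logs):
    """gps_logs: iterable of (uuid, geohash, visit_timestamp) tuples.

    Returns (uuid, geohash, gps_count, recent_visit) per group,
    mirroring c/count and c/max in the Cascalog subquery.
    """
    agg = defaultdict(lambda: [0, 0])   # key -> [gps_count, recent_visit]
    for uuid, geohash, visit in gps_logs:
        entry = agg[(uuid, geohash)]
        entry[0] += 1
        entry[1] = max(entry[1], visit)
    return [(u, g, c, r) for (u, g), (c, r) in agg.items()]

logs = [
    ("482dc171", "9q9jh0", 1972376952532),
    ("482dc171", "9q9jh0", 1972376950000),
]
print(aggregate_tracks(logs))  # one row per (uuid, geohash) group
```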
81. Recommenders often combine multiple signals, via weighted averages, to rank personalized results:
• GPS of person ∩ road segment
• frequency and recency of visit
• traffic class and rate
• road albedo (sunlight reflection)
• tree shade estimator
Adjusting the mix allows for further personalization at the end use.

modeling

(defn get-reco [tracks shades]
  "subquery to recommend road segments based on GPS tracks"
  (<- [?uuid ?road ?geohash ?lat ?lng ?alt
       ?gps_count ?recent_visit ?road_metric ?tree_metric]
      (tracks ?uuid ?geohash ?gps_count ?recent_visit)
      (shades ?road ?geohash ?lat ?lng ?alt ?road_metric ?tree_metric)))
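One way the joined signals could be blended into a single adjustable score, sketched in Python. Everything here is an assumption for illustration: the field names follow the ?vars in the get-reco subquery, but the normalizations, the example rows (including the "ALMA ST" segment), and all weights are made up.

```python
def score_segment(row, weights):
    """Weighted average of roughly-normalized signals for one segment."""
    signals = {
        "visits":  row["gps_count"] / 100.0,      # frequency of visit
        "recency": row["recent_visit"] / 2.0e12,  # recency of visit
        "road":    row["road_metric"],            # traffic class/rate + albedo
        "shade":   row["tree_metric"],            # tree shade estimator
    }
    return sum(weights[k] * v for k, v in signals.items())

segments = [
    {"road": "HAWTHORNE AVE", "gps_count": 15, "recent_visit": 1972376952532,
     "road_metric": 0.8, "tree_metric": 4.363},
    {"road": "ALMA ST", "gps_count": 58, "recent_visit": 1972376690965,
     "road_metric": 0.2, "tree_metric": 0.1},
]
# favor shade heavily for a hot afternoon walk
weights = {"visits": 0.2, "recency": 0.1, "road": 0.2, "shade": 0.5}
ranked = sorted(segments, key=lambda r: score_segment(r, weights),
                reverse=True)
```

Shifting weight between "visits" and "shade" is the "adjusting the mix" step: the same joined data yields different rankings per end use.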
82. apps
‣ addr: 115 HAWTHORNE AVE
‣ lat/lng: 37.446, -122.168
‣ geohash: 9q9jh0
‣ tree: 413 site 2
‣ species: Liquidambar styraciflua
‣ est. height: 23 m
‣ shade metric: 4.363
‣ traffic: local residential, light traffic
‣ recent visit: 1972376952532
‣ a short walk from my train stop ✔
83. references…
Enterprise Data Workflows with Cascading
O'Reilly, 2013
amazon.com/dp/1449358721
84. drill-down…
blog, dev community, code/wiki/gists, maven repo, commercial products, career opportunities:
cascading.org
zest.to/group11
github.com/Cascading
conjars.org
goo.gl/KQtUL
concurrentinc.com

Copyright @2013, Concurrent, Inc.
Hiring for Java API developers in SF!