
Spark for Reactive Machine Learning: Building Intelligent Agents at Scale

Presented at the Open Source Analytics-New York Meetup on 5/11/16

Published in: Software

  1. 1. Spark for Reactive Machine Learning: Building Intelligent Agents at Scale. Jeff Smith, @jeffksmithjr
  2. 2. x.ai is a personal assistant who schedules meetings for you
  3. 3. Agents
  4. 4. Agents: Autonomous, Goal-oriented, Capable of learning, Reactive
  5. 5. Amy Ingram: x.ai/how-it-works/
  6. 6. Agents: Autonomous, Goal-oriented, Capable of learning, Reactive
  7. 7. Tech Stacks
  8. 8. You
  9. 9. Scala & Python, Spark, MongoDB, Machine Learning
  10. 10. nom nom, the data dog: Scala & Python, Spark & Akka, Couchbase, Machine Learning
  11. 11. Reactive + Machine Learning
  12. 12. Machine Learning Systems
  13. 13. Traits of Reactive Systems
  14. 14. Responsive
  15. 15. Resilient
  16. 16. Elastic
  17. 17. Message-Driven
  18. 18. Reactive Strategies
  19. 19. Reactive Machine Learning
  20. 20. Reactive Machine Learning
  21. 21. Generating Features
  22. 22. Machine Learning Systems
  23. 23. Feature Generation (diagram: Raw Data, Feature Generation Pipeline, Features)
  24. 24. Microblogging Data
  25. 25. Pipeline Failure (diagram, shown twice: Raw Data, Feature Generation Pipeline, Features)
  26. 26. Supervising Feature Generation (diagram: Raw Data, Feature Generation Pipeline, Features, Supervision)
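
Slide 26 presents supervision only as a diagram. As a rough plain-Scala sketch of the idea (not Akka's or Spark's actual supervision machinery), a supervisor might retry a failing feature generator a bounded number of times and then fall back to a safe default. The names `supervise`, `maxAttempts`, and `fallback` are hypothetical and do not appear in the talk.

```scala
import scala.util.{Failure, Success, Try}

object SupervisionSketch {
  // Hypothetical supervisor: run `generate` at most `maxAttempts` times.
  // The first successful result is returned; if every attempt throws,
  // the caller-supplied `fallback` is returned instead.
  def supervise[A](maxAttempts: Int, fallback: => A)(generate: () => A): A =
    if (maxAttempts <= 0) fallback
    else Try(generate()) match {
      case Success(features) => features
      case Failure(_)        => supervise(maxAttempts - 1, fallback)(generate)
    }
}
```

A caller would wrap the feature generation step, for example `SupervisionSketch.supervise(3, Seq.empty[Int])(() => generateFeatures())`, where `generateFeatures` is a stand-in for whatever produces the features.
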
  27. 27. Original Features
      object SquawkLength extends FeatureType[Int]
      object Super extends LabelType[Boolean]
      val originalFeatures: Set[FeatureType[_]] = Set(SquawkLength)
      val label = Super
  28. 28. Basic Features
      object PastSquawks extends FeatureType[Int]
      val basicFeatures = originalFeatures + PastSquawks
  29. 29. More Features
      object MobileSquawker extends FeatureType[Boolean]
      val moreFeatures = basicFeatures + MobileSquawker
  30. 30. Feature Collections
      case class FeatureCollection(id: Int,
                                   createdAt: DateTime,
                                   features: Set[_ <: FeatureType[_]],
                                   label: LabelType[_])
  31. 31. Feature Collections
      val earlierCollection = FeatureCollection(101, earlier, basicFeatures, label)
      val latestCollection = FeatureCollection(202, now, moreFeatures, label)
      val featureCollections = sc.parallelize(
        Seq(earlierCollection, latestCollection))
  32. 32. Fallback Collections
      val FallbackCollection = FeatureCollection(404, beginningOfTime, originalFeatures, label)
  33. 33. Fallback Collections
      def validCollection(collections: RDD[FeatureCollection],
                          invalidFeatures: Set[FeatureType[_]]) = {
        val validCollections = collections.filter(
          fc => !fc.features
            .exists(invalidFeatures.contains))
          .sortBy(collection => collection.id)
        if (validCollections.count() > 0) {
          validCollections.first()
        } else FallbackCollection
      }
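
The fallback logic on slides 30 to 33 can be tried without a Spark cluster. The sketch below is an approximation under stated assumptions: it uses a plain `Seq` in place of the `RDD`, `java.time.Instant` in place of `DateTime`, and minimal `FeatureType` and `LabelType` traits, since the talk does not show their definitions.

```scala
import java.time.Instant

object FallbackSketch {
  // Assumed minimal definitions; the slides use these types without defining them.
  trait FeatureType[V]
  trait LabelType[V]

  object SquawkLength extends FeatureType[Int]
  object PastSquawks extends FeatureType[Int]
  object MobileSquawker extends FeatureType[Boolean]
  object Super extends LabelType[Boolean]

  case class FeatureCollection(id: Int,
                               createdAt: Instant,
                               features: Set[FeatureType[_]],
                               label: LabelType[_])

  val originalFeatures: Set[FeatureType[_]] = Set(SquawkLength)
  val basicFeatures: Set[FeatureType[_]] = originalFeatures + PastSquawks
  val moreFeatures: Set[FeatureType[_]] = basicFeatures + MobileSquawker

  // Known-safe collection used when every candidate contains an invalid feature.
  val FallbackCollection =
    FeatureCollection(404, Instant.EPOCH, originalFeatures, Super)

  // Keep collections with no invalid feature, pick the one with the lowest id
  // (as the slide's sortBy + first does), otherwise return the fallback.
  def validCollection(collections: Seq[FeatureCollection],
                      invalidFeatures: Set[FeatureType[_]]): FeatureCollection =
    collections
      .filter(fc => !fc.features.exists(invalidFeatures.contains))
      .sortBy(_.id)
      .headOption
      .getOrElse(FallbackCollection)
}
```

With collections 101 (`basicFeatures`) and 202 (`moreFeatures`), marking `MobileSquawker` invalid selects 101, while marking `PastSquawks` invalid rules out both and yields the fallback, 404.
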
  34. 34. Learning Models
  35. 35. Machine Learning Systems
  36. 36. Learning Models (diagram: Features, Model Learning Pipeline, Model)
  37. 37. Models of Love
  38. 38. Data Preparation
      val labelIndexer = new StringIndexer()
        .setInputCol("label")
        .setOutputCol("indexedLabel")
        .fit(instances)
      val featureIndexer = new VectorIndexer()
        .setInputCol("features")
        .setOutputCol("indexedFeatures")
        .fit(instances)
      val Array(trainingData, testingData) = instances.randomSplit(
        Array(0.8, 0.2))
  39. 39. Learning a Model
      val decisionTree = new DecisionTreeClassifier()
        .setLabelCol("indexedLabel")
        .setFeaturesCol("indexedFeatures")
      val labelConverter = new IndexToString()
        .setInputCol("prediction")
        .setOutputCol("predictedLabel")
        .setLabels(labelIndexer.labels)
      val pipeline = new Pipeline()
        .setStages(Array(labelIndexer, featureIndexer, decisionTree, labelConverter))
  40. 40. Evolving Modeling Strategies
      val randomForest = new RandomForestClassifier()
        .setLabelCol("indexedLabel")
        .setFeaturesCol("indexedFeatures")
      val revisedPipeline = new Pipeline()
        .setStages(Array(labelIndexer, featureIndexer, randomForest, labelConverter))
  41. 41. Deep Models of Artistic Style
  42. 42. Refactoring Command Line Tools
      > python neural-art-tf.py -m vgg -mp ./vgg -c ./images/bear.jpg -s ./images/style.jpg -w 800
      def produce_art(content_image_path, style_image_path, model_path,
                      model_type, width, alpha, beta, num_iters):
  43. 43. Exposing a Service
      class NeuralServer(object):
          def generate(self, content_image_path, style_image_path, model_path,
                       model_type, width, alpha, beta, num_iters):
              produce_art(content_image_path, style_image_path, model_path,
                          model_type, width, alpha, beta, num_iters)
              return True
      daemon = Pyro4.Daemon()
      ns = Pyro4.locateNS()
      uri = daemon.register(NeuralServer)
      ns.register("neuralserver", uri)
      daemon.requestLoop()
  44. 44. Encoding Model Types
      object ModelType extends Enumeration {
        type ModelType = Value
        val VGG = Value("VGG")
        val I2V = Value("I2V")
      }
  45. 45. Encoding Valid Configuration
      case class JobConfiguration(contentPath: String,
                                  stylePath: String,
                                  modelPath: String,
                                  modelType: ModelType,
                                  width: Integer = 800,
                                  alpha: java.lang.Double = 1.0,
                                  beta: java.lang.Double = 200.0,
                                  iterations: Integer = 5000)
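
As a small illustration of how slides 44 and 45 encode valid settings in types, the sketch below rebuilds the enumeration and configuration and adds two things the slides do not show: a hypothetical `parseModelType` helper for untrusted strings and `require` checks on the numeric fields. It uses Scala's `Int` and `Double` rather than the boxed Java types on the slide, purely for brevity.

```scala
object ConfigSketch {
  object ModelType extends Enumeration {
    type ModelType = Value
    val VGG = Value("VGG")
    val I2V = Value("I2V")
  }
  import ModelType._

  case class JobConfiguration(contentPath: String,
                              stylePath: String,
                              modelPath: String,
                              modelType: ModelType,
                              width: Int = 800,
                              alpha: Double = 1.0,
                              beta: Double = 200.0,
                              iterations: Int = 5000) {
    // Assumed validity rules, not taken from the talk.
    require(width > 0, "width must be positive")
    require(iterations > 0, "iterations must be positive")
  }

  // Hypothetical helper: map a raw string to a known model type, if any.
  def parseModelType(raw: String): Option[ModelType] =
    ModelType.values.find(_.toString == raw.toUpperCase)
}
```

Because only `VGG` and `I2V` exist as values, an unsupported model name is rejected at the boundary (`parseModelType("resnet")` gives `None`) instead of reaching the Python service.
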
  46. 46. Finding the Service
      val ns = NameServerProxy.locateNS(null)
      val remoteServer = new PyroProxy(ns.lookup("neuralserver"))
  47. 47. Calling the Service
      def callServer(remoteServer: PyroProxy, jobConfiguration: JobConfiguration) = {
        Future.firstCompletedOf(
          List(
            timedOut,
            Future {
              remoteServer.call("generate",
                jobConfiguration.contentPath,
                jobConfiguration.stylePath,
                jobConfiguration.modelPath,
                jobConfiguration.modelType.toString,
                jobConfiguration.width,
                jobConfiguration.alpha,
                jobConfiguration.beta,
                jobConfiguration.iterations).asInstanceOf[Boolean]
            }))
      }
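
Slide 47 races the remote call against a `timedOut` future whose definition is not shown. One plausible shape for it, sketched with only the Scala standard library, is a future that sleeps for the time limit and then completes with `false`. Here `callWithTimeout` and its `remoteCall` parameter are hypothetical stand-ins for the Pyro call.

```scala
import scala.concurrent.{ExecutionContext, Future, blocking}
import scala.concurrent.duration._

object TimeoutSketch {
  implicit val ec: ExecutionContext = ExecutionContext.global

  // Assumed definition: completes with `false` once the limit has elapsed.
  def timedOut(after: FiniteDuration): Future[Boolean] = Future {
    blocking(Thread.sleep(after.toMillis))
    false
  }

  // Race the (possibly slow) call against the timer; whichever finishes
  // first determines the result.
  def callWithTimeout(limit: FiniteDuration)(remoteCall: => Boolean): Future[Boolean] =
    Future.firstCompletedOf(
      List(timedOut(limit), Future(blocking(remoteCall))))
}
```

Note that `Future.firstCompletedOf` does not cancel the losing future, so a slow remote call keeps running in the background after the timeout wins.
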
  48. 48. Profiles with Style
  49. 49. Hybrid Model Learning (diagram: Features, Model Learning Pipeline, Model)
  50. 50. Publishing Models
  51. 51. Machine Learning Systems
  52. 52. Publishing Models (diagram: Model, Publishing Process, Predictive Service)
  53. 53. Detecting Fraud: False Negative, False Positive
  54. 54. Model Metrics
      val trainingSummary = model.summary
      val binarySummary = trainingSummary
        .asInstanceOf[BinaryLogisticRegressionSummary]
      val roc = binarySummary.roc
      roc.show()
      binarySummary.areaUnderROC
  55. 55. Model Metrics
      val predictions = model.transform(testingData)
      val evaluator = new BinaryClassificationEvaluator()
        .setLabelCol("label")
        .setRawPredictionCol("rawPrediction")
        .setMetricName("areaUnderPR")
      val areaUnderPR = evaluator.evaluate(predictions)
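
To make the `areaUnderROC` figure on slide 54 concrete: the area under the ROC curve equals the probability that a randomly chosen positive example is scored above a randomly chosen negative one, with ties counted as one half. The sketch below computes it by brute-force pairwise comparison in plain Scala. It is an illustration of the metric, not Spark's implementation, and the `AucSketch` object is hypothetical.

```scala
object AucSketch {
  // Input: (score, isPositive) pairs, e.g. fraud scores with true labels.
  def areaUnderROC(scored: Seq[(Double, Boolean)]): Double = {
    val pos = scored.collect { case (score, true) => score }
    val neg = scored.collect { case (score, false) => score }
    require(pos.nonEmpty && neg.nonEmpty,
      "need at least one positive and one negative example")
    // Each positive/negative pair contributes 1 if ranked correctly,
    // 0.5 on a tie, and 0 otherwise.
    val wins = for (p <- pos; n <- neg) yield
      if (p > n) 1.0 else if (p == n) 0.5 else 0.0
    wins.sum / (pos.size.toLong * neg.size)
  }
}
```

For scores `(0.9, true), (0.8, false), (0.7, true), (0.1, false)`, three of the four positive/negative pairs are ordered correctly, so the result is 0.75.
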
  56. 56. Building Lineages
      val rawData: RawData
      val featureSet: Set[FeatureType[_]]
      val model: ClassificationModel
      val modelMetrics: BinaryLogisticRegressionSummary
  57. 57. Summary
  58. 58. Agents: Autonomous, Goal-oriented, Capable of learning, Reactive
  59. 59. Machine Learning Systems
  60. 60. Traits of Reactive Systems
  61. 61. Reactive Strategies
  62. 62. Reactive Machine Learning
  63. 63. For Later
  64. 64. @jeffksmithjr, manning.com, reactivemachinelearning.com, medium.com/data-engineering (book cover: Manning, Jeff Smith). Use the code opensanmu for 40% off!
  65. 65. x.ai, @xdotai, hello@human.x.ai, New York, New York. We’re hiring!
  66. 66. Thank You!
