HTM & Apache Flink (2016-06-27)

Extending Flink for anomaly detection with Hierarchical Temporal Memory (HTM). Presented at the Bay Area Apache Flink Meetup in San Jose on June 27, 2016.

https://github.com/htm-community/flink-htm

Published in: Data & Analytics

  1. HTM & Apache Flink: Extending Flink for Anomaly Detection with Hierarchical Temporal Memory (HTM). Eron Wright (@eronwright)
  2. What is HTM?
  3. Hierarchical Temporal Memory (HTM) is a theory of computation for the neocortex.
  4. History
     • 2005 – 2009
       • HTM theory
       • First generation algorithms
       • Hierarchy and vision problems
       • Vision Toolkit
     • 2009 – 2012
       • Cortical Learning Algorithms
       • SDRs, sequence memory, continuous learning
       • Applications exploration
     • 2013 – 2015
       • Continued HTM development
       • NuPIC open source project
       • Grok for anomaly detection
     • 2014 –
       • Sensorimotor
       • Goal directed behavior
       • Sequence classification
     http://www.slideshare.net/numenta/why-neurons-have-thousands-of-synapses-a-model-of-sequence-memory-in-the-brain
  5. Computational Properties
     • Online, Unsupervised Learning
     • High-order Representations
       • For example: sequences “ABCD” vs “XBCY”
     • Multiple Simultaneous Predictions
       • For example: “BC” predicts both “D” and “Y”
     • Anomaly Scores
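The “ABCD” vs “XBCY” example can be made concrete with a toy model. The sketch below is not the HTM algorithm; it is a plain lookup from recent context to observed successors, and the class name `ContextPredictor` and its methods are hypothetical. It only illustrates why a longer context disambiguates what a shorter one cannot, and how one context can yield several simultaneous predictions.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy variable-order sequence memory. Illustrative only; this is not the HTM algorithm. */
final class ContextPredictor {
    private final int maxOrder;
    // Maps a context (the last 1..maxOrder symbols) to every symbol seen right after it.
    private final Map<String, Set<Character>> successors = new HashMap<>();

    ContextPredictor(int maxOrder) {
        this.maxOrder = maxOrder;
    }

    /** Records, for each position, the successor of every context up to maxOrder symbols long. */
    void learn(String sequence) {
        for (int i = 1; i < sequence.length(); i++) {
            for (int len = 1; len <= Math.min(maxOrder, i); len++) {
                String context = sequence.substring(i - len, i);
                successors.computeIfAbsent(context, k -> new TreeSet<>()).add(sequence.charAt(i));
            }
        }
    }

    /** Predicts from the longest known suffix of the history; may return several symbols. */
    Set<Character> predict(String history) {
        for (int len = Math.min(maxOrder, history.length()); len >= 1; len--) {
            Set<Character> found = successors.get(history.substring(history.length() - len));
            if (found != null) {
                return found;
            }
        }
        return Collections.emptySet();
    }

    public static void main(String[] args) {
        ContextPredictor memory = new ContextPredictor(3);
        memory.learn("ABCD");
        memory.learn("XBCY");
        System.out.println(memory.predict("BC"));   // ambiguous context: both D and Y
        System.out.println(memory.predict("ABC"));  // longer context: only D
        System.out.println(memory.predict("XBC"));  // longer context: only Y
    }
}
```

HTM reaches a similar effect with sparse distributed representations rather than string keys, so the sketch should be read as an analogy for the property, not as a description of the mechanism.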
  6. Implementations of HTM
     • Numerous Implementations
       • NuPIC – official reference library (Python/C++)
       • HTM.java – community-supported library (Java)
     • Evolving Rapidly
       • Tracking the theory!
  7. NuPIC learns the time-based patterns in data, predicts future values, and detects anomalies.
  8. Introducing Flink-HTM
  9. flink-htm provides HTM-based learning operators for the Flink DataStream API, based on HTM.java.
  10. Benefits
      • Good fit for Apache Flink
        • Automated model-building
        • Continuous learning
        • Temporal awareness
      Contrast with: github.com/StephanEwen/flink-demos/tree/master/streaming-state-machine
  11. Benefits (cont’d)
      • Good fit for HTM
        • Integration w/ data pipeline
        • Data connectivity
          • e.g. Kafka, Twitter, HDFS, AWS Kinesis
        • DSL for stream pre- and post-processing
          • e.g. aggregation, transformation
        • Distributed, reliable processing
        • Event-Time Awareness
  12. Features
      • `Learn` Operator
        • Feeds input data to an HTM model
        • Emits predictions and anomaly scores
        • Supports keyed and non-keyed streams
      • Checkpoint Integration
        • Models are serialized
        • Facilitates exactly-once processing
      • Numenta RiverView Connector
        • Public-domain temporal datasets
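The slides do not say how the emitted anomaly score is computed. A commonly described form of the raw anomaly score in HTM implementations is the fraction of currently active columns that were not predicted on the previous step. The sketch below assumes that form; the class name `RawAnomalyScore` and the zero-activity convention are assumptions, not flink-htm or HTM.java API.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Sketch of a raw anomaly score under the assumption stated above; not a library API. */
final class RawAnomalyScore {
    private RawAnomalyScore() {}

    /** Returns the share of active columns that the previous step failed to predict, in [0, 1]. */
    static double compute(Set<Integer> activeColumns, Set<Integer> previouslyPredicted) {
        if (activeColumns.isEmpty()) {
            return 0.0; // assumption: no activity means nothing to be surprised by
        }
        int unpredicted = 0;
        for (Integer column : activeColumns) {
            if (!previouslyPredicted.contains(column)) {
                unpredicted++;
            }
        }
        return (double) unpredicted / activeColumns.size();
    }

    public static void main(String[] args) {
        Set<Integer> active = new HashSet<>(Arrays.asList(3, 7, 11, 20));
        Set<Integer> predicted = new HashSet<>(Arrays.asList(3, 7, 42));
        // Two of the four active columns (11 and 20) were not predicted.
        System.out.println(compute(active, predicted));
    }
}
```

A score of 0 means the input was fully anticipated, and a score of 1 means none of it was.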
  13. NYC Traffic Example: http://data.numenta.org/nyc-traffic/meta.html
  15. General Approach
      1. Define Input Type
      2. Add Data Source
      3. Apply Learn Operator
         • w/ HTM Network Definition
         • w/ Field Encoders
      4. Define Select Function
         1. Process the inference data (predictions & anomaly scores)
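The four steps above can be walked through in plain Java without Flink or HTM.java on the classpath. Everything in this sketch is hypothetical: `TrafficReport`, `Inference`, `LastValueModel`, and `learnAndSelect` are illustration names, and the model is a trivial stand-in that expects each value to repeat the previous one. It shows only the shape of the pipeline (input type, source, learn, select), not the flink-htm API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

/** Plain-Java walk-through of the four steps; every name here is hypothetical. */
final class GeneralApproachSketch {

    /** Step 1: the input type. */
    static final class TrafficReport {
        final long timestamp;
        final double speed;

        TrafficReport(long timestamp, double speed) {
            this.timestamp = timestamp;
            this.speed = speed;
        }
    }

    /** Output of the learn step: the input plus its inference data. */
    static final class Inference {
        final TrafficReport input;
        final double predicted;     // value the model expected for this input
        final double anomalyScore;  // 0.0 = as expected, 1.0 = fully unexpected

        Inference(TrafficReport input, double predicted, double anomalyScore) {
            this.input = input;
            this.predicted = predicted;
            this.anomalyScore = anomalyScore;
        }
    }

    /** Stand-in for an HTM network (NOT HTM): expects each value to repeat the previous one. */
    static final class LastValueModel {
        private Double previous;

        Inference compute(TrafficReport report) {
            double predicted = (previous == null) ? report.speed : previous;
            double error = Math.abs(report.speed - predicted) / Math.max(1.0, Math.abs(predicted));
            previous = report.speed;
            return new Inference(report, predicted, Math.min(1.0, error));
        }
    }

    /** Steps 3 and 4: run every record through the model, then apply the select function. */
    static <R> List<R> learnAndSelect(List<TrafficReport> source, LastValueModel model,
                                      Function<Inference, R> select) {
        List<R> out = new ArrayList<>();
        for (TrafficReport report : source) {
            out.add(select.apply(model.compute(report)));
        }
        return out;
    }

    public static void main(String[] args) {
        // Step 2: the data source (a fixed list here; an unbounded stream in a real job).
        List<TrafficReport> source = Arrays.asList(
                new TrafficReport(1L, 50.0), new TrafficReport(2L, 50.0), new TrafficReport(3L, 10.0));
        // Step 4: the select function keeps only the anomaly score.
        List<Double> scores =
                learnAndSelect(source, new LastValueModel(), inference -> inference.anomalyScore);
        System.out.println(scores);
    }
}
```

In the real library the learn step is a stream operator and the model is an HTM network with field encoders; the select function plays the same role as here, turning inference data into whatever the downstream job needs.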
  18. Advanced Topics
      • `Reset` Function
        • Indicates the start of a temporal sequence
        • For example: A,B,C,D,E, (reset), A,B,C,D,E
      • Stateful Functions
        • Use `mapWithState` to store predictions for the future
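Both ideas can be illustrated together in plain Java. A prediction made at one element can only be judged when the next element arrives, so it has to be kept as state, and a reset at a sequence boundary prevents the last prediction of one sequence from being judged against the first element of the next. The class `PredictionTracker` is hypothetical and holds its state in a field, whereas a Flink job would keep it in managed state (for example via `mapWithState` on a keyed stream).

```java
import java.util.Optional;

/** Hypothetical stateful function: remembers the last prediction until the next element arrives. */
final class PredictionTracker {
    private String pending; // prediction awaiting the next element; null at a sequence start

    /** Marks the start of a new temporal sequence by discarding the pending prediction. */
    void reset() {
        pending = null;
    }

    /**
     * Judges the stored prediction against the actual element, then stores the new prediction.
     * Returns empty when there was nothing to judge (first element of a sequence).
     */
    Optional<Boolean> update(String actual, String predictedNext) {
        Optional<Boolean> hit = (pending == null)
                ? Optional.<Boolean>empty()
                : Optional.of(pending.equals(actual));
        pending = predictedNext;
        return hit;
    }

    public static void main(String[] args) {
        PredictionTracker tracker = new PredictionTracker();
        System.out.println(tracker.update("A", "B")); // nothing to judge yet
        System.out.println(tracker.update("B", "C")); // previous prediction "B" was right
        tracker.reset();                              // sequence boundary
        System.out.println(tracker.update("A", "B")); // again nothing to judge
    }
}
```

Without the reset, the third call would compare the stale prediction "C" against "A" and report a miss that says nothing about the model.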
  20. Extending Flink
  21. Streaming API/DSL
      • Java
        1. Static Entrypoint, then
        2. Intermediate Representation (e.g. HTMStream), then
        3. DataStream!
  22. Streaming API/DSL (cont’d)
      • Scala
        1. `RichDataStream` extensions
        2. Scala Functions
        3. Scala-Specific TypeInformation
      • Other
        • Serialization Hooks
        • Clean your closures!
  23. Learn Operator
      • Implement `AbstractStreamOperator`
      • Respect Flink’s type system
        • Use the `TypeInformation` class
      • Use the State Handle abstraction
        • * keyed streams only
      • Instrument your code
        • Accumulators
  24. RiverView Connector
      • Extend `RichParallelSourceFunction`
        • Parallelism is user-defined
        • Must handle partition assignment
      • Mix in `Checkpointed`
        • Synchronize on checkpoint lock
      • Support cancel/stop
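The reason for synchronizing on the checkpoint lock is that emitting a record and advancing the source's position must look atomic to a snapshot; otherwise a restore can duplicate or skip records. The plain-Java sketch below shows that pattern without Flink on the classpath. `CheckpointedSourceSketch` and its members are hypothetical, and the lock is a local object here, whereas a Flink source obtains it from its source context. The modulo rule in `ownsPartition` is one simple assignment scheme, assumed for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical offset-based source showing the lock, snapshot, and cancel pattern. */
final class CheckpointedSourceSketch {
    private final Object checkpointLock = new Object(); // stand-in for the runtime's checkpoint lock
    private final List<Long> emitted = new ArrayList<>();
    private volatile boolean running = true;
    private long nextOffset;

    CheckpointedSourceSketch(long restoredOffset) {
        this.nextOffset = restoredOffset; // resume from the last snapshot (0 on a fresh start)
    }

    /** One simple partition assignment: subtask i of n reads partitions where p % n == i. */
    static boolean ownsPartition(int partition, int subtaskIndex, int parallelism) {
        return partition % parallelism == subtaskIndex;
    }

    /** Emits offsets until endOffset or cancellation; emit and advance happen under the lock. */
    void run(long endOffset) {
        while (running) {
            synchronized (checkpointLock) {
                if (nextOffset >= endOffset) {
                    return;
                }
                emitted.add(nextOffset); // "collect" the record
                nextOffset++;            // advance in the same critical section
            }
        }
    }

    /** Reads the position under the same lock, so it never observes a half-finished emit. */
    long snapshotState() {
        synchronized (checkpointLock) {
            return nextOffset;
        }
    }

    /** Asks the run loop to stop at the next iteration. */
    void cancel() {
        running = false;
    }

    List<Long> emitted() {
        synchronized (checkpointLock) {
            return new ArrayList<>(emitted);
        }
    }

    public static void main(String[] args) {
        CheckpointedSourceSketch first = new CheckpointedSourceSketch(0L);
        first.run(3L);
        long snapshot = first.snapshotState();
        CheckpointedSourceSketch restored = new CheckpointedSourceSketch(snapshot);
        restored.run(5L);
        System.out.println(first.emitted() + " then " + restored.emitted());
    }
}
```

Because the snapshot and the emit path share one lock, a restored source continues exactly where the snapshot left off.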
  25. Closing
  26. Help Wanted!
      • Issues: github.com/htm-community/flink-htm/issues
      • Follow: @ApacheFlink, @dataArtisans, @Numenta
      • Info: http://numenta.org/