Talk track: Traditionally, what has been lacking is enough historical data. Now, with new approaches such as Hadoop, it's possible to save long-term maintenance histories cost-effectively…<CLICK>
Consider what this may mean for scheduling repairs for a particular piece of equipment. Rather than just knowing overall repair rates and costs, many details can be stored for a particular part...
Talk track: And from the field, sensors provide real-time measurements about what is happening for that particular part…<CLICK> <PAUSE>
Talk track: When you combine real-time sensor data with maintenance histories, you can get more value from your data by using machine learning models to inform your actions:
<click> analyze the records to <click> predict maintenance needs for <click> better scheduling of repairs. This saves you money by <click> avoiding downtime and reducing the risk of costly failures.
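To make the analyze, predict, schedule flow concrete, here is a minimal Python sketch that works from a part's maintenance history alone. It is a deliberately simple stand-in for the machine learning models mentioned in the talk track, not the method used in the talk: it estimates the mean time between failures (MTBF) from past failure dates and schedules the next repair a safety margin ahead of the expected next failure. The function name `next_service_date` and the `safety_factor` parameter are hypothetical, chosen for illustration.

```python
from datetime import date, timedelta
from statistics import mean


def next_service_date(failure_dates, safety_factor=0.8):
    """Schedule the next repair before the part is expected to fail again.

    Hypothetical helper: uses the mean time between past failures (MTBF)
    as a simple stand-in for a trained machine learning model.
    """
    if len(failure_dates) < 2:
        raise ValueError("need at least two past failures to estimate an interval")
    ordered = sorted(failure_dates)
    # Gaps in days between consecutive failures of the same part.
    gaps = [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]
    mtbf_days = mean(gaps)
    # Service the part a fraction of the MTBF after the most recent failure.
    return ordered[-1] + timedelta(days=int(mtbf_days * safety_factor))


if __name__ == "__main__":
    history = [date(2014, 1, 1), date(2014, 3, 2), date(2014, 5, 1)]
    print(next_service_date(history))
```

With failures on 2014-01-01, 2014-03-02 and 2014-05-01 (60 days apart) and the default factor of 0.8, the next service falls 48 days after the last failure, on 2014-06-18. A real model would also weigh the live sensor readings described above.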
<click> Time series data is useful, particularly when saved together with part or equipment specifications. You could, for example, go back <click> and see what happened in the days or months leading up to a part failure and thus better understand how to schedule repairs before problems occur.
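Going back to see what happened before a failure amounts to slicing the time series by part and by a time window ending at each failure. The sketch below assumes readings are stored as `(timestamp, part_id, value)` tuples and failures as `(timestamp, part_id)` tuples; that schema, the function name `readings_before_failures`, and the 30-day default window are all hypothetical.

```python
from datetime import datetime, timedelta


def readings_before_failures(readings, failures, window=timedelta(days=30)):
    """Collect the sensor readings in the window leading up to each failure.

    readings: list of (timestamp, part_id, value) tuples (hypothetical schema)
    failures: list of (timestamp, part_id) tuples
    Returns {(part_id, failure_time): [(timestamp, value), ...]} sorted by time.
    """
    result = {}
    for failed_at, part in failures:
        start = failed_at - window
        # Keep only this part's readings from [start, failed_at).
        result[(part, failed_at)] = sorted(
            (ts, value)
            for ts, pid, value in readings
            if pid == part and start <= ts < failed_at
        )
    return result


readings = [
    (datetime(2014, 6, 1), "pump-7", 0.2),
    (datetime(2014, 6, 20), "pump-7", 0.5),
    (datetime(2014, 6, 28), "pump-7", 0.9),
    (datetime(2014, 6, 25), "valve-2", 0.1),
]
failures = [(datetime(2014, 6, 30), "pump-7")]

if __name__ == "__main__":
    print(readings_before_failures(readings, failures, window=timedelta(days=14)))
```

With a 14-day window, only the pump-7 readings from June 20 and June 28 are returned; the rising values are the kind of pattern that could inform when to schedule a repair.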
Talk track: Here is a familiar view of what is being done with data. New data can be ingested into a persistence layer or used in real-time processing. What a user such as an analyst would like is to make a single query against all of the data.
How does this work? Let's think about it in terms of the lambda architecture to get a conceptual view of how to combine real-time and batch processing…
Talk track: The lambda architecture divides all components in a system into three basic layers:
Batch Layer handles persistence and batch-oriented computation
Speed Layer handles real-time computation and updates to short-term persistence (such as HBase or M7 tables)
Serving Layer combines the partial batch query results and the partial query results from real time processing.
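The division of labor among the three layers can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not how Hadoop, HBase or M7 tables are used in practice: events are hypothetical dicts with a `part_id` key, the "view" is a count of sensor alerts per part, and in-memory `Counter`s stand in for the batch and short-term stores. The names `batch_view`, `SpeedLayer` and `serve` are made up for illustration.

```python
from collections import Counter


def batch_view(master_dataset):
    """Batch layer: recompute a view from the full, immutable master dataset."""
    return Counter(event["part_id"] for event in master_dataset)


class SpeedLayer:
    """Speed layer: incrementally track events the last batch run has not seen."""

    def __init__(self):
        self.recent = Counter()

    def update(self, event):
        self.recent[event["part_id"]] += 1

    def reset(self):
        # Discard real-time state once a new batch run has absorbed these events.
        self.recent.clear()


def serve(part_id, batch, speed):
    """Serving layer: merge partial batch and real-time results into one answer."""
    return batch[part_id] + speed.recent[part_id]


master = [{"part_id": "pump-7"}, {"part_id": "pump-7"}, {"part_id": "valve-2"}]
batch = batch_view(master)
speed = SpeedLayer()
speed.update({"part_id": "pump-7"})  # arrived after the last batch run

if __name__ == "__main__":
    print(serve("pump-7", batch, speed))
```

Here the user's single query for pump-7 returns 3: two alerts from the batch view plus one that so far only the speed layer has seen. The design choice is that the speed layer only has to cover the gap since the last batch run, so its state can be thrown away and rebuilt cheaply.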
Now we can think about our system components in terms of the lambda architecture…
Talk track: Long-term data persistence and batch processing are handled by components such as Apache Hadoop-based technologies. For the speed layer, there are several choices for real-time processing, including Apache Spark Streaming and Apache Storm. The query can (soon) be carried out using Apache Drill, Apache Hive, Apache Spark's Shark component, or Impala. The serving layer combines the long-term (batch) and real-time partial query results to provide the final results the user wants.
Talk track: Recommendations are in widespread use, and building a powerful recommendation engine can be easier than you think with certain innovations…
Talk track: The first trick is to choose the right data. Instead of looking at ratings or characteristics of the items to recommend, watch people's behaviors as they interact with items. The patterns you discover in that behavior tell you what to recommend.
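A minimal sketch of learning from behavior rather than ratings: record which items the same users interact with, and keep only item pairs that co-occur more often than chance. Scoring uses the log-likelihood ratio (G² test) on a 2×2 contingency table, a common choice for this kind of co-occurrence analysis; that choice is an assumption here, since the talk track does not name a statistic. The function names (`llr`, `indicators`, `recommend`) and the toy data are hypothetical, and the simple vote count in `recommend` only stands in for whatever online stage a production system would use.

```python
from collections import Counter, defaultdict
from itertools import combinations
from math import log


def x_log_x(x):
    return 0.0 if x == 0 else x * log(x)


def entropy(*counts):
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)


def llr(k11, k12, k21, k22):
    """Log-likelihood ratio (G²) for a 2x2 table of co-occurrence counts."""
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    # Clamp tiny negative values caused by floating-point round-off.
    return max(0.0, 2.0 * (row + col - mat))


def indicators(history, top_n=2):
    """history: {user: set of items that user interacted with}."""
    n_users = len(history)
    item_users = defaultdict(set)
    for user, items in history.items():
        for item in items:
            item_users[item].add(user)
    scores = defaultdict(list)
    for a, b in combinations(sorted(item_users), 2):
        n_a, n_b = len(item_users[a]), len(item_users[b])
        both = len(item_users[a] & item_users[b])
        # Skip pairs that co-occur no more often than independence predicts.
        if both * n_users <= n_a * n_b:
            continue
        score = llr(both, n_a - both, n_b - both, n_users - n_a - n_b + both)
        scores[a].append((score, b))
        scores[b].append((score, a))
    return {item: [other for _, other in sorted(pairs, reverse=True)[:top_n]]
            for item, pairs in scores.items()}


def recommend(user_items, indicator_map, top_n=3):
    """Vote for unseen items that are indicators of items the user already has."""
    votes = Counter()
    for item in user_items:
        for other in indicator_map.get(item, []):
            if other not in user_items:
                votes[other] += 1
    return [item for item, _ in votes.most_common(top_n)]


history = {
    "u1": {"a", "b"}, "u2": {"a", "b"}, "u3": {"a", "b", "c"},
    "u4": {"c", "d"}, "u5": {"c", "d"}, "u6": {"d"},
}

if __name__ == "__main__":
    ind = indicators(history)
    print(ind, recommend({"a"}, ind))
```

On this toy history, `a` and `b` become indicators for each other, as do `c` and `d`, so a new user who interacted with `a` is recommended `b`. The heavy pairwise counting can run offline in batch, while the lookup in `recommend` is cheap enough to run at request time.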
Talk track: We demonstrated this powerful two-stage approach by building a music recommender on the MapR platform. Notice that the intensive part of the computation, the
Transcript of "2014.07.01 - New Technologies, New Roles, New Architectures - Singapore Management University - BigData SG"