How Spark Enables the Internet of Things- Paula Ta-Shma

Published in: Data & Analytics

  1. How Spark Enables the Internet of Things: Efficient Integration of Multiple Spark Components for Smart City Use Cases
     Paula Ta-Shma, IBM Research, paula@il.ibm.com
     Joint work with: Adnan Akbar (University of Surrey), Michael Factor (IBM Research), Guy Hadash (IBM Research), Juan Sancho (ATOS)
     © 2015 IBM Corporation
  2. The Evolution of Data Collection. Internet of Things.
  3. IoT: The Biggest Big Data
     • The IoT market will grow to $1.7 trillion in 2020 (IDC)
     • By 2020 the number of networked devices will be 30 billion (IDC), more than 4 times the entire global population
     [Chart: global data volume in exabytes, 2005, 2012, 2017]
  4. EMT Madrid Bus Company Needs to Make Decisions According to Current and Predicted Future Traffic State
     • The Problem: EMT needs to staff control rooms where employees manually analyze Madrid traffic sensor output. This can be slow and costly.
     • Objective: Improve customer satisfaction and reduce costs by responding more efficiently and quickly to real-time traffic problems
     • Approach: Monitor data from up to 3000 sensors. React by rerouting buses, modifying traffic lights, etc., based upon knowledge derived from historical data
     [Figure labels: Today, Tomorrow]
  5. Generic IoT Architecture – Data Flow
     1. Collect historical time series data
        - Collect data from devices
        - Aggregate into objects
        - Index and/or partition
     [Diagram components: IoT, Secor, Swift]
  6. Generic IoT Architecture – Data Flow
     2. Learn patterns in data
        - May be time/location dependent
        - Generate thresholds, classifiers, etc.
     [Diagram components: Secor, Swift]
  7. Generic IoT Architecture – Data Flow
     3. Apply what was learned on real-time data stream
        - Take action
     [Diagram components: IoT, Secor, CEP, Swift]
  8. Generic IoT Architecture – Data Flow
     How Spark Enables the Internet of Things: Efficient Integration of Multiple Spark Components for Smart City Use Cases
     • Green flows: real time
     • Purple flows: batch
     [Diagram components: IoT, CEP, Secor, Swift]
  9. IoT Architecture – Madrid Traffic – Ingestion Flow
     Aim: Collect historical time series data for analysis
     • Continuously collect data from up to 3000 Madrid council traffic sensors via web service
       - Data includes traffic speeds and intensities, updated every 5 mins
     • Push the messages to Kafka
     • Use Secor to aggregate multiple messages into a single Swift object
       - According to policy, e.g., every 60 mins
       - Possibly partition the data, e.g., according to date
       - Convert to Parquet format
       - Annotate with metadata, e.g., min/max speed, start/end time
     • Index Swift objects according to their metadata using ElasticSearch
     [Diagram components: IoT, Secor, Swift]
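
The aggregation and annotation steps on this slide can be sketched in plain Python. This is only an illustration of the idea, not Secor's implementation: the `Batch` class, the `aggregate` function, and the `ts` field (seconds since some epoch) are hypothetical names, while `velocidad` is borrowed from the query shown on the data access slide. Kafka, Parquet conversion, and Swift upload are left out.

```python
from dataclasses import dataclass, field


@dataclass
class Batch:
    """One aggregated object plus the metadata later used for indexing."""
    records: list = field(default_factory=list)

    def metadata(self):
        # Summarise the batch so an index can answer range queries
        # without opening the object itself.
        speeds = [r["velocidad"] for r in self.records]
        times = [r["ts"] for r in self.records]
        return {
            "min_speed": min(speeds),
            "max_speed": max(speeds),
            "start_time": min(times),
            "end_time": max(times),
        }


def aggregate(messages, window_seconds=3600):
    """Group messages into fixed windows (default 60 minutes) by timestamp."""
    batches = {}
    for msg in messages:
        key = msg["ts"] // window_seconds
        batches.setdefault(key, Batch()).records.append(msg)
    # Return batches in time order.
    return [batches[k] for k in sorted(batches)]


if __name__ == "__main__":
    sample = [
        {"codigo": 1, "ts": 0, "velocidad": 40},
        {"codigo": 1, "ts": 300, "velocidad": 55},
        {"codigo": 1, "ts": 3600, "velocidad": 20},
    ]
    for batch in aggregate(sample):
        print(len(batch.records), batch.metadata())
```

Each batch's metadata dictionary is what would be attached to the stored object and indexed, so later queries can skip objects whose ranges cannot match.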
  10. IoT Architecture – Madrid Traffic – Data Access
      Aim: Access data efficiently and cost effectively
      • Store IoT data in OpenStack Swift object storage
        - Open source, low cost deployment, and highly scalable
      • Parquet data is accessible via Spark SQL
      • Optimized predicate pushdown
        - Custom Spark SQL external data source driver
        - Uses object metadata indexes
        - Searches for Swift objects whose min/max values overlap requested ranges
      Get all data for morning traffic:
        SELECT codigo, intensidad, velocidad FROM madridtraffic WHERE tf >= '08:00:00' AND tf <= '12:00:00'
      • Brute force method: 13245 Swift requests
      • Optimized predicate pushdown: 616 Swift requests
      • 21.5 times improvement
      [Diagram component: Swift]
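
The min/max overlap test behind the predicate pushdown can be sketched as follows. This is a minimal illustration under stated assumptions, not the custom Spark SQL driver itself: the `index` dictionary stands in for the metadata index, and the object names and their ranges are invented for the example. Times are compared as `HH:MM:SS` strings, which sort correctly in lexicographic order.

```python
def overlaps(obj_min, obj_max, lo, hi):
    """True when the object's [obj_min, obj_max] intersects the requested [lo, hi]."""
    return obj_min <= hi and obj_max >= lo


def select_objects(index, lo, hi):
    """Return the names of objects that may contain rows with lo <= tf <= hi."""
    return [name for name, (mn, mx) in index.items() if overlaps(mn, mx, lo, hi)]


if __name__ == "__main__":
    # Hypothetical index: object name -> (min tf, max tf) taken from its metadata.
    index = {
        "obj-night": ("00:00:00", "05:59:59"),
        "obj-early": ("06:00:00", "08:59:59"),
        "obj-late-morning": ("09:00:00", "11:59:59"),
        "obj-afternoon": ("12:00:00", "17:59:59"),
    }
    # Only the selected objects need to be fetched; the rest are skipped.
    print(select_objects(index, "08:00:00", "12:00:00"))
```

The selection is conservative: an object is fetched whenever its range touches the query range, so rows inside it still need the original `WHERE` filter applied.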
  11. IoT Architecture – Madrid Traffic – Machine Learning
      Aim: Learn to differentiate between ‘good’ and ‘bad’ traffic
      • Depends on context
        - Time (morning/evening), Day (weekday/weekend)
        - Location
      • Use Spark MLlib k-means clustering
      • Produce threshold values for real-time decision making
      • Re-run algorithm when quality of clusters decreases
        - Can use silhouette index to measure quality
      [Diagram component: Swift]
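
As a rough illustration of the silhouette index mentioned on this slide, the sketch below computes the mean silhouette score for one-dimensional values such as speeds. It is a plain-Python stand-in, not the Spark MLlib code used in the talk; the function name `silhouette` and the sample data are assumptions, and at least two clusters must be present.

```python
def silhouette(points, labels):
    """Mean silhouette score for 1-D points; requires at least two clusters."""
    clusters = {}
    for p, label in zip(points, labels):
        clusters.setdefault(label, []).append(p)

    scores = []
    for p, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            # By convention a singleton cluster contributes a score of 0.
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(abs(p - q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(abs(p - q) for q in other) / len(other)
            for key, other in clusters.items()
            if key != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    speeds = [1, 2, 10, 11]
    print(silhouette(speeds, [0, 0, 1, 1]))  # well-separated clustering
    print(silhouette(speeds, [0, 1, 0, 1]))  # poor clustering
```

A score near 1 indicates compact, well-separated clusters; a falling score over time could serve as the trigger for re-running the clustering.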
  12. IoT Architecture – Madrid Traffic – Machine Learning
      Event Detection:
      • Use Spark MLlib k-means clustering to separate data into 2 clusters
      • Find the midpoint between the 2 cluster centres
      • Use this midpoint to generate the thresholds
      • Repeat for each context, e.g., time period (morning, afternoon, evening, night)
      Anomaly Detection:
      • Use a single cluster and define an anomaly to be further than a certain distance from the cluster centre
      [Chart: Morning Traffic on Weekdays]
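
The two detection recipes on this slide can be sketched on one-dimensional speed values. The talk uses Spark MLlib k-means; the code below swaps in a small hand-written Lloyd's iteration with k=2 so that it runs without Spark. The function names, the seeding at the minimum and maximum, and the sample speeds are all assumptions made for illustration.

```python
def two_means(values, iterations=50):
    """1-D Lloyd's algorithm with k=2, seeded at the min and max values."""
    lo, hi = min(values), max(values)
    for _ in range(iterations):
        # Assign each value to the nearer of the two current centres.
        low = [v for v in values if abs(v - lo) <= abs(v - hi)]
        high = [v for v in values if abs(v - lo) > abs(v - hi)]
        if not high:  # all values identical, nothing to separate
            break
        new_lo, new_hi = sum(low) / len(low), sum(high) / len(high)
        if (new_lo, new_hi) == (lo, hi):  # converged
            break
        lo, hi = new_lo, new_hi
    return lo, hi


def threshold(values):
    """Event detection: midpoint between the two cluster centres."""
    lo, hi = two_means(values)
    return (lo + hi) / 2


def is_anomaly(value, values, max_distance):
    """Anomaly detection: distance from the single cluster centre (the mean)."""
    centre = sum(values) / len(values)
    return abs(value - centre) > max_distance


if __name__ == "__main__":
    morning_speeds = [10, 12, 14, 50, 52, 54]
    print(threshold(morning_speeds))
    print(is_anomaly(20, [50, 52, 54], max_distance=10))
```

Running `threshold` once per context (for example per time period) gives one threshold per context, matching the "repeat for each context" step.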
  13. IoT Architecture – Madrid Traffic – Real Time Decision Making
      Aim: Respond in real time to traffic conditions
      • Use Complex Event Processing (CEP) approach
        - Rule based
        - Process events record by record
        - CEP rules are typically defined manually but in many cases it is difficult to get them right
        - We automate this process and make it smart
        - uCEP has a small footprint, can be run at the edge
      Work in Progress
      Proactive approach:
      • Use Spark streaming linear regression to predict traffic behavior (e.g. speed, intensity) for near future
      • Apply CEP on predicted data
      • Respond proactively to predicted events such as traffic congestion, e.g., EMT can proactively re-route buses
      [Diagram components: CEP, IoT]
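
The proactive approach can be illustrated with a small sketch: fit a line to recent readings, extrapolate one step ahead, and apply a threshold rule to the predicted value. Note the substitution: the talk names Spark streaming linear regression, whereas this stand-in uses an ordinary least-squares fit over a fixed window of readings. The rule, the function names, and the threshold value are hypothetical and do not reflect uCEP's rule syntax.

```python
def fit_line(ys):
    """Least-squares fit of ys against x = 0..n-1; needs at least two readings."""
    n = len(ys)
    mean_x = (n - 1) / 2
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict_next(ys, steps_ahead=1):
    """Extrapolate the fitted line steps_ahead readings past the last one."""
    slope, intercept = fit_line(ys)
    return intercept + slope * (len(ys) - 1 + steps_ahead)


def congestion_rule(speed, speed_threshold):
    """Hypothetical CEP-style rule: emit an event when speed drops below the threshold."""
    return "CONGESTION" if speed < speed_threshold else None


if __name__ == "__main__":
    recent_speeds = [60, 55, 50, 45]  # one reading per 5-minute interval
    predicted = predict_next(recent_speeds)
    # The rule stays silent on the current reading but fires on the prediction.
    print(congestion_rule(recent_speeds[-1], speed_threshold=42))
    print(congestion_rule(predicted, speed_threshold=42))
```

In this sketch the threshold would come from the clustering step, so the rule is generated from data rather than written by hand.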
  14. Demo
  15. Our Architecture Applies to Many IoT Use Cases
      • Energy/utilities
        - Anomaly detection: pipe leakage, appliance malfunction
        - Occupancy detection
      • Healthcare
        - Healthcare patient monitoring/alert/response
      • Insurance
        - Driver behavior and location monitoring
      • Transportation
        - Connected vehicles, engine diagnostics, automated service scheduling
      • Logistics
        - Goods tracking, sensitive goods management
  16. The Madrid Traffic Use Case on IBM Bluemix
      [Diagram labels: Data Sources, Madrid Traffic Sensors, MQTT, Node-RED, Message Bus, Secor, Data Storage, Object Storage, Data Analytics, Apache Spark, Data Visualization, Freeboard Dashboard]
      Joint work with Naeem Altaf and team
  17. Thank You!
  18. Backup
  19. COSMOS
      • Funding: EU FP7 at level of 2PY x 3 years
      • Started: Sept 2013
      • Coordinator: ATOS
      • Technical partners: IBM, NTUA, Univ Surrey, Siemens, ATOS
      • Use Case Partners: Hildebrand/Camden, EMT Madrid Bus Transport/Madrid Council, III Taiwan
        - Smart Cities use cases
      • Project Vision: Enable ‘things’ to interact with each other based on shared experience, trust, reputation, etc.
  20. IBM Bluemix Data Analytics for IoT Architecture
  21. Kafka + Secor
      • What is it?
        - Apache Kafka is a high-throughput distributed publish/subscribe messaging system.
        - Secor is an open source tool developed by Pinterest, which aggregates Kafka messages and saves them as an S3 object.
      • What extensions were needed?
        - Support for OpenStack Swift as a Secor target. We also added support for Parquet format and for annotating objects with metadata to support indexing and metadata search.
      • What is the value of integration with Swift?
        - Enables bringing new data and applications to Swift, which is an open source solution. Parquet and metadata search enable improved performance for batch analytics.
      • Status
        - We contributed OpenStack Swift support to the Secor community and it is now part of Secor.
  22. Parquet
      • What is it?
        - A column-based, semi-structured, schema-based storage format supported by Hadoop and Spark. Enables column-wise compression and projection pushdown.
      • What integration is needed?
        - Since Swift is now part of the Hadoop ecosystem, no additional integration is needed. Data in Swift can be stored in Apache Parquet format, inheriting associated advantages.
      • Status
        - Spark SQL supports storing tabular data in Parquet format in Hadoop-compatible storage systems such as Swift.
  23. elasticsearch
      • What is it?
        - A distributed, scalable, real-time search and analytics engine, built on Apache Lucene.
      • What integration is needed?
        - Index object metadata, allowing search for objects by attributes.
      • What is the value of integration with Swift?
        - Use search to select objects for further processing, e.g., relevant objects for analytics.
        - Note that S3 does not yet have native search according to metadata.
      • Status
        - The IBM SoftLayer object service includes a basic implementation of metadata search; at IBM Research, we added extensions such as data type support and range searches.
