Time Series Analysis: DataFlow

6:00-6:15 pm - Introduction to CSA/Apache Flink and a FLaNK demo (https://www.flankstack.dev/) by Principal DataFlow Field Engineer Tim Spann.

Building Edge-to-AI Applications for Hybrid Cloud with CDP

The Main Event - Data Science and Machine Learning on Time Series IoT Data - Analyzing Time Series Data with an ARIMA Model

6:30 pm - Data Scientist Victor Dibia will talk about how to build a time series model for sensor data, and how to build, test, experiment with and deploy the model and a visual application on the Cloudera Machine Learning platform in AWS. Read one of his awesome articles: https://blog.cloudera.com/deep-learning-for-anomaly-detection/

I will feed him sensor data from the sensor devices using MiNiFi agents, Edge Flow Manager, NiFi, Kafka and Flink.

Our Tri-State Meetup data team, Amol Thacker, Paul Vidal and John Kuchmek, will host and provide color commentary.

If you collect your data, then you will find it... Time After Time (a spin on the Cyndi Lauper song)
Collect data at the edge and analyze it with an ARIMA model. (matter-of-fact title)

The Internet of Things (IoT) is growing in popularity, but it isn’t new. Connected devices have long existed in manufacturing and utilities in the form of Supervisory Control and Data Acquisition (SCADA) systems, and time series data has been studied for some time in these industries as well as in the stock market. Time series analysis can bring valuable insight to businesses, and to individuals with smart homes.

Getting there takes many parts and components: collecting data at the edge, storing it in a central location for initial analysis, and then building, training and eventually deploying a model. Time series forecasting is one of the more challenging problems in data science. Important factors in time series analysis and forecasting are seasonality, the stationarity of the data and the autocorrelation of target variables.

We show you a platform, built on open source technology, with this potential. Sensor data will be collected at the edge, from a Raspberry Pi, using Cloudera Edge Flow Manager (powered by MiNiFi). The data will then be pushed to a cluster running Cloudera Flow Management (powered by NiFi), where it can be manipulated and routed before being stored in Kudu on Cloudera Data Platform (CDP). Initial inspection can be done in Hue using Impala. The time series data will then be analyzed, with potential forecasting, using an ARIMA model in CML (Cloudera Machine Learning). Time series analysis and forecasting apply to many areas, including stock market analysis, electricity load forecasting, inventory studies, weather conditions, census analysis and sales forecasting.
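The abstract names an ARIMA model without showing what one does. As a minimal sketch only, and not the model built in CML for this talk, the Python below fits an ARIMA(1,1,0): it differences the series once to remove trend (the "I" with d=1), estimates one autoregressive coefficient phi on the differences by least squares with no constant term, and then integrates the forecast differences back into levels. The function names and the sample readings are hypothetical; a real analysis would typically use a library implementation such as statsmodels' `ARIMA` class and check stationarity, seasonality and autocorrelation before choosing the model order.

```python
from typing import List


def difference(series: List[float]) -> List[float]:
    """First difference d[t] = y[t] - y[t-1]; this is the "I" (d=1) in ARIMA."""
    return [curr - prev for prev, curr in zip(series, series[1:])]


def fit_ar1(diffs: List[float]) -> float:
    """Least-squares phi for d[t] = phi * d[t-1] + e[t] (no constant term)."""
    num = sum(prev * curr for prev, curr in zip(diffs, diffs[1:]))
    den = sum(prev * prev for prev in diffs[:-1])
    return num / den if den != 0 else 0.0  # all-zero differences: nothing to fit


def forecast_arima110(series: List[float], steps: int) -> List[float]:
    """Forecast `steps` future values with an ARIMA(1,1,0) model (sketch)."""
    if len(series) < 3:
        raise ValueError("need at least 3 observations to fit AR(1) on differences")
    diffs = difference(series)
    phi = fit_ar1(diffs)
    level, diff = series[-1], diffs[-1]
    forecasts = []
    for _ in range(steps):
        diff = phi * diff     # AR(1) prediction of the next difference
        level = level + diff  # integrate: add the difference back to the last level
        forecasts.append(level)
    return forecasts


if __name__ == "__main__":
    # Hypothetical temperature readings (deg C), not data from the talk.
    readings = [20.1, 20.4, 20.9, 21.1, 21.6, 21.8]
    print(forecast_arima110(readings, steps=3))
```

Differencing before fitting is what lets a simple AR(1) handle a trending sensor series; a seasonal pattern would need a seasonal model on top of this.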

https://github.com/tspannhw/meetup-sensors

  1. Time Series Analysis: DataFlow - Timothy Spann, Principal DataFlow Field Engineer, @PaasDev
  2. © 2020 Cloudera, Inc. All rights reserved.
  3. Welcome to Future of Data - Princeton @PaasDev https://www.meetup.com/futureofdata-princeton/ From Big Data to AI to Streaming to Containers to Cloud to Analytics to Cloud Storage to Fast Data to Machine Learning to Microservices to ...
  4. Welcome to Future of Data - New York https://www.meetup.com/futureofdata-newyork/ From Big Data to AI to Streaming to Containers to Cloud to Analytics to Cloud Storage to Fast Data to Machine Learning to Microservices to ...
  5. Welcome to Future of Data - Philadelphia @futureofdataphl https://www.meetup.com/futureofdata-philadelphia/ From Big Data to AI to Streaming to Containers to Cloud to Analytics to Cloud Storage to Fast Data to Machine Learning to K8 to ...
  6. Meetup Presenter 1 - Who am I? Principal DataFlow Field Engineer @PaasDev; DZone Zone Leader and Big Data MVB; Princeton NJ Future of Data Meetup; ex-Pivotal Field Engineer; Apache Kafka, TensorFlow, Apache Spark RefCards; https://github.com/tspannhw https://www.datainmotion.dev/ https://dzone.com/users/297029/bunkertor.html
  7. CLOUDERA DATA PLATFORM: World’s first enterprise data cloud
  8. THE ENTERPRISE DATA CLOUD COMPONENTS. Traditional Platform Consumption: • Data Hub Clusters. New analytic experiences: • Data Warehouse • Machine Learning • More to come. Control Plane services: • Workload Manager • Replication Manager • Data Catalog • Management Console
  9. CLOUDERA - THE ENTERPRISE DATA COMPANY: STREAMING & DATA FLOW (Collect); DATA ENGINEERING (Enrich); DATA WAREHOUSE (Report); MACHINE LEARNING & AI (Predict); VISUAL APPLICATIONS (Enable); spanning all: SECURITY | GOVERNANCE | LINEAGE | MANAGEMENT | AUTOMATION
  10. CSA
  11. Streaming Analytics Powered by Apache Flink
  12. A DATA-IN-MOTION REFERENCE ARCHITECTURE. Stages: Collect, Buffer, Distribute, Analyze, Connect. Components: Data Collection at the Edge with Apache NiFi & MiNiFi (C++ agents at US-West, US-Central and US-East Plants); Ingest Gateway powered by Apache Kafka (topics gateway-west-raw-sensors, gateway-central-raw-sensors, gateway-east-raw-sensors); Data Flow Apps powered by NiFi; Streaming Analytics powered by Flink; Microservices by Kafka Streams; Model Scoring powered by CML; Modern Apps; Data-at-Rest (opDB, CDP-DC, S3, Azure, Cloud Storage, Data Lake, Operational Stores)
  13. SQL & Table API ● Unified APIs for streaming data and data at rest ○ Run the same query on batch and streaming data ○ ANSI SQL: No stream-specific syntax or semantics! ○ Many common stream analytics use cases supported. Example, counting clicks per user and session (a session is defined by a 30-minute gap of inactivity): SELECT userId, COUNT(*) AS cnt, SESSION_START(clicktime, INTERVAL '30' MINUTE) FROM clicks GROUP BY SESSION(clicktime, INTERVAL '30' MINUTE), userId
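The session window in this query can be hard to picture. As an illustration only, the Python sketch below reproduces the same result in batch form over a finished list of clicks: for each user it sorts the click times and starts a new session whenever the gap since the previous click exceeds 30 minutes, reporting each session's start time and click count (the analogues of `SESSION_START` and `cnt`). The function name and the `(user_id, epoch seconds)` input format are assumptions. Flink computes sessions incrementally on an unbounded stream using event time and watermarks, and its handling of a gap of exactly 30 minutes may differ from this sketch.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

GAP_SECONDS = 30 * 60  # 30-minute inactivity gap, as in the slide's query


def session_counts(clicks: List[Tuple[str, int]]) -> List[Tuple[str, int, int]]:
    """Return (user_id, session_start, click_count) for every session.

    clicks: (user_id, clicktime in epoch seconds); input order does not matter.
    """
    by_user: Dict[str, List[int]] = defaultdict(list)
    for user_id, ts in clicks:
        by_user[user_id].append(ts)  # GROUP BY userId

    results: List[Tuple[str, int, int]] = []
    for user_id in sorted(by_user):
        times = sorted(by_user[user_id])
        start, prev, count = times[0], times[0], 0
        for ts in times:
            if ts - prev > GAP_SECONDS:
                # Inactivity gap exceeded: close the current session, open a new one.
                results.append((user_id, start, count))
                start, count = ts, 0
            count += 1
            prev = ts
        results.append((user_id, start, count))  # close the last open session
    return results


if __name__ == "__main__":
    sample = [("alice", 0), ("alice", 600), ("alice", 5000), ("bob", 100)]
    print(session_counts(sample))
```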
  14. Quick Flink SQL Demo Preview
  15. FLaNK Stack https://github.com/tspannhw/MmFLaNK https://www.datainmotion.dev/2019/11/introducing-mm-flank-apache-flink-stack.html SELECT * FROM sensors;
  16. CFM, CSM
  17. CDF: The Active Data Warehouse with Apache Kudu. IoT Devices, Applications, Metrics, Logs & Files; HDFS/Object Storage; Hot Storage; Cold Storage; SQL; Real-Time Analytics; Alerting; Event-Driven Applications; Dashboards; Authorization; Audit & Lineage; Authentication (Kerberos); Encryption (NavEncrypt)
  18. Sensor Data https://www.datainmotion.dev/2020/04/predicting-sensor-readings-with-time.html
  19. Sensors ● BME280 temperature, pressure, humidity sensor ● LTR-559 light and proximity sensor ● MICS6814 analog gas sensor ● ADS1015 ADC ● MEMS microphone ● 0.96-inch, 160 x 80 color LCD
  20. Sensor Data - Edge
  21. Sensor Data - Hydrate Data Lakes
  22. Sensor Data - Example Row {"uuid": "rpi4_uuid_omi_20200417211935", "amplitude100": 0.3, "amplitude500": 0.1, "amplitude1000": 0.1, "lownoise": 0.1, "midnoise": 0.1, "highnoise": 0.1, "amps": 0.3, "ipaddress": "192.168.1.243", "host": "rp4", "host_name": "rp4", "macaddress": "dc:a6:32:03:a6:e9", "systemtime": "04/17/2020 17:19:36", "endtime": "1587158376.22", "runtime": "36.47", "starttime": "04/17/2020 17:18:58", "cpu": 0.0, "cpu_temp": "59.0", "diskusage": "46651.6 MB", "memory": 6.3, "id": "20200417211935_7b7ae5da-905b-418b-94f1-270a15dbc1df", "temperature": "38.7", "adjtemp": "29.7", "adjtempf": "65.5", "temperaturef": "81.7", "pressure": 1015.6, "humidity": 6.8, "lux": 1.2, "proximity": 0, "oxidising": 8.3, "reducing": 306.4, "nh3": 129.5, "gasKO": "Oxidising: 8300.63 Ohms\nReducing: 306352.94 Ohms\nNH3: 129542.17 Ohms"}
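In the example row, several numeric values arrive as strings (`temperature`, `cpu_temp`, `endtime`), disk usage carries a unit suffix, and timestamps use a month/day/year format. Before storing rows in Kudu or feeding them to a time series model, it helps to normalize these types. The Python sketch below shows one way to do that with only the standard library; the function name `normalize_reading`, the list of fields to coerce, the new `diskusage_mb` key, and the assumption that disk usage is always reported in MB are hypothetical and not taken from the meetup-sensors repository.

```python
import json
from datetime import datetime
from typing import Any, Dict

# Fields that arrive as numeric strings in the example row (assumed list).
STRING_NUMERIC_FIELDS = ("temperature", "adjtemp", "adjtempf", "temperaturef",
                         "cpu_temp", "runtime", "endtime")


def normalize_reading(raw: str) -> Dict[str, Any]:
    """Parse one sensor JSON row and coerce string-typed values to numbers and datetimes."""
    row = json.loads(raw)
    for field in STRING_NUMERIC_FIELDS:
        if field in row:
            row[field] = float(row[field])
    # "46651.6 MB" -> 46651.6 under a new key (assumes the unit is always MB).
    if "diskusage" in row:
        row["diskusage_mb"] = float(row.pop("diskusage").split()[0])
    # "04/17/2020 17:19:36" -> datetime, month/day/year order as in the sample row.
    for field in ("systemtime", "starttime"):
        if field in row:
            row[field] = datetime.strptime(row[field], "%m/%d/%Y %H:%M:%S")
    return row


if __name__ == "__main__":
    sample = '{"temperature": "38.7", "diskusage": "46651.6 MB", "systemtime": "04/17/2020 17:19:36"}'
    print(normalize_reading(sample))
```

Values that are already numeric, such as `pressure` and `humidity`, pass through unchanged.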
  23. Sensor Ingest Demo
  24. Data Science Up Next
  25. LINKS
  26. LINKS ● https://www.datainmotion.dev/2019/12/iot-series-minifi-agent-on-raspberry-pi.html ● https://learn.pimoroni.com/tutorial/sandyj/getting-started-with-enviro-plus ● https://github.com/tspannhw/meetup-sensors/ ● https://github.com/tspannhw/ClouderaFlowManagementWorkshop ● https://github.com/tspannhw/minifi-enviroplus ● https://github.com/tspannhw/minifi-movidius-electric ● https://github.com/tspannhw/table-ddl
  27. THANK YOU
