Building a Data Analytics PaaS for Smart Cities
Published in: Technology
  1. Building a Data Analytics PaaS for Smart Cities. Smiti Sharma, EMC Virtustream; Keith Manthey, EMC ETD BRDC. © Copyright 2014 EMC Corporation. All rights reserved.
  2. Intelligent Communities: Cities and regions that use technology not just to save money or make things work better, but also to create high-quality employment, increase citizen participation and become great places to live and work. (ICF, Intelligent Community Forum)
  3. “Smart Cities that use Big Data are neither about intuition nor about looking back and analyzing what went wrong and could be better. They spot patterns. They look forward. They predict potential crisis situations. They find what could be better and make it better. Smart cities don’t guess. They are sure!” © Copyright 2015 EMC Corporation. All rights reserved.
  4. VISION FOR CITIES OF THE FUTURE: Become an innovative city. SAFE: Anticipate risks and protect people and information. EFFICIENT: Optimized use of city resources. SEAMLESS: Integrated daily life services. IMPACTFUL: Enriched life and business experiences for all.
  5. IMPLICATIONS TO THE CITY: Empower the city, citizens, visitors, and businesses. IMPROVE: Quality of urban living. CREATE: Efficient city and transparent government. DEVELOP: Vital economy. REDUCE: Environmental impact. ADDRESS: Infrastructure, buildings and urban planning. IMPROVE: Tourism, recreation, and city image.
  6. “BIG DATA” ENABLES CITIES OF THE FUTURE. Big data: any data set that cannot be processed with traditional systems. Diagram labels: Social Networks, UGC; Public Records; Location Data; Internet of Things; Emerging Data Sources; Unstructured Data; Dark Data; Structured Data; Traditional Data Sources.
  7. BUILD SMART CITY: THE PROBLEM. Understand the city data challenge. Geo-distributed data sources: satellite-borne imaging device, airborne imaging device, webcam, environmental monitor, health monitor, traffic monitor, industrial process monitor. City Network; Data Center; Centralized Storage and Analytics Systems. DATA CHALLENGE: These systems have a massive number of endpoints, and managing massive, heterogeneous data becomes an enormous challenge. Diverse data sources require normalization and standardization to address data orchestration and integration. DATA USAGE CHALLENGE: How can we create innovative businesses that use this data to create more value and make full use of current investment?
  8. Architecture
  9. Guiding Principles: Agile; Open; Portable; Extensible; Modular/Ftl. Blocks; Analytics Driven; Software Defined.
  10. Architecture. Data Sources; Ingestion Layer (Spring XD); Transformation Layer (Python, /Transformed_Files); Data Integration Layer (Python, /Transformed_Files; Schema and Instance Alignment); Analytics (data mining, prediction); Visualization Layer (KPIs, metrics exploration, maps & graphs); API (Open Data). Annotations: GUIs and dashboards that access the underlying databases and promote an excellent user experience; data modelling, metrics, ETL mechanisms, definitions and variable selection; proprietary and open data sources; APIs to expose data.
  11. Ingestion. Ingestion Layer: Spring XD. Data Sources.
  12. Transformation. Transformation Layer: Python, /Transformed_Files. Data Cleaning; Conversion.
  13. Integration. Integration Layer: Python, /Transformed_Files. Schema and Instance Alignment: Schema Integration; Instance Alignment.
  14. Integration (Schema Integration and Instance Alignment; Integration Layer: Python, /Transformed_Files). • Schema DB (INPUT) • Schema Matching (Algorithms & Heuristics) • Suggest Attribute Mappings (OUTPUT: SEMI-SUPERVISED) • Instances of DB tables & Integration Rules (INPUT) • Deduplication, Record Consolidation (Algorithms & Heuristics) • Instance alignment using a 2 phase-pass algorithm to avoid duplicate insertion (in a semi-supervised data integration tool) • Attribute name similarity: fuzzy string comparisons (cosine similarity) • Levenshtein similarity: Categorical/String Data • http://pgsimilarity.projects.pgfoundry.org/ • List of deduplicated instances (OUTPUT: SEMI-SUPERVISED)
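
The two string measures named on the integration slide can be sketched in plain Python. This is an illustrative sketch only, not the presenters' implementation (the slide points to the pg_similarity PostgreSQL extension instead). The character-bigram tokenization, the `suggest_mappings` helper, and the 0.5 threshold are assumptions made for the example.

```python
import math
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insert, delete, substitute) via row-by-row dynamic programming."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Normalize edit distance to a similarity in [0, 1]."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def char_ngrams(text: str, n: int = 2) -> Counter:
    """Bag of character n-grams (assumed tokenization for attribute names)."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine_similarity(a: str, b: str, n: int = 2) -> float:
    """Cosine of the angle between the two n-gram count vectors."""
    va, vb = char_ngrams(a, n), char_ngrams(b, n)
    dot = sum(va[g] * vb[g] for g in va)
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0


def suggest_mappings(source_attrs, target_attrs, threshold=0.5):
    """Hypothetical helper: propose the best target attribute for each source attribute."""
    suggestions = []
    for s in source_attrs:
        best = max(target_attrs, key=lambda t: cosine_similarity(s, t))
        score = cosine_similarity(s, best)
        if score >= threshold:
            suggestions.append((s, best, round(score, 3)))
    return suggestions
```

In a semi-supervised workflow like the one the slide describes, suggestions of this kind would be reviewed by a person before any mapping is accepted.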
  15. Integration. Integration Layer: Python, /Transformed_Files. Schema and Instance Alignment: Schema Integration; Instance Alignment. Diagram labels: Deduplication; Similarity Join; Attribute mapping (Insert); Attribute mapping (Select); Cosine; Levenshtein.
  16. Visualization. Visualization Layer: KPIs; metrics exploration; maps & graphs.
  17. API implementation. API; Open Data.
  18. Example Use Case: Transportation
  19. Prediction of bus arrival. • Available data: city bus movement information from on-board devices (lat-long, time, date, bus line, bus ID). • Goal: predict the time of arrival at a bus stop. • Challenges: lack of data in certain areas of the city; GPS precision.
  20. Architecture. Diagram labels: Bus GPS; Gemfire XD; Routes & Bus Stop; Data Lake; Streaming; Scheduler; Lazy-write; GPS.
  21. Transportation Network. • Use each bus stop as a node and the streets as edges. Nodes: $X = \{x_1, \dots, x_N\}$ with $x_i = (\mathrm{lat}(i), \mathrm{long}(i))$. Edges: $E = \{e_1, \dots, e_M\}$ with $e_k = (x_i, x_j)$. Diagram labels: $x_i$, $x_j$, $a_{ij} = +1$, $a_{ij} = -1$.
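
A minimal sketch of the network described on this slide, with bus stops as nodes and stop-to-stop links as edges. The `Stop` class, the `build_network` helper, and the use of great-circle (haversine) distance as a stand-in for street length are assumptions for illustration; real street segments are longer than the straight line between two stops.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


@dataclass(frozen=True)
class Stop:
    """A node x_i = (lat(i), long(i)); stop_id is a hypothetical identifier."""
    stop_id: str
    lat: float
    lon: float


def haversine_km(a: Stop, b: Stop) -> float:
    """Great-circle distance between two stops, in kilometres."""
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    dphi = phi2 - phi1
    dlmb = math.radians(b.lon - a.lon)
    h = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def build_network(stops, links):
    """Return (nodes, edges): nodes by id, and directed edges e_k = (x_i, x_j) with a length."""
    nodes = {s.stop_id: s for s in stops}
    edges = {(i, j): haversine_km(nodes[i], nodes[j]) for i, j in links}
    return nodes, edges
```

Edges are keyed by an ordered pair, so the two travel directions between the same pair of stops can carry different lengths or speeds.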
  22. The model. • The goal is to find, for each edge $e_j$, an estimate of the average speed at an instant $t$, $v(e_j, t)$. • Default model: estimate the velocity on each edge using historical data from the last month, with different hourly models for each day of the week. • Online model: use real-time data to calculate the speed.
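
One way to read the default and online models is as a lookup table of mean speeds keyed by edge, weekday, and hour, plus a short-window average over fresh observations. The sketch below follows that reading. The observation tuple layout, the 10-minute window, and the fallback from the online estimate to the default table are assumptions, not details given in the slides.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def fit_default_model(observations):
    """Average speed per (edge, weekday, hour).

    observations: iterable of (edge_id, timestamp: datetime, speed_kmh),
    e.g. one month of historical data.
    """
    buckets = defaultdict(list)
    for edge_id, ts, speed in observations:
        buckets[(edge_id, ts.weekday(), ts.hour)].append(speed)
    return {key: mean(values) for key, values in buckets.items()}


def default_speed(model, edge_id, t: datetime):
    """Look up v(e_j, t) in the default model; None if that bucket has no data."""
    return model.get((edge_id, t.weekday(), t.hour))


def online_speed(model, recent, edge_id, t: datetime, window_s=600):
    """Mean of observations from the last window_s seconds, else the default model."""
    fresh = [speed for e, ts, speed in recent
             if e == edge_id and 0 <= (t - ts).total_seconds() <= window_s]
    return mean(fresh) if fresh else default_speed(model, edge_id, t)
```

Returning `None` for empty buckets keeps the "lack of data in certain areas" case visible to the caller rather than hiding it behind a made-up speed.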
  23. Default Model: average speed (km/h).
  24. Prediction of Bus Arrival
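
The slides do not spell out how per-edge speeds become an arrival time. A natural reading, assumed here, is to sum the travel time (length divided by estimated speed) over the edges between the bus's current position and the target stop. The function name and the dictionary inputs are hypothetical.

```python
def estimate_arrival_minutes(path_edges, edge_length_km, edge_speed_kmh):
    """Sum length / speed over the remaining edges and return minutes.

    path_edges: ordered edge ids between the bus and the target stop.
    edge_length_km, edge_speed_kmh: dicts keyed by edge id (assumed inputs).
    """
    total_hours = 0.0
    for edge in path_edges:
        speed = edge_speed_kmh[edge]
        if speed <= 0:
            raise ValueError(f"no usable speed estimate for edge {edge!r}")
        total_hours += edge_length_km[edge] / speed
    return total_hours * 60.0
```

For example, 1.0 km at 20 km/h followed by 0.5 km at 30 km/h gives 3 minutes plus 1 minute of travel time.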
  25. Another use case: Auditing. • Based on the information from the previous use case, extrapolate to verify the quality of the service. • Need to identify each bus trip in order to evaluate the time interval between two buses of the same line at each bus stop.
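
The interval between consecutive buses of the same line at a stop (the headway) can be sketched as below. The sketch assumes the harder step the slide mentions, identifying each trip and its arrival time at each stop, has already been done, so the input is one arrival record per trip per stop. The record layout is an assumption.

```python
from collections import defaultdict
from datetime import datetime


def headways_minutes(arrivals):
    """Intervals between consecutive arrivals per (line, stop).

    arrivals: iterable of (line, stop_id, bus_id, arrival: datetime).
    Returns {(line, stop_id): [minutes between consecutive buses, ...]}.
    """
    by_line_stop = defaultdict(list)
    for line, stop_id, bus_id, ts in arrivals:
        by_line_stop[(line, stop_id)].append((ts, bus_id))

    result = {}
    for key, events in by_line_stop.items():
        events.sort()  # chronological order
        result[key] = [
            (later[0] - earlier[0]).total_seconds() / 60.0
            for earlier, later in zip(events, events[1:])
        ]
    return result
```

Comparing these intervals with the scheduled frequency of the line is one possible way to audit service quality.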
  26. Screen shot: Auditing
  27. Data Quality Issues. Diagram labels: Route A; Route B; Bus GPS.
  28. Extending PaaS for Smart Cities
  29. Smart City Platform requirements. Diagram labels: Real-time Dashboard; Personal Dashboard; Government Transactional Applications; Commercial Big Data Application; Government Big Data Application; Commercial Transactional Applications; Unified Control Center; Application Layer; Security Rules; Payment Gateway; Trust Authentication; Identity Management; Locations & Mapping; Platform as a Service; Data Governance; DATA ANALYTICS TOOLS; Historic & Predictive/DATA APIs; Transactional Data Store; Data Transformation; Unstructured Data; Structured Data; City Semantics; Audit; Open Standards; Data Ingestion Interfaces and Storage; CITY IoT INFRASTRUCTURE; CITY DATA SOURCES; CITY ICT INFRASTRUCTURE; Government Devices; Commercial Devices; Utility Devices; Personal Devices; IoT Data Aggregation; Government Systems; Social Media; Commercial Systems; Archived Data; Fixed & Wireless Networks; Cloud Services; Enablement Layer; Data Orchestration Layer; Infrastructure Layer; SECURITY.
  30. High level Smart City Platform components. Same layer diagram as the previous slide (Application, Enablement, Data Orchestration and Infrastructure layers, plus SECURITY), annotated with products: PCF (Pivotal Cloud Foundry); EMC STORAGE: ISILON and/or CLOUD NATIVE SOFTWARE DEFINED STORAGE; VMWARE vRealize Cloud Suite & BIG DATA EXTENSIONS; PIVOTAL BIG DATA SUITE: ADVANCED ANALYTICS, APPLICATIONS AT SCALE, DATA PROCESSING; GREENPLUM DATABASE; HAWQ; SPRING XD; SPARK; REDIS; RABBITMQ; GEMFIRE; HADOOP.
  31. Pivotal GPDB Delivers: • Massively Parallel Analytics Performance • In-Database Analytical Extensions • Industry-Leading Load Speed • Rich SQL with Schema Agnosticism • Industry-Leading Workload Mgmt. • SAS Acceleration Options • Parallel Co-Processing with Hadoop • No-Forklift Scalability • Multi-Level Redundancy • Rich, Easy-to-Use Administration Tools • Big Data Backup • Comprehensive Security • Software-only or DCA
  32. Isilon Scale-Out NAS. Simple to manage: single file system, single volume, global namespace. Massively scalable: scales from 16 TB to over 50 PB in a single cluster; 200 GB/s throughput, 3.75M IOPS. Unmatched efficiency: over 80% storage utilization, automated tiering and SmartDedupe. Enterprise data protection: efficient backup and disaster recovery, and N+1 through N+4 redundancy. Robust security and compliance options: RBAC, Access Zones, WORM data security, File System Auditing; Data At Rest Encryption with SEDs, STIG hardening; CAC/PIV Smartcard authentication, FIPS OpenSSL support. Operational flexibility: multi-protocol support including NFS, SMB, HTTP, FTP and HDFS; object and cloud computing including OpenStack Swift.
  33. Lots of Little Files: Hadoop Impact on Telemetry (AKA the Small Files Problem for Hadoop). Rio Smart Sensors, ESRI. NameNode = 512 GB of RAM. Each file eats away 1K in RAM. 512 GB / 1K = at most 500M files, assuming no other processes on the box. Rio has 12.5K sensors for the 2016 Olympics. Assuming each sensor sent a file every minute, that is 18M files in 1 day. EMC believes in storing metadata on SSD. This allows a scale-out for the NameNode to get around the limitations of file growth on the scale-up NameNode.
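
The arithmetic on this slide can be checked in a few lines. The sketch assumes binary units (1 GB = 1024^3 bytes, 1K = 1024 bytes), which gives about 537M files, in line with the slide's rounded "500M". The days-until-full figure is an added back-of-the-envelope step, not a number from the slide.

```python
NAMENODE_RAM_BYTES = 512 * 1024**3      # 512 GB of NameNode RAM (binary units assumed)
BYTES_PER_FILE = 1024                   # about 1K of NameNode memory per file
SENSORS = 12_500                        # Rio sensors quoted on the slide
FILES_PER_SENSOR_PER_DAY = 24 * 60      # one file per sensor per minute

# Upper bound on files the NameNode can track if RAM holds nothing else.
max_files = NAMENODE_RAM_BYTES // BYTES_PER_FILE

# Files produced per day by all sensors together.
files_per_day = SENSORS * FILES_PER_SENSOR_PER_DAY

# Days until the NameNode's memory would be exhausted at that rate.
days_until_full = max_files / files_per_day

print(f"max files:      {max_files:,}")
print(f"files per day:  {files_per_day:,}")
print(f"days to fill:   {days_until_full:.1f}")
```

At one file per sensor per minute, the memory bound is reached in roughly a month, which is the motivation the slide gives for scaling out NameNode metadata.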
