Enablers, platforms, and early adopters for the Internet of Things. How does Hadoop help process data from sensors? What are the limitations of using Hadoop for the Internet of Things?
2. Agenda
•Introduction
•IoT and Big Data
•Big Data use case
•Break
•Hadoop Architecture
•Hadoop Security
•Lunch
•Hadoop Project Management
•Break
•Hive
•Review and feedback
4. Why it matters
•1990s – Fixed internet wave: connected 1 billion users to the internet
•2000s – Mobile internet wave: added another 2 billion users to the internet
•Today – Internet of Things: potential to connect 28 billion things to the internet, from the Nike FuelBand to the Nest thermostat
5. Enablers, Platforms, & Early adopters
•Enablers: Increased share for Wi-Fi, sensors, and low-cost microcontrollers.
•Platforms: Software applications for managing communications between devices, middleware, storage, and data analytics.
•Early adopters: Home automation and health monitoring are at the forefront of the early product opportunity, while factory optimization may lead the efficiency side.
6. How it differs from the regular internet
•Leverages sensors attached to things, for example temperature, pressure, and acceleration sensors.
•More data is generated by devices than by people.
•Adds intelligence to manual processes, for example using less power on hot days.
•Extends productivity gains to the things people use.
•Connects objects to the network.
•Intelligence is shifting from the cloud to the network edge (fog computing).
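The shift of intelligence to the network edge can be illustrated with a minimal sketch. It assumes a hypothetical edge gateway that reduces a window of raw temperature readings to one summary and forwards only the out-of-range values; the function name, field names, and threshold are illustrative and do not come from the slides.

```python
from statistics import mean


def summarize_at_edge(readings, threshold):
    """Reduce a window of raw readings to a summary plus out-of-range values.

    Hypothetical helper: a real gateway would also handle timestamps,
    device identifiers, and transport to the cloud.
    """
    alerts = [r for r in readings if r > threshold]
    return {
        "count": len(readings),
        "mean": mean(readings),
        "max": max(readings),
        "alerts": alerts,
    }


# Five raw readings collapse into one small record sent upstream.
window = [21.0, 21.5, 22.0, 35.5, 21.8]
summary = summarize_at_edge(window, threshold=30.0)
print(summary)
```

Only the summary leaves the device, which is the point of edge processing: less bandwidth used and a faster local reaction to unusual readings.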
7. List of Enablers
•Cheap sensors – price reduced 50% in the last 10 years
•Cheap bandwidth – 40x cheaper in the last 10 years
•Cheap processing – 60x cheaper in the last 10 years
•Smartphones – personal gateway for people
•Wireless coverage – Wi-Fi coverage now ubiquitous
•Big Data – to make sense of the generated data
•IPv6 adoption – IPv4 uses 32-bit addresses (about 4.3 billion); IPv6 uses 128-bit addresses (about 3.4 × 10^38, almost limitless)
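The address-space figures for IPv4 and IPv6 follow directly from the address widths. The short check below reproduces them; the per-thing ratio at the end is an illustration that reuses the 28 billion figure quoted earlier in the deck and is not a claim about how addresses are actually allocated.

```python
# Address-space sizes follow from the number of address bits.
IPV4_BITS = 32
IPV6_BITS = 128

ipv4_addresses = 2 ** IPV4_BITS
ipv6_addresses = 2 ** IPV6_BITS

print(f"IPv4: {ipv4_addresses:,} addresses")
print(f"IPv6: {ipv6_addresses:.1e} addresses")

# Illustration only: raw IPv6 addresses per connected thing if
# 28 billion things were online (figure quoted earlier in the deck).
THINGS = 28_000_000_000
addresses_per_thing = ipv6_addresses / THINGS
print(f"IPv6 addresses per thing: {addresses_per_thing:.1e}")
```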
8. Why companies care
•Revenue generation – incremental revenue streams based on new products and services. AT&T has introduced a Connected Car service in partnership with a number of automobile manufacturers, including Audi, GM, Tesla, and Volvo, which offers high-speed 3G or 4G connections for a monthly subscription fee of $10. By the end of 2014, 30 of GM’s 2015 vehicle models will have LTE support, enabling vehicles to act as a Wi-Fi hotspot with connectivity for up to 7 devices, as well as access to OnStar for remote vehicle access, diagnostics, and emergency service.
•Productivity and cost savings – businesses are also embracing the Internet of Things to improve productivity and save costs, such as capex, labor, and energy. Verizon is saving more than 55 million kWh annually across 24 data centers by deploying hundreds of wirelessly connected sensors and control points throughout each data center. The result is a reduction of 66 million pounds of greenhouse gases per year.
9. Role of software
•Managing the communication with connected devices/sensors;
•Providing middleware for integration to data repositories;
•Storing and securing the data; and
•Analyzing and visualizing the data.
A platform-centric approach looks likely to favor the mobile leaders from the second wave.
10. Data Processing Requirements
Dealing with raw data. In terms of data ingestion, an Internet of Things data-processing platform should be able to deal natively with data that shows little standardization of formats. Hadoop makes it possible to land the incoming data in its raw format and, for optimization purposes, to convert it downstream to more sophisticated formats such as Parquet.
Supporting different workload types. Traditionally, with MapReduce as its primary processing paradigm, Hadoop was a rather batch-centric system. Internet of Things applications, however, usually require the platform to support stream processing from the start, as well as low-latency queries against semi-structured data items, at scale. Hadoop offers an append-only file system called HDFS to persist data. For stream data, message queues such as Apache Kafka are usually used to buffer the data and feed it into stream processing systems such as Apache Storm, or into the streaming part of generic engines such as Apache Spark.
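The "land raw, convert downstream" pattern can be sketched in plain Python. This is a minimal illustration under stated assumptions: the payloads, field names, and the two helper functions are hypothetical, and the column-oriented dictionary only stands in for a real columnar format such as Parquet, which would be written with a dedicated library.

```python
import json

# Raw payloads as they might arrive from different devices.
# The formats and field names are hypothetical.
raw_lines = [
    '{"device": "t-1", "temp_c": 21.5, "ts": 1}',
    '{"device": "p-7", "pressure_kpa": 101.3, "ts": 2}',
    'not-json,garbled,payload',
]


def land_raw(lines):
    """Landing step: keep every payload exactly as received."""
    return list(lines)


def to_columns(landed):
    """Downstream step: parse what can be parsed into a column layout.

    Unparseable payloads are returned separately instead of being lost,
    since the raw copy remains the source of truth.
    """
    parsed, rejected = [], []
    for line in landed:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            rejected.append(line)
            continue
        parsed.append(record)
    fields = sorted({key for record in parsed for key in record})
    # Missing fields become None so every column has the same length.
    columns = {f: [record.get(f) for record in parsed] for f in fields}
    return columns, rejected


landed = land_raw(raw_lines)
columns, rejected = to_columns(landed)
print(columns)
print(rejected)
```

Keeping the landing step free of parsing means a malformed or unexpected payload never blocks ingestion; schema decisions are deferred to the downstream conversion.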
11. Data Processing Requirements – contd.
Business continuity. Commercial Internet of Things applications usually come with SLAs in terms of uptime, latency, and disaster-recovery metrics such as Recovery Point Objective (RPO) and Recovery Time Objective (RTO). The platform should be able to guarantee those SLAs. This is especially critical for Internet of Things applications in domains such as health care.
Security and privacy. The platform must ensure secure end-to-end operation, which is still considered challenging today. This requirement includes integration with existing authentication and authorization systems, as well as with enterprise services such as Lightweight Directory Access Protocol (LDAP), Active Directory (AD), Kerberos, Security Assertion Markup Language (SAML), or Linux Pluggable Authentication Modules (PAM). Further, the privacy of human users must be guaranteed. Access Control Lists (ACLs) and data-provenance mechanisms must be available in the platform, along with data encryption on the wire and/or masking at rest.
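The two disaster-recovery metrics named above can be made concrete with a small check. RPO bounds how much recent data may be lost (time between the last good backup and the failure), and RTO bounds how long the service may be down (time between the failure and restoration). The function name, timestamps, and targets below are hypothetical.

```python
from datetime import datetime, timedelta


def check_recovery_objectives(last_backup, failure, restored, rpo, rto):
    """Compare one incident against RPO and RTO targets (hypothetical helper)."""
    data_loss_window = failure - last_backup  # data written after the backup is lost
    downtime = restored - failure             # time the service was unavailable
    return {
        "data_loss_window": data_loss_window,
        "downtime": downtime,
        "rpo_met": data_loss_window <= rpo,
        "rto_met": downtime <= rto,
    }


# Illustrative incident: backup at 12:00, failure at 12:20, restored at 13:30.
result = check_recovery_objectives(
    last_backup=datetime(2015, 1, 1, 12, 0),
    failure=datetime(2015, 1, 1, 12, 20),
    restored=datetime(2015, 1, 1, 13, 30),
    rpo=timedelta(minutes=15),
    rto=timedelta(hours=2),
)
print(result)
```

In this example the service was restored within the two-hour RTO, but 20 minutes of data were at risk against a 15-minute RPO, so the backup interval would need to be shortened.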
12. Hadoop Limitations
While Hadoop is perfectly capable of dealing with raw Internet of Things data, its support for a wide array of workloads is rather limited. Recent changes in the Hadoop ecosystem, with resource managers such as YARN and Mesos, allow isolation on the compute layer. On the storage layer, HDFS provides only a flat namespace, forcing real-world applications to use dedicated, separate clusters for different workloads. Due to its architectural constraints, HDFS cannot deal with many small files in a read/write manner. The same design constraints also cause issues for business continuity (availability and the ability to recover from disasters) and pose security challenges. The security challenges are being addressed, from on-disk encryption to incubating projects at Apache such as Sentry (role-based authorization) and Knox (HTTP gateway for authentication and access).
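The small-files limitation can be illustrated with a back-of-the-envelope estimate. The sketch assumes a widely quoted rule of thumb, not stated in these slides, that each file and each block costs roughly 150 bytes of NameNode memory, and it assumes a 128 MiB block size. The function name and data volumes are hypothetical.

```python
import math

# Assumption: commonly cited rule of thumb for NameNode memory per
# namespace object (file or block). Actual usage varies by version.
BYTES_PER_OBJECT = 150
BLOCK_MIB = 128  # assumed HDFS block size


def namenode_bytes(total_mib, file_mib):
    """Estimate NameNode memory for a data volume stored as equal-size files."""
    num_files = math.ceil(total_mib / file_mib)
    blocks_per_file = max(1, math.ceil(file_mib / BLOCK_MIB))
    objects = num_files * (1 + blocks_per_file)  # one file object plus its blocks
    return objects * BYTES_PER_OBJECT


TOTAL_MIB = 10_000_000  # about 10 TB of sensor data

small_files = namenode_bytes(TOTAL_MIB, file_mib=1)     # 10 million 1 MiB files
large_files = namenode_bytes(TOTAL_MIB, file_mib=1024)  # same data in 1 GiB files

print(f"1 MiB files: {small_files / 1e9:.1f} GB of NameNode memory")
print(f"1 GiB files: {large_files / 1e6:.1f} MB of NameNode memory")
```

Under these assumptions the same data volume needs a couple of hundred times more NameNode memory when stored as small files, which is why sensor data is usually compacted into larger files before or after landing in HDFS.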
13. Contact
For further training information, contact us.
Phone: 408-647-3010
Email: support@aziksa.com
URL: http://www.aziksa.com