Exploiting the Internet of Things


Exploiting the Internet of Things with investigative analytics
A White Paper by Bloor Research
Author: Philip Howard
Publish date: May 2013

“The Internet of Things has the potential to change the world, just as the Internet did. Maybe even more so.”
Kevin Ashton
Introduction
There is a wealth of information hidden in the Internet of Things that can help organisations understand what happened or might happen, and why it happened or may happen. It can also help to point towards what to do about it. However, before we consider how to analyse this information and why it is important to your business, we need to understand what we mean by the “Internet of Things” and by “investigative analysis”.

The Internet of Things
The Internet of Things was first described by Kevin Ashton in 1999. He wrote: “computers—and, therefore, the Internet—are almost wholly dependent on human beings for information. Nearly all of the data available on the Internet was first captured and created by human beings—by typing, pressing a record button, taking a digital picture or scanning a bar code. Conventional diagrams of the Internet ... leave out the most numerous and important routers of all—people. The problem is, people have limited time, attention and accuracy—all of which means they are not very good at capturing data about things in the real world. And that’s a big deal. We’re physical, and so is our environment ... You can’t eat bits, burn them to stay warm or put them in your gas tank. Ideas and information are important, but things matter much more…. If we had computers that knew everything there was to know about things—using data they gathered without any help from us—we would be able to track and count everything, and greatly reduce waste, loss and cost. We would know when things needed replacing, repairing or recalling, and whether they were fresh or past their best. The Internet of Things has the potential to change the world, just as the Internet did. Maybe even more so.”

Today there are multiple definitions of the Internet of Things, but this is as good a place to start as any. The point is that a) more and more things are being, or have been, instrumented (vehicles, smart meters, cell phones, planes, oil rigs, shop floor devices, clickstream data, anything with an active RFID tag, and so on), and b) we can now analyse the information coming from this instrumentation in a cost-effective manner. As a result, the Internet of Things is becoming a reality.

What does the Internet of Things mean to your organisation? Clearly it depends on your business. In principle, however, it allows you to perform what we might call investigative analytics: exploring the what, why and how of all this instrumented data.

Investigative analysis
The term investigative analysis was coined by Curt Monash in 2011 to describe a function that supports “research, investigation and analysis in support of future decisions”. He defines it as “seeking (previously unknown) patterns in data”. More specifically, he describes it as a “conflation of several disciplines, including statistics, data mining, machine learning and/or predictive analytics; together with the more research-oriented aspects of business intelligence tools, including ad hoc query, drilldown, most things done by BI-using ‘business analysts’, and most things within BI called ‘data exploration’; plus analogous technologies as applied to non-tabular data types such as text or graph.”

In other words, you are interested in discovering patterns of past activity that point to some likely outcome in the future. You also want to be able to do that across any type of data, whether or not it is transactional. To put this another way: something happened, so is this part of a pattern that indicates it might happen again? If so, what is that pattern, and how can we leverage it for business purposes in the future?
In this paper we explore some of the use cases for investigative analysis and how it can be applied to the Internet of Things. We then consider the sort of technology you need to enable this capability. We conclude with a discussion of the solution provided by Infobright, a data warehousing vendor that is addressing the market for investigative analytics.
© 2013 Bloor Research
Use cases
There are a great many potential environments where investigative analytics might be deployed. The following represent a sampling only, and new use cases are likely to emerge as the Internet of Things becomes more prevalent. Broadly speaking, however, investigative analysis will allow you to:

1. Discover why something went wrong and determine what to do to prevent it going wrong in the future. This applies to things like dropped calls in mobile networks, preventative maintenance across various industry sectors, smart meters (ditto), routing of both transportation and goods, and so on.
2. Discover why something went right, so that you can build processes that make it more likely things will go right in the future. One example is monitoring and analysing web traffic or mobile usage to encourage upsell or cross-sell opportunities. Sales of location-based services in mobile environments are a particular case in point.
3. Plan capacity to support requirements and service level agreements in the most cost-effective fashion. This applies particularly in smart metering, mobile services and transportation environments, amongst others, where forecasting and meeting future demand is essential.

Before we move on to discuss individual use cases, it is worth noting that a number of these scenarios require real-time query processing as well as more in-depth, batch-oriented analytics.

Smart meters
Smart metering is of increasing interest around the world. There are already significant implementations in the United States, but the rest of the world is some way behind in this respect. That will change. For example, in accordance with European Union market guidelines, 80% of all households in Germany should be equipped with smart meters by 2020.

Smart meters are installed in homes and businesses and feed a steady stream of data to the relevant application. There the data is analysed, and the results are used to allocate energy resources efficiently in real time, so that less energy is wasted. In addition, information collected from smart meters can be combined with weather forecasts and other data (such as major sporting events or TV programmes) to predict future energy requirements, so that appropriate resources will be available. To return to the German example, some 32 million households will need to be metered by 2020. That represents an enormous amount of event data that must be captured, analysed and acted upon.

Banking and financial services
ATMs (automated teller machines) are not dissimilar to smart meters: they provide you with a service (money, statements and so forth) and they update your account. Historically, they have not been used to collect data in order to forecast future demand. However, there is a clear shift away from cash and towards automated payments of various kinds, and banks are therefore looking to rationalise their ATM networks. On the other hand, they do not wish to alienate existing customers who still need access to cash. Understanding who uses cash machines, and how often, will be fundamental to any decisions. Because you can withdraw cash from your own account at a rival bank’s ATM, it will also make sense for banks to collaborate on where ATMs should be rationalised.

The flip side of the move away from cash is the increase in mobile payments. This raises two interesting areas for investigative analysis. The first is that the less people use cash and the more they use electronic payments, the easier it is to profile those individuals and understand their preferences. That, in turn, can enable better upsell and cross-sell opportunities.
The second area is security. The better you understand customer spending patterns, the better you can estimate the likelihood that a card or mobile phone has been stolen and is being used for fraudulent purposes.

Network analysis in telecommunications
Telcos need to monitor and analyse traffic for both performance and planning purposes. The key things to determine are ‘hot spots’ within the network (areas with particularly high usage) and failures within the network. A proper understanding of hot spots, and of how they develop over time, will be critical to future investment in new infrastructure to meet growing demand. Failures, by contrast, are an immediate problem that needs to be resolved as speedily as possible. They may lead to dropped connections, which are well-known potential indicators of customer churn, or to reduced service. The analysis of usage trends, combined with location-based and demographic data, will be important for planning future infrastructure investments. Investigative analysis may also be used in a number of other areas of telecommunications, such as the analysis of mobile payments and location-based services, as previously discussed.

Preventative maintenance
The average oil platform has 40,000 sensors. A flight across the Atlantic generates over 9TB of data about the status of the plane you are in. Trains and railway tracks abound with sensors. This is not limited to transportation. Equipment of all sorts, whether on construction sites or the shop floor, has built-in monitors and sensors designed to alert operators, pilots or drivers to any problems that may occur. Historically, however, this information has been discarded rather than analysed, principally because the tools to analyse it cost-effectively were not available. Modern technology is changing this. The information is now being used to identify patterns of failure (if this component fails, then that one is likely to do so within a certain period). It is also used to predict the failure of particular elements, so that preventative maintenance can avert potential problems.

Preventative maintenance applies not just to equipment and machinery of various sorts but also to people. For example, a number of professional sports bodies (in football, for instance) monitor their players’ activities on the field and in training, so that players can be rested at appropriate times to avoid injury.
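The failure-pattern idea above (“if this component fails, then that one is likely to do so within a certain period”) can be illustrated with a short sketch. This is not a method described in this paper. It is a minimal illustration that assumes a hypothetical log of (day, component) failure records and a hypothetical 30-day window. For each ordered pair of components, it estimates the fraction of failures of the first component that were followed by a failure of the second within the window.

```python
from collections import defaultdict
from itertools import permutations
from typing import Dict, Iterable, Tuple


def follow_on_rates(events: Iterable[Tuple[int, str]],
                    window_days: int) -> Dict[Tuple[str, str], float]:
    """For each ordered pair (a, b), estimate how often a failure of a
    is followed by a failure of b within window_days.

    events: (day, component) failure records (hypothetical format).
    """
    failures = defaultdict(list)  # component -> list of failure days
    for day, component in events:
        failures[component].append(day)

    rates = {}
    for a, b in permutations(failures, 2):
        # Count failures of a that have a failure of b in (t, t + window_days].
        followed = sum(
            1 for t in failures[a]
            if any(t < u <= t + window_days for u in failures[b])
        )
        rates[(a, b)] = followed / len(failures[a])
    return rates


# Hypothetical failure log: (day number, component)
log = [(1, "pump"), (10, "valve"), (40, "pump"),
       (55, "valve"), (90, "pump"), (200, "valve")]
rates = follow_on_rates(log, window_days=30)
print(rates[("pump", "valve")])  # 2 of the 3 pump failures were followed by a valve failure
```

A high follow-on rate only flags a candidate pattern worth investigating. It does not by itself establish that one failure causes the other.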
In another example, a company in Australia provides drivers of trucks and heavy (mining) equipment with glasses that monitor how often the driver blinks, blinking being a sign of tiredness. In the short term this prevents accidents, because the driver is alerted if tiredness is indicated. Later analysis of the data can also help to optimise shift patterns and rosters. Telemetry used by motor insurance providers to monitor driving safety is yet another example, though the emphasis there is more on calculating premiums. There are no doubt applications within healthcare as well.

Logistics
Wherever GPS signals are part of a business process, there is likely to be an application for investigative analytics. For example, one of the major oil companies with oil rigs in the Arctic combines GPS tracking information with weather data to predict the movement of ice floes that can affect drilling operations. More prosaically, road transportation, field service management and similar sectors depend heavily on traffic patterns and the location of relevant vehicles to optimise routing. Container tracking has similar requirements. In these cases there are both real-time requirements (recognising that re-routing would be appropriate, and doing so) and long-term analytic requirements that will enable better routing in future. Comparable requirements can also apply in retail and manufacturing environments, where goods or parts are identified by RFID tags instead of GPS signals. One leading aircraft manufacturer, for example, makes extensive use of this technology for part tracking and optimisation.

Security information and log data
Unlike the other use cases discussed, the management and analysis of security information (who might be attacking your company’s infrastructure, and how) is by no means a new market.
The standard approaches to this market are SIEM (security information and event management) and log management. SIEM is a superset of log management: in addition to storage and forensic analysis of log data, it provides (near) real-time identification of attack vectors. The storage of log data needs to be extremely efficient. Historically, companies stored only a few months’ worth of data for online analysis. However, low and slow attacks, or advanced persistent threats (APTs), are increasing, and they can spread over not just months but years. As a result, storage mechanisms must be very efficient while still supporting the in-depth analytics needed to identify patterns of activity. Such patterns may indicate fraud or, more often, enable the identification of threats.
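A small sketch shows why long retention matters for low and slow attacks. A source that never exceeds a per-day alert threshold is invisible to a simple daily rule, but it can still stand out when its activity is totalled over many months. The sketch assumes a hypothetical event format of (day, source) pairs, such as failed logins. All thresholds are hypothetical values chosen for illustration, not recommendations.

```python
from collections import Counter, defaultdict
from typing import Hashable, Iterable, List, Tuple


def low_and_slow_sources(events: Iterable[Tuple[int, Hashable]], daily_limit: int,
                         min_total: int, min_active_days: int) -> List[Hashable]:
    """Flag sources that never trip a per-day rule yet persist over the whole window.

    events: (day, source) pairs, e.g. failed logins (hypothetical format).
    daily_limit: per-day count at which a simple real-time rule would alert.
    min_total, min_active_days: long-window thresholds (hypothetical values).
    """
    per_source = defaultdict(Counter)  # source -> Counter mapping day -> event count
    for day, source in events:
        per_source[source][day] += 1

    flagged = []
    for source, per_day in per_source.items():
        never_alerted = max(per_day.values()) < daily_limit
        if (never_alerted and sum(per_day.values()) >= min_total
                and len(per_day) >= min_active_days):
            flagged.append(source)
    return sorted(flagged)


# Hypothetical log: one source probes twice a day for 180 days;
# another bursts 50 times on one day (a daily rule would already catch it).
events = [(d, "10.0.0.7") for d in range(180) for _ in range(2)]
events += [(5, "10.0.0.9")] * 50
print(low_and_slow_sources(events, daily_limit=10, min_total=200, min_active_days=90))
# ['10.0.0.7']
```

Suppose only the most recent 60 days (days 120 to 179) had been retained. The same source would then show 120 events on 60 days and would not be flagged. This is the retention argument made above.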
In summary
We do not need to belabour the point. Almost anywhere that machines or devices generate information, there is scope for investigative analytics. Machines go wrong, end up in the wrong place, or are doing something interesting right now that you would like to know about. You want to know not just so that you can take appropriate action at that moment. You also want to analyse and predict when it might happen again, so that you can prevent it and thereby provide a better service to your customers and/or users.
What is required
Because you are potentially going to be storing and analysing a lot of data, you will need a technology that lets you exploit this data for business insight: to reduce costs, identify new revenue streams and improve competitive positioning. However, the sorts of analytics we are discussing include a real-time component. A simple batch-based analytic environment (such as Hadoop) will therefore not be sufficient for the fast, interactive queries needed. Such an engine must meet a number of requirements, as follows:

1. It must be scalable enough to hold all the data you need for long-term analytics. This will obviously depend on the environment. For example, determining preventative maintenance characteristics across the 40,000 sensors on an oil rig will certainly require months’, and quite possibly years’, worth of data. That is on a different scale from network analysis in a telecommunications company, which has a limited number of masts and only needs to perform analyses across limited periods of time.
2. It must be fast enough to ingest the data within a reasonable timeframe, depending on the latency required. That is, you need to be able to load the data fast enough to support whatever real-time query processing or alerting is required.
3. We are potentially talking about very large quantities of data, notwithstanding the comments made under point 1. To store this economically, you need very efficient data compression so that storage requirements can be minimised.
4. Next, while not necessarily imperative, you will probably want a system that requires no manual tuning or DBA administration, such as the creation of indexes. If you have to index the data as it is loaded, this will significantly slow down the loading process and add to the size of the database, not to mention adding to administrative costs.
5. Finally, queries must be processed fast enough to meet service level requirements, especially bearing in mind that this may involve complex analytics. In addition, you will probably wish to run ad hoc queries against the data as well as standard reports and analytic processes. The database will therefore need to be fast and flexible enough to support this efficiently. Databases that use indexes or other constructs, such as projections, to achieve fast query performance will not usually perform well enough for unplanned (ad hoc) queries.
6. The sorts of applications we are discussing are often mission-critical, because they support the real-time operations of your organisation. Any solution must therefore be at least highly available (it caters for unplanned downtime without stopping) and, preferably, continuously available (it caters for planned downtime as well as unplanned stoppages).

Of course there are also more generic requirements, such as simple and quick implementation, low costs (both direct and indirect), minimal administration and so on.
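Requirements 1 and 3 interact: the retention period drives raw volume, and compression determines what that volume costs to store. A back-of-the-envelope calculation makes this concrete. Only the 40,000-sensor figure comes from this paper. The sampling rate (one reading per sensor per second), the 8-byte reading size and the 10:1 compression ratio are hypothetical assumptions chosen for illustration, not figures for any real installation.

```python
SENSORS = 40_000            # sensors on an average oil platform (figure from this paper)
READINGS_PER_SECOND = 1     # assumption: one reading per sensor per second
BYTES_PER_READING = 8       # assumption: one 8-byte value, ignoring timestamps and overhead
COMPRESSION_RATIO = 10      # assumption: 10:1 compression
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def raw_bytes_per_year(sensors: int, rate: int, size: int) -> int:
    """Uncompressed bytes generated in one 365-day year."""
    return sensors * rate * size * SECONDS_PER_YEAR


raw = raw_bytes_per_year(SENSORS, READINGS_PER_SECOND, BYTES_PER_READING)
compressed = raw / COMPRESSION_RATIO
# TB here means decimal terabytes (10**12 bytes).
print(f"raw: {raw / 1e12:.2f} TB/year, compressed: {compressed / 1e12:.2f} TB/year")
# raw: 10.09 TB/year, compressed: 1.01 TB/year
```

Keeping several years of history multiplies these figures accordingly, which is why long-term scalability and efficient compression go hand in hand.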
Infobright
Infobright is a provider of analytic database technology that comes in three flavours: enterprise-class appliance configurations, software-only installations and embedded OEM implementations. At its core, Infobright is a columnar database initially built on MySQL. Column-oriented databases suit analytics better than row-based databases. Unlike in transaction processing, analytic queries commonly need only a limited subset of the columns in each record. Because each column’s values are stored together, the database only needs to retrieve the columns relevant to the query, greatly reducing overall I/O. Being column-based also improves compression, which further reduces storage and improves performance.

However, Infobright goes beyond the conventional use of columns. Through its Knowledge Grid, it aims to provide even better performance, better compression and reduced administration. The Knowledge Grid is based on the concept of Data Packs: the data within each column is stored in groups of 64K items called Data Packs. Data Packs improve compression because the optimal compression algorithm is applied according to the data contents. According to Infobright, an average compression ratio of 10:1 is achieved after loading data into Infobright, though many users see compression of 40:1 and more. As each Data Pack is loaded, the software also creates metadata about its contents, and this metadata is stored in the Knowledge Grid. The Knowledge Grid holds information about the contents of each Data Pack and about the relationships between Data Packs, both of which are created and stored automatically. This includes a set of statistics and aggregate values for the data in each Data Pack, such as MIN, MAX, SUM, AVG, COUNT and the number of NULLs.
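The per-pack statistics just listed can be illustrated with a toy model. This sketch is not Infobright’s implementation or API. It simply splits one column into fixed-size packs of 65,536 values, reading “64K” as 2^16. For each pack it records MIN, MAX, SUM, COUNT and the number of NULLs. Here AVG is derived from SUM and COUNT rather than stored. The names `PackStats` and `build_knowledge_grid` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence

PACK_SIZE = 65_536  # "64K" items per Data Pack, read here as 2**16 (an assumption)


@dataclass
class PackStats:
    """Hypothetical summary of one Data Pack of one column."""
    min_value: Optional[float]  # None if the pack holds only NULLs
    max_value: Optional[float]
    total: float                # SUM of non-NULL values
    count: int                  # COUNT of non-NULL values
    nulls: int                  # number of NULLs

    @property
    def avg(self) -> Optional[float]:
        # AVG is derived from SUM and COUNT rather than stored separately here.
        return self.total / self.count if self.count else None


def build_knowledge_grid(column: Sequence[Optional[float]],
                         pack_size: int = PACK_SIZE) -> List[PackStats]:
    """Split a column into fixed-size packs and summarise each one."""
    grid = []
    for start in range(0, len(column), pack_size):
        pack = column[start:start + pack_size]
        values = [v for v in pack if v is not None]
        grid.append(PackStats(
            min_value=min(values) if values else None,
            max_value=max(values) if values else None,
            total=sum(values),
            count=len(values),
            nulls=len(pack) - len(values),
        ))
    return grid


# A tiny pack size keeps the output readable.
readings = [3.0, None, 7.0, 1.0, 9.0, None]
for stats in build_knowledge_grid(readings, pack_size=3):
    print(stats, "avg =", stats.avg)
```

Because the summaries are computed while each pack is built, they cost no extra pass over the data at load time. That fits the paper’s point that this metadata is created automatically as data is loaded.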
A further set of metadata is also stored, describing ranges of numeric value occurrences and character positions, as well as column relationships between Data Packs.

When a query comes in, Infobright uses the information in the Knowledge Grid to determine which Data Packs are relevant before decompressing any data. In many cases the summary information in the Knowledge Grid is enough to resolve the query, and nothing is decompressed at all. Working together, the Data Packs, the Knowledge Grid and Infobright’s iterative computing engine (the Granular Computing Engine) should ensure fast, consistent query performance even when data volumes increase dramatically. The Knowledge Grid is updated automatically whenever the database is updated.

Thanks to the Knowledge Grid, Infobright does not require you to partition or index the data. This reduces administration. It also avoids data skew, which is a performance problem for vendors using horizontal (row-based) partitioning and which forces re-balancing of the database. In addition, because there is no need to partition the data, Infobright supports ad hoc queries, which are a foundational requirement for investigative analytics. Partitioning (or sharding) your data limits the ways in which you can access it. If your query matches the way you have partitioned the data, it will perform well; if it does not, it will not. In other words, partitioning works best when you know in advance what queries you are going to ask, which is the antithesis of ad hoc and self-service query processes. Because it does not need to partition data, Infobright offers a consistent level of query performance regardless of the nature of the query (assuming equal complexity).

As far as loading is concerned, Infobright can load at rates of up to TBs per hour using a multi-machine loader configuration.
Many customers use Infobright in highly dynamic production environments, where new data needs to be loaded and accessed within minutes for near-real-time analytics.
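The step in which the Knowledge Grid decides which Data Packs are relevant before anything is decompressed can be sketched as follows. This is a simplified illustration, not Infobright’s actual algorithm. It assumes that each pack carries only hypothetical MIN, MAX and non-NULL COUNT summaries, and that the query is a range filter `low <= x <= high` inside a COUNT. Each pack is classified in one of three ways:

- irrelevant: skipped entirely;
- fully relevant: answered from its summary alone;
- suspect: the only kind that has to be decompressed.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class PackSummary:
    """Hypothetical per-pack metadata: the kind of summary a Knowledge Grid holds."""
    min_value: Optional[float]  # None when the pack holds only NULLs
    max_value: Optional[float]
    count: int                  # non-NULL values in the pack


def count_in_range(packs: List[PackSummary], low: float, high: float,
                   load_pack: Callable[[int], List[Optional[float]]]) -> Tuple[int, int]:
    """Evaluate COUNT(x) WHERE low <= x <= high, touching raw data only when needed.

    load_pack(i) is a hypothetical stand-in for decompressing pack i.
    Returns (matching_count, packs_decompressed).
    """
    matches, decompressed = 0, 0
    for i, p in enumerate(packs):
        if p.count == 0 or p.max_value < low or p.min_value > high:
            continue                     # irrelevant: no value can match, skip it
        if low <= p.min_value and p.max_value <= high:
            matches += p.count           # fully relevant: metadata alone answers it
            continue
        decompressed += 1                # suspect: must look at the actual values
        matches += sum(1 for v in load_pack(i)
                       if v is not None and low <= v <= high)
    return matches, decompressed


# Three packs of raw values, summarised as they would be at load time.
raw_packs = [[1.0, 2.0, 3.0], [10.0, 11.0, None], [4.0, 8.0, 12.0]]
summaries = []
for pack in raw_packs:
    vals = [v for v in pack if v is not None]
    summaries.append(PackSummary(min(vals), max(vals), len(vals)))

print(count_in_range(summaries, 5.0, 20.0, lambda i: raw_packs[i]))  # (4, 1)
```

Only the third pack is decompressed. The first is ruled out by its MAX, and the second is answered from its COUNT. The more packs fall into the first two classes, the less raw data has to be read. That is the mechanism the paper credits for making indexes and partitions unnecessary.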
Conclusion
As Kevin Ashton wrote, back in the last century, “The Internet of Things has the potential to change the world, just as the Internet did. Maybe even more so.” It has taken more than a decade, but the Internet of Things is here. It is not yet as widely implemented as it will be. It will also take a while before its full impact is felt, both at a business level and in our daily lives, especially as it is exploited through investigative analytics. But make no mistake: it is here and it is growing. From a business perspective this has very significant repercussions. With the addition of investigative analytics, the Internet of Things will enable substantial steps forward in customer service in the present, and in business planning for the future. Like all major technology changes, this combination of capabilities offers both opportunities and threats, and there will be winners and losers. The winners will be those that grasp these new technologies and use them to enhance and expand their business.

Further Information
Further information about this subject is available from http://www.BloorResearch.com/update/2170
Bloor Research overview
Bloor Research is one of Europe’s leading IT research, analysis and consultancy organisations. We explain how to bring greater Agility to corporate IT systems through the effective governance, management and leverage of Information. We have built a reputation for ‘telling the right story’ with independent, intelligent, well-articulated communications content and publications on all aspects of the ICT industry. We believe the objective of telling the right story is to:
• Describe the technology in the context of its business value and the other systems and processes it interacts with.
• Understand how new and innovative technologies fit in with existing ICT investments.
• Look at the whole market and explain all the solutions available and how they can be more effectively evaluated.
• Filter “noise” and make it easier to find the additional information or news that supports both investment and implementation.
• Ensure all our content is available through the most appropriate channel.

Founded in 1989, we have spent over two decades distributing research and analysis to IT user and vendor organisations throughout the world via online subscriptions, tailored research services, events and consultancy projects. We are committed to turning our knowledge into business value for you.

About the author
Philip Howard
Research Director - Data Management
Philip started in the computer industry way back in 1973. He has variously worked as a systems analyst, programmer and salesperson, as well as in marketing and product management, for a variety of companies including GEC Marconi, GPT, Philips Data Systems, Raytheon and NCR. After a quarter of a century of not being his own boss, Philip set up his own company in 1992. His first client was Bloor Research (then ButlerBloor), where Philip worked as an associate analyst.
His relationship with Bloor Research has continued since that time, and he is now Research Director focused on Data Management. Data management refers to the management, movement, governance and storage of data. It involves diverse technologies that include (but are not limited to) databases and data warehousing, data integration (including ETL, data migration and data federation), data quality, master data management, metadata management, and log and event management. Philip also tracks spreadsheet management and complex event processing.

In addition to the numerous reports he has written on behalf of Bloor Research, Philip contributes regularly to IT-Director.com and IT-Analysis.com. He was previously editor of both “Application Development News” and “Operating System News” on behalf of Cambridge Market Intelligence (CMI). He has also contributed to various magazines and written a number of reports published by companies such as CMI and The Financial Times. Philip speaks regularly at conferences and other events throughout Europe and North America.

Away from work, Philip’s primary leisure activities are canal boats, skiing, playing Bridge (at which he is a Life Master), dining out and walking Benji the dog.
Copyright & disclaimer
This document is copyright © 2013 Bloor Research. No part of this publication may be reproduced by any method whatsoever without the prior consent of Bloor Research. Due to the nature of this material, numerous hardware and software products have been mentioned by name. In the majority, if not all, of the cases, these product names are claimed as trademarks by the companies that manufacture the products. It is not Bloor Research’s intent to claim these names or trademarks as our own. Likewise, company logos, graphics or screen shots have been reproduced with the consent of the owner and are subject to that owner’s copyright. Whilst every care has been taken in the preparation of this document to ensure that the information is correct, the publishers cannot accept responsibility for any errors or omissions.
2nd Floor, 145–157 St John Street
LONDON, EC1V 4PY, United Kingdom
Tel: +44 (0)207 043 9750
Fax: +44 (0)207 043 9748
Web: www.BloorResearch.com
email: info@BloorResearch.com