Building a Front End for a Sensor Data Cloud


The talk was delivered by Ian Rolewicz at the International Workshop on Cloud for High Performance Computing 2011 (C4HPC'11), co-located with the 2011 International Conference on Computational Science and Its Applications (ICCSA 2011).


This document introduces the TimeCloud Front End, a web-based interface for the TimeCloud platform, which manages large-scale time series in the cloud. While the Back End is built upon scalable, fault-tolerant distributed systems such as Hadoop and HBase and takes novel approaches to facilitating data analysis over massive time series, the Front End was built as a simple and intuitive interface for viewing the data present in the cloud, both as a simple tabular display and through various visualizations. In addition, the Front End implements model-based views and on-demand data fetching to reduce the amount of work performed at the Back End.
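The idea behind a model-based view can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not TimeCloud's actual code: it assumes each model is a linear segment (the slides mention linear approximation), and the `LinearSegment` record, its field names, and the sample parameter values are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class LinearSegment:
    """Hypothetical model record: value(t) = slope * (t - t_start) + intercept."""
    t_start: int      # first timestamp covered by the segment
    t_end: int        # last timestamp covered by the segment (inclusive)
    slope: float
    intercept: float


def reconstruct(segments: List[LinearSegment], step: int = 1) -> List[Tuple[int, float]]:
    """Rebuild (timestamp, value) points from model parameters alone."""
    points = []
    for seg in segments:
        for t in range(seg.t_start, seg.t_end + 1, step):
            points.append((t, seg.slope * (t - seg.t_start) + seg.intercept))
    return points


# Made-up parameters for illustration: two segments stand in for ten raw readings.
segments = [
    LinearSegment(t_start=0, t_end=4, slope=0.5, intercept=20.0),
    LinearSegment(t_start=5, t_end=9, slope=-0.25, intercept=22.0),
]
values = reconstruct(segments)

for t, v in values:
    print(t, v)
```

In this sketch only the four parameters per segment would need to travel from the Back End; the individual values are computed where they are displayed.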

Published in: Technology, Economy & Finance
  1. Building a Front End Interface for a Sensor Data Cloud. Ian Rolewicz, Semester Project, Fall 2010. Supervised by Hoyoung Jeung, Michele Catasta & Zoltán Miklós
  2. Introducing TimeCloud
       • Platform for massive time-series management and analysis
       • Currently developed at the LSIR
  3. TimeCloud System Overview
  4. My job
  5. The Front End
       • Web-based interface
       • Main Goals:
           – Display the Data
           – Be user-friendly (preferably)
           – Reduce the work performed at the Back End
       • Implemented in Python using the Django Framework and the YUI 2 library
       • Visualizations implemented with Protovis
  6. TimeCloud Front End Live Demo
  7. Full Precision vs. Model-Based
       • Full Precision
           – Real Data
           – Whole Data taken from the Back End
           – Only display at the Front End
       • Model-Based Approximations
           – Reconstructed Data from Parameters
           – Less Data retrieved from the Back End
           – Reconstruction and display of the values at the Front End
  8. The Data Model
       • NULLs not stored in HBase → better for sparse data
       • Column families stored in separate files
  9. Performance Measures
       • Testbed on a cluster of 13 Amazon EC2 servers, each having:
           – 15 GB Memory
           – 8 EC2 Computing Units
           – 1.7 TB Storage
           – 64-bit platform
       • One of them: HBase Master + Front End
       • 12 others: HBase Region Servers
  10. Data Used for Measures
       • "Worst case" for TimeCloud
       • Compresses no more than 1/5 of the original data when linearly approximated
       • Linear regression → in GSN, usually 99% compression
  11. Random Reads
       • 1000 random reads in the approximated dataset
       • Evenly spread
       • 22% improvement in query execution time
       • Less data retrieved → more cache hits
  12. Scan
  13. Network usage

        Graph #   KB transferred (original)   KB transferred (approximated)
        1         112.3                       23.3
        2         124.5                       28.0
        3         126.6                       25.9
        4         120.2                       25.1
        5         119.9                       26.8
        6         124.4                       27.7

  14. Conclusion
       • Goals achieved:
           – Display the Data
           – Keep it simple
           – Reduce the work performed at the Back End
       • Good Basis for future extensions
       • Future Work
           – User/Group-based management and access
           – Completion of the model-based views
           – Design of additional visualizations
  15. Questions?
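
The network usage figures on slide 13 can be turned into per-graph savings with a short Python calculation. The numbers below are copied from that slide; the variable and function names are illustrative only and not part of TimeCloud.

```python
# KB transferred per graph, copied from slide 13: (original, approximated)
usage_kb = {
    1: (112.3, 23.3),
    2: (124.5, 28.0),
    3: (126.6, 25.9),
    4: (120.2, 25.1),
    5: (119.9, 26.8),
    6: (124.4, 27.7),
}


def reduction(original: float, approximated: float) -> float:
    """Fraction of transferred data saved by the approximated view."""
    return 1.0 - approximated / original


# Savings for each graph, then for all six graphs taken together.
reductions = {graph: reduction(o, a) for graph, (o, a) in usage_kb.items()}
total_original = sum(o for o, _ in usage_kb.values())
total_approximated = sum(a for _, a in usage_kb.values())
overall = reduction(total_original, total_approximated)

for graph in sorted(reductions):
    print(f"graph {graph}: {reductions[graph]:.1%} less data transferred")
print(f"overall: {overall:.1%} less data transferred")
```

On the slide's numbers, every graph transfers between 77% and 80% less data in the approximated case, which is in line with the roughly 1/5 compression stated for the test dataset.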