Slides for the talk on the paper introducing Emrooz, a scalable database for sensor observations whose semantics follow the Semantic Sensor Network (SSN) ontology.
SSN-TC workshop talk at ISWC 2015 on Emrooz
1. First Joint International Workshop on
Semantic Sensor Networks and Terra Cognita
October 11, 2015, Bethlehem, PA, USA
Emrooz: A Scalable Database for SSN Observations
Markus Stocker, Narasinha Shurpali, Kerry Taylor, George Burba, Mauno Rönkkö, Mikko Kolehmainen
markus.stocker@uef.fi
@markusstocker and @envinf
2.
Introduction
Expressive ontologies for sensor (meta)data (SSN)
Flexible graph data model (RDF)
Triple stores are the obvious choice
Unfortunately, they are hardly viable at scale
Triple store indexes target graph pattern queries
They are not designed for time series interval queries
3.
Aim
Build a database that ...
Consumes SSN observations in RDF
Evaluates SSN observation SPARQL queries
Scales to billions of observations
Has better query performance than triple stores
5.
Cassandra data model
Schema consisting of
Partition key (row key) of type ascii
Clustering key (column name) of type timeuuid
Column value of type blob
The partition key consists of two (dash-concatenated) parts
SHA-256 hex string digest of sensor-property-feature URIs
Date time string of pattern yyyyMMddHHmm
Computed from observation result time
Floor-rounded to year, month, day, hour, or minute
Rounding depends on sensor sampling frequency
Goal is to limit the number of columns per row
Clustering key determined by observation result time
Column value is set of triples for observation (binary)
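A minimal sketch of this key construction in Python. The slide fixes the digest (SHA-256 of the sensor-property-feature URIs), the date pattern (yyyyMMddHHmm), and the floor-rounding; the example URIs, the helper names, and the assumption that the three URIs are dash-concatenated before hashing are mine:

```python
import hashlib
from datetime import datetime

PATTERN = "%Y%m%d%H%M"  # yyyyMMddHHmm, as on the slide

def floor_time(t: datetime, granularity: str) -> datetime:
    """Floor-round a result time: fields finer than `granularity` are reset."""
    floored = {
        "year":   t.replace(month=1, day=1, hour=0, minute=0),
        "month":  t.replace(day=1, hour=0, minute=0),
        "day":    t.replace(hour=0, minute=0),
        "hour":   t.replace(minute=0),
        "minute": t,
    }[granularity]
    return floored.replace(second=0, microsecond=0)

def partition_key(sensor_uri: str, property_uri: str, feature_uri: str,
                  result_time: datetime, granularity: str = "day") -> str:
    """SHA-256 hex digest of the sensor-property-feature URIs, a dash,
    and the floor-rounded date time string."""
    digest = hashlib.sha256(
        "-".join([sensor_uri, property_uri, feature_uri]).encode("utf-8")
    ).hexdigest()
    return digest + "-" + floor_time(result_time, granularity).strftime(PATTERN)

# Hour-rounded key for a (hypothetical) CO2 mole fraction observation:
key = partition_key(
    "http://example.org/sensor/LI-7500A",
    "http://example.org/property/MoleFraction",
    "http://example.org/feature/CO2",
    datetime(2015, 1, 7, 10, 35),
    granularity="hour",
)
# key ends with "-201501071000"
```

Coarser rounding packs more observations (columns) into one row; the slide's point is that the granularity should track the sensor's sampling frequency so row sizes stay bounded.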
6.
Experiments
LI-7500A Open Path CO2/H2O Gas Analyzer
LI-7700 Open Path CH4 Analyzer
Property of mole fraction
Three features for the monitored gases
7.
8.
Experiments
January 7 to May 26, 2015: 6045 GHG archive files
Estimated # of sensor observations: 326 430 000
Estimated # of triples: 4.9 billion (15 triples / observation)
Load and query performance measured on 10 subsets
SPARQL query with a 10 min interval
Compared to Stardog and Blazegraph
Query performance tested with varying time intervals
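On the read side, an interval query over this layout only needs to touch the rows whose floor-rounded date part overlaps the interval. A sketch of that mapping (the helper name and the half-open interval convention are my assumptions; minute, hour, and day granularities only):

```python
from datetime import datetime, timedelta

PATTERN = "%Y%m%d%H%M"  # yyyyMMddHHmm

def covering_row_dates(start: datetime, end: datetime,
                       granularity: str = "hour") -> list[str]:
    """Date parts of the partition keys whose rows can hold observations
    with result times in the half-open interval [start, end)."""
    step = {"minute": timedelta(minutes=1),
            "hour": timedelta(hours=1),
            "day": timedelta(days=1)}[granularity]
    floor = {"minute": start.replace(second=0, microsecond=0),
             "hour": start.replace(minute=0, second=0, microsecond=0),
             "day": start.replace(hour=0, minute=0, second=0, microsecond=0)}
    t, dates = floor[granularity], []
    while t < end:
        dates.append(t.strftime(PATTERN))
        t += step
    return dates

# A 10 min interval over hour-rounded rows touches a single row ...
one = covering_row_dates(datetime(2015, 1, 7, 10, 30), datetime(2015, 1, 7, 10, 40))
# ... but ten rows when rows are minute-rounded:
ten = covering_row_dates(datetime(2015, 1, 7, 10, 30), datetime(2015, 1, 7, 10, 40), "minute")
```

With hour-rounded rows the experiment's 10 min interval query resolves to very few partitions, and the timeuuid clustering key then narrows the column range within each row.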
18.
Related and future work
Other authors have pointed out the problem
“Semantification of measurement data not promising”
RDF databases on NoSQL systems (e.g. CumulusRDF)
Support for QB observations (done)
REST API (preliminary)
Integration with R/Matlab (preliminary)
Performance comparison with other systems
19.
Conclusion
SSN and RDF are a good fit for sensor (meta)data
Triple stores are inadequate for observation data
Alternative approaches are required
What are the advantages and disadvantages?
Reasoning on all data by some sensor?
Query for observation values exceeding threshold?