GeoWave: Scaling Complex (Not Just Geo) Data


DataWorks Summit

This presentation discusses the successes of GeoWave in the spatiotemporal domain and focuses on how those successes generalize to a diverse set of complex data structures. The intent is to draw parallels to the data challenges of the audience. Fast indexed access to massive datasets fundamentally involves highly optimized range scans within key-value stores. If your reaction is "that's easier said than done," then you've had the prerequisite experiences to attend this talk. The intent of the software is to make these use cases as seamless as possible for downstream consumers of the framework. Briefly, a GeoWave "dimension" is simply a function that applies a sort order to real-world values. The constructs for defining these "dimensions" and many more details will be discussed in this presentation.

At the core of GeoWave is a capability to store, retrieve, and analyze multi-dimensional data structures within distributed key-value stores. Spatiotemporal data serves as a special case for which GeoWave provides tailored extensions. The software is intended to be easily pluggable into any sorted key-value store, with current implementations available for Apache HBase, Apache Accumulo, Apache Cassandra, Apache Kudu, Redis, RocksDB, Google BigTable, and Amazon DynamoDB. Datastore support is provided as an extension that is discoverable at runtime, so access through any GeoWave programmatic API, command line, or service is not tied to a particular key-value store. Furthermore, there are optimized data transfer utilities across supported stores. This approach has proven to provide seamless transitions of scale, from embedded applications and external in-memory services all the way up to its primary applications within highly distributed ecosystems.
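To make the "range scans within key-value stores" idea concrete, here is a minimal in-memory sketch of a sorted key-value store with an inclusive range scan. The class name `SortedKV` and its methods are hypothetical and only illustrate the access pattern; this is not GeoWave's API or any particular store's client.

```python
import bisect


class SortedKV:
    """Toy sorted key-value store (hypothetical, for illustration only)."""

    def __init__(self):
        self._keys = []     # keys kept in sorted order
        self._values = {}   # key -> value

    def put(self, key, value):
        # Insert the key in sorted position the first time it is seen.
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def scan(self, start, end):
        """Return all (key, value) pairs with start <= key <= end, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_right(self._keys, end)
        return [(k, self._values[k]) for k in self._keys[lo:hi]]
```

Because keys are sorted, a scan only has to locate the two ends of the range; everything between them is contiguous. Indexing multi-dimensional data then reduces to choosing keys so that nearby values end up in nearby key ranges.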

Barry Bragg
Maxar Technologies
Rich Fecher
Maxar Technologies

An open source framework that
leverages the scalability of key-value
stores for effective storage, retrieval,
and analysis of massive geospatial
datasets

At its core, GeoWave handles spatial and spatiotemporal indexing within distributed key-value stores, with natural integrations for various popular frameworks.

GeoWave bridges the gap between popular geospatial platforms and distributed processing frameworks.


Use a Space Filling Curve (SFC) to impose a sort order on multi-dimensional data.
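As a rough illustration of how a space-filling curve turns two dimensions into one sortable key, the sketch below builds a Z-order (Morton) key by interleaving bits. The function names, the 16-bit default resolution, and the longitude/latitude bounds are assumptions chosen for the example; GeoWave's own index implementation is not shown in the source.

```python
def normalize(value, lo, hi, bits):
    """Map a real-world value in [lo, hi] to an integer cell in [0, 2**bits - 1]."""
    cells = 1 << bits
    scaled = int((value - lo) / (hi - lo) * cells)
    # Clamp so that value == hi (or out-of-range input) stays inside the grid.
    return min(max(scaled, 0), cells - 1)


def interleave(x, y, bits):
    """Interleave the bits of x (even positions) and y (odd positions)."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)
        key |= ((y >> i) & 1) << (2 * i + 1)
    return key


def z_order_key(lon, lat, bits=16):
    """Z-order key for a longitude/latitude pair (bounds are assumed here)."""
    x = normalize(lon, -180.0, 180.0, bits)
    y = normalize(lat, -90.0, 90.0, bits)
    return interleave(x, y, bits)
```

Each dimension here is just a function from a real-world value to an integer cell, and the curve combines those cells into a single key that a sorted key-value store can order.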

        Z-Order   Hilbert   H-order   Peano   AR2W2   BΩ
WL∞     ∞         6         4         8       5.40    5.00
WL2     ∞         6         4         8       6.04    5.00
WL1     ∞         9         8         10.66   12.00   9.00
WBA     ∞         2.40      3.00      2.00    3.05    2.22
ABA     2.86      1.41      1.69      1.42    1.47    1.40

WL∞, WL2, WL1: Worst Case Dilation
WBA: Worst Case Bounding Box Area Ratio
ABA: Average Total Bounding Box Area

Source: Haverkort, Walderveen, "Locality and Bounding-Box Quality of Two-Dimensional Space-Filling Curves," 2008, arXiv:0806.4787v2

● What about data with extents such as lines/polys
or time ranges?
○ We need to represent multiple resolutions...
● What about unbounded dimensions?
○ We can define a periodicity to bound a single
SFC. We end up with an SFC per period (or
combination of periods).
● What about queries?
○ Bounding hyperrectangles are discontinuous
on the space filling curve
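The last two points can be sketched in a few lines. The example below assumes a Z-order curve with x in the even bit positions and a fixed-length period; both are illustrative choices, and the function names `bin_value` and `decompose` are hypothetical rather than taken from GeoWave.

```python
def bin_value(value, period):
    """Split an unbounded value into (bin id, offset within the bin).

    Each bin id would get its own bounded SFC over the offset.
    """
    return divmod(value, period)


def decompose(qx0, qy0, qx1, qy1, bits):
    """Cover the cell box [qx0..qx1] x [qy0..qy1] on a 2**bits grid with
    sorted, merged, inclusive (start, end) ranges of Z-order keys."""
    ranges = []

    def visit(x0, y0, level, zmin):
        size = 1 << level                      # quadrant side length in cells
        x1, y1 = x0 + size - 1, y0 + size - 1
        if x1 < qx0 or x0 > qx1 or y1 < qy0 or y0 > qy1:
            return                             # quadrant misses the query
        if qx0 <= x0 and x1 <= qx1 and qy0 <= y0 and y1 <= qy1:
            # Fully covered: the whole quadrant is one contiguous key range.
            ranges.append((zmin, zmin + size * size - 1))
            return
        half = size >> 1
        quarter = half * half
        # Children visited in key order: x bit is lower than y bit.
        visit(x0, y0, level - 1, zmin)
        visit(x0 + half, y0, level - 1, zmin + quarter)
        visit(x0, y0 + half, level - 1, zmin + 2 * quarter)
        visit(x0 + half, y0 + half, level - 1, zmin + 3 * quarter)

    visit(0, 0, bits, 0)

    # Ranges arrive in ascending order; merge the ones that touch.
    merged = []
    for start, end in ranges:
        if merged and merged[-1][1] + 1 == start:
            merged[-1] = (merged[-1][0], end)
        else:
            merged.append((start, end))
    return merged
```

On a 4x4 grid, the 2x2 box in the middle, `decompose(1, 1, 2, 2, 2)`, yields four separate single-key ranges, `[(3, 3), (6, 6), (9, 9), (12, 12)]`, while the aligned box `decompose(0, 0, 1, 1, 2)` yields the single range `[(0, 3)]`. That difference is the discontinuity the slide refers to: one bounding box can turn into several range scans.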

From Massive Scale in the Cloud to
GeoWave Embedded in the Client
With a single interface, you can use both!
An example analysis tool requiring GeoWave multi-dimensional indexing for map, timeline,
and graph search and visualization of massive datasets



richard.fecher@radiantsolutions.com
barry.bragg@radiantsolutions.com

More Related Content

More from DataWorks Summit

Recently, Apache Phoenix has been integrated with Apache (incubator) Omid transaction processing service, to provide ultra-high system throughput with ultra-low latency overhead. Phoenix has been shown to scale beyond 0.5M transactions per second with sub-5ms latency for short transactions on industry-standard hardware. On the other hand, Omid has been extended to support secondary indexes, multi-snapshot SQL queries, and massive-write transactions. These innovative features make Phoenix an excellent choice for translytics applications, which allow converged transaction processing and analytics. We share the story of building the next-gen data tier for advertising platforms at Verizon Media that exploits Phoenix and Omid to support multi-feed real-time ingestion and AI pipelines in one place, and discuss the lessons learned.

Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix

Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix

Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix

DataWorks Summit

Cybersecurity requires an organization to collect data, analyze it, and alert on cyber anomalies in near real-time. This is a challenging endeavor when considering the variety of data sources which need to be collected and analyzed. Everything from application logs, network events, authentications systems, IOT devices, business events, cloud service logs, and more need to be taken into consideration. In addition, multiple data formats need to be transformed and conformed to be understood by both humans and ML/AI algorithms. To solve this problem, the Aetna Global Security team developed the Unified Data Platform based on Apache NiFi, which allows them to remain agile and adapt to new security threats and the onboarding of new technologies in the Aetna environment. The platform currently has over 60 different data flows with 95% doing real-time ETL and handles over 20 billion events per day. In this session learn from Aetna’s experience building an edge to AI high-speed data pipeline with Apache NiFi.

Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi

Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi

Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi

DataWorks Summit

Supporting Apache HBase : Troubleshooting and Supportability Improvements

Supporting Apache HBase : Troubleshooting and Supportability Improvements

Supporting Apache HBase : Troubleshooting and Supportability Improvements

DataWorks Summit

In the healthcare sector, data security, governance, and quality are crucial for maintaining patient privacy and ensuring the highest standards of care. At Florida Blue, the leading health insurer of Florida serving over five million members, there is a multifaceted network of care providers, business users, sales agents, and other divisions relying on the same datasets to derive critical information for multiple applications across the enterprise. However, maintaining consistent data governance and security for protected health information and other extended data attributes has always been a complex challenge that did not easily accommodate the wide range of needs for Florida Blue’s many business units. Using Apache Ranger, we developed a federated Identity & Access Management (IAM) approach that allows each tenant to have their own IAM mechanism. All user groups and roles are propagated across the federation in order to determine users’ data entitlement and access authorization; this applies to all stages of the system, from the broadest tenant levels down to specific data rows and columns. We also enabled audit attributes to ensure data quality by documenting data sources, reasons for data collection, date and time of data collection, and more. In this discussion, we will outline our implementation approach, review the results, and highlight our “lessons learned.”

Security Framework for Multitenant Architecture

Security Framework for Multitenant Architecture

Security Framework for Multitenant Architecture

DataWorks Summit

Presto, an open source distributed SQL engine, is widely recognized for its low-latency queries, high concurrency, and native ability to query multiple data sources. Proven at scale in a variety of use cases at Airbnb, Bloomberg, Comcast, Facebook, FINRA, LinkedIn, Lyft, Netflix, Twitter, and Uber, in the last few years Presto experienced an unprecedented growth in popularity in both on-premises and cloud deployments over Object Stores, HDFS, NoSQL and RDBMS data stores. With the ever-growing list of connectors to new data sources such as Azure Blob Storage, Elasticsearch, Netflix Iceberg, Apache Kudu, and Apache Pulsar, recently introduced Cost-Based Optimizer in Presto must account for heterogeneous inputs with differing and often incomplete data statistics. This talk will explore this topic in detail as well as discuss best use cases for Presto across several industries. In addition, we will present recent Presto advancements such as Geospatial analytics at scale and the project roadmap going forward.

Presto: Optimizing Performance of SQL-on-Anything Engine

Presto: Optimizing Performance of SQL-on-Anything Engine

Presto: Optimizing Performance of SQL-on-Anything Engine

DataWorks Summit

Specialized tools for machine learning development and model governance are becoming essential. MlFlow is an open source platform for managing the machine learning lifecycle. Just by adding a few lines of code in the function or script that trains their model, data scientists can log parameters, metrics, artifacts (plots, miscellaneous files, etc.) and a deployable packaging of the ML model. Every time that function or script is run, the results will be logged automatically as a byproduct of those lines of code being added, even if the party doing the training run makes no special effort to record the results. MLflow application programming interfaces (APIs) are available for the Python, R and Java programming languages, and MLflow sports a language-agnostic REST API as well. Over a relatively short time period, MLflow has garnered more than 3,300 stars on GitHub , almost 500,000 monthly downloads and 80 contributors from more than 40 companies. Most significantly, more than 200 companies are now using MLflow. We will demo MlFlow Tracking , Project and Model components with Azure Machine Learning (AML) Services and show you how easy it is to get started with MlFlow on-prem or in the cloud.

Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...

Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...

Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...

DataWorks Summit

Twitter's Data Platform is built using multiple complex open source and in house projects to support Data Analytics on hundreds of petabytes of data. Our platform support storage, compute, data ingestion, discovery and management and various tools and libraries to help users for both batch and realtime analytics. Our DataPlatform operates on multiple clusters across different data centers to help thousands of users discover valuable insights. As we were scaling our Data Platform to multiple clusters, we also evaluated various cloud vendors to support use cases outside of our data centers. In this talk we share our architecture and how we extend our data platform to use cloud as another datacenter. We walk through our evaluation process, challenges we faced supporting data analytics at Twitter scale on cloud and present our current solution. Extending Twitter's Data platform to cloud was complex task which we deep dive in this presentation.

Extending Twitter's Data Platform to Google Cloud

Extending Twitter's Data Platform to Google Cloud

Extending Twitter's Data Platform to Google Cloud

DataWorks Summit

At Comcast, our team has been architecting a customer experience platform which is able to react to near-real-time events and interactions and deliver appropriate and timely communications to customers. By combining the low latency capabilities of Apache Flink and the dataflow capabilities of Apache NiFi we are able to process events at high volume to trigger, enrich, filter, and act/communicate to enhance customer experiences. Apache Flink and Apache NiFi complement each other with their strengths in event streaming and correlation, state management, command-and-control, parallelism, development methodology, and interoperability with surrounding technologies. We will trace our journey from starting with Apache NiFi over three years ago and our more recent introduction of Apache Flink into our platform stack to handle more complex scenarios. In this presentation we will compare and contrast which business and technical use cases are best suited to which platform and explore different ways to integrate the two platforms into a single solution.

Event-Driven Messaging and Actions using Apache Flink and Apache NiFi

Event-Driven Messaging and Actions using Apache Flink and Apache NiFi

Event-Driven Messaging and Actions using Apache Flink and Apache NiFi

DataWorks Summit

Companies are increasingly moving to the cloud to store and process data. One of the challenges companies have is in securing data across hybrid environments with easy way to centrally manage policies. In this session, we will talk through how companies can use Apache Ranger to protect access to data both in on-premise as well as in cloud environments. We will go into details into the challenges of hybrid environment and how Ranger can solve it. We will also talk through how companies can further enhance the security by leveraging Ranger to anonymize or tokenize data while moving into the cloud and de-anonymize dynamically using Apache Hive, Apache Spark or when accessing data from cloud storage systems. We will also deep dive into the Ranger’s integration with AWS S3, AWS Redshift and other cloud native systems. We will wrap it up with an end to end demo showing how policies can be created in Ranger and used to manage access to data in different systems, anonymize or de-anonymize data and track where data is flowing.

Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger

Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger

Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger

DataWorks Summit

Advanced Big Data Processing frameworks have been proposed to harness the fast data transmission capability of Remote Direct Memory Access (RDMA) over high-speed networks such as InfiniBand, RoCEv1, RoCEv2, iWARP, and OmniPath. However, with the introduction of the Non-Volatile Memory (NVM) and NVM express (NVMe) based SSD, these designs along with the default Big Data processing models need to be re-assessed to discover the possibilities of further enhanced performance. In this talk, we will present, NRCIO, a high-performance communication runtime for non-volatile memory over modern network interconnects that can be leveraged by existing Big Data processing middleware. We will show the performance of non-volatile memory-aware RDMA communication protocols using our proposed runtime and demonstrate its benefits by incorporating it into a high-performance in-memory key-value store, Apache Hadoop, Tez, Spark, and TensorFlow. Evaluation results illustrate that NRCIO can achieve up to 3.65x performance improvement for representative Big Data processing workloads on modern data centers.

Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...

Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...

Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...

DataWorks Summit

Background: Some early applications of Computer Vision in Retail arose from e-commerce use cases - but increasingly, it is being used in physical stores in a variety of new and exciting ways, such as: ● Optimizing merchandising execution, in-stocks and sell-thru ● Enhancing operational efficiencies, enable real-time customer engagement ● Enhancing loss prevention capabilities, response time ● Creating frictionless experiences for shoppers Abstract: This talk will cover the use of Computer Vision in Retail, the implications to the broader Consumer Goods industry and share business drivers, use cases and benefits that are unfolding as an integral component in the remaking of an age-old industry. We will also take a ‘peek under the hood’ of Computer Vision and Deep Learning, sharing technology design principles and skill set profiles to consider before starting your CV journey. Deep learning has matured considerably in the past few years to produce human or superhuman abilities in a variety of computer vision paradigms. We will discuss ways to recognize these paradigms in retail settings, collect and organize data to create actionable outcomes with the new insights and applications that deep learning enables. We will cover the basics of object detection, then move into the advanced processing of images describing the possible ways that a retail store of the near future could operate. Identifying various storefront situations by having a deep learning system attached to a camera stream. Such things as; identifying item stocks on shelves, a shelf in need of organization, or perhaps a wandering customer in need of assistance. We will also cover how to use a computer vision system to automatically track customer purchases to enable a streamlined checkout process, and how deep learning can power plausible wardrobe suggestions based on what a customer is currently wearing or purchasing. 
Finally, we will cover the various technologies that are powering these applications today. Deep learning tools for research and development. Production tools to distribute that intelligence to an entire inventory of all the cameras situation around a retail location. Tools for exploring and understanding the new data streams produced by the computer vision systems. By the end of this talk, attendees should understand the impact Computer Vision and Deep Learning are having in the Consumer Goods industry, key use cases, techniques and key considerations leaders are exploring and implementing today.

Computer Vision: Coming to a Store Near You

Computer Vision: Coming to a Store Near You

Computer Vision: Coming to a Store Near You

DataWorks Summit

Whole genome shotgun based next generation transcriptomics and metagenomics studies often generate 100 to 1000 gigabytes (GB) sequence data derived from tens of thousands of different genes or microbial species. De novo assembling these data requires an ideal solution that both scales with data size and optimizes for individual gene or genomes. Here we developed an Apache Spark-based scalable sequence clustering application, SparkReadClust (SpaRC), that partitions the reads based on their molecule of origin to enable downstream assembly optimization. SpaRC produces high clustering performance on transcriptomics and metagenomics test datasets from both short read and long read sequencing technologies. It achieved a near linear scalability with respect to input data size and number of compute nodes. SpaRC can run on different cloud computing environments without modifications while delivering similar performance. In summary, our results suggest SpaRC provides a scalable solution for clustering billions of reads from the next-generation sequencing experiments, and Apache Spark represents a cost-effective solution with rapid development/deployment cycles for similar big data genomics problems.

Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark

Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark

Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark

DataWorks Summit

The Census Bureau is the U.S. government's largest statistical agency with a mission to provide current facts and figures about America's people, places and economy. The Bureau operates a large number of surveys to collect this data, the most well known being the decennial population census. Data is being collected in increasing volumes and the analytics solutions must be able to scale to meet the ever increasing needs while maintaining the confidentiality of the data. Past data analytics have occurred in processing silos inhibiting the sharing of information and common reference data is replicated across multiple system. The use of the Hortonworks Data Platform, Hortonworks Data Flow and other open-source technologies is enabling the creation of a cloud-based enterprise data lake and analytics platform. Cloud object stores are used to provide scalable data storage and cloud compute supports permanent and transient clusters. Data governance tools are used to track the data lineage and to provide access controls to sensitive data.

Transforming and Scaling Large Scale Data Analytics: Moving to a Cloud-based ...

Transforming and Scaling Large Scale Data Analytics: Moving to a Cloud-based ...

Transforming and Scaling Large Scale Data Analytics: Moving to a Cloud-based ...

DataWorks Summit

Knowledge graphs (KGs) have recently emerged as a powerful way to represent knowledge in multiple communities, including data mining, natural language processing and machine learning. Large-scale KGs like Wikidata and DBpedia are openly available, while in industry, the Google Knowledge Graph is a good example of proprietary knowledge that continues to fuel impressive advances in Google's semantic search capabilities. Yet, both crowdsourced and automatically constructed KGs suffer from noise, both during KG construction and during search and inference. In this talk, I will discuss how to build and use such knowledge graphs effectively, despite the noise and sparsity of labeled data, to solve real-world social problems such as providing insights in disaster situations, and helping law enforcement fight human trafficking. I will conclude by providing insight on the lessons learned, and the applicability of research techniques to industrial problems. The talk will be designed to appeal both to business and technical leaders.

Applying Noisy Knowledge Graphs to Real Problems

Applying Noisy Knowledge Graphs to Real Problems

Applying Noisy Knowledge Graphs to Real Problems

DataWorks Summit

A city is a system of systems with independent objectives and governance for transportation, energy, healthcare, safety, security, and infrastructure. A smart city relies on data to be the connectivity between independent functions, and open data to be the building blocks for citizen-centered design, inclusion, and sustainability. Big Data is not about size – it is about finding new life-changing and transformational opportunities using data. From Smart Mobility and Smart Energy to improved Public Health, Safety, & Governance – this session will discuss how cities are delivering better citizen services leveraging open source technology with a consistent governance and security framework that spans the data center and the public clouds. Data integration is the key to ensuring that a city’s attempts to become an intelligent system of systems doesn’t result in a system of silos. A single view requires the capability to integrate transactional data from traditional data stores with person generated data, unstructured data, and machine sensor (IoT) data. The key to managing such a range of data is a capability that allows for both scaling analytic workloads and the preservation of detailed data with unexplored value, as both are vital to future growth potential. Key Takeaways: Understand the common use cases that tier 1, tier 2, and emerging cities are undertaking to deliver tactical results and progress towards policy objectives. Understand the role of a shared catalog, unified security and consistent governance in building a secure, trusted, and connected capability.

Open Source, Open Data: Driving Innovation in Smart Cities

Open Source, Open Data: Driving Innovation in Smart Cities

Open Source, Open Data: Driving Innovation in Smart Cities

DataWorks Summit

In the current digital world, Enterprises are drowning under the weight of data that are required to store for customers, for corporate analysis, and for the business forecast. With the convergence of cloud, IoT, and big data technologies, data lakes are becoming the critical fuel for enterprise-wide digital transformations which are proven to be cost-effective, self-service with elastic in nature. This enterprise data is spread widely across numerous clusters and repositories residing in both the companies data centers and multiple cloud locations posing a new “data protection” problem in hybrid environments. Protecting data is very critical as part of every business continuity plan because data loss or corruption may have a huge impact on enterprise survival. Protecting data is more challenging than ever in a complex hybrid enterprise data lake environments since we need to answer questions such as - How do we move data seamlessly between enterprise data centers and cloud? - How to secure enterprise data that resides in different locations with multiple authorization policies? - How do we protect data from natural or accidental disasters to ensure operational continuity? Not having immediate answers to these questions makes it very difficult for business users and platform operators to do their jobs in protecting data in hybrid enterprise data lake environments. Therefore enterprises require a unified data protection orchestration platform which seamlessly protects the data across multiple environments. In this talk, we will address the above challenges faced by enterprises using Apache Hadoop, Apache Hive, Apache Ranger and Apache Atlas. We will outline using a unified open source orchestration platform how, - You can protect mission-critical data along with their security and governance policies across multiple data lakes and change data capture works using Apache Hadoop, Apache Hive, Apache Ranger and Apache Atlas. 
- You can monitor replication jobs and metric collections associated with the replicated data across hybrid enterprise data lake environments. We will also showcase, - How to seamlessly replicate HDFS data, Hive databases between Hortonworks clusters securely along with Apache Ranger policies and Apache Atlas metadata. - How to securely move the data between on-premise clusters and cloud storages.

Data Protection in Hybrid Enterprise Data Lake Environment

Data Protection in Hybrid Enterprise Data Lake Environment

Data Protection in Hybrid Enterprise Data Lake Environment

DataWorks Summit

Join the presenters from Meharry Medical College as they present an overview of the technologies that support the Data Science Institute for the advancement of medical training, Data Science and medical research. In 2016 the president of Meharry Medical College initiated a program to establish the Data Science Institute that would concentrate on programs to support the underserved population of Nashville, Tennessee. As a rule, Medical Colleges that serve this population are usually the last on the list to avail themselves of advanced technology. Meharry Medical College through an innovative approach was able to leapfrog their better funded peers by utilizing and applying the same opensource technologies used by Google, Facebook, Twitter, LinkedIn and Yahoo. The technical infrastructure, academic programs, research and the applied data science use cases will be discussed.

Big Data Technologies in Support of a Medical School Data Science Institute

Big Data Technologies in Support of a Medical School Data Science Institute

Big Data Technologies in Support of a Medical School Data Science Institute

DataWorks Summit

Hadoop was born much earlier than the Cloud Native era. But the question is still the same: what can it offer in the time of Kubernetes, containerization and hybrid clouds? Apache Hadoop Ozone is a new subproject of Hadoop. It has a generic low-level binary layer, the Hadoop Distributed Data Storage (HDDS) and a S3 compatible Object Store implementation on top of it. But the HDDS data storage layer is not just for the object store. It could be used for multiple purposes: to enhance the scalability the HDFS or provide block level access to the managed storage space. With this approach the same Hadoop Ozone cluster could provide hadoop file system based storage, object store space and block level storage. Storage is still a hot topic with Kubernetes and in Cloud Native environments. Container Storage Interface specification is a vendor neutral standard to provide storage plugin for multiple container orchestration system. Quadra provides block level access on top of the Hadoop Distributed Data Storage layer and it’s first class citizen of the containerized word. It implements the Container Storage Interface and can work as a Kubernetes dynamic volume provisioner. In this talk we will demonstrate how the Hadoop Ozone storage could be used from containers. We will explain the basic storage type of Kubernetes clusters and show how Hadoop Ozone and Quadra could help to solve the storage problem in an industry standard way.

Hadoop Storage in the Cloud Native Era

Hadoop Storage in the Cloud Native Era

Hadoop Storage in the Cloud Native Era

DataWorks Summit

A couple of thousands of servers for a big data system is also a big investment. Microsoft Bing has figured out a way to fulfill our needs without signing a huge check. We have the technology to harvest spare cycles on underutilized servers. And we tweak the configurations in Hadoop and Spark to fit the flexible capacity base. We have saved hundreds of millions of dollars per year. Bing is adopting open source big data technologies for our offline data processing system. It requires a massive amount of capacity, which implies a significant bill. With collaboration with Windows and Azure, Research teams, we can harvest most of the needed capacity from our existing server fleet. We make use of the capacity on reserve servers while keeping them instantly available for emergency use; we allocate compute and storage to servers when they are not fully occupied. We updated Hadoop node decommission, HDFS block placement, YARN node label mapping, and a few other policies so that they can adapt to the capacity that is even less reliable than commodity servers. We brought open source capacity to Bing product with less than 1 percent of the cost we had done it through normal approach. We also extend the YARN and Spark framework to better fit the need of deep learning training and inferencing workloads in our system. This extension is equipping Bing with direct questioning and answering type of interactive query features. Big does not mean expensive. The audience can learn about the approach from Bing that they can make better use of their existing servers to do additional big data systems.

Free Servers to Build Big Data System on: Bing’s Approach

Free Servers to Build Big Data System on: Bing’s Approach

Free Servers to Build Big Data System on: Bing’s Approach

DataWorks Summit

American Water shares how bringing IoT to fleet management can provide value to the customer. In the utilities industry, fleet management plays a major part in the business. The front line is one of the largest parts of the business whether it is the field employees working on mains, or those working on the customers' property. American Water strives to provide the best customer experience and part of that includes improving the effectiveness of our fleet. Currently, there is no insight or active feedback on the effectiveness of the routes or driving behaviors. As a PoC, American Water leveraged NiFi to track metrics against a simulated truck, showing the initial values in capturing this type of data. Technologies: NiFi, Druid, Hive

IoFMT – Internet of Fleet Management Things

IoFMT – Internet of Fleet Management Things

IoFMT – Internet of Fleet Management Things

DataWorks Summit

GeoWave: Scaling Complex (Not Just Geo) Data

1. Barry Bragg, Maxar Technologies; Rich Fecher, Maxar Technologies

2. An open source framework that leverages the scalability of key-value stores for effective storage, retrieval, and analysis of massive geospatial datasets

3. At its core, GeoWave handles spatial and spatiotemporal indexing within distributed key-value stores, with natural integrations for various popular frameworks. GeoWave bridges the gap between popular geospatial platforms and distributed processing frameworks.

5. Use a Space Filling Curve (SFC) to impose a sort order on multi-dimensional data.
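As a rough illustration of the idea on this slide, the sketch below builds a Z-order (Morton) key by interleaving the bits of two normalized coordinates, so that sorting by the single key gives a one-dimensional ordering of two-dimensional points. This is a generic sketch, not GeoWave's implementation: the function names, the 16-bit precision, the longitude/latitude bounds, and the choice of putting x in the even bit positions are all assumptions made for the example.

```python
def normalize(value: float, lo: float, hi: float, bits: int = 16) -> int:
    """Map a bounded real value onto an integer cell index in [0, 2**bits - 1]."""
    cells = 1 << bits
    cell = int((value - lo) / (hi - lo) * cells)
    # Clamp so that value == hi (or slightly out of range) stays in the last cell.
    return min(max(cell, 0), cells - 1)


def interleave_bits(x: int, y: int, bits: int = 16) -> int:
    """Interleave the low `bits` bits of x and y into one Z-order key."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)      # x bits go to even positions
        key |= ((y >> i) & 1) << (2 * i + 1)  # y bits go to odd positions
    return key


def z_order_key(lon: float, lat: float, bits: int = 16) -> int:
    """Hypothetical helper: a sortable key for a longitude/latitude pair."""
    return interleave_bits(
        normalize(lon, -180.0, 180.0, bits),
        normalize(lat, -90.0, 90.0, bits),
        bits,
    )


if __name__ == "__main__":
    points = [(-77.03, 38.90), (2.35, 48.85), (-77.04, 38.91), (139.69, 35.68)]
    # Sorting by the key is what a sorted key-value store would do with row keys.
    for lon, lat in sorted(points, key=lambda p: z_order_key(*p)):
        print(f"{z_order_key(lon, lat):>12d}  ({lon}, {lat})")
```

Points that are close in space tend to share a long key prefix and therefore land near each other in the sorted order, although a Z-order curve does not guarantee this for every pair.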

6. Locality and bounding-box quality of two-dimensional space-filling curves

       Z-Order  Hilbert  H-order  Peano  AR2W2  βΩ
WL∞    ∞        6        4        8      5.40   5.00
WL2    ∞        6        4        8      6.04   5.00
WL1    ∞        9        8        10.66  12.00  9.00
WBA    ∞        2.40     3.00     2.00   3.05   2.22
ABA    2.86     1.41     1.69     1.42   1.47   1.40

WL∞, WL2, WL1: Worst Case Dilation. WBA: Worst Case Bounding Box Area Ratio. ABA: Average Total Bounding Box Area.
Source: Haverkort, Walderveen, "Locality and Bounding-Box Quality of Two-Dimensional Space-Filling Curves", 2008, arXiv:0806.4787v2

7.
● What about data with extents such as lines/polygons or time ranges?
  ○ We need to represent multiple resolutions...
● What about unbounded dimensions?
  ○ We can define a periodicity to bound a single SFC. We end up with an SFC per period (or combination of periods).
● What about queries?
  ○ Bounding hyperrectangles are discontinuous on the space filling curve.
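The last point above can be made concrete with a small sketch: a rectangular query over grid cells usually maps to several disjoint key ranges on the curve, each of which becomes one range scan. The code below decomposes an inclusive cell rectangle into Z-order key ranges by recursive quadrant subdivision. It is a simplified illustration rather than GeoWave's query planner; the function name `z_ranges` and the bit layout (x in the even bit positions, y in the odd ones) are assumptions for the example.

```python
from typing import List, Tuple


def z_ranges(qx0: int, qy0: int, qx1: int, qy1: int, bits: int) -> List[Tuple[int, int]]:
    """Return merged, sorted (start, end) Z-order key ranges that exactly cover
    the inclusive cell rectangle [qx0, qx1] x [qy0, qy1] on a 2**bits grid."""
    ranges: List[List[int]] = []

    def visit(x0: int, y0: int, size: int, z_start: int) -> None:
        x1, y1 = x0 + size - 1, y0 + size - 1
        if x1 < qx0 or x0 > qx1 or y1 < qy0 or y0 > qy1:
            return  # quadrant is disjoint from the query
        if qx0 <= x0 and x1 <= qx1 and qy0 <= y0 and y1 <= qy1:
            # Quadrant is fully inside: its cells form one contiguous key range.
            ranges.append([z_start, z_start + size * size - 1])
            return
        half = size // 2
        quad = half * half
        # Children visited in key order: (low x, low y), (high x, low y),
        # (low x, high y), (high x, high y).
        visit(x0, y0, half, z_start)
        visit(x0 + half, y0, half, z_start + quad)
        visit(x0, y0 + half, half, z_start + 2 * quad)
        visit(x0 + half, y0 + half, half, z_start + 3 * quad)

    visit(0, 0, 1 << bits, 0)

    # Ranges were collected in increasing key order; merge adjacent ones.
    merged: List[List[int]] = []
    for start, end in ranges:
        if merged and start == merged[-1][1] + 1:
            merged[-1][1] = end
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]


if __name__ == "__main__":
    # A 2x2 query in the middle of a 4x4 grid straddles all four quadrants.
    print(z_ranges(1, 1, 2, 2, bits=2))
    # A query aligned to the lower half of the grid collapses to one range.
    print(z_ranges(0, 0, 3, 1, bits=2))
```

On the 4x4 grid, the centered 2x2 query splits into four single-key ranges, while the aligned query merges into one. A practical planner would typically cap the number of ranges and accept some over-scan, filtering the extra rows afterwards; that trade-off is not shown here.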

8.

9. From Massive Scale in the Cloud to GeoWave Embedded in the Client. With a single interface, you can use both! Example: an analysis tool requiring GeoWave multi-dimensional indexing for map, timeline, and graph search and visualization of massive datasets.

10.

11. Use a Space Filling Curve (SFC) to impose a sort order on multi-dimensional data.

12. Made Missed

13.

14.

15. richard.fecher@radiantsolutions.com barry.bragg@radiantsolutions.com