In my group at Microsoft, we have worked with the United Nations, Guide Dogs for the Blind in the UK, and Ströer in Germany on a number of projects involving high scale data.
In this talk, I'll share some of the best practices and patterns that have come out of those experiences: best practices for storing and indexing geospatial data at scale, incremental ingestion and slice processing of the data, and efficiently building and presenting progressive levels of detail on the web and on mobile.
The audience will walk away with an understanding of how to efficiently summarize data over a geographic area, general methods for doing incremental updates to large scale datasets with Apache Spark, and best practices around precomputing high scale frontend data views.
DecodedConf presentation around processing geospatial data, centered on the eight patterns that we've found in working on customer and partner projects at Microsoft.
21. This is a challenge with a large dataset:
• A traditional relational database typically requires hand sharding to scale to PBs of data (e.g. Postgres).
• Highly indexed non-relational solutions can be very expensive (e.g. MongoDB).
• Lightly indexed solutions are a good fit because we really only have one query we need to execute against the data (HBase, Cassandra, and Azure Table Storage).
28.
• Total number of location samples in a geographical area.
• Whole dataset operation.
29.
• Divides the world up into tiles.
• Each tile has four children at the next higher zoom level.
• Maps 2-dimensional space to 1 dimension.
30.
• Can think of it as “Hadoop the Next Generation”.
• Better performance (10-100x).
• Cleaner programming model.
• Used HDInsight Spark (Azure) to avoid the operational difficulties of running our own Spark cluster.
31. For each location, map to tiles at every zoom level:

(36.9741, -122.0308) → [
  (10_398_164, 1), (11_797_329, 1),
  (12_1594_659, 1), (13_3189_1319, 1),
  (14_6378_2638, 1), (15_12757_5276, 1),
  (16_25514_10552, 1), (17_51028_21105, 1),
  (18_102057_42211, 1)
]
32. Reduce all these mappings with the same key into an aggregate value:

[
  (10_398_164, 1), (10_398_164, 1), …
  (10_398_164, 1), (10_398_164, 1), …
  (10_398_164, 1)
] → (10_398_164, 151)
34. Building the heatmap then boils down to this in Spark:

# Load the raw location dump from blob storage and parse each record as JSON.
lines = sc.textFile('wasb://locations@loc.blob.core.windows.net/')
locations = lines.flatMap(json_loader)

# Map each location to (tile_id, 1) pairs and sum the counts per tile.
heatmap = (locations
    .flatMap(tile_id_mapper)
    .reduceByKey(lambda agg1, agg2: agg1 + agg2))

# Write the per-tile totals back out to blob storage.
heatmap.saveAsTextFile('wasb://heatmap@loc.blob.core.windows.net/')
55.
• geotile (http://github.com/timfpark/geotile): XYZ tile math in C#, JavaScript, and Python
• heatmap (http://github.com/timfpark/heatmap): Spark code for building heatmaps
• tileIndexPusher (http://github.com/timfpark/tileIndexPusher): Azure Function for pushing tile indexes

Thanks…
I thought I would start today by briefly looking at where we’ve been as an industry, and how that has shaped our practice of building software
So, in the beginning there was a computer.
It filled a room and was something that only a government had access to.
It was very sexily called the ENIAC, and it did mathematical computations at the blazing speed of 360 calculations per second.
Required an army of people in suits to keep it operating.
It was also central to the scientific understanding of how far neutrons could penetrate matter before hitting a nucleus and how much energy it would give off when it did.
To calculate this, the engineers on the project invented the Monte Carlo Method, which basically uses random samples of a problem to arrive at a converged numerical result.
We still use this technique today in software profiling.
Over a decade or so, computers got smaller and only took up half a room.
They became accessible in price to universities and large corporations.
1000x faster than the first computer.
1000x more access by developers to computers.
Led to many more geniuses at the keyboard (like Kernighan and Ritchie here).
Their desire to be more productive led to an explosion of useful software and techniques that we still use today:
First modern systems language in C
Modern operating systems like UNIX
We still model many of our programming languages and operating systems off of this work.
All of these techniques trickled down into the original PCs like the IBM PC and the Macintosh.
These computers put computing in hands of ordinary people, workers, students, and hackers.
This led to the first widespread usage of computers by non-engineers and to the reinvention of a wide range of tasks.
Writers moved from typewriters to…
… To word processing …
Accountants and inventory management moved from error-prone paper…
…To spreadsheets and enterprise resource management systems
The big point here is that the personal computer enabled the automation of a wide range of office work tasks that were largely abstract.
And we as developers invented things like filesystems to manage files and B-trees to enable fast queries against databases.
And this is the big meta point. At every point of the evolution of the computing industry, we have had to investigate and discover what have become widely applicable approaches to building applications based on solving practical real world problems.
The mobile phone has obviously taken this technology progression forward another step.
While many people would point to app stores as the most interesting way mobile has changed the software industry.
In my opinion, the most interesting thing it has done is expand computing out to the real world...
Enabling whole classes of new applications...
Like having a map of the world in our pockets…
…being able to push a button and have a car show up 5 meters away…
And, of course, the all-important Pokémon Go, which allows us to catch virtual animals…
In the same way that our predecessors had to figure out the best way to process text and structured data
The explosion of mobile, and the fact that many mobile apps revolve around the real 3D world, means that processing and analyzing geospatial data is increasingly woven into the applications that we are building as developers.
I work in our developer advocacy team at Microsoft and I really have an awesome job where I get to work with a bunch of customers and partners on their hardest technical challenges.
Today I’m going to share some of the things we have learned in the course of those projects around effectively processing geospatial data in the cloud.
The first project I wanted to share with you, and the one I’ll use to largely ground this talk, is around a transportation partner that we worked with here in Europe.
This well known transportation company collects location traces for each of the trips that their fleet takes.
Here is a visualization of what one of those trips looks like from a prototype we built with them.
The shape of this dataset will probably not surprise you
It's basically a whole lot of data that includes:
a vehicle ID,
a bunch of trips, each identified by a trip ID,
and a bunch of timestamped location data: latitude, longitude, and altitude.
And so, there are many many vehicles
Who each have many trips
Which all have many timestamped locations.
It's probably not hard for you to imagine that this company ends up with literally a mountain of data.
This is pretty common in this space. Collecting geospatial data often results in a ton of data landing at your doorstep.
And to give you an idea of this,
We worked with them on a month’s worth of data.
Even this small time slice of data is still pretty large.
A CSV dump of a month, containing 584B locations across 116M trips, is over 39TB in size.
The point here is that this is definitely larger than any typical computing node.
And therefore we have to use larger scale data techniques in order to store and process it
Let’s first start by talking about how we store the location data
As we talked about previously,
A trip has an ordered set of locations by timestamp.
To display a trip, we need to pull all of the associated locations for that trip.
So what we need from the storage system is the ability to pull a range of timestamps
With this, we can pull all of the locations for a particular trip.
This is pretty standard stuff for a database but becomes a challenge with a large dataset
(for the reasons we saw on slide 21)
This brings us to the first pattern that we have used for all of our geospatial projects.
And that’s to use a lightly structured storage system for this sort of data.
In our projects, we, naturally, almost exclusively use Azure.
And for this sort of storage requirement we use Azure Table Storage.
For those of you that aren’t familiar with Azure Table Storage, it sits somewhere just above blob storage.
And as such, it is very inexpensive, and costs roughly 2 US cents per GB per month.
And unlike blob storage, you can access ranges and individual rows in the data.
But you can only query on a set of RowKeys within the same PartitionKey.
But we only need to query on timestamp.
And it satisfies our need to be able to query a range of user locations by timestamp range.
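As a minimal sketch of what that looks like (using the current azure-data-tables SDK, which postdates this project; the key design and names are assumptions consistent with the talk):

from azure.data.tables import TableClient

# Assumed key design: one partition per trip, with a sortable ISO-8601
# timestamp as the RowKey, so a time range becomes a cheap range scan.
table = TableClient.from_connection_string(
    "<storage-connection-string>",  # placeholder
    table_name="locations")

def locations_for_trip(trip_id, start_iso, end_iso):
    # RowKeys sort lexicographically within a partition, so ISO-8601
    # timestamps give us an ordered range query over a single trip.
    return table.query_entities(
        f"PartitionKey eq '{trip_id}' "
        f"and RowKey ge '{start_iso}' and RowKey le '{end_iso}'")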
Now that we have an approach for storing locations
Let’s look at how we store trip metadata itself
One of the key queries we want to run is for the trips in a bounding box.
We also, in the future, want to be able to filter trips on a distance or duration.
What this essentially means is that unlike the location data, we want the trip data to be highly indexed.
We want to be able to do queries like
“Give me all of the trips under 10km in length near Dublin” or
“Give me all of the trips over 1 hour in duration near Berlin”
And this means we need the data columns highlighted above to be indexed.
Which is not a great fit for Azure Table Storage that we saw in the first pattern.
But the good news is that there is 10 to 20 thousand times less data as well.
Which brings it within range of a traditional relational database like MySQL or Postgres.
And so, we can just set up some schema'd tables
Which allow us to make rich queries against it.
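As a rough sketch of what those tables and queries could look like (the schema, names, and thresholds here are hypothetical, not taken from the project):

import psycopg2

conn = psycopg2.connect("dbname=trips")  # placeholder connection
cur = conn.cursor()

# Hypothetical trip-metadata table: one indexed row per trip, with a
# PostGIS geography column for its start point (assumes the postgis
# extension is installed).
cur.execute("""
    CREATE TABLE IF NOT EXISTS trips (
        trip_id     text PRIMARY KEY,
        vehicle_id  text,
        started_at  timestamptz,
        duration_s  integer,
        length_km   real,
        start_point geography(Point, 4326));
    CREATE INDEX IF NOT EXISTS trips_start_idx
        ON trips USING GIST (start_point);
""")

# "All of the trips under 10km in length near Dublin": ST_DWithin uses
# the spatial index to find trips starting within 20 km (meters, since
# this is a geography column) of the city centre.
cur.execute("""
    SELECT trip_id FROM trips
    WHERE length_km < 10
      AND ST_DWithin(start_point,
                     ST_MakePoint(-6.2603, 53.3498)::geography,
                     20000);
""")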
Which brings us to the second pattern for dealing with high scale data like this:
Use the best storage system for each scenario that solves a particular application’s needs.
We call this “polyglot persistence”.
And what we mean by that is that you should choose each storage system because it excels at a portion of the problem that you are trying to solve.
As we saw before, I am using Azure Table Storage for the location data.
And for the trip data, I am using PostgreSQL + PostGIS.
The way this works is that we query for trips using Postgres
and then when we need the location data, we query for it from Table Storage.
Ok, so that is how we are storing and querying trips.
We load locations into Table Storage and trips into Postgres at creation and then query them.
As we saw, the data from these vehicle trips
is basically a whole lot of CSV files with location information that includes:
a vehicle ID,
a trip ID (which identifies all of the data that is part of the same trip),
a timestamp,
and latitude and longitude.
There are many, many vehicles,
who each have many trips,
which all have many timestamped locations.
Now that we have talked about how we store a dataset of this size,
Let’s dig in and talk more about some of the techniques we can use to process a dataset of this size at scale.
The transportation company we worked with also wanted to be able to visualize a heatmap of where their fleet spends its time.
This is a pretty common problem. We also tackled this with Ströer, an outdoor advertiser in Germany.
They are the company that operates most of the advertising billboards you see on the sidewalks in urban centers throughout Germany.
One of the important factors in advertising is that the overall mood of a particular place can make a particular ad much more or less effective.
Given this, Ströer combined a number of datasets, including geotagged social feeds, to come up with an overall estimate of what people were feeling.
And then used this plus demographic data to decide which ad to show and how much to charge for each ad.
So, building heatmaps like these is a pretty common problem…
Let’s look at how we generated the heatmap for our transportation company, since it is a simpler scenario.
In that case, the heatmap is generated by summing up the number of location samples in a particular geographical area.
So for every location that a truck sends back, we attribute that location to a summary in the heatmap, and sum up over all of the locations that are associated with that summary.
So before we dive into how we implemented this
Let’s discuss how we map a particular location point to a geographic summary.
One of the common patterns we’ve seen is using XYZ Tile as a summarization bucket.
This is not something we’ve invented.
OpenStreetMap, Google, and Apple use the same concept for addressing geographical areas in their maps.
It is a fairly simple concept.
The top level world is divided into 4 tiles.
For each zoom level below that, you take the parent tile and recursively divide it into 4 tiles.
An individual tile is then addressed by its zoom level and its row and column within that zoom level.
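As a concrete sketch (the helper name is mine, not from the deck), the standard tile math looks like this in Python:

import math

def lat_lon_to_tile(lat, lon, zoom):
    # Project a WGS84 coordinate into the 2^zoom x 2^zoom Web Mercator
    # tile grid and return (zoom, column, row).
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    return zoom, x, y

# (36.9741, -122.0308) at zoom 10 lands in column 164, row 398, which is
# the tile the slides write as 10_398_164 (zoom_row_column).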
And so, in order to build these heatmap summaries, we basically count location samples in these geographic XYZ tiles.
This means that it is operating over the whole dataset, and given the large size of the dataset we need to open our big data toolbox to accomplish this.
To accomplish this, we used Apache Spark.
For those of you that have used Hadoop, Apache Spark is sort of a “Hadoop the next generation”
It operates on data in a similar paradigm
But offers much better performance and, in my opinion, a much nicer programming model than the original Hadoop engine.
HDInsight Spark is Azure’s hosted version of Apache Spark.
We used this hosted offering so we could ignore the significant operational work of running a Spark cluster and focus on the actual problem we are trying to solve.
In order to compute our heatmap, we use a pretty standard map / reduce algorithm.
For every location in the dataset, we generate a tile id key/value pair for every zoom level that we want results for.
In this case, we are generating tile key/value pairs for the zoom levels from 10 to 18 because we knew that these are the only set of tiles that the user interface would end up using.
So for instance, for this location from a vehicle,
We generated one for zoom level 10, zoom level 11, … through zoom level 18.
We then reduce all of these mappings with the same key down into its aggregate heatmap value.
In spark, the first element in a tuple is considered the key
And the second element is considered the value
Which is to say, we take all of the locations with the same tile ids from the previous step and count them.
So let's look at what an implementation of these concepts looks like in Python.
We first compute the tileIds for all of the zoom levels that we want to collect results for.
And then use that to build tuples for each tileId.
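A minimal sketch of that mapper, reusing the lat_lon_to_tile helper sketched earlier (the record field names are assumptions):

def tile_id_mapper(location):
    # Emit one (tile_id, 1) pair per zoom level of interest for a single
    # location record; flatMap flattens these into one big pair stream.
    lat, lon = location['latitude'], location['longitude']
    pairs = []
    for zoom in range(10, 19):  # zoom levels 10 through 18, as in the talk
        z, x, y = lat_lon_to_tile(lat, lon, zoom)
        pairs.append((f"{z}_{y}_{x}", 1))
    return pairs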
With this mapper, the overall implementation in Spark is fairly straightforward.
We point it at the dataset in blob storage,
then parse it as json,
And then use the tile_id_mapper function we defined earlier to map each location to the appropriate zoom level result.
We then reduce all of these individual results by the key to get a final total for each tile result.
We implement the reducer as an anonymous lambda function that essentially sums the intermediate aggregates for tiles with the same id.
And then write the heatmaps back out to blob storage.
From a programming standpoint, Spark makes this look really easy.
But under the covers, Spark is doing a lot of work for us
Remember, we are working on a dataset that doesn't fit onto a single machine.
During the map stage, any tile id could be generated by any of the mappers in the Spark cluster since locations are uniformly distributed in the dataset.
This means that there needs to be a shuffle step in which the results for a given tile id are assigned to the same reducer so that we can calculate an aggregate value.
And therefore there will be potentially billions of these tuples floating around the cluster.
The good news is that Spark handles all of this underneath the covers for us.
The next pattern I wanted to talk about is incremental ingestion
In the real world, we don't usually have static data but instead data that is constantly arriving.
In this case, we have vehicles that are constantly delivering data.
And ideally, to make Spark more efficient, we'd like to combine these incoming small trips into a set of large aggregate files.
For this we are using Azure Event Hub
You can think of Event Hub as a giant cloud buffer where a downstream backend system controls the rate at which data is read out.
It is helpful in situations where you have 1) bursty data and 2) data that does not need to be processed immediately.
In this case, we want to use it to buffer up the data for a particular hour time slice
And then create hourly summaries of this new data using another Azure service called Stream Analytics.
Using these pieces of infrastructure in conjunction with each other to enable incremental ingestion of data is a key pattern for high scale location data.
Incremental ingestion leads us to our next pattern: processing data in slices.
That is, how do you process these new pieces of incremental data as they arrive?
We do this in a manner analogous to how we processed the whole dataset in the previous slides.
But instead, we only operate on the single new data slice.
Since we are only processing an individual slice, we do not need to have nearly as large of a cluster to do the processing.
We then load in the previous complete result, fold in this newly computed partial heatmap, and then write out the new heatmap.
Although this adds a second step, overall we are operating on a much smaller set of data, and therefore it is much more efficient.
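Sketched in the same PySpark style as before (assuming the SparkContext and tile_id_mapper from earlier; the path variables are placeholders):

```python
import ast
import json

# compute the partial heatmap for just the new hourly slice
partial = sc.textFile(slice_path) \
            .map(json.loads) \
            .flatMap(tile_id_mapper) \
            .reduceByKey(lambda a, b: a + b)

# load the previous complete result, stored as ((zoom, x, y), count) tuples
previous = sc.textFile(previous_path).map(ast.literal_eval)

# fold the partial heatmap in and write out the new complete heatmap
previous.union(partial) \
        .reduceByKey(lambda a, b: a + b) \
        .saveAsTextFile(updated_path)
```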
The next project I wanted to share with you is some work that we’ve been doing with the United Nations Office for the Coordination of Humanitarian Affairs.
This part of the UN is tasked with getting help to the most vulnerable people in the midst of humanitarian crises as quickly as possible.
We worked with them to see how technology could have helped them during the many humanitarian crises that have happened
in the Libyan Civil War, which has been ongoing since 2014.
Armed conflicts and natural disasters make up nearly equal parts of their remit.
These events yield devastated civilian infrastructure…
…and many vulnerable refugees seeking shelter and food.
The state of the art of detecting these sorts of crises is still very human driven.
For example, a photojournalist may be documenting the event
And might call back with observations of where help is needed and what kind of help is needed most.
The UN then kicks into gear and coordinates a disaster relief effort.
But this is a very slow approach – it can be days before critical needs are discovered
And sadly, that is several days too late for some of the victims of these crises
Fortunately, the UN had an idea for how to improve on this, and it leverages the high penetration of mobile in these developing countries.
The idea was pretty simple:
Could we search for humanitarian keywords in geolocated tweets and other short messages
intersecting and summarizing them against real world features like a city or a state,
to build a near real time dashboard that detects more quickly where these crises are occurring?
Geographical features in the real world have complex shapes.
This is probably not a surprise to you.
The challenge, given this, is how to do these intersections at scale.
In the United Nations project we utilized the Open Street Maps dataset to compute intersections using a two stage process.
We again employed XYZ tiles to accomplish this.
We loaded each geolocated tweet and keyed it by tile id; these are the small squares in the slide.
We then loaded each Open Street Maps feature and keyed it with the tileId at the maximum zoom level that still encompasses the whole feature.
In this case, the black square represents the smallest XYZ tile that spans the Benghazi region completely.
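One way to compute that spanning tile, sketched with the lat_lon_to_tile helper from earlier (taking a feature’s bounding box as input is an assumption):

```python
def spanning_tile(min_lat, min_lon, max_lat, max_lon, max_zoom=18):
    """Find the deepest-zoom tile that contains the whole bounding box."""
    for zoom in range(max_zoom, -1, -1):
        a = lat_lon_to_tile(min_lat, min_lon, zoom)
        b = lat_lon_to_tile(max_lat, max_lon, zoom)
        if a == b:            # both corners land in the same tile, so it spans the box
            return a
    return (0, 0, 0)          # fall back to the single world tile
```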
In Spark, when you have two datasets of keyed tuples with the same key type, you can use a join operation to select the elements whose keys exist in both datasets.
We use this to join the geolocated tweets (the small rectangles), against the features.
This will yield us a set of candidate feature matches.
It is not the final set of matches because the black box that represents the spanning tileId for Benghazi obviously does not fit the region perfectly.
This means that false matches like the red square are included in the candidate matches.
That said, the join does narrow down the set of potential matches considerably, filtering out points like the blue square below it.
This narrowed set of potential matches allows us to make a second pass, where we do a fine-grained intersection test against the real border data.
So this is how we can do a scaled intersection over the dataset: we come up with the set of features each datapoint intersects so that we can aggregate against them.
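A hedged sketch of the two stages with PySpark and Shapely (tweets and features are assumed to be RDDs of parsed JSON dicts; the GeoJSON-style field names, the bounding_box helper, and keying tweets at every zoom level are all assumptions made for illustration):

```python
from shapely.geometry import shape, Point

# Stage 1: coarse matching. Tweets are keyed at every zoom level; each feature
# is keyed once at its spanning tile, so equal keys can line up in the join.
tweet_by_tile = tweets.flatMap(
    lambda t: [(tile, t) for tile, _ in tile_id_mapper(t)])
feature_by_tile = features.map(
    lambda f: (spanning_tile(*bounding_box(f)), f))   # bounding_box: hypothetical helper

candidates = tweet_by_tile.join(feature_by_tile)      # (tile, (tweet, feature)) pairs

# Stage 2: fine-grained point-in-polygon test against the real border geometry
matches = candidates.filter(lambda kv: shape(kv[1][1]['geometry']).contains(
    Point(kv[1][0]['longitude'], kv[1][0]['latitude'])))
```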
However, as I mentioned previously, the United Nations wanted a system that could react in near real time to humanitarian crises.
There are a couple of ways that you can do this.
You could do it with Spark Streaming and micro batches, but that can be an expensive way to accomplish this.
Maintaining an entire cluster sized to handle the worst case load means that in the common case a number of the nodes of the cluster will sit underutilized.
So it only really makes sense when you have a pretty consistent volume of data and can size the cluster appropriately.
Using an entirely batch infrastructure for this can be slow and expensive,
so instead we adopted a Lambda Architecture.
This is becoming a pretty common pattern in our industry.
A lambda architecture pairs a batch layer to do full dataset processing with
a speed layer that does near real time stream level processing to keep these results up to date.
Projects use this architecture because it enables you to do a full recompute on the dataset to handle, for example, new feature additions.
In between those full recomputes, you can use a less expensive mechanism to update the overall view with the latest data.
For the batch layer, we are using, like the previous examples, Apache Spark.
This is the architectural diagram that implements the algorithm that I described in the previous slide.
From an architectural perspective it is very straightforward:
we take in the geolocated tweets and the features,
intersect them with a join,
and then aggregate the values over the features.
and then do a bulk update of the results.
We've talked about what the batch layer for this looks like.
The speed processing layer we implemented builds off of the incremental ingestion processing that we described previously.
We used a new service in Azure called Azure Functions.
Azure Functions enables you to process data as it arrives by applying a function to each piece.
In our case, for each tweet that comes in, we find the geographic features they intersect and update the existing dataset.
This enables us to have a nearly real time update of the dashboard
but if we add features that we want to aggregate over, we can still rerun the batch layer to get summaries over those previous results.
As mentioned, we implemented this feature intersection service efficiently using Azure Functions.
Azure Functions is a serverless platform, which means that instead of provisioning a fixed number of instances that are constantly running…
…you instead provide a function that should be executed every time a particular event happens
…and the infrastructure handles automatically scaling the number of instances to match the incoming event rate.
This slide shows a very simple example of what one of these functions looks like.
I wrote it in node.js, but you can write these functions in a variety of languages including C#.
Basically, the function receives the tweet data slice that the incremental processing pipeline we previously discussed has dropped into blob storage…
... Breaks it down into individual tweets since the blob contains one per line.
… and calls an underlying service to aggregate them on an hourly basis.
... It then stores the results so that the frontend consuming them can use them.
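Our function was written in node.js, but as a rough analog, a blob-triggered function in Python (a language Azure Functions also supports) might look like the following; the binding names and the inline hourly count, which stands in for the real aggregation service call, are assumptions:

```python
import json
import azure.functions as func

def main(tweetblob: func.InputStream, out: func.Out[str]) -> None:
    """Triggered per blob dropped by the incremental ingestion pipeline."""
    hourly = {}
    # the incoming blob holds one JSON tweet per line
    for line in tweetblob.read().decode("utf-8").splitlines():
        tweet = json.loads(line)
        hour = tweet.get("timestamp", "")[:13]    # e.g. '2017-05-01T14'
        hourly[hour] = hourly.get(hour, 0) + 1    # stand-in for the aggregation service
    out.set(json.dumps(hourly))                   # stored for the frontend to consume
```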
A bit more background on Azure Functions: it allows you to set up a trigger on a wide variety of events,
including queued events, HTTP requests, Service Bus messages, timers, etc.
It builds on top of Azure App Service’s WebJobs functionality
to provide a very simple interface.
When one of these triggers fires, a function that you have written is executed.
Here too I wrote my function in JavaScript, but C# and a host of other languages are also supported.
I also set up this Azure Function to trigger on a new blob being added to a storage account container.
So each time a blob is created by the previous incremental ingestion pattern…
... It is passed into this function so that it can be enriched with elevation data.
Note that the whole blob is passed into the function, so you need to make sure that the blobs for your scenario will fit in memory.
Once the locations in the blob have been enriched, we then push these out to another blob...
... Which we use downstream.
I wanted to end today by talking about how we present and display this information.
Many of you might have suspected it when you saw this the first time, but there is a fairly large set of data that is sent with each of these heatmap queries.
If we made a traditional bounding box query against a geospatial relational store for each of these heatmaps, you’d likely end up with a solution that either didn’t scale well or scaled only at considerable expense.
Instead of querying for the heatmaps, we instead precomputed the heatmap elements that should be displayed within a particular view.
We use a lower zoom level block, shown here, as a container for the higher zoom level summaries, and then precompute what should be displayed.
We store all of these resultsets in blob storage, which is very inexpensive compared to the number of VMs/databases you’d otherwise need.
This allows us to turn a querying problem into a “sending json” problem from our web frontends.
It also allows us to cache each of these resultsets in the browser.
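As a sketch of the precompute (assuming the heatmap tuples from earlier and a hypothetical CONTAINER_ZOOM constant), note that in a quadtree a parent tile’s coordinates are just the child’s coordinates shifted right once per level:

```python
import json

CONTAINER_ZOOM = 8    # assumption: the lower zoom level used as the container

def to_container(entry):
    """Re-key a ((zoom, x, y), count) tuple under its container tile."""
    (zoom, x, y), count = entry
    shift = zoom - CONTAINER_ZOOM                 # assumes zoom >= CONTAINER_ZOOM
    return ((CONTAINER_ZOOM, x >> shift, y >> shift), [[zoom, x, y, count]])

# one JSON resultset per container tile, ready to upload as a static blob
resultsets = heatmap.map(to_container) \
                    .reduceByKey(lambda a, b: a + b) \
                    .mapValues(json.dumps)
```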
Architecturally, it looks like this.
We use the slice architecture I talked about, together with heatmap deltas, to determine which heatmaps require updates.
We then push new heatmap resultsets out to blob storage for each of these.
And that’s it – we are trading off more storage for precomputed views against using computation to generate them on the fly – a tradeoff that you should in general look for in your projects.
One other thing that I glossed over when we described displaying trips:
as you remember, the application has an elevation graph associated with each trip.
But, also remember, our input dataset does not include elevation; this is exactly what the blob-triggered enrichment function we saw earlier adds.
We’ve worked with Guide Dogs for the Blind to build a device that uses data from Open Street Maps.
The app helps blind people:
* Discover where they are
* Learn what’s around them
* Navigate to locations
That said, while storage is becoming increasingly cheap…
… and while JSON is a fantastic format and very developer friendly…
… the traditional laws still apply: reading and writing data incurs latency.
So when you go to do a project like this for real, consider a binary data serialization format like Avro instead.
By establishing a schema, and using a binary serialization format, you can achieve on the order of a 60% size reduction.
This improves performance of deserializing and serializing the data from and to blob storage
It also, naturally, linearly cuts your data storage costs.
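For example, with a schema in hand, serializing the heatmap records might look like this (a sketch using fastavro, one common Python Avro library; the record shape is illustrative):

```python
from io import BytesIO
import fastavro

# declare the record schema up front; this is what makes the binary format compact
schema = fastavro.parse_schema({
    "name": "TileCount", "type": "record",
    "fields": [
        {"name": "zoom",  "type": "int"},
        {"name": "x",     "type": "int"},
        {"name": "y",     "type": "int"},
        {"name": "count", "type": "long"},
    ],
})

records = [{"zoom": 12, "x": 2180, "y": 1432, "count": 87}]   # illustrative data
buf = BytesIO()
fastavro.writer(buf, schema, records)
# buf.getvalue() is a compact binary payload, typically far smaller than the JSON equivalent
```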
Ok, that’s what I have for you today
I’ve open sourced a couple of things as part of this presentation that you should have a look at if you are interested in more details
Geotile
Heatmap
TileIndex: Azure Function for pushing tile indexes
Includes a sample dataset that you can work against
I will share these slides via Twitter, so follow me @timpark if you’d like to get a copy.
And with that, thank you for coming out today for the talk, and I’d be happy to take any questions…