Anecdotes about real-life usage of analytics. The research was done via Google search, so no claims of accuracy are made; please treat these as directional insights into the applications and benefits.
An Open Spatial Systems Framework for Place-Based Decision-Making (Raed Mansour)
Marynia Kolak, a PhD candidate at Arizona State University's GeoDa Center, presented on April 15, 2016, to the Chicago GIS in Public Health group at the Chicago Department of Public Health (CDPH). She presented the “Healthy Access, Health Regions” project, a collaboration between CDPH and the GeoDa Center at Arizona State. See the abstract below:
The “Healthy Access, Health Regions” project is a collaboration among the GeoDa Center, the Chicago Department of Public Health, and others to build a customized open-source web application for data integration, exploratory analysis, and decision-making. It seeks to push GIS to the frontiers of spatial data science, where space serves as the place for integrating research design and methodology, data infrastructure, and learning.
This project integrates data on the fly and works toward dynamic visualization and analysis in a spatial big-data infrastructure. Remotely managed resource and health-provider data are streamed into the application for analysis. Functions are encoded to evaluate service areas and explore socioeconomic and community health outcome data. Another component integrates an implementation of the max-p algorithm to develop data-driven regions for exploration and analysis. The next phase of development will better integrate dynamic analytics and simulation and enhance the user experience design. The application seeks not only to test the feasibility of data integration and analysis support, but also to serve as a collaboratively developed, community-driven structure.
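For context on the max-p step: the max-p-regions problem aggregates small areas into the maximum number of spatially contiguous regions such that each region clears a minimum threshold on some attribute (population, sample size, etc.). Below is a minimal greedy sketch of the idea in Python; it is not the project's implementation, and the adjacency, populations, and threshold are all hypothetical (production implementations, e.g. in PySAL, use much stronger heuristics).

```python
# Toy sketch of max-p-style regionalization: grow contiguous regions until
# each meets a minimum population threshold. Illustrative only.

THRESHOLD = 100  # hypothetical minimum population per region

# Hypothetical area adjacency and populations (e.g., census tracts).
adjacency = {
    "A": {"B", "C"}, "B": {"A", "D"}, "C": {"A", "D"},
    "D": {"B", "C", "E"}, "E": {"D"},
}
population = {"A": 40, "B": 70, "C": 30, "D": 55, "E": 90}

def grow_regions(adjacency, population, threshold):
    """Greedily grow contiguous regions until each passes the threshold."""
    unassigned = set(adjacency)
    regions = []
    while unassigned:
        seed = max(unassigned, key=population.get)  # start from largest area
        region, total = {seed}, population[seed]
        unassigned.remove(seed)
        # Expand with unassigned neighbors until the threshold is met.
        while total < threshold:
            frontier = {n for a in region for n in adjacency[a]} & unassigned
            if not frontier:
                break  # enclave: a real implementation merges it into a neighbor
            pick = max(frontier, key=population.get)
            region.add(pick)
            unassigned.remove(pick)
            total += population[pick]
        regions.append((region, total))
    return regions

for areas, total in grow_regions(adjacency, population, THRESHOLD):
    print(sorted(areas), total)
```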
Discussions on:
Dr. S. Gokula Krishnan, Associate Professor @NSM
Definition of Conflict
Transitions in Conflict Thought
Conflict Process
Conflict Management Techniques
Negotiation
Bargaining Strategies
The Negotiation Process
Reference:
Stephen P. Robbins, Timothy A. Judge & Neharika Vohra, Organizational Behaviour, 15th ed., pp. 477-502
Discussions on:
Dr. S. Gokula Krishnan, Associate Professor @NSM
Definition of Power
Bases of Power
Dependence: The Key to Power
Power Tactics
Politics: Power in Action
Causes and Consequences of Political Behavior
Reference:
Stephen P. Robbins, Timothy A. Judge & Neharika Vohra, Organizational Behaviour, 15th ed., pp. 439-466
Discussions on:
Disciplines Contributing to OB
Psychology
Sociology
Anthropology
Social Psychology
Economics & Political Science
Case Incident 2
Article 1
Reference:
Stephen P. Robbins, Timothy A. Judge & Neharika Vohra, Organizational Behaviour, 15th ed., pp. 14-16
Predictive Model and Record Description with Segmented Sensitivity Analysis (...) (Greg Makowski)
Describing a predictive data mining model can provide a competitive advantage when solving business problems with a model. The SSA approach can also provide reasons for the forecast for each record. This can help drive investigations into fields and interactions during a data mining project, as well as identify "data drift" between the original training data and the current scoring data. I am working on an open-source version of SSA, first in R.
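As a rough illustration of record-level sensitivity (the underlying idea, not Makowski's actual SSA algorithm): perturb each feature of one scored record and rank features by how far the model's prediction moves. Everything below (model, data, perturbation size) is an invented stand-in.

```python
# Sketch: per-record "reasons for the forecast" via one-at-a-time perturbation.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=500, n_features=6, random_state=0)
model = GradientBoostingClassifier(random_state=0).fit(X, y)

def record_sensitivity(model, X_train, record, delta=0.5):
    """Rank features by prediction shift when perturbed by delta * std."""
    base = model.predict_proba(record.reshape(1, -1))[0, 1]
    shifts = {}
    for j in range(record.size):
        perturbed = record.copy()
        perturbed[j] += delta * X_train[:, j].std()
        shifts[j] = model.predict_proba(perturbed.reshape(1, -1))[0, 1] - base
    return sorted(shifts.items(), key=lambda kv: -abs(kv[1]))

# Most influential features for one record's forecast.
for feature, shift in record_sensitivity(model, X, X[0])[:3]:
    print(f"feature {feature}: prediction shift {shift:+.3f}")
```

Comparing the distribution of such shifts between training-time and scoring-time data is one simple way to surface the "data drift" the abstract mentions.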
This project aims to help incoming students find suitable accommodation using the K-Means and DBSCAN clustering algorithms. The analysis is based on students' preferences for amenities, budget, and proximity to their preferred location. The data consists of accommodation details from various neighborhoods of the city.
The study utilized exploratory data analysis techniques, such as descriptive statistics, univariate visualization, and multivariate visualization, to gain insights into the dataset. K-Means and DBSCAN clustering algorithms were applied to classify the accommodation into different clusters based on the preferences of the students. The results showed that both algorithms successfully classified the accommodation into clusters, with K-Means providing a more structured clustering and DBSCAN being more flexible and able to detect outliers and noise.
Keywords: Exploratory Data Analysis, K-Means, DBSCAN, Machine Learning, Data Visualization, Data Cleaning, Student Accommodation, Geolocation, Geographic Information Systems, Evaluation.
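A minimal sketch of the comparison the abstract describes, assuming scikit-learn and invented accommodation features (rent, distance, amenity score); the study's real data and parameters are not reproduced here:

```python
# Sketch: compare K-Means (fixed k, structured partitions) with DBSCAN
# (density-based, flags outliers as label -1) on toy accommodation data.
import numpy as np
from sklearn.cluster import KMeans, DBSCAN
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
# Hypothetical features per listing: [monthly rent, km to campus, amenity score]
listings = np.vstack([
    rng.normal([500, 1.0, 7], [50, 0.3, 1], size=(40, 3)),  # near, mid-priced
    rng.normal([300, 5.0, 4], [40, 0.8, 1], size=(40, 3)),  # far, cheap
    [[2000, 0.2, 10]],                                      # luxury outlier
])
X = StandardScaler().fit_transform(listings)

kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
dbscan_labels = DBSCAN(eps=0.8, min_samples=5).fit_predict(X)

print("K-Means cluster sizes:", np.bincount(kmeans_labels))
print("DBSCAN outliers (label -1):", int(np.sum(dbscan_labels == -1)))
```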
Prognosis - An Approach to Predictive Analytics - Impetus White Paper (Impetus Technologies)
For Impetus’ White Papers archive, visit http://www.impetus.com/whitepaper
The paper discusses an implementation of behavioral targeting for the ad world. This is a statistical machine learning algorithm that helps select the most relevant ads to display to a web user based on their historical data.
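One common way to frame behavioral targeting of this kind is click-through-rate prediction: learn from a user's historical interactions, then rank candidate ads by predicted click probability. A generic, hedged sketch (the white paper's actual algorithm is not shown; all features and data below are invented):

```python
# Sketch: rank candidate ads by predicted click probability for one user.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Hypothetical historical data: [user interest match, ad quality, recency]
X_hist = rng.random((1000, 3))
clicks = X_hist @ np.array([2.0, 1.0, 0.5]) + rng.normal(0, 0.5, 1000) > 1.8

model = LogisticRegression().fit(X_hist, clicks)

candidate_ads = rng.random((5, 3))           # five ads to choose among
ctr = model.predict_proba(candidate_ads)[:, 1]
print("show ad:", int(np.argmax(ctr)), "predicted CTR:", round(float(ctr.max()), 3))
```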
Leverage Big Data Analytics to Enhance Clinical Trials from Planning to Execu... (Saama)
Nikhil Gopinath, Senior Solutions Engineer for the Life Sciences at Saama, spoke at EyeforPharma's Clinical Trial Innovation Summit event in February 2017. These slides are from his "Leverage Big Data Analytics to Enhance Clinical Trials from Planning to Execution" presentation.
Csit65111: ASSOCIATIVE REGRESSIVE DECISION RULE MINING FOR ASSOCIATIVE REGRESSI... (cscpconf)
Opinion mining, also known as sentiment analysis, involves customer satisfaction patterns, sentiments, and attitudes toward entities, products, services, and their attributes. With the rapid development of the Internet, potential customers provide a large volume of product/service reviews. These high volumes of customer reviews were processed through taxonomy-aware processing, but it was difficult to identify the best reviews. In this paper, an Associative Regression Decision Rule Mining (ARDRM) technique is developed to predict patterns for service providers and to improve customer satisfaction based on review comments. Associative Regression based Decision Rule Mining performs two steps to improve customer satisfaction. First, a Machine Learning Bayes Sentiment Classifier (MLBSC) is used to assign class labels to each service review. Then, the regression factor of the opinion words and the class labels are checked for associations between words using various probabilistic rules. Based on these rules, the effect of opinions and sentiments on customer reviews is analyzed to arrive at the specific set of services preferred by customers, together with their review comments. The associative regressive decision rules help the service provider make decisions about improving customer satisfaction. The experimental results reveal that the ARDRM technique improves performance in terms of true positive rate, associative regression factor, regressive decision rule generation time, and review detection accuracy for similar patterns.
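The first step of the pipeline described above is a Bayes-style sentiment classifier over review text. A minimal sketch of that step using scikit-learn's multinomial naive Bayes (the paper's MLBSC details are not reproduced; the tiny corpus is invented):

```python
# Sketch: assign sentiment class labels to service reviews with naive Bayes,
# the kind of first step the ARDRM pipeline describes.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

reviews = [
    "fast delivery and friendly support",
    "terrible service, long waiting time",
    "great value, satisfied with the product",
    "rude staff and broken item",
]
labels = ["positive", "negative", "positive", "negative"]

classifier = make_pipeline(CountVectorizer(), MultinomialNB())
classifier.fit(reviews, labels)

print(classifier.predict(["support was friendly and fast"]))  # likely ['positive']
```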
Associative Regressive Decision Rule Mining for Predicting Customer Satisfact... (csandit)
A Big Data Telco Solution by Dr. Laura Wynter (wkwsci-research)
Presented during the WKWSCI Symposium 2014
21 March 2014
Marina Bay Sands Expo and Convention Centre
Organized by the Wee Kim Wee School of Communication and Information at Nanyang Technological University
We describe a novel approach and a software framework for mobile crowdsensing applications with mobile agents that enables energy-efficient, robust, and scalable campaign execution. The framework is Web-enabled for integration with existing pervasive computing systems.
Building Predictive Analytics on Big Data Platforms (Olha Hrytsay)
SoftServe Innovation Conference in Austin, Texas 2013
Building Predictive Analytics on Big Data Platforms presented by Olha Hrytsay (BI Consultant) and Serhiy Shelpuk (Lead Data Scientist)
We are providing training on IEEE 2016-17 projects for Ph.D. scholars, M.Tech, B.E., MCA, BCA, and Diploma students of all branches for their academic projects.
For more details, call us or WhatsApp us @ 7676768124 or 9545252155
Email your base papers to "adritsolutions@gmail.co.in"
We are providing IEEE projects on
1) Cloud Computing, Data Mining, Big Data projects using Java
2) Image Processing and Video Processing (MATLAB), Signal Processing
3) NS2 (Wireless Sensor, MANET, VANET)
4) Android apps
5) Java, JEE, J2EE, J2ME
6) Mechanical Design projects
7) Embedded Systems and IoT projects
8) VLSI - Verilog projects (ModelSim and Xilinx using FPGA)
For more details, please visit us at
Adrit Solutions
Near Maruthi Mandir
#42/5, 18th Cross, 21st Main
Vijaynagar
Bangalore.
Machine Learning in 2016: Live Q&A with Carlos Guestrin (Turi, Inc.)
Live webinar session with Carlos Guestrin, Dato CEO and Amazon Professor of Machine Learning at the University of Washington. Carlos reviewed 2015 highlights, previewed the Dato roadmap, and answered real-time questions from participants about use cases, algorithms, and resources.
Tutorial for Machine Learning 101 (an all-day tutorial at Strata + Hadoop World, New York City, 2015)
The course is designed to introduce machine learning via real applications, like building a recommender and performing image analysis using deep learning.
In this talk we cover deployment of machine learning models.
Overview of Machine Learning and Feature Engineering (Turi, Inc.)
Machine Learning 101 Tutorial at Strata NYC, Sep 2015
Overview of machine learning models and features. Visualization of feature space and feature engineering methods.
Scalable tabular (SFrame, SArray) and graph (SGraph) data structures built for out-of-core data analysis.
The SFrame package provides the complete implementation of:
SFrame
SArray
SGraph
The C++ SDK surface area (gl_sframe, gl_sarray, gl_sgraph)
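A small usage sketch, assuming the open-source sframe Python package; the API is shown from memory and the data is made up, so treat it as indicative rather than authoritative:

```python
# Sketch: out-of-core tabular work with SFrame; data here is invented.
import sframe

sf = sframe.SFrame({"user": [1, 2, 3], "rating": [4.0, 3.5, 5.0]})
print(sf[sf["rating"] >= 4.0])   # filter without loading all rows into RAM
print(sf["rating"].mean())       # columnar aggregate on an SArray
```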
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo... (James Anderson)
Effective Application Security in the Software Delivery Lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. A constant focus on speed to release software to market, combined with traditionally slow and manual security checks, has caused gaps in continuous security, an important piece of the software supply chain. Today, organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their application supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerabilities and security breaches. This needs to be achieved with existing toolchains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with a passion for making things work and a knack for helping others understand how things work. He has around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations on CI/CD and application security integrated into the software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs (Alex Pruden)
This paper presents Reef, a system for generating publicly verifiable succinct non-interactive zero-knowledge proofs that a committed document matches or does not match a regular expression. We describe applications such as proving the strength of passwords, the provenance of email despite redactions, the validity of oblivious DNS queries, and the existence of mutations in DNA. Reef supports the Perl Compatible Regular Expression syntax, including wildcards, alternation, ranges, capture groups, Kleene star, negations, and lookarounds. Reef introduces a new type of automata, Skipping Alternating Finite Automata (SAFA), that skips irrelevant parts of a document when producing proofs without undermining soundness, and instantiates SAFA with a lookup argument. Our experimental evaluation confirms that Reef can generate proofs for documents with 32M characters; the proofs are small and cheap to verify (under a second).
Paper: https://eprint.iacr.org/2023/1886
A tale of scale & speed: How the US Navy is enabling software delivery from l... (sonjaschweigert1)
Rapid and secure feature delivery is a goal across every application team and every branch of the DoD. The Navy’s DevSecOps platform, Party Barge, has achieved:
- Reduction in onboarding time from 5 weeks to 1 day
- Improved developer experience and productivity through actionable findings and reduction of false positives
- Maintenance of superior security standards and inherent policy enforcement with Authorization to Operate (ATO)
Development teams can ship efficiently and ensure applications are cyber ready for Navy Authorizing Officials (AOs). In this webinar, Sigma Defense and Anchore will give attendees a look behind the scenes and demo secure pipeline automation and security artifacts that speed up application ATO and time to production.
We will cover:
- How to remove silos in DevSecOps
- How to build efficient development pipeline roles and component templates
- How to deliver security artifacts that matter for ATOs (SBOMs, vulnerability reports, and policy evidence)
- How to streamline operations with automated policy checks on container images
Epistemic Interaction - tuning interfaces to provide information for AI support (Alan Dix)
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
Accelerate your Kubernetes clusters with Varnish Caching (Thijs Feryn)
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
Essentials of Automations: The Art of Triggers and Actions in FME (Safe Software)
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
DevOps and Testing slides at DASA Connect (Kari Kakkonen)
My and Rik Marselis's slides from the 30.5.2024 DASA Connect conference. We discuss what testing is, then what agile testing is, and finally what testing in DevOps is. We closed with a lovely workshop in which participants tried to find different ways to think about quality and testing in different parts of the DevOps infinity loop.
Welcome to the first live UiPath Community Day Dubai! Join us for this unique occasion to meet our local and global UiPath Community and leaders. You will get a full view of the MEA region's automation landscape and the AI-powered automation technology capabilities of UiPath. Also, hosted by our local partner Marc Ellis, you will enjoy a half-day packed with industry insights and networking with automation peers.
📕 Curious on our agenda? Wait no more!
10:00 Welcome note - UiPath Community in Dubai
Lovely Sinha, UiPath Community Chapter Leader, UiPath MVPx3, Hyper-automation Consultant, First Abu Dhabi Bank
10:20 A UiPath cross-region MEA overview
Ashraf El Zarka, VP and Managing Director MEA, UiPath
10:35: Customer Success Journey
Deepthi Deepak, Head of Intelligent Automation CoE, First Abu Dhabi Bank
11:15 The UiPath approach to GenAI with our three principles: improve accuracy, supercharge productivity, and automate more
Boris Krumrey, Global VP, Automation Innovation, UiPath
12:15 Discover how Marc Ellis leverages tech-driven solutions in recruitment and managed services
Brendan Lingam, Director of Sales and Business Development, Marc Ellis
Climate Impact of Software Testing at Nordic Testing Days (Kari Kakkonen)
My slides at Nordic Testing Days, 6.6.2024.
The talk discusses the climate impact and sustainability of software testing. ICT and testing must carry their part of the global responsibility to help with climate warming. We can minimize the carbon footprint, but we can also have a carbon handprint, a positive impact on the climate. Quality characteristics can be extended with sustainability and then measured continuously. Test environments can be used less, at a smaller scale, and on demand. Test techniques can be used to optimize or minimize the number of tests. Test automation can be used to speed up testing.
Elevating Tactical DDD Patterns Through Object Calisthenics (Dorra BARTAGUIZ)
After immersing yourself in the blue book and its red counterpart, attending DDD-focused conferences, and applying tactical patterns, you're left with a crucial question: How do I ensure my design is effective? Tactical patterns within Domain-Driven Design (DDD) serve as guiding principles for creating clear and manageable domain models. However, achieving success with these patterns requires additional guidance. Interestingly, we've observed that a set of constraints initially designed for training purposes remarkably aligns with effective pattern implementation, offering a more ‘mechanical’ approach. Let's explore together how Object Calisthenics can elevate the design of your tactical DDD patterns, offering concrete help for those venturing into DDD for the first time!
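As one concrete illustration of the overlap (my example, not the speaker's): the Object Calisthenics rule "wrap all primitives and strings" naturally produces the DDD value object pattern, with invariants attached to the type instead of scattered through the code. A sketch in Python:

```python
# Sketch: wrapping a bare float in a value object, so the domain rule
# ("money is non-negative, same-currency arithmetic only") lives in one place.
from dataclasses import dataclass

@dataclass(frozen=True)  # immutability, as value objects require
class Money:
    amount: float
    currency: str

    def __post_init__(self):
        if self.amount < 0:
            raise ValueError("Money cannot be negative")

    def add(self, other: "Money") -> "Money":
        if other.currency != self.currency:
            raise ValueError("Cannot add different currencies")
        return Money(self.amount + other.amount, self.currency)

price = Money(19.50, "EUR")
total = price.add(Money(5.25, "EUR"))
print(total)  # Money(amount=24.75, currency='EUR')
```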
6. “I have been constantly surprised at how little quantitative information can be brought to bear on fundamental policy questions [...] This experience illustrates the need for flexibility in data collection, especially when policymakers consider extending new policies or need to evaluate them in real time for other reasons. Ideally, some sort of ‘rapid response’ data gathering capacity.”
— Alan Krueger, “Stress Testing Economic Data”
7. “The collection of statistics needs to be modernized; it is time to use the new technologies to start collecting data. …particularly important in developing countries where the prevalence of mobile phones now offers an unprecedented opportunity to measure the economy.”
— Diane Coyle, “GDP”
10. “However, at this moment in survey research, uncertainty reigns. Participation rates in household surveys are declining throughout the developed world. Surveys seeking high response rates are experiencing crippling cost inflation. Traditional sampling frames that have been serviceable for decades are fraying at the edges.”
— Robert Groves, “Three Eras of Survey Research”
26. [Platform diagram] The Premise platform connects an end user and a pool of data contributors through five stages: survey campaign, allocation, collection, quality control, and analytics.
- Survey campaign: the user poses a question that is best answered via actual, on-the-ground observation at scale. The question is translated into an internal “specification” of the data points needed to answer it: type, location, frequency, coverage, etc.
- Allocation: the inventory of data points is automatically allocated to the data contributor pool, taking into account budget, agent profiles, and geography. Data points are dynamically priced.
- Collection: contributors collect data in the field using Android phones, which send it back to the Premise network.
- Quality control: QC is a mix of automated checks (outlier detection; machine learning; computer vision) and manual checks (directed sampling using oDesk); see the outlier-check sketch below.
- Analytics: automated capabilities to explore data and expose trends or patterns, hypothesize new features to explain variation, suggest specification refinements, and improve automated verification.
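A minimal sketch of one such automated QC check, assuming submitted price observations and a simple robust z-score rule (thresholds and data are invented, not Premise's actual pipeline):

```python
# Sketch: flag submitted observations whose price deviates strongly from the
# median for the same item/location, one automated QC check among several.
import numpy as np

def robust_z(values):
    """Median/MAD-based z-scores, less sensitive to the outliers themselves."""
    values = np.asarray(values, dtype=float)
    med = np.median(values)
    mad = np.median(np.abs(values - med)) or 1e-9  # avoid division by zero
    return 0.6745 * (values - med) / mad

submitted_prices = [2.10, 2.15, 2.05, 2.20, 9.99, 2.12]  # one suspicious entry
flags = np.abs(robust_z(submitted_prices)) > 3.5         # common cutoff
print([p for p, f in zip(submitted_prices, flags) if f])  # -> [9.99]
```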
30. Average wait times are roughly 10 minutes longer in Maracaibo than in Caracas. Police are present ~80% of the time in Maracaibo, but only 30-40% of the time in Caracas.
[Slides 32-35 repeat the platform diagram from slide 26, highlighting each stage in turn: the full pipeline, allocation, analytics, and quality control.]
59. Exploration vs Survey Consistency
- Campaign layers: separate discovery and survey
- Iteratively refine attribute and geospatial targeting
- Monitor correlation in item responses and the appearance of new attributes (see the sketch after this list)
- Monitor residual endogeneity
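One way to operationalize "monitor correlation in item responses" (my sketch, not Premise's code): track the pairwise correlation matrix of item responses per collection window and alert when it drifts. The data and threshold below are invented.

```python
# Sketch: alert when the correlation structure of item responses drifts
# between two collection windows, a simple survey-consistency monitor.
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
cols = ["bread_price", "milk_price", "wait_time"]
window_a = pd.DataFrame(rng.normal(size=(200, 3)), columns=cols)
window_b = window_a * 1.02 + rng.normal(scale=0.1, size=(200, 3))  # mild drift

corr_a, corr_b = window_a.corr(), window_b.corr()
drift = (corr_a - corr_b).abs().to_numpy().max()
print(f"max pairwise correlation drift: {drift:.3f}")
if drift > 0.2:  # invented alert threshold
    print("responses may be drifting; review targeting/spec")
```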
62. Coalitions vs Referrals
- Referrals are necessary to reach the most remote areas
- However, we need to be able to partition the Premise graph into independent subnetworks, e.g. for re-evaluation, experimentation, and sample stratification (a minimal partitioning sketch follows below)
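A minimal sketch of that partitioning step, assuming a referral graph held in networkx and community detection as the splitting criterion (one plausible choice; the deck does not say which method Premise uses):

```python
# Sketch: partition a contributor referral graph into subnetworks that can be
# re-evaluated or assigned to experiments independently.
import networkx as nx

# Hypothetical referral edges: (referrer, referred)
referrals = [("a", "b"), ("a", "c"), ("b", "d"), ("x", "y"), ("y", "z")]
G = nx.Graph(referrals)

# Louvain community detection as one way to get independent subnetworks.
subnetworks = nx.community.louvain_communities(G, seed=0)
for i, members in enumerate(subnetworks):
    print(f"subnetwork {i}: {sorted(members)}")
```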
63. CONTRIBUTOR AFFINITY MODEL
Model features: direct referral, account features, upload location, visit histories, geographic area, response correlation.
Issues: bootstrapping affinity scores for new users; the optimal scheduler is antagonistic to coalition discovery.
Sampling from Large Graphs [Leskovec & Faloutsos, 2006]
64. RECAP
- Orchestrating collective intelligence
- Optimizing task allocation via dynamic scheduling and incentives
- Exploration and discovery while maintaining survey consistency
- Fraud and coalition formation in networks
67. “The problem of changing statistics is that you lose the ability to compare across time. The longer the time-series, the harder it is to change it, but you want to be able to compare. How do you replace GDP? And if you do, you lose the past sixty years of relevance. This has been a problem for centuries—take the Spanish silver trade. Anything you measure will become increasingly irrelevant over time.”
— Hans Rosling
[Zachary Karabell, The Leading Indicators]
70. “You need to focus on quality. You’ll be better off with a small but carefully structured sample rather than a large sloppy sample.”
— Hal Varian, Google