The document discusses controlled experimentation (A/B testing) as a method to study the effects of treatments on users. It notes that experiments randomly divide users into a control and treatment group, with the only difference being the treatment evaluated. Performance metrics are collected and statistically analyzed to determine if any differences are due to the treatment or random chance. Examples of experiments include variations to website design, mobile calls to action, and personalization algorithms. Key aspects of experimentation platforms include hashing to randomly assign users, detailed logging, metrics dashboards, and ensuring control and treatment groups are identical. The document emphasizes measuring overall impact beyond just segments under treatment.
2. Controlled Experimentation (A/B Testing)
• Method to study effects of a treatment
• Concept:
- Randomly split users into two groups
➥ A: Control
➥ B: Treatment
- A and B are identical to each other except for the treatment being evaluated
- Collect performance metrics from the experiment
- Run statistical tests to determine if differences between A and B are purely by chance
[Diagram: Randomly Divide → A (Control) / B (Treatment) → Measure & Evaluate]
Controlled Experimentation Panel
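The last step on this slide, testing whether a difference between A and B is purely by chance, can be sketched with a two-proportion z-test on conversion counts. This is only one common choice of test; the slides do not say which test was used, and the function name and the counts below are made up for illustration.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two_sided_p) for the difference in conversion rate, B minus A."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that A and B convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical counts: 10,000 users per group, 500 vs. 560 conversions.
z, p = two_proportion_z_test(conv_a=500, n_a=10_000, conv_b=560, n_b=10_000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value suggests the difference is unlikely to be due to chance alone; the threshold (0.05 is a common convention) has to be fixed before the experiment starts.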
3. Why Run Controlled Experiments?
• Commonly used approach in clinical trials
- What is the effect of a particular drug / treatment?
• Systematically validate hypotheses with data
• Concurrently run the treatment and control
- The difference (if any) is
➥ Because of the treatment OR
➥ Due to random chance
• Determine if a treatment is causal in nature
- E.g., making the search box bigger causes an increase in queries / user
4. Controlled Experimentation: Use Cases
Website Variants
[Figure: variants A and B of a "Stract Widget Company" page, each with a BUY NOW button]
8. Controlled Experimentation: Use Cases
Custom Defined User Segments
• Follow-up message for users who previously clicked on an ad
• Incentive campaign to re-engage lapsed users
• Think of this as placing filters / guards on a randomly chosen user population
9. Key Components of an Experimentation Platform
Hashing function
[Diagram: F( ) → Group 0, Group 1, ..., Group N-1]
Metrics – suite of KPI
[Examples: Revenue, Time Spent, Abandonment, Click-Through Rate, Session Length, Purchase Rate]
Logging
• Detailed logging of all user interactions
Dashboard
• Metric improvements and Statistical Significance in a central place
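The hashing component can be sketched as a deterministic function from a user to one of N groups. The slide only shows F( ) mapping to Group 0 through Group N-1; the choice of SHA-256, the idea of salting with an experiment name, and all identifiers below are assumptions made for illustration.

```python
import hashlib

def assign_group(user_id: str, experiment: str, num_groups: int) -> int:
    """Deterministically map a user to one of num_groups buckets."""
    # Salting with the experiment name (an assumption) keeps assignments
    # in different experiments independent of each other.
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Read the first 8 bytes as an unsigned integer, then reduce modulo N.
    return int.from_bytes(digest[:8], "big") % num_groups

# Rough uniformity check over hypothetical user ids.
counts = [0] * 4
for i in range(100_000):
    counts[assign_group(f"user-{i}", "bigger-search-box", 4)] += 1
print(counts)
```

Because the assignment depends only on the user id and the experiment name, a returning user always lands in the same group without any stored state.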
10. Ensure Identical Control and Treatment
• Custom Segments
• Frequency Distribution
• Large Difference in Prior Exposure Rate violates δ% assumptions
[Charts: Control vs. Treatment frequency distributions for Gender (Male, Female), Region Size (Small, Medium, Large), and Prior Exposure (No, Yes)]
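One way to compare the frequency distributions of control and treatment is a Pearson chi-square test on the category counts. This is a stand-in technique, not necessarily the check behind the slide's δ% assumption, and the counts are invented for illustration.

```python
# Chi-square critical values at alpha = 0.05, keyed by degrees of freedom.
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815}

def chi_square_stat(control, treatment):
    """Pearson chi-square statistic for a 2 x K table of category counts."""
    total_c, total_t = sum(control), sum(treatment)
    grand = total_c + total_t
    stat = 0.0
    for c, t in zip(control, treatment):
        column = c + t
        expected_c = column * total_c / grand
        expected_t = column * total_t / grand
        stat += (c - expected_c) ** 2 / expected_c
        stat += (t - expected_t) ** 2 / expected_t
    return stat

def is_balanced(control, treatment):
    """True if the two groups are not significantly different at alpha = 0.05."""
    df = len(control) - 1
    return chi_square_stat(control, treatment) < CRITICAL_05[df]

# Hypothetical counts per category: (Small, Medium, Large) and (No, Yes).
region_size_ok = is_balanced([3000, 5000, 2000], [3050, 4980, 1970])
prior_exposure_ok = is_balanced([7000, 3000], [6000, 4000])
print(region_size_ok, prior_exposure_ok)
```

In this made-up data the region-size split passes, while the prior-exposure split does not, which is the kind of imbalance the slide warns about.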
11. A/A Tests
• Run an experiment with two identical variants
• Helps to determine if:
- Users are being split uniformly at random
- Correct data is being logged
- Variance between identical populations of users is acceptable
• Challenge:
- A few purchases of high-value deals can produce a statistically significant difference between treatment and control
[Example deal: SPAIN TRIP $1,999]
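The challenge with high-value deals can be illustrated with a small simulation: split the same population into two random halves many times and look at how much the revenue-per-user difference varies. The population sizes and prices are hypothetical, apart from the $1,999 price taken from the slide's example.

```python
import random
import statistics

def aa_differences(revenues, runs, rng):
    """Mean revenue per user of half A minus half B over repeated random splits."""
    half = len(revenues) // 2
    diffs = []
    for _ in range(runs):
        shuffled = revenues[:]
        rng.shuffle(shuffled)
        mean_a = statistics.fmean(shuffled[:half])
        mean_b = statistics.fmean(shuffled[half:])
        diffs.append(mean_a - mean_b)
    return diffs

rng = random.Random(7)  # fixed seed so the run is repeatable

# 10,000 users, 5% of whom buy a $20 deal.
base = [20.0] * 500 + [0.0] * 9_500
# Same population, except 5 non-buyers now buy a $1,999 deal.
with_big = base[:-5] + [1999.0] * 5

spread_base = statistics.pstdev(aa_differences(base, 200, rng))
spread_big = statistics.pstdev(aa_differences(with_big, 200, rng))
print(f"spread without big deals: {spread_base:.3f}")
print(f"spread with big deals:    {spread_big:.3f}")
```

Five purchases out of 10,000 users are enough to widen the A/A spread considerably, so a revenue difference that looks significant may only reflect which group the big spenders landed in.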
12. Monitor Each Variant
• Place yourself in each variant to validate the experience
• Wrong sort order!
Carefully inspect each variant
13. Objective Function
Conversion: P(conversion)
• Favors lower price deals
Revenue: E(rev) = P(conversion) * price
• More expensive deals can dominate
Need to balance multiple, often conflicting objectives
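A small worked example shows how the two objectives rank the same deals differently. The deals, prices, and conversion probabilities are invented, and the weighted blend at the end is just one hypothetical way to balance the objectives; the slides do not say how the balance was struck.

```python
# Hypothetical deals with made-up prices and conversion probabilities.
deals = [
    {"name": "coffee", "price": 5.0, "p_conversion": 0.08},
    {"name": "spa day", "price": 60.0, "p_conversion": 0.02},
    {"name": "Spain trip", "price": 1999.0, "p_conversion": 0.001},
]

def expected_revenue(deal):
    """E(rev) = P(conversion) * price, as on the slide."""
    return deal["p_conversion"] * deal["price"]

best_by_conversion = max(deals, key=lambda d: d["p_conversion"])
best_by_revenue = max(deals, key=expected_revenue)

# Assumed blend: normalize each objective to [0, 1] and take a weighted sum.
max_p = max(d["p_conversion"] for d in deals)
max_rev = max(expected_revenue(d) for d in deals)

def blended_score(deal, weight=0.5):
    conversion_part = deal["p_conversion"] / max_p
    revenue_part = expected_revenue(deal) / max_rev
    return weight * conversion_part + (1 - weight) * revenue_part

ranking = [d["name"] for d in sorted(deals, key=blended_score, reverse=True)]
print(best_by_conversion["name"], best_by_revenue["name"], ranking)
```

Optimizing conversion alone picks the cheapest deal and optimizing expected revenue alone picks the most expensive one, which is the conflict the slide describes.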
14. Measure Overall Impact
• Test focuses on
- A particular area of the website
- A sub-population of users
• Measure
- Improvement on the sub-segment AND
- Entire population
Measure overall impact to guard against cannibalization
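The difference between sub-segment and overall impact can be shown with a short calculation. The revenue-per-user figures are hypothetical and chosen so that gains in the tested area are mostly offset elsewhere on the site.

```python
def lift(control, treatment):
    """Relative change of the treatment value over the control value."""
    return (treatment - control) / control

# Hypothetical revenue per user, split by where on the site it was earned.
control = {"tested_area": 2.00, "rest_of_site": 8.00}
treatment = {"tested_area": 2.40, "rest_of_site": 7.65}

segment_lift = lift(control["tested_area"], treatment["tested_area"])
overall_lift = lift(sum(control.values()), sum(treatment.values()))
print(f"segment: {segment_lift:+.1%}, overall: {overall_lift:+.1%}")
```

Here the tested area improves by about 20% while the site as a whole improves by only about 0.5%, because most of the gain was cannibalized from the rest of the site.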
16. Acknowledgements
Thanks to the many talented individuals at Groupon I am privileged to work with!
• Data Science
• Engineering
• Marketing / Market Research