Event data may be the fastest-growing form of data most people have never heard of. At its core, event data is simply any data point that has a timestamp, an entity, and the attributes of an action. By analyzing event data, we can better understand user behavior.
Table of Contents
• What is Event Data?
• Breaking Down Event Data
• What Makes Event Data Different?
• Where Does Event Data Come From?
• Analysis Perfect for Event Data
• Challenges of Event Data
• Summary
Understanding “Event” in Event Data

What is Event Data?
By definition, event data is data from “Any identifiable occurrence that has significance for system hardware or software. User-generated events include keystrokes and mouse clicks, among a wide variety of other possibilities.” An event describes an action performed by or associated with an entity at a certain time. Event data is a continuous stream of actions that reveals the patterns of events people, products, and machines make over time. It helps describe when and how things happen. Event data is the foundation for behavioral analytics, enabling understanding of how customers behave and how products are used.
Event data is simply any data point that has a timestamp, entity, and attributes
of an action. As simple as that sounds, events are at the heart of many
companies’ business. Clickstreams, logs, data from IoT devices, sensor data, and
more are all event data. A mouse click is an event; it happens at a point in time
and its context includes attributes such as where the entity clicked and what was
clicked.
Analysis of event data is based on key concepts about chronologically ordered
data and its relationship to the world. For example, event data is generated by
an entity who follows a path through a conversion flow, taking action at certain
points along the way. If we examine the events of all entities that went through
the conversion flow, we can understand their behavior and start to answer
questions such as:
• What are the characteristics of entities that converted or dropped off?
• Which entities took longer to convert, and why?
• What happened between each step of the conversion flow?
Breaking Down Event Data
So what does event data look like? Each piece of event data has three key pieces
of information: a timestamp, one or more entities, and attributes.
• Timestamp: Just like it sounds, it records at what point in time the action took
place.
• Entity: Who took the action. This could be a person, machine, sensor, etc.
• Attributes: These are inherent characteristics that describe what happened,
like a click or a call. The more properties and information captured here, the
richer the data.
Here is a simple example of an event captured on a website in JSON:
{
  "timestamp": "2015-06-30T13:50:00-0600",
  "id": "05632",
  "attributes": {
    "type": "click",
    "page": "request_demo",
    "previous_page": "product_tour",
    "session_length": "1060",
    "browser": "chrome",
    "ip_address": "10.0.0.1",
    "ip_region": "united states",
    "ip_state": "california",
    "ip_city": "san francisco"
  }
}
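To make the three key pieces concrete, here is a minimal sketch (in Python, reusing the field names from the example above) of pulling the timestamp, entity, and attributes out of that JSON:

```python
import json

# The example event above as valid JSON. Note that JSON requires straight
# quotes, and since 2015-06-31 is not a real date, June 30 is used here.
raw = """
{"timestamp": "2015-06-30T13:50:00-0600", "id": "05632",
 "attributes": {"type": "click", "page": "request_demo",
                "previous_page": "product_tour", "session_length": "1060",
                "browser": "chrome", "ip_address": "10.0.0.1",
                "ip_region": "united states", "ip_state": "california",
                "ip_city": "san francisco"}}
"""

event = json.loads(raw)
timestamp = event["timestamp"]        # when the action took place
entity = event["id"]                  # who took the action
attributes = event["attributes"]      # what happened, with context
```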
Let's take this one step further and explore a conversion flow for an e-commerce site. Let's look at some high-level events in the flow:
• Event #1: Shopper D (the entity) follows a link from your advertisement on
a 3rd party website
• Event #2: Views a suggested item on your site using the quick-view feature
• Event #3: Views your sizing guide
• Event #4: Selects the sweater shown in the advertisement
• Event #5: Selects size large
• Event #6: Checks out with a credit card
Each of these events can be represented by a different shaped marker on a
timeline.
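The six events above can be sketched as a time-ordered list of records; the timestamps here are invented for illustration:

```python
# Shopper D's conversion flow as timestamped event records
# (timestamps and event-type names are illustrative assumptions).
flow = [
    {"t": "2015-06-30T13:50:00", "entity": "shopper_d", "type": "ad_click"},
    {"t": "2015-06-30T13:51:10", "entity": "shopper_d", "type": "quick_view"},
    {"t": "2015-06-30T13:52:05", "entity": "shopper_d", "type": "view_sizing_guide"},
    {"t": "2015-06-30T13:53:30", "entity": "shopper_d", "type": "select_item"},
    {"t": "2015-06-30T13:53:45", "entity": "shopper_d", "type": "select_size"},
    {"t": "2015-06-30T13:55:00", "entity": "shopper_d", "type": "checkout"},
]

# Event data is naturally ordered by its timestamps.
assert flow == sorted(flow, key=lambda e: e["t"])
```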
Each event above has several important attributes. Some attributes of Event #1
above are:
• The timestamp: exactly when the shopper clicked through to the site (when)
• The entity (Shopper D)
• The session ID (this is context, or the how: the event happened within a defined session)
• The advertisement location (more about how the event happened)
• The item pictured in the ad (another attribute that provides context)
Attributes of event #2 (views a suggested item) include:
• The timestamp: exactly when the shopper viewed the suggested item
(when)
• The entity (again, Shopper D)
• The session ID (how)
• The item viewed (context)
What Makes Event Data Different?
Event Data is Attribute-Rich
Event data can have hundreds of attributes that describe each event. Because
we use event data to discover behavior patterns, we want to have the full context
for every event. Every attribute we store is context we can analyze; this makes
event data rich. For “Shopper D” in the example above, we can store attributes
like first and last names, birth date, gender, favorite color, home town, and
preferred payment method. Then we could define a cohort of shoppers who
are over 50 and whose hometown is New York, and follow their behavior over
time. Another reason events can have hundreds of properties is that they may
describe not just one entity, but multiple entities involved in a single event. The
attributes of each entity become part of the event data. For every transaction on
an e-commerce site there may be a supplier, a vendor, a shopper and a 3rd party
payer (credit card company, PayPal), any of whom may participate in a given
event during the transaction.
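As a sketch of how rich attributes enable cohorts, the "over 50, hometown New York" cohort described above could be defined like this (the records and values are hypothetical):

```python
# Hypothetical shopper records with stored attributes.
shoppers = [
    {"id": "05632", "age": 34, "home_town": "san francisco"},
    {"id": "07114", "age": 57, "home_town": "new york"},
    {"id": "08220", "age": 61, "home_town": "new york"},
]

# Cohort: shoppers over 50 whose hometown is New York.
cohort = [s["id"] for s in shoppers
          if s["age"] > 50 and s["home_town"] == "new york"]
```

Once the cohort is defined, every event whose entity is in it can be followed over time.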
Event Data is Massive
For most companies, event data is their fastest-growing type of data. But why is it so big?
Event data captures the actions that an entity takes over time, so for every one
entity, you could have tens of thousands of actions. Imagine a popular wearables
company with hundreds of thousands of devices in the market. Each wearable
device could generate thousands of rows of event data daily, quickly adding up to
billions of events in just a short period of time.
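The back-of-the-envelope arithmetic for the wearables example looks like this (the device and event counts are illustrative assumptions, not measurements):

```python
# Illustrative scale estimate for the wearables example.
devices = 300_000                    # "hundreds of thousands of devices"
events_per_device_per_day = 2_000    # "thousands of rows ... daily"

events_per_day = devices * events_per_device_per_day
# At 600 million events per day, the store passes a billion events
# in just two days.
```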
Event Data is Denormalized
In an event data store, data is structured but never normalized. This is unlike
a relational database, in which redundant data is normalized and referenced
from a single location in a single table. Every time a value changes, the previous
value is overwritten and only the last update is available. But, when we analyze
event data, we want to know the state of the world at the moment of the event.
For example, imagine storing data from an anemometer, which measures
windspeed. The meter takes a reading every 30 seconds, and the windspeed
value is automatically updated in the weather database. In this case, we will
always know how fast the wind was blowing in the last 30 seconds, but we will
never know how the windspeed has changed over the last hour. This is why,
in an event data store, data is always appended and never updated. Every
“windspeed” event is stored permanently. For a weather station that measures
not just windspeed but also temperature, humidity, barometric pressure and
precipitation, every attribute is stored for every sensor reading. Only when event
data is denormalized can we use it to find patterns and gain insight into change
over time.
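The difference between update-in-place and append-only storage can be sketched as follows (a toy comparison, not any particular database's API):

```python
# Three anemometer readings, 30 seconds apart (illustrative values).
readings = [("2015-06-30T12:00:00", 14.2),
            ("2015-06-30T12:00:30", 15.1),
            ("2015-06-30T12:01:00", 13.8)]

# Update-in-place (normalized): each new reading overwrites the last,
# so only the most recent windspeed survives and history is lost.
latest = {}
for ts, speed in readings:
    latest["windspeed"] = speed

# Append-only (event store): every reading is kept, so change over
# time is fully recoverable.
events = []
for ts, speed in readings:
    events.append({"timestamp": ts, "sensor": "anemometer_1",
                   "windspeed": speed})
```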
Event Data Can Be Schemaless
As mentioned earlier, different types of events and even individual events of
the same type may have different numbers of attributes. In other words, the
data does not necessarily follow a particular schema. Since event data may be
schemaless or adhere loosely to a schema, storing event data does not require
a declared schema and accepts any number of attributes per event. A time
attribute and an entity attribute are required for each event; any other attributes
can be arbitrary. For example, while a group is running, their activity trackers could record four attributes: distance, stride length, heart rate, and speed. But when they start to walk, their trackers may capture only two attributes: heart rate and stride length.
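In a schemaless store, the two tracker events simply carry different attribute sets; only the timestamp and entity are required (a hypothetical sketch):

```python
# Two events from the same hypothetical activity tracker, with
# different attribute sets depending on the activity.
run_event = {"timestamp": "2015-06-30T07:00:00", "entity": "tracker_42",
             "distance": 5.2, "stride_length": 1.1,
             "heart_rate": 148, "speed": 11.3}

walk_event = {"timestamp": "2015-06-30T08:00:00", "entity": "tracker_42",
              "heart_rate": 92, "stride_length": 0.7}

# Only the time and entity attributes are required for every event.
for e in (run_event, walk_event):
    assert "timestamp" in e and "entity" in e
```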
Event Data is Connected by Time
Event data has a native concept of time and illustrates the connections between
related events in a specified time period. This makes it easy to combine multiple
data streams, because they all have time in common. For example, three
separate data streams from mobile logs, web logs, and purchase history have
time as a common reference and can thus be merged into a single source for
even richer insights.
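Merging streams on their shared time axis can be sketched as an ordered merge (the stream contents here are invented):

```python
import heapq

# Three hypothetical streams, each already sorted by timestamp.
mobile = [("2015-06-30T09:00:00", "app_open"),
          ("2015-06-30T09:05:00", "push_tap")]
web = [("2015-06-30T09:02:00", "page_view")]
purchases = [("2015-06-30T09:06:00", "order_placed")]

# Because every record carries a timestamp, the streams interleave
# into a single time-ordered stream.
merged = list(heapq.merge(mobile, web, purchases))
```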
Where Does Event Data Come From?
Event data is everywhere and produced in just about every company today.
Remember, it is produced from the actions and interactions people or machines
have with applications and products such as:
• Websites
• Servers
• Sensors
• Automobiles
• Home/Building Automation
• Wearables
• Smart Appliances
• Connected Electronics
• Call Detail Records
Engineers and developers can capture just about any action or interaction
that is made by an application, product, or machine. It is stored in files such as
clickstreams and logs.
Analysis Perfect for Event Data
Several kinds of analysis are a natural fit for event data:
• Root Cause – Examines what precipitates an event and is often used to solve
problems or identify catalysts. Focuses on why an event happened.
• A/B Testing – A form of hypothesis testing with two variants to show how
they are similar or how they differ. Experiment results frequently inform
product direction.
• Growth – Uncovers what and how entities are communicating/interacting
with products and services so that businesses can use this information to
develop ways to foster growth of the business.
• Retention – Reveals how often something is used and how often the entity
returns over time. Often, this is explored by tracking a rate across different
entity groups.
• Conversion – Tracks how entities move through a pre-determined path and locates where along the path each entity takes an action. Funnels are the typical tool used in this process.
• Engagement – Method for looking at how much an entity is using a product
or service. Typical metrics used are average session length, daily/weekly/
monthly active use.
• Churn – Commonly known as attrition, turnover, or defection, churn is the measurement of the likelihood of an entity disengaging. In addition to this probability, churn analysis pinpoints exactly where (in the usage flow) and when (in time) disengagement happens.
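As a sketch of conversion analysis, a minimal funnel counts how many entities reach each step of the path in order (the entities, paths, and step names are invented for illustration):

```python
# Funnel steps for a hypothetical conversion flow.
steps = ["ad_click", "select_item", "checkout"]

paths = {  # entity -> that entity's event types, in time order
    "shopper_a": ["ad_click", "quick_view", "select_item", "checkout"],
    "shopper_b": ["ad_click", "select_item"],
    "shopper_c": ["ad_click"],
}

def reaches(path, prefix):
    """True if every step in `prefix` appears in `path`, in order."""
    it = iter(path)
    return all(step in it for step in prefix)  # `in` consumes the iterator

# counts[i] = number of entities that reached step i (in order).
counts = [sum(reaches(p, steps[:i + 1]) for p in paths.values())
          for i in range(len(steps))]
```

Each successive count can only shrink, which is what gives a funnel its shape.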
Challenges of Event Data
Most companies struggle with event data because they are using technologies meant for relational data. Traditional RDBMSs (Relational Database Management Systems) rely on indexes to make point lookups fast, always trying to minimize the number of rows that must be scanned. This works well when an index matches the workload, but when it does not, the database falls back on slow scans. The problem is especially acute at the massive volumes of event data that need to be analyzed: query times can range from a few hours to days, depending on the complexity of the query and the length of time being scanned.
Remember, with event data, time is a first-order principle. You need to be able to scan all rows within a specific time period, so a solution built for event data should assume massive scanning workloads and make those queries efficient.
Additionally, RDBMSs are usually queried with SQL or another query language designed for relational data. Again, these languages are great for point lookups but struggle with questions about events over periods of time. Answering such questions almost always requires multiple scans and computations, which makes the queries slow and inefficient, not to mention complex to write.
When performing analytics on event data, the query language should have primitives that turn many-step processes into a single pass for maximum efficiency. Using an RDBMS to analyze event data brings two predominant challenges to the business. The first is scale: event data is massive, and traditional relational databases do not store and analyze it efficiently, even though there should be no disincentive to log as many events as possible. Instead, businesses sample the event data, potentially losing valuable attributes, and then wait hours or days for results. The second challenge is that the complexity of these query languages often prevents business teams, such as Product or Marketing, from accessing the data to generate needed insights. Rather, business teams rely on data teams to query event data, which often yields incomplete answers because the process is not iterative; it is one question at a time.
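As a sketch of what such a single-pass primitive looks like, the loop below sessionizes time-ordered events and counts conversions in one scan, work that typically takes an RDBMS multiple scans and self-joins (the gap threshold and events are invented):

```python
# One pass over time-ordered events: sessionize and count conversions.
SESSION_GAP = 30 * 60  # seconds of inactivity that close a session (assumption)

events = [  # (entity, epoch_seconds, event_type), sorted by time
    ("a", 1000, "page_view"),
    ("a", 1200, "purchase"),
    ("a", 9000, "page_view"),
    ("b", 9100, "purchase"),
]

sessions = 0
last_seen = {}      # entity -> timestamp of its previous event
converted = set()   # entities that purchased

for entity, t, etype in events:
    # A new session starts on the entity's first event, or after a long gap.
    if entity not in last_seen or t - last_seen[entity] > SESSION_GAP:
        sessions += 1
    last_seen[entity] = t
    if etype == "purchase":
        converted.add(entity)
```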
Using an RDBMS to store and analyze event data is a little like using a screwdriver to pound in a nail. You can get it done, but it isn't the best tool for the job.
Summary
Investments in big data technologies were expected to top 60% in 2014. The question is not whether big data is here; it's how big this data will get. Much of this data is event data, growing by millions of events daily and overwhelming businesses.
Interana is a purpose-built solution for event data at scale. The full stack
configuration consists of a highly scalable backend which is combined with a
visual and interactive frontend to deliver comprehensive analytics on event data.
Consequently, Interana scales to trillions of events, while keeping query times to
just seconds.
Questions about conversion, retention, root cause analysis and more across
endless dimensions are a few short clicks away with behavior-based tools such
as cohorts, funnels, and sessions. With event data at the core of the solution,
Interana provides behavioral analytics to help companies unlock the insights
they need to create new opportunities to grow their customer base, deepen
engagement, and maximize retention in their products and services. Redefining
self-service, Interana has done the hard work by eliminating the need to
generate long and complicated queries that take hours to write and even longer
to run. We aim to make data part of everyone’s day.