© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Roy Ben-Alta, Business Development Manager, AWS
Rick McFarland, VP of Data Services, Hearst
October 2015
The Life of a Click
How Hearst Publishing Manages
Clickstream Analytics with AWS
What to Expect from the Session
• Common patterns for clickstream analytics
• Tips on using Amazon Kinesis and
Amazon EMR for clickstream processing
• Hearst’s big data journey in building the Hearst analytics
stack for clickstream
• Lessons learned
• Q&A
Clickstream Analytics = Business Value
Scenario | Accelerated Ingest-Transform-Load to final | Continual Metrics/KPI Extraction | Actionable Insights
Ad Tech/Marketing Analytics | Advertising data aggregation | Advertising metrics like coverage, yield, conversion, scoring | User activity engagement analytics, optimized bid/buy
Consumer Online/ | Online customer engagement data | Consumer/app engagement metrics like page views, CTR | Customer clickstream analytics, recommendation engines
Financial Services | Digital assets; improve customer experience on bank website | Financial market data metrics | Fraud monitoring and value-at-risk assessment, auditing of market order data
IoT / Sensor Data | Fitness device, vehicle sensor, telemetry data ingestion | Wearable sensor operational metrics and dashboards | Devices / sensor operational
DataXu Records
68.198.92 - - [22/Dec/2013:23:08:37 -0400] "GET
/ HTTP/1.1" 200 6394
"-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1...)" "-" - - [22/Dec/2013:23:08:38 -0400] "GET
/images/logo.gif HTTP/1.1" 200 807
"" "Mozilla/4.0 (compatible; MSIE 6...)" "-" - - [22/Dec/2013:23:32:14 -0400] "GET
Circus","icctm_ht_dtpub":"2011-04-05","icctm_ht_stnm":"SEATTLE POST-
games","bu":"HNP","brand":"SEATTLE POST-
Clickstream Record
Number of fields is not fixed
Tag names change
Multiple pages/sites
Format can be defined as we
store the data
Clickstream Analytics Is the New “Hello World”
Hello World Word count Clickstream
Clickstream Analytics – Common Patterns
[Diagram: common pipeline patterns]
• Flume → HDFS → Hive: batch, high latency on retrieve
• Hive & …: batch, low latency on …
• More options (… enabled app, Amazon S3, Amazon Redshift): batch with lower latency on retrieve
It’s All About the Pace, About the Pace…
Big data
Hourly server logs:
were your systems misbehaving 1hr ago
Weekly / monthly bill:
what you spent this billing cycle
Daily customer-preferences report from your web
site’s click stream:
what deal or ad to try next time
Daily fraud reports:
was there fraud yesterday
Real-time big data
• Amazon CloudWatch metrics:
what went wrong now
• Real-time spending alerts/caps:
prevent overspending now
• Real-time analysis:
what to offer the current customer now
• Real-time detection:
block fraudulent use now
Clickstream Storage and Processing with
Amazon Kinesis
Amazon Kinesis
App N
Live dashboard
App 1
Aggregate and ingest
data to S3
App 2
Aggregate and
ingest data to
Amazon Redshift
Data lake
Amazon Redshift
App 3
Machine learning
Shard 1
Shard 2
Shard N
Managed, elastic Hadoop (1.x & 2.x) cluster
Integrates with Amazon S3, Amazon DynamoDB, and
Amazon Redshift
Install Storm, Spark, Hive, Pig, Impala, and end user
tools automatically
Support for Spot instances
Integrated HBase NoSQL database
Amazon EMR with Apache Spark
Apache Spark
Spot Integration with Amazon EMR
aws emr create-cluster --name "Spot cluster" --ami-version 3.3
Spot Integration with Amazon EMR
10 node cluster running for 14 hours
Cost = 1.0 * 10 * 14 = $140
Resize Nodes with Spot Instances
Add 10 more nodes on Spot
Resize Nodes with Spot Instances
20 node cluster running for 7 hours
On-demand cost = 1.0 * 10 * 7 = $70
Spot cost = 0.5 * 10 * 7 = $35
Total = $105
Resize Nodes with Spot Instances
50% less run-time (14 → 7 hours)
25% less cost ($140 → $105)
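The arithmetic on these slides can be checked with a short sketch. The $1.00 on-demand and $0.50 Spot hourly rates are the illustrative figures from the slides rather than real prices, and the function name is hypothetical.

```python
def cluster_cost(groups):
    """Sum the cost of (hourly_rate, node_count, hours) instance groups."""
    return sum(rate * nodes * hours for rate, nodes, hours in groups)


# Baseline: 10 on-demand nodes at $1.00/hour running for 14 hours.
baseline = cluster_cost([(1.0, 10, 14)])

# Resized: the same 10 on-demand nodes plus 10 Spot nodes at $0.50/hour,
# which (per the slides) halves the run-time to 7 hours.
resized = cluster_cost([(1.0, 10, 7), (0.5, 10, 7)])

runtime_saving = 1 - 7 / 14
cost_saving = 1 - resized / baseline

print(baseline, resized, runtime_saving, cost_saving)
```

The saving depends on the job scaling close to linearly with node count; a job that does not parallelize well would see less than the 50% run-time reduction assumed here.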
Amazon EMR and Amazon Kinesis for Batch and Interactive Processing
• Streaming log analysis
• Interactive ETL
Amazon Kinesis Amazon EMR
Amazon Redshift
Amazon S3
Data scientist
Amazon EMR for data scientists
using Spot instances
Amazon Beanstalk - App to push data into Amazon Kinesis
Amazon Kinesis Applications – Tips
• Amazon software license linking – Add ASL dependency
to SBT/Maven project (artifactId = spark-streaming-
• Shards - Include head-room for catching up with data in
• Tracking Amazon Kinesis application state (DynamoDB)
• Kinesis-Application:DynamoDB-table (1:1)
• Created automatically
• Make sure the application name doesn’t conflict with existing DynamoDB tables.
• Adjust DynamoDB provisioned throughput if necessary (default 10 reads per
second & 10 writes per second)
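The head-room tip for shards can be made concrete with a rough sizing sketch. It assumes the per-shard ingest limits of 1 MB per second and 1,000 records per second; the 50% default head-room and the function name are illustrative assumptions, not figures from the talk.

```python
import math

# Per-shard ingest limits for an Amazon Kinesis stream.
SHARD_MAX_BYTES_PER_SEC = 1_000_000  # 1 MB/s, approximated as 10^6 bytes
SHARD_MAX_RECORDS_PER_SEC = 1_000


def shards_needed(records_per_sec, avg_record_bytes, headroom=0.5):
    """Estimate the shard count for a write load, padded by `headroom`.

    `headroom` is a fraction (0.5 = 50% spare capacity) so that consumers
    can catch up after a pause without the stream being saturated.
    """
    by_records = records_per_sec / SHARD_MAX_RECORDS_PER_SEC
    by_bytes = records_per_sec * avg_record_bytes / SHARD_MAX_BYTES_PER_SEC
    return max(1, math.ceil(max(by_records, by_bytes) * (1 + headroom)))


# Hypothetical load: 3,000 clicks per second at about 500 bytes each.
print(shards_needed(3000, 500))
```

Whichever limit is hit first (record count or bytes) drives the estimate, which is why the sketch takes the larger of the two ratios before applying the head-room.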
Spark on Amazon EMR - Tips
• Use Spark as a native Amazon EMR application, available after version 3.8.0
(no need to run bootstrap actions)
• Use Spot instances for time & cost savings, especially
when using Spark
• Run in YARN cluster mode (--master yarn-cluster) for
production jobs – Spark driver runs in application master
(high availability)
• Data serialization – use Kryo if possible to boost
The Life of a Click at Hearst
• Hearst’s journey with their big data analytics platform on
• Demo
• Clickstream analysis patterns
• Lessons learned
Have you heard of Hearst?
operates more than 20 business-to-businesses with significant holdings in the
automotive, electronic, medical and finance industries
publishes 20 U.S. titles and close to 300 international editions
comprises 31 television and two radio stations
owns 15 daily and 34 weekly newspapers
Hearst includes over 200 businesses in over
100 countries around the world
Data Services at Hearst – Our Mission
• Ensure that Hearst leverages its combined data
• Unify Hearst’s data streams
• Development of Big Data Analytics Platform using AWS
• Promote enterprise-wide product development
• Example: product initiative led by all of Hearst’s editors
– Buzzing@Hearst
Business Value of Buzzing
• Instant feedback on articles from our audiences
• Incremental re-syndication of popular articles across
properties (e.g. trending newspaper articles can be
adopted by magazines)
• Inform the editors to write articles that are more
relevant to our audiences and what channels are our
audiences leveraging to read our articles
• Ultimately, drive incremental value
• 25% more page views, 15% more visitors which
lead to incremental revenue
Engineering Requirements of Buzzing…
• Throughput goal: transport data from all 250+ Hearst
properties worldwide
• Latency goal: click-to-tool in under 5 minutes
• Agile: easily add new data fields into clickstream
• Unique metrics requirements defined by Data Science team
(e.g., standard deviations, regressions, etc.)
• Data reporting windows ranging from 1 hour to 1 week
• Front-end developed “from scratch” so data exposed through
API must support development team’s unique requirements
Most importantly, operation of existing sites cannot be
What we had to work with…
a ”static” clickstream collection process on many Hearst sites
Users to
data center
Once per day
…now how do we get there?
Used for ad hoc
reporting and
~30 GB per day
containing basic web
log data (e.g., referrer,
url, user agent, cookie,
…and we own Hearst’s tag management system
Users to
This not only gave us access
to the clickstream but also
the JavaScript code that
lives on our websites
JavaScript on
web pages
Phase 1 – Ingest Clickstream Data Using AWS
Node.JS App-
Kinesis S3 App –
KCL Libraries
Users to
“Raw JSON”
Raw data
Use tag manager to easily deploy JavaScript to all sites
Kinesis Client
Libraries and
persist data to
Amazon S3
for durability
ElasticBeanstalk with
Node.JS exposes an
HTTP endpoint which
asynchronously takes
the data and feeds to
Amazon Kinesis
JavaScript on
sites that call
an exposed
endpoint and
pass in query
Node.JS – Push clickstream to Amazon Kinesis
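// Assumes express, http, url, and aws-sdk are required above, and that
// app, kinesis, streamName, guid(), and imageGIF are defined elsewhere.
function pushToKinesis(data) {
  var params = {
    Data: data, /* required */
    PartitionKey: guid(), /* required; random key spreads records across shards */
    StreamName: streamName /* required */
  };
  // putRecord is asynchronous, so the HTTP response is not held up
  kinesis.putRecord(params, function(err, data) {
    if (err) {
      console.log(err, err.stack); // an error occurred
    }
  });
}

app.get('/hearstkin.gif', function(req, res){
  var queryData = url.parse(req.url, true).query;
  // server-side timestamp gives every record a unified clock
  queryData.proxyts = new Date().getTime().toString();
  // forward the query parameters to the stream as JSON
  pushToKinesis(JSON.stringify(queryData));
  res.writeHead(200, {'Content-Type': 'text/plain', 'Access-Control-Allow-Origin': '*'});
  res.end(imageGIF, 'binary');
});

http.createServer(app).listen(app.get('port'), function(){
  console.log('Express server listening on port ' + app.get('port'));
});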
Asynchronous calls –
ensures no user experience
Server timestamp – to
create a unified timestamp.
Amazon Kinesis now offers
this out-of-the box!
JSON format – this
helps us downstream
Kinesis Partition Key – guid() is a
good partition key to ensure even
distribution across the shards
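Why a random GUID spreads load evenly can be seen in a small simulation. Amazon Kinesis maps each partition key to a shard by taking the MD5 hash of the key as a 128-bit integer; the sketch below assumes the shards split that hash space into equal ranges, which is the default for a new stream but not guaranteed after re-sharding. The function names are hypothetical.

```python
import hashlib
import uuid
from collections import Counter

HASH_SPACE = 2 ** 128  # MD5 produces a 128-bit hash key


def shard_for(partition_key, shard_count):
    """Map a partition key to a shard index, assuming equal hash ranges."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_key = int.from_bytes(digest, "big")
    return hash_key * shard_count // HASH_SPACE


def distribution(keys, shard_count):
    """Count how many keys land on each shard."""
    return Counter(shard_for(key, shard_count) for key in keys)


# Random GUIDs, as in the Node.js handler, versus one constant key.
random_keys = [str(uuid.uuid4()) for _ in range(10_000)]
constant_keys = ["hearst"] * 10_000

print(distribution(random_keys, 4))    # roughly 2,500 records per shard
print(distribution(constant_keys, 4))  # every record on a single shard
```

The trade-off is that random keys give up per-key ordering: records from the same visitor can land on different shards, which is acceptable when downstream jobs aggregate by time window rather than by session.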
Ingest Monitoring - AWS
Amazon Kinesis Monitoring
AWS Elastic Beanstalk Monitoring
Auto Scaling triggered by
network in > 20MB. Then scale
up to 40 instances.
Phase 1- Summary
• Use JSON formatting for payloads so more fields can be easily
added without impacting downstream processing
• HTTP call requires minimal code introduced to the actual site
• Flexible to meet rollout and growing demand
• Elastic Beanstalk can be scaled
• Amazon Kinesis stream can be re-sharded
• Amazon S3 provides high durability storage for raw data
• Once a reliable, scalable onboarding platform is in place,
we can now focus on ETL!
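The first point in this summary, that JSON payloads let new fields appear without breaking downstream processing, can be sketched as follows. The field names `url`, `ip_address`, and `proxyts` appear elsewhere in the deck; `scroll_depth` and the function name are hypothetical.

```python
import json

# Fields the downstream job actually uses.
WANTED = ("url", "ip_address", "proxyts")


def extract(raw_json, wanted=WANTED):
    """Pick the wanted fields by name; unknown fields are simply ignored
    and missing ones come back as None."""
    record = json.loads(raw_json)
    return {name: record.get(name) for name in wanted}


old_record = ('{"url": "http://example.com/a1", "ip_address": "10.0.0.1", '
              '"proxyts": "1445000000000"}')
# The same click after a new tag (hypothetical) was added on the site.
new_record = ('{"url": "http://example.com/a1", "ip_address": "10.0.0.1", '
              '"proxyts": "1445000000000", "scroll_depth": "80"}')

print(extract(old_record) == extract(new_record))
```

Because fields are looked up by name rather than by position, adding a tag on the site changes nothing for existing consumers until they choose to read it.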
Phase 2a- Data Processing First Version (EMR)
ETL on
Amazon EMR
Raw Data Clean Aggregate
• Amazon EMR was
chosen initially for
processing due to ease
of Amazon EMR creation
… and Pig because we
knew how to code in
• 50+ UDFs were written
using Python…also
because we knew Python
Unfortunately, Pig was not performing well – 15 min latency
Processing Clickstream Data with Pig
set output.compression.enabled true;
set output.compression.codec;
REGISTER '/home/hadoop/PROD/' USING jython as pyudf;
REGISTER '/home/hadoop/PROD/' USING jython AS refs;
AA0 = load 's3://BUCKET_NAME/rawjson/datadata.tsv.gz' using TextLoader as
A0 = FILTER AA0 BY ( pyudf.get_obj(line,'url') MATCHES '.*(ad+|gd+).*';
( pyudf.urlclean(pyudf.get_obj(line,'url')) as url:chararray,
pyudf.get_obj(line,'hash') as hash:chararray,
pyudf.get_obj(line,'icxid') as icxid:chararray,
pyudf.pubclean(pyudf.get_obj(line,'icctm_ht_dtpub')) as pubdt:chararray,
pyudf.get_obj(line,'icctm_ht_cnocl') as cnocl:chararray,
pyudf.get_obj(line,'icctm_ht_athr') as author:chararray,
pyudf.get_obj(line,'icctm_ht_attl') as title:chararray,
pyudf.get_obj(line,'icctm_ht_aid') as cms_id:chararray,
pyudf.num1(pyudf.get_obj(line,'mxdpth')) as mxdpth:double,
pyudf.num2(pyudf.get_obj(line,'load')) as topsecs:double,
refs.classy(pyudf.get_obj(line,'url'),1) as bu:chararray,
pyudf.get_obj(line,'ip_address') as ip:chararray,
pyudf.get_obj(line,'img') as img:chararray
Gzip your
Regex in
Python imports
limited to what is
allowed by
Phase 2b- Data Processing (SparkStreaming)
Clean Aggregate Data
Node.JS App-
ProxyUsers to
• Welcome Apache Spark – one framework for
batch and real time
• Benefits – using same code for batch and
real time ETL
• Use Spot instances – cost savings
• Drawbacks – Scala!
Amazon Kinesis
Using SQL with Scala
Since we
knew SQL,
we wrote
Scala with
SQL Query
endpointUrl =
streamName= hearststream
outputLoc.json.streaming =
window.length = 300
sliding.interval = 300
outputLimit = 5000000
query1= SELECT
simplestartq(proxyts, 5) as startq,
urlclean(url) as url,
pubclean(icctm_ht_dtpub) as pubdt,
classy(url,1) as bu,
ip_address as ip,
artcheck(classy(url,1),url) as artcheck,
ref_type(ref,url) as ref_type,
FROM hearst1
val jsonRDD = sqlContext.jsonRDD(rdd1)
val query1Result = sqlContext.sql(query1)//.limit(outputLimit.toInt)
val query2Result = sqlContext.sql(query2)
val query3Result = sqlContext.sql(query3).limit(outputLimit.toInt)
val outPartitionFolder = UDFUtils.output60WithRolling(slidingInterval.toInt)
outPartitionFolder), classOf[])"New JSON file written to "+outputLoc+"/"+outPartitionFolder)
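With window.length and sliding.interval both set to 300 seconds, the windows do not overlap, so every record belongs to exactly one five-minute batch. The sketch below illustrates that bucketing in plain Python rather than with the Spark Streaming API; it assumes millisecond timestamps like the proxyts field, and the function names are hypothetical.

```python
from collections import defaultdict

WINDOW_SECONDS = 300  # window.length and sliding.interval from the config


def window_start(ts_millis, window_seconds=WINDOW_SECONDS):
    """Return the start (epoch seconds) of the window holding a timestamp."""
    ts_seconds = int(ts_millis) // 1000
    return ts_seconds - ts_seconds % window_seconds


def count_by_window(timestamps_millis):
    """Count records per non-overlapping window."""
    counts = defaultdict(int)
    for ts in timestamps_millis:
        counts[window_start(ts)] += 1
    return dict(counts)


# Two clicks in the first five minutes, two in the next five.
print(count_by_window([0, 299_999, 300_000, 599_999]))
```

If the slide interval were shorter than the window length, windows would overlap and the same record would be counted in more than one output batch.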
Python UDF versus Scala
import re

# findnth() is a helper defined elsewhere in the UDF module.
def artcheck(bu, url):
    try:
        if url and bu:
            cleanurl = url[0:url.find("?")].strip('/')
            tailurl = url[findnth(url, '/', 3)+1:url.find("?")].strip('/')
            # assumed: root is the last path segment (lastContext in the Scala version)
            root = tailurl.split('/')[-1]
            if (bu=='HMI' or bu=='HMG') and re.compile(r'a\d+|g\d+').search(tailurl)!=None : return 'T'
            elif bu=='HTV' and root.isdigit()==True and re.compile('/search/').search(cleanurl)==None: return 'T'
            elif bu=='HNP' and re.compile('blog|fuelfix').search(url)!=None and re.compile(r'\S*[0-9]{4,4}/[0-9]{2,2}/[0-9]{2,2}\S*').search(tailurl)!=None : return 'T'
            elif bu=='HNP' and re.compile('businessinsider').search(url)!=None and re.compile(r'\S*[0-9]{4,4}-[0-9]{2,2}').search(root)!=None : return 'T'
            elif bu=='HNP' and re.compile('blog|fuelfix|businessinsider').search(url)==None and re.compile('.php').search(url)!=None : return 'T'
            else : return 'F'
        else : return 'F'
    except Exception:
        return 'F'
def artcheck(bu: String, url: String) = {
  try {
    val cleanurl = UDFUtils.utilurlclean(url.trim).stripSuffix("/")
    val pathClean = UDFUtils.pathURI(cleanurl)
    val lastContext = pathClean.split("/").last
    var resp = "F"
    if (("HMI"==bu || "HMG"==bu) && Pattern.compile("/a\\d+|/g\\d+").matcher(pathClean).find()) resp = "T"
    else if ("HTV"==bu && StringUtils.isNumeric(lastContext) && !cleanurl.contains("/search/")) resp = "T"
    else if ("HNP"==bu && Pattern.compile("blog|fuelfix").matcher(url).find() && Pattern.compile("\\d{4}/\\d{2}/\\d{2}").matcher(pathClean).find()) resp = "T"
    else if ("HNP"==bu && Pattern.compile("businessinsider").matcher(url).find() && Pattern.compile("\\d{4}-\\d{2}").matcher(lastContext).find()) resp = "T"
    else if ("HNP"==bu && !Pattern.compile("blog|fuelfix|businessinsider").matcher(url).find() && Pattern.compile(".php").matcher(url).find()) resp = "T"
    resp
  } catch {
    case e: Exception => "F"
  }
}
Don’t be intimidated by Scala…if
you know Python, the syntax can
be similar
Try: Except:
Try{} Catch{}
Phase 3a- Data Science!
Data Science on EC2
Clean Aggregate Data API-ready Data
Amazon Kinesis
• We decided to perform our
Data Science using SAS
on Amazon EC2 initially
because of the ability to
perform both data
manipulation and easily
run complex data science
techniques (e.g.,
• Great for exploration and
initial development
• Performing data science
using this method took
3-5 minutes to complete
SAS Code Example
data _null_;
call system("aws s3 cp s3://BUCKET_NAME/file.gz
FILENAME IN pipe "gzip -dc /home/ec2-user/LOGFILES/file.gz" lrecl=32767;
data temp1;
infile IN delimiter='09'x MISSOVER DSD lrecl=32767 firstobs=1;
startq :YMDDTTM.
url :$1000.
pageviews :best32.
visits :best32.
author :$100.
cms_id :$100.
img :$1000.
title :$1000.;
Use pipe to
read in S3
data and
keep it
proc sql;
url FORMAT=$1000.,
SUM(pageviews) as pageviews,
SUM(visits) as visits,
SUM(fvisits) as fvisits,
SUM(evisits) as evisits,
MIN(ttct) as rec,
COUNT(distinct startq) as frq,
AVG(visits) as avg_visits_pp,
SUM(visits1) as visits_soc,
SUM(visits2) as visits_dir,
SUM(visits3) as visits_int,
SUM(visits4) as visits_sea,
SUM(visits5) as visits_web,
SUM(visits6) as visits_nws,
SUM(visits7) as visits_pd,
SUM(visits8) as visits_soc_fb,
SUM(visits9) as visits_soc_tw,
SUM(visits10) as visits_soc_pi,
SUM(visits11) as visits_soc_re,
SUM(visits12) as visits_soc_yt,
SUM(visits13) as visits_soc_su,
SUM(visits14) as visits_soc_gp,
SUM(visits15) as visits_soc_li,
SUM(visits16) as visits_soc_tb,
SUM(visits17) as visits_soc_ot,
CASE WHEN (SUM(v1) - SUM(v3) ) > 20 THEN ( SUM(v1) - SUM(v3) ) / 2 ELSE 0 END as trending
FROM temp1
Use PROC SQL when
possible for easier
translation to Amazon
Redshift for production
later on.
Phase 3b- Split Data Science into Development and Production
Clean Aggregate
API-ready Data
Data Science
Amazon Redshift
• Once Data Science
models were established,
we split the modeling and
• Production was moved to
Amazon Redshift which
provided much faster
ability to read Amazon S3
data and process the data
• Data Science processing
time went down to 100
Use S3 to store
data science
models and
apply them
using Amazon
Data Science
on EC2
Statistical Models
run once per day
Agg Data
select
clean_url as url,
trim(substring(max(proxyts||domain) from 20 for 1000)) as domain,
trim(substring(max(proxyts||clean_cnocl) from 20 for 1000)) as cnocl,
trim(substring(max(proxyts||img) from 20 for 1000)) as img,
trim(substring(max(proxyts||title) from 20 for 1000)) as title,
trim(substring(max(proxyts||section) from 20 for 1000)) as section,
approximate count(distinct ic_fpc) as visits,
count(1) as hits
from kinesis_hits
where bu='HMG' and (article_id is not null or author is not null or title is not null)
group by 1;
Amazon Redshift Code Example
Cool trick to find the most
recent value of a
character field in one
pass through the data
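The trick works by prefixing each value with its timestamp, taking the MAX of the combined string, and then cutting the timestamp prefix off again. It relies on the timestamp being a fixed-width string whose text order matches time order. A plain-Python sketch of the same idea follows; the 19-character width mirrors the `from 20` in the query, while the timestamp format and function name are assumptions made for illustration.

```python
TS_WIDTH = 19  # substring(... from 20 ...) skips a 19-character prefix


def latest_value(rows):
    """Return the value with the greatest timestamp in a single pass.

    rows: iterable of (timestamp_string, value) pairs for one group.
    """
    best = None
    for ts, value in rows:
        if len(ts) != TS_WIDTH:
            raise ValueError("timestamp must be fixed width")
        tagged = ts + value          # same as proxyts||title in SQL
        if best is None or tagged > best:
            best = tagged            # same as max(...)
    return best[TS_WIDTH:].strip()   # same as trim(substring(... from 20))


rows = [
    ("2015-10-01 09:00:00", "Old title"),
    ("2015-10-01 09:05:00", "New title"),
    ("2015-10-01 08:55:00", "Oldest title"),
]
print(latest_value(rows))
```

This avoids a self-join or window function to find the latest row per URL, at the cost of only working for character fields and fixed-width timestamps.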
Phase 4a- Elasticsearch Integration
Amazon EMR
Buzzing API
S3 Storage
Data Science
Amazon Redshift
Since we had the Amazon EMR cluster running already, we used a handy Pig jar
that made it easy to push data to Elasticsearch.
S3 Storage
Agg Data API Ready Data
REGISTER /home/hadoop/pig/lib/piggybank.jar;
REGISTER /home/hadoop/PROD/elasticsearch-hadoop-2.0.2.jar;
DEFINE EsStorageDEV org.elasticsearch.hadoop.pig.EsStorage
('es.nodes =',
'es.port = 9200',
'es.http.timeout = 5m',
' = true');
SECTIONS = load 's3://hearstkinesisdata/ss.tsv' USING PigStorage('\t') as
STORE SECTIONS INTO 'content-sections-sync/content-sections-sync' USING
Pig Code – Push to ES Example
Use handy Pig jar to push data to Elasticsearch
The “Amazon EMR overhead” required to read small files added 2 min to latency
Phase 4b- Elasticsearch Integration Sped Up
Buzzing API
S3 Storage
Data Science
Amazon Redshift
Since the Amazon
Redshift code was
run in a Python
wrapper, solution
was to push data
directly into
Agg Data
# Converting file into bulk-insert compatible format
$bin/convert_json.php big.json create rowbyrow.json
# Get mapping file
${aws} s3 cp s3://hearst/es_mapping es_mapping
# Creating new ES index
$(curl -XPUT --data-binary es_mapping -s)
# Performing bulk API call
$(curl -XPOST --data-binary rowbyrow.json
-s) ""
Script to Push to Elasticsearch Directly
Converting one big input JSON
file to a row-by-row JSON is a
key step for making the data
batch compatible
Use a mapping file to manage
the formatting in your index…
very important for dates and
numeric values that look like
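The conversion step can be sketched as follows. The contents of convert_json.php are not shown in the deck, so this is an illustration in Python of the general idea rather than the script Hearst used. The Elasticsearch bulk API expects newline-delimited JSON in which each document line is preceded by an action line, and the body must end with a newline; the sample documents and function name are hypothetical.

```python
import json


def to_bulk_body(documents):
    """Turn a list of documents into an Elasticsearch bulk request body.

    Each document becomes two lines: an "index" action line, then the
    document itself. The index name is assumed to be given in the URL.
    """
    lines = []
    for doc in documents:
        lines.append(json.dumps({"index": {}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"


# One big JSON array in, row-by-row JSON out.
big_json = '[{"url": "/a1", "pageviews": 10}, {"url": "/a2", "pageviews": 3}]'
bulk_body = to_bulk_body(json.loads(big_json))
print(bulk_body, end="")
```

Keeping `pageviews` as a JSON number rather than a quoted string matters for the same reason the mapping file does: the index should treat it as numeric.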
Final Data Pipeline
Buzzing API
S3 Storage
App- Proxy
Users to
Data Science
Amazon Redshift
100 seconds
30 seconds
5 seconds
Agg Data
Data Science
Amazon Redshift
A more “visual” representation of our pipeline!
Clickstream data pipeline by version (stages from transport through analysis to exposure, then latency):
• Kinesis → S3 → EMR-Pig → S3 → EC2-SAS → S3 → EMR to ElasticSearch: 1 hour
• Kinesis → Spark-Scala → S3 → Redshift → ElasticSearch: <5 min
• Kinesis → PySpark + SparkR → ElasticSearch: <2 min
Lessons learned
“No Duh’s”: removing “stoppage” points, speeding up processing, and
combining processes all improve latency.
Data Science Tool Box
Buzzing API
S3 Storage
App- Proxy
Users to
Data Science
Amazon Redshift
Agg Data
• IPython Notebook
• On Spark and Amazon Redshift
• Code sharing (and insights)
• User-friendly development
environment for data scientists
• Auto-convert .ipynb → .py
Data
Amazon Redshift
Data Science at Hearst – Notebook
Next Steps
• Amazon EMR 4.1.0 with Spark 1.5 released and we can do
more with pyspark, look at Apache Zeppelin on Amazon EMR
• Amazon Kinesis just released a new feature to retain data for up
to 7 days, so we could do more ETL “in the stream”
• Amazon Kinesis Firehose and Lambda – Zero touch (no
Amazon EC2 maintenance)
• More complex data science that requires…
• Amazon Redshift UDFs
• Python shell that calls Amazon Redshift but also allows for
complex statistical methods (e.g., using R or machine learning)
• Clickstreams are the new “data currency” of business
• AWS provides great technology to process data
• High speed
• Lower costs – Using Spot…
• Very agile
• Do more with less: this can all be done with a team
of 2 FTEs!
• 1 developer (well versed in AWS) + 1 data scientist
Ingest Store Process Analyze
Click Insight
Call To Action
Amazon S3
Amazon Kinesis
Amazon DynamoDB
Amazon RDS (Aurora)
AWS Lambda
KCL Apps
Use Amazon Kinesis, EMR and Amazon Redshift for
Open source connectors:
AWS Big Data blog
AWS re:Invent Big Data booth
AWS Big Data Marketplace and Partner ecosystem
Hearst Booth – Hall C1156: Learn more about the
interesting things we are doing with data!
Call To Action
Remember to complete
your evaluations!
Thank you!

스타트업 사례로 본 로그 데이터 분석 : Tajo on AWS스타트업 사례로 본 로그 데이터 분석 : Tajo on AWS
스타트업 사례로 본 로그 데이터 분석 : Tajo on AWS
Matthew (정재화)
(WEB301) Operational Web Log Analysis | AWS re:Invent 2014
(WEB301) Operational Web Log Analysis | AWS re:Invent 2014(WEB301) Operational Web Log Analysis | AWS re:Invent 2014
(WEB301) Operational Web Log Analysis | AWS re:Invent 2014
Amazon Web Services

To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...

(BDT306) How Hearst Publishing Manages Clickstream Analytics with AWS

  • 1. © 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Roy Ben-Alta, Business Development Manager, AWS Rick McFarland, VP of Data Services, Hearst October 2015 BDT306 The Life of a Click How Hearst Publishing Manages Clickstream Analytics with AWS
  • 2. What to Expect from the Session • Common patterns for clickstream analytics • Tips on using Amazon Kinesis and Amazon EMR for clickstream processing • Hearst’s big data journey in building the Hearst analytics stack for clickstream • Lessons learned • Q&A
  • 3. Clickstream Analytics = Business Value
      Verticals/Use Cases | Accelerated Ingest-Transform-Load to final destination | Continual Metrics/KPI Extraction | Actionable Insights
      Ad Tech/Marketing Analytics | Advertising data aggregation | Advertising metrics like coverage, yield, conversion, scoring webpages | User activity engagement analytics, optimized bid/buy engines
      Consumer Online/Gaming | Online customer engagement data aggregation | Consumer/app engagement metrics like page views, CTR | Customer clickstream analytics, recommendation engines
      Financial Services | Digital assets, improve customer experience on bank website | Financial market data metrics | Fraud monitoring and value-at-risk assessment, auditing of market order data
      IoT/Sensor Data | Fitness device, vehicle sensor, telemetry data ingestion | Wearable sensor operational metrics and dashboards | Devices/sensor operational intelligence
  • 4. DataXu Records 68.198.92 - - [22/Dec/2013:23:08:37 -0400] "GET / HTTP/1.1" 200 6394 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1...)" "-" - - [22/Dec/2013:23:08:38 -0400] "GET /images/logo.gif HTTP/1.1" 200 807 "" "Mozilla/4.0 (compatible; MSIE 6...)" "-" - - [22/Dec/2013:23:32:14 -0400] "GET APACHE ACCESS LOG {"cId":"10049","cdid":"5961","campID":"8","loc":"b","ip_address":" ","icctm_ht_athr":"","icctm_ht_aid":"","icctm_ht_attl":"Family Circus","icctm_ht_dtpub":"2011-04-05","icctm_ht_stnm":"SEATTLE POST-INTELLIGENCER","icctm_ht_cnocl":" games/fun/Family_Circus","ts":"1422839422426","url":" m/comics-and-games/fun/Family_Circus","hash":"d98ace5874334232f6db3e1c0f8be3ab","load":"5.096","ref":" games","bu":"HNP","brand":"SEATTLE POST-INTELLIGENCER","ref_type":"SAMESITE","ref_subtype":"SAMESITE","ua":"desktop:chrome"} JSON Clickstream Record • Number of fields is not fixed • Tag names change • Multiple pages/sites • Format can be defined as we store the data: AVRO, CSV, TSV, JSON
  • 5. Clickstream Analytics Is the New “Hello World” Hello World Word count Clickstream
  • 6. Clickstream Analytics – Common Patterns Flume HDFS Hive Batch high latency on retrieve SQL Flume HDFS Hive & Pig Batch low latency on retrieve Flume Sqoop HDFS Impala Spark SQL Presto Other More options: Batch with lower latency on retrieve
  • 7. users Amazon Kinesis Kinesis- enabled app Amazon S3 Amazon EMR Web Servers Amazon S3 Amazon Redshift
  • 8. It’s All About the Pace, About the Pace… Big data Hourly server logs: were your systems misbehaving 1hr ago Weekly / monthly bill: what you spent this billing cycle Daily customer-preferences report from your web site’s click stream: what deal or ad to try next time Daily fraud reports: was there fraud yesterday Real-time big data •Amazon CloudWatch metrics: what went wrong now •Real-time spending alerts/caps: prevent overspending now •Real-time analysis: what to offer the current customer now •Real-time detection: block fraudulent use now
  • 9. Clickstream Storage and Processing with Amazon Kinesis Amazon Kinesis App N Live dashboard AWSendpoint App 1 Aggregate and ingest data to S3 App 2 Aggregate and ingest data to Amazon Redshift Data lake Amazon Redshift App 3 ETL/ELT Machine learning Availability Zone Shard 1 Shard 2 Shard N Availability Zone Availability Zone EMR DynamoDB
  • 10. Amazon EMR Managed, elastic Hadoop (1.x & 2.x) cluster Integrates with Amazon S3, Amazon DynamoDB, and Amazon Redshift Install Storm, Spark, Hive, Pig, Impala, and end user tools automatically Support for Spot instances Integrated HBase NoSQL database Amazon EMR with Apache Spark Apache Spark Spark SQL Spark Streaming MLlib GraphX
  • 11. Spot Integration with Amazon EMR aws emr create-cluster --name "Spot cluster" --ami-version 3.3 --instance-groups InstanceGroupType=MASTER,InstanceType=m3.xlarge,InstanceCount=1 InstanceGroupType=CORE,BidPrice=0.03,InstanceType=m3.xlarge,InstanceCount=2 InstanceGroupType=TASK,BidPrice=0.10,InstanceType=m3.xlarge,InstanceCount=3
  • 12. Spot Integration with Amazon EMR 10 node cluster running for 14 hours Cost = 1.0 * 10 * 14 = $140
  • 13. Resize Nodes with Spot Instances Add 10 more nodes on Spot
  • 14. Resize Nodes with Spot Instances 20 node cluster running for 7 hours Cost of the original 10 nodes = 1.0 * 10 * 7 = $70 Cost of the 10 Spot nodes = 0.5 * 10 * 7 = $35 Total $105
  • 15. Resize Nodes with Spot Instances 50% less run-time (14 → 7 hours) 25% less cost ($140 → $105)
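The arithmetic on slides 12 to 15 can be checked with a few lines of Python. This is a minimal sketch, not part of the original deck: the function name `cluster_cost` is made up for illustration, the hourly rates ($1.00 for the original nodes, $0.50 for Spot) are the ones the slides use, and it assumes, as the slides do, that doubling the node count halves the run time.

```python
def cluster_cost(hours, base_nodes, base_rate, spot_nodes=0, spot_rate=0.0):
    """Total cost in dollars of a cluster that runs for `hours` hours."""
    return hours * (base_nodes * base_rate + spot_nodes * spot_rate)

# Slide 12: 10 nodes at $1.00/hour for 14 hours.
baseline = cluster_cost(hours=14, base_nodes=10, base_rate=1.0)

# Slide 14: the same 10 nodes plus 10 Spot nodes at $0.50/hour for 7 hours.
resized = cluster_cost(hours=7, base_nodes=10, base_rate=1.0,
                       spot_nodes=10, spot_rate=0.5)

savings = (baseline - resized) / baseline
print(baseline, resized, savings)
```

Real Spot prices fluctuate, so the 0.5 rate should be read as an illustrative figure rather than a quoted price.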
  • 16. Amazon EMR and Amazon Kinesis for Batch and Interactive Processing • Streaming log analysis • Interactive ETL Amazon Kinesis Amazon EMR Amazon Redshift Amazon S3 Data scientist Amazon EMR for data scientists using Spot instances BI
  • 17. AWS Elastic Beanstalk - App to push data into Amazon Kinesis
  • 18. Amazon Kinesis Applications – Tips • Amazon software license linking – Add ASL dependency to SBT/Maven project (artifactId = spark-streaming-kinesis-asl_2.10) • Shards - Include headroom for catching up with data in the stream • Tracking Amazon Kinesis application state (DynamoDB) • Kinesis-Application:DynamoDB-table (1:1) • Created automatically • Make sure the application name doesn’t conflict with existing DynamoDB tables • Adjust DynamoDB provisioned throughput if necessary (default 10 reads per second & 10 writes per second)
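The shard headroom tip can be made concrete with a rough sizing calculation. The sketch below is an illustration under stated assumptions, not the presenters' method: it uses the documented per-shard write limits of 1 MB per second and 1,000 records per second, while the function name `shards_needed`, the 50% default headroom, and the example traffic figures are hypothetical.

```python
import math

# Documented per-shard write limits for an Amazon Kinesis stream.
SHARD_BYTES_PER_SEC = 1_000_000   # 1 MB per second (taken as 10^6 bytes, the conservative reading)
SHARD_RECORDS_PER_SEC = 1_000     # 1,000 records per second

def shards_needed(peak_bytes_per_sec, peak_records_per_sec, headroom=0.5):
    """Shards required for the peak write rate plus a headroom fraction.

    `headroom` is an assumed safety margin (0.5 means 50% spare capacity)
    so that consumers can catch up after a backlog.
    """
    by_bytes = peak_bytes_per_sec / SHARD_BYTES_PER_SEC
    by_records = peak_records_per_sec / SHARD_RECORDS_PER_SEC
    return max(1, math.ceil(max(by_bytes, by_records) * (1 + headroom)))

# Hypothetical peak: 3.5 MB/s and 1,800 records/s.
print(shards_needed(3_500_000, 1_800))
```

Whichever limit is hit first (bytes or records) drives the shard count, which is why the sketch takes the larger of the two ratios before applying headroom.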
  • 19. Spark on Amazon EMR - Tips • Spark is available as an Amazon EMR application after version 3.8.0 (no need to run bootstrap actions) • Use Spot instances for time & cost savings, especially when using Spark • Run in YARN cluster mode (--master yarn-cluster) for production jobs – Spark driver runs in application master (high availability) • Data serialization – use Kryo if possible to boost performance (spark.serializer=org.apache.spark.serializer.KryoSerializer)
  • 20. The Life of a Click at Hearst • Hearst’s journey with their big data analytics platform on AWS • Demo • Clickstream analysis patterns • Lessons learned
  • 21.
  • 22. Have you heard of Hearst?
  • 23. BUSINESS MEDIA operates more than 20 business-to-businesses with significant holdings in the automotive, electronic, medical and finance industries MAGAZINES publishes 20 U.S. titles and close to 300 international editions BROADCASTING comprises 31 television and two radio stations NEWSPAPERS owns 15 daily and 34 weekly newspapers Hearst includes over 200 businesses in over 100 countries around the world
  • 24. Data Services at Hearst – Our Mission • Ensure that Hearst leverages its combined data assets • Unify Hearst’s data streams • Development of Big Data Analytics Platform using AWS services • Promote enterprise-wide product development • Example: product initiative led by all of Hearst’s editors – Buzzing@Hearst
  • 25. 1
  • 26. Business Value of Buzzing • Instant feedback on articles from our audiences • Incremental re-syndication of popular articles across properties (e.g. trending newspaper articles can be adopted by magazines) • Inform the editors to write articles that are more relevant to our audiences and what channels are our audiences leveraging to read our articles • Ultimately, drive incremental value • 25% more page views, 15% more visitors which lead to incremental revenue
  • 27. Engineering Requirements of Buzzing… • Throughput goal: transport data from all 250+ Hearst properties worldwide • Latency goal: click-to-tool in under 5 minutes • Agile: easily add new data fields into clickstream • Unique metrics requirements defined by Data Science team (e.g., standard deviations, regressions, etc.) • Data reporting windows ranging from 1 hour to 1 week • Front-end developed “from scratch” so data exposed through API must support development team’s unique requirements Most importantly, operation of existing sites cannot be affected!
  • 28. What we had to work with… a ”static” clickstream collection process on many Hearst sites Users to Hearst Properties Clickstream corporate data center Netezza Data Warehouse Once per day …now how do we get there? Used for ad hoc SQL-based reporting and analytics ~30 GB per day containing basic web log data (e.g., referrer, url, user agent, cookie, etc.)
  • 29. …and we own Hearst’s tag management system Users to Hearst Properties Clickstream This not only gave us access to the clickstream but also the JavaScript code that lives on our websites JavaScript on web pages
  • 30. Phase 1 – Ingest Clickstream Data Using AWS Amazon Kinesis Node.JS App- Proxy Kinesis S3 App – KCL Libraries Users to Hearst Properties Clickstream “Raw JSON” Raw data Use tag manager to easily deploy JavaScript to all sites Kinesis Client Libraries and Kinesis Connectors persist data to Amazon S3 for durability ElasticBeanstalk with Node.JS exposes an HTTP endpoint which asynchronously takes the data and feeds to Amazon Kinesis Implement JavaScript on sites that call an exposed endpoint and pass in query parameters
  • 31. Node.JS – Push clickstream to Amazon Kinesis function pushToKinesis(data) { var params = { Data: data, /* required */ PartitionKey: guid(), StreamName: streamName /* required */ }; kinesis.putRecord(params, function(err, data) { if (err) { console.log(err, err.stack); /* an error occurred */ } }); } app.get('/hearstkin.gif', function(req, res){ async.series([function(callback){ var queryData = url.parse(req.url, true).query; queryData.proxyts = new Date().getTime().toString(); pushToKinesis(JSON.stringify(queryData)); callback(null); }]); res.writeHead(200,{'Content-Type': 'text/plain', 'Access-Control-Allow-Origin': '*'}); res.end(imageGIF, 'binary'); }); http.createServer(app).listen(app.get('port'), function(){ console.log('Express server listening on port ' + app.get('port')); }); Asynchronous calls – ensures no user experience interruption Server timestamp – to create a unified timestamp. Amazon Kinesis now offers this out of the box! JSON format – this helps us downstream Kinesis Partition Key – guid() is a good partition key to ensure even distribution across the shards
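Why a random GUID spreads records evenly can be shown with a small simulation. Kinesis maps each partition key to a 128-bit integer with MD5 and routes the record to the shard whose hash key range contains that integer. The sketch below assumes shards with equal-sized hash key ranges (the default for a newly created stream); the helper names `shard_for` and `distribution` and the sample key "hearst.com" are hypothetical.

```python
import hashlib
import uuid
from collections import Counter

HASH_SPACE = 2 ** 128  # MD5 yields a 128-bit hash key

def shard_for(partition_key, shard_count):
    """Index of the shard that owns MD5(partition_key), assuming equal ranges."""
    hash_key = int(hashlib.md5(partition_key.encode("utf-8")).hexdigest(), 16)
    return hash_key * shard_count // HASH_SPACE

def distribution(keys, shard_count):
    """Number of records that land on each shard, in shard order."""
    counts = Counter(shard_for(key, shard_count) for key in keys)
    return [counts.get(i, 0) for i in range(shard_count)]

random_keys = [str(uuid.uuid4()) for _ in range(10_000)]
constant_keys = ["hearst.com"] * 10_000  # e.g. using the site name as the key

print(distribution(random_keys, 4))    # roughly 2,500 records per shard
print(distribution(constant_keys, 4))  # every record on a single shard
```

The trade-off is that a random key gives up per-key ordering: records from the same visitor may land on different shards, which is acceptable when downstream processing aggregates by window rather than by session.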
  • 32. Ingest Monitoring - AWS Amazon Kinesis Monitoring AWS Elastic Beanstalk Monitoring Auto Scaling triggered by network in > 20MB. Then scale up to 40 instances.
  • 33. Phase 1- Summary • Use JSON formatting for payloads so more fields can be easily added without impacting downstream processing • HTTP call requires minimal code introduced to the actual site implementations • Flexible to meet rollout and growing demand • Elastic Beanstalk can be scaled • Amazon Kinesis stream can be re-sharded • Amazon S3 provides high durability storage for raw data • Once a reliable, scalable onboarding platform is in place, we can now focus on ETL!
  • 34. Phase 2a- Data Processing First Version (EMR) ETL on Amazon EMR “Raw JSON” Raw Data Clean Aggregate Data • Amazon EMR was chosen initially for processing due to ease of Amazon EMR creation … and Pig because we knew how to code in PigLatin • 50+ UDFs were written using Python…also because we knew Python
  • 35. Unfortunately, Pig was not performing well – 15 min latency Processing Clickstream Data with Pig set output.compression.enabled true; set output.compression.codec; REGISTER '/home/hadoop/PROD/' USING jython as pyudf; REGISTER '/home/hadoop/PROD/' USING jython AS refs; AA0 = load 's3://BUCKET_NAME/rawjson/datadata.tsv.gz' using TextLoader as (line:chararray); A0 = FILTER AA0 BY ( pyudf.get_obj(line,'url') MATCHES '.*(ad+|gd+).*'; A1 = FOREACH A0 GENERATE ( pyudf.urlclean(pyudf.get_obj(line,'url')) as url:chararray, pyudf.get_obj(line,'hash') as hash:chararray, pyudf.get_obj(line,'icxid') as icxid:chararray, pyudf.pubclean(pyudf.get_obj(line,'icctm_ht_dtpub')) as pubdt:chararray, pyudf.get_obj(line,'icctm_ht_cnocl') as cnocl:chararray, pyudf.get_obj(line,'icctm_ht_athr') as author:chararray, pyudf.get_obj(line,'icctm_ht_attl') as title:chararray, pyudf.get_obj(line,'icctm_ht_aid') as cms_id:chararray, pyudf.num1(pyudf.get_obj(line,'mxdpth')) as mxdpth:double, pyudf.num2(pyudf.get_obj(line,'load')) as topsecs:double, refs.classy(pyudf.get_obj(line,'url'),1) as bu:chararray, pyudf.get_obj(line,'ip_address') as ip:chararray, pyudf.get_obj(line,'img') as img:chararray ; Gzip your output Regex in Pig! Python imports limited to what is allowed by Jython
  • 36. Phase 2b- Data Processing (SparkStreaming) Clean Aggregate Data Node.JS App- ProxyUsers to Hearst Properties Clickstream • Welcome Apache Spark– one framework for batch and realtime • Benefits – using same code for batch and real time ETL • Use Spot instances – cost savings • Drawbacks – Scala! Amazon Kinesis ETL on EMR
  • 37. Using SQL with Scala SparkSQL Since we knew SQL, we wrote Scala with embedded SQL Query endpointUrl = streamName= hearststream outputLoc.json.streaming = s3://hearstkinesisdata/processedsparkjson window.length = 300 sliding.interval = 300 outputLimit = 5000000 query1Table=hearst1 query1= SELECT simplestartq(proxyts, 5) as startq, urlclean(url) as url, hash, icxid, pubclean(icctm_ht_dtpub) as pubdt, classy(url,1) as bu, ip_address as ip, artcheck(classy(url,1),url) as artcheck, ref_type(ref,url) as ref_type, img, wc, contentSource FROM hearst1 val jsonRDD = sqlContext.jsonRDD(rdd1) jsonRDD.registerTempTable(query1Table.trim) val query1Result = sqlContext.sql(query1)//.limit(outputLimit.toInt) query1Result.registerTempTable(query2Table.trim) val query2Result = sqlContext.sql(query2) query2Result.registerTempTable(query3Table.trim) val query3Result = sqlContext.sql(query3).limit(outputLimit.toInt) val outPartitionFolder = UDFUtils.output60WithRolling(slidingInterval.toInt) query3Result.toJSON.saveAsTextFile("%s/%s".format(outputLocJSON, outPartitionFolder), classOf[])"New JSON file written to "+outputLoc+"/"+outPartitionFolder)
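The configuration above sets window.length and sliding.interval to the same value (300 seconds), which implies non-overlapping five-minute windows. The UDF simplestartq(proxyts, 5) is not shown in the deck; the Python sketch below assumes it floors the proxy timestamp (epoch milliseconds, as set by the Node.JS proxy) to the start of its five-minute window. The function name `window_start` is hypothetical.

```python
from datetime import datetime, timezone

def window_start(proxyts_ms, minutes=5):
    """Floor an epoch-millisecond timestamp to the start of its window.

    Assumed behaviour of the deck's simplestartq UDF; the real
    implementation is not shown in the slides.
    """
    window_ms = minutes * 60 * 1000
    floored_ms = (int(proxyts_ms) // window_ms) * window_ms
    return datetime.fromtimestamp(floored_ms / 1000, tz=timezone.utc)

# The "ts" value from the sample JSON record on slide 4.
print(window_start("1422839422426"))
```

Grouping on this floored value lets the same query serve both the streaming job and a batch rerun over the raw data in S3.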
  • 38. Python UDF versus Scala Python def artcheck(bu,url): try: if url and bu: cleanurl = url[0:url.find("?")].strip('/') tailurl = url[findnth(url, '/', 3)+1:url.find("?")].strip('/') revurl=cleanurl[::-1] root=revurl[0:revurl.find('/')][::-1] if (bu=='HMI' or bu=='HMG') and re.compile('ad+|gd+').search(tailurl)!=None : return 'T' elif bu=='HTV' and root.isdigit()==True and re.compile('/search/').search(cleanurl)==None: return 'T' elif bu=='HNP' and re.compile('blog|fuelfix').search(url)!=None and re.compile(r'S*[0-9]{4,4}/[0-9]{2,2}/[0-9]{2,2}S*').search(tailurl)!=None : return 'T' elif bu=='HNP' and re.compile('businessinsider').search(url)!=None and re.compile(r'S*[0-9]{4,4}-[0-9]{2,2}').search(root)!=None : return 'T' elif bu=='HNP' and re.compile('blog|fuelfix|businessinsider').search(url)==None and re.compile('.php').search(url)!=None : return 'T' else : return 'F' else : return 'F' except: return 'F' def artcheck(bu:String, url: String )={ try{ val cleanurl = UDFUtils.utilurlclean(url.trim).stripSuffix("/") val pathClean = UDFUtils.pathURI(cleanurl) val lastContext = pathClean.split("/").last var resp = "F" if(("HMI"==bu||"HMG"==bu)&&Pattern.compile("/ad+|/gd+").matcher(pathClean).find()) resp="T" else if("HTV"==bu && StringUtils.isNumeric(lastContext) && !cleanurl.contains("/search/")) resp="T" else if("HNP"==bu && Pattern.compile("blog|fuelfix").matcher(url).find() && Pattern.compile("d{4}/d{2}/d{2}").matcher(pathClean).find()) resp="T" else if("HNP"==bu && Pattern.compile("businessinsider").matcher(url).find() && Pattern.compile("d{4}-d{2}").matcher(lastContext).find()) resp="T" else if("HNP"==bu && !Pattern.compile("blog|fuelfix|businessinsider").matcher(url).find()&& Pattern.compile(".php").matcher(url).find()) resp="T" resp} } catch{ case e:Exception => "F" } Scala Don’t be intimidated by Scala…if you know Python, the syntax can be similar re.compile('ad+|gd+'). Pattern.compile("ad+|gd+"). Try: Except: Try{} Catch{}
  • 39. Phase 3a- Data Science! Data Science on EC2 Clean Aggregate Data API-ready Data Amazon Kinesis ETL on EMR • We decided to perform our Data Science using SAS on Amazon EC2 initially because of the ability to perform both data manipulation and easily run complex data science techniques (e.g., regressions) • Great for exploration and initial development • Performing data science using this method took 3-5 minutes to complete
  • 40. SAS Code Example data _null_; call system("aws s3 cp s3://BUCKET_NAME/file.gz /home/ec2-user/LOGFILES/file.gz"); run; FILENAME IN pipe "gzip -dc /home/ec2-user/LOGFILES/file.gz" lrecl=32767; data temp1; FORMAT startq DATETIME19.; infile IN delimiter='09'x MISSOVER DSD lrecl=32767 firstobs=1; input startq :YMDDTTM. url :$1000. pageviews :best32. visits :best32. author :$100. cms_id :$100. img :$1000. title :$1000.; run; Use pipe to read in S3 data and keep it compressed proc sql; CREATE TABLE metrics AS SELECT url FORMAT=$1000., SUM(pageviews) as pageviews, SUM(visits) as visits, SUM(fvisits) as fvisits, SUM(evisits) as evisits, MIN(ttct) as rec, COUNT(distinct startq) as frq, AVG(visits) as avg_visits_pp, SUM(visits1) as visits_soc, SUM(visits2) as visits_dir, SUM(visits3) as visits_int, SUM(visits4) as visits_sea, SUM(visits5) as visits_web, SUM(visits6) as visits_nws, SUM(visits7) as visits_pd, SUM(visits8) as visits_soc_fb, SUM(visits9) as visits_soc_tw, SUM(visits10) as visits_soc_pi, SUM(visits11) as visits_soc_re, SUM(visits12) as visits_soc_yt, SUM(visits13) as visits_soc_su, SUM(visits14) as visits_soc_gp, SUM(visits15) as visits_soc_li, SUM(visits16) as visits_soc_tb, SUM(visits17) as visits_soc_ot, CASE WHEN (SUM(v1) - SUM(v3) ) > 20 THEN ( SUM(v1) - SUM(v3) ) / 2 ELSE 0 END as trending FROM temp1 GROUP BY 1; Use PROC SQL when possible for easier translation to Amazon Redshift for production later on.
  • 41. Phase 3b- Split Data Science into Development and Production Amazon Kinesis Clean Aggregate Data API-ready Data Data Science “Production” Amazon Redshift ETL on EMR • Once Data Science models were established, we split the modeling and production • Production was moved to Amazon Redshift which provided much faster ability to read Amazon S3 data and process the data • Data Science processing time went down to 100 seconds! Use S3 to store data science models and apply them using Amazon Redshift Data Science “Development” on EC2 Statistical Models run once per day Models Agg Data
  • 42. select clean_url as url, trim(substring(max(proxyts||domain) from 20 for 1000)) as domain, trim(substring(max(proxyts||clean_cnocl) from 20 for 1000)) as cnocl, trim(substring(max(proxyts||img) from 20 for 1000)) as img, trim(substring(max(proxyts||title) from 20 for 1000)) as title, trim(substring(max(proxyts||section) from 20 for 1000)) as section, approximate count(distinct ic_fpc) as visits, count(1) as hits from kinesis_hits where bu='HMG' and (article_id is not null or author is not null or title is not null) group by 1; Amazon Redshift Code Example Cool trick to find the most recent value of a character field in one pass through the data
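The trick in the query above works because the timestamp is a fixed-width prefix: taking max() of the concatenated string picks the row with the latest timestamp, and the substring starting at position 20 strips the timestamp off again. A minimal Python sketch of the same idea follows, assuming proxyts is a 19-character string that sorts chronologically (for example "YYYY-MM-DD HH:MM:SS"); that width is inferred from the "from 20" offset and is not stated on the slide. The function name `latest_value` is hypothetical.

```python
TS_WIDTH = 19  # assumed width of proxyts, inferred from "substring(... from 20 ...)"

def latest_value(rows):
    """Return the value paired with the greatest timestamp in one pass.

    `rows` is an iterable of (proxyts, value) string pairs. The timestamp
    is padded to a fixed width so the string comparison is decided by the
    timestamp before the value is ever considered.
    """
    tagged = max(ts.ljust(TS_WIDTH) + value for ts, value in rows)
    return tagged[TS_WIDTH:].strip()

titles = [
    ("2015-10-08 09:15:00", "Old title"),
    ("2015-10-08 09:45:30", "New title"),
    ("2015-10-08 09:30:00", "Mid title"),
]
print(latest_value(titles))
```

The benefit in the warehouse is that no self-join or window function is needed: a single GROUP BY pass yields the most recent value of every character column.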
  • 43. Phase 4a- Elasticsearch Integration Amazon EMR PUSH Buzzing API S3 Storage Data Science Amazon Redshift ETL on EMR Since we had the Amazon EMR cluster running already, we used a handy Pig jar that made it easy to push data to Elasticsearch. S3 Storage Models Agg Data API Ready Data
  • 44. REGISTER /home/hadoop/pig/lib/piggybank.jar; REGISTER /home/hadoop/PROD/elasticsearch-hadoop-2.0.2.jar; DEFINE EsStorageDEV org.elasticsearch.hadoop.pig.EsStorage ('es.nodes =', 'es.port = 9200', 'es.http.timeout = 5m', ' = true'); SECTIONS = load 's3://hearstkinesisdata/ss.tsv' USING PigStorage('\t') as (sectionid:chararray,cnt:long,visits:long,sectionname:chararray); STORE SECTIONS INTO 'content-sections-sync/content-sections-sync' USING EsStoragePROD; Pig Code – Push to ES Example Use handy Pig jar to push data to Elasticsearch The “Amazon EMR overhead” required to read small files added 2 min to latency
  • 45. Phase 4b- Elasticsearch Integration Sped Up Buzzing API S3 Storage API Ready Data Data Science Amazon Redshift ETL on EMR Since the Amazon Redshift code was run in a Python wrapper, solution was to push data directly into Elasticsearch Models Agg Data
  • 46. # Converting file into bulk-insert compatible format $bin/convert_json.php big.json create rowbyrow.json # Get mapping file ${aws} s3 cp S3://hearst/es_mapping es_mapping # Creating new ES index $(curl -XPUT --data-binary es_mapping -s) # Performing bulk API call $(curl -XPOST --data-binary rowbyrow.json -s) "" Script to Push to Elasticsearch Directly Converting one big input JSON file to a row-by-row JSON is a key step for making the data batch compatible Use a mapping file to manage the formatting in your index… very important for dates and numeric values that look like strings
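The conversion step is the part of this script that is easiest to get wrong, so here is a hedged Python sketch of what a converter like convert_json.php has to produce. The Elasticsearch bulk API expects newline-delimited JSON in which each document is preceded by an action line, and the body must end with a newline. The function name `to_bulk_lines` and the sample documents are hypothetical; the index name and field names are borrowed from the Pig example on slide 44. Older Elasticsearch releases also expected a `_type`, either in the action line or in the request URL.

```python
import json

def to_bulk_lines(documents, index_name, id_field=None):
    """Turn a list of dicts into an Elasticsearch bulk request body.

    Each document becomes two lines: an "index" action, then the document
    itself. If `id_field` is given, its value is used as the document id.
    """
    lines = []
    for doc in documents:
        action = {"index": {"_index": index_name}}
        if id_field is not None and id_field in doc:
            action["index"]["_id"] = str(doc[id_field])
        lines.append(json.dumps(action))
        lines.append(json.dumps(doc))
    # The bulk API requires a trailing newline after the last line.
    return "\n".join(lines) + "\n" if lines else ""

# One "big" JSON array, as produced by an upstream export (sample data).
big_json = '[{"sectionid": "s1", "visits": 10}, {"sectionid": "s2", "visits": 7}]'
payload = to_bulk_lines(json.loads(big_json), "content-sections-sync",
                        id_field="sectionid")
print(payload)
```

Setting an explicit document id makes repeated pushes idempotent: re-sending the same section overwrites the existing document instead of creating a duplicate.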
  • 47. Final Data Pipeline Buzzing API API Ready Data Amazon Kinesis S3 Storage Node.JS App- Proxy Users to Hearst Properties Clickstream Data Science Application Amazon Redshift ETL on EMR 100 seconds 1G/day 30 seconds 5GB/day 5 seconds 1G/day Milliseconds 100GB/day LATENCY THROUGHPUT Models Agg Data
  • 48. Data Science Amazon Redshift ETL A more “visual” representation of our pipeline! Clickstream dataAmazon Kinesis Results API
  • 49. Lessons learned: pipeline versions (Transport → Storage → ETL → Storage → Analysis → Storage → Exposure)
      V1: Amazon Kinesis → S3 → EMR-Pig → S3 → EC2-SAS → S3 → EMR to ElasticSearch (latency: 1 hour)
      Today: Amazon Kinesis → Spark-Scala → S3 → Amazon Redshift → ElasticSearch (latency: <5 min)
      Tomorrow: Amazon Kinesis → PySpark + SparkR → ElasticSearch (latency: <2 min)
      “No Duh’s”: Removing “stoppage” points, speeding up processing, and combining processes improve latency.
  • 50. Data Science Toolbox Buzzing API API Ready Data Amazon Kinesis S3 Storage Node.JS App- Proxy Users to Hearst Properties Clickstream Data Science Application Amazon Redshift ETL on EMR Models Agg Data • IPython Notebook • On Spark and Amazon Redshift • Code sharing (and insights) • User-friendly development environment for data scientists • Auto-convert .ipynb → .py Data Models Amazon Redshift
  • 51. Data Science at Hearst – Notebook
  • 52. Next Steps • Amazon EMR 4.1.0 with Spark 1.5 has been released, so we can do more with PySpark and look at Apache Zeppelin on Amazon EMR • Amazon Kinesis just released a new feature to retain data for up to 7 days - we could do more ETL “in the stream” • Amazon Kinesis Firehose and Lambda – Zero touch (no Amazon EC2 maintenance) • More complex data science that requires… • Amazon Redshift UDFs • Python shell that calls Amazon Redshift but also allows for complex statistical methods (e.g., using R or machine learning)
  • 53. Conclusion • Clickstreams are the new “data currency” of business • AWS provides great technology to process data • High speed • Lower costs – Using Spot… • Very agile • Do more with less: this can all be done with a team of 2 FTEs! • 1 developer (well versed in AWS) + 1 data scientist
  • 54. Ingest Store Process Analyze Click Insight Time Call To Action Amazon S3 Amazon Kinesis Amazon DynamoDB Amazon RDS (Aurora) AWS Lambda KCL Apps Amazon EMR Amazon Redshift
  • 55. Use Amazon Kinesis, EMR and Amazon Redshift for Clickstream Open source connectors: • consumers-with-kcl.html AWS Big Data blog - AWS re:Invent Big Data booth AWS Big Data Marketplace and Partner ecosystem Hearst Booth – Hall C1156: Learn more about the interesting things we are doing with data! Call To Action