Trending Topics

This document discusses trending topics by geo-location using data from Twitter. It describes the data flow and pipeline, including streaming tweets from the Twitter API to Kafka and processing them with Spark on HDFS for hourly and daily trends. The cluster setup is outlined showing the various components. Challenges around scaling to millions of tweets per day are discussed, requiring upgrades to memory and server sizes. Storm topologies are used to consume from Kafka, write tweets to storage, and aggregate minute-based trends for a live page. Time discrepancies between servers are also noted.

Trending Topics
by Geo-Location
Insight Data Engineering
Tigran Antonyan
New York

Demo!
• TrendingTopics.info
‣ Source: Twitter Streaming API
‣ Topic is a #hashtag or a user mention
- Live page video
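
The definition of a topic above (a #hashtag or a user mention) can be illustrated with a minimal sketch. The regular expression and the `extract_topics` helper are assumptions made for illustration; they are not taken from the project, and Twitter's real tokenization rules are more involved.

```python
import re

# A topic is a #hashtag or an @mention. Matching on \w+ is a simplification
# (assumption); the lookbehind skips e-mail-like text such as "a@b.com".
TOPIC_PATTERN = re.compile(r"(?<!\w)([#@]\w+)")


def extract_topics(text):
    """Return the lowercased hashtags and user mentions found in a tweet's text."""
    return [match.lower() for match in TOPIC_PATTERN.findall(text)]


sample = "Loving #NYC with @Insight, email me at a@b.com #nyc"
print(extract_topics(sample))
```

Lowercasing is a design choice in this sketch so that `#NYC` and `#nyc` count as the same topic.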

Data Transformation
Map → Reduce → Filter → Map → Reduce
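
One way to read the transformation chain above is sketched below in plain Python rather than Spark, so that it runs without a cluster. The record fields (`region`, `hour`, `topics`), the `min_count` threshold, and the `top_n` cut-off are all hypothetical; the slides do not specify what each stage does, so this is an illustration of the pattern, not the project's actual job.

```python
from collections import Counter, defaultdict


def hourly_trends(tweets, min_count=2, top_n=3):
    """Count topics per (region, hour) and keep the most frequent ones."""
    # Map: emit one ((region, hour, topic), 1) pair per topic occurrence.
    pairs = (((t["region"], t["hour"], topic), 1)
             for t in tweets for topic in t["topics"])
    # Reduce: sum the ones for each key.
    counts = Counter()
    for key, one in pairs:
        counts[key] += one
    # Filter: drop topics seen fewer than min_count times.
    frequent = {key: n for key, n in counts.items() if n >= min_count}
    # Map: re-key each record by (region, hour).
    grouped = defaultdict(list)
    for (region, hour, topic), n in frequent.items():
        grouped[(region, hour)].append((topic, n))
    # Reduce: keep the top_n topics per group, highest count first.
    return {key: sorted(rows, key=lambda row: (-row[1], row[0]))[:top_n]
            for key, rows in grouped.items()}


sample = [
    {"region": "NYC", "hour": 14, "topics": ["#nyc", "@insight"]},
    {"region": "NYC", "hour": 14, "topics": ["#nyc"]},
    {"region": "NYC", "hour": 14, "topics": ["@insight", "#rare"]},
    {"region": "SF", "hour": 14, "topics": ["#sf"]},
]
print(hourly_trends(sample))
```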

Data Flow / Pipeline
• Message Broker
• Real-Time Streaming
• No-SQL DB
• Web-based UI
• Camus
• Batch Processing
‣ Daily
‣ Hourly
• Real Time
‣ Tick Tuples (1m)
• ~4M tweets / day
‣ ~1.7M with tags

Cluster Setup
• HDFS Namenode: Camus (cron) / Kafka Consumer, Spark Master, Kafka Broker
• HDFS Datanode #1: Stream Reader / Kafka Producer, Kafka Broker, Spark
• HDFS Datanode #2: Kafka Broker, Spark
• HDFS Datanode #2: Kafka Broker, Spark
• Cassandra: 3 nodes
• UI / Flask: t2.micro
• Instance size shown on the diagram: m4.large (twice)

Challenges/Issues
Scale
• The JVM ran out of memory when Storm consumed all available tweets from Kafka
• Cassandra crashed a few times
‣ ulimit needed to be increased
‣ JVM heap size needed to be increased
‣ Upgraded from m3.medium to m4.large
• The UI server runs out of memory when dealing with a large number of results (hundreds of thousands)
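
As an illustration of the last issue, one possible mitigation (not described in the slides) is to avoid loading every result row into the UI process and to keep only the rows that will be displayed. The sketch below assumes rows arrive as an iterable of `(topic, count)` pairs; `top_trends` and `limit` are hypothetical names.

```python
import heapq


def top_trends(rows, limit=100):
    """Return the `limit` highest-count rows from any iterable of (topic, count).

    heapq.nlargest consumes the iterable one row at a time and holds only
    `limit` rows in memory, so the full result set is never materialized.
    """
    return heapq.nlargest(limit, rows, key=lambda row: row[1])


# A generator stands in for a large database result set.
rows = ((f"#tag{i}", i) for i in range(100_000))
print(top_trends(rows, limit=3))
```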

About Me
• Software Engineer
‣ Citrix
‣ UCONN VoTeR Center

Storm Topology
• Spout:
‣ Kafka consumer
• Real Time Bolt:
‣ Write basic tweet information (time, text, etc.)
‣ Write tweet location (lat., long.) if available
• Tick Tuple Bolt:
‣ Aggregate minute-based trends for live page
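
The tick tuple bolt's behaviour can be sketched in plain Python: count topics as they arrive, and when a tick arrives, emit the window's counts and start over. In Storm itself, tick tuples are requested through the `topology.tick.tuple.freq.secs` setting. The `TICK` marker and the `MinuteTrendAggregator` class below are hypothetical stand-ins, not Storm APIs and not the project's code.

```python
from collections import Counter

TICK = object()  # hypothetical stand-in for Storm's system tick tuple


class MinuteTrendAggregator:
    """Counts topics between ticks; each tick closes one window."""

    def __init__(self):
        self.counts = Counter()

    def process(self, tup):
        """Count a topic tuple, or on a tick return the window's counts and reset."""
        if tup is TICK:
            window, self.counts = self.counts, Counter()
            return window.most_common()  # highest count first
        self.counts[tup] += 1
        return None


agg = MinuteTrendAggregator()
for topic in ["#nyc", "#nyc", "@insight"]:
    agg.process(topic)
print(agg.process(TICK))  # counts for the window that just ended
```

Swapping in a fresh `Counter` on every tick keeps memory bounded by the number of distinct topics seen in a single window.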

Time Discrepancies
• Server clocks differed from one another by 30+ seconds.
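
The slides only note the discrepancy. One common way to make minute-based aggregation less sensitive to it, offered here as an assumption rather than the project's fix, is to bucket each tweet by its own timestamp instead of the clock of whichever server processes it. `minute_bucket` is a hypothetical helper.

```python
from datetime import datetime, timezone


def minute_bucket(event_epoch_seconds):
    """Truncate an event timestamp (seconds since the epoch, UTC) to its minute."""
    ts = datetime.fromtimestamp(event_epoch_seconds, tz=timezone.utc)
    return ts.replace(second=0, microsecond=0).isoformat()


# Two events from the same minute share a bucket regardless of which
# server handles them or what that server's local clock says.
print(minute_bucket(0), minute_bucket(59), minute_bucket(60))
```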
