SlideShare a Scribd company logo
Garth Henson, November 2022
Telling Stories with Big Data
Striving for relevance and accuracy
Telling Stories with Big Data Striving for relevance and accuracy
Garth Henson, November 2022
Prerequisites
• GitHub access
• Docker (with docker-compose) installed
• Favorite IDE
Setup
Clone the project repo
https://github.com/guahanweb/twitter-relay-consumer
Create an environment
fi
le at .docker.env
LOG_LEVEL=debug
REDIS_HOST=redis
REDIS_PORT=6379
RELAY_AUTH_TOKEN=<token>
RELAY_AUTH_URL=http://<ip-address>:3000/connect
RELAY_WEBSOCKET_URL=ws://<ip-address>:3000/ws
Run the consumer locally
$ docker-compose up
Lucas
fi
lm
Disney Streaming
Microsoft
The Walt Disney Company
Amazon
Synacor, Inc.
@guahanweb
Garth Henson
Understanding Data
Relevant? Accurate?
High Level Data Analysis
Data Partitioning (i.e., by time)
Processing Data in Real Time
Our Use Case: Twitter data processing at scale
Twitter Stream
a study on data discovery
Redis Data Type: Sorted Sets
Preprocessing data: partitions and indexes
• Step 1: register rules with Twitter (watch the @Lucas
fi
lm user)
• Step 2: connect consumer to Twitter data stream (stream only
contains matching data sequentially)
• Step 3: monitor each event and process data accordingly
•Metadata for unique authors and tags
•Time partitions split into tweets, retweets, quotes, and replies
• Step 4: store partitioned data in Redis
Why time-based partitions?
• Finite partition life (aids in cleaning up old data)
• Supports multiple inbound data streams
• Real-time feed
• Slow or latent event handling
• Manual back
fi
ll of data queries
• Heavy query caching potential (most recent X hours)
• Reduced overhead of aggregate queries
• Cache summary results by partition
• Purge partition cache(s) on write
• Data loss (even a full partition) is manageable
Indexes are driven by your data retrieval model
(What do you need to know?)
Aggregate by author Aggregate by hashtag
Indexes may contain redundant data;
data normalization is another talk...
Querying the Data
What were the top 10 daily hashtags for the first week of May 2022?
Step 1: union hourly partitions by date
ZUNIONSTORE
20220501:tweet:tags
24
2022050100:tweet:tags
2022050101:tweet:tags
…
• Identify the time periods (daily)
• Calculate the partitions within each data point
(hours 00-23)
• UNION the values across all partitions for a
count
Step 2: create a scoring algorithm (what does TOP mean?)
• Hashtags used across tweets, retweets,
quotes, and replies
• Our algorithm assigns a value to each type:
• Tweets: 100 points
• Replies: 50 points
• Quotes: 25 points
• Retweets: 1 point
• UNION your previously calculated daily
summary across indexes for the
fi
nal daily
score
Step 3: script it together with Node.js + Lua for Redis
Create a function specifying algorithmic weights and pass it to a Lua script
Step 3: script it together with Node.js + Lua for Redis
Retrieve the arguments passed in via Node.js Set up a table with all matching partitions, by type
Step 3: script it together with Node.js + Lua for Redis
UNION all partitions, weighted by type JSON encode the response to the client
Redis Considerations
Why bother with Lua scripting?
• Atomic actions - all operations within the script are synchronous
• Performance - Lua scripts are evaluated in memory and can be pre-cached
• Separation of concerns - complex Redis interactions in one place
• ...
Redis Considerations
What should we keep in mind?
• Data types and design
• Our example in this talk uses primarily SORTED SETS
• GEOSEARCH available since 6.2
• Be aware when running in cluster mode
• Index hashing is unique to your shard
• Cannot dynamically reference keys from Lua
• Consider redundancy and syncing
Telling Stories with Big Data Striving for relevance and accuracy
Garth Henson, November 2022
Prerequisites
• GitHub access
• Docker (with docker-compose) installed
• Favorite IDE
Setup
Clone the project repo
https://github.com/guahanweb/twitter-relay-consumer
Create an environment
fi
le at .docker.env
LOG_LEVEL=debug
REDIS_HOST=redis
REDIS_PORT=6379
RELAY_AUTH_TOKEN=<token>
RELAY_AUTH_URL=http://<ip-address>:3000/connect
RELAY_WEBSOCKET_URL=ws://<ip-address>:3000/ws
Run the consumer locally
$ docker-compose up
High Level Architecture
Twitter Event Structure
Hands-on
Distribute access key and IP address
Hands-on
Overview of currently configured rules and message handling
Hands-on
Current state of reporting (and how to extend it)
Hands-on
How could we track and report ATO rule matches by minute?
Thank you
Questions, comments, recommendations, corrections...

More Related Content

Similar to Telling Stories with Big Data

Latest Developments in H2O
Latest Developments in H2OLatest Developments in H2O
Latest Developments in H2O
Sri Ambati
 
Storing eBay's Media Metadata on MongoDB, by Yuri Finkelstein, Architect, eBay
Storing eBay's Media Metadata on MongoDB, by Yuri Finkelstein, Architect, eBayStoring eBay's Media Metadata on MongoDB, by Yuri Finkelstein, Architect, eBay
Storing eBay's Media Metadata on MongoDB, by Yuri Finkelstein, Architect, eBay
MongoDB
 
MongoDB San Francisco 2013: Storing eBay's Media Metadata on MongoDB present...
MongoDB San Francisco 2013: Storing eBay's Media Metadata on MongoDB  present...MongoDB San Francisco 2013: Storing eBay's Media Metadata on MongoDB  present...
MongoDB San Francisco 2013: Storing eBay's Media Metadata on MongoDB present...
MongoDB
 
QuerySurge Slide Deck for Big Data Testing Webinar
QuerySurge Slide Deck for Big Data Testing WebinarQuerySurge Slide Deck for Big Data Testing Webinar
QuerySurge Slide Deck for Big Data Testing Webinar
RTTS
 
Azure Data Studio Extension Development
Azure Data Studio Extension DevelopmentAzure Data Studio Extension Development
Azure Data Studio Extension Development
Drew Skwiers-Koballa
 
Webinar Docker Tri Series
Webinar Docker Tri SeriesWebinar Docker Tri Series
Webinar Docker Tri Series
Newt Global Consulting LLC
 
SQL Server Konferenz 2014 - SSIS & HDInsight
SQL Server Konferenz 2014 - SSIS & HDInsightSQL Server Konferenz 2014 - SSIS & HDInsight
SQL Server Konferenz 2014 - SSIS & HDInsightTillmann Eitelberg
 
Gerrit + Jenkins = Continuous Delivery For Big Data
Gerrit + Jenkins = Continuous Delivery For Big DataGerrit + Jenkins = Continuous Delivery For Big Data
Gerrit + Jenkins = Continuous Delivery For Big Data
Stefano Galarraga
 
Intro Docker october 2013
Intro Docker october 2013Intro Docker october 2013
Intro Docker october 2013dotCloud
 
DockerCon 15 Keynote - Day 2
DockerCon 15 Keynote - Day 2DockerCon 15 Keynote - Day 2
DockerCon 15 Keynote - Day 2
Docker, Inc.
 
Data for all: Empowering teams with scalable Shiny applications @ useR 2019
Data for all: Empowering teams with scalable Shiny applications @ useR 2019Data for all: Empowering teams with scalable Shiny applications @ useR 2019
Data for all: Empowering teams with scalable Shiny applications @ useR 2019
Ruan Pearce-Authers
 
Docker datascience pipeline
Docker datascience pipelineDocker datascience pipeline
Docker datascience pipeline
DataWorks Summit
 
Mongo DB at Community Engine
Mongo DB at Community EngineMongo DB at Community Engine
Mongo DB at Community Engine
Community Engine
 
MongoDB at community engine
MongoDB at community engineMongoDB at community engine
MongoDB at community engine
mathraq
 
Mongo db 3.4 Overview
Mongo db 3.4 OverviewMongo db 3.4 Overview
Mongo db 3.4 Overview
Norberto Leite
 
James-Bucher-Resume
James-Bucher-ResumeJames-Bucher-Resume
James-Bucher-ResumeJames Bucher
 
Introduction to Microsoft's Big Data Platform and Hadoop Primer
Introduction to Microsoft's Big Data Platform and Hadoop PrimerIntroduction to Microsoft's Big Data Platform and Hadoop Primer
Introduction to Microsoft's Big Data Platform and Hadoop Primer
Denny Lee
 
Chugging Our Own "Craft Brew” – HPE’s Journey Towards Containers-as-a-Service...
Chugging Our Own "Craft Brew” – HPE’s Journey Towards Containers-as-a-Service...Chugging Our Own "Craft Brew” – HPE’s Journey Towards Containers-as-a-Service...
Chugging Our Own "Craft Brew” – HPE’s Journey Towards Containers-as-a-Service...
Docker, Inc.
 
Docker data science pipeline
Docker data science pipelineDocker data science pipeline
Docker data science pipeline
DataWorks Summit
 
Above the cloud: Big Data and BI
Above the cloud: Big Data and BIAbove the cloud: Big Data and BI
Above the cloud: Big Data and BI
Denny Lee
 

Similar to Telling Stories with Big Data (20)

Latest Developments in H2O
Latest Developments in H2OLatest Developments in H2O
Latest Developments in H2O
 
Storing eBay's Media Metadata on MongoDB, by Yuri Finkelstein, Architect, eBay
Storing eBay's Media Metadata on MongoDB, by Yuri Finkelstein, Architect, eBayStoring eBay's Media Metadata on MongoDB, by Yuri Finkelstein, Architect, eBay
Storing eBay's Media Metadata on MongoDB, by Yuri Finkelstein, Architect, eBay
 
MongoDB San Francisco 2013: Storing eBay's Media Metadata on MongoDB present...
MongoDB San Francisco 2013: Storing eBay's Media Metadata on MongoDB  present...MongoDB San Francisco 2013: Storing eBay's Media Metadata on MongoDB  present...
MongoDB San Francisco 2013: Storing eBay's Media Metadata on MongoDB present...
 
QuerySurge Slide Deck for Big Data Testing Webinar
QuerySurge Slide Deck for Big Data Testing WebinarQuerySurge Slide Deck for Big Data Testing Webinar
QuerySurge Slide Deck for Big Data Testing Webinar
 
Azure Data Studio Extension Development
Azure Data Studio Extension DevelopmentAzure Data Studio Extension Development
Azure Data Studio Extension Development
 
Webinar Docker Tri Series
Webinar Docker Tri SeriesWebinar Docker Tri Series
Webinar Docker Tri Series
 
SQL Server Konferenz 2014 - SSIS & HDInsight
SQL Server Konferenz 2014 - SSIS & HDInsightSQL Server Konferenz 2014 - SSIS & HDInsight
SQL Server Konferenz 2014 - SSIS & HDInsight
 
Gerrit + Jenkins = Continuous Delivery For Big Data
Gerrit + Jenkins = Continuous Delivery For Big DataGerrit + Jenkins = Continuous Delivery For Big Data
Gerrit + Jenkins = Continuous Delivery For Big Data
 
Intro Docker october 2013
Intro Docker october 2013Intro Docker october 2013
Intro Docker october 2013
 
DockerCon 15 Keynote - Day 2
DockerCon 15 Keynote - Day 2DockerCon 15 Keynote - Day 2
DockerCon 15 Keynote - Day 2
 
Data for all: Empowering teams with scalable Shiny applications @ useR 2019
Data for all: Empowering teams with scalable Shiny applications @ useR 2019Data for all: Empowering teams with scalable Shiny applications @ useR 2019
Data for all: Empowering teams with scalable Shiny applications @ useR 2019
 
Docker datascience pipeline
Docker datascience pipelineDocker datascience pipeline
Docker datascience pipeline
 
Mongo DB at Community Engine
Mongo DB at Community EngineMongo DB at Community Engine
Mongo DB at Community Engine
 
MongoDB at community engine
MongoDB at community engineMongoDB at community engine
MongoDB at community engine
 
Mongo db 3.4 Overview
Mongo db 3.4 OverviewMongo db 3.4 Overview
Mongo db 3.4 Overview
 
James-Bucher-Resume
James-Bucher-ResumeJames-Bucher-Resume
James-Bucher-Resume
 
Introduction to Microsoft's Big Data Platform and Hadoop Primer
Introduction to Microsoft's Big Data Platform and Hadoop PrimerIntroduction to Microsoft's Big Data Platform and Hadoop Primer
Introduction to Microsoft's Big Data Platform and Hadoop Primer
 
Chugging Our Own "Craft Brew” – HPE’s Journey Towards Containers-as-a-Service...
Chugging Our Own "Craft Brew” – HPE’s Journey Towards Containers-as-a-Service...Chugging Our Own "Craft Brew” – HPE’s Journey Towards Containers-as-a-Service...
Chugging Our Own "Craft Brew” – HPE’s Journey Towards Containers-as-a-Service...
 
Docker data science pipeline
Docker data science pipelineDocker data science pipeline
Docker data science pipeline
 
Above the cloud: Big Data and BI
Above the cloud: Big Data and BIAbove the cloud: Big Data and BI
Above the cloud: Big Data and BI
 

More from All Things Open

Building Reliability - The Realities of Observability
Building Reliability - The Realities of ObservabilityBuilding Reliability - The Realities of Observability
Building Reliability - The Realities of Observability
All Things Open
 
Modern Database Best Practices
Modern Database Best PracticesModern Database Best Practices
Modern Database Best Practices
All Things Open
 
Open Source and Public Policy
Open Source and Public PolicyOpen Source and Public Policy
Open Source and Public Policy
All Things Open
 
Weaving Microservices into a Unified GraphQL Schema with graph-quilt - Ashpak...
Weaving Microservices into a Unified GraphQL Schema with graph-quilt - Ashpak...Weaving Microservices into a Unified GraphQL Schema with graph-quilt - Ashpak...
Weaving Microservices into a Unified GraphQL Schema with graph-quilt - Ashpak...
All Things Open
 
The State of Passwordless Auth on the Web - Phil Nash
The State of Passwordless Auth on the Web - Phil NashThe State of Passwordless Auth on the Web - Phil Nash
The State of Passwordless Auth on the Web - Phil Nash
All Things Open
 
Total ReDoS: The dangers of regex in JavaScript
Total ReDoS: The dangers of regex in JavaScriptTotal ReDoS: The dangers of regex in JavaScript
Total ReDoS: The dangers of regex in JavaScript
All Things Open
 
What Does Real World Mass Adoption of Decentralized Tech Look Like?
What Does Real World Mass Adoption of Decentralized Tech Look Like?What Does Real World Mass Adoption of Decentralized Tech Look Like?
What Does Real World Mass Adoption of Decentralized Tech Look Like?
All Things Open
 
How to Write & Deploy a Smart Contract
How to Write & Deploy a Smart ContractHow to Write & Deploy a Smart Contract
How to Write & Deploy a Smart Contract
All Things Open
 
Spinning Your Drones with Cadence Workflows, Apache Kafka and TensorFlow
 Spinning Your Drones with Cadence Workflows, Apache Kafka and TensorFlow Spinning Your Drones with Cadence Workflows, Apache Kafka and TensorFlow
Spinning Your Drones with Cadence Workflows, Apache Kafka and TensorFlow
All Things Open
 
DEI Challenges and Success
DEI Challenges and SuccessDEI Challenges and Success
DEI Challenges and Success
All Things Open
 
Scaling Web Applications with Background
Scaling Web Applications with BackgroundScaling Web Applications with Background
Scaling Web Applications with Background
All Things Open
 
Supercharging tutorials with WebAssembly
Supercharging tutorials with WebAssemblySupercharging tutorials with WebAssembly
Supercharging tutorials with WebAssembly
All Things Open
 
Using SQL to Find Needles in Haystacks
Using SQL to Find Needles in HaystacksUsing SQL to Find Needles in Haystacks
Using SQL to Find Needles in Haystacks
All Things Open
 
Configuration Security as a Game of Pursuit Intercept
Configuration Security as a Game of Pursuit InterceptConfiguration Security as a Game of Pursuit Intercept
Configuration Security as a Game of Pursuit Intercept
All Things Open
 
Scaling an Open Source Sponsorship Program
Scaling an Open Source Sponsorship ProgramScaling an Open Source Sponsorship Program
Scaling an Open Source Sponsorship Program
All Things Open
 
Build Developer Experience Teams for Open Source
Build Developer Experience Teams for Open SourceBuild Developer Experience Teams for Open Source
Build Developer Experience Teams for Open Source
All Things Open
 
Deploying Models at Scale with Apache Beam
Deploying Models at Scale with Apache BeamDeploying Models at Scale with Apache Beam
Deploying Models at Scale with Apache Beam
All Things Open
 
Sudo – Giving access while staying in control
Sudo – Giving access while staying in controlSudo – Giving access while staying in control
Sudo – Giving access while staying in control
All Things Open
 
Fortifying the Future: Tackling Security Challenges in AI/ML Applications
Fortifying the Future: Tackling Security Challenges in AI/ML ApplicationsFortifying the Future: Tackling Security Challenges in AI/ML Applications
Fortifying the Future: Tackling Security Challenges in AI/ML Applications
All Things Open
 
Securing Cloud Resources Deployed with Control Planes on Kubernetes using Gov...
Securing Cloud Resources Deployed with Control Planes on Kubernetes using Gov...Securing Cloud Resources Deployed with Control Planes on Kubernetes using Gov...
Securing Cloud Resources Deployed with Control Planes on Kubernetes using Gov...
All Things Open
 

More from All Things Open (20)

Building Reliability - The Realities of Observability
Building Reliability - The Realities of ObservabilityBuilding Reliability - The Realities of Observability
Building Reliability - The Realities of Observability
 
Modern Database Best Practices
Modern Database Best PracticesModern Database Best Practices
Modern Database Best Practices
 
Open Source and Public Policy
Open Source and Public PolicyOpen Source and Public Policy
Open Source and Public Policy
 
Weaving Microservices into a Unified GraphQL Schema with graph-quilt - Ashpak...
Weaving Microservices into a Unified GraphQL Schema with graph-quilt - Ashpak...Weaving Microservices into a Unified GraphQL Schema with graph-quilt - Ashpak...
Weaving Microservices into a Unified GraphQL Schema with graph-quilt - Ashpak...
 
The State of Passwordless Auth on the Web - Phil Nash
The State of Passwordless Auth on the Web - Phil NashThe State of Passwordless Auth on the Web - Phil Nash
The State of Passwordless Auth on the Web - Phil Nash
 
Total ReDoS: The dangers of regex in JavaScript
Total ReDoS: The dangers of regex in JavaScriptTotal ReDoS: The dangers of regex in JavaScript
Total ReDoS: The dangers of regex in JavaScript
 
What Does Real World Mass Adoption of Decentralized Tech Look Like?
What Does Real World Mass Adoption of Decentralized Tech Look Like?What Does Real World Mass Adoption of Decentralized Tech Look Like?
What Does Real World Mass Adoption of Decentralized Tech Look Like?
 
How to Write & Deploy a Smart Contract
How to Write & Deploy a Smart ContractHow to Write & Deploy a Smart Contract
How to Write & Deploy a Smart Contract
 
Spinning Your Drones with Cadence Workflows, Apache Kafka and TensorFlow
 Spinning Your Drones with Cadence Workflows, Apache Kafka and TensorFlow Spinning Your Drones with Cadence Workflows, Apache Kafka and TensorFlow
Spinning Your Drones with Cadence Workflows, Apache Kafka and TensorFlow
 
DEI Challenges and Success
DEI Challenges and SuccessDEI Challenges and Success
DEI Challenges and Success
 
Scaling Web Applications with Background
Scaling Web Applications with BackgroundScaling Web Applications with Background
Scaling Web Applications with Background
 
Supercharging tutorials with WebAssembly
Supercharging tutorials with WebAssemblySupercharging tutorials with WebAssembly
Supercharging tutorials with WebAssembly
 
Using SQL to Find Needles in Haystacks
Using SQL to Find Needles in HaystacksUsing SQL to Find Needles in Haystacks
Using SQL to Find Needles in Haystacks
 
Configuration Security as a Game of Pursuit Intercept
Configuration Security as a Game of Pursuit InterceptConfiguration Security as a Game of Pursuit Intercept
Configuration Security as a Game of Pursuit Intercept
 
Scaling an Open Source Sponsorship Program
Scaling an Open Source Sponsorship ProgramScaling an Open Source Sponsorship Program
Scaling an Open Source Sponsorship Program
 
Build Developer Experience Teams for Open Source
Build Developer Experience Teams for Open SourceBuild Developer Experience Teams for Open Source
Build Developer Experience Teams for Open Source
 
Deploying Models at Scale with Apache Beam
Deploying Models at Scale with Apache BeamDeploying Models at Scale with Apache Beam
Deploying Models at Scale with Apache Beam
 
Sudo – Giving access while staying in control
Sudo – Giving access while staying in controlSudo – Giving access while staying in control
Sudo – Giving access while staying in control
 
Fortifying the Future: Tackling Security Challenges in AI/ML Applications
Fortifying the Future: Tackling Security Challenges in AI/ML ApplicationsFortifying the Future: Tackling Security Challenges in AI/ML Applications
Fortifying the Future: Tackling Security Challenges in AI/ML Applications
 
Securing Cloud Resources Deployed with Control Planes on Kubernetes using Gov...
Securing Cloud Resources Deployed with Control Planes on Kubernetes using Gov...Securing Cloud Resources Deployed with Control Planes on Kubernetes using Gov...
Securing Cloud Resources Deployed with Control Planes on Kubernetes using Gov...
 

Recently uploaded

Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Albert Hoitingh
 
By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024
Pierluigi Pugliese
 
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Nexer Digital
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
Kari Kakkonen
 
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
Neo4j
 
UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5
DianaGray10
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
Alpen-Adria-Universität
 
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
Neo4j
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
Octavian Nadolu
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
DianaGray10
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Aggregage
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance
 
A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...
sonjaschweigert1
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
KatiaHIMEUR1
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Paige Cruz
 
Mind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AIMind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AI
Kumud Singh
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
Safe Software
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
Quotidiano Piemontese
 

Recently uploaded (20)

Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
 
By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024
 
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
 
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
GraphSummit Singapore | Neo4j Product Vision & Roadmap - Q2 2024
 
UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
 
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
 
Mind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AIMind map of terminologies used in context of Generative AI
Mind map of terminologies used in context of Generative AI
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
 

Telling Stories with Big Data

  • 1. Garth Henson, November 2022 Telling Stories with Big Data Striving for relevance and accuracy
  • 2. Telling Stories with Big Data Striving for relevance and accuracy Garth Henson, November 2022 Prerequisites • GitHub access • Docker (with docker-compose) installed • Favorite IDE Setup Clone the project repo https://github.com/guahanweb/twitter-relay-consumer Create an environment fi le at .docker.env LOG_LEVEL=debug REDIS_HOST=redis REDIS_PORT=6379 RELAY_AUTH_TOKEN=<token> RELAY_AUTH_URL=http://<ip-address>:3000/connect RELAY_WEBSOCKET_URL=ws://<ip-address>:3000/ws Run the consumer locally $ docker-compose up
  • 3. Lucas fi lm Disney Streaming Microsoft The Walt Disney Company Amazon Synacor, Inc. @guahanweb Garth Henson
  • 6. High Level Data Analysis
  • 8. Processing Data in Real Time
  • 9. Our Use Case: Twitter data processing at scale
  • 10. Twitter Stream a study on data discovery
  • 11. Redis Data Type: Sorted Sets
  • 12. Preprocessing data: partitions and indexes • Step 1: register rules with Twitter (watch the @Lucas fi lm user) • Step 2: connect consumer to Twitter data stream (stream only contains matching data sequentially) • Step 3: monitor each event and process data accordingly •Metadata for unique authors and tags •Time partitions split into tweets, retweets, quotes, and replies • Step 4: store partitioned data in Redis
  • 13. Why time-based partitions? • Finite partition life (aids in cleaning up old data) • Supports multiple inbound data streams • Real-time feed • Slow or latent event handling • Manual back fi ll of data queries • Heavy query caching potential (most recent X hours) • Reduced overhead of aggregate queries • Cache summary results by partition • Purge partition cache(s) on write • Data loss (even a full partition) is manageable
  • 14. Indexes are driven by your data retrieval model (What do you need to know?) Aggregate by author Aggregate by hashtag Indexes may contain redundant data; data normalization is another talk...
  • 15. Querying the Data What were the top 10 daily hashtags for the first week of May 2022?
  • 16. Step 1: union hourly partitions by date ZUNIONSTORE 20220501:tweet:tags 24 2022050100:tweet:tags 2022050101:tweet:tags … • Identify the time periods (daily) • Calculate the partitions within each data point (hours 00-23) • UNION the values across all partitions for a count
  • 17. Step 2: create a scoring algorithm (what does TOP mean?) • Hashtags used across tweets, retweets, quotes, and replies • Our algorithm assigns a value to each type: • Tweets: 100 points • Replies: 50 points • Quotes: 25 points • Retweets: 1 point • UNION your previously calculated daily summary across indexes for the fi nal daily score
  • 18. Step 3: script it together with Node.js + Lua for Redis Create a function specifying algorithmic weights and pass it to a Lua script
  • 19. Step 3: script it together with Node.js + Lua for Redis Retrieve the arguments passed in via Node.js Set up a table with all matching partitions, by type
  • 20. Step 3: script it together with Node.js + Lua for Redis UNION all partitions, weighted by type JSON encode the response to the client
  • 21. Redis Considerations Why bother with Lua scripting? • Atomic actions - all operations within the script are synchronous • Performance - Lua scripts are evaluated in memory and can be pre-cached • Separation of concerns - complex Redis interactions in one place • ...
  • 22. Redis Considerations What should we keep in mind? • Data types and design • Our example in this talk uses primarily SORTED SETS • GEOSEARCH available since 6.2 • Be aware when running in cluster mode • Index hashing is unique to your shard • Cannot dynamically reference keys from Lua • Consider redundancy and syncing
  • 23. Telling Stories with Big Data Striving for relevance and accuracy Garth Henson, November 2022 Prerequisites • GitHub access • Docker (with docker-compose) installed • Favorite IDE Setup Clone the project repo https://github.com/guahanweb/twitter-relay-consumer Create an environment fi le at .docker.env LOG_LEVEL=debug REDIS_HOST=redis REDIS_PORT=6379 RELAY_AUTH_TOKEN=<token> RELAY_AUTH_URL=http://<ip-address>:3000/connect RELAY_WEBSOCKET_URL=ws://<ip-address>:3000/ws Run the consumer locally $ docker-compose up
  • 27. Hands-on Overview of currently configured rules and message handling
  • 28. Hands-on Current state of reporting (and how to extend it)
  • 29. Hands-on How could we track and report ATO rule matches by minute?
  • 30. Thank you Questions, comments, recommendations, corrections...