3. Streaming Data Source
• Anti-virus software companies
• Augmented data to scale
• 4000 - 6000 events per minute
• Scaled up to 100,000 events per minute
4. Streaming Data Source
• Content
– Attack type
• Malware
• DDoS
• Backdoor
– Location information of victim and attacker
6. Getis-Ord
• Used when you have geospatial data
• Calculates statistically significant clusters based on a feature
• Estimates a Gi score for every cell in the region
– Higher Gi score => more significant hotspot
• Compares the feature values of the current cell and its neighbors with the sum of all feature values
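The slides do not spell out the statistic itself. For reference, the commonly used Gi* form is written out below, assuming the variant in which the current cell counts as its own neighbor. Here x_j is the feature value (attack count) of cell j, w_ij is the spatial weight between cells i and j, and n is the total number of cells. These symbols are conventions chosen here, not taken from the deck.

```latex
G_i^* = \frac{\sum_{j=1}^{n} w_{ij} x_j - \bar{X} \sum_{j=1}^{n} w_{ij}}
             {S \sqrt{\dfrac{n \sum_{j=1}^{n} w_{ij}^2 - \left(\sum_{j=1}^{n} w_{ij}\right)^2}{n - 1}}},
\qquad
\bar{X} = \frac{1}{n} \sum_{j=1}^{n} x_j,
\qquad
S = \sqrt{\frac{1}{n} \sum_{j=1}^{n} x_j^2 - \bar{X}^2}
```

The result is a z-score: large positive values mark cells whose neighborhood total is well above what the global mean would predict.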
7. Getis-Ord – Gi Score
• Steps to calculate
– Divide the space into cells
– Accumulate attack counts in each cell
– Calculate the Gi score
• Blue vs Green
– Blue is surrounded by cells of higher attack count
[Figure: grid of cells labeled with attack counts, including the blue and green cells]
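The three steps above can be sketched as follows. This is a minimal illustration, not the presenter's implementation: it assumes a rectangular grid, binary weights (1 for a cell and its up-to-8 adjacent cells, 0 otherwise), and the Gi* variant that includes the cell itself. The class and method names (`GiStarSketch`, `accumulate`, `giStar`) and the sample counts are hypothetical.

```java
import java.util.Arrays;

/** Minimal Getis-Ord Gi* sketch on a rectangular grid (assumptions in comments). */
final class GiStarSketch {

    /** Steps 1 and 2: bin (x, y) points into square cells and count them. */
    static int[][] accumulate(double[][] points, double minX, double minY,
                              double cellSize, int rows, int cols) {
        int[][] counts = new int[rows][cols];
        for (double[] p : points) {
            int c = (int) Math.floor((p[0] - minX) / cellSize);
            int r = (int) Math.floor((p[1] - minY) / cellSize);
            // Points outside the grid are ignored.
            if (r >= 0 && r < rows && c >= 0 && c < cols) {
                counts[r][c]++;
            }
        }
        return counts;
    }

    /** Step 3: Gi* z-score per cell, using binary weights over the 3x3 neighborhood. */
    static double[][] giStar(int[][] counts) {
        int rows = counts.length;
        int cols = counts[0].length;
        int n = rows * cols;

        double sum = 0;
        double sumSq = 0;
        for (int[] row : counts) {
            for (int v : row) {
                sum += v;
                sumSq += (double) v * v;
            }
        }
        double mean = sum / n;
        // Clamp at zero to guard against tiny negative values from rounding.
        double s = Math.sqrt(Math.max(0.0, sumSq / n - mean * mean));

        double[][] z = new double[rows][cols];
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                double localSum = 0;
                int w = 0; // number of cells in the neighborhood, including (r, c)
                for (int dr = -1; dr <= 1; dr++) {
                    for (int dc = -1; dc <= 1; dc++) {
                        int rr = r + dr;
                        int cc = c + dc;
                        if (rr >= 0 && rr < rows && cc >= 0 && cc < cols) {
                            localSum += counts[rr][cc];
                            w++;
                        }
                    }
                }
                // With binary weights, sum(w_ij) and sum(w_ij^2) both equal w.
                double denom = s * Math.sqrt(((double) n * w - (double) w * w) / (n - 1));
                // Score is undefined when all counts are equal or the neighborhood
                // covers the whole grid; report 0 in that case.
                z[r][c] = denom == 0 ? 0.0 : (localSum - mean * w) / denom;
            }
        }
        return z;
    }

    public static void main(String[] args) {
        // Hypothetical attack counts with a dense cluster in the top-left corner.
        int[][] counts = {
            {10, 9, 1, 1},
            { 9, 10, 1, 2},
            { 1, 1, 1, 1},
            { 2, 1, 1, 1}
        };
        for (double[] row : giStar(counts)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

Cells in the dense corner get positive scores and cells far from it get negative ones, which matches the blue vs green comparison: a cell's score depends on its neighbors' counts, not only its own.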
8. Interactive Query
• Find events within a radius of 10 miles
– Calculate a bounding box
• Bounding box
– min(x), max(x), min(y), max(y)
– Based on the earth’s spherical radius at that point
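A bounding-box query of this kind might look like the sketch below. It assumes x is longitude and y is latitude in degrees, treats the earth as a sphere with a mean radius of about 3958.8 miles, and ignores the poles and the antimeridian. All names (`BoundingBoxSketch`, `boundingBox`, `withinRadius`) are hypothetical, and the haversine check after the box filter is an addition the slide does not mention.

```java
/** Minimal bounding-box radius query sketch (spherical-earth approximation). */
final class BoundingBoxSketch {

    // Mean earth radius in miles; an approximation, since the earth is not a perfect sphere.
    static final double EARTH_RADIUS_MILES = 3958.8;

    /** Returns {minX, maxX, minY, maxY} with x = longitude and y = latitude, in degrees. */
    static double[] boundingBox(double latDeg, double lonDeg, double radiusMiles) {
        double dLat = Math.toDegrees(radiusMiles / EARTH_RADIUS_MILES);
        // A circle of latitude has radius R * cos(lat), so a degree of longitude
        // covers less distance away from the equator.
        double dLon = Math.toDegrees(
                radiusMiles / (EARTH_RADIUS_MILES * Math.cos(Math.toRadians(latDeg))));
        return new double[] {lonDeg - dLon, lonDeg + dLon, latDeg - dLat, latDeg + dLat};
    }

    /** Great-circle distance in miles between two points, using the haversine formula. */
    static double haversineMiles(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_MILES * Math.atan2(Math.sqrt(a), Math.sqrt(1 - a));
    }

    /** Cheap box filter first, exact distance check second. */
    static boolean withinRadius(double centerLat, double centerLon,
                                double lat, double lon, double radiusMiles) {
        double[] box = boundingBox(centerLat, centerLon, radiusMiles);
        boolean inBox = lon >= box[0] && lon <= box[1] && lat >= box[2] && lat <= box[3];
        return inBox && haversineMiles(centerLat, centerLon, lat, lon) <= radiusMiles;
    }

    public static void main(String[] args) {
        // Hypothetical query point.
        double[] box = boundingBox(40.0, -75.0, 10.0);
        System.out.printf("x: [%.4f, %.4f], y: [%.4f, %.4f]%n", box[0], box[1], box[2], box[3]);
    }
}
```

The box comparison is four numeric checks per event, so it can discard most events before the more expensive trigonometric distance is computed.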
11. Kafka Streams Technical Challenges
• A Streams application must provide serializers and deserializers to materialize the data
– Needed to read input from a stream and write to a stream
• Built-in serializers include String, Integer, Long, and Double
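For any type beyond the built-in ones, the application has to supply the byte-level encoding itself. The sketch below shows one possible encoding for a hypothetical attack event (the `Event` fields and the binary layout are assumptions, not the presenter's format). It leaves out the Kafka dependency so it runs standalone. In a real application the two methods would sit behind Kafka's `Serializer<T>` and `Deserializer<T>` interfaces and be combined with `Serdes.serdeFrom(serializer, deserializer)`.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Minimal byte-level codec sketch for a hypothetical attack event. */
final class AttackEventCodec {

    /** Hypothetical event shape: attack type plus a location. */
    static final class Event {
        final String attackType;
        final double latitude;
        final double longitude;

        Event(String attackType, double latitude, double longitude) {
            this.attackType = attackType;
            this.latitude = latitude;
            this.longitude = longitude;
        }
    }

    /** Layout: [int length of type][UTF-8 type bytes][double latitude][double longitude]. */
    static byte[] serialize(Event e) {
        byte[] type = e.attackType.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(Integer.BYTES + type.length + 2 * Double.BYTES);
        buf.putInt(type.length);
        buf.put(type);
        buf.putDouble(e.latitude);
        buf.putDouble(e.longitude);
        return buf.array();
    }

    /** Reads the fields back in the same order they were written. */
    static Event deserialize(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data);
        byte[] type = new byte[buf.getInt()];
        buf.get(type);
        double latitude = buf.getDouble();
        double longitude = buf.getDouble();
        return new Event(new String(type, StandardCharsets.UTF_8), latitude, longitude);
    }

    public static void main(String[] args) {
        Event original = new Event("DDoS", 40.0, -75.0);
        Event copy = deserialize(serialize(original));
        System.out.println(copy.attackType + " @ " + copy.latitude + ", " + copy.longitude);
    }
}
```

A fixed binary layout is compact but brittle when fields change; a JSON or Avro based encoding trades some size for easier schema evolution.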
15. Kafka Streams Technical Challenges
• Kafka Streams errors
– Internal topic error: cannot create internal topics
• User permissions to create topics (source: Stack Overflow)
• Set Group ID and Application ID
– Used for coordinating between instances
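As a rough illustration of the settings involved: Kafka Streams requires `application.id`, uses it as the consumer group ID so that instances sharing it coordinate as one group, and prefixes its internal topic names with it. The sketch below uses plain `java.util.Properties` with the literal config keys so it runs without the Kafka libraries. The application name, broker address, store name, and helper names are hypothetical.

```java
import java.util.Properties;

/** Minimal configuration sketch; all values are placeholders. */
final class StreamsConfigSketch {

    /** Builds the two settings every Kafka Streams application needs. */
    static Properties baseConfig(String applicationId, String bootstrapServers) {
        Properties props = new Properties();
        // Same key as StreamsConfig.APPLICATION_ID_CONFIG.
        props.put("application.id", applicationId);
        // Same key as StreamsConfig.BOOTSTRAP_SERVERS_CONFIG.
        props.put("bootstrap.servers", bootstrapServers);
        return props;
    }

    /**
     * Name of the changelog topic Kafka Streams creates for a state store.
     * The user running the application needs permission to create topics
     * with this application-ID prefix.
     */
    static String changelogTopic(String applicationId, String storeName) {
        return applicationId + "-" + storeName + "-changelog";
    }

    public static void main(String[] args) {
        Properties props = baseConfig("attack-hotspots", "localhost:9092");
        System.out.println(props.getProperty("application.id"));
        System.out.println(changelogTopic("attack-hotspots", "cell-counts"));
    }
}
```

Granting topic-creation rights on the application-ID prefix, rather than cluster-wide, is one way to address the permission error while keeping access narrow.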
16. About Me - Shwetha Narayanan
• Recently graduated with a Master's in Computer Science
• Worked for 2 years as a Software Engineer
• Co-authored a paper on “Enabling Real time crime intelligence using mobile GIS and prediction methods”, EISIC, 2013