High-level overview lecture about analyzing time series data with Apache Spark microservice applications at Windward Ltd.
It covers ways to handle the missing-data problem and the tool we used to handle it.
Transform & Analyze Time Series Data via Apache Spark @Windward
1. TRANSFORM & ANALYZE TIME SERIES DATA VIA APACHE SPARK
DEMI BEN-ARI
SR. SOFTWARE ENGINEER
WINDWARD
17.05.2015
2. ABOUT ME
DEMI BEN-ARI
SENIOR SOFTWARE ENGINEER AT WINDWARD LTD.
B.SC. COMPUTER SCIENCE – ACADEMIC COLLEGE TEL-AVIV YAFFO
IN THE PAST:
SOFTWARE TEAM LEADER & SENIOR JAVA SOFTWARE ENGINEER,
MISSILE DEFENSE AND ALERT SYSTEM - “OFEK” UNIT - IAF
3. WHAT DOES WINDWARD DO?
Windward is a maritime data and analytics company, bringing unprecedented visibility to the maritime domain. Windward has built the world's first maritime data platform, the Windward Mind, which analyzes and organizes the world's maritime data.
4. WHERE DOES THE DATA COME FROM?
• Other Sources
• Maritime Databases
• AIS (Automatic Identification System)
• Port Agent Reports
7. SPECIAL IN WINDWARD’S DOMAIN
Maritime Mind
Data Mining Scope
Market Trends
Anomaly Detection
• Single data point scope
• Going in detail
• Fraud detection
• Sample / total data scope
• Trends
• Data sampling problems
8. MISSING PARTS IN TIME SERIES DATA
• DATA ARRIVING FROM THE SATELLITES
• MIGHT CAUSE DELAYS BECAUSE OF BAD TRANSMISSION
• DATA VENDORS DELAYING THE DATA STREAM
• CALCULATION IN LAYERS MAY CAUSE HOLES IN THE DATA
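A minimal sketch of how such missing parts could be spotted, in plain Python rather than Spark. It assumes samples are expected at a fixed interval; the function name `find_holes`, the 10-minute interval and the sample timestamps are all hypothetical and not taken from the talk.

```python
from datetime import datetime, timedelta

def find_holes(timestamps, expected_interval):
    """Return (previous, current) pairs that are further apart than expected."""
    ordered = sorted(timestamps)
    holes = []
    for previous, current in zip(ordered, ordered[1:]):
        if current - previous > expected_interval:
            holes.append((previous, current))
    return holes

base = datetime(2015, 5, 17, 12, 0)
# Hypothetical samples at minutes 0, 10, 20, 50, 60 (minutes 30 and 40 are missing).
samples = [base + timedelta(minutes=m) for m in (0, 10, 20, 50, 60)]
holes = find_holes(samples, timedelta(minutes=10))
```

Here `holes` holds a single pair spanning minute 20 to minute 50. Real feeds rarely have a strict cadence, so the threshold would need tuning per data source.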
9. THE PROBLEM - RECEIVING DATA
T = 0
Level 3 Entity
Level 2 Entity
Level 1 Entity
Beginning state, no data, and the time line begins
10. THE PROBLEM - RECEIVING DATA
T = 10
Level 3 Entity
Level 2 Entity
Level 1 Entity
Computation sliding window size
Level 1 entities' data arrives and gets stored
11. THE PROBLEM - RECEIVING DATA
T = 10
Level 3 Entity
Level 2 Entity
Level 1 Entity
Computation sliding window size
Level 2 entities are created on top of Level 1's data (decreased amount of data)
Level 3 entities are created on top of Level 2's data (decreased amount of data)
12. THE PROBLEM - RECEIVING DATA
T = 20
Level 3 Entity
Level 2 Entity
Level 1 Entity
Computation sliding window size
Level 1 entity's data arrives late
Because the sliding window only reaches back a limited distance, Level 2 and Level 3 entities would not be created properly and there would be "holes" in the data
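The four timeline slides can be reproduced with a small simulation. This is an illustrative sketch in plain Python, not Windward's code: the `Point` type, the window size of 10, the sum aggregation and the sample values are assumptions chosen to match T = 10 and T = 20 on the slides.

```python
from collections import namedtuple

# event_time: when the measurement was taken; arrival_time: when it was stored.
Point = namedtuple("Point", ["event_time", "arrival_time", "value"])

WINDOW = 10  # hypothetical window size, in the same units as T on the slides

def stored_at(points, now):
    """Level 1 points that have already arrived at time `now`."""
    return [p for p in points if p.arrival_time <= now]

def level2_for_window(points, window_end, now):
    """Sum the stored Level 1 values whose event_time falls in
    (window_end - WINDOW, window_end]; None means nothing to aggregate."""
    visible = [p for p in stored_at(points, now)
               if window_end - WINDOW < p.event_time <= window_end]
    return sum(p.value for p in visible) if visible else None

points = [
    Point(event_time=3, arrival_time=4, value=1),
    Point(event_time=8, arrival_time=18, value=5),   # arrives late
    Point(event_time=14, arrival_time=15, value=2),
]

# The run at T = 10 covers (0, 10]; the run at T = 20 covers (10, 20].
at_t10 = level2_for_window(points, window_end=10, now=10)    # late point not stored yet
at_t20 = level2_for_window(points, window_end=20, now=20)    # late point is outside this window
complete = level2_for_window(points, window_end=10, now=20)  # what (0, 10] should contain
```

The late point with value 5 is never picked up by either scheduled run: `at_t10` is 1 and `at_t20` is 2, while the complete value for the first window is 6. That difference is the "hole".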
13. SOLUTION TO THE PROBLEM
• CREATING DEPENDENT MICRO SERVICES FORMING A DATA PIPELINE
• OUR MICRO SERVICES ARE MAINLY APACHE SPARK APPLICATIONS
• SERVICES ARE ONLY DEPENDENT ON THE DATA - NOT THE PREVIOUS SERVICE’S RUN
• FORMING A STRUCTURE AND SCHEDULING OF “BACK SLIDING WINDOW”
• KNOW YOUR DATA AND ITS RELEVANCE THROUGH TIME
• DON’T TRY TO FORESEE THE FUTURE – IT MIGHT BIAS THE RESULTS
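One way to read the "back sliding window" idea is that every run recomputes the last few windows from whatever data is stored at that moment, and overwrites earlier results. The sketch below is plain Python under that assumption; `run_service`, `BACK_WINDOWS`, the window size and the sample tuples are hypothetical, and in the setup described above each service would be a Spark application rather than a function.

```python
WINDOW = 10       # hypothetical window size
BACK_WINDOWS = 2  # hypothetical number of past windows recomputed on every run

def run_service(level1, level2_store, now):
    """Recompute the last BACK_WINDOWS windows using stored data only.

    level1: list of (event_time, arrival_time, value) tuples.
    level2_store: dict keyed by window end; results are overwritten, so
    running the service again for the same window is harmless.
    """
    # Only data that has arrived by `now` is used: no peeking into the future.
    visible = [(event, value) for event, arrival, value in level1 if arrival <= now]
    for k in range(BACK_WINDOWS):
        window_end = now - k * WINDOW
        if window_end <= 0:
            continue  # before the beginning of the timeline
        values = [value for event, value in visible
                  if window_end - WINDOW < event <= window_end]
        if values:
            level2_store[window_end] = sum(values)
    return level2_store

level1 = [(3, 4, 1), (8, 18, 5), (14, 15, 2)]  # the second point arrives late
store = {}
run_service(level1, store, now=10)
after_first_run = dict(store)
run_service(level1, store, now=20)
```

After the first run the store holds a partial result for the window ending at 10; the second run reaches back, includes the late point, and replaces it. Because the service reads only stored data, it does not matter whether the previous run succeeded, which matches the point that services depend on the data rather than on the previous service's run.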
14. WHY CHOOSE APACHE SPARK?
• IN MEMORY COMPUTATION (NOT ONLY)
• FULLY DISTRIBUTED FRAMEWORK – LINEAR SCALE OUT
• FAULT TOLERANT FRAMEWORK
• MULTIPLE LANGUAGE API
• HIGHER LEVEL ABSTRACTIONS (SPARKSQL, MLLIB, GRAPHX, SPARK STREAMING)
• FUNCTIONAL PROGRAMMING PARADIGM
• EASY TO USE AND MAINTAIN
15. THINGS TO TAKE INTO CONSIDERATION
• AFTER WRITING THE SERVICE – HOW DO YOU BOOTSTRAP YOUR DATA?
• DO SO WITHOUT “KNOWING THE FUTURE”
• SEPARATE YOUR DATA -> SEPARATE YOUR SERVICES BY THE DATA TYPES
• AUTOMATE AS MUCH AS YOU CAN – DEPLOYMENT, MAINTENANCE
• MONITORING
• DATA AVAILABILITY
• PERFORMANCE
• RUNTIME
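For the bootstrapping question, one possible approach is to replay history run by run and hand each simulated run only the records that had arrived by that time. The sketch below is plain Python; `bootstrap`, `count_seen` and the record fields are hypothetical names used for illustration, not an API from the talk.

```python
def bootstrap(records, start, end, step, compute):
    """Replay history: call compute(visible_records, now) for each simulated run time."""
    results = {}
    now = start
    while now <= end:
        # A record is visible only once its arrival time has passed.
        visible = [r for r in records if r["arrival_time"] <= now]
        results[now] = compute(visible, now)
        now += step
    return results

records = [
    {"event_time": 3, "arrival_time": 4, "value": 1},
    {"event_time": 8, "arrival_time": 18, "value": 5},
    {"event_time": 14, "arrival_time": 15, "value": 2},
]

def count_seen(visible, now):
    """Stand-in for a real service: counts the records it is allowed to see."""
    return len(visible)

history = bootstrap(records, start=10, end=30, step=10, compute=count_seen)
```

The simulated run at 10 sees one record, and the runs at 20 and 30 see all three. Replaying this way requires that arrival times were stored alongside event times; without them, a bootstrap cannot tell what a past run could actually have known.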
16. WHAT OTHER KINDS OF PROBLEMS DO WE HANDLE?
• FRAUD DETECTION
• CUSTOMIZED USER DOMAIN REQUESTS
• DATA FILTERING (MALFORMED / TAMPERED DATA)
• CREATING A VESSEL'S PATTERN OF LIFE
• RELATIONSHIP BETWEEN VESSELS
• PROTOCOL RESEARCH AND ANALYSIS (AIS)
20. THANKS, RESOURCES AND CONTACT
• DEMI BEN-ARI
• LINKEDIN
• TWITTER: @DEMIBENARI
• BLOG: HTTP://PROGEXC.BLOGSPOT.COM/
• EMAIL: DEMI.BENARI@GMAIL.COM
• WINDWARD LTD.
• BIG THINGS ARE HAPPENING HERE – FACEBOOK GROUP
• MEETUP – BIG THINGS
jobs@windward.eu