Data Platform Evolution
About One by Aol.
About Our Team
About Our Data
Video Tracking
Ad Tracking
User Tracking
LEGACY PLATFORM
Legacy System
DWH Cluster
SSIS Manager
External Data Providers
Event Collector
Caching
Reporting
DWH
Application Servers
Legacy Scale
500 TB Storage
40K Events Processed per Second
3.5B Events Processed Daily
Daily Processing
20 GB Data Daily
The Need to Change
Cost
Processing Time
Scale
Development ROI
Testability
Accessibility
NEXT STEPS
Next Steps
3 Stages
Outcome
Component Description
Examples
First Stage
Data warehouse
Data Collection
Data Distribution
DWH API
External Data Providers
Event Collector
Analytics
Reporting
Monitoring
sFTP, FTP
Legacy DWH
First Stage Summary
Full Redundancy
Comparison: Legacy vs. Batch
Linear Scale
Partial Test Coverage
Raw-Level Data Access
CD
Second Stage
Data warehouse
Data Collection
Data Distribution
DWH API
External Data Providers
Event Collector
Scheduling
Reporting
Monitoring
S3, Azure, sFTP, FTP
Real Time DWH
Analytics
Second Stage Summary
Near Real-Time Processing
Comparison: Batch vs. Real Time
Full Monitoring
Full Test Coverage
“Product” Event/Report Definition
DevOps Automation
MORE DETAILS
Batch Event Processing
Hadoop Cluster
Hadoop Monitoring
Aggregated data exporter
Processed data aggregator
Error Processing
Data Archiver
Data Collection Cluster
Raw data processing
Map-Reduce
Raw data files pushed to Hadoop (WebHDFS)
Vertica
External/Internal DWH Clusters
Data flow direction
Monitoring data
Raw data processing
1. Cleaning/Transformation/Enrichment/Validation of data from main data sources with Map-Reduce
2. Month history
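The cleaning, validation, transformation and enrichment steps can be illustrated with a minimal pure-Python sketch of a map phase and a reduce phase. This is not the Hadoop job from the talk: the CSV layout (`ts, video_id, type, ip`), the `COUNTRY_BY_IP` lookup and all function names are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical enrichment lookup; a real job would read dimension tables.
COUNTRY_BY_IP = {"10.0.0.1": "US", "10.0.0.2": "DE"}


def map_event(line):
    """Map phase: clean, validate, transform and enrich one raw CSV line.

    Yields zero or one (key, value) pair; invalid lines are dropped.
    """
    parts = [p.strip() for p in line.split(",")]       # cleaning
    if len(parts) != 4 or not parts[0].isdigit():       # validation
        return
    ts, video_id, event_type, ip = parts
    hour = int(ts) // 3600 * 3600                       # transformation: truncate to the hour
    country = COUNTRY_BY_IP.get(ip, "unknown")          # enrichment
    yield (hour, video_id, event_type.lower(), country), 1


def reduce_counts(pairs):
    """Reduce phase: sum the values emitted for each key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def run_job(lines):
    """Run the map phase over every line, then reduce the emitted pairs."""
    pairs = (pair for line in lines for pair in map_event(line))
    return reduce_counts(pairs)
```

Calling `run_job` on a list of raw lines returns a dictionary of event counts keyed by hour, video, event type and country; lines that fail validation simply do not contribute.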
Aggregator Process
1. DSL for defining new kinds of aggregation
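The slides do not show the DSL itself. As a rough idea of what a declarative aggregation definition could look like, here is a small sketch in which an aggregation is described as data and evaluated by a generic runner. The `Aggregation` class, its fields and the `run` function are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Aggregation:
    """Declarative description of one aggregation (hypothetical DSL)."""
    name: str
    group_by: tuple       # field names that form the grouping key
    measure: str          # field to aggregate
    func: str = "sum"     # "sum" or "count"


def run(agg, events):
    """Evaluate an Aggregation over an iterable of event dicts."""
    totals = defaultdict(int)
    for event in events:
        key = tuple(event[field] for field in agg.group_by)
        if agg.func == "sum":
            totals[key] += event[agg.measure]
        elif agg.func == "count":
            totals[key] += 1
        else:
            raise ValueError(f"unsupported function: {agg.func}")
    return dict(totals)


# Adding a new kind of aggregation means adding a definition, not new processing code.
views_by_country = Aggregation(
    name="views_by_country", group_by=("country",), measure="views")
```

The design point is that the runner stays fixed while definitions such as `views_by_country` are added or changed independently.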
Data exporter
1. Export aggregated data
2. Export processed data
Processed/Aggregated data
Logging Framework, Elasticsearch
Logs will be exposed through Kibana to monitor data flow
Monitoring
Monitoring of data flow inside and outside of the Event Processing Cluster
Hadoop monitoring data
Error Processing
1. Automatic error re-processing with time window
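The slide does not say how the time window is applied. One possible reading is that failed items are retried automatically as long as the failure is recent enough, and older failures are set aside. The sketch below assumes that reading; the function name, the tuple layout and the three result lists are hypothetical.

```python
def reprocess(failed, process, now, window_seconds):
    """Retry failed items whose failure time lies within the time window.

    failed:  list of (failed_at, item) tuples, failed_at in epoch seconds
    process: callable that raises an exception when processing fails
    Returns (recovered, still_failed, expired).
    """
    recovered, still_failed, expired = [], [], []
    for failed_at, item in failed:
        if now - failed_at > window_seconds:
            expired.append(item)                     # outside the window: no automatic retry
            continue
        try:
            process(item)
            recovered.append(item)
        except Exception:
            still_failed.append((failed_at, item))   # keep for the next run
    return recovered, still_failed, expired
```

Items in `still_failed` keep their original failure time, so they eventually fall out of the window instead of being retried forever.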
S3
Examples: Event Processing
Data Collection
Data Collection Cluster
Video Tracking
Ad Tracking
User Tracking
3rd Party Ad Tracking
SQL Server
CSV data received every hour via FTP. Raw Events and Dimensions.
Text files received every five minutes, from Public and Private Cloud. Raw Events.
Logging Framework, Elasticsearch
Hadoop Processing Cluster
Data about received files and events is reported with the logging framework
Raw data files pushed to Hadoop (WebHDFS)
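WebHDFS is the HTTP REST interface to HDFS. Creating a file takes two steps: a `PUT` with `op=CREATE` to the NameNode, which answers with a redirect, followed by a `PUT` of the file content to the redirect location. The sketch below only builds the first-step URL and makes no network call. The host name, port, path and user in the usage note are placeholders, not values from the talk.

```python
from urllib.parse import quote, urlencode


def webhdfs_create_url(host, port, hdfs_path, user, overwrite=False):
    """Build the first-step URL for a WebHDFS file upload (op=CREATE).

    The NameNode is expected to reply with a redirect to a DataNode;
    the file content is then sent with a second PUT to that location.
    """
    query = urlencode({
        "op": "CREATE",
        "user.name": user,
        "overwrite": "true" if overwrite else "false",
    })
    # quote() keeps "/" and escapes characters such as spaces in the path.
    return f"http://{host}:{port}/webhdfs/v1{quote(hdfs_path)}?{query}"
```

For example, `webhdfs_create_url("namenode.example", 50070, "/raw/video/events.csv", "etl")` yields a URL under `/webhdfs/v1/raw/video/events.csv` with `op=CREATE` in the query string.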
Dimension tables
Servers to acquire
Stage 1: .NET application will pull from FTP and the SQL DWH server for loggers, and use SQL Replication for dimension data
Stage 2: Consider moving to a more appropriate technology, such as Akka
Data flow direction
Logs will be exposed through Kibana to monitor data flow
Monitoring data
Monitoring
Monitoring of data flow inside and outside of the Data Collection Cluster
MongoDB
Data Distribution
Data Distribution Cluster
Hive
Vertica
MongoDB
Report Distributor
Logging Framework, Elasticsearch
Reporting Platform
Data flow direction
Logs will be exposed through Kibana to monitor data flow
Monitoring data
Monitoring
Monitoring of data flow inside and outside of the Data Distribution Cluster
Report S3 Storage
Examples: Data Collection & Distribution
Reporting Platform
Vertica
Hive
SQL Server
1. Distributed
2. Encapsulate Repository
3. Versioning
4. Smart query execution
5. Testable
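To make the "Encapsulate Repository", "Versioning" and "Testable" points concrete, here is a minimal sketch of reporting code that depends only on a repository interface, so a test double can replace Vertica, Hive or SQL Server in unit tests. All class and method names are hypothetical; the slides do not describe the platform's actual API.

```python
from typing import Protocol


class ReportRepository(Protocol):
    """Interface the reporting code depends on, whatever the backing store."""

    def fetch(self, report: str, version: int) -> list: ...


class InMemoryRepository:
    """Test double standing in for a database-backed repository."""

    def __init__(self, rows_by_report):
        # Keys are (report name, definition version) pairs.
        self._rows = rows_by_report

    def fetch(self, report, version):
        return list(self._rows[(report, version)])


class ReportProvider:
    """Computes report values without knowing where the rows come from."""

    def __init__(self, repository: ReportRepository):
        self._repository = repository

    def total(self, report, version, column):
        rows = self._repository.fetch(report, version)
        return sum(row[column] for row in rows)
```

Because the report version is part of every lookup, two versions of the same report definition can be served side by side.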
MongoDB
Reporting Platform
Report Designer
Report Provider
Report Distributor
Reporting API
Statistics Provider
S3 Report Storage
Data sources of the Reporting Platform are in Private and Public Cloud
Application Servers
Examples: Applications
Monitoring
Monitoring Cluster
Cloudera Manager
Elasticsearch Cluster
Vertica Management
Kibana
Zabbix
Applications
Vertica
Hadoop
MongoDB
Examples: Monitoring & Alerting
Migration Outcome
15% Cost Reduction
Linear Scale
90% Unit Test Coverage
x280 Processing Time
x50 Development ROI
Current Scale
86B Events Processed Daily
120 TB Data Daily
1M Events Processed per Second
Near Real-Time Processing
Minimum Interval: 5 min
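A five-minute minimum interval implies that events are grouped into fixed five-minute windows before processing. The sketch below shows one simple way to assign epoch timestamps to such windows. The function names are illustrative and the talk does not say how windowing is implemented.

```python
from collections import Counter

INTERVAL_SECONDS = 5 * 60  # minimum processing interval from the slide


def window_start(epoch_seconds, interval=INTERVAL_SECONDS):
    """Return the start of the fixed window that contains the timestamp."""
    return epoch_seconds - epoch_seconds % interval


def group_by_window(timestamps, interval=INTERVAL_SECONDS):
    """Count events per window, keyed by window start time."""
    return dict(Counter(window_start(ts, interval) for ts in timestamps))
```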
15+ Event Sources
4.5 PB Hadoop
70 TB Vertica
Scale Growth
x15 Events Processed Daily
x6000 Daily Processed Data
x25 Events Processed per Second
x280 Processing Time
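The growth factors can be recomputed from the Legacy Scale and Current Scale figures quoted on the slides. The sketch below does that arithmetic, assuming decimal units (1 TB = 1000 GB); the dictionary keys and function names are chosen for illustration only.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Figures as stated on the Legacy Scale and Current Scale slides.
legacy = {"events_per_day": 3.5e9, "events_per_second": 40e3, "data_per_day_gb": 20}
current = {"events_per_day": 86e9, "events_per_second": 1e6, "data_per_day_gb": 120_000}


def growth(metric):
    """Ratio of the current figure to the legacy figure for one metric."""
    return current[metric] / legacy[metric]


def per_second(events_per_day):
    """Average events per second implied by a daily event count."""
    return events_per_day / SECONDS_PER_DAY
```

With these inputs, events per second grow 25-fold and daily data 6000-fold, matching the slide. Daily events come out at roughly 25-fold rather than the x15 shown, so one of the stated daily figures may be rounded or measured on a different basis.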
Third Stage
Data warehouse
Data Collection
Data Distribution
DWH API
External Data Providers
Event Collector
Scheduling
Reporting
Monitoring
S3, Azure, sFTP, FTP
Real Time DWH
Analytics
THANK YOU
