Dmitry Lavrinenko, "Big & Fast Data for Identity & Telemetry services"


- Business goal
- What is Fast Data for us
- What is Fast & Big Data solution
- Reference Architecture
- Data Science for Big Data
- Technology Stack
- Solution Architecture
- Identity & Telemetry Data Processing Facts
- Continuous Deployment
- Quality Control

Published in: Technology

  1. Big & Fast Data for Identity & Telemetry services
  2. Business Goal: Deliver a vendor-agnostic, open-source-ready Big Data as a Service platform by using the most up-to-date automation technologies and by partnering with major Big Data software vendors and independent developers.
  3. What is Fast Data for us?
     • Continuous data loading
     • Massively parallel processing
     • Data consolidation
     • Dimensional processing
     • Data normalization & denormalization (depends on tech stack)
     • Structured & dimensional data models
     • Hybrid distributed warehouses
  4. What is a Fast & Big Data Solution?
     • Data warehouse (WH)
     • Processing
     • Analytics
     • Visualisation
     • Machine Learning
     • Data Virtualization
     • Data Ingestion as a Service
  5. Portal Solution Architecture
     [Diagram. Labels: Event broker (Kafka); data split; cache; retention; storage scenarios; Master Storage (AVRO); Dashboard Storage (JSON); Batch layer; Speed layer; Serving layer; Consumer; Stream; Data Warehouse; Hadoop cluster; Spark cluster; Ad-Hoc Queries; BI; Analytics; Visualization engine; Telemetry; Identity; ML]
  6. Reference Architecture
     [Diagram. Labels: All Data; Real-time Data Processing; Data Acquisition and Storing; Data Integration; Data Warehousing; Data Management (Governance, Security, Quality, MDM); Analytics; Reporting and Analysis; Predictive Modeling; Data Mining; Data Lake (Landing, Exploration and Archiving); UX and Visualization; Applications; Application data; Media data; Social data; Enterprise content data; Telemetry; Other data; Customer Analytics; Marketing Analytics; Web/Mobile/Social Analytics; IT Operational Analytics; Fraud and Risk Analytics; Complex Event Processing; Real-time Query and Search]
  7. Data Science for Big Data
     [Diagram. Labels: Artificial Intelligence; Machine Learning; High-Dimensional Data; Big Data; Apache Hadoop; Infrastructure; Data Collection; Data Augmentation; Real-time Data Processing; Predictive Analytics; Risk Analysis; Direct Marketing; Decision Support Systems; Learning and Intelligent Optimization; Data Analysis; Data Exploration; Data Visualization; Business Intelligence]
  8. Technology stack
  9. Identity and Telemetry Data Processing: Facts
     • ~100,000 metrics & events
     • ~100,000 events per minute
     • 3 TB per day
     • JSON
     • Independent Kafka clusters at scale
     • Independent Spark Streaming at scale
     • StreamSets
     • Distributed HDFS
     • Custom, cloud-based analytics platform
     • Machine learning on the fly
  10. Continuous World
     • Continuous Deployment is the actual delivery of features and fixes to the customer as soon as they are ready
     • Continuous Delivery represents a philosophy and a commitment to ensuring that your code is always in a release-ready state
     • Continuous Integration automatically builds and tests your software on a regular basis
  11. Continuous Deployment
     • Verify source code
     • Build artifacts
     • Run artifacts
     • Run automated testing
     • Publish package
     • Deploy to production
  12. Quality control areas
     • Application code
     • Automated test code
     • Docker images
     • Functional requirements
     • Non-functional requirements
     • Integrations
  13. Thank you. Dmitry Lavrinenko, DevOps Solutions Architect
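
As a back-of-the-envelope check on the figures from the Facts slide (about 100,000 events per minute and 3 TB per day), the Python sketch below derives the implied sustained rates. It rests on two assumptions the slides do not state: that 1 TB means 10**12 bytes, and that the whole daily volume consists of those events. The function and constant names are hypothetical and chosen only for illustration.

```python
# Figures quoted on the "Facts" slide.
EVENTS_PER_MINUTE = 100_000
# Assumption: decimal terabytes (1 TB = 10**12 bytes).
BYTES_PER_DAY = 3 * 10**12

MINUTES_PER_DAY = 24 * 60
SECONDS_PER_DAY = MINUTES_PER_DAY * 60


def events_per_second(events_per_minute: int) -> float:
    """Average event rate per second."""
    return events_per_minute / 60


def events_per_day(events_per_minute: int) -> int:
    """Total events in a day at a constant per-minute rate."""
    return events_per_minute * MINUTES_PER_DAY


def sustained_bytes_per_second(bytes_per_day: int) -> float:
    """Average ingest bandwidth needed to absorb the daily volume."""
    return bytes_per_day / SECONDS_PER_DAY


def average_bytes_per_event(bytes_per_day: int, events_per_minute: int) -> float:
    """Implied mean event size.

    Assumption: the daily volume is made up entirely of these events.
    """
    return bytes_per_day / events_per_day(events_per_minute)


if __name__ == "__main__":
    print(f"events/s:    {events_per_second(EVENTS_PER_MINUTE):,.0f}")
    print(f"events/day:  {events_per_day(EVENTS_PER_MINUTE):,}")
    print(f"MB/s:        {sustained_bytes_per_second(BYTES_PER_DAY) / 10**6:.1f}")
    print(f"KB/event:    {average_bytes_per_event(BYTES_PER_DAY, EVENTS_PER_MINUTE) / 10**3:.1f}")
```

Under these assumptions the load works out to roughly 1,667 events per second, 144 million events per day, about 34.7 MB/s of sustained ingest, and an average of about 20.8 KB per event. These are averages only; the slides give no peak rates, so any capacity planning would need headroom on top.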