Engineering a Pipeline for
Data-Driven Analytics
Data-Driven Engineering
Speaker: Segzpair
Introduction
We’ve understood the need for data in making user-centered
decisions through in-depth analysis.
Have you ever wondered how data processing platforms
are built to drive intelligence?
I will be taking you on a ride to further distill BIG data:
collection, transformation, modeling and analytics.
QUICK TIP
Sit back, relax and enjoy
the ride...
Let’s Get Started...
Ad Tracker
- Facebook Ad Engine, Google AdSense
Infrastructure Monitoring
- Datadog, New Relic
Atlassian Products
- Confluence, Jira, Bitbucket
Collaboration Tools
- Slack, Skype
Your platform
- website, startup-tool etc.
Data Crunching platforms
Existing Platforms
Already crunching data (not considering the scale of data)
Sources of Data Generation
- First, Second and Third Party Data
Forms of Data Existence
- Text, Multimedia (Audio, Images and Video)
Speed of Data Generation
- batch and stream processes
Quick Insight
What is Big Data
Why, Where and Relevance of Big Data | The 3 Vs of Data
Segment of
Data Pipeline
Sources of data
Data ingestion
Collection and Extraction...
Batch Data Collection
- The use of Airflow (see the DAG sketch below)
Stream Data Ingestion
- custom plugins into systems
- Kafka as a streaming tool (see the producer sketch below)
Things to consider when choosing a technology
- Rate of data generation
- Processing rate
- Cost effective tool
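A minimal sketch of batch collection with Airflow, assuming Airflow 2.x; the DAG id and the extract_daily_events() callable are illustrative placeholders, not the speaker's actual jobs.

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator


    def extract_daily_events(**context):
        # Placeholder: pull the previous day's records from the source
        # and land them in raw storage.
        print(f"extracting events for {context['ds']}")


    with DAG(
        dag_id="batch_event_collection",
        start_date=datetime(2024, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        extract = PythonOperator(
            task_id="extract_daily_events",
            python_callable=extract_daily_events,
        )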
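And a minimal sketch of stream ingestion with the kafka-python client, assuming a reachable broker; the "user-events" topic and the event payload are illustrative.

    import json

    from kafka import KafkaProducer

    # Producer that JSON-encodes each event before sending it to the broker.
    producer = KafkaProducer(
        bootstrap_servers="localhost:9092",
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )

    producer.send("user-events", {"user_id": "u-123", "event": "page_view"})
    producer.flush()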
Processing
Transformation and Data Standardization
Data Catalog
- on-boarding principles
- data label categorization and standardization
Data Privacy and Security
- obfuscating PII (see the hashing sketch below)
- infrastructure security and restricted access
Identity Resolution
- real-time correlation of identity attributes
Tracking
- process tracking and monitoring using CloudWatch (see the metric sketch below)
Executing to Scale
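One way to obfuscate PII before it moves downstream is a salted one-way hash. This sketch assumes a salted SHA-256 is acceptable; the field names and the salt value are illustrative. The same stable digests can double as join keys for identity resolution.

    import hashlib

    # In practice the salt lives outside the code (e.g. a secrets manager).
    PII_SALT = "replace-with-a-secret-salt"
    PII_FIELDS = ("email", "phone", "msisdn")


    def obfuscate_pii(record: dict) -> dict:
        """Return a copy of the record with PII fields replaced by salted SHA-256 digests."""
        clean = dict(record)
        for field in PII_FIELDS:
            if field in clean and clean[field] is not None:
                digest = hashlib.sha256(
                    (PII_SALT + str(clean[field])).encode("utf-8")
                ).hexdigest()
                clean[field] = digest
        return clean


    print(obfuscate_pii({"email": "jane@example.com", "event": "click"}))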
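Process tracking can be as simple as emitting custom metrics per run. This sketch assumes boto3 and CloudWatch custom metrics; the namespace, metric name and dimensions are illustrative.

    import boto3

    cloudwatch = boto3.client("cloudwatch", region_name="eu-west-1")

    # Report how many records the transformation stage handled in this run.
    cloudwatch.put_metric_data(
        Namespace="DataPipeline",
        MetricData=[
            {
                "MetricName": "RecordsProcessed",
                "Dimensions": [{"Name": "Stage", "Value": "transformation"}],
                "Value": 15000,
                "Unit": "Count",
            }
        ],
    )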
Storage Mechanism
Storage
Big data storage in the cloud, the cheapest approach
Data in Lake
- cloud infrastructure usage like S3
- optimized data storage format in Parquet (see the sketch below)
- security at the network layer using private subnets and a VPC
Data Validation
- queryable data using Athena
- first-level visibility with QuickSight
Accessibility for ML and Other Processes
- using SageMaker
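A minimal sketch of landing a batch in the lake as Parquet on S3 and then validating it with an Athena query; the bucket, database, table and region names are illustrative, and writing s3:// paths with pandas assumes pyarrow and s3fs are installed.

    import boto3
    import pandas as pd

    # Land a (toy) batch in the lake as date-partitioned Parquet.
    df = pd.DataFrame(
        [{"user_id": "u-123", "event": "page_view", "ts": "2024-01-01T00:00:00Z"}]
    )
    df.to_parquet("s3://my-data-lake/events/dt=2024-01-01/part-0.parquet", index=False)

    # First-level validation: row-count the partition through Athena.
    athena = boto3.client("athena", region_name="eu-west-1")
    athena.start_query_execution(
        QueryString="SELECT COUNT(*) FROM events WHERE dt = '2024-01-01'",
        QueryExecutionContext={"Database": "analytics"},
        ResultConfiguration={"OutputLocation": "s3://my-athena-query-results/"},
    )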
More Complex Approach
Data Presentation
Visualization and Presentation...
Optimized Access
- Data loaded into Elasticsearch and Aerospike (see the indexing sketch below)
Visualization
- any graph visualization tool of choice
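A minimal sketch of loading processed records into Elasticsearch for optimized access; the host, index name and document shape are illustrative, and it assumes the v8 elasticsearch-py client.

    from elasticsearch import Elasticsearch

    es = Elasticsearch("http://localhost:9200")

    # Index a processed user profile so dashboards get low-latency lookups.
    es.index(
        index="user-profiles",
        id="u-123",
        document={
            "user_id": "u-123",
            "segments": ["sports", "fintech"],
            "last_seen": "2024-01-01",
        },
    )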
Intelligent Reach
Meeting needs of users with ease
Adrenaline and Marketing Console
- offline targeting
Adatrix
- non-programmatic online targeting
Demand Side Platform (DSP)
- programmatic online reach
Meeting the right audience
Conclusion
Data in its raw state isn’t of much use until it is refined and
well arranged.
This is one of the many things we do at Terragon.
Making Data Meaningful in a MarTech space.
Thanks
Got Questions?
Segzpair - oadetimehin@terragonltd.com | 07060514642
