SlideShare a Scribd company logo
Evolving a premium raw data product
from simple spark script in 3 month
Avi Perez, Big Data Team Leader @AppsFlyer
AppsFlyer
• 28M raised top VC
• 200M To 13B Daily Events [3 Years]
• 40GB To 5TB [gz] daily text data
• 25 → 60ppl R&D during 2016
• Top 15 Israeli startups by inc.com
What We Do
Media SourcesApp Developers App Users
X10
9
4B$ in media payments annually
measured
AppsFlyer Raw Data Channels
Raw vs Aggregated
• Real Time Stream From kafka
• Online Query Data API (csv)
HTTP
Columnar DB
S3
New Use Case
• Big Clients with BI Systems
• Very large files / large number of files
• Tackling current limitations
secor
Amazon S3
Rapid Prototype ...
write
notify
read
Naive SPARK SQL
Challenges ...
•Scale in #clients
•Client monitoring
•Security
•Schema
•Flow & Control
Requests keep coming...
• More Clients
• More Events Types
• Customizable Columns
What are we facing here...
What was missing?
Improving Data Format
• Scanning a lot of data is easy...but not that fast
• Being a big data company is not necessarily
saying you need to read all your data fast
Moving to Parquet . . .
Twitter & Cloudera
• Columnar storage (load only what you need)
• Space efficient (50% improvement)
• Read Time efficient (98% improvement )
Stateful S3 Bucket Structure
For automatic bots parsing
View Layer
• Flatten fields mapping
• Versions
From script to Micro Service
• Tasks creation (Buckets, IAM, Credentials etc)
• Search on Task Executions
• Access to the report files
• Get statuses from the Job HTTP
• Highly available
Abstraction . . .
Moving Toward A Product . . .
• Clients want SLA . . .
Service transparency
Push notification to slack once there
is an issue
Data Segregation
Results
Loading data for specific clients
Load specific clients raw
data from 2.5TB
compressed topic
Same load with
partitioning
1.5
min
30sec
From hard coded List to RDS
Client
A
Client
B
... ...
Secured Email Notifications
click Get link
Vault
• Secure Secret Storage
• Dynamics Secrets
• Data Encryption
• Leasing and renewal
• Revocation
Cost Optimization
Helping our clients with download
Daily sessions output file
for one of the clients
The same report
compressed
(.gz)
60G
B
2.1
GB
Moving to YARN
Prioritizing spark Jobs
Support keep asking the same
questions….
Monitoring . . .
Monitor, monitor, monitor….
• Metrics
• Re-tries
• PDs
Going premium . .
• On boarding
• Well defined schema fields
• Self Serve and pricing
What we learn . . .
Thank you
And…
We are
hiring!!
avi@appsflyer.com

More Related Content

What's hot

Дмитрий Лавриненко "Big & Fast Data for Identity & Telemetry services"
Дмитрий Лавриненко "Big & Fast Data for Identity & Telemetry services"Дмитрий Лавриненко "Big & Fast Data for Identity & Telemetry services"
Дмитрий Лавриненко "Big & Fast Data for Identity & Telemetry services"
Fwdays
 
Rich Internet Applications and Flex - 3
Rich Internet Applications and Flex - 3Rich Internet Applications and Flex - 3
Rich Internet Applications and Flex - 3
Vijay Kalangi
 
MongoDB 3.2 Feature Preview
MongoDB 3.2 Feature PreviewMongoDB 3.2 Feature Preview
MongoDB 3.2 Feature Preview
Norberto Leite
 
MongoDB San Francisco 2013: Storing eBay's Media Metadata on MongoDB present...
MongoDB San Francisco 2013: Storing eBay's Media Metadata on MongoDB  present...MongoDB San Francisco 2013: Storing eBay's Media Metadata on MongoDB  present...
MongoDB San Francisco 2013: Storing eBay's Media Metadata on MongoDB present...
MongoDB
 
MongoDB Atlas
MongoDB AtlasMongoDB Atlas
MongoDB Atlas
MongoDB
 
Дмитрий Попович "How to build a data warehouse?"
Дмитрий Попович "How to build a data warehouse?"Дмитрий Попович "How to build a data warehouse?"
Дмитрий Попович "How to build a data warehouse?"
Fwdays
 
SharePoint UserGroup Stuttgart - Martina Grom - Office 365 News
SharePoint UserGroup Stuttgart - Martina Grom - Office 365 NewsSharePoint UserGroup Stuttgart - Martina Grom - Office 365 News
SharePoint UserGroup Stuttgart - Martina Grom - Office 365 News
atwork
 
Workshop 2: Building a streaming data platform on AWS
Workshop 2: Building a streaming data platform on AWSWorkshop 2: Building a streaming data platform on AWS
Workshop 2: Building a streaming data platform on AWS
Amazon Web Services
 
Solving your Backup Needs - Ben Cefalo mdbe18
Solving your Backup Needs - Ben Cefalo mdbe18Solving your Backup Needs - Ben Cefalo mdbe18
Solving your Backup Needs - Ben Cefalo mdbe18
MongoDB
 
Effective AIOps with Open Source Software in a Week
Effective AIOps with Open Source Software in a WeekEffective AIOps with Open Source Software in a Week
Effective AIOps with Open Source Software in a Week
Databricks
 
MongoDB: Agile Combustion Engine
MongoDB: Agile Combustion EngineMongoDB: Agile Combustion Engine
MongoDB: Agile Combustion Engine
Norberto Leite
 
Tech UG - Newcastle 09-17 - logic apps
Tech UG - Newcastle 09-17 -   logic appsTech UG - Newcastle 09-17 -   logic apps
Tech UG - Newcastle 09-17 - logic apps
Michael Stephenson
 
Securing an Azure Function REST API with Azure Active Directory
Securing an Azure Function REST API with Azure Active DirectorySecuring an Azure Function REST API with Azure Active Directory
Securing an Azure Function REST API with Azure Active Directory
Rick van den Bosch
 
Sitecore Symposium: DMS Where is the data at?
Sitecore Symposium: DMS Where is the data at?Sitecore Symposium: DMS Where is the data at?
Sitecore Symposium: DMS Where is the data at?
Pieter Brinkman
 
MongoDB Atlas - eHarmony’s New Message Store
MongoDB Atlas - eHarmony’s New Message StoreMongoDB Atlas - eHarmony’s New Message Store
MongoDB Atlas - eHarmony’s New Message Store
Evan Rodd
 
Introduction to Real-time, Streaming Data and Amazon Kinesis. Streaming Data ...
Introduction to Real-time, Streaming Data and Amazon Kinesis. Streaming Data ...Introduction to Real-time, Streaming Data and Amazon Kinesis. Streaming Data ...
Introduction to Real-time, Streaming Data and Amazon Kinesis. Streaming Data ...
Amazon Web Services
 
Multi-Tenant Log Analytics SaaS Service using Solr: Presented by Chirag Gupta...
Multi-Tenant Log Analytics SaaS Service using Solr: Presented by Chirag Gupta...Multi-Tenant Log Analytics SaaS Service using Solr: Presented by Chirag Gupta...
Multi-Tenant Log Analytics SaaS Service using Solr: Presented by Chirag Gupta...
Lucidworks
 
APIdays Helsinki 2019 - GraphQL API Management with Amit P. Acharya, IBM
APIdays Helsinki 2019 - GraphQL API Management with Amit P. Acharya, IBMAPIdays Helsinki 2019 - GraphQL API Management with Amit P. Acharya, IBM
APIdays Helsinki 2019 - GraphQL API Management with Amit P. Acharya, IBM
apidays
 
Scribe insight 01 publisher deep dive
Scribe insight 01   publisher deep diveScribe insight 01   publisher deep dive
Scribe insight 01 publisher deep dive
Scribe Software Corp.
 
Maximizing MongoDB Performance on AWS
Maximizing MongoDB Performance on AWSMaximizing MongoDB Performance on AWS
Maximizing MongoDB Performance on AWS
MongoDB
 

What's hot (20)

Дмитрий Лавриненко "Big & Fast Data for Identity & Telemetry services"
Дмитрий Лавриненко "Big & Fast Data for Identity & Telemetry services"Дмитрий Лавриненко "Big & Fast Data for Identity & Telemetry services"
Дмитрий Лавриненко "Big & Fast Data for Identity & Telemetry services"
 
Rich Internet Applications and Flex - 3
Rich Internet Applications and Flex - 3Rich Internet Applications and Flex - 3
Rich Internet Applications and Flex - 3
 
MongoDB 3.2 Feature Preview
MongoDB 3.2 Feature PreviewMongoDB 3.2 Feature Preview
MongoDB 3.2 Feature Preview
 
MongoDB San Francisco 2013: Storing eBay's Media Metadata on MongoDB present...
MongoDB San Francisco 2013: Storing eBay's Media Metadata on MongoDB  present...MongoDB San Francisco 2013: Storing eBay's Media Metadata on MongoDB  present...
MongoDB San Francisco 2013: Storing eBay's Media Metadata on MongoDB present...
 
MongoDB Atlas
MongoDB AtlasMongoDB Atlas
MongoDB Atlas
 
Дмитрий Попович "How to build a data warehouse?"
Дмитрий Попович "How to build a data warehouse?"Дмитрий Попович "How to build a data warehouse?"
Дмитрий Попович "How to build a data warehouse?"
 
SharePoint UserGroup Stuttgart - Martina Grom - Office 365 News
SharePoint UserGroup Stuttgart - Martina Grom - Office 365 NewsSharePoint UserGroup Stuttgart - Martina Grom - Office 365 News
SharePoint UserGroup Stuttgart - Martina Grom - Office 365 News
 
Workshop 2: Building a streaming data platform on AWS
Workshop 2: Building a streaming data platform on AWSWorkshop 2: Building a streaming data platform on AWS
Workshop 2: Building a streaming data platform on AWS
 
Solving your Backup Needs - Ben Cefalo mdbe18
Solving your Backup Needs - Ben Cefalo mdbe18Solving your Backup Needs - Ben Cefalo mdbe18
Solving your Backup Needs - Ben Cefalo mdbe18
 
Effective AIOps with Open Source Software in a Week
Effective AIOps with Open Source Software in a WeekEffective AIOps with Open Source Software in a Week
Effective AIOps with Open Source Software in a Week
 
MongoDB: Agile Combustion Engine
MongoDB: Agile Combustion EngineMongoDB: Agile Combustion Engine
MongoDB: Agile Combustion Engine
 
Tech UG - Newcastle 09-17 - logic apps
Tech UG - Newcastle 09-17 -   logic appsTech UG - Newcastle 09-17 -   logic apps
Tech UG - Newcastle 09-17 - logic apps
 
Securing an Azure Function REST API with Azure Active Directory
Securing an Azure Function REST API with Azure Active DirectorySecuring an Azure Function REST API with Azure Active Directory
Securing an Azure Function REST API with Azure Active Directory
 
Sitecore Symposium: DMS Where is the data at?
Sitecore Symposium: DMS Where is the data at?Sitecore Symposium: DMS Where is the data at?
Sitecore Symposium: DMS Where is the data at?
 
MongoDB Atlas - eHarmony’s New Message Store
MongoDB Atlas - eHarmony’s New Message StoreMongoDB Atlas - eHarmony’s New Message Store
MongoDB Atlas - eHarmony’s New Message Store
 
Introduction to Real-time, Streaming Data and Amazon Kinesis. Streaming Data ...
Introduction to Real-time, Streaming Data and Amazon Kinesis. Streaming Data ...Introduction to Real-time, Streaming Data and Amazon Kinesis. Streaming Data ...
Introduction to Real-time, Streaming Data and Amazon Kinesis. Streaming Data ...
 
Multi-Tenant Log Analytics SaaS Service using Solr: Presented by Chirag Gupta...
Multi-Tenant Log Analytics SaaS Service using Solr: Presented by Chirag Gupta...Multi-Tenant Log Analytics SaaS Service using Solr: Presented by Chirag Gupta...
Multi-Tenant Log Analytics SaaS Service using Solr: Presented by Chirag Gupta...
 
APIdays Helsinki 2019 - GraphQL API Management with Amit P. Acharya, IBM
APIdays Helsinki 2019 - GraphQL API Management with Amit P. Acharya, IBMAPIdays Helsinki 2019 - GraphQL API Management with Amit P. Acharya, IBM
APIdays Helsinki 2019 - GraphQL API Management with Amit P. Acharya, IBM
 
Scribe insight 01 publisher deep dive
Scribe insight 01   publisher deep diveScribe insight 01   publisher deep dive
Scribe insight 01 publisher deep dive
 
Maximizing MongoDB Performance on AWS
Maximizing MongoDB Performance on AWSMaximizing MongoDB Performance on AWS
Maximizing MongoDB Performance on AWS
 

Similar to Evolving s3 story

Serverless SQL
Serverless SQLServerless SQL
Serverless SQL
Torsten Steinbach
 
How Totango uses Apache Spark
How Totango uses Apache SparkHow Totango uses Apache Spark
How Totango uses Apache Spark
Oren Raboy
 
Getting Started with Real-time Analytics
Getting Started with Real-time AnalyticsGetting Started with Real-time Analytics
Getting Started with Real-time Analytics
Amazon Web Services
 
Comment transformer vos données en informations exploitables
Comment transformer vos données en informations exploitablesComment transformer vos données en informations exploitables
Comment transformer vos données en informations exploitables
Elasticsearch
 
Cómo transformar los datos en análisis con los que tomar decisiones
Cómo transformar los datos en análisis con los que tomar decisionesCómo transformar los datos en análisis con los que tomar decisiones
Cómo transformar los datos en análisis con los que tomar decisiones
Elasticsearch
 
Automated Data Synchronization: Data Loader, Data Mirror & Beyond
Automated Data Synchronization: Data Loader, Data Mirror & BeyondAutomated Data Synchronization: Data Loader, Data Mirror & Beyond
Automated Data Synchronization: Data Loader, Data Mirror & Beyond
JeremyOtt5
 
Automation options with Office 365
Automation options with Office 365Automation options with Office 365
Automation options with Office 365
Robert Crane
 
BDA307 Real-time Streaming Applications on AWS, Patterns and Use Cases
BDA307 Real-time Streaming Applications on AWS, Patterns and Use CasesBDA307 Real-time Streaming Applications on AWS, Patterns and Use Cases
BDA307 Real-time Streaming Applications on AWS, Patterns and Use Cases
Amazon Web Services
 
Transforming data into actionable insights
Transforming data into actionable insightsTransforming data into actionable insights
Transforming data into actionable insights
Elasticsearch
 
Apache CarbonData+Spark to realize data convergence and Unified high performa...
Apache CarbonData+Spark to realize data convergence and Unified high performa...Apache CarbonData+Spark to realize data convergence and Unified high performa...
Apache CarbonData+Spark to realize data convergence and Unified high performa...
Tech Triveni
 
Amazon Kinesis Platform – The Complete Overview - Pop-up Loft TLV 2017
Amazon Kinesis Platform – The Complete Overview - Pop-up Loft TLV 2017Amazon Kinesis Platform – The Complete Overview - Pop-up Loft TLV 2017
Amazon Kinesis Platform – The Complete Overview - Pop-up Loft TLV 2017
Amazon Web Services
 
Big data and Analytics on AWS
Big data and Analytics on AWSBig data and Analytics on AWS
Big data and Analytics on AWS
2nd Watch
 
FSI201 FINRA’s Managed Data Lake – Next Gen Analytics in the Cloud
FSI201 FINRA’s Managed Data Lake – Next Gen Analytics in the CloudFSI201 FINRA’s Managed Data Lake – Next Gen Analytics in the Cloud
FSI201 FINRA’s Managed Data Lake – Next Gen Analytics in the Cloud
Amazon Web Services
 
Building your Datalake on AWS
Building your Datalake on AWSBuilding your Datalake on AWS
Building your Datalake on AWS
Amazon Web Services
 
Blueprint Series: Architecture Patterns for Implementing Serverless Microserv...
Blueprint Series: Architecture Patterns for Implementing Serverless Microserv...Blueprint Series: Architecture Patterns for Implementing Serverless Microserv...
Blueprint Series: Architecture Patterns for Implementing Serverless Microserv...
Matt Stubbs
 
Getting started with Amazon Kinesis
Getting started with Amazon KinesisGetting started with Amazon Kinesis
Getting started with Amazon Kinesis
Amazon Web Services
 
Getting started with amazon kinesis
Getting started with amazon kinesisGetting started with amazon kinesis
Getting started with amazon kinesis
Jampp
 
DataEngConf SF16 - Scalable and Reliable Logging at Pinterest
DataEngConf SF16 - Scalable and Reliable Logging at PinterestDataEngConf SF16 - Scalable and Reliable Logging at Pinterest
DataEngConf SF16 - Scalable and Reliable Logging at Pinterest
Hakka Labs
 
Scalable and Reliable Logging at Pinterest
Scalable and Reliable Logging at PinterestScalable and Reliable Logging at Pinterest
Scalable and Reliable Logging at Pinterest
Krishna Gade
 
Building real time data-driven products
Building real time data-driven productsBuilding real time data-driven products
Building real time data-driven products
Lars Albertsson
 

Similar to Evolving s3 story (20)

Serverless SQL
Serverless SQLServerless SQL
Serverless SQL
 
How Totango uses Apache Spark
How Totango uses Apache SparkHow Totango uses Apache Spark
How Totango uses Apache Spark
 
Getting Started with Real-time Analytics
Getting Started with Real-time AnalyticsGetting Started with Real-time Analytics
Getting Started with Real-time Analytics
 
Comment transformer vos données en informations exploitables
Comment transformer vos données en informations exploitablesComment transformer vos données en informations exploitables
Comment transformer vos données en informations exploitables
 
Cómo transformar los datos en análisis con los que tomar decisiones
Cómo transformar los datos en análisis con los que tomar decisionesCómo transformar los datos en análisis con los que tomar decisiones
Cómo transformar los datos en análisis con los que tomar decisiones
 
Automated Data Synchronization: Data Loader, Data Mirror & Beyond
Automated Data Synchronization: Data Loader, Data Mirror & BeyondAutomated Data Synchronization: Data Loader, Data Mirror & Beyond
Automated Data Synchronization: Data Loader, Data Mirror & Beyond
 
Automation options with Office 365
Automation options with Office 365Automation options with Office 365
Automation options with Office 365
 
BDA307 Real-time Streaming Applications on AWS, Patterns and Use Cases
BDA307 Real-time Streaming Applications on AWS, Patterns and Use CasesBDA307 Real-time Streaming Applications on AWS, Patterns and Use Cases
BDA307 Real-time Streaming Applications on AWS, Patterns and Use Cases
 
Transforming data into actionable insights
Transforming data into actionable insightsTransforming data into actionable insights
Transforming data into actionable insights
 
Apache CarbonData+Spark to realize data convergence and Unified high performa...
Apache CarbonData+Spark to realize data convergence and Unified high performa...Apache CarbonData+Spark to realize data convergence and Unified high performa...
Apache CarbonData+Spark to realize data convergence and Unified high performa...
 
Amazon Kinesis Platform – The Complete Overview - Pop-up Loft TLV 2017
Amazon Kinesis Platform – The Complete Overview - Pop-up Loft TLV 2017Amazon Kinesis Platform – The Complete Overview - Pop-up Loft TLV 2017
Amazon Kinesis Platform – The Complete Overview - Pop-up Loft TLV 2017
 
Big data and Analytics on AWS
Big data and Analytics on AWSBig data and Analytics on AWS
Big data and Analytics on AWS
 
FSI201 FINRA’s Managed Data Lake – Next Gen Analytics in the Cloud
FSI201 FINRA’s Managed Data Lake – Next Gen Analytics in the CloudFSI201 FINRA’s Managed Data Lake – Next Gen Analytics in the Cloud
FSI201 FINRA’s Managed Data Lake – Next Gen Analytics in the Cloud
 
Building your Datalake on AWS
Building your Datalake on AWSBuilding your Datalake on AWS
Building your Datalake on AWS
 
Blueprint Series: Architecture Patterns for Implementing Serverless Microserv...
Blueprint Series: Architecture Patterns for Implementing Serverless Microserv...Blueprint Series: Architecture Patterns for Implementing Serverless Microserv...
Blueprint Series: Architecture Patterns for Implementing Serverless Microserv...
 
Getting started with Amazon Kinesis
Getting started with Amazon KinesisGetting started with Amazon Kinesis
Getting started with Amazon Kinesis
 
Getting started with amazon kinesis
Getting started with amazon kinesisGetting started with amazon kinesis
Getting started with amazon kinesis
 
DataEngConf SF16 - Scalable and Reliable Logging at Pinterest
DataEngConf SF16 - Scalable and Reliable Logging at PinterestDataEngConf SF16 - Scalable and Reliable Logging at Pinterest
DataEngConf SF16 - Scalable and Reliable Logging at Pinterest
 
Scalable and Reliable Logging at Pinterest
Scalable and Reliable Logging at PinterestScalable and Reliable Logging at Pinterest
Scalable and Reliable Logging at Pinterest
 
Building real time data-driven products
Building real time data-driven productsBuilding real time data-driven products
Building real time data-driven products
 

Recently uploaded

Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
Pixlogix Infotech
 
Nordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptxNordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptx
MichaelKnudsen27
 
GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
Tomaz Bratanic
 
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
saastr
 
Taking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdfTaking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdf
ssuserfac0301
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
innovationoecd
 
Recommendation System using RAG Architecture
Recommendation System using RAG ArchitectureRecommendation System using RAG Architecture
Recommendation System using RAG Architecture
fredae14
 
Digital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying AheadDigital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying Ahead
Wask
 
Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)
Jakub Marek
 
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
saastr
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
shyamraj55
 
GenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizationsGenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizations
kumardaparthi1024
 
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptxOcean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
SitimaJohn
 
Operating System Used by Users in day-to-day life.pptx
Operating System Used by Users in day-to-day life.pptxOperating System Used by Users in day-to-day life.pptx
Operating System Used by Users in day-to-day life.pptx
Pravash Chandra Das
 
AWS Cloud Cost Optimization Presentation.pptx
AWS Cloud Cost Optimization Presentation.pptxAWS Cloud Cost Optimization Presentation.pptx
AWS Cloud Cost Optimization Presentation.pptx
HarisZaheer8
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
Zilliz
 
Fueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte WebinarFueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte Webinar
Zilliz
 
Letter and Document Automation for Bonterra Impact Management (fka Social Sol...
Letter and Document Automation for Bonterra Impact Management (fka Social Sol...Letter and Document Automation for Bonterra Impact Management (fka Social Sol...
Letter and Document Automation for Bonterra Impact Management (fka Social Sol...
Jeffrey Haguewood
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
panagenda
 
Building Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
Zilliz
 

Recently uploaded (20)

Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
 
Nordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptxNordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptx
 
GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
 
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
 
Taking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdfTaking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdf
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
 
Recommendation System using RAG Architecture
Recommendation System using RAG ArchitectureRecommendation System using RAG Architecture
Recommendation System using RAG Architecture
 
Digital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying AheadDigital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying Ahead
 
Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)
 
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
 
GenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizationsGenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizations
 
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptxOcean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
 
Operating System Used by Users in day-to-day life.pptx
Operating System Used by Users in day-to-day life.pptxOperating System Used by Users in day-to-day life.pptx
Operating System Used by Users in day-to-day life.pptx
 
AWS Cloud Cost Optimization Presentation.pptx
AWS Cloud Cost Optimization Presentation.pptxAWS Cloud Cost Optimization Presentation.pptx
AWS Cloud Cost Optimization Presentation.pptx
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
 
Fueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte WebinarFueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte Webinar
 
Letter and Document Automation for Bonterra Impact Management (fka Social Sol...
Letter and Document Automation for Bonterra Impact Management (fka Social Sol...Letter and Document Automation for Bonterra Impact Management (fka Social Sol...
Letter and Document Automation for Bonterra Impact Management (fka Social Sol...
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
 
Building Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
 

Evolving s3 story

Editor's Notes

  1. sales come to r&d and asked a way to get organic data
  2. Big data analytics נותנים כלים למשתמשים שלנו למדוד כמה איכותי הטרפיק שהם מביאים מערוצי פירסום שונים מאיפה מגיע אותו טראפיק איכותי וכלים לעזור להם לקבל החלטות מכמויות אדירות של מידע שמפפיעות באופן ישיר על הככנסות שלהם
  3. Raw vs aggregate
  4. Not always using out dashboard We asked them what we do with our API’s Jobs ETL to run on s3 High load on AF systems How we can solve Many queries per day We have inherint limit of 200k rows CMS big clients, remove limitations. Very large companies want all their data Script “issue” that cost us 50k
  5. פתרון: נדרשנו לקבל החלטות קשות ב r&d בידיעה שאנחנו נצטרך לשלם בתחזוקה ידנית, אבל לא היתה ממש ברירה ורצינו שהלקוח האסטרטגי הזה יהיה לנו. וזה הפתרון שהצגנו 13B events → kafka → secor (service for persisting kafka log to S3) As sequence files SparkSQL on top on that Creating manually a bucket on our production S3 for that account with only List \ READ permissions.creating IAM specifc user manually and Providing him the credentails And running the process with chrons \ mesos each morning עלינו לפרודקשיין בתוך כמה ימים, ואפשרנו גישה רק לטופיק הקטן ביותר של התקנות. הלקוח חתם.
  6. Mobile App Letgo Raises $100 Million From Naspers To Take Over Classifieds In The U.S.
  7. reports/<Home Folder>/account /<event-type>-<date YYYY-MM-dd> reports/<Home Folder>/apps/app-id /<event-type>-<date YYYY-MM-dd>
  8. Flatten the schema Schema on write Schema on read Code reuse Versioning Readability \ Simplification
  9. Tell the story of lets go Which build the entire marketier team work flow base on the dasgbaord they are creating
  10. Analytics process which calculate each day to X app (partiton keys) And saved that as meta-data on the files bucket
  11. על מנת להגן על הלקוחות שלנו וגם להגן עלינו מעצמנו מטעויות. הטמענו שירות שנתן לנו דרכים להפיק keys \ secret באופן שרירותי
  12. Helping our clients to improve their download time from our S3
  13. Scheduled tasks were not executed Same job executed twice Not trivial to maintance DAG Dynamic allocation