Distilling Insights @ Appsflyer (Data Architecture)
Arnon Rotem-Gal-Oz
Appsflyer's data architecture
1.
Distilling insights @ AppsFlyer
Arnon Rotem-Gal-Oz, Chief Data Officer
2.
3.
4.
[Architecture diagram; labels only]
Ingest and storage: Kafka, Secor, Raw (sequence files), DW (parquet files), DM (aggregations)
Processing: Spark ETL, Spark aggregations, Spark ML, SparkSQL (evaluating Drill, Presto)
Serving: columnar database (Redshift, evaluating Vertica), IMDG (Ignite, evaluating Geode), SQL
Consumers: application dashboard, self-serve BI (TBD), internal tools
Other labels: latest events, scoring, exploration, agg. logic
Data: installs, clicks, in-app, launches, accounts
5.
Data’s hierarchy of needs*
*With apologies to Maslow
Levels, top to bottom: Acted upon, Presented, Distilled, Usable, Accessible, Exist
6.
Exist
7.
[Architecture diagram, repeated from slide 4]
8.
[Architecture diagram, repeated from slide 4]
9.
Working off of RAW data
10.
“Malting” (Accessible): just slap SQL on everything
11.
[Architecture diagram, repeated from slide 4]
12.
Fermenting (Usable)
13.
[Architecture diagram, repeated from slide 4]
14.
Distilling (Distilled)
15.
[Architecture diagram, repeated from slide 4]
16.
Presented: RT insights, predictive, prescriptive, dashboards, whatnot
17.
Sidetrack: On use of Spark
18.
Hadoop & Mesos
19.
20.
Land data in a queue
21.
All data is time-series
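One way to read this slide is that every event carries a timestamp and is stored under a time-based partition. The sketch below illustrates that reading only; it is not AppsFlyer's actual layout. The `partition_path` helper, the `events/` prefix, and the hourly granularity are all assumptions made for the example.

```python
from datetime import datetime, timezone

def partition_path(event_type: str, epoch_seconds: int) -> str:
    """Build an hourly, UTC-based partition path (hypothetical layout)."""
    ts = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    # Year/month/day/hour directories keep time-range scans cheap.
    return f"events/{event_type}/{ts:%Y/%m/%d/%H}"

# Example: an install event stamped at the Unix epoch.
install_path = partition_path("installs", 0)
```

Partitioning by event time rather than arrival time is a design choice: it makes late-arriving data land in the partition it logically belongs to, at the cost of occasionally rewriting older partitions.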
22.
Enrich with foreign keys before persisting
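As a rough illustration of this point, the sketch below attaches an account key to an event before it would be written out, so later queries do not need a join to find the owning account. The lookup table, the field names (`app_id`, `account_id`), and the `enrich` function are hypothetical; the deck does not show how the enrichment is implemented.

```python
# Hypothetical lookup: app id -> account id (the foreign key to attach).
ACCOUNT_BY_APP = {"com.example.game": 42}

def enrich(event: dict, account_by_app: dict) -> dict:
    """Return a copy of the event with an account_id foreign key added."""
    enriched = dict(event)  # leave the caller's event untouched
    # None marks an unknown app so the gap stays visible downstream.
    enriched["account_id"] = account_by_app.get(event["app_id"])
    return enriched

click = {"type": "click", "app_id": "com.example.game", "ts": 1700000000}
enriched_click = enrich(click, ACCOUNT_BY_APP)
```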
23.
Analyze and balance jobs
24.
Not everything is big data
25.
We’re hiring: jobs@appsflyer.com