SlideShare a Scribd company logo
1 of 8
Building an intelligent big data 
app in 30 minutes 
1 Atigeo Confidential 
Strata Barcelona 
Nov 2014 
David Talby Claudiu Barbura 
SVP Engineering Sr. Director, Engineering 
@davidtalby @claudiubarbura
Agenda 
• Demo: FindFraud application 
• Demo: data pipeline, apis 
• Architecture 
• BDAS++ 
2 Atigeo Confidential
Infrastructure 
Whole System 
Starting Point 
Hadoop Spark Tachyon Mesos ZooKeeper Cassandra 
3 Atigeo Confidential 
Ingestion 
Data 
Workflows 
Spark SQL 
REST API 
Unified 
Security 
Unified 
Config. 
Monitoring 
& Logging 
Design Principles 
• Build, run & monitor end-to-end workflows 
• Same code for batch and streaming modes 
• Abstract away direct access to infrastructure 
• Unified REST API’s for cross-cutting concerns
Analytics 
Starting Point 
IPython Pandas 
4 Atigeo Confidential 
Scikit-learn, 
pybrain, 
nltk, … 
Hive, 
Shark, 
Spark SQL 
Whole System 
Feature 
Engineering 
Framework 
Visualized 
Metrics 
Distributed 
Featurization 
& Training 
Proprietary 
Modeling 
Algorithms 
Publish 
Models as 
REST API’s 
Online 
Learning 
REST API’s 
Design Principles 
• Declarative feature & model generation 
• Same code for local, distributed & serving layers 
• Experimentation & model optimization is key 
• Provide a very broad algorithmic toolbox
Application 
Starting Point 
5 Atigeo Confidential 
Cassandra 
Lucene & 
SolrCloud 
PostgreSQL 
TomCat, 
Spray.io, 
Node.js 
Whole System 
Feature 
Engineering 
Framework 
Modeling 
Workflows 
Visualized 
Metrics 
Distributed 
Feature 
Generation 
Distributed 
Training 
Publish 
Models as 
REST API’s 
Design Principles 
• Build app UI against REST API’s from Day One 
• Separate servers for analytics & app end users 
• Abstract away direct access to infrastructure
Let’s Build Something
Open Source Contribs 
• Jaws, http spark sql rest service 
• http://github.com/Atigeo/http-spark-sql-server 
 Backward compatible with Shark and Spark 0.x stack 
7 Atigeo Confidential 
• Spark Job Server 
 multiple Spark contexts in same JVM, job submission in Java + Scala 
https://github.com/Atigeo/spark-job-rest 
• Mesos framework starvation bug 
 submitted patch… detailed Tech Blog link soon at http://xpatterns.com 
• Tachyon patch (https://github.com/amplab/tachyon/pull/482)
Thank you! 
@atigeo Blog: 
8 Atigeo Confidential 
xpatterns.com 
linkedin.com/ 
company/atigeo 
David Talby Claudiu Barbura 
SVP Engineering Sr. Director, Engineering 
@davidtalby @claudiubarbura

More Related Content

What's hot

From Idea to Model: Productionizing Data Pipelines with Apache Airflow
From Idea to Model: Productionizing Data Pipelines with Apache AirflowFrom Idea to Model: Productionizing Data Pipelines with Apache Airflow
From Idea to Model: Productionizing Data Pipelines with Apache AirflowDatabricks
 
Spark and Couchbase– Augmenting the Operational Database with Spark
Spark and Couchbase– Augmenting the Operational Database with SparkSpark and Couchbase– Augmenting the Operational Database with Spark
Spark and Couchbase– Augmenting the Operational Database with SparkMatt Ingenthron
 
Fighting Cybercrime: A Joint Task Force of Real-Time Data and Human Analytics...
Fighting Cybercrime: A Joint Task Force of Real-Time Data and Human Analytics...Fighting Cybercrime: A Joint Task Force of Real-Time Data and Human Analytics...
Fighting Cybercrime: A Joint Task Force of Real-Time Data and Human Analytics...Spark Summit
 
Cloud-Based Event Stream Processing Architectures and Patterns with Apache Ka...
Cloud-Based Event Stream Processing Architectures and Patterns with Apache Ka...Cloud-Based Event Stream Processing Architectures and Patterns with Apache Ka...
Cloud-Based Event Stream Processing Architectures and Patterns with Apache Ka...HostedbyConfluent
 
Cloud native data platform
Cloud native data platformCloud native data platform
Cloud native data platformLi Gao
 
Data Engineering Roles
Data Engineering RolesData Engineering Roles
Data Engineering RolesAdam Doyle
 
Spark Summit San Francisco 2016 - Matei Zaharia Keynote: Apache Spark 2.0
Spark Summit San Francisco 2016 - Matei Zaharia Keynote: Apache Spark 2.0Spark Summit San Francisco 2016 - Matei Zaharia Keynote: Apache Spark 2.0
Spark Summit San Francisco 2016 - Matei Zaharia Keynote: Apache Spark 2.0Databricks
 
A Predictive Analytics Workflow on DICOM Images using Apache Spark with Anahi...
A Predictive Analytics Workflow on DICOM Images using Apache Spark with Anahi...A Predictive Analytics Workflow on DICOM Images using Apache Spark with Anahi...
A Predictive Analytics Workflow on DICOM Images using Apache Spark with Anahi...Databricks
 
Architectural Best Practices to Master + Pitfalls to Avoid (P)
Architectural Best Practices to Master + Pitfalls to Avoid (P) Architectural Best Practices to Master + Pitfalls to Avoid (P)
Architectural Best Practices to Master + Pitfalls to Avoid (P) Elasticsearch
 
How to teach your data scientist to leverage an analytics cluster with Presto...
How to teach your data scientist to leverage an analytics cluster with Presto...How to teach your data scientist to leverage an analytics cluster with Presto...
How to teach your data scientist to leverage an analytics cluster with Presto...Alluxio, Inc.
 
Accelerating Data Ingestion with Databricks Autoloader
Accelerating Data Ingestion with Databricks AutoloaderAccelerating Data Ingestion with Databricks Autoloader
Accelerating Data Ingestion with Databricks AutoloaderDatabricks
 
Stream All Things—Patterns of Modern Data Integration with Gwen Shapira
Stream All Things—Patterns of Modern Data Integration with Gwen ShapiraStream All Things—Patterns of Modern Data Integration with Gwen Shapira
Stream All Things—Patterns of Modern Data Integration with Gwen ShapiraDatabricks
 
Machine Learning Data Lineage with MLflow and Delta Lake
Machine Learning Data Lineage with MLflow and Delta LakeMachine Learning Data Lineage with MLflow and Delta Lake
Machine Learning Data Lineage with MLflow and Delta LakeDatabricks
 
Scaling ML-Based Threat Detection For Production Cyber Attacks
Scaling ML-Based Threat Detection For Production Cyber AttacksScaling ML-Based Threat Detection For Production Cyber Attacks
Scaling ML-Based Threat Detection For Production Cyber AttacksDatabricks
 
Accelerate Data Science Initiatives: Databricks & Privacera
Accelerate Data Science Initiatives: Databricks & PrivaceraAccelerate Data Science Initiatives: Databricks & Privacera
Accelerate Data Science Initiatives: Databricks & PrivaceraDatabricks
 
Replicate Elasticsearch Data with Cross-Cluster Replication (CCR)
Replicate Elasticsearch Data with Cross-Cluster Replication (CCR)Replicate Elasticsearch Data with Cross-Cluster Replication (CCR)
Replicate Elasticsearch Data with Cross-Cluster Replication (CCR)Elasticsearch
 
Distributed Data Storage & Streaming for Real-time Decisioning Using Kafka, S...
Distributed Data Storage & Streaming for Real-time Decisioning Using Kafka, S...Distributed Data Storage & Streaming for Real-time Decisioning Using Kafka, S...
Distributed Data Storage & Streaming for Real-time Decisioning Using Kafka, S...HostedbyConfluent
 
ALLUXIO (formerly Tachyon): Unify Data at Memory Speed - Effective using Spar...
ALLUXIO (formerly Tachyon): Unify Data at Memory Speed - Effective using Spar...ALLUXIO (formerly Tachyon): Unify Data at Memory Speed - Effective using Spar...
ALLUXIO (formerly Tachyon): Unify Data at Memory Speed - Effective using Spar...Alluxio, Inc.
 
Webinar: Cutting Time, Complexity and Cost from Data Science to Production
Webinar: Cutting Time, Complexity and Cost from Data Science to ProductionWebinar: Cutting Time, Complexity and Cost from Data Science to Production
Webinar: Cutting Time, Complexity and Cost from Data Science to Productioniguazio
 
Kibana + timelion: time series with the elastic stack
Kibana + timelion: time series with the elastic stackKibana + timelion: time series with the elastic stack
Kibana + timelion: time series with the elastic stackSylvain Wallez
 

What's hot (20)

From Idea to Model: Productionizing Data Pipelines with Apache Airflow
From Idea to Model: Productionizing Data Pipelines with Apache AirflowFrom Idea to Model: Productionizing Data Pipelines with Apache Airflow
From Idea to Model: Productionizing Data Pipelines with Apache Airflow
 
Spark and Couchbase– Augmenting the Operational Database with Spark
Spark and Couchbase– Augmenting the Operational Database with SparkSpark and Couchbase– Augmenting the Operational Database with Spark
Spark and Couchbase– Augmenting the Operational Database with Spark
 
Fighting Cybercrime: A Joint Task Force of Real-Time Data and Human Analytics...
Fighting Cybercrime: A Joint Task Force of Real-Time Data and Human Analytics...Fighting Cybercrime: A Joint Task Force of Real-Time Data and Human Analytics...
Fighting Cybercrime: A Joint Task Force of Real-Time Data and Human Analytics...
 
Cloud-Based Event Stream Processing Architectures and Patterns with Apache Ka...
Cloud-Based Event Stream Processing Architectures and Patterns with Apache Ka...Cloud-Based Event Stream Processing Architectures and Patterns with Apache Ka...
Cloud-Based Event Stream Processing Architectures and Patterns with Apache Ka...
 
Cloud native data platform
Cloud native data platformCloud native data platform
Cloud native data platform
 
Data Engineering Roles
Data Engineering RolesData Engineering Roles
Data Engineering Roles
 
Spark Summit San Francisco 2016 - Matei Zaharia Keynote: Apache Spark 2.0
Spark Summit San Francisco 2016 - Matei Zaharia Keynote: Apache Spark 2.0Spark Summit San Francisco 2016 - Matei Zaharia Keynote: Apache Spark 2.0
Spark Summit San Francisco 2016 - Matei Zaharia Keynote: Apache Spark 2.0
 
A Predictive Analytics Workflow on DICOM Images using Apache Spark with Anahi...
A Predictive Analytics Workflow on DICOM Images using Apache Spark with Anahi...A Predictive Analytics Workflow on DICOM Images using Apache Spark with Anahi...
A Predictive Analytics Workflow on DICOM Images using Apache Spark with Anahi...
 
Architectural Best Practices to Master + Pitfalls to Avoid (P)
Architectural Best Practices to Master + Pitfalls to Avoid (P) Architectural Best Practices to Master + Pitfalls to Avoid (P)
Architectural Best Practices to Master + Pitfalls to Avoid (P)
 
How to teach your data scientist to leverage an analytics cluster with Presto...
How to teach your data scientist to leverage an analytics cluster with Presto...How to teach your data scientist to leverage an analytics cluster with Presto...
How to teach your data scientist to leverage an analytics cluster with Presto...
 
Accelerating Data Ingestion with Databricks Autoloader
Accelerating Data Ingestion with Databricks AutoloaderAccelerating Data Ingestion with Databricks Autoloader
Accelerating Data Ingestion with Databricks Autoloader
 
Stream All Things—Patterns of Modern Data Integration with Gwen Shapira
Stream All Things—Patterns of Modern Data Integration with Gwen ShapiraStream All Things—Patterns of Modern Data Integration with Gwen Shapira
Stream All Things—Patterns of Modern Data Integration with Gwen Shapira
 
Machine Learning Data Lineage with MLflow and Delta Lake
Machine Learning Data Lineage with MLflow and Delta LakeMachine Learning Data Lineage with MLflow and Delta Lake
Machine Learning Data Lineage with MLflow and Delta Lake
 
Scaling ML-Based Threat Detection For Production Cyber Attacks
Scaling ML-Based Threat Detection For Production Cyber AttacksScaling ML-Based Threat Detection For Production Cyber Attacks
Scaling ML-Based Threat Detection For Production Cyber Attacks
 
Accelerate Data Science Initiatives: Databricks & Privacera
Accelerate Data Science Initiatives: Databricks & PrivaceraAccelerate Data Science Initiatives: Databricks & Privacera
Accelerate Data Science Initiatives: Databricks & Privacera
 
Replicate Elasticsearch Data with Cross-Cluster Replication (CCR)
Replicate Elasticsearch Data with Cross-Cluster Replication (CCR)Replicate Elasticsearch Data with Cross-Cluster Replication (CCR)
Replicate Elasticsearch Data with Cross-Cluster Replication (CCR)
 
Distributed Data Storage & Streaming for Real-time Decisioning Using Kafka, S...
Distributed Data Storage & Streaming for Real-time Decisioning Using Kafka, S...Distributed Data Storage & Streaming for Real-time Decisioning Using Kafka, S...
Distributed Data Storage & Streaming for Real-time Decisioning Using Kafka, S...
 
ALLUXIO (formerly Tachyon): Unify Data at Memory Speed - Effective using Spar...
ALLUXIO (formerly Tachyon): Unify Data at Memory Speed - Effective using Spar...ALLUXIO (formerly Tachyon): Unify Data at Memory Speed - Effective using Spar...
ALLUXIO (formerly Tachyon): Unify Data at Memory Speed - Effective using Spar...
 
Webinar: Cutting Time, Complexity and Cost from Data Science to Production
Webinar: Cutting Time, Complexity and Cost from Data Science to ProductionWebinar: Cutting Time, Complexity and Cost from Data Science to Production
Webinar: Cutting Time, Complexity and Cost from Data Science to Production
 
Kibana + timelion: time series with the elastic stack
Kibana + timelion: time series with the elastic stackKibana + timelion: time series with the elastic stack
Kibana + timelion: time series with the elastic stack
 

Similar to Building an intelligent big data application in 30 minutes

Sergii Bielskyi "Azure Logic App and building modern cloud native apps"
Sergii Bielskyi "Azure Logic App and building modern cloud native apps"Sergii Bielskyi "Azure Logic App and building modern cloud native apps"
Sergii Bielskyi "Azure Logic App and building modern cloud native apps"Fwdays
 
First Look at Azure Logic Apps (BAUG)
First Look at Azure Logic Apps (BAUG)First Look at Azure Logic Apps (BAUG)
First Look at Azure Logic Apps (BAUG)Daniel Toomey
 
2015-12-02 - WebCamp - Microsoft Azure Logic Apps
2015-12-02 - WebCamp - Microsoft Azure Logic Apps2015-12-02 - WebCamp - Microsoft Azure Logic Apps
2015-12-02 - WebCamp - Microsoft Azure Logic AppsSandro Pereira
 
Logic apps and PowerApps - Integrate across your APIs
Logic apps and PowerApps - Integrate across your APIsLogic apps and PowerApps - Integrate across your APIs
Logic apps and PowerApps - Integrate across your APIsSriram Hariharan
 
Zure Azure PaaS Zero to Hero - DevOps training day
Zure Azure PaaS Zero to Hero - DevOps training dayZure Azure PaaS Zero to Hero - DevOps training day
Zure Azure PaaS Zero to Hero - DevOps training dayOkko Oulasvirta
 
Big Data Adavnced Analytics on Microsoft Azure
Big Data Adavnced Analytics on Microsoft AzureBig Data Adavnced Analytics on Microsoft Azure
Big Data Adavnced Analytics on Microsoft AzureMark Tabladillo
 
Sergii Baidachnyi ITEM 2018
Sergii Baidachnyi ITEM 2018Sergii Baidachnyi ITEM 2018
Sergii Baidachnyi ITEM 2018ITEM
 
Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...
Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...
Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...Michael Rys
 
201908 Overview of Automated ML
201908 Overview of Automated ML201908 Overview of Automated ML
201908 Overview of Automated MLMark Tabladillo
 
IncQuery Server for Teamwork Cloud - Talk at IW2019
IncQuery Server for Teamwork Cloud - Talk at IW2019IncQuery Server for Teamwork Cloud - Talk at IW2019
IncQuery Server for Teamwork Cloud - Talk at IW2019Istvan Rath
 
.NET for Azure Synapse (and viceversa)
.NET for Azure Synapse (and viceversa).NET for Azure Synapse (and viceversa)
.NET for Azure Synapse (and viceversa)Marco Parenzan
 
OpenStack for VMware Administrators
OpenStack for VMware AdministratorsOpenStack for VMware Administrators
OpenStack for VMware AdministratorsTrevor Roberts Jr.
 
Intro to Machine Learning with H2O and AWS
Intro to Machine Learning with H2O and AWSIntro to Machine Learning with H2O and AWS
Intro to Machine Learning with H2O and AWSSri Ambati
 
Tokyo Azure Meetup #4 - Build 2016 Overview
Tokyo Azure Meetup #4 -  Build 2016 OverviewTokyo Azure Meetup #4 -  Build 2016 Overview
Tokyo Azure Meetup #4 - Build 2016 OverviewTokyo Azure Meetup
 
Making Data Scientists Productive in Azure
Making Data Scientists Productive in AzureMaking Data Scientists Productive in Azure
Making Data Scientists Productive in AzureValdas Maksimavičius
 
Microsoft Tech Series 2019 - Azure DevOps
Microsoft Tech Series 2019 - Azure DevOpsMicrosoft Tech Series 2019 - Azure DevOps
Microsoft Tech Series 2019 - Azure DevOpsTomasz Wisniewski
 
.NET per la Data Science e oltre
.NET per la Data Science e oltre.NET per la Data Science e oltre
.NET per la Data Science e oltreMarco Parenzan
 
Deep Dive Into Elasticsearch: Establish A Powerful Log Analysis System With E...
Deep Dive Into Elasticsearch: Establish A Powerful Log Analysis System With E...Deep Dive Into Elasticsearch: Establish A Powerful Log Analysis System With E...
Deep Dive Into Elasticsearch: Establish A Powerful Log Analysis System With E...Tyler Nguyen
 

Similar to Building an intelligent big data application in 30 minutes (20)

Sergii Bielskyi "Azure Logic App and building modern cloud native apps"
Sergii Bielskyi "Azure Logic App and building modern cloud native apps"Sergii Bielskyi "Azure Logic App and building modern cloud native apps"
Sergii Bielskyi "Azure Logic App and building modern cloud native apps"
 
First Look at Azure Logic Apps (BAUG)
First Look at Azure Logic Apps (BAUG)First Look at Azure Logic Apps (BAUG)
First Look at Azure Logic Apps (BAUG)
 
Elastic-Engineering
Elastic-EngineeringElastic-Engineering
Elastic-Engineering
 
2015-12-02 - WebCamp - Microsoft Azure Logic Apps
2015-12-02 - WebCamp - Microsoft Azure Logic Apps2015-12-02 - WebCamp - Microsoft Azure Logic Apps
2015-12-02 - WebCamp - Microsoft Azure Logic Apps
 
Logic apps and PowerApps - Integrate across your APIs
Logic apps and PowerApps - Integrate across your APIsLogic apps and PowerApps - Integrate across your APIs
Logic apps and PowerApps - Integrate across your APIs
 
Zure Azure PaaS Zero to Hero - DevOps training day
Zure Azure PaaS Zero to Hero - DevOps training dayZure Azure PaaS Zero to Hero - DevOps training day
Zure Azure PaaS Zero to Hero - DevOps training day
 
Big Data Adavnced Analytics on Microsoft Azure
Big Data Adavnced Analytics on Microsoft AzureBig Data Adavnced Analytics on Microsoft Azure
Big Data Adavnced Analytics on Microsoft Azure
 
Sergii Baidachnyi ITEM 2018
Sergii Baidachnyi ITEM 2018Sergii Baidachnyi ITEM 2018
Sergii Baidachnyi ITEM 2018
 
Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...
Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...
Building data pipelines for modern data warehouse with Apache® Spark™ and .NE...
 
201908 Overview of Automated ML
201908 Overview of Automated ML201908 Overview of Automated ML
201908 Overview of Automated ML
 
IncQuery Server for Teamwork Cloud - Talk at IW2019
IncQuery Server for Teamwork Cloud - Talk at IW2019IncQuery Server for Teamwork Cloud - Talk at IW2019
IncQuery Server for Teamwork Cloud - Talk at IW2019
 
.NET for Azure Synapse (and viceversa)
.NET for Azure Synapse (and viceversa).NET for Azure Synapse (and viceversa)
.NET for Azure Synapse (and viceversa)
 
OpenStack for VMware Administrators
OpenStack for VMware AdministratorsOpenStack for VMware Administrators
OpenStack for VMware Administrators
 
Intro to Machine Learning with H2O and AWS
Intro to Machine Learning with H2O and AWSIntro to Machine Learning with H2O and AWS
Intro to Machine Learning with H2O and AWS
 
Tokyo Azure Meetup #4 - Build 2016 Overview
Tokyo Azure Meetup #4 -  Build 2016 OverviewTokyo Azure Meetup #4 -  Build 2016 Overview
Tokyo Azure Meetup #4 - Build 2016 Overview
 
Making Data Scientists Productive in Azure
Making Data Scientists Productive in AzureMaking Data Scientists Productive in Azure
Making Data Scientists Productive in Azure
 
Microsoft Tech Series 2019 - Azure DevOps
Microsoft Tech Series 2019 - Azure DevOpsMicrosoft Tech Series 2019 - Azure DevOps
Microsoft Tech Series 2019 - Azure DevOps
 
.NET per la Data Science e oltre
.NET per la Data Science e oltre.NET per la Data Science e oltre
.NET per la Data Science e oltre
 
Building a Better BaaS
Building a Better BaaSBuilding a Better BaaS
Building a Better BaaS
 
Deep Dive Into Elasticsearch: Establish A Powerful Log Analysis System With E...
Deep Dive Into Elasticsearch: Establish A Powerful Log Analysis System With E...Deep Dive Into Elasticsearch: Establish A Powerful Log Analysis System With E...
Deep Dive Into Elasticsearch: Establish A Powerful Log Analysis System With E...
 

More from Claudiu Barbura

Overkill Analytics Seattle Spark Meetup
Overkill Analytics Seattle Spark MeetupOverkill Analytics Seattle Spark Meetup
Overkill Analytics Seattle Spark MeetupClaudiu Barbura
 
Autonomous analytics on streaming data
Autonomous analytics on streaming dataAutonomous analytics on streaming data
Autonomous analytics on streaming dataClaudiu Barbura
 
Tachyon meetup San Francisco Oct 2014
Tachyon meetup San Francisco Oct 2014Tachyon meetup San Francisco Oct 2014
Tachyon meetup San Francisco Oct 2014Claudiu Barbura
 
xPatterns on Spark, Shark, Mesos, Tachyon
xPatterns on Spark, Shark, Mesos, TachyonxPatterns on Spark, Shark, Mesos, Tachyon
xPatterns on Spark, Shark, Mesos, TachyonClaudiu Barbura
 
xPatterns ... beyond Hadoop (Spark, Shark, Mesos, Tachyon)
xPatterns ... beyond Hadoop (Spark, Shark, Mesos, Tachyon)xPatterns ... beyond Hadoop (Spark, Shark, Mesos, Tachyon)
xPatterns ... beyond Hadoop (Spark, Shark, Mesos, Tachyon)Claudiu Barbura
 

More from Claudiu Barbura (6)

Overkill Analytics Seattle Spark Meetup
Overkill Analytics Seattle Spark MeetupOverkill Analytics Seattle Spark Meetup
Overkill Analytics Seattle Spark Meetup
 
Overkill Analytics
Overkill AnalyticsOverkill Analytics
Overkill Analytics
 
Autonomous analytics on streaming data
Autonomous analytics on streaming dataAutonomous analytics on streaming data
Autonomous analytics on streaming data
 
Tachyon meetup San Francisco Oct 2014
Tachyon meetup San Francisco Oct 2014Tachyon meetup San Francisco Oct 2014
Tachyon meetup San Francisco Oct 2014
 
xPatterns on Spark, Shark, Mesos, Tachyon
xPatterns on Spark, Shark, Mesos, TachyonxPatterns on Spark, Shark, Mesos, Tachyon
xPatterns on Spark, Shark, Mesos, Tachyon
 
xPatterns ... beyond Hadoop (Spark, Shark, Mesos, Tachyon)
xPatterns ... beyond Hadoop (Spark, Shark, Mesos, Tachyon)xPatterns ... beyond Hadoop (Spark, Shark, Mesos, Tachyon)
xPatterns ... beyond Hadoop (Spark, Shark, Mesos, Tachyon)
 

Recently uploaded

Science&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfScience&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfjimielynbastida
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDGMarianaLemus7
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Bluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfBluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfngoud9212
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentationphoebematthew05
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 

Recently uploaded (20)

DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Science&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfScience&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdf
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDG
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Bluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfBluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdf
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentation
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 

Building an intelligent big data application in 30 minutes

  • 1. Building an intelligent big data app in 30 minutes 1 Atigeo Confidential Strata Barcelona Nov 2014 David Talby Claudiu Barbura SVP Engineering Sr. Director, Engineering @davidtalby @claudiubarbura
  • 2. Agenda • Demo: FindFraud application • Demo: data pipeline, apis • Architecture • BDAS++ 2 Atigeo Confidential
  • 3. Infrastructure Whole System Starting Point Hadoop Spark Tachyon Mesos ZooKeeper Cassandra 3 Atigeo Confidential Ingestion Data Workflows Spark SQL REST API Unified Security Unified Config. Monitoring & Logging Design Principles • Build, run & monitor end-to-end workflows • Same code for batch and streaming modes • Abstract away direct access to infrastructure • Unified REST API’s for cross-cutting concerns
  • 4. Analytics Starting Point IPython Pandas 4 Atigeo Confidential Scikit-learn, pybrain, nltk, … Hive, Shark, Spark SQL Whole System Feature Engineering Framework Visualized Metrics Distributed Featurization & Training Proprietary Modeling Algorithms Publish Models as REST API’s Online Learning REST API’s Design Principles • Declarative feature & model generation • Same code for local, distributed & serving layers • Experimentation & model optimization is key • Provide a very broad algorithmic toolbox
  • 5. Application Starting Point 5 Atigeo Confidential Cassandra Lucene & SolrCloud PostgreSQL TomCat, Spray.io, Node.js Whole System Feature Engineering Framework Modeling Workflows Visualized Metrics Distributed Feature Generation Distributed Training Publish Models as REST API’s Design Principles • Build app UI against REST API’s from Day One • Separate servers for analytics & app end users • Abstract away direct access to infrastructure
  • 7. Open Source Contribs • Jaws, http spark sql rest service • http://github.com/Atigeo/http-spark-sql-server  Backward compatible with Shark and Spark 0.x stack 7 Atigeo Confidential • Spark Job Server  multiple Spark contexts in same JVM, job submission in Java + Scala https://github.com/Atigeo/spark-job-rest • Mesos framework starvation bug  submitted patch… detailed Tech Blog link soon at http://xpatterns.com • Tachyon patch (https://github.com/amplab/tachyon/pull/482)
  • 8. Thank you! @atigeo Blog: 8 Atigeo Confidential xpatterns.com linkedin.com/ company/atigeo David Talby Claudiu Barbura SVP Engineering Sr. Director, Engineering @davidtalby @claudiubarbura