Architecture Overview
[Diagram: Web Client (A), Web Server / API (1), Routing & Queuing (2), Metadata (3), Dynamic Query Engine (4), Processing & Analytics (5), File Backend (B)]
• The architecture consists of 5 basic components, an HTML5 client and a file backend
• Each instance of a component auto-registers in the metadata master
• Every component defined here
  • Is horizontally scalable
  • Has load balancing
  • Has failover capabilities
• All external communication goes through the fully RESTful API, where each request is checked against a role-based security system
• Besides the RESTful interface, the system can also deliver and retrieve results and data through indirect methods (mail, SFTP)
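The auto-registration idea can be sketched roughly as follows. This is a minimal in-memory illustration, not the actual metadata master: the `Registry` class, the heartbeat with a time-to-live, and the addresses are all assumptions made for the example.

```python
class Registry:
    """In-memory stand-in for the metadata master's instance list (hypothetical)."""

    def __init__(self, ttl_seconds=30.0):
        self.ttl = ttl_seconds
        self._last_seen = {}  # (component, address) -> time of last registration

    def register(self, component, address, now):
        """Called by an instance at startup and again as a periodic heartbeat."""
        self._last_seen[(component, address)] = now

    def live(self, component, now):
        """Addresses of instances of `component` seen within the TTL."""
        return sorted(
            address
            for (name, address), seen in self._last_seen.items()
            if name == component and now - seen <= self.ttl
        )
```

An instance that stops sending heartbeats simply drops out of `live()`, which is one simple way to get the failover behaviour listed above.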
1) Web Server
• The web server receives all requests, checks them against the security model and the metadata, and then places the resulting actions in the queuing system
• The setup of the security model, the metadata (including the data descriptions in it) and the entire API (calls and actions) are proprietary code
• Dependencies:
  • Nginx, for the scalable HTTP server
  • uWSGI, for running Python code behind Nginx
  • Flask, a web framework for handling sockets and sessions
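The check-then-queue flow described above might look roughly like this. It is a framework-free sketch using only the standard library; the role names, action names and status codes are hypothetical, and the real API code is proprietary.

```python
import queue

# Hypothetical mapping of roles to the actions they may request.
ROLE_ACTIONS = {
    "admin": {"run_report", "load_file", "manage_users"},
    "analyst": {"run_report"},
}

# Stand-in for the queuing system that the web server hands actions to.
action_queue = queue.Queue()


def handle_request(user_roles, action, payload):
    """Check the request against the role model, then enqueue it."""
    allowed = any(action in ROLE_ACTIONS.get(role, set()) for role in user_roles)
    if not allowed:
        return 403, {"error": "forbidden"}
    action_queue.put({"action": action, "payload": payload})
    return 202, {"status": "queued"}
```

Rejected requests never reach the queue, so the downstream components only see actions that already passed the security check.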
2) Routing & Queuing
• The queue server receives all action requests from the API, finds where it can execute them and load-balances the requests over these resources
• We have created the queues and the auto-registering setup to provide the generic framework functionality and to ensure load balancing and failover capabilities
• Dependencies:
  • Celery, the Python task queue library
  • RabbitMQ, the message broker that distributes the tasks
  • Redis, for exchanging results between the processes
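As a simplified illustration of load balancing with failover, the sketch below tries workers in round-robin order and skips any that fail. In the real system Celery and RabbitMQ take care of this; the `dispatch` function and the worker callables here are hypothetical stand-ins.

```python
def dispatch(task, workers, start=0):
    """Run `task` on the first reachable worker, starting at index `start`.

    `workers` is a list of (name, callable) pairs; a callable raises
    ConnectionError when its worker is unreachable.
    """
    count = len(workers)
    for offset in range(count):
        name, run = workers[(start + offset) % count]
        try:
            return name, run(task)
        except ConnectionError:
            continue  # fail over to the next worker in the ring
    raise RuntimeError("no worker available")
```

Advancing `start` on every call spreads tasks over the workers, while the `except` branch gives the failover behaviour.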
3) Metadata
• The metadata server contains all general data on users, databases and security, as well as the metadata on the data available to users (measures, dimensions, tables and how these all relate to each other)
• Dependencies:
  • MongoDB, for storing the metadata
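To make the kind of metadata described above concrete, here is a small sketch of a document-style record and a lookup over it. The table names, field names and layout are invented for illustration and do not reflect the actual MongoDB schema.

```python
# Hypothetical metadata document: tables with their dimensions and measures.
METADATA = {
    "tables": {
        "sales": {
            "dimensions": ["store_id", "product_id", "date"],
            "measures": ["units", "revenue"],
        },
        "stores": {
            "dimensions": ["store_id", "region"],
            "measures": [],
        },
    }
}


def shared_dimensions(metadata, left, right):
    """Dimensions that two tables have in common, in the left table's order."""
    tables = metadata["tables"]
    right_dims = set(tables[right]["dimensions"])
    return [dim for dim in tables[left]["dimensions"] if dim in right_dims]
```

A relation such as "sales and stores share `store_id`" is the sort of information a query engine could use to decide how two tables relate.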
4) Dynamic Query Engine
• The dynamic query engine server holds a number of data files (which it automatically downloads and synchronizes from the backend) and can analyze and aggregate them
• It can also auto-join tables on commonalities, perform a wide range of calculations and do several distributed analytics operations at row level
• Dependencies:
  • Bcolz, for storing the data files in a compressed, columnar format
  • Pandas, for higher-end operations on the result data set (joins, sorts, etc.)
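One common way to distribute an aggregation is to let each engine instance compute partial totals over its own files and then merge the partials. The sketch below shows that pattern in plain Python; it is an assumption about how such a step could work, not the engine's actual code, and the row layout is hypothetical.

```python
from collections import Counter


def partial_sum(rows, group_key, measure):
    """Per-group totals for the rows held by one engine instance."""
    totals = Counter()
    for row in rows:
        totals[row[group_key]] += row[measure]
    return totals


def merge(partials):
    """Combine the partial totals returned by all instances."""
    combined = Counter()
    for partial in partials:
        combined.update(partial)  # Counter.update adds values per key
    return dict(combined)
```

Because sums can be merged in any order, each instance only needs to send back one small result per group rather than its raw rows.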
5) Processing & Analytics
• The processing & analytics server handles (asynchronous) calls for file loading, exporting and analytics
• This includes the creation and execution of machine learning and statistical models
• It also handles the conversion of raw data files into the binary files and the updating of the relevant metadata
• Dependencies:
  • Scikit-learn, for machine learning
  • Statsmodels, for statistical models
  • Pandas, for data manipulation
  • Bcolz, for converting the data files into a compressed, columnar format
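The conversion from a raw file to a compressed, columnar layout can be illustrated with the standard library alone. Note that this sketch swaps in `zlib` and a plain dictionary where the real system uses Bcolz (Blosc-based); the function names are hypothetical.

```python
import csv
import io
import zlib


def csv_to_columns(text):
    """Split CSV text into columns and compress each column separately."""
    reader = csv.DictReader(io.StringIO(text))
    columns = {name: [] for name in reader.fieldnames}
    for row in reader:
        for name, value in row.items():
            columns[name].append(value)
    # One compressed blob per column; values must not contain newlines.
    return {
        name: zlib.compress("\n".join(values).encode("utf-8"))
        for name, values in columns.items()
    }


def read_column(store, name):
    """Decompress a single column without touching the others."""
    return zlib.decompress(store[name]).decode("utf-8").split("\n")
```

Storing each column on its own means a query that needs only `units` never has to read or decompress the other columns.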
A) Web Client
• The web client is a full, web-based HTML5 client that gives access to all of the following:
  • Reporting
  • Analytics
  • File import
  • User and security management
  • Server management
• The files are served by the web server as static content, and all calls go through the standard API
• Dependencies:
  • jQuery, for cross-browser JavaScript simplification and UI
  • Bootstrap, for layout
  • D3.js, a library for visualizations
B) File Backend
• The file backend contains all raw files and the processed (compressed, columnar) files
• Dynamic Query Engine (DQE) instances automatically retrieve their assigned files from the backend when a file has been updated
• Dependencies:
  • AWS S3, for storing files
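The "retrieve when updated" behaviour could be decided with a simple version comparison, sketched below. The version tokens are an assumption (they could be something like an object ETag), and `files_to_fetch` is a hypothetical helper rather than part of the described system.

```python
def files_to_fetch(assigned, local_versions, backend_versions):
    """Assigned files whose backend version differs from the local copy.

    `local_versions` and `backend_versions` map file name -> version token.
    Files missing locally count as outdated; files missing from the
    backend are skipped.
    """
    return [
        name
        for name in assigned
        if name in backend_versions
        and local_versions.get(name) != backend_versions[name]
    ]
```

An instance would then download only the returned names and record the new version tokens locally.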
Architecture Comparison
| Area | Hadoop | Cassandra | Best in class | visualfabriq | Difference |
| --- | --- | --- | --- | --- | --- |
| Data | Non-structured & structured | Structured, wide-column | Teradata (structured, columnar) | Structured, columnar, compressed | Optimized for numerical data (meaning no text analytics etc.) |
| Architecture | Rack-aware, daemon-based cluster | Peer-to-peer cluster | | Horizontally scaling, container-based microservices communicating through RabbitMQ queues | Easier to monitor & scale |
| Setup | Complex | Complex | | Up & running in one minute | Much, much easier to set up and roll out |
| Cluster maintenance | Node creation and assignment usually through commercial cluster mgmt software | Peer-to-peer network; auto-configures | | Self-registering nodes that can be assigned specific tasks and data in a web interface | |
| ETL | Flume, Sqoop | Bulk Loader | Informatica, Talend | Web based, drag & drop with wizards | Web based, easy to use |
| Query language | Map/Reduce; add-ons for SQL (Pig, Hive, Impala, etc.) | CQL | SQL | MOLAP-like; SQL interface to be built | SQL is the standard, but because of the built-in reporting and analytics this is not something users will need |
| Compression | No | No | MongoDB/WiredTiger | Blosc-based | Saves on average 20x in disk space while speeding up reads |
| Performance | Slow, batch based; Spark can add in-memory capability (speeds up 100x) | High, in-memory options | | High, disk-based with compression delivering in 2-3x range of in-memory | Out-of-the-box near in-memory performance with file-based scaling; with advances in CPU speed, this might even surpass traditional in-memory performance |
| Interface | RESTful API | RESTful API | RESTful API | RESTful API | |
| Reporting | Only in external tools (that connect to SQL connector) | Only in external tools (that connect to 3rd party connectors) | Tableau (HTML5, interactive, beautiful) | Built-in HTML5, interactive, extensible (D3.js based) | Only solution with out-of-the-box reporting with an easy-to-use, modern web-based interface |
| Analytics | Distributed map/reduce analytics through Mahout | Only as optional, paid-for module | SAS, SPSS | Built-in HTML5, interactive environment that incorporates leading open-source machine learning (scikit-learn), statistics (statsmodels) and proprietary (POS analytics) functionality; NB: the analytics load is not fully distributed yet | Only solution with out-of-the-box analytics with an easy-to-use, modern web-based interface |
| Security | Kerberos-based security | Data object security | | General, role-based security | One point to manage all security, from data access to functionality (reporting, accessibility, etc.) |
| Open source | Core is open source; several performance acceleration & mgmt tools are paid | Core is open source; analytics, backup and other options are paid | | Core is open source; large cluster mgmt tools and vertical-specific analytics options are paid | |
| Implementation language | Java | Java | | Python (and Cython & C) | |

Bquery Reporting & Analytics Architecture

  • 2. Architecture Overview Web Server API Routing & Queuing Metadata Dynamic Query Engine Processing & Analytics File Backend • The architecture consists of 5 basic components, a HTML5 Client and a file backend • Each instance of a component auto- registers in the metadata master • Every component defined here • Is horizontally scalable • Has load balancing • And has failover capabilities • All external communication goes through the fully REST-ful api, where each request is checked against a role- based security system • Next to the restful interface, it can also deliver and retrieve results and data through indirect methods (mail, sftp) 1 2 4 B 3 5 Web ClientA
  • 3. 1) Web Server Web Server API Routing & Queuing Metadata Dynamic Query Engine Processing & Analytics File Backend Web Client • The web server receives all requests, checks them against the security model and metadata, after which it sets out the actions in the queuing system • The setup of the security model, metadata (including data descriptions there) and the entire API (calls and actions) are proprietary code • Dependencies: • Nginx, for the scalable http server • uWSGI, for running python code behind nginx • Flask, a web framework for handling sockets and sessions 1 2 4 B A 3 5
  • 4. 2) Routing & Queuing Web Server API Routing & Queuing Metadata Dynamic Query Engine Processing & Analytics File Backend Web Client • The queue server receives all action requests from the API, finds where it can execute them and load balances requests over these resources • We have created the queues and auto- registering setup to create the generic framework functionality and to ensure load balancing and fail over capabilities • Dependencies: • Celery, for the Python library • RabbitMQ, the distribution broker • Redis, for exchanging results between the processes 1 2 4 B A 3 5
  • 5. 3) Metadata
    • The metadata server contains all general data on users, databases and security, as well as the metadata on the data available to users (measures, dimensions, tables and how these all relate to each other)
    • Dependencies:
      • MongoDB, for containing the metadata
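To make "measures, dimensions, tables and how these relate" concrete, here is a small sketch of what such metadata records could look like and how relationships might be derived from them. The table names, field names and both helper functions are hypothetical; the slides do not describe the actual MongoDB schema.

```python
# Hypothetical metadata records, shaped like documents a document store could hold.
TABLES = [
    {"name": "sales", "dimensions": ["store_id", "product_id", "date"],
     "measures": ["revenue", "units"]},
    {"name": "stores", "dimensions": ["store_id", "region"], "measures": []},
    {"name": "products", "dimensions": ["product_id", "category"], "measures": []},
]


def shared_dimensions(left, right):
    """Return the dimensions two tables have in common, in the left table's order."""
    by_name = {t["name"]: t for t in TABLES}
    right_dims = set(by_name[right]["dimensions"])
    return [d for d in by_name[left]["dimensions"] if d in right_dims]


def tables_with_measure(measure):
    """List the tables that expose a given measure."""
    return [t["name"] for t in TABLES if measure in t["measures"]]
```

For example, `shared_dimensions("sales", "stores")` yields the single column the two tables could be related on, while two tables with nothing in common yield an empty list.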
  • 6. 4) Dynamic Query Engine
    • The dynamic query engine server holds a number of data files (which it automatically downloads and synchronizes from the backend) and can analyze and aggregate them
    • It can also auto-join tables on commonalities, perform a wide range of calculations and run several distributed analytics operations at row level
    • Dependencies:
      • Bcolz, for containing the data files in a compressed, columnar format
      • Pandas, for higher-end operations on the result data set (joins, sorts, etc.)
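The idea of auto-joining tables on their commonalities and then aggregating can be shown in a few lines of plain Python. The real engine relies on Bcolz and Pandas; this sketch deliberately uses neither, and `auto_join`, `aggregate_sum` and the sample rows are assumptions made for illustration only.

```python
from collections import defaultdict


def auto_join(left_rows, right_rows):
    """Inner-join two lists of dict rows on every column name they share."""
    keys = sorted(set(left_rows[0]) & set(right_rows[0]))
    if not keys:
        raise ValueError("tables share no columns to join on")
    index = defaultdict(list)
    for row in right_rows:
        index[tuple(row[k] for k in keys)].append(row)
    joined = []
    for row in left_rows:
        for match in index.get(tuple(row[k] for k in keys), []):
            joined.append({**match, **row})
    return joined


def aggregate_sum(rows, group_by, measure):
    """Sum one measure per distinct value of one dimension."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[group_by]] += row[measure]
    return dict(totals)


sales = [
    {"store_id": 1, "revenue": 10.0},
    {"store_id": 2, "revenue": 5.0},
    {"store_id": 1, "revenue": 2.5},
]
stores = [
    {"store_id": 1, "region": "north"},
    {"store_id": 2, "region": "south"},
]

revenue_by_region = aggregate_sum(auto_join(sales, stores), "region", "revenue")
```

Here the only shared column is `store_id`, so the join key is discovered rather than specified, and the aggregation then groups on a column that exists only in the joined result.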
  • 7. 5) Processing & Analytics
    • The processing & analytics server handles (asynchronous) calls to perform file loading, exporting and analytics
    • This includes the creation and execution of machine learning and statistical models
    • It also handles the conversion of raw data files into the binary files and updates the relevant metadata
    • Dependencies:
      • Scikit-learn, for machine learning
      • Statsmodels, for statistical models
      • Pandas, for data manipulation
      • Bcolz, for converting the data files into a compressed, columnar format
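The conversion of raw files into a compressed, columnar format can be sketched with the standard library alone. Note the substitution: the actual system uses Bcolz (with Blosc compression), whereas this example uses `zlib` and JSON purely to show the row-to-column pivot and per-column compression. All function names and the sample data are hypothetical.

```python
import csv
import io
import json
import zlib


def csv_to_columns(text):
    """Pivot CSV text from rows into a dict of column name -> list of values."""
    reader = csv.DictReader(io.StringIO(text))
    columns = {name: [] for name in reader.fieldnames}
    for row in reader:
        for name, value in row.items():
            columns[name].append(value)
    return columns


def compress_columns(columns):
    """Compress each column on its own, so a query can read only what it needs."""
    return {
        name: zlib.compress(json.dumps(values).encode("utf-8"))
        for name, values in columns.items()
    }


def read_column(compressed, name):
    """Decompress and decode a single column, leaving the others untouched."""
    return json.loads(zlib.decompress(compressed[name]).decode("utf-8"))


raw = "store_id,revenue\n1,10.0\n2,5.0\n1,2.5\n"
columns = csv_to_columns(raw)
stored = compress_columns(columns)
```

Storing each column separately is what makes aggregation cheap: summing `revenue` only requires decompressing that one column via `read_column(stored, "revenue")`.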
  • 8. A) Web Client
    • The web client is a full, web-based HTML5 client that gives access to all:
      • Reporting
      • Analytics
      • File import
      • User and security management
      • Server management
    • The files are served by the web server as static content, and all calls go through the standard API
    • Dependencies:
      • jQuery, for cross-browser JavaScript simplification and UI
      • Bootstrap, for layout
      • D3.js, a library for visualizations
  • 9. B) File Backend
    • The file backend contains all raw files and the processed (compressed, columnar) files
    • DQE instances automatically retrieve their assigned files from the backend when a file has been updated
    • Dependencies:
      • AWS S3, for saving files
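One way a DQE instance could decide which of its assigned files to re-download is to compare a version token per file. The slides do not say how an update is detected, so the version tokens, the file names and `plan_sync` below are assumptions, and no actual S3 calls are made.

```python
def plan_sync(assigned, local_versions, backend_versions):
    """Return the assigned files whose backend version differs from the local copy."""
    to_fetch = []
    for name in assigned:
        if name not in backend_versions:
            continue  # nothing has been published for this file yet
        if local_versions.get(name) != backend_versions[name]:
            to_fetch.append(name)
    return to_fetch


# Hypothetical state: the backend holds three files, this instance is assigned two.
backend = {"sales.bcolz": "v3", "stores.bcolz": "v1", "products.bcolz": "v2"}
local = {"sales.bcolz": "v2", "stores.bcolz": "v1"}
assigned = ["sales.bcolz", "stores.bcolz"]

pending = plan_sync(assigned, local, backend)
```

Files that are not assigned to the instance are ignored even when they changed on the backend, which matches the description of instances retrieving only their assigned files.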
  • 10. Architecture Comparison (columns: Hadoop, Cassandra, Best in Class, visualfabriq, Difference)
    • Data
      • Hadoop: Non-structured & structured
      • Cassandra: Structured, wide-column
      • Best in Class: Teradata (structured, columnar)
      • visualfabriq: Structured, columnar, compressed
      • Difference: Optimized for numerical data (meaning no text analytics etc.)
    • Architecture
      • Hadoop: Rack-aware, daemon-based cluster
      • Cassandra: Peer-to-peer cluster
      • visualfabriq: Horizontally scaling, container-based microservices communicating through RabbitMQ queues
      • Difference: Easier to monitor & scale
    • Setup
      • Hadoop: Complex
      • Cassandra: Complex
      • visualfabriq: Up & running in one minute
      • Difference: Much, much easier to set up and roll out
    • Cluster Maintenance
      • Hadoop: Node creation and assignment usually through commercial cluster management software
      • Cassandra: Peer-to-peer network; auto-configures
      • visualfabriq: Self-registering nodes that can be assigned specific tasks and data in a web interface
    • ETL
      • Hadoop: Flume, Sqoop
      • Cassandra: Bulk Loader
      • Best in Class: Informatica, Talend
      • visualfabriq: Web based, drag & drop with wizards
      • Difference: Web based, easy to use
    • Language (query)
      • Hadoop: Map/Reduce; add-ons for SQL (Pig, Hive, Impala, etc.)
      • Cassandra: CQL
      • Best in Class: SQL
      • visualfabriq: MOLAP-like; SQL interface to be built
      • Difference: SQL is the standard, but because of the built-in reporting and analytics this is not something users will need
    • Compression
      • Hadoop: No
      • Cassandra: No
      • Best in Class: MongoDB/WiredTiger
      • visualfabriq: Blosc-based
      • Difference: Saves on average 20x in disk space while speeding up reads
    • Performance
      • Hadoop: Slow, batch based; Spark can add in-memory capability (speeds up 100x)
      • Cassandra: High, in-memory options
      • visualfabriq: High; disk-based with compression, delivering within a 2-3x range of in-memory
      • Difference: Out-of-the-box near in-memory performance with file-based scaling; with advances in CPU speed, this might even surpass traditional in-memory performance
    • Interface
      • Hadoop: RESTful API
      • Cassandra: RESTful API
      • Best in Class: RESTful API
      • visualfabriq: RESTful API
    • Reporting
      • Hadoop: Only in external tools (that connect to the SQL connector)
      • Cassandra: Only in external tools (that connect to 3rd-party connectors)
      • Best in Class: Tableau (HTML5, interactive, beautiful)
      • visualfabriq: Built-in HTML5, interactive, extensible (D3.js based)
      • Difference: Only solution with out-of-the-box reporting with an easy-to-use, modern web-based interface
    • Analytics
      • Hadoop: Distributed map/reduce analytics through Mahout
      • Cassandra: Only as an optional, paid-for module
      • Best in Class: SAS, SPSS
      • visualfabriq: Built-in HTML5, interactive environment that incorporates leading open-source machine learning (scikit-learn), statistics (statsmodels) and proprietary (POS-analytics) functionality; NB: the analytics load is not fully distributed yet
      • Difference: Only solution with out-of-the-box analytics with an easy-to-use, modern web-based interface
    • Security
      • Hadoop: Kerberos-based security
      • Cassandra: Data object security
      • visualfabriq: General, role-based security
      • Difference: One point to manage all security, from data access to functionality (reporting, accessibility, etc.)
    • Open source
      • Hadoop: Core is open source; several performance acceleration & management tools are paid
      • Cassandra: Core is open source; analytics, backup and other options are paid
      • visualfabriq: Core is open source; large cluster management tools and vertical-specific analytics options are paid
    • Language (implementation)
      • Hadoop: Java
      • Cassandra: Java
      • visualfabriq: Python (and Cython & C)