ALTIC Big Data Stack
Charly Clairmont, ALTIC
@egwada
charly.clairmont@altic.org
http://www.altic.org
smart #OpenSource Software #BusinessIntelligence assembler

www.ow2.org

Twitter #ow2con @egwada
Our historical tools

• ETL : Talend
• Reporting : JasperReports, Birt
• OLAP : Mondrian, Palo
• BI platform : SpagoBI

Smart assembling

Innovation & customers' needs
● Identify when applied research is an opportunity for us, our solutions and our customers.
● Understand our customers' business processes & assess the impact of Open IT on their activities
● Offer a project approach that is both technical and operational

Altic projects
➔ Allow our customers to optimize their business processes
➔ Take the customer's line of business into account
➔ Offer long-lasting solutions
➔ Follow the customer's present needs and not the software vendors' agenda

Identify Big Data potential / Hadoop

Our first Big Data project at Altic
● eFraudBox project (2010 – 2013)
● Goal : predict fraud on the Internet
● Context :
  – Customer : GIE carte bancaire
  – European Research and Development project
  – Many industrial and academic partners
● Data :
  – Type : Banking transactions
  – Volume : One GB per day

How did we start our first BigData project ?

« In data mining, processing is done line by line »
… [ so this is not about a data volume issue ]

But we have too much data !

Let's have a look at Hadoop
● Open Source
● MPP compute platform
  ● Distributed file system
  ● MapReduce processing
● Cost efficient
● Fault tolerant
● Infinite scale
● Enterprise Information System ready
● Continuous Improvement
● Growing community

« Even transactions are possible on Hadoop - it's inevitable that ALL kinds of workloads will move there in the future »
Doug CUTTING, Hadoop Creator, October 2013
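
As a rough illustration of the MapReduce processing model named on this slide, here is a minimal single-process sketch in plain Python. It is not Hadoop code: the sample transactions, the field layout and the functions `map_phase`, `shuffle`, `reduce_phase` and `total_per_card` are hypothetical, and they only mimic the map, shuffle and reduce steps that Hadoop would distribute across a cluster.

```python
from collections import defaultdict

# Hypothetical sample of banking transactions: (card_id, amount_in_cents).
TRANSACTIONS = [
    ("card-A", 1200),
    ("card-B", 550),
    ("card-A", 300),
    ("card-C", 9900),
    ("card-B", 450),
]


def map_phase(records):
    """Map step: emit one (key, value) pair per input record."""
    for card_id, amount in records:
        yield card_id, amount


def shuffle(pairs):
    """Shuffle step: group all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce step: collapse each group of values into a single total."""
    return {key: sum(values) for key, values in groups.items()}


def total_per_card(records):
    """Run the three steps in sequence, as a single-process stand-in."""
    return reduce_phase(shuffle(map_phase(records)))


if __name__ == "__main__":
    for card_id, total in sorted(total_per_card(TRANSACTIONS).items()):
        print(card_id, total)
```

For the sample above, `total_per_card(TRANSACTIONS)` gives 1500 for card-A, 1000 for card-B and 9900 for card-C. On a real cluster the map and reduce steps would run in parallel on separate nodes, which this sketch does not attempt to show.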

How do we query Hadoop ?

Java
● Very optimised
● Very customisable

Pig Latin
● Easy syntax
● Supports unstructured data

SQL like
● Easy development
How do we query Hadoop ?

● Java : need to code everything
● Pig Latin : why not ?
● SQL like : we already know SQL !
Ok, we have our storage and
computation engine, but how can we
manage data ?
By using our Swiss Army Knife !

Now our Hadoop / Hive platform is filled
with Big Data,
but it's a little too slow for end users
to query...

Aggregate data
Process the data with Hive and store the results in
fast databases
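
The idea on this slide (compute the aggregates once in a batch step, then serve them from a fast store) can be sketched in plain Python. This is only an illustration under assumptions: in-memory `sqlite3` databases stand in for both the Hive platform and the fast database, and the table names `transactions` and `daily_totals`, their columns and the sample rows are all hypothetical.

```python
import sqlite3

# Hypothetical raw rows: (day, merchant, amount_in_cents).
RAW_ROWS = [
    ("2013-10-01", "shop-1", 1200),
    ("2013-10-01", "shop-1", 800),
    ("2013-10-01", "shop-2", 500),
    ("2013-10-02", "shop-1", 300),
]


def build_raw_store(rows):
    """Stand-in for the large, slow-to-query platform holding raw data."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE transactions (day TEXT, merchant TEXT, amount INTEGER)"
    )
    conn.executemany("INSERT INTO transactions VALUES (?, ?, ?)", rows)
    return conn


def aggregate(raw_conn):
    """Batch step: a single SQL-like GROUP BY over the raw data."""
    query = (
        "SELECT day, merchant, COUNT(*), SUM(amount) "
        "FROM transactions GROUP BY day, merchant "
        "ORDER BY day, merchant"
    )
    return raw_conn.execute(query).fetchall()


def load_fast_store(aggregates):
    """Store the small aggregated result where lookups are cheap."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE daily_totals ("
        "day TEXT, merchant TEXT, n INTEGER, total INTEGER, "
        "PRIMARY KEY (day, merchant))"
    )
    conn.executemany("INSERT INTO daily_totals VALUES (?, ?, ?, ?)", aggregates)
    return conn


def lookup(fast_conn, day, merchant):
    """End-user query: a keyed lookup instead of a scan of the raw data."""
    return fast_conn.execute(
        "SELECT n, total FROM daily_totals WHERE day = ? AND merchant = ?",
        (day, merchant),
    ).fetchone()


if __name__ == "__main__":
    fast = load_fast_store(aggregate(build_raw_store(RAW_ROWS)))
    print(lookup(fast, "2013-10-01", "shop-1"))
```

The design point is that end users only ever touch the small pre-aggregated table, so their queries no longer depend on the volume of raw data.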

Ok, now we have our fast queryable
datasets, but how can we visualize them ?
To manage users and visualizations

To quickly get an overview of your data

To go deeper into your visualizations

BigData and Datamining : tMahout

BigData and Datamining v2
● Spark : new in-memory data processing framework
  ● Very appropriate for machine learning
● MLBase : Machine learning library
● Spark-clustering : Implementation of the SOM (self-organizing map) algorithm
● Proof of concept : Analysis of mobile telecommunications
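
For readers unfamiliar with the SOM (self-organizing map) algorithm mentioned on this slide, here is a minimal single-machine sketch in plain Python. It is not the Spark-clustering implementation: the function names, the 1-D layout of the map, the decay schedules for the learning rate and radius, and the toy data are all assumptions chosen for brevity.

```python
import math
import random

# Hypothetical toy data: two well-separated groups of 2-D points.
DATA = [
    (0.0, 0.1), (0.2, 0.0), (0.1, 0.3),
    (10.0, 9.8), (9.7, 10.1), (10.2, 10.0),
]


def best_matching_unit(weights, x):
    """Return the index of the unit whose weight vector is closest to x."""
    def dist2(w):
        return sum((wi - xi) ** 2 for wi, xi in zip(w, x))
    return min(range(len(weights)), key=lambda i: dist2(weights[i]))


def train_som(data, n_units=4, epochs=50, lr0=0.5, radius0=2.0, seed=0):
    """Train a 1-D self-organizing map and return its weight vectors."""
    rng = random.Random(seed)
    dim = len(data[0])
    # Initialise each unit from a randomly chosen data point.
    weights = [list(rng.choice(data)) for _ in range(n_units)]
    for epoch in range(epochs):
        # Learning rate and neighbourhood radius both shrink over time.
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)
        radius = max(radius0 * (1.0 - frac), 0.5)
        samples = list(data)
        rng.shuffle(samples)
        for x in samples:
            bmu = best_matching_unit(weights, x)
            for i, w in enumerate(weights):
                # Gaussian neighbourhood measured on the 1-D grid of units:
                # units near the winner move more than distant ones.
                influence = math.exp(-((i - bmu) ** 2) / (2.0 * radius ** 2))
                for d in range(dim):
                    w[d] += lr * influence * (x[d] - w[d])
    return weights


if __name__ == "__main__":
    som = train_som(DATA)
    for point in DATA:
        print(point, "-> unit", best_matching_unit(som, point))
```

Each input is assigned to its closest unit, and that unit and its grid neighbours are pulled toward the input; this neighbourhood update is what distinguishes a SOM from plain nearest-centroid clustering.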

We now have a Big Data stack !

BI & Big Data for Altic
● In the end, we still do BI as usual
● Tools evolve :
  – New storage and processing
  – We do not change our tools; fortunately THEY progress for us, and we contribute
● The fundamentals do not really change, only the technologies do
  – Hadoop
  – Spark
We keep improving our Big Data stack and our
approach...
and we support customers' Big Analytics projects

Our Big Data Stack

Our Big Data Approach

Questions ?
Thanks !

Charly CLAIRMONT
CTO at ALTIC
@egwada
charly.clairmont@altic.org
http://altic.org

Altic's big analytics stack, Charly Clairmont, Altic.
