ALTIC Big Data Stack
Charly Clairmont, ALTIC
@egwada
charly.clairmont@altic.org
http://www.altic.org

Altic's big analytics stack, Charly Clairmont, Altic.

Altic has long been an active member of the OW2 BI Initiative. For several years Altic, like many other actors in the OW2 consortium, has taken a deep interest in Big Data technologies. Some of them have even added Big Data features to their offerings. Altic and its partners, Talend and Engineering Informatica (SpagoBI), have decided to create a Big Data stack using their own solutions. The key point: Altic has not changed the way it runs its projects; it has only learned how to store and process Big Data. In this presentation we introduce our Big Data stack:
* Hadoop and Spark to store and compute data
* Talend DI to create Big Data tasks that are published and scheduled in SpagoBI
* SpagoBI to manage security and give end users access to data visualizations

For a more user-friendly Big Data stack we provide new components:
* tMahout, a Talend component that lets us create data mining jobs inside Hadoop without writing code in the MapReduce paradigm
* SpagoBID3Engine, which makes it easy to manipulate data and develop beautiful data visualizations
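
The description mentions avoiding hand-written code in the MapReduce paradigm. For readers unfamiliar with that paradigm, here is a minimal sketch in plain Python of its three phases (map, shuffle, reduce). It is an illustration only, not Hadoop or tMahout code: the sample records and the function names `map_phase`, `shuffle`, `reduce_phase` and `run_job` are hypothetical.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical sample records (card_id, amount); not data from any Altic project.
TRANSACTIONS = [
    ("card-A", 20.0),
    ("card-B", 5.5),
    ("card-A", 30.0),
    ("card-C", 12.0),
    ("card-B", 4.5),
]


def map_phase(record):
    """Emit (key, value) pairs for a single input record."""
    card_id, amount = record
    yield card_id, amount


def shuffle(pairs):
    """Group emitted values by key, as a framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Collapse all values of one key into a single result."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle and reduce in sequence on an in-memory list."""
    mapped = chain.from_iterable(map_phase(r) for r in records)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


totals = run_job(TRANSACTIONS)
```

On a real cluster the map and reduce functions run in parallel on many machines and the shuffle moves data across the network; the sketch only shows the shape of the code a developer would otherwise have to write by hand.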

Published in: Technology, Education

Altic's big analytics stack, Charly Clairmont, Altic.

  1. ALTIC Big Data Stack. Charly Clairmont, ALTIC, @egwada, charly.clairmont@altic.org, http://www.altic.org
  2. Smart #OpenSource Software #BusinessIntelligence assembler. www.ow2.org, Twitter #ow2con @egwada
  3. Our historical tools
     • ETL: Talend
     • Reporting: JasperReports, Birt
     • OLAP: Mondrian, Palo
     • BI platform: SpagoBI
  4. Smart assembling: innovation & customers' needs
     ● Identify when applied research is an opportunity for us, our solutions and our customers.
     ● Understand our customers' business processes & assess the impact of Open IT on their activities
     ● Offer a project approach that is both technical and operational
     Altic projects:
     ➔ Allow our customers to optimize their business processes
     ➔ Take the customer's line of business into account
     ➔ Offer long-lasting solutions
     ➔ Follow the customer's present needs and not the software vendors' agenda
  5. Identify Big Data potential / Hadoop
  6. Our first Big Data project at Altic
     ● eFraudBox project (2010 – 2013)
     ● Goal: predict fraud on the Internet
     ● Context:
       – Customer: GIE carte bancaire
       – European research and development project
       – Many industrial and academic partners
     ● Data:
       – Type: banking transactions
       – Volume: one GB per day
  7. How did we start our first Big Data project?
  8. « In data mining, processing is done line by line » … [ it is not a data volume issue ]
  9. But we have too much data!
  10. Let's have a look at Hadoop
     ● Open Source
     ● MPP compute platform
     ● Distributed file system
     ● MapReduce processing
     ● Cost efficient
     ● Fault tolerant
     ● Infinite scale
     ● Enterprise Information System ready
     ● Continuous improvement
     ● Growing community
     « Even transactions are possible on Hadoop - it's inevitable that ALL kinds of workloads will move there in the future » Doug Cutting, Hadoop creator, October 2013
  11. How do we query Hadoop?
     ● Java: very optimised; very customisable
     ● Pig Latin: easy syntax; supports unstructured data
     ● Hive: SQL-like; easy development
  12. How do we query Hadoop?
     ● Need to code everything
     ● Why not?
     ● We already know SQL!
  13. OK, we have our storage and computation engine, but how can we manage data? By using our Swiss Army knife!
  14. Now our Hadoop / Hive platform is filled with Big Data, but it is a little too slow for end users to query... http://ih2.redbubble.net/image.13088996.5766/sticker,375x360.png
  15. Aggregate data: process the data with Hive and store the results in fast databases
  16. OK, now we have our fast, queryable datasets, but how can we visualize them?
     ● To manage users and visualizations
     ● To get a quick view of your data
     ● To go deeper into your visualizations
  17. BigData and Datamining: tMahout. + + = tMahout
  18. BigData and Datamining v2
     ● Spark: new in-memory data processing framework
     ● Well suited to machine learning
     ● MLBase: machine learning library
     ● Spark-clustering: implementation of the SOM algorithm
     ● Proof of concept: analysis of mobile telecommunications
  19. We now have a Big Data stack!
  20. BI & Big Data for Altic
     ● In the end, we still do BI as usual
     ● Tools evolve:
       – New storage and processing
       – We do not change our tools; fortunately THEY progress for us, and we contribute
     ● The fundamentals do not really change, only the technologies do:
       – Hadoop
       – Spark
  21. We are improving our Big Data stack and our approach... and we support customers' Big Analytics projects
     ● Our Big Data Stack
     ● Our Big Data Approach
  22. Questions? Thanks! Charly CLAIRMONT, CTO at ALTIC, @egwada, charly.clairmont@altic.org, http://altic.org
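
Slide 15 describes aggregating data with Hive and storing the results in fast databases so that end users get quick answers. A minimal sketch of that pattern follows, under loud assumptions: Python's built-in `sqlite3` stands in for both the Hive table and the fast database, and the schema, rows, table names and the helper `total_for` are all hypothetical. The `GROUP BY` query uses only basic SQL of the kind HiveQL also supports, but it has not been run against Hive here.

```python
import sqlite3

# Stand-in for the raw table stored in Hadoop / Hive (hypothetical schema and rows).
raw = sqlite3.connect(":memory:")
raw.execute("CREATE TABLE transactions (day TEXT, merchant TEXT, amount REAL)")
raw.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?)",
    [
        ("2013-11-01", "shop-1", 10.0),
        ("2013-11-01", "shop-1", 15.0),
        ("2013-11-01", "shop-2", 7.0),
        ("2013-11-02", "shop-1", 3.0),
    ],
)

# Aggregation step: the heavy GROUP BY runs once, on the big store.
AGGREGATE_SQL = """
    SELECT day, merchant, COUNT(*) AS nb_transactions, SUM(amount) AS total_amount
    FROM transactions
    GROUP BY day, merchant
    ORDER BY day, merchant
"""
summary_rows = raw.execute(AGGREGATE_SQL).fetchall()

# Stand-in for the "fast database" that end users actually query.
fast = sqlite3.connect(":memory:")
fast.execute(
    "CREATE TABLE daily_summary ("
    "day TEXT, merchant TEXT, nb_transactions INTEGER, total_amount REAL)"
)
fast.executemany("INSERT INTO daily_summary VALUES (?, ?, ?, ?)", summary_rows)
fast.commit()


def total_for(day, merchant):
    """Look up a pre-computed total; returns None when no summary row exists."""
    row = fast.execute(
        "SELECT total_amount FROM daily_summary WHERE day = ? AND merchant = ?",
        (day, merchant),
    ).fetchone()
    return row[0] if row else None
```

The design choice is to pay the aggregation cost once in batch, so that interactive queries only touch the small summary table.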
