BigData and Machine Learning: Usage and Opportunities for your IT department
Talk presented at The Developer Conference in São Paulo - 12/07/13
Mathieu DESPRIEE
As we move into a new era of ITSM computing, new big data and machine learning tools and methodologies are being developed to support IT staff by intelligently extracting insights and making predictions from the enormous amounts of data the organization accumulates. According to Gartner, I&O leaders must take a comprehensive approach to incorporating advanced big data and machine learning technologies into their organizations or risk becoming irrelevant. But what exactly are big data and machine learning all about? How can you introduce these concepts into your existing Service Desk?
Join USF’s distinguished Computer Science and Engineering Professor Lawrence Hall and SunView Software’s VP of Marketing and Product Strategy John Prestridge as they break down the fundamentals of big data and machine learning and provide real-world examples of the impact the technologies will have on ITSM.
Machine learning in the age of big data: new approaches and business applicat... (Armando Vieira)
Presentation at the University of Lisbon on machine learning and big data.
Deep learning algorithms and their application to credit risk analysis, churn detection and recommendation.
What Is Data Science? | Introduction to Data Science | Data Science For Begin... (Simplilearn)
This Data Science presentation will help you understand what Data Science is, why we need it, the prerequisites for learning it, what a Data Scientist does, the Data Science lifecycle (illustrated with an example), and career opportunities in the Data Science domain. You will also learn the differences between Data Science and Business Intelligence. Data scientist has been called one of the sexiest jobs of the century. The demand for data scientists is high, and the number of opportunities for certified data scientists is increasing. Companies are looking for more and more skilled data scientists every day, and studies project a continued shortfall of qualified candidates to fill these roles. So let us dive into Data Science and understand what it is all about.
This Data Science Presentation will cover the following topics:
1. Need for Data Science?
2. What is Data Science?
3. Data Science vs Business intelligence
4. Prerequisites for learning Data Science
5. What does a Data scientist do?
6. Data Science life cycle with use case
7. Demand for Data scientists
This Data Science with Python course will establish your mastery of data science and analytics techniques using Python. With this Python for Data Science Course, you’ll learn the essential concepts of Python programming and become an expert in data analytics, machine learning, data visualization, web scraping and natural language processing. Python is a required skill for many data science positions, so jumpstart your career with this interactive, hands-on course.
Why learn Data Science?
Data scientists are being deployed in all kinds of industries, creating huge demand for skilled professionals. Data scientist is the top role in an analytics organization. Glassdoor ranked data scientist first in its 25 Best Jobs for 2016, and good data scientists are scarce and in great demand. As a data scientist, you will be required to understand the business problem, design the analysis, collect and format the required data, apply algorithms or techniques using the correct tools, and finally make recommendations backed by data.
The Data Science with Python course is recommended for:
1. Analytics professionals who want to work with Python
2. Software professionals looking to get into the field of analytics
3. IT professionals interested in pursuing a career in analytics
4. Graduates looking to build a career in analytics and data science
5. Experienced professionals who would like to harness data science in their fields
A presentation delivered by Mohammed Barakat at the 2nd Jordanian Continuous Improvement Open Day in Amman. The presentation is about Data Science and was delivered on 3rd October 2015.
Deep Learning Use Cases - Data Science Pop-up Seattle (Domino Data Lab)
Companies like Google, Microsoft, Amazon and Facebook are in fierce competition for teams that can build deep-learning applications. Because of deep learning's general usefulness in pattern recognition, those applications are surprisingly diverse, ranging from image recognition to machine translation. This talk will explore deep learning use cases for the major data types -- image, sound, text and time series -- as they're emerging in the private sector. Presented by Chris Nicholson, Co-Founder and CEO at Skymind.
An introduction to data science: from the origins of the idea to the latest designs, changing trends and technologies, through to the applications already in real-world use today.
Introduction to Data Science and Analytics (Srinath Perera)
This webinar serves as an introduction to the WSO2 Summer School. It will discuss how to build a pipeline for your organization and for each use case, and the technology and tooling choices this requires.
This session will explore analytics under four themes:
Hindsight (what happened)
Oversight (what is happening)
Insight (why is it happening)
Foresight (what will happen)
Recording http://t.co/WcMFEAJHok
How to Become a Data Scientist
SF Data Science Meetup, June 30, 2014
Video of this talk is available here: https://www.youtube.com/watch?v=c52IOlnPw08
More information at: http://www.zipfianacademy.com
Zipfian Academy @ Crowdflower
Big Data and Data Science: The Technologies Shaping Our Lives (Rukshan Batuwita)
Big Data and Data Science have become increasingly important areas in both industry and academia, to the extent that every company wants to hire a Data Scientist and every university wants to start dedicated degree programs and centres of excellence in Data Science. Big Data and Data Science have led to technologies that have already shaped many aspects of our lives, such as learning, working, travelling, purchasing, social relationships, entertainment, physical activity and medical treatment. This talk will attempt to cover the landscape of some of the important topics in these rapidly growing areas, including state-of-the-art processes, commercial and open-source platforms, data processing and analytics algorithms (especially large-scale Machine Learning), application areas in academia and industry, industry best practices, business challenges, and what it takes to become a Data Scientist.
Machine learning with Big Data PowerPoint presentation (David Raj Kanthi)
This presentation is based on IEEE articles published in 2017. Its slides, titled "Machine Learning with Big Data", cover the challenges and approaches of machine learning with big data: the challenges that big data raises when it is integrated with machine learning, and the approaches machine learning takes to address them.
A presentation to the Research Vessel Users Workshop at the Marine Institute, Ireland, on 28th April 2016, highlighting recent progress and future directions in managing data from the fleet.
INNAV - VTMIS
INNAV (Information Navigation System) and VTMIS (Vessel Traffic Management Information System). This is a significant technological advance for our ports, as VTMIS equips the main and busiest terminals in the world.
The VTMIS is an electronic aid-to-navigation system able to actively monitor maritime traffic.
The Port of Vitória was the first in Brazil to contract the system, through the Companhia Docas do Espírito Santo (Codesa).
Marine salvage and the protection of the marine environment (Tiago Zanella)
Marine salvage is one of the most important institutions of Maritime Law. Arising from customary law, it came to be regulated by statute and currently has specific rules. One of the principal rules in marine salvage is the 'no cure no pay' principle, under which the salvor is paid for the salvage work only if a useful result is achieved. 'No cure no pay' was always an absolute principle in marine salvage; however, it is now gradually being relaxed because of the need to protect the marine environment. Thus, even without a useful result, the salvor can be paid for their services if they protected the marine environment. This article examines this development of the 'no cure no pay' principle and how maritime law applies it.
SQLearn’s vLMS is a web-based e-learning system designed specifically for the shipping industry. vLMS can be used from any modern PC and helps crew and officers access training resources and assessments: https://www.sqlearn.com
Rolta Vessel Traffic Management System (VTMS) contributes significantly to safer routing, efficient traffic flow and protection of the environment in confined and busy waterways through active monitoring, information services, traffic management and navigational assistance to vessels. To achieve these goals, Rolta offers accurate components such as radar, day/night vision tracking systems, AIS, DF, ENC, hydrological and meteorological sensors and more. Thanks to these components, VTMS supports fast and efficient handling of incidents and crises.
Best Practices for Big Data Analytics with Machine Learning by Datameer
Don't forget! You can watch the full Datameer recording here:
http://info.datameer.com/Online-Slideshare-Big-Data-Analytics-Machine-Learning-OnDemand.html
Learn, through industry use cases, how to empower users to identify patterns and relationships for recommendations using big data analytics.
Machine Learning: Need of Machine Learning, Its Challenges and its Applications (Arpana Awasthi)
The BCA Department of JIMS Vasant Kunj-II is one of the best BCA colleges in Delhi NCR. The curriculum is up to date and includes the latest in-demand technologies.
The JIMS BCA course teaches Python to second-semester students and Artificial Intelligence Using Python to sixth-semester students.
Here is a small article on the Future of Machine Learning, hope you will find it useful.
Machine Learning is a field of computer science in which computer systems learn from past experience, examples and their environment. Machine learning algorithms give computers the ability to make sense of data and produce relevant results.
Machine learning algorithms provide techniques for predicting future outcomes or classifying the input given to a machine, so that appropriate decisions can be made.
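To make "learning from examples" concrete, here is a minimal sketch of one of the simplest classifiers, a nearest-centroid classifier, in Python. Training averages the labelled examples of each class; prediction assigns a new input to the class with the closest average. The function names, features and labels are hypothetical illustrations, not taken from the article, and `math.dist` assumes Python 3.8 or later.

```python
import math

def fit_centroids(samples, labels):
    """Training step: average the feature vectors of each class."""
    sums, counts = {}, {}
    for features, label in zip(samples, labels):
        if label not in sums:
            sums[label] = [0.0] * len(features)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], features)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}

def predict(centroids, features):
    """Prediction step: pick the class whose centroid is nearest (Euclidean distance)."""
    return min(centroids, key=lambda label: math.dist(centroids[label], features))

# Hypothetical two-feature examples labelled "low" or "high".
train_x = [[1.0, 1.2], [0.8, 1.0], [5.0, 5.5], [5.2, 4.8]]
train_y = ["low", "low", "high", "high"]
model = fit_centroids(train_x, train_y)
print(predict(model, [4.9, 5.1]))  # high
```

Real systems use richer models, but the split into a training step that summarizes past examples and a prediction step that applies that summary to new input is the same.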
These slides were presented by Imron Zuhri at the Seminar & Workshop on the Introduction and Potential of Big Data & Machine Learning, organized by KUDO on 14 May 2016.
Slides for a talk by @Chris_Betz (https://data42.de) on artificial intelligence (AI) and machine learning: what is driving the AI hype and what is behind it. Understand the general concepts and dig deep into explainability, debugging, verification and testing of machine learning solutions.
This presentation gives an overview of machine learning, discusses its potential in an industrial setting and presents two industrial applications (energy load forecasting, understanding academic publications) based on the speaker's experience.
For the full video of this presentation, please visit:
https://www.edge-ai-vision.com/2021/02/introducing-machine-learning-and-how-to-teach-machines-to-see-a-presentation-from-tryolabs/
Facundo Parodi, Research and Machine Learning Engineer at Tryolabs, presents the “Introduction to Machine Learning and How to Teach Machines to See” tutorial at the September 2020 Embedded Vision Summit.
What is machine learning? How can machines distinguish a cat from a dog in an image? What’s the magic behind convolutional neural networks? These are some of the questions Parodi answers in this introductory talk on machine learning in computer vision.
Parodi introduces machine learning and explores the different types of problems it can solve. He explains the main components of practical machine learning, from data gathering and training to deployment. He then focuses on deep learning as an important machine learning technique and provides an introduction to convolutional neural networks and how they can be used to solve image classification problems. Parodi also touches on recent advancements in deep learning and how they have revolutionized the entire field of computer vision.
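The core operation of a convolutional layer is small enough to sketch directly. The following pure-Python example is an illustration, not code from the talk. It slides a 2x2 kernel over a tiny image with no padding and stride 1, then applies a ReLU. The kernel is hand-written here to respond to vertical edges; in a trained CNN such weights are learned from data. Like most deep-learning frameworks, it computes cross-correlation, so the kernel is not flipped.

```python
def conv2d(image, kernel):
    """'Valid' 2-D cross-correlation (no padding, stride 1), as in a CNN layer."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

def relu(feature_map):
    """Element-wise ReLU: keep positive responses, zero out the rest."""
    return [[max(0, v) for v in row] for row in feature_map]

# A 4x4 grayscale image: dark on the left, bright on the right.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hand-written vertical-edge detector (learned, not hand-written, in a real CNN).
kernel = [
    [-1, 1],
    [-1, 1],
]
print(relu(conv2d(image, kernel)))  # [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```

The feature map lights up only in the middle column, where the dark-to-bright edge sits. Stacking many such learned filters, with pooling and a final classifier, is what lets a CNN tell a cat from a dog.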
Machine Learning with Azure and Databricks Virtual Workshop (CCG)
Join CCG and Microsoft for a hands-on demonstration of Azure’s machine learning capabilities. During the workshop, we will:
- Hold a Machine Learning 101 session to explain what machine learning is and how it fits in the analytics landscape
- Demonstrate Azure Databricks’ capabilities for building custom machine learning models
- Take a tour of Azure Machine Learning’s capabilities for MLOps, Automated Machine Learning, and code-free Machine Learning
By the end of the workshop, you’ll have the tools you need to begin your own journey to AI.
Machine learning for sensor Data Analytics (MATLABISRAEL)
In this presentation we will show how to do Machine Learning in the MATLAB environment. We will present several built-in capabilities and apps that make the machine learning process more efficient and faster, such as the Classification Learner, the Regression Learner and Bayesian Optimization. Using data from smartphone sensors, we will build a classification system that identifies the activity the user is performing: walking, climbing stairs, lying down, and so on.
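The talk itself uses MATLAB apps. As a language-neutral illustration of the first step of such an activity-recognition pipeline, here is a minimal Python sketch that cuts a stream of 3-axis accelerometer readings into fixed-size windows and computes two simple features per window: the mean and the standard deviation of the acceleration magnitude. The window size, the choice of features and the sample values are assumptions made for illustration; a classifier would then be trained on these feature vectors.

```python
import math
import statistics

def magnitude(sample):
    """Length of one (x, y, z) accelerometer reading, in the sensor's units (e.g. m/s^2)."""
    x, y, z = sample
    return math.sqrt(x * x + y * y + z * z)

def window_features(samples, window, step):
    """Return (mean, population stdev) of the magnitude for each window of readings."""
    features = []
    for start in range(0, len(samples) - window + 1, step):
        mags = [magnitude(s) for s in samples[start:start + window]]
        features.append((statistics.mean(mags), statistics.pstdev(mags)))
    return features

# Hypothetical readings: lying still (steady gravity), then walking (oscillating magnitude).
lying = [(0.0, 0.0, 9.8)] * 4
walking = [(0.0, 0.0, 8.0), (0.0, 0.0, 12.0)] * 2
feats = window_features(lying + walking, window=4, step=4)
for mean, std in feats:
    print(f"mean={mean:.2f} std={std:.2f}")
```

Lying still gives a magnitude close to gravity with almost no spread, while walking produces a clearly larger spread. That kind of separation in feature space is what a classifier learns to exploit.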
Le Comptoir OCTO - What life cycle analysis brings to an audit of eco-conc... (OCTO Technology)
by Nicolas Bordier (Responsible Digital Consultant @OCTO Technology) and Alaric Rougnon-Glasson (Sustainable Tech Consultant @OCTO Technology)
Using a very concrete example, the eco-design audit of the carbon accounting tool C’Bilan developed by ICDC (a subsidiary of the Caisse des dépôts et consignations), we will explain how life cycle analysis (LCA) was decisive in identifying the courses of action that reduced the service's environmental footprint by up to 82%.
Le Comptoir OCTO - Complying with the CSRD: an unsuspected lever for action (OCTO Technology)
Complying with the CSRD: an unsuspected lever for action
by Bintou Diarra (EPM Manager @OCTO Technology), Chloé Wibaux (Consulting & Strategy @Accenture) and Frédéric Lenci (Partner @OCTO Technology)
From 2024, more than 50,000 companies in Europe will report on their environmental and social impacts, as well as on the measures taken to improve them. To meet this obligation, they will have to comply with the Corporate Sustainability Reporting Directive (CSRD) by exploiting all their data, which is a major challenge. Thanks to our expertise in data, operational management and technology solutions, we are ready to help our clients meet these CSRD reporting challenges. At this Comptoir, we will present our approach to ESG data governance, our methods for steering actions, and the solutions for generating the report and supervising ESG initiatives operationally.
Le Comptoir OCTO - MLOps: MLOps patterns in the cloud (OCTO Technology)
How do you choose your MLOps architecture in the cloud?
by Baptiste Courbe (Senior Data Consultant & MLOps @OCTO Technology)
Choosing an architecture has major consequences for implementation, maintainability, evolvability, scaling and more. Drawing on our experience with the various cloud providers, come and discover the different levels of complexity of such architectures and the criteria for deciding between them.
Whether you are a beginner or an expert in MLOps, we will give you the keys to making the right technical choices. You will leave with an overview of the best practices and pitfalls to avoid when deploying your XGBoost or LLM Machine Learning applications in the cloud.
YouTube video: https://www.youtube.com/watch?v=j_5pI6iYRs4&list=PLBD8R108T9T4D3mcLiDpT67f9ERg1Hm2r&index=57
Write-up:
La Grosse Conf 2024 - Philippe Stepniewski - Workshop - Live coding of a base ... (OCTO Technology)
By Philippe Stepniewski - ML Engineer
What if we built a multimodal text-image search engine together? Imagine an engine that, from a simple text description of a product on an e-commerce site, instantly finds the matching images, without anyone first having to write descriptive texts for our products! Vector databases will be at the heart of this workshop. We could take an off-the-shelf solution, but where would the fun be in that? There is nothing like getting your hands into the code to understand how such concepts work, so let's implement one ourselves! Prerequisites: to attend this workshop and follow what is shown on screen, you should be comfortable reading Python code that manipulates vector data (NumPy style). Although we will start with a refresher, some basic data science knowledge will help you understand the concepts involved: CNNs, embeddings, and distance/similarity between vectors.
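In the same hands-on spirit, the heart of a vector search can be sketched in a few lines of Python. Every item is stored as an embedding vector, and a query is answered by ranking the stored vectors by cosine similarity. This sketch is an exhaustive (brute-force) search over made-up 3-dimensional vectors, not the workshop's code. It assumes that text and image embeddings live in the same space, as in CLIP-style models; the file names and vectors are hypothetical. Production vector databases add approximate nearest-neighbour indexes so they do not have to score every vector.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query, index, k=2):
    """Exhaustive nearest-neighbour search: score every stored vector, keep the k best."""
    scored = [(cosine_similarity(query, vec), item_id) for item_id, vec in index.items()]
    scored.sort(reverse=True)
    return [item_id for _, item_id in scored[:k]]

# Hypothetical image embeddings; a real system would compute them with a CNN or CLIP-style encoder.
image_index = {
    "red_shoe.jpg": [0.9, 0.1, 0.0],
    "blue_shoe.jpg": [0.7, 0.0, 0.7],
    "red_hat.jpg": [0.1, 0.9, 0.1],
}
# Hypothetical embedding of the text query "red shoe", assumed to share the image space.
text_query = [1.0, 0.0, 0.1]
print(top_k(text_query, image_index, k=2))  # ['red_shoe.jpg', 'blue_shoe.jpg']
```

Because only vectors are compared, the images never need hand-written descriptions. That is the property the workshop's text-to-image search relies on.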
La Grosse Conf 2024 - Philippe Prados - Workshop - RAG: beyond the demonst... (OCTO Technology)
By Philippe Prados - Computing pioneer
One of the most frequent uses of Large Language Models (LLMs) is answering questions from a document base: the famous Retrieval Augmented Generation (RAG). The demos produce a wow effect! The pain arrives when the solution is actually used: the model answers beside the point, ignores information present in the documents... How can we go further? How can we make the solution more robust? More reliable? To answer these questions, we will get our hands into the code and the architecture to apply classic computing concepts to RAG. Prerequisites: although there will be a refresher at the start, to attend this workshop and follow what is shown on screen, you should be comfortable reading Python code and with the basic principles of language models and vector databases.
Le Comptoir OCTO - Mastering RAG: connecting generative AI models ... (OCTO Technology)
Mastering RAG: connecting generative AI models to enterprise data
by Nicolas Cavallo (Head of Natural Language Processing @OCTO Technology)
Intelligent chatbots that answer customers directly, tasks made faster and simpler for employees through automated helpdesk assistance, and more. After several months of developing and implementing Retrieval Augmented Generation (RAG) projects, let's take stock of this leading generative AI use case.
We will explain in detail how RAG connects the power of generative AI to a company's information assets. We will look in particular at methods for evaluating and improving these systems. Drawing on our experience, we will detail strategies for integrating them into a sovereign environment.
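The retrieve-then-generate flow of RAG can be sketched minimally: find the passages most relevant to the question, then place them in the prompt sent to the model. In this hypothetical Python example, simple word overlap stands in for embedding-based vector search, the documents are invented, and the call to an actual LLM is deliberately left out. It illustrates the principle and is not OCTO Technology's implementation.

```python
import re

def tokenize(text):
    """Lower-cased set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))

def retrieve(question, documents, k=1):
    """Rank documents by word overlap with the question (a stand-in for vector search)."""
    q = tokenize(question)
    ranked = sorted(documents, key=lambda d: len(q & tokenize(d)), reverse=True)
    return ranked[:k]

def build_prompt(question, passages):
    """Augmentation step: ground the model in the retrieved passages only."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using only the context below. If the answer is not there, say so.\n"
        f"Context:\n{context}\n"
        f"Question: {question}"
    )

# Hypothetical helpdesk knowledge base.
docs = [
    "VPN access requires a ticket approved by the security team.",
    "The cafeteria opens at 8 am.",
    "Password resets are handled by the self-service portal.",
]
question = "How do I reset my password?"
prompt = build_prompt(question, retrieve(question, docs))
print(prompt)  # this prompt would then be sent to the LLM (generation step, omitted)
```

Retrieval quality decides what the model can see, so evaluating and improving a RAG system usually starts with checking whether the right passages were retrieved at all.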
YouTube video: https://www.youtube.com/watch?v=9tmlseutQM8
Write-up: https://blog.octo.com/maitriser-le-rag-retrieval-augmented-generation
OCTO Talks - AIs invite themselves to developers' bedside (OCTO Technology)
AIs invite themselves to developers' bedside: dream or nightmare?
by Alain Faure (Architect @OCTO Technology) and Alexandre Jeambrun (Programmer, Crafter & Tech Coach @OCTO Technology)
The year 2023 marks the strong comeback of artificial intelligence with the democratization of generative AI, which has the potential to disrupt many activities, in particular application development. The AI revolution in code development did not wait for the ChatGPT buzz: the company TabNine was founded in 2017, Microsoft launched IntelliCode in 2018, then Copilot in 2022, and AWS joined the movement with CodeWhisperer. All these tools are operational and their user bases grow day by day.
Fad, evolution or revolution: will there soon be fewer developers? What can these AIs really do? What kinds of gains can we expect, and how should we use them? How are teams receiving these tools? Should they be trained? What are the risks of using these tools? Welcome to a complete tour of AI in the world of development.
Launch of the book Culture Test Vol.2
by Sylvie Ponthus (developer, project manager and agile coach @ OCTO Technology), Stéphane Bedeau (developer and trainer @OCTO Technology) and Christophe Breheret-Girardin (Craft Coach, trainer and speaker @OCTO Technology)
What if testing meant doing better, and doing it faster? For the release of the first volume of our Culture Test trilogy, join us on Tuesday, December 5 at OCTO Technology's offices to meet the authors, compare points of view, and get an exclusive first look at the next volume.
Le Comptoir OCTO - Green AI, how to keep your magic potion of ... (OCTO Technology)
Green AI: how to keep your magic AI potion from turning into poison?
by Eric Biernat (Big Data Analytics Director @OCTO Technology) and Reynald Riviere (Senior Data Science Manager @OCTO Technology)
After electricity and the Internet, we are now in the age of AI, with models that optimize the use of our resources... without realizing that these AI models are energy-hungry too. Come and discover how the ecology of AI has become our quest, with Green AI seen from 3 angles: software, hardware and process.
YouTube video: https://www.youtube.com/watch?v=7nWADBWA22c
Write-up: https://blog.octo.com/comptoir-green-ai
OCTO Talks - State of the art: architecture in web frontends (OCTO Technology)
Did you say front-end architecture 💅? From CSS to CDN, no one will be spared!
by Pierrette Bertrand (Head of Web Front Development @OCTO Technology), David Ostermann (Front End Developer @OCTO Technology) and Florian Leroy (Senior Consultant @OCTO Technology)
What is a front-end architecture? Depending on whether you ask an integrator, a frontend developer, an API developer or simply an architect, the answer will be quite different, and in our experience the architect actually holds only part of the answer. In this talk, we will look together at the pros and cons of the many possible choices at each layer, to provide a useful map for finding your way. Much as it may displease front-end developers, the concept of front-end architecture goes far beyond the choice of their favorite framework!
This Refcard is a digest of best practices aimed at both consumers and developers of GraphQL APIs.
It covers documentation, versioning, code first, monitoring, discoverability, security and schema design, among other topics.
How can corporate culture make the difference during a merger or acquisition?
by Lucie Quach, Vanessa Govi and Frédéric Lenci
How did culture end up among the key integration topics during a merger? Come and discover the behind-the-scenes story of 6 months of co-construction between ALD/Leaseplan to define the shared culture of a 15,700-person company covering 60 countries, and the materials we co-created to implement it, both in the executive committee (COMEX) and in the field.
YouTube video: https://www.youtube.com/watch?v=smnpq7Ey9pk
Write-up: https://blog.octo.com/compte-rendu-du-comptoir-definition-de-la-culture-dentreprise-issue-dune-fusion
Le Comptoir OCTO - How to optimize shelf stock with data? (OCTO Technology)
By Antoine Moreau (Head of Data & AI @OCTO Technology), Pierre Sabrié (Forecasting Director @Groupe Casino) and Nicolas Gery (Retail Strategy & Consulting Senior Manager @Accenture)
How can you secure product availability on the shelves, reduce waste and inventory, and gain efficiency both at headquarters and in stores?
Casino quickly won this bet with an algorithmic solution that processes data daily at the finest granularity (items x stores), built on Cloud assets.
Join Pierre, Antoine and Nicolas as they share Casino's successes, the difficulties encountered, and its approach.
YouTube video: https://www.youtube.com/watch?v=6oX4NvXZkTk&list=PLBD8R108T9T4D3mcLiDpT67f9ERg1Hm2r&index=47
Write-up: https://blog.octo.com/compte-rendu-le-comptoir-x-casino-comment-optimiser-les-stocks-en-lineaire-par-la-data/
Le Comptoir OCTO - Retour sur 5 ans de mise en oeuvre : Comment le RGPD a réi...OCTO Technology
Par Julie François (Consultante et formatrice RGPD @OCTO Technology)
Le RGPD a fêté ses 5 ans de mise en application et vous pensez toujours que votre équipe ne manipule pas “vraiment” de données personnelles ? Alors ce Comptoir OCTO est fait pour vous !
Chez OCTO, nous avons la conviction que le RGPD n'est pas qu'une affaire de juristes. Alors embarquez avec nous pour une sensibilisation rythmée et parlante. Au programme des retours d’expérience sur 5 années de mise en œuvre, qui vous feront découvrir le sujet de la protection des données autrement.
Vidéo Youtube : https://www.youtube.com/watch?v=uum3Qxisuu0&list=PLBD8R108T9T4D3mcLiDpT67f9ERg1Hm2r&index=51
Compte-rendu : https://blog.octo.com/compte-rendu-du-comptoir-retour-sur-5-ans-de-mise-en-oeuvre-comment-le-rgpd-a-reinvente-la-protection-des-donnees-personnelles/
Le Comptoir OCTO - Affinez vos forecasts avec la planification distribuée et...OCTO Technology
par Wilde Diogene (Manager EPM @OCTO Technology), Samir Benyoucef (Consultant @OCTO Technology) et Elghali Guessous (Delivery Manager EPM @OCTO Technology)
Les approches traditionnelles de planification, basées sur un consensus entre différents départements (ventes, marketing, finance), peuvent être consommatrices de temps et aboutir à des prévisions inexactes. Découvrez comment exploiter l'IA et le Machine Learning pour créer une plateforme de prévision du chiffre d'affaires intelligente. En associant le planning distribué de Pigment (EPM) et la puissance prédictive de Dataiku (Auto ML), vous bénéficiez d'un gain de temps significatif dans votre planification, d'une prise de décision éclairée et d'une meilleure gestion de vos ressources (humaines, production, stocks...).
Surmontez les incertitudes et pilotez votre entreprise vers le succès avec confiance.
Vidéo Youtube : https://www.youtube.com/watch?v=tBwlWAksFik&list=PLBD8R108T9T4D3mcLiDpT67f9ERg1Hm2r&index=48
Compte-rendu : https://blog.octo.com/affinez-vos-forecast-avec-le-planning-distribue-et-lautoml/
Le Comptoir OCTO - La formation au cœur de la stratégie d’éco-conceptionOCTO Technology
Par Brice Le Roux (GreenOps @OCTO Technology) et Frédéric Menetreux (Architecte d’entreprise @CA-GIP)
Vous souhaitez acquérir les leviers d’action pour mettre en œuvre la sobriété numérique et mesurer les impacts de votre infrastructure ? Rejoignez Brice et Frédéric qui vous partageront les réussites et améliorations de la formation réalisée au Crédit Agricole par OCTO Academy
Vidéo Youtube : https://www.youtube.com/watch?v=efrJT_ZJ5fk&list=PLBD8R108T9T4D3mcLiDpT67f9ERg1Hm2r&index=50
Compte-rendu : https://blog.octo.com/les-comptoirs-octo-la-formation-au-coeur-de-la-strategie-deco-conception-de-linfra/
Le Comptoir OCTO - Une vision de plateforme sans leadership tech n’est qu’hal...OCTO Technology
Par Wassel Alazhar (Architecte @OCTO Technology), François-Xavier Bouffant (Engineering Manager @Wakam )et Etienne Debost (Head of Architecture @Wakam)
La littérature promeut les plateformes digitales comme un levier de croissance pour les entreprises et un vrai avantage stratégique dans l’économie numérique.
Force est de constater que les entreprises qui se lancent dans cette aventure échouent : elles n’arrivent pas à dépasser le Proof Of Concept ou bien s’enlisent dans la paralysis analysis après des millions d’euros dépensés.
Nous vous partageons un retour sur l'expérience Wakam. Nous avons réussi à amorcer une dynamique pour construire une plateforme (tunnel de distribution en marque blanche, APIs, web apps, blockchain...) qui permet d’innover, de fournir des capacités métiers sous forme de commodité et d’assurer une expérience hyper personnalisable aux partenaires, en moins de 6 mois
Vidéo Youtube : https://www.youtube.com/watch?v=tfioZZTfX1M&list=PLBD8R108T9T4D3mcLiDpT67f9ERg1Hm2r&index=49
Compte-rendu : https://blog.octo.com/compte-rendu-du-comptoir-une-vision-de-plateforme-sans-leadership-tech-nest-quhallucination/
Le Comptoir OCTO - L'avenir de la gestion du bilan carbone : les solutions E...OCTO Technology
Par Wilde Diogene (Manager EPM @OCTO Technology), Samir Benyoucef (Consultant @OCTO Technology) et Matthieu Mlatac (Consultant sénior @OCTO Technology)
Plongez dans les bénéfices des solutions EPM pour améliorer la gestion du bilan carbone de votre entreprise. En simplifiant la collecte et l’analyse, ces solutions offrent une vision claire de votre empreinte environnementale et permettent d’identifier les opportunités de réductions de vos émissions. Les bénéfices pour votre entreprise incluent une meilleure efficacité opérationnelle, des coûts réduits, une réputation renforcée et une contribution significative aux efforts de lutte contre le changement climatique.
Vidéo Youtube : https://www.youtube.com/watch?v=ak--ftSio-I&list=PLBD8R108T9T4D3mcLiDpT67f9ERg1Hm2r&index=46
Compte-rendu : https://blog.octo.com/lavenir-de-la-gestion-du-bilan-carbone-les-solutions-epm-au-service-de-la-performance-environnementale/
Le Comptoir OCTO - Continuous discovery et continuous delivery pour construir...OCTO Technology
Par Mehdi Houacine (Consultant Senior @OCTO Technology), Sofia Calcagno (Machine Learning Engineer @OCTO Technology) et Thomas Dobrzelewski (Lead Product Manager B2C @Wakam)
Wakam a comme ambition de réinventer le métier de l'assurance en y introduisant plus de transparence et de sécurisation via le blockchain. Or, ce type d'innovation structurante pose plusieurs questions : qui seront ses utilisateurs cibles ? Quel sera son impact sur le processus métier ? Nous vous présenterons ici une démarche liant expérimentation et déploiement via les outils du DDD permettant de faire pivoter un produit rapidement.
Vidéo Youtube : https://www.youtube.com/watch?v=Q3ElzHtV40s&list=PLBD8R108T9T4D3mcLiDpT67f9ERg1Hm2r&index=45
Compte-rendu : https://blog.octo.com/compte-rendu-du-comptoir-continuous-delivery-et-continuous-discovery-pour-construire-lassurance-de-demain/
L’état de l’art des tests front-end
Maîtriser et fiabiliser son code sont aujourd’hui devenus incontournables pour tout développeur devant faire face à des architectures Web de plus en plus riches et complexes.
Il existe des outils pour réaliser des tests front-end d’applications Web et répondre aux besoins d’un développement de qualité.
Nous vous invitons ici à parcourir l’écosystème de ces tests front-end d’applications Web. Que vous soyez déjà convaincus par les tests ou tout simplement curieux, ce document vous guidera pour les mettre en place sur vos projets.
The Art of the Pitch: WordPress Relationships and SalesLaura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if sometime changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
UiPath Test Automation using UiPath Test Suite series, part 3DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation Introduction,
UI automation Sample
Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
Accelerate your Kubernetes clusters with Varnish CachingThijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Ramesh Iyer
In today's fast-changing business world, Companies that adapt and embrace new ideas often need help to keep up with the competition. However, fostering a culture of innovation takes much work. It takes vision, leadership and willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at each stage.
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualityInflectra
In this insightful webinar, Inflectra explores how artificial intelligence (AI) is transforming software development and testing. Discover how AI-powered tools are revolutionizing every stage of the software development lifecycle (SDLC), from design and prototyping to testing, deployment, and monitoring.
Learn about:
• The Future of Testing: How AI is shifting testing towards verification, analysis, and higher-level skills, while reducing repetitive tasks.
• Test Automation: How AI-powered test case generation, optimization, and self-healing tests are making testing more efficient and effective.
• Visual Testing: Explore the emerging capabilities of AI in visual testing and how it's set to revolutionize UI verification.
• Inflectra's AI Solutions: See demonstrations of Inflectra's cutting-edge AI tools like the ChatGPT plugin and Azure Open AI platform, designed to streamline your testing process.
Whether you're a developer, tester, or QA professional, this webinar will give you valuable insights into how AI is shaping the future of software delivery.
Neuro-symbolic is not enough, we need neuro-*semantic*Frank van Harmelen
Neuro-symbolic (NeSy) AI is on the rise. However, simply machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. These will only be gained when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this illustrated with link prediction over knowledge graphs, but the argument is general.
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
"Impact of front-end architecture on development cost", Viktor TurskyiFwdays
I have heard many times that architecture is not important for the front-end. Also, many times I have seen how developers implement features on the front-end just following the standard rules for a framework and think that this is enough to successfully launch the project, and then the project fails. How to prevent this and what approach to choose? I have launched dozens of complex projects and during the talk we will analyze which approaches have worked for me and which have not.
PHP Frameworks: I want to break free (IPC Berlin 2024)Ralf Eggert
In this presentation, we examine the challenges and limitations of relying too heavily on PHP frameworks in web development. We discuss the history of PHP and its frameworks to understand how this dependence has evolved. The focus will be on providing concrete tips and strategies to reduce reliance on these frameworks, based on real-world examples and practical considerations. The goal is to equip developers with the skills and knowledge to create more flexible and future-proof web applications. We'll explore the importance of maintaining autonomy in a rapidly changing tech landscape and how to make informed decisions in PHP development.
This talk is aimed at encouraging a more independent approach to using PHP frameworks, moving towards a more flexible and future-proof approach to PHP development.
3. 3
What a buzzword!!!
[Figures: Google Trends on “big data”; Gartner hype cycle, 2012]
4. 4
Origins of Big Data
Web giants gave some reality to a concept anticipated by Gartner. This software evolution didn’t come from traditional software vendors (which is quite unusual).
• WEB (Google, Amazon, Facebook, Twitter, …): web giants implement Big Data solutions for their own needs
• IT vendors (IBM, Teradata, VMware, EMC, …): vendors are followers in this movement. They try to get a hold on this very promising business
• Management (McKinsey, BCG, Gartner, …): consulting firms predicted a big economic change, and Big Data is part of it
6. 6
Is there a clear definition?
Super data warehouse? Low-cost storage? NoSQL? Cloud? Internet intelligence? Real-time analysis? Unstructured data? Open Data? Big databases?
There’s no clear definition of Big Data.
It is at once a business ambition and a set of technological opportunities.
12. 12
Big Data: proposed definition
Big Data aims at gaining an economic advantage from the quantitative analysis of internal and external data.
13. 13
Some real use cases studied with OCTO
Telecom
• Analyze customer behavior (calls to the service center, opinions about the brand on social networks, …) to identify a risk of churn
• Analyze, in real time, the huge volume of quality metrics from the network infrastructure to proactively inform the call center about the network’s quality of service
Insurance
• Crawl the web (especially forums) to identify correlations between damages and centers of interest in communities (health, household insurance, car insurance, …)
• Improve data mining models and risk models
e-Commerce
• Analyze web logs and customer reviews to improve product recommendations
• Analyze data from the call center (calls, emails) to improve customer loyalty
16. 16
Machine Learning: a definition
« Machine Learning » is not new. A first definition of it was given in 1959:
“Field of study that gives computers the ability to learn without being explicitly programmed.” (Arthur Samuel, 1959)
We prefer this more recent and more precise definition:
“A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.” (Tom Mitchell, 1998)
17. 17
Example with a SPAM classifier
Applying this definition to a spam classifier:
• Task T: the classifier puts incoming emails in ‘spam’ or not
• Experience E: I tag some of my emails as ‘spam’ or not
• Performance measure P: the ratio of emails correctly classified automatically
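To make the T/E/P mapping concrete, here is a minimal sketch. The slide does not say which algorithm the classifier uses; this assumes a multinomial naive Bayes model with Laplace smoothing, and the four tagged emails are hypothetical toy data.

```python
import math
from collections import Counter

# Experience E: a few emails tagged by hand (hypothetical toy data)
TRAINING = [
    ("win money now", "spam"),
    ("cheap money offer", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday", "ham"),
]

def train(samples):
    """Count words per class: the parameters of a multinomial naive Bayes model."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    class_counts = Counter()
    for text, label in samples:
        class_counts[label] += 1
        word_counts[label].update(text.split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, class_counts, vocab

def classify(model, text):
    """Task T: put an incoming email in 'spam' or not ('ham')."""
    word_counts, class_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        n_words = sum(word_counts[label].values())
        score = math.log(class_counts[label] / total)  # log prior
        for w in text.split():
            # Laplace smoothing: an unseen word must not zero out the probability
            score += math.log((word_counts[label][w] + 1) / (n_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def accuracy(model, samples):
    """Performance measure P: ratio of emails correctly classified."""
    hits = sum(classify(model, text) == label for text, label in samples)
    return hits / len(samples)

model = train(TRAINING)
print(classify(model, "cheap money"))     # spam
print(classify(model, "monday meeting"))  # ham
```

Tagging more emails (more experience E) refines the word counts, which is exactly what should make P improve.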
18. 18
What’s new with Big Data in Machine Learning?
A Machine Learning approach works only if 3 conditions are fulfilled:
1. Some « pattern » exists in the data
2. You have a lot of data. A LOT. (millions of samples)
3. There’s no analytical model to describe it (= it’s a probabilistic problem)
Machine Learning algorithms have existed for many years to address such problems. In the past, the performance of ML models was often limited by the lack of available data; now we can collect and manipulate much more.
A Big Data approach allows us to collect and manipulate much more data. Machine Learning is a fundamental tool to leverage this huge amount of information.
19. 19
Machine Learning example: classification
Let’s imagine we want to predict whether a customer of a telecom operator will churn (switch to a competitor).
We will build a classifier, starting with a learning set.
For each customer, we collect a finite number of data items, called attributes:
• Customer offer / plan
• Customer data (region, age, sex, …)
• Amounts of the last 12 bills
• Number of calls to the call center in the last 6 months
• Amount of local calls over the last 12 months
• Amount of international calls over the last 12 months
• Amount of downloaded data
• etc.
And for each customer in the training set, we know whether the customer churned or not: this is the tag.
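A sketch of how such a learning set might be turned into attribute vectors X and tags y. The `Customer` class, its field names and the two sample customers are hypothetical; only a subset of the attributes above is encoded, and categorical ones (offer, region, sex) would need an extra encoding step not shown here.

```python
from dataclasses import dataclass

@dataclass
class Customer:
    # Hypothetical subset of the attributes listed above
    age: int
    last_bills: list                # amounts of the last 12 bills
    call_center_calls_6m: int       # calls to the call center, last 6 months
    international_calls_12m: float  # amount of international calls, last 12 months
    downloaded_gb: float            # amount of downloaded data
    churned: bool                   # the tag, known for customers of the training set

def to_example(c: Customer):
    """Turn one customer into (x, y): a numeric attribute vector and its tag."""
    x = [
        float(c.age),
        sum(c.last_bills) / len(c.last_bills),  # average bill amount
        float(c.call_center_calls_6m),
        c.international_calls_12m,
        c.downloaded_gb,
    ]
    y = 1 if c.churned else 0
    return x, y

customers = [
    Customer(34, [30.0] * 12, 5, 12.5, 3.2, churned=True),
    Customer(51, [45.0] * 12, 0, 0.0, 1.1, churned=False),
]
X, y = zip(*(to_example(c) for c in customers))
print(list(y))  # [1, 0]
```

Each row of X is one point in attribute space; y holds the tags the classifier must learn to reproduce.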
21. 21
Machine Learning example: classification
The θ vector is computed during the training phase. Once θ is computed, our classification model is ready.
Then we test this model against other values of X (the test set), and we check whether it is good at predicting the output value y. This is what we call the robustness of the model: its capacity to generalize the prediction.
The challenge is to get a reasonable error ratio without “overfitting” the algorithm to the training sample (an overfitted model fits the training data but predicts nothing useful on new data).
In general, 80% of the whole data set is used for training, and 20% for testing*.
* It is often 60%/20%/20%, to include a model validation step.
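The model behind θ is not spelled out in this section. Assuming logistic regression (a common choice for a binary churn/no-churn tag, but an assumption here), the sketch below computes θ by batch gradient descent on an 80% training split, then measures the error ratio on the held-out 20%. The one-attribute data set is synthetic and the hyperparameters (`lr`, `epochs`) are arbitrary.

```python
import math
import random

def split(rows, train_ratio=0.8, seed=0):
    """Shuffle the examples, keep train_ratio of them for training, the rest for testing."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * train_ratio)
    return rows[:cut], rows[cut:]

def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(theta, x):
    """h_theta(x) = sigmoid(theta_0 + theta_1*x_1 + ...): estimated probability of churn."""
    return sigmoid(sum(t * xi for t, xi in zip(theta, [1.0] + x)))

def train(rows, lr=1.0, epochs=1000):
    """Training phase: compute theta by batch gradient descent on the log-loss."""
    theta = [0.0] * (len(rows[0][0]) + 1)  # +1 for the bias term theta_0
    for _ in range(epochs):
        grad = [0.0] * len(theta)
        for x, y in rows:
            err = predict_proba(theta, x) - y
            for j, xj in enumerate([1.0] + x):
                grad[j] += err * xj
        theta = [t - lr * g / len(rows) for t, g in zip(theta, grad)]
    return theta

def error_ratio(theta, rows):
    """Share of examples misclassified when predicting 'churn' for h_theta(x) >= 0.5."""
    wrong = sum((predict_proba(theta, x) >= 0.5) != (y == 1) for x, y in rows)
    return wrong / len(rows)

# Synthetic data: one normalized attribute in [-1, 1]; tag 1 (churner) when it is positive
data = [([(i - 49.5) / 50], 1 if i >= 50 else 0) for i in range(100)]
train_set, test_set = split(data)
theta = train(train_set)
print("theta =", theta, "test error ratio =", error_ratio(theta, test_set))
```

For the 60%/20%/20% variant, the training part can be split again, using the middle slice to tune hyperparameters such as `lr` before the final measurement on the test set.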
22. 22
Different strategies in categorization
Supervised learning
• Data is tagged: for the training phase, we know whether each customer is a churner or not
• Positives (churners) are abundant enough in the sample to identify the typical churner
• For some use cases, tagging may require the help of an expert to prepare the training set: expertise is needed before machine learning
• The challenge is the generalization of the model
Unsupervised learning
• We don’t know the output values (the y vector): neither the number of tags nor their nature
• Some of the attributes are not homogeneous across all the samples in X
• The algorithm groups inputs xi by similarity (creating clusters)
• Expertise is needed after machine learning, to interpret the results and name the discovered categories
• The challenge is understanding the output classification
23. 23
Example of supervised algorithm: Support Vector Machine
• Draw a line (hyperplane) that divides points in space into 2 classes
• Find the line with the best margin (largest distance from the points to the line)
• Try to minimize the error (points on the wrong side)
If the distribution is fundamentally not linearly separable, algorithms exist to transform the data into a higher dimension and make it linearly separable.
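A minimal linear SVM sketch, assuming training by stochastic subgradient descent on the regularized hinge loss (one of several ways to fit an SVM; libraries usually rely on dedicated solvers). Points inside the margin push the hyperplane away, while the regularization term keeps `w` small, i.e. the margin 2/‖w‖ large. The six 2-D points are toy data, and the non-linear (kernel) case is not covered.

```python
def train_linear_svm(points, labels, lam=0.01, lr=0.01, epochs=1000):
    """Minimize lam*||w||^2 + sum of hinge losses max(0, 1 - y*(w.x + b))
    by stochastic subgradient descent. Labels must be +1 or -1."""
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(points, labels):
            margin = y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
            if margin < 1:
                # Inside the margin or on the wrong side: move the hyperplane away from x
                w = [wj + lr * (y * xj - 2 * lam * wj) for wj, xj in zip(w, x)]
                b += lr * y
            else:
                # Correct with enough margin: only shrink w, which widens the margin
                w = [wj - lr * 2 * lam * wj for wj in w]
    return w, b

def predict(w, b, x):
    """Which side of the hyperplane w.x + b = 0 the point falls on."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

# Toy, linearly separable data: two classes on either side of a diagonal line
points = [(2.0, 2.0), (3.0, 2.5), (2.5, 3.5), (-2.0, -1.0), (-3.0, -2.5), (-1.5, -3.0)]
labels = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(points, labels)
print(w, b, predict(w, b, (4.0, 4.0)))
```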
24. 24
Example of unsupervised algorithm: K-Means clustering
• Choose k points randomly in space (the seeds)
• Until convergence:
• Assign each input point to the nearest seed to form clusters
• Compute the center of gravity of each cluster, and use these points as new seeds
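The loop above is Lloyd’s algorithm; a plain-Python sketch follows. One small deviation, stated as an assumption: the initial seeds are k input points drawn at random rather than arbitrary points in space, a common variant. The sample points are toy data.

```python
import math
import random

def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's algorithm: alternate assignment to the nearest seed and recentering."""
    rng = random.Random(seed)
    seeds = rng.sample(points, k)  # k input points chosen at random as initial seeds
    for _ in range(max_iter):
        # Assign each point to its nearest seed
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, seeds[i]))
            clusters[nearest].append(p)
        # Center of gravity of each cluster (an empty cluster keeps its old seed)
        new_seeds = [
            tuple(sum(coord) / len(c) for coord in zip(*c)) if c else seeds[i]
            for i, c in enumerate(clusters)
        ]
        if new_seeds == seeds:  # convergence: the seeds no longer move
            break
        seeds = new_seeds
    return seeds, clusters

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, clusters = kmeans(points, k=2)
print(sorted(centers))
```

As the slides note for unsupervised learning, the algorithm only returns groups; naming what each cluster means is left to an expert.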
25. 25
Other algorithms
Dimensionality reduction
Example: product recommendation engine
• N customers x P products
• (ci, pj) = 1 if customer i bought product j
• Very big and sparse matrix
• Each customer is a point in a space with a very large number of dimensions
• Idea: find a way to group products and reduce the dimensions of this space
[Figure: sparse matrix of 10M customers x 1M products (columns P1 … Pn), almost every cell 0]
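The « very big and sparse matrix » can be quantified. The sketch below assumes one bit per cell for dense storage, 4-byte integer IDs for a coordinate (COO) list of the 1s, and, purely for illustration, an average of 20 purchases per customer (not a figure from the talk). It then shows the usual in-code representation, where only the non-zero cells are stored.

```python
from collections import defaultdict

N_CUSTOMERS = 10_000_000
N_PRODUCTS = 1_000_000

def dense_size_bytes(n_rows, n_cols, bits_per_cell=1):
    """Memory needed to store every cell, zeros included."""
    return n_rows * n_cols * bits_per_cell // 8

def sparse_size_bytes(n_purchases, bytes_per_id=4):
    """Coordinate (COO) storage: only the 1s, as (customer_id, product_id) pairs."""
    return n_purchases * 2 * bytes_per_id

purchases = N_CUSTOMERS * 20  # assumption for illustration: 20 purchases per customer
print(dense_size_bytes(N_CUSTOMERS, N_PRODUCTS) / 1e12)  # 1.25 (TB)
print(sparse_size_bytes(purchases) / 1e9)                # 1.6 (GB)

# In code, a sparse matrix is often just a mapping from a row to its non-zero columns
bought = defaultdict(set)
bought[42].add(7)       # (c42, p7) = 1
print(7 in bought[42])  # True
print(3 in bought[42])  # False: every missing entry is an implicit 0
```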
Quantity prediction
Linear regression: the oldest and best-known algorithm
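A minimal sketch of ordinary least squares with a single explanatory variable, using the closed-form slope cov(x, y) / var(x). The data points are hypothetical and lie exactly on a line, so the fit is exact.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y ≈ a * x + b with one explanatory variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    a = cov / var               # slope
    b = mean_y - a * mean_x     # intercept: the line passes through the mean point
    return a, b

# Hypothetical data
xs = [1, 2, 3, 4]
ys = [12, 14, 16, 18]
a, b = fit_line(xs, ys)
print(a, b)       # 2.0 10.0
print(a * 5 + b)  # 20.0: predicted quantity for x = 5
```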
28. 28
1956: $50k for a 5 MB IBM hard drive… today: €20 for an 8 GB microSD!
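Working out the price per megabyte shows the scale of the drop. This takes 1 GB = 1,000 MB and ignores both the dollar/euro difference and inflation, so it gives an order of magnitude only:

```latex
\frac{50\,000\ \text{USD}}{5\ \text{MB}} = 10\,000\ \text{USD}/\text{MB}
\qquad
\frac{20\ \text{EUR}}{8\,000\ \text{MB}} = 0.0025\ \text{EUR}/\text{MB}
\qquad
\frac{10\,000}{0.0025} = 4 \times 10^{6}
```

Storage per megabyte became roughly four million times cheaper.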
29. 29
Exponential growth of capacities
CPU, memory, network bandwidth, storage… all of them followed Moore’s law.
Source: http://strata.oreilly.com/2011/08/building-data-startups.html
30. 30
The old strategy: scale-up
[Chart: cost per GB of HDD and RAM, 1965 to 2015, log scale from 0.01 to 1,000,000 $/GB, annotated from 100k $/GB down to 0.10 $/GB]
The old way: if you have too much data, just wait a few months for costs to decrease, then scale up your infrastructure.
Source: http://www.mkomo.com/cost-per-gigabyte
32. 32
[Chart: hard-drive throughput in MB/s, 1990 to 2010: IBM DTTA 35010 at 0.7 MB/s, then Seagate Barracuda ATA IV, up to Seagate Barracuda 7200.10 at 64 MB/s]
Storage capacity: x 100,000; throughput: x 91
We can store 100,000 times more data, but it takes 1,000 times longer to read all of it!
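The « 1,000 times longer » figure follows from the two ratios on the chart. With C the capacity of a disk, R its throughput and t the time needed to read it completely:

```latex
t = \frac{C}{R},
\qquad
\frac{R_{2010}}{R_{1990}} = \frac{64}{0.7} \approx 91,
\qquad
\frac{t_{2010}}{t_{1990}}
  = \frac{C_{2010}/C_{1990}}{R_{2010}/R_{1990}}
  \approx \frac{100\,000}{91} \approx 1\,100
```

Capacity grew about a thousand times faster than the speed at which a single disk can be read, which is why reading must be spread over many disks in parallel.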
33. 33
Limitations of traditional architectures
« Traditional » architectures: RDBMS, application server, ETL, ESB
• Storage-oriented applications (IO bound): over 10 TB, « classical » architectures require huge software and hardware adaptations → distributed storage, share nothing
• Transaction-oriented applications (TPS): over 1,000 transactions/second, « classical » architectures require huge software and hardware adaptations → XTP
• Computation-oriented applications (CPU bound): over 10 threads per CPU core, sequential programming reaches its limits (IO) → parallel processing
• Event-flow-oriented applications (streaming): over 1,000 events/second, « classical » architectures require huge software and hardware adaptations → Event Stream Processing
34. 34
New architectures
Big Data = explosion of volumes:
• data to store online
• processing to parallelize
• number of transactions per second to handle
• number of messages per second to process
+ New constraints:
• New types of data (unstructured, semi-structured…)
• Distribution of storage and processing
• Cost reduction
• Need for elasticity
= New technologies:
• Horizontal scalability and clustering
• Data partitioning / sharding
• Parallel processing
• In-memory processing
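A sketch of the « data partitioning / sharding » idea: each record is routed to one of N shards by hashing its key, so storage and processing can be spread across nodes. The hash choice (SHA-256 truncated to 64 bits) and the record layout are illustrative assumptions, not a particular product’s scheme.

```python
import hashlib

def shard_for(key: str, n_shards: int) -> int:
    """Pick a shard with a stable hash. Python's built-in hash() of a str is
    randomized per process, so it cannot be used to route data between nodes."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_shards

def partition(records, key_field, n_shards):
    """Split records into n_shards buckets by hashing key_field."""
    shards = [[] for _ in range(n_shards)]
    for record in records:
        shards[shard_for(record[key_field], n_shards)].append(record)
    return shards

# Hypothetical records; every record of a given customer lands on the same shard
records = [{"customer_id": f"c{i}", "amount": i} for i in range(1000)]
shards = partition(records, "customer_id", 4)
print([len(s) for s in shards])  # roughly balanced sizes
```

With plain modulo hashing, changing the number of shards moves most keys to a different shard; consistent hashing is the usual remedy when a cluster must grow elastically.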
38. 38
Hadoop: a reference in the Big Data landscape
• Open source: Apache Hadoop
• Main distributions: Cloudera CDH, Hortonworks, MapR, DataStax (Brisk)
• Commercial: Greenplum (EMC), IBM InfoSphere BigInsights (CDH), Oracle Big Data Appliance (CDH), NetApp Analytics (CDH), …
• Cloud: Amazon EMR (MapR), VirtualScale (CDH)
39. 39
Hadoop Distributed File System (HDFS)
Key principles:
• Store files larger than a single disk
• Data distributed over several nodes
• Data replication to ensure « fail-over », with « rack awareness »
• Use of commodity disks instead of a SAN
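A simplified sketch of how a file larger than one disk is cut into blocks and how replicas are spread with rack awareness. The 128 MB block size and the three-replica placement (the writer’s node, then two different nodes of one remote rack) follow HDFS’s documented defaults for Hadoop 2 and later, but this is a toy model, not HDFS code; the rack and node names are hypothetical.

```python
import random

MiB = 1024 * 1024
BLOCK_SIZE = 128 * MiB  # HDFS default since Hadoop 2 (64 MB in Hadoop 1)

def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Sizes of the fixed-size blocks a file is cut into; the last one may be smaller."""
    full, rest = divmod(file_size, block_size)
    return [block_size] * full + ([rest] if rest else [])

def place_replicas(topology, writer, rng):
    """Toy rack-aware placement of 3 replicas, modeled on HDFS's default policy:
    1st on the writer's node, 2nd and 3rd on two different nodes of one other rack.
    topology maps a (hypothetical) rack name to its node names."""
    rack_of = {node: rack for rack, nodes in topology.items() for node in nodes}
    remote_rack = rng.choice([r for r in topology if r != rack_of[writer]])
    second, third = rng.sample(topology[remote_rack], 2)
    return [writer, second, third]

topology = {"rack1": ["n1", "n2", "n3"], "rack2": ["n4", "n5", "n6"]}
rng = random.Random(0)
blocks = split_into_blocks(300 * MiB)  # a 300 MB file
print([blk // MiB for blk in blocks])  # [128, 128, 44]
placement = [place_replicas(topology, "n1", rng) for _ in blocks]
print(placement)
```

With this layout, losing a whole rack still leaves at least one copy of every block.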
40. 40
Hadoop distributed processing: MapReduce
Key principles:
• Parallelize and distribute processing
• Process smaller (unit) data volumes more quickly
• Co-locate processing and data
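The MapReduce model can be simulated in a single Python process with the classic word-count example: a map function emits key/value pairs, the framework groups them by key (shuffle), and a reduce function combines each group. This illustrates the programming model only, not Hadoop’s Java API; on a cluster, each map task would run on the node that holds its block of input data.

```python
from collections import defaultdict
from itertools import chain

def map_phase(line):
    """Mapper: emit (word, 1) for every word of one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle_phase(pairs):
    """Shuffle/sort: group all values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reducer: combine all the values emitted for one key."""
    return key, sum(values)

# Each line stands for an input split; the map calls are independent and could run in parallel
lines = ["big data big value", "data beats opinions"]
mapped = chain.from_iterable(map_phase(line) for line in lines)
counts = dict(reduce_phase(k, v) for k, v in shuffle_phase(mapped).items())
print(counts["big"], counts["data"])  # 2 2
```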
42. 42
Available tools in a typical distribution (CDH)
HDFS, MapReduce, YARN (v2), Pig, Cascading, Hive, Oozie, Azkaban, Mahout, HAMA, Giraph, Sqoop, Flume, Scribe, Chukwa, CLI, Web Console, Hue, Cloudera Manager, HBase, Impala
43. 43
Hadoop: a blooming ecosystem!!
[Diagram: the Hadoop ecosystem in layers. Storage: distributed FS, local FS and NoSQL datastores (HDFS, GlusterFS, S3, Ceph, OpenStack Swift, Isilon, MapR, Cassandra, Ring, DynamoDB). Processing: OLAP, OLTP and machine learning (MapReduce / Tez, Spark, Hive, Pig, Cascading, Scalding, Streaming, HBase, Impala, Hawq, Stinger, Mahout, Giraph, Hama, SciKit, R, Python)]
Lots of announcements and new tools appear every day…
Maturity varies greatly from one tool to another.
44. 44
Maturity of tools
The maturity of solutions in the Hadoop ecosystem is very heterogeneous:
• Ex: HDFS and MapReduce are fully production-ready; Yahoo manages a petabyte-scale HDFS cluster
• But some surrounding tools are still poor, especially admin and debug tools
• Ex: Impala (real-time querying, with SQL-compliant queries) is not production-ready
• Ex: adaptation of machine learning libraries to distributed computation with MapReduce is ongoing: Apache Mahout has MapReduce-compliant algorithms, while MapReduce libraries for R are quite young
45. 45
Hadoop is a rich and quite new technology, difficult to master.
Get trained, bring experts into your project!
49. 49
Machine Learning + Big Data
For many years, we have used Machine Learning algorithms to find patterns in data.
Big Data technologies now allow us to manipulate much more data, and get more value from Machine Learning techniques.
[Figures: linear regression; neural network]
50. 50
Hadoop: a reference in the Big Data technology landscape, but with a very effervescent ecosystem.
It’s hard to follow all the trends and evolutions without a dedicated R&D team.
Don’t do this alone: get trained, and bring experts into your project.