The document discusses how Dataiku aims to help data scientists focus on real problems by providing a ready-to-use data science studio platform. The platform offers visual and interactive data preparation tools for data cleaning, guided machine learning for non-ML experts, and production-ready models and insights. Dataiku was founded in 2013 to make data science accessible to anyone by handling real-life data challenges through a common and democratic data science environment.
Dataiku, Pitch at Data-Driven NYC, New York City, September 17th 2013 Dataiku
Our pitch at Data-Driven NYC meetup on September 17th (http://datadrivennyc.com).
Speaking about data scientists' pains and how Dataiku Data Science Studio can help them be more than data cleaners and data leak fixers!
The 3 Key Barriers Keeping Companies from Deploying Data Products Dataiku
Getting from raw data to deploying data-driven solutions requires technology, data, and people. All of which exist. So why aren’t we seeing more truly data-driven companies: what's missing and why? During Strata Hadoop World Singapore 2015, Pauline Brown, Director of Marketing at Dataiku, explains how lack of collaboration is what is keeping companies from building and deploying data products effectively. Learn more about Dataiku and Data Science Studio: www.dataiku.com
Dataiku productive application to production - pap is may 2015 Dataiku
Beyond Predictive Analytics: Deploying apps to production and keeping them improving
Some smart companies have been putting predictive applications in production for decades. Still, either because of lack of sharing or lack of generality, there is no single, obvious way to put a predictive application in production today.
As a consequence, for most companies, transitioning analytics from development to production is still “the next frontier”.
Behind the single word “production” lies a great number of questions: What exactly do you put in production: data, model, code, or all three? Who is responsible for maintenance and quality checks over time: business, tech, or both? How can I make my predictive app continuously improve and check that it delivers the promised business value over time? What are the best practices for maintenance and updates? Will my data scientists keep working after the first development, or should I lay half of them off?
Let’s make a small analogy with the development of websites in the '90s and early '00s:
Back then, the winners were not necessarily the websites with an amazing design, but those that had clearly made the necessary efforts and had a robust way to put their website reliably in production.
Today, every web developer can enjoy the comfort of Heroku, Amazon, GitHub, Docker, Angular, Bootstrap … and so we forget. How long before we get the same comfort in the predictive world?
How to Build a Successful Data Team - Florian Douetteau (@Dataiku) Dataiku
As you walk into your office on Monday morning, before you've even had a chance to grab a cup of coffee, your CEO asks to see you. He's worried: both customer churn and fraudulent transactions have increased over the past 6 months. As Data Manager, you have 6 months to solve this problem.
As Data Manager, you know the challenges ahead:
- Multitudes of technology choices to make
- Building a team and solving the skill-set disconnect
- Data can be deceiving...
- Figuring out what the successful data product must be
Florian has worked in the “data” field since '01, back when it was not yet big. He worked in successful startups in the search engine, advertising, and gaming industries, holding various data or CTO roles. He started Dataiku in 2013, his first venture as a CEO, with the goal of alleviating the daily pains encountered by data teams everywhere.
Applied Data Science Course Part 1: Concepts & your first ML model Dataiku
In this first course of our Applied Data Science online course series, you'll learn about the mindset shift of going from small to big data, basic definitions and concepts, and an overview of the data science workflow.
The Rise of the DataOps - Dataiku - J On the Beach 2016 Dataiku
Many organisations are creating groups dedicated to data. These groups have many names: Data Team, Data Labs, Analytics Teams…
But whatever the name, the success of those teams depends a lot on the quality of the data infrastructure and their ability to actually deploy data science applications in production.
In that regard, a new role of “DataOps” is emerging. Similar to DevOps for (web) development, DataOps is a merger of the data engineer and platform administrator roles. Well versed in cluster administration and optimisation, a DataOps would also have a perspective on data quality and the relevance of predictive models.
Do you want to be a DataOps? We’ll discuss the role and its challenges during this talk.
PASS Summit Data Storytelling with R Power BI and AzureML Jen Stirrup
How can we use technology to help the organization make data-driven decision-making part of its organizational DNA, while retaining the context of the business as a whole? How can we imprint data in the culture of the organization and make it easily accessible to everyone? Microsoft directly empowers businesses to derive insights and value from little and big data, through its release of user-friendly analytics through Azure Machine Learning (ML) combined with its acquisition of Revolution Analytics. Power BI can be used to create compelling visual stories around the analysis so that the work is not left to the data consumer. Together, these technologies can be used to make data and analytics part of the organization's DNA.
There are no prerequisites, but attendees are welcome to follow along with the demo if they have an Azure ML and Power BI account and R installed. Files will be released before the session.
Self Service Analytics enabled by Data Virtualization from Denodo Denodo
Watch full webinar here: https://bit.ly/39U9qY8
Self-service analytics is often described as allowing users to discover and access data without having to ask IT to create a data mart, or as allowing users to directly export or copy the data from the data sources themselves into their analytics tools and systems. The challenge is not just to provide access to the data (this can be done even from Excel) but to do so in real time without creating processing overhead, while getting trusted data, with the best response time possible, in a managed, governed and secure way, so that these users can trust the output of the analysis.
Data Virtualization provides a data access platform that allows users to access the data they need from multiple data sources, when they need it, and with the best possible response time. In addition, a Data Marketplace built on top of this proven technology enables Self Service Analytics by exposing consistent and governed data sets to be discovered by users, providing the trusted foundation for a successful Self-Service Analytics initiative.
Uwe Seiler, Data Architect and Trainer at codecentric AG - "Hadoop & Germany ... Dataconomy Media
Uwe Seiler, the Data Architect and Trainer at codecentric AG presented "Hadoop & Germany & 2016", as part of the Big Data, Frankfurt v 2.0 meetup organised on the 12th of May 2016 at the headquarters of codecentric AG.
How the world of data analytics, science and insights is failing and how the principles from Agile, DevOps, and Lean are the way forward. #DataOps Given at DevOps Enterprise Summit 2019
The Virtualization of Clouds - The New Enterprise Data Architecture Opportunity Denodo
Watch full webinar here: https://bit.ly/3x7xVuR
Organizations worldwide are adopting a variety of public cloud service providers (e.g. AWS, Google, Microsoft), each with a portfolio of storage, compute, network, and security options. All of this creates significant challenges in managing a hybrid and multi-cloud enterprise architecture. Even worse is the impact on the governance and integration of data from the clouds and physical infrastructure to support the broad array of analytics and operational requirements.
Can one public cloud provider meet all your needs today and in the future? How do you manage across the multiple public and private clouds you have today and where your data exists? And how would you manage and operate your multi-cloud and on-premises systems to gain value from your data in any of them? The Chief Research Officer at Ventana Research, Mark Smith, will expound on the challenges and path ahead for virtualization and integration of your data and the clouds, setting an architectural path for best success.
Data Engineering and the Data Science Lifecycle Adam Doyle
Everyone wants to be a data scientist. Data modeling is the hottest thing since Tickle Me Elmo. But data scientists don’t work alone. They rely on data engineers to help with data acquisition and data shaping before their model can be developed. They rely on data engineers to deploy their model into production. Once the model is in production, the data engineer’s job isn’t done. The model must be monitored to make sure that it retains its predictive power. And when the model slips, the data engineer and the data scientist need to work together to correct it through retraining or remodeling.
"Beyond the Data Lake", Matthias Korn, Technical Consultant at datavirtuality Dataconomy Media
"Beyond the Data Lake", Matthias Korn, Technical Consultant at datavirtuality
YouTube Link: https://www.youtube.com/watch?v=P5StySlZTzU
Watch more from Data Natives 2015 here: http://bit.ly/1OVkK2J
Visit the conference website to learn more: www.datanatives.io
Follow Data Natives:
https://www.facebook.com/DataNatives
https://twitter.com/DataNativesConf
Stay Connected to Data Natives by Email: Subscribe to our newsletter to get the news first about Data Natives 2016: http://bit.ly/1WMJAqS
Presentation Slides:
https://www.slideshare.net/secret/sis...
About the Author:
Matthias Korn is a Technical Consultant at datavirtuality. His talk, "Beyond the Data Lake", covers the shift in the digital era toward harnessing large amounts of data to make astute business decisions and improve operations, which is now an imperative. While our ability to generate data still far outstrips our ability to analyze it, we are making strides. Exciting new approaches are merging big data solutions with traditional enterprise data strategies. Logical data warehouses, in which there is no single data repository, hold enormous promise. By offering an ecosystem of multiple best-fit repositories, technologies, and tools, businesses can effectively analyze data for powerful insight.
When it comes to creating an enterprise AI strategy: if your company isn’t good at analytics, it’s not ready for AI. Succeeding in AI requires being good at data engineering AND analytics. Unfortunately, management teams often assume they can leapfrog best practices for basic data analytics by directly adopting advanced technologies such as ML/AI – setting themselves up for failure from the get-go. This presentation explains how to get basic data engineering and the right technology in place to create and maintain data pipelines so that you can solve problems with AI successfully.
The AI Mindset: Bridging Industry and Academic Perspectives SnapLogic
In this presentation, find out how Dr. Greg Benson brought ML into the SnapLogic platform and how to combine the strengths of industry practices and academic methodologies to achieve success with ML.
My slides on how to use cloud as a data platform at BigDataWeek 2013 Romania
http://www.eurocloud.ro/en/events/all-there-is-to-know-about-big-data/#.UXZFaUDvlVI
Data Wrangling and the Art of Big Data Discovery Inside Analysis
The Briefing Room with Dr. Robin Bloor, Trifacta and Zoomdata
Live Webcast March 10, 2015
Watch the Archive: https://bloorgroup.webex.com/bloorgroup/lsr.php?RCID=dd9fed3c7c476ae3a0f881ae6b53dcc5
Square pegs and round holes don't get along, which is one reason why traditional data management approaches simply won't work for Big Data. The variety and velocity of data types flying at us today require a new strategy for identifying, streamlining and utilizing information assets and processes. Decades-old technology won’t cut it – a combination of new tools and techniques must be used to enable effective discovery of insights in a timely fashion.
Register for this episode of The Briefing Room to hear veteran Analyst Dr. Robin Bloor explain why today's data landscape calls for a much different data management approach. He'll be briefed by Trifacta and Zoomdata, who will show how their technologies use a range of functionality – including machine learning – to help companies "wrangle" their data. They'll also demonstrate the optimal step-by-step process of working with new data types.
Visit InsideAnalysis.com for more information.
How Coyote Systems uses Dataiku's Data Science Studio to optimise... Le_GFII
Talk by Hugo Le Squeren, Sales Engineer at Dataiku, and Florian Servaux, Project Manager at Coyote.
DIXIT seminar: The new frontiers of “data intelligence”: content analytics, machine learning, predictive analytics
Abstract: As in telecom businesses, the COYOTE model is subscription-based. As such, retaining the subscriber base is a key success factor. To optimise its retention efforts and deepen its customer knowledge, COYOTE, in partnership with DATAIKU, cross-referenced the various data sources at its disposal. The result is a set of predictive analyses of customer behaviour.
Source: http://www.gfii.fr/fr/document/seminaire-dixit-les-nouvelles-frontieres-de-la-data-intelligence-content-analytics-machine-learning-predictif
"Machine Learning and Internet of Things, the future of medical prevention", ... Dataconomy Media
"Machine Learning and Internet of Things, the future of medical prevention", Pierre Gutierrez, Sr. Data Scientist at Dataiku
Watch more from Data Natives Berlin 2016 here: http://bit.ly/2fE1sEo
Visit the conference website to learn more: www.datanatives.io
Follow Data Natives:
https://www.facebook.com/DataNatives
https://twitter.com/DataNativesConf
https://www.youtube.com/c/DataNatives
Stay Connected to Data Natives by Email: Subscribe to our newsletter to get the news first about Data Natives 2017: http://bit.ly/1WMJAqS
About the Author:
Pierre Gutierrez is a senior data scientist at Dataiku. As a data science expert and consultant, Pierre has worked in diverse sectors such as e-business, retail, insurance or telcos. He has experience in various topics such as smart cities, fraud detection, recommender systems, or IoT.
Many think that a data science project is like a Kaggle competition. There are, however, big differences in the approach. This presentation is about carefully designing your evaluation scheme to avoid overfitting and unexpected performance in production.
From Labelling Open data images to building a private recommender system Pierre Gutierrez
Recommender systems are paramount for e-business companies. There is an increasing need to take into account all the user information to tailor the best product proposition. One such source of information is the content that the user actually sees: the visual of the product.
When it comes to hostels, some people can be more attracted by pictures of the room, the building or even the nearby beach.
In this talk, we will describe how we improved an e-business vacation retailer's recommender system using the content of images. We’ll explain how to leverage open datasets and pre-trained deep learning models to derive user taste information. This transfer learning approach enables companies to use state-of-the-art machine learning methods without having deep learning expertise.
BreizhJUG - Janvier 2014 - Big Data - Dataiku - Pages Jaunes Dataiku
Presentation made at Rennes in January for the handsome BreizhJUG. This is a mixed presentation on big data technologies, which covers topics such as: Why Hadoop? What next? Machine Learning for big data in practice.
Slides from the presentation at this NYC meetup: http://www.meetup.com/Data-Modeling/events/224554990/
I talked about how to model churn before even thinking about the machine learning model.
How to Build a Successful Data Team - Florian Douetteau @ PAPIs Connect PAPIs.io
As you walk into your office on Monday morning, before you've even had a chance to grab a cup of coffee, your CEO asks to see you. He's worried: both customer churn and fraudulent transactions have increased over the past 6 months. As Data Manager, you have 6 months to solve that.
As Data Manager, you know the challenges ahead:
Multitudes of technology choices to make
Building a team and solving the skill-set disconnect
Data can be deceiving...
Figuring out what the successful data product must be
The goal of this talk is to provide some perspective on these topics.
Florian has worked in the “data” field since '01, back when it was not yet big. He worked in successful startups in the search engine, advertising and gaming industries, holding various data or CTO roles. He started Dataiku in 2013, his first venture as a CEO, with the goal of alleviating the daily pains of data enthusiasts and letting them express their creativity.
Course 8 : How to start your big data project by Eric Rodriguez Betacowork
For more info about our Big Data courses, check out our website ➡️ https://www.betacowork.com/big-data/
---------
"Data is the new oil" - Many companies and professionals do not know how to use their data or are not aware of the added value they could gain from it.
It is in response to these problems that the project “Brussels: The Beating Heart of Big Data” was born.
This project, financed by the Region of Brussels Capital and organised by Betacowork, offers 3 training cycles of 10 courses on big data, at both beginner and advanced levels. These 3 cycles will be followed by a Hackathon weekend.
No prerequisites are required to start these courses. The aim of these courses is to familiarize participants with the principles of Big Data.
How to crack Big Data and Data Science rolesUpXAcademy
How to crack Big Data and Data Science roles is the flagship event of UpX Academy. This slide was used for the event on 10th Sept that was attended by hundreds of participants globally.
Data Science is a form of science that focuses on dealing with huge chunks of data by using modern data analysis tools and techniques to discover hidden patterns, meaningful insights, and make critical business decisions.
A Data Science professional has to use complicated machine learning algorithms to develop predictive models. The data used in the analysis may come from multiple sources in different formats.
From Lab to Factory: Or how to turn data into value Peadar Coyle
We've all heard of 'big data' or data science, but how do we convert these trends into actual business value? I share case studies and technology tips, and talk about the challenges of the data science process. This is all based on two years of in-the-field research on deploying models and going from prototypes to production.
These are slides from my talk at PyCon Ireland 2015
The buzzword of the past year is “Data Science”. But what does it really mean? What does a “Data Scientist” do? What tools does Microsoft provide? And what other tools are there beyond Microsoft?
Speaker: Venkatesh Umaashankar
LinkedIn: https://www.linkedin.com/in/venkateshumaashankar/
What will be discussed?
What is Data Science?
Types of data scientists
What makes a Data Science Team? Who are its members?
Why does a DS team need Full Stack Developer?
Who should lead the DS Team?
Building a Data Science team in a Startup vs. Enterprise
Case studies on:
Evolution Of Airbnb’s DS Team
How Facebook on-boards DS team and trains them
Apple’s Acqui-hiring Strategy to build DS team
Spotify -‘Center of Excellence’ Model
Who should attend?
Managers
Technical Leaders who want to get started with Data Science
Analytics-Enabled Experiences: The New Secret Weapon Databricks
Tracking and analyzing how our individual products come together has always been an elusive problem for Steelcase. Our problem can be thought of in the following way: “we know how many Lego pieces we sell, yet we don’t know what Lego set our customers buy.” The Data Science team took over this initiative, which resulted in an evolution of our analytics journey. It is a story of innovation, resilience, agility and grit.
The effects of the COVID-19 pandemic on corporate America put the spotlight on office furniture manufacturers to find ways to make the office safe again. The team would never have imagined how relevant our work on product application analytics would become. Product application analytics became an industry priority overnight.
The proposal presented this year is the story of how data science is helping corporations bring people back to the office and set the path to lead the reinvention of the office space.
After groundbreaking milestones to overcome technical challenges, the most important question is: What do we do with this? How do we scale this? How do we turn this opportunity into a true competitive advantage? The response: stop thinking about this work as a data science project and start to think about this as an analytics-enabled experience.
During our session we will cover the technical challenges that we overcame as a team to set up a pipeline that ingests semi-structured and unstructured data at scale, performs analytics and produces digital experiences for multiple users.
This presentation will be particularly insightful for Data Scientists, Data Engineers and analytics leaders who are seeking to better understand how to augment the value of data for their organization.
Drinking from the Fire Hose: Practical Approaches to Big Data Preparation and... Inside Analysis
The Briefing Room with Robin Bloor and Pervasive Software
Slides from the Live Webcast on May 1, 2012
The old methods of delivering data for analysts and other business users will simply not scale to meet new demands. Hadoop is rapidly emerging as a powerful and economic platform for storing and processing Big Data. And yet, the biggest obstacle to implementing Hadoop solutions is the scarcity of Hadoop programming skills.
Check out this episode of The Briefing Room to learn from veteran Analyst Robin Bloor, who will explain why modern information architectures must embrace the new, massively parallel world of computing as it relates to several enterprise roles: traditional business analysts, data scientists, and line-of-business workers. He'll be briefed by David Inbar and Jim Falgout of Pervasive Software, who will explain how Pervasive RushAnalyzer™ was designed to accommodate the new reality of Big Data.
For more information visit: http://www.insideanalysis.com
Watch us on YouTube: http://www.youtube.com/playlist?list=PL5EE76E2EEEC8CF9E
Big Data for Data Scientists - Info Session WeCloudData
In this talk, WeCloudData introduces the Hadoop/Spark ecosystem and how businesses use big data tools and platforms. For more detail about WeCloudData's big data for data scientist course please visit: https://weclouddata.com/data-science/
Industry of Things World - Berlin 19-09-16 Boris Adryan
This talk makes the case for a measured use of big data pipelines and analytics methods based on the specific business case: one size doesn't fit all. Rather than buying the fastest stack and the most hyped methods, practitioners interested in analytics for Internet-of-Things deployments can save a lot of money by asking themselves a few questions that I lay out in the talk.
Similar to Dataiku, Pitch Data Innovation Night, Boston, September 16th
Applied Data Science Part 3: Getting dirty; data preparation and feature crea...Dataiku
In our 3rd applied machine learning online course, we'll dive into different methods for data preparation, including handling missing values, dummification and rescaling.
Applied Data Science Course Part 2: the data science workflow and basic model...Dataiku
In the second part of our applied machine learning online course, you'll get an overview of the different steps in the data science workflow as well as a deep dive in 3 basic types of models: linear, tree-based and clustering.
Before Kaggle : from a business goal to a Machine Learning problem Dataiku
Many think that a Data Science project is like a Kaggle competition. There are, however, big differences in the approach. This presentation is about carefully designing your evaluation scheme to avoid overfitting and unexpected performance in production.
This is a presentation by Pierre Gutierrez (Dataiku’s data scientist).
Find the full joint presentation by Dataiku and Coyote on "Creating value from data".
This presentation was given at the Symposium of June 4, 2015, organized by the Club Urba-EA and the Club Pilotes de Processus.
More information at www.dataiku.com
Dataiku - Big data paris 2015 - A Hybrid Platform, a Hybrid Team Dataiku
Between traditional Business Intelligence and "Big Data" approaches, many companies need to innovate and work in a hybrid manner. How and with what tools can business and technical profiles collaborate productively together? Florian Douetteau, Dataiku's CEO, answers these questions.
Dataiku at SF DataMining Meetup - Kaggle Yandex Challenge Dataiku
This is a presentation given on 13th August 2014 at the SF Data Mining Meetup at Trulia. It's about Dataiku and the Kaggle Personalized Web Search Ranking challenge sponsored by Yandex.
Dataiku big data paris - the rise of the hadoop ecosystem Dataiku
Snapshot of the Hadoop ecosystem at the beginning of 2014, with the rise of real-time and in-memory distributed processing frameworks that complement and supplant the MapReduce paradigm.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -... DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
Accelerate your Kubernetes clusters with Varnish Caching Thijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova... Ramesh Iyer
In today's fast-changing business world, companies that adapt and embrace new ideas often need help to keep up with the competition. However, fostering a culture of innovation takes much work. It takes vision, leadership and a willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at each stage.
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti... Jeffrey Haguewood
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on the notifications, alerts, and approval requests using Slack for Bonterra Impact Management. The solutions covered in this webinar can also be deployed for Microsoft Teams.
Interested in deploying notification automations for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
PHP Frameworks: I want to break free (IPC Berlin 2024) Ralf Eggert
In this presentation, we examine the challenges and limitations of relying too heavily on PHP frameworks in web development. We discuss the history of PHP and its frameworks to understand how this dependence has evolved. The focus will be on providing concrete tips and strategies to reduce reliance on these frameworks, based on real-world examples and practical considerations. The goal is to equip developers with the skills and knowledge to create more flexible and future-proof web applications. We'll explore the importance of maintaining autonomy in a rapidly changing tech landscape and how to make informed decisions in PHP development.
This talk is aimed at encouraging a more independent approach to using PHP frameworks, moving towards a more flexible and future-proof approach to PHP development.
State of ICS and IoT Cyber Threat Landscape Report 2024 preview Prayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio, cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors, and newer malware including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024 Tobias Schneck
As AI technology is pushing into IT, I was wondering, as an “infrastructure container Kubernetes guy”, how this fancy AI technology gets managed from an infrastructure operations view. Is it possible to apply our lovely cloud native principles as well? What benefits could both technologies bring to each other?
Let me take these questions and provide you with a short journey through existing deployment models and use cases for AI software. Using practical examples, we discuss what cloud/on-premise strategy we may need for applying it to our own infrastructure to get it to work from an enterprise perspective. I want to give an overview of infrastructure requirements and technologies, and of what could be beneficial to or limiting for your AI use cases in an enterprise environment. An interactive demo will give you some insights into which approaches I have already got working for real.
Search and Society: Reimagining Information Access for Radical Futures Bhaskar Mitra
The field of information retrieval (IR) is currently undergoing a transformative shift, at least partly due to the emerging applications of generative AI to information access. In this talk, we will deliberate on the sociotechnical implications of generative AI for information access. We will argue that there is both a critical necessity and an exciting opportunity for the IR community to re-center our research agendas on societal needs while dismantling the artificial separation between the work on fairness, accountability, transparency, and ethics in IR and the rest of IR research. Instead of adopting a reactionary strategy of trying to mitigate potential social harms from emerging technologies, the community should aim to proactively set the research agenda for the kinds of systems we should build, inspired by diverse, explicitly stated sociotechnical imaginaries. The sociotechnical imaginaries that underpin the design and development of information access technologies need to be explicitly articulated, and we need to develop theories of change in the context of these diverse perspectives. Our guiding future imaginaries must be informed by other academic fields, such as democratic theory and critical theory, and should be co-developed with social science scholars, legal scholars, civil rights and social justice activists, and artists, among others.
Neuro-symbolic is not enough, we need neuro-*semantic* Frank van Harmelen
Neuro-symbolic (NeSy) AI is on the rise. However, simply applying machine learning to just any symbolic structure is not sufficient to really harvest the gains of NeSy. These will only be gained when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this is illustrated with link prediction over knowledge graphs, but the argument is general.
"Impact of front-end architecture on development cost", Viktor Turskyi Fwdays
I have heard many times that architecture is not important for the front-end. Also, many times I have seen how developers implement features on the front-end just following the standard rules for a framework and think that this is enough to successfully launch the project, and then the project fails. How to prevent this and what approach to choose? I have launched dozens of complex projects and during the talk we will analyze which approaches have worked for me and which have not.
7. Pain points
• Data preparation is time-consuming
• Machine learning is hard to understand
• Insights and models (almost) never reach production
8. Data Science Studio
• A democratic & ready to use Data Science Studio to start innovating with data!
Ready to Use Data Science Platform
Common playground for innovation
Accessible Statistics & Machine Learning for everyone
Handle real-life data
9. Data Science Studio
Visual and Interactive Data Preparation
For Data Cleaners
Guided Machine Learning
For non Machine Learning Experts
Production ready
For Data Leak Fixers
10. Dataiku at a glance
• Founded in 2013 by Data and Search Engine veterans
• From “data” and “haïku”
“data can be big
solution would be small
feel the hot wind”
• 1 goal: make Data Science accessible to anyone!
Contact: marc.batty@dataiku.com / @battymarc
Editor's Notes
Good evening everyone. I’m Marc Batty, cofounder of DATAIKU. I would like to speak about Data Science tonight. Even though Data Scientist is a buzzword these days, almost nobody knows what they do!
You may think about the Machine Learning expert. He is going to answer all your business questions and maybe save the world.
But we mostly see a lot of Data Cleaners, and they’re not quite happy about their jobs.
Or also the Data Leak Fixer, you know, when you have to do all the plumbing between all your databases and Hadoop clusters.
And even the Data Waiter, waiting for his endless Hadoop job to finish before getting the first insight on his data.
So the question is …
They all have this in common: They spend too much time preparing their data to go from raw data to usable data. Machine learning is hard to understand if you don’t have a PhD in statistics. In most companies, insights and models (almost) never reach production because it’s hard to integrate all the required big data technologies.
So at DATAIKU we built a Data Science Studio. It’s a ready to use Data Science platform with all the tools you need to create your Data Science Apps. It’s accessible, so you don’t need to be an experienced Data Scientist to start building models. It’s a common playground for your team, who can share datasets, models and insights.
In our studio we’ve got a whole range of tools to help all Data Scientists be more productive. Visual Data Preparation for Data Cleaners, for instant feedback. Guided Machine Learning for non Machine Learning experts, to quickly start building models. Production tools to integrate all the required Big Data technologies. Now Data Scientists can focus on being innovative and creative with their data.
Dataiku has 1 goal: make Data Science accessible to anyone. I’ll be happy to continue this discussion and show you a demo after this pitch, so don’t hesitate to come see me. Thank you very much.