This document discusses Hal's need for a big data platform at his company Dim's Private Showroom. It outlines Hal's wishes to better understand customer behavior, determine which products to feature, and solve data and computing challenges. The document then introduces Dataiku and its open source data tracking and mining platform using Google Cloud and Hadoop. Finally, it provides an example project timeline and discusses early successes including improved report times and optimization of marketing channels.
Dataiku productive application to production - pap is may 2015 Dataiku
Beyond Predictive Analytics: Deploying apps to production and keeping them improving
Some smart companies have been putting predictive applications in production for decades. Yet, whether because of a lack of sharing or a lack of generality, there is still no single, obvious way to put a predictive application in production today.
As a consequence, for most companies, transitioning analytics from development to production is still “the next frontier”.
Behind the single word “production” lies a great number of questions: what exactly do you put in production: data, model, code, or all three? Who is responsible for maintenance and quality checks over time: business, tech, or both? How can I make my predictive app continuously improve and check that it delivers the promised business value over time? What are the best practices for maintenance and updates, by the way? Will my data scientists keep working after the first development, or should I lay half of them off? And so on.
Let’s make a small analogy with the development of web sites in the ’90s and early ’00s:
Back then, the winners were not necessarily the web sites with an amazing design, but the ones that had clearly made the necessary efforts and had a robust way to put their web site reliably in production.
Today, every web developer can enjoy the comfort of Heroku, Amazon, GitHub, Docker, Angular, Bootstrap … and so we forget. How long before we get the same comfort in the predictive world?
Dataiku - Big data paris 2015 - A Hybrid Platform, a Hybrid Team Dataiku
Between traditional Business Intelligence and "Big Data" approaches, many companies need to innovate and work in a hybrid manner. How and with what tools can business and technical profiles collaborate productively together? Florian Douetteau, Dataiku's CEO, answers these questions.
The 3 Key Barriers Keeping Companies from Deploying Data Products Dataiku
Getting from raw data to deploying data-driven solutions requires technology, data, and people. All of which exist. So why aren’t we seeing more truly data-driven companies: what's missing and why? During Strata Hadoop World Singapore 2015, Pauline Brown, Director of Marketing at Dataiku, explains how lack of collaboration is what is keeping companies from building and deploying data products effectively. Learn more about Dataiku and Data Science Studio: www.dataiku.com
BreizhJUG - Janvier 2014 - Big Data - Dataiku - Pages Jaunes Dataiku
Presentation made at Rennes in January for the handsome BreizhJUG. This is a mixed presentation on big data technologies, covering topics such as: Why Hadoop? What next? Machine learning for big data in practice.
Dataiku, Pitch at Data-Driven NYC, New York City, September 17th 2013 Dataiku
Our pitch at Data-Driven NYC meetup on September 17th (http://datadrivennyc.com).
Speaking about data scientists' pains and how Dataiku Data Science Studio can help them be more than Data Cleaners and Data Leak Fixers!
The Rise of the DataOps - Dataiku - J On the Beach 2016 Dataiku
Many organisations are creating groups dedicated to data. These groups have many names: Data Team, Data Labs, Analytics Teams…
But whatever the name, the success of those teams depends a lot on the quality of the data infrastructure and their ability to actually deploy data science applications in production.
In that regard, a new role of “DataOps” is emerging. Similar to DevOps for (web) development, DataOps is a merge between a data engineer and a platform administrator. Well versed in cluster administration and optimisation, a DataOps would also have a perspective on data quality and the relevance of predictive models.
Do you want to be a DataOps? We’ll discuss the role and its challenges during this talk.
How to Build a Successful Data Team - Florian Douetteau (@Dataiku) Dataiku
As you walk into your office on Monday morning, before you've even had a chance to grab a cup of coffee, your CEO asks to see you. He's worried: both customer churn and fraudulent transactions have increased over the past 6 months. As Data Manager, you have 6 months to solve this problem.
As Data Manager, you know the challenges ahead:
- Multitudes of technology choices to make
- Building a team and solving the skill-set disconnect
- Data can be deceiving...
- Figuring out what the successful data product must be
Florian has worked in the “data” field since ’01, back when it was not yet big. He has worked in successful startups in the search engine, advertising, and gaming industries, holding various data or CTO roles. He started Dataiku in 2013, his first venture as a CEO, with the goal of alleviating the daily pains encountered by data teams all around.
Applied Data Science Course Part 1: Concepts & your first ML model Dataiku
In this first course of our Applied Data Science online course series, you'll learn about the mindset shift of going from small to big data, basic definitions and concepts, and an overview of the data science workflow.
Back to Square One: Building a Data Science Team from Scratch Klaas Bosteels
Generally speaking, big data and data science originated in the west and are coming to Europe with a bit of a delay. There is at least one exception though: The London-based music discovery website Last.fm is a data company at heart and has been doing large-scale data processing and analysis for years. It started using Hadoop in early 2006, for instance, making it one of the earliest adopters worldwide. When I left Last.fm to join Massive Media, the social media company behind Netlog.com and Twoo.com, I basically moved from a data science forerunner to a newcomer. Massive Media had at least as much data to play with and tremendous potential, but they were not doing much with it yet. The data science team had to be built from the ground up and every step had to be argued for and justified along the way. Having done this exercise of evaluating everything I learned at Last.fm and starting over completely with a clean slate at Massive Media, I developed a pretty clear perspective on how to find good data scientists, what they should be doing, what tools they should be using, and how to organize them to work together efficiently as a team, which is precisely what I would like to share in this talk.
Implementing a machine learning solution from scratch requires a lot of resource investment before yielding results. It is tempting to look for off the shelf machine learning solutions that are easy to integrate within one’s product instead. In this talk, you will follow a real case example of how the need to solve a specific problem led to doing a benchmark on a series of machine learning services. You will learn how these services compare, and pick up some tips on how to conduct your own benchmarks along the way.
Inês Almeida is a machine learning enthusiast from Lisbon, Portugal, where she has given several talks on the topic, in particular on neural networks. Her goal is to share knowledge that is useful for newbies and experts alike. Inês has a Physics MSc. degree and currently works as a data scientist at Liquid Data Intelligence.
How to Build a Successful Data Team - Florian Douetteau @ PAPIs Connect PAPIs.io
As you walk into your office on Monday morning, before you've even had a chance to grab a cup of coffee, your CEO asks to see you. He's worried: both customer churn and fraudulent transactions have increased over the past 6 months. As Data Manager, you have 6 months to solve that.
As Data Manager, you know the challenges ahead:
- Multitudes of technology choices to make
- Building a team and solving the skill-set disconnect
- Data can be deceiving...
- Figuring out what the successful data product must be
The goal of this talk is to provide some perspective on these topics.
Florian has worked in the “data” field since ’01, back when it was not yet big. He has worked in successful startups in the search engine, advertising and gaming industries, holding various data or CTO roles. He started Dataiku in 2013, his first venture as a CEO, with the goal of alleviating the daily pains of data enthusiasts and letting them express their creativity.
PASS Summit Data Storytelling with R Power BI and AzureML Jen Stirrup
How can we use technology to help the organization make data-driven decision-making part of its organizational DNA, while retaining the context of the business as a whole? How can we imprint data in the culture of the organization and make it easily accessible to everyone? Microsoft directly empowers businesses to derive insights and value from little and big data, through its release of user-friendly analytics through Azure Machine Learning (ML) combined with its acquisition of Revolution Analytics. Power BI can be used to create compelling visual stories around the analysis so that the work is not left to the data consumer. Together, these technologies can be used to make data and analytics part of the organization's DNA.
There are no prerequisites, but attendees are welcome to follow along with the demo if they have an Azure ML and Power BI account and R installed. Files will be released before the session.
How the world of data analytics, science and insights is failing and how the principles from Agile, DevOps, and Lean are the way forward. #DataOps Given at DevOps Enterprise Summit 2019
Big data. Small data. All data. You have access to an ever-expanding volume of data inside the walls of your business and out across the web. The potential in data is endless – from predicting election results to preventing the spread of epidemics. But how can you use it to your advantage to help move your business forward?
Data is growing exponentially and it’s now possible to mine and unlock insights from data in new and unexpected ways. Empower your business to take advantage of this data by harnessing the rich capabilities of Microsoft SQL Server and the familiarity of Microsoft Office to help organize, analyze, and make sense of your data—no matter the size.
Knowledge Graphs - Journey to the Connected Enterprise - Data Strategy and An... Benjamin Nussbaum
We live in an era where the world is more connected than ever before and the trajectory is such that data relationships will only continue to increase with no signs of slowing down.
Connected data is the key to your business succeeding and growing in today’s connected world.
Leading enterprises will be the ones that utilize relationship-centric technologies to leverage connections from their internal operations and supply chain to their customer and user interactions. This ability to utilize connected data to understand all the nuanced relationships within their organization will propel them forward as they act on more holistic insights.
Every organization needs a knowledge graph because connected data is an essential foundation to advancing business. Knowledge graphs provide:
- Increased visibility between internal groups
- Efficiency gains
- Cross-functional data collaboration
- More complete and reliable business insights
- Better customer engagement
The live presentation and discussion can be found here: https://youtu.be/7vBdlXzhs_4
Additional reading on why connected data is beneficial: https://www.graphgrid.com/why-connected-data-is-more-useful/
Connected data solutions available by Benjamin and his team via GraphGrid and AtomRain: https://www.graphgrid.com and https://www.atomrain.com
Data Engineering and the Data Science Lifecycle Adam Doyle
Everyone wants to be a data scientist. Data modeling is the hottest thing since Tickle Me Elmo. But data scientists don’t work alone. They rely on data engineers to help with data acquisition and data shaping before their model can be developed. They rely on data engineers to deploy their model into production. Once the model is in production, the data engineer’s job isn’t done. The model must be monitored to make sure that it retains its predictive power. And when the model slips, the data engineer and the data scientist need to work together to correct it through retraining or remodeling.
Google Cloud Data Platform - Why Google for Data Analysis? Andreas Raible
Introduction to our Data Platform, from capture through processing and analysis to exploration.
The Google Cloud Platform products are based on our internal systems which are powering Google AdWords, Search, YouTube and our leading research in the field of real-time data analysis.
You can get access ($300 for 60 days) to our free trial through google.com/cloud
Case Study - Gordon Foods Delivers Fresh Data to the Cloud DATAVERSITY
The traditional ETL approach for moving data to the cloud is labor-intensive and costly, not to mention brittle and slow, draining organizations of time and resources that they just do not have.
In this webinar, you will hear from Gordon Food Service and how they sharpened their competitive edge by delivering the freshest data to Google Cloud and dished up a better customer experience through real-time data insights. You will discover how Qlik’s data integration platform enabled Gordon Food Service to successfully run their Data Modernization Analytics Program and build real-time analytic data pipelines, unlocking multiple data sources, to Google Cloud with simple yet powerful data delivery.
Register today and learn how Gordon Foods:
• Improved their customer experience
• Replaced slow custom replication scripts and sped up analytics
• Simplified and automated their real-time data streaming process
• Moved thousands of objects on a daily basis
Find out how your organization can breathe new life into your data in the cloud, stay ahead of changing demands while lowering over-reliance on resources, production time and costs.
Where the Warehouse Ends: A New Age of Information Access Inside Analysis
The Briefing Room with Barry Devlin and Composite Software
Live Webcast May 21, 2013
All good things must come to an end, and even though the data warehouse will remain a prominent force in the information age, the handwriting is all over the enterprise: the center of gravity is moving. Whether due to Big Data or real-time demands, Cloud computing or globalization, today's leading organizations have analytical needs that the warehouse simply cannot accommodate. That's why data virtualization continues to attract attention.
Register for this episode of The Briefing Room to hear veteran Analyst Barry Devlin explain why the traditional model for data warehousing is being outmoded by a range of more flexible methods for accessing and analyzing information assets. He'll be briefed by David Besemer of Composite Software who will discuss how his company's data virtualization platform can be used to provide access to all manner of information sources, including data warehouses, Big Data silos, as well as partner and public data sources on demand.
Visit: http://www.insideanalysis.com
Getting Started with Google Data Studio Chris Burgess
Presented at the Melbourne SEO Meetup in September 2019, this slide deck offers a broad overview of the Google Data Studio product. It includes a walk through of the main features, resources to help you learn more, as well as some tips to help you with your own custom dashboards and reports.
Hadoop for Business Intelligence Professionals Skillspeed
This is a presentation on Hadoop for BI Professionals who want to upgrade their career path to BIG Data technologies. Hadoop for Business Intelligence Professionals is a definite upgrade in terms of career growth, scope of worth and organization influence.
The PPT covers the following topics:
✓ What is BIG Data?
✓ What is Hadoop? Why is it so popular?
✓ Upgrading from BI to Hadoop
✓ Career Path
✓ Salary & Job Trends
✓ Hiring Companies
----------
Skillspeed is a live e-learning company focusing on high-technology courses. We provide live instructor led training in BIG Data & Hadoop featuring Realtime Projects, 24/7 Lifetime Support & 100% Placement Assistance.
Email: sales@skillspeed.com
Website: https://www.skillspeed.com
Social Data Week is a global summit to highlight the revolution happening in social data by bringing together industry leaders, experts, influencers and brands for thought-provoking discussions and exchange of ideas.
Modern Thinking: How Big Data and Cognitive are changing marketing strategy
By: Ismael Yuste, Strategic Cloud Engineer, Google Cloud
Presentation: Introduction to Google's Big Data solutions
The right architecture is key for any IT project. This is especially the case for big data projects, where there are no standard architectures which have proven their suitability over years. This session discusses the different Big Data Architectures which have evolved over time, including traditional Big Data Architecture, Streaming Analytics architecture as well as Lambda and Kappa architecture and presents the mapping of components from both Open Source as well as the Oracle stack onto these architectures.
The right architecture is key for any IT project. This is valid in the case for big data projects as well, but on the other hand there are not yet many standard architectures which have proven their suitability over years.
This session discusses different Big Data Architectures which have evolved over time, including traditional Big Data Architecture, Event Driven architecture as well as Lambda and Kappa architecture.
Each architecture is presented in a vendor- and technology-independent way using a standard architecture blueprint. In a second step, these architecture blueprints are used to show how a given architecture can support certain use cases and which popular open source technologies can help to implement a solution based on a given architecture.
BIG Data & Hadoop Applications in Logistics Skillspeed
Explore the applications of BIG Data & Hadoop in Logistics via Skillspeed.
BIG Data & Hadoop are a key differentiator in Logistics, especially in terms of optimizing back-end operations. Companies use them for delivery optimization, demand & inventory forecasting and simplifying distribution networks.
To get more details regarding BIG Data & Hadoop, please visit - www.SkillSpeed.com
Introduction to MapReduce | MapReduce Architecture | MapReduce Fundamentals Skillspeed
This Hadoop MapReduce tutorial will unravel MapReduce Programming, MapReduce Commands, MapReduce Fundamentals, Driver Class, Mapper Class, Reducer Class, Job Tracker & Task Tracker.
At the end, you'll have a strong knowledge regarding Hadoop MapReduce Basics.
PPT Agenda:
✓ Introduction to BIG Data & Hadoop
✓ What is MapReduce?
✓ MapReduce Data Flows
✓ MapReduce Programming
----------
What is MapReduce?
MapReduce is a programming framework for distributed processing of large data-sets via commodity computing clusters. It is based on the principle of parallel data processing, wherein data is broken into smaller blocks rather than processed as a single block. This ensures a faster, secure & scalable solution. MapReduce commands are based on Java.
----------
What are MapReduce Components?
It has the following components:
1. Combiner: The combiner collates all the data from the sample set based on your desired filters. For example, you can collate data based on day, week, month and year. After this, the data is prepared and sent for parallel processing.
2. Job Tracker: This allocates the data across multiple servers.
3. Task Tracker: This executes the program across various servers.
4. Reducer: It will isolate the desired output from across the multiple servers.
----------
Applications of MapReduce
1. Data Mining
2. Document Indexing
3. Business Intelligence
4. Predictive Modelling
5. Hypothesis Testing
----------
Skillspeed is a live e-learning company focusing on high-technology courses. We provide live instructor led training in BIG Data & Hadoop featuring Realtime Projects, 24/7 Lifetime Support & 100% Placement Assistance.
Email: sales@skillspeed.com
Website: https://www.skillspeed.com
Google Cloud Connect @ Korea
- Google Cloud Vision
- G Suite Product Roadmap
- Google Cloud Security
- Google Cloud Machine Learning
- G suite Customer Stories
Run your code serverlessly on Google's open cloudwesley chun
This is a half-hour technical seminar on Google support of the open source ecosystem, a quick high-level overview/review of cloud computing in general, and then focuses on serverless compute products in Google Cloud and how the platforms are more open than ever!
3. The Project
(c) Dataiku 2013 - Confidential
Hal Alowne
BI Manager
Dim’s Private Showroom
Dim Sum
CEO & Founder
Dim’s Private Showroom
Medium-size e-commerce
• $100M revenue
• 1 Data Analyst
Big Guys
• $10B+ revenue
• 100+ Data Scientists
Hey Hal! We need a big data platform, like the big guys!
Let’s just do as they do!
4. Hal Wish #1
Global Customer Value Funnel
SEO
NewsLetter
Display
Retargeting
Display
AdWords Marketplace
Direct Sales
Delivery
View Basket
Support
Returns
$
$
$ $
Orders
5. Hal Wish #2
Why do people drop their basket?
9/30/13 5
Basket
Payment refused
Credit Refused
Cheaper elsewhere ?
Delivery costs ?
Wait Xmas?
ACTION
6. Hal Wish #3
Which product to put on top?
Original
Most Popular on top
Better
Machine Learning Score
(age/discount/margin…)
Advanced
Machine Learning Score
+ Personalization
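The three ranking levels on this slide can be illustrated with a minimal sketch. The product fields, the weights, and the linear scoring formula below are assumptions made for illustration only; the slide names the inputs (age, discount, margin) but not the model, and a real system would presumably learn the score from data rather than hard-code it.

```python
from dataclasses import dataclass


@dataclass
class Product:
    name: str
    views: int        # popularity signal (hypothetical field)
    age_days: int     # days since the product went online
    discount: float   # between 0.0 and 1.0
    margin: float     # between 0.0 and 1.0


def popularity_rank(products):
    """'Original' level: most popular product on top."""
    return sorted(products, key=lambda p: p.views, reverse=True)


def score(product, affinity=0.0):
    """Hand-written stand-in for a learned score; the weights are assumptions."""
    freshness = 1.0 / (1.0 + product.age_days)
    return (0.2 * freshness
            + 0.3 * product.discount
            + 0.5 * product.margin
            + affinity)


def scored_rank(products, affinities=None):
    """'Better' level without affinities, 'Advanced' level with per-customer affinities."""
    affinities = affinities or {}
    return sorted(
        products,
        key=lambda p: score(p, affinities.get(p.name, 0.0)),
        reverse=True,
    )


catalog = [
    Product("scarf", views=900, age_days=30, discount=0.10, margin=0.20),
    Product("boots", views=400, age_days=2, discount=0.30, margin=0.50),
    Product("watch", views=100, age_days=9, discount=0.00, margin=0.40),
]

print([p.name for p in popularity_rank(catalog)])
print([p.name for p in scored_rank(catalog)])
print([p.name for p in scored_rank(catalog, {"watch": 0.5})])
```

Personalization is modeled here as a per-customer bonus added to the generic score, which is only one of several possible designs.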
13. Step 1
Get your own data
Dataiku Open Source Web Tracker (WT1)
• Apache License
• JavaScript & IO
• Write directly to Google Cloud Storage
• Full Java, Easy To Deploy
Silent at night
Autoscale during summer and winter sales
14. Step 2
Mix All Your Data
4 VMs on GCE
Tracking Data
Internal Data
Partner Data
Data Science Studio
Pig
Hive
HADOOP
auto-sync
to BigQuery
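As a rough illustration of what mixing tracking, internal, and partner data can mean in practice, here is a small sketch in plain Python. On the platform described in the slides this kind of join would run in Pig or Hive on Hadoop; the record layouts, field names, and the last-click attribution rule below are all assumptions, not something the deck specifies.

```python
from collections import defaultdict

# Hypothetical sample records; every field name here is an assumption.
tracking = [  # tracking data, assumed to be in time order
    {"visitor_id": "v1", "channel": "SEO"},
    {"visitor_id": "v2", "channel": "NewsLetter"},
    {"visitor_id": "v1", "channel": "AdWords"},
]
orders = [  # internal data
    {"visitor_id": "v1", "amount": 120.0},
    {"visitor_id": "v2", "amount": 45.0},
]
partner_costs = {"SEO": 0.0, "NewsLetter": 5.0, "AdWords": 30.0}  # partner data


def revenue_by_last_channel(tracking, orders):
    """Attribute each order to the last channel tracked for that visitor."""
    last_channel = {}
    for event in tracking:
        last_channel[event["visitor_id"]] = event["channel"]
    revenue = defaultdict(float)
    for order in orders:
        channel = last_channel.get(order["visitor_id"], "Direct")
        revenue[channel] += order["amount"]
    return dict(revenue)


def net_value_by_channel(revenue, costs):
    """Subtract the partner-reported cost from the attributed revenue."""
    return {ch: amount - costs.get(ch, 0.0) for ch, amount in revenue.items()}


revenue = revenue_by_last_channel(tracking, orders)
net = net_value_by_channel(revenue, partner_costs)
print(revenue)
print(net)
```

Only once the three sources share a key (here a hypothetical `visitor_id`) can a per-channel value such as the one in Hal's funnel be computed.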
15. Step 3
Mine your Data
Built-in Predictive Models
Advanced Ad Hoc Models
(R or Python)
Shared Web-Based
Data Mining
Platform
16. Typical Project Calendar
• January
◦ Choose Partner / Setup the architecture
• February
◦ Initial Deployment: 4TB
◦ Replace BI
• May
◦ New Applications (SEO, …)
• September
◦ Scale Deployment to 15TB
◦ Integrate all channels
17. Some Successes for the Project
• Enhance Daily Report Availability
◦ Previous architecture
– Between H+17 and H+26 (!)
◦ Hadoop on GCE
– Between H+3 and H+7
• +21% Email Channel Optimization
• SEO plan optimization
• and a dozen BI-style “apps”
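The report availability figures can be turned into a quick back-of-the-envelope comparison. The sketch below assumes that "H" is a common reference hour (for example the end of the reporting day) and that the two numbers on each line are the earliest and latest availability; the slide states neither explicitly.

```python
# Availability windows from the slide, in hours after the reference hour H.
previous = (17, 26)      # previous architecture
hadoop_on_gce = (3, 7)   # Hadoop on GCE


def midpoint(window):
    """Middle of an (earliest, latest) availability window."""
    earliest, latest = window
    return (earliest + latest) / 2


best_case_gain = previous[0] - hadoop_on_gce[0]      # hours gained on a good day
worst_case_gain = previous[1] - hadoop_on_gce[1]     # hours gained on a bad day
midpoint_gain = midpoint(previous) - midpoint(hadoop_on_gce)

print(best_case_gain, worst_case_gain, midpoint_gain)
```

Under these assumptions the daily report arrives between 14 and 19 hours earlier than before.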
18. Thank you!
Follow us on Twitter
@dataiku
Ask any big data question
florian.douetteau@dataiku.com