Dataiku - google cloud platform roadshow - october 2013


This document discusses Hal's need for a big data platform at his company Dim's Private Showroom. It outlines Hal's wishes to better understand customer behavior, determine which products to feature, and solve data and computing challenges. The document then introduces Dataiku and its open source data tracking and mining platform using Google Cloud and Hadoop. Finally, it provides an example project timeline and discusses early successes including improved report times and optimization of marketing channels.

Data Science Studio
19 customers
Founded in January 2013
Data Science For Everyone

(big) data(s)
+ machine learning
+ for practical applications
= Data Science

The Project
(c) Dataiku 2013 - Confidential
Hal Alowne
BI Manager
Dim’s Private Showroom
Dim Sum
CEO & Founder
Dim’s Private Showroom
Medium-size e-commerce
•  $100M revenue
•  1 Data Analyst
Big Guys
$10B+ revenue
100+ Data Scientists
Hey Hal! We need a big data platform, like the big guys! Let’s just do as they do!

Hal Wish #1 
Global Customer Value Funnel
SEO
NewsLetter
Display
Retargeting
Display
AdWords Marketplace
Direct Sales
Delivery
View Basket
Support
Returns
$
$
$ $
Orders
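
One way to read the funnel above is as a per-channel tally: each order is credited to the channel that brought the customer in, and returns are subtracted. The following is only a minimal sketch of that idea. The record layout, the single-channel attribution rule and all figures are illustrative assumptions, not data from the project.

```python
from collections import defaultdict

# Hypothetical order records: (acquisition channel, order amount, returned amount).
ORDERS = [
    ("SEO", 120.0, 0.0),
    ("Newsletter", 80.0, 80.0),
    ("AdWords", 200.0, 50.0),
    ("SEO", 60.0, 0.0),
]


def net_value_by_channel(orders):
    """Sum order amounts minus returns for each acquisition channel."""
    totals = defaultdict(float)
    for channel, amount, returned in orders:
        totals[channel] += amount - returned
    return dict(totals)


if __name__ == "__main__":
    for channel, value in sorted(net_value_by_channel(ORDERS).items()):
        print(f"{channel}: {value:.2f}")
```

A real funnel would also need delivery and support costs per order, which this sketch leaves out.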

Hal Wish #2 
Why do people drop their basket?
9/30/13 5
Basket
Payment refused
Credit Refused
Cheaper elsewhere?
Delivery costs?
Waiting for Xmas?
ACTION

Hal Wish #3 
Which product to put on top?
Original: Most Popular on top
Better: Machine Learning Score (age/discount/margin…)
Advanced: Machine Learning Score + Personalization
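
The "Machine Learning Score" option can be pictured as a function of the product features named on the slide (age, discount, margin), with products sorted by that score. This is a minimal sketch under stated assumptions: the linear form, the weights and the catalog are hypothetical stand-ins for whatever model would actually be trained, and personalization is not covered.

```python
from dataclasses import dataclass


@dataclass
class Product:
    name: str
    age_days: int    # days since the product was listed
    discount: float  # fraction of list price, 0.0 to 1.0
    margin: float    # fraction of sale price, 0.0 to 1.0


# Hypothetical weights; in practice they would be learned from past sales.
WEIGHTS = {"age_days": -0.01, "discount": 2.0, "margin": 1.5}


def score(p: Product) -> float:
    """Linear score over the three features named on the slide."""
    return (WEIGHTS["age_days"] * p.age_days
            + WEIGHTS["discount"] * p.discount
            + WEIGHTS["margin"] * p.margin)


def rank(products):
    """Return products ordered best-first by score."""
    return sorted(products, key=score, reverse=True)


# Illustrative catalog, not project data.
CATALOG = [
    Product("scarf", age_days=30, discount=0.10, margin=0.40),
    Product("boots", age_days=5, discount=0.30, margin=0.20),
    Product("coat", age_days=60, discount=0.50, margin=0.10),
]

if __name__ == "__main__":
    for p in rank(CATALOG):
        print(f"{p.name}: {score(p):.2f}")
```

Compared with "most popular on top", a score of this kind lets a new, well-discounted product outrank an older best-seller.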

Partner Data Spaghetti
Mailing Partner
DMP
Partners
Mail Optimizer
Retargeter
Market Data Providers
Social Networks

Databases Are Full
1 TB BI Database
20 TB BI Database
Any new computing job takes > 1 day
NEED FOR SCALE

Architecture Bingo
BI, Real-Time, Batch, Real Real-Time
Simple Queries, Statistics, Machine Learning
Hive, Pig, Spark, MongoDB, ElasticSearch, Cascading, R

Hadoop
Ceph
Sphere
Cassandra
Spark
Scikit-Learn
Mahout
WEKA
MLBase
RapidMiner
Pandas
D3
Crossfilter
InfiniDB
LucidDB
Impala
Elastic Search
SOLR
MongoDB
Riak
Membase
Pig
Hive
Cascading
Talend
Machine Learning Mystery Land!
Scalability Central!
NoSQL-Slavia!
SQL Columnar Republic!
Visualization County!
Data Cleanup Wasteland!
Statistician Old House!
R

Hal’s Bingo!
HADOOP
Google Cloud Platform
Dataiku

Step 1
Get your own data
Dataiku Open Source Web Tracker (WT1)
•  Apache License
•  JavaScript & IO
•  Writes directly to Google Cloud Storage
•  Full Java, Easy To Deploy
Silent at night
Autoscales during summer and winter sales
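
To make "write directly to Google Cloud Storage" more concrete, the sketch below shows one way tracking events could be batched before being stored: newline-delimited JSON, filed under an hour-partitioned object name. Everything here is an assumption for illustration. The event fields, the `tracker/` prefix and the naming scheme are hypothetical and do not describe WT1's actual output format, and the upload itself is left out.

```python
import json
from datetime import datetime, timezone


def object_name(ts: datetime, prefix: str = "tracker") -> str:
    """Hypothetical hour-partitioned object name for a batch of events."""
    return f"{prefix}/{ts:%Y/%m/%d/%H}/events.json"


def serialize(events) -> str:
    """Encode events as newline-delimited JSON, one event per line."""
    return "\n".join(json.dumps(e, sort_keys=True) for e in events)


# Illustrative events; field names are assumptions.
EVENTS = [
    {"visitor": "v1", "type": "page_view", "page": "/sale"},
    {"visitor": "v1", "type": "add_to_basket", "sku": "boots"},
]
BATCH_TIME = datetime(2013, 9, 30, 14, 5, tzinfo=timezone.utc)

if __name__ == "__main__":
    print(object_name(BATCH_TIME))
    print(serialize(EVENTS))
```

Partitioning by hour keeps each object small and lets a downstream batch job pick up only the hours it has not processed yet.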

Step 2 
Mix All Your Data
4 VMs on GCE
Tracking Data
Internal Data
Partner Data
Data Science Studio
Pig
Hive
HADOOP
auto-sync to BigQuery
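
Mixing tracking, internal and partner data comes down to joining records on a shared key. On the slide this work is done with Pig and Hive on Hadoop; the plain-Python sketch below only illustrates the join itself. Using a customer id as the key, and the field names shown, are assumptions made for the example.

```python
def mix(tracking, internal, partner):
    """Merge per-customer records from three sources into one record each."""
    merged = {}
    for source in (tracking, internal, partner):
        for customer_id, fields in source.items():
            merged.setdefault(customer_id, {}).update(fields)
    return merged


# Illustrative inputs keyed by a hypothetical customer id.
TRACKING = {"c1": {"page_views": 12}, "c2": {"page_views": 3}}
INTERNAL = {"c1": {"orders": 2}}
PARTNER = {"c2": {"email_opens": 5}}

if __name__ == "__main__":
    for customer_id, record in sorted(mix(TRACKING, INTERNAL, PARTNER).items()):
        print(customer_id, record)
```

Customers missing from one source still appear in the result, which matches the behaviour of a full outer join.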

Step 3 
Mine your Data
Built-in Predictive Models
Advanced Ad Hoc Models (R or Python)
Shared Web-Based Data Mining Platform

Typical Project Calendar
•  January
◦  Choose partner / Set up the architecture
•  February
◦  Initial deployment: 4 TB
◦  Replace BI
•  May
◦  New applications (SEO, …)
•  September
◦  Scale deployment to 15 TB
◦  Integrate all channels

Some Successes for the Project
•  Enhance Daily Report Availability
◦  Previous architecture
–  Between H+17 and H+26 (!)
◦  Hadoop on GCE
–  Between H+3 and H+7
•  +21% Email Channel Optimization
•  SEO plan optimization
•  and a dozen BI-style “apps”
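
The report-availability figures can be turned into hours gained by comparing the two windows end to end. A small worked calculation, assuming that "H+n" means n hours after the reference time H and that the earliest and latest bounds of each window correspond to each other:

```python
# Availability windows from the slide, in hours after H.
PREVIOUS = (17, 26)
HADOOP_ON_GCE = (3, 7)


def hours_gained(before, after):
    """Difference between matching ends of two (earliest, latest) windows."""
    return (before[0] - after[0], before[1] - after[1])


if __name__ == "__main__":
    earliest, latest = hours_gained(PREVIOUS, HADOOP_ON_GCE)
    print(f"Reports arrive {earliest} to {latest} hours sooner")
```

Under that reading, daily reports arrive 14 to 19 hours sooner than with the previous architecture.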

Thank you!
Follow us on Twitter: @dataiku
Ask any big data question: florian.douetteau@dataiku.com


Recently uploaded (20)

Designing Great Products: The Power of Design and Leadership by Chief Designe...

Leading Change strategies and insights for effective change management pdf 1.pdf

Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...

"Impact of front-end architecture on development cost", Viktor Turskyi

GraphRAG is All You need? LLM & Knowledge Graph

Mission to Decommission: Importance of Decommissioning Products to Increase E...

FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf

LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...

PHP Frameworks: I want to break free (IPC Berlin 2024)

ODC, Data Fabric and Architecture User Group

De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...

GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...

Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality

Key Trends Shaping the Future of Infrastructure.pdf

Accelerate your Kubernetes clusters with Varnish Caching

The Future of Platform Engineering

Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...

FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf

Knowledge engineering: from people to machines and back

Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024

Dataiku - google cloud platform roadshow - october 2013

1. Data Science Studio 19 customers Founded in January 2013 Data Science For Everyone

2. (big) data(s) + machine learning + for practical applications = Data Science

3. The Project (c) Dataiku 2013 - Confidential Hal Alowne BI Manager Dim’s Private Showroom Dim Sum CEO & Founder Dim’s Private Showroom Medium size e-commerce •  100M$ revenue •  1 Data Analyst Big Guys $10B + revenue 100+ Data Scientists Hey Hal ! We need a big data platform, like the big guys! Let’s just do as they do!

4. Hal Wish #1  Global Customer Value Funnel SEO NewsLetter Display Retargeting Display AdWords Marketplace Direct Sales Delivery View Basket Support Returns $ $ $ $ Orders

5. Hal Wish #2: Why do people drop their basket? 9/30/13. Basket: Payment refused, Credit refused, Cheaper elsewhere? Delivery costs? Wait for Xmas? ACTION

6. Hal Wish #3: What product to put on top? Original: Most Popular on top. Better: Machine Learning Score (age/discount/margin…). Advanced: Machine Learning Score + Personalization.

7. Why is it so complicated?

8. Partner Data Spaghetti Mailing Partner DMP Partnerz Mail Optimizer Retargeter Market Data Providers Social z Networks

9. Databases are Full. 1 TB BI Database, 20 TB BI Database. Any new computing job takes > 1 day. NEED FOR SCALE

10. Architecture Bingo. BI Real-Time, Batch, Real Real-Time. Simple Queries, Statistics, Machine Learning. Hive, Pig, Spark, MongoDB, ElasticSearch, Cascading, R

11. Hadoop Ceph Sphere Cassandra Spark Scikit-Learn Mahout WEKA MLBase RapidMiner Panda D3 Crossfilter InfiniDB LucidDB Impala Elastic Search SOLR MongoDB Riak Membase Pig Hive Cascading Talend R. Machine Learning Mystery Land! Scalability Central! NoSQL-Slavia! SQL Columnar Republic! Visualization County! Data Cleanup Wasteland! Statistician Old House!

12. Hal’s Bingo! HADOOP, Google Cloud Platform, Dataiku

13. Step 1: Get your own data
Dataiku Open Source Web Tracker (WT1)
• Apache License
• Javascript & IO
• Write directly to Google Cloud Storage
• Full Java, Easy To Deploy
Silent in night
Autoscale during Sales summer and winter

14. Step 2: Mix All Your Data. 4 VMs on GCE. Tracking Data, Internal Data, Partner Data. Data Science Studio, Pig, Hive, HADOOP. Auto-sync to BigQuery.

15. Step 3: Mine your Data. Builtin Predictive Models. Advanced Adhoc Models (R or Python). Shared Web Based Data Mining Platform.

16. Typical Project Calendar
• January
  ◦ Choose Partner / Setup the architecture
• February
  ◦ Initial Deployment: 4TB
  ◦ Replace BI
• May
  ◦ New Applications (SEO, …)
• September
  ◦ Scale Deployment to 15TB
  ◦ Integrate all channels

17. Some Success For the Project
• Enhance Daily Report Availability
  ◦ Previous architecture
    – Between H+17 and H+26 (!)
  ◦ Hadoop on GCE
    – Between H+3 and H+7
• +21% Email Channel Optimization
• SEO plan optimization
• and a dozen BI Style “apps”

18. Thank you! Follow us on twitter @dataiku. Ask any big data question: florian.douetteau@dataiku.com
