Anne-Sophie Roessler, International Business Developer at Dataiku presented "3 ways to Fail your Data Lab Implementation" as part of the Big Data, Berlin v 8.0 meetup organised on the 14th of July 2016 at the WeWork headquarters.
Gianluigi Vigano, Senior Architect and Fouad Teban, Regional Presales Manager at HPE, presented "Using advanced analytics functions of HPE Vertica for the following use cases: IoT, clickstream, machine data, integration with Hadoop & Kafka …" as part of the Big Data, Budapest v 3.0 meetup organised on the 19th of May 2016 at Skyscanner's headquarters.
Zsolt Várnai, Principal Software Engineer at Skyscanner, presented "The advantages of real-time monitoring in apps development" as part of the Big Data, Budapest v 3.0 meetup organised on the 19th of May 2016 at Skyscanner's headquarters.
"Data Pipelines for Small, Messy and Tedious Data", Vladislav Supalov, CAO & Co-Founder of Pivii Technologies
Watch videos from Data Natives Berlin 2016 here: http://bit.ly/2fE1sEo
Visit the conference website to learn more: www.datanatives.io
Follow Data Natives:
https://www.facebook.com/DataNatives
https://twitter.com/DataNativesConf
https://www.youtube.com/c/DataNatives
Stay Connected to Data Natives by Email: Subscribe to our newsletter to get the news first about Data Natives 2017: http://bit.ly/1WMJAqS
About the Author:
Vladislav is an entrepreneur, machine learning enthusiast, and DevOps geek. Currently, he is co-founding a startup, running a data engineering consulting business, traveling and writing on data-related topics.
Applied Data Science Part 3: Getting dirty; data preparation and feature creation - Dataiku
In our 3rd applied machine learning online course, we'll dive into different methods for data preparation, including handling missing values, dummification and rescaling.
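The three steps named above can be sketched without any libraries. In practice a toolkit such as pandas or scikit-learn would do this work, but the underlying logic is simple; the column values below are invented purely for illustration:

```python
# Minimal, dependency-free sketches of three common data-preparation steps.

def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]

def dummify(values):
    """One-hot encode a categorical column into {category: 0/1 column} pairs."""
    categories = sorted(set(values))
    return {c: [1 if v == c else 0 for v in values] for c in categories}

def rescale_min_max(values):
    """Rescale numeric values linearly onto the [0, 1] range."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

print(impute_mean([20, None, 40]))        # [20, 30.0, 40]
print(dummify(["red", "blue", "red"]))    # {'blue': [0, 1, 0], 'red': [1, 0, 1]}
print(rescale_min_max([10, 15, 20]))      # [0.0, 0.5, 1.0]
```

Each function maps one raw column to a model-ready column, which is exactly the shape of work the course covers.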
Counting Unique Users in Real-Time: Here's a Challenge for You! - DataWorks Summit
Finding the number of unique users among 10 billion events per day is challenging. In this session, we'll describe how re-architecting our data infrastructure around Druid and ThetaSketch enables our customers to obtain these insights in real time.
To put things into context, at NMC (Nielsen Marketing Cloud) we provide our customers (marketers and publishers) real-time analytics tools to profile their target audiences. Specifically, we provide them with the ability to see the number of unique users who meet a given criterion.
Historically, we used Elasticsearch to answer these types of questions; however, we encountered major scaling and stability issues.
In this presentation we will detail the journey of rebuilding our data infrastructure, including researching, benchmarking and productionizing a new technology, Druid, with ThetaSketch, to overcome the limitations we were facing.
We will also share guidelines and best practices for Druid.
Topics include:
* The need and possible solutions
* Intro to Druid and ThetaSketch
* How we use Druid
* Guidelines and pitfalls
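The core trick behind ThetaSketch can be illustrated with a tiny KMV ("k minimum values") estimator. This is a simplified sketch of the idea only, not Druid's or the DataSketches library's actual implementation:

```python
import hashlib

def _hash01(item):
    """Hash an item to a float uniformly distributed in [0, 1)."""
    h = hashlib.sha256(str(item).encode()).digest()
    return int.from_bytes(h[:8], "big") / 2**64

def kmv_estimate(stream, k=256):
    """KMV sketch: keep only the k smallest hash values seen.
    If the k-th smallest hash is t, the stream holds roughly
    (k - 1) / t distinct items, using O(k) memory regardless
    of stream size."""
    mins = sorted({_hash01(x) for x in stream})[:k]
    if len(mins) < k:  # fewer than k distinct items seen: count is exact
        return len(mins)
    return (k - 1) / mins[-1]

# 100,000 events generated by 10,000 distinct users, each seen ten times:
events = [f"user-{i % 10_000}" for i in range(100_000)]
print(round(kmv_estimate(events)))  # roughly 10,000, within a few percent
```

The same bounded-memory property is what lets a store like Druid answer unique-user queries over billions of events in real time, and sketches can also be merged across partitions.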
How a Media Data Platform Drives Real-time Insights & Analytics using Apache Spark - Databricks
Roularta is a leading publishing company in Belgium. As digital news and channels move at a rapid pace and contain massive volumes of data, Roularta decided in 2019 to invest in a Spark-based data platform to drive true real-time website analytics and unlock insights on previously untouched (big) data sources. In this talk we’ll first explain why and how Roularta embarked from a classical data warehouse to a Spark-based Lakehouse using Delta. We’ll outline the series of publishing & marketing use-cases done in the last 12 months and highlight for each use-case the advantages of Spark and how the team further tuned performance to truly deliver insights with high velocity.
The Virtualization of Clouds - The New Enterprise Data Architecture Opportunity - Denodo
Watch full webinar here: https://bit.ly/3x7xVuR
Organizations worldwide are adopting a variety of public cloud service providers (e.g. AWS, Google, Microsoft), each with a portfolio of storage, compute, network, and security options. All of this creates significant challenges in managing a hybrid and multi-cloud enterprise architecture. Even worse is the impact on the governance and integration of data across clouds and physical infrastructure to support the broad array of analytics and operational requirements.
Can one public cloud provider meet all your needs today and in the future? How do you manage across the multiple public and private clouds you have today, wherever your data exists? And how would you manage and operate your multi-cloud and on-premises systems to gain value from your data in any of them? Mark Smith, Chief Research Officer at Ventana Research, will expound on the challenges and the path ahead for virtualizing and integrating your data and the clouds, setting an architectural path for success.
Data Science Day New York: Data Science: A Personal History - Cloudera, Inc.
Follow Jeff Hammerbacher's path from Facebook, where he built scalable systems on Hadoop, to co-founding Cloudera and building an organization that provides the leading Hadoop platform.
ML Infra @ Spotify: Lessons Learned - Romain Yon - NYC ML Meetup
Original event: https://www.meetup.com/NYC-Machine-Learning/events/256605862/
--
"Doing large scale ML in production is hard" – Everyone who's tried
This talk is focused on ML systems, especially the less obvious pitfalls that have caused us trouble at Spotify.
This talk assumes a certain level of familiarity with ML: you'll get the most out of it if you have some experience with applied ML, ideally on production systems.
Romain Yon is a Staff ML Engineer at Spotify. Over the years, Romain has worked on many of the core ML systems that power Spotify today (Music Recommendation, Catalog Quality, Search Ranking, Ads, ..).
During the past year, Romain has been mostly focusing on designing reusable ML Infrastructure that can be leveraged throughout Spotify.
Prior to Spotify, Romain co-founded the startup https://linkurio.us while getting his MSc in ML from Georgia Tech.
AzureDay - Introduction to Big Data Analytics - Łukasz Grala
AzureDay North 2016. Conference about cloud solutions.
What is analytics? What is Big Data? Why do we have Big Data in the cloud? What does Microsoft offer for Big Data analytics? How do you start with Big Data analytics or advanced analytics? This session introduces the fundamentals of Big Data and advanced analytics.
By Data Scientist as a Service
Building a Distributed Collaborative Data Pipeline with Apache Spark - Databricks
The year of COVID-19 pandemic has spotlighted as never before the many shortcomings of the world’s data management workflows. The lack of established ways to exchange and access data was a highly recognized contributing factor in our poor response to the pandemic. On multiple occasions we have witnessed how our poor practices around reproducibility and provenance have completely sidetracked major vaccine research efforts, prompting many calls for action from scientific and medical communities to address these problems.
With Enterprise data growing rapidly year over year, traditional analytics approaches have proven to be expensive and unyielding. The result is that a growing proportion of our data is unused “dark data”. How can we create the basis for a data driven organization? Enter the "perfect storm" of cloud data analytics tools and approaches.
Data Mesh in Practice: How Europe's Leading Online Platform for Fashion Goes ... - Databricks
The Data Lake paradigm is often considered the scalable successor of the more curated Data Warehouse approach when it comes to democratization of data. However, many who went out to build a centralized Data Lake came out with a data swamp of unclear responsibilities, a lack of data ownership, and sub-par data availability.
Analytical Systems Evolution: From Excel to Big Data Platforms and Data Lakes - Provectus
Maxim Tereschenko (BigData Lead, Provectus) with the talk "Analytical Systems Evolution - From Excel to Big Data Platforms and Data Lakes".
Description: Over the last ten years, analytical systems have changed dramatically. From Excel and data warehouses, we have come to Big Data platforms and data lakes. It is no longer fantasy to talk to an analytical system by voice or to wander among visualizations of the data in 3D glasses. In this talk, I want to trace this evolution, identify its main trends, and speculate about the future.
How OpenTable uses Big Data to impact growth by Raman Marya - Data Con LA
Abstract: We have created a variety of analytics solutions combining data from our Data Lake with a traditional DW: data APIs that are fed into the product to improve conversions, a churn-prediction algorithm that helps account managers focus on high-risk customers, and analytics as an edge to empower the sales team to win prospective customers.
Applied Data Science Course Part 2: the data science workflow and basic models - Dataiku
In the second part of our applied machine learning online course, you'll get an overview of the different steps in the data science workflow as well as a deep dive in 3 basic types of models: linear, tree-based and clustering.
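As a taste of the first of those model families, here is a minimal sketch of a linear model fit by ordinary least squares, with no libraries; the data points are invented for illustration:

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b, the simplest linear model:
    the slope is cov(x, y) / var(x), and the intercept makes the line
    pass through the point of means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return a, my - a * mx

# Points lying exactly on y = 2x + 1 recover slope 2 and intercept 1:
a, b = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(a, b)  # 2.0 1.0
```

Tree-based and clustering models trade this closed-form fit for recursive splitting and iterative centroid updates respectively, which is the contrast the course draws.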
Nadine Schöne, Dataiku. The Complete Data Value Chain in a Nutshell - IT Arena
Dr. Nadine Schöne is a Senior Solutions Architect at Dataiku in Berlin. In this role, she deals with all aspects of the data value chain for all users – including integration of data sources, ETL, cooperation, statistics, modelling, but also operationalization, monitoring, automatization and security during production. She regularly talks at conferences, holds webinars and writes articles.
Speech Overview:
How can you get the most out of your data – while staying flexible in your choice of infrastructure and without having to integrate a multitude of tools for the different personas involved? Maximizing the value you get out of your data is a necessity today. Looking at the whole picture as well as careful planning are the key for success. We will have a look at the complete data value chain from end to end: from the data stores, collaboration features, data preparation, visualization and automation capabilities, and external compute to scheduling, operationalization, monitoring and security.
Introducing Trillium DQ for Big Data: Powerful Profiling and Data Quality for... - Precisely
The advanced analytics and AI that run today’s businesses rely on a larger volume, and greater variety, of data. This data needs to be of the highest quality to ensure the best possible outcomes, but traditional data quality tools weren’t designed for today’s modern data environments.
That’s why we’ve developed Trillium DQ for Big Data -- an integrated product that delivers industry-leading data profiling and data quality at scale, in the cloud or on premises.
In this on-demand webcast, you will learn how Trillium DQ:
• Empowers data analysts to easily profile large, diverse data sources to discover new insights, uncover issues, and report on their findings – all without involving IT.
• Delivers best-in-class entity resolution to support mission-critical applications such as Customer 360, fraud detection, AML, and predictive analytics.
• Supports Cloud and hybrid architectures by providing consistent high-performance processing within critical time windows on all platforms.
• Keeps enterprise data lakes validated, clean, and trusted with the highest quality data – without technical expertise in big data or distributed architectures.
• Enables data quality monitoring based on targeted business rules for data governance and business insight
Building the Artificially Intelligent Enterprise - Databricks
This session looks at where we are today with data and analytics and what is needed to transition to the Artificially Intelligent Enterprise.
How do you mobilise developers to exploit what data scientists and business analysts have built? How do you align it all with business strategy to maximise business outcomes? How do you combine BI, predictive and prescriptive analytics, automation and reinforcement learning to get maximum value across the enterprise? What is the blueprint for building the artificially intelligent enterprise?
• Data and analytics – Where are we?
• Why is the journey only half-way done?
• 2021 and beyond – The new era of AI usage and not just build
• The requirement – event-driven, on-demand and automated analytics
• Operationalising what you build – DataOps, MLOps and RPA
• Mobilising the masses to integrate AI into processes – what needs to be done?
• Business strategy alignment – the guiding light to AI utilisation for high reward
• Agility step change – the shift to no-code integration of AI by citizen developers
• Recording decisions and analysing business impact
• Reinforcement learning – transitioning to continuous reward
Quicker Insights and Sustainable Business Agility Powered By Data Virtualization - Denodo
Watch full webinar here: https://bit.ly/3xj6fnm
Presented at Chief Data Officer Live 2021 A/NZ
The world is changing faster than ever, and for companies to compete and succeed they need to be agile, responding quickly to market changes and emerging opportunities. Data plays an integral role in achieving this business agility. However, given the complex nature of enterprise data architecture, finding and analysing data is an increasingly challenging task. Data virtualization is a modern data integration technique that integrates data in real time, without having to physically replicate it.
Watch on-demand this session to understand what data virtualization is and how it:
- Delivers data in real-time, and without replication
- Creates a logical architecture to provide a single view of truth
- Centralises the data governance and security framework
- Democratises data for faster decision making and business agility
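As a loose analogy for that logical layer (this illustrates the concept only; it is not how Denodo's product is implemented), a database view can present a single view of truth over several source tables at query time, without replicating any rows:

```python
import sqlite3

# Two "source" tables stand in for separate underlying systems.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE crm_customers (id INTEGER, name TEXT)")
db.execute("CREATE TABLE web_customers (id INTEGER, name TEXT)")
db.executemany("INSERT INTO crm_customers VALUES (?, ?)", [(1, "Ada"), (2, "Bob")])
db.executemany("INSERT INTO web_customers VALUES (?, ?)", [(3, "Cyd")])

# The logical layer: one view unifying both sources at query time,
# with no copy of the data made anywhere.
db.execute("""CREATE VIEW all_customers AS
              SELECT id, name FROM crm_customers
              UNION ALL
              SELECT id, name FROM web_customers""")
count = db.execute("SELECT COUNT(*) FROM all_customers").fetchone()[0]
print(count)  # 3
```

A data virtualization platform extends this idea across heterogeneous, remote sources and adds the governance and security controls described above on the single access layer.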
Data Science Operationalization: The Journey of Enterprise AI - Denodo
Watch full webinar here: https://bit.ly/3kVmYJl
As we move into a world driven by AI initiatives, we find ourselves facing new and diverse challenges when it comes to operationalization. Creating a solution and putting it into practice are certainly not the same thing. The challenges span various organizational and data facets. In many instances, data scientists may be working in silos, and connecting to live data may not always be possible. But how does one guarantee that a model developed in a silo is still relevant to live data? How can we manage the data flow and data access across the entire AI operationalization cycle?
Watch on-demand to explore:
- The journey and challenges of the Data Scientist
- How Denodo data virtualization with data movement streamlines operationalization
- The best practices and techniques when dealing with siloed data
- How customers have used data virtualization in their data science initiatives
ADV Slides: How to Improve Your Analytic Data Architecture Maturity - DATAVERSITY
Many organizations are immature when it comes to data use. The answer lies in delivering a greater level of insight from data, straight to the point of need. Enter: machine learning.
In this webinar, William will look at categories of organizational response to the challenge across strategy, architecture, modeling, processes, and ethics. Machine learning maturity levels tend to move in harmony across these categories. As a general principle of maturity models, you can’t skip levels in any category, nor can you advance in one category well beyond the others.
When it comes to ML, attaining and retaining momentum up the maturity model is paramount for success. You will ascend the model through concerted efforts delivering business wins using progressively more advanced elements of the model, thereby increasing your machine learning maturity. The model will evolve; no plateau is comfortable for long.
With ML maturity markers, sequencing, and tactics, this webinar provides a plan for how to build analytic Data Architecture maturity in your organization.
Implementing an Efficient Data Governance and Security Strategy with Data Virtualization - Denodo
Watch full webinar here: https://bit.ly/3lSwLyU
In the era of an explosion of information spread across different sources, data governance is a key component for guaranteeing the availability, usability, integrity, and security of that information. Likewise, the set of processes, roles, and policies it defines allows organizations to achieve their objectives while ensuring the efficient use of their data.
Data virtualization is one of the strategic tools for implementing and optimizing data governance. This technology allows companies to create a 360º view of their data and establish security controls and access policies across the entire infrastructure, regardless of the data's format or location. In this way, it brings together multiple data sources, makes them accessible from a single layer, and provides lineage capabilities to monitor changes to the data.
Join this webinar to learn:
- How to accelerate the integration of data from fragmented sources in internal and external systems and obtain a complete view of the information.
- How to enable, across the whole company, a single data-access layer with protection measures.
- How data virtualization provides the pillars for complying with current data protection regulations through auditing, a data catalog, and data security.
ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the Enterprise - DATAVERSITY
Many data scientists are well grounded in delivering results in the enterprise, but many come from outside – from academia, PhD programs, and research. They have the necessary technical skills, but those don't count until their product gets to production and into use. The speaker recently helped a struggling data scientist understand his organization and how to create success in it. That experience turned into this presentation, because many new data scientists struggle with the complexities of an enterprise.
Big Data LDN 2018: DATA MANAGEMENT AUTOMATION AND THE INFORMATION SUPPLY CHAIN - Matt Stubbs
Date: 14th November 2018
Location: Governance and MDM Theatre
Time: 10:30 - 11:00
Speaker: Mike Ferguson
Organisation: IBS
About: For most organisations today, data complexity has increased rapidly. In the area of operations, we now have cloud and on-premises OLTP systems with customers, partners and suppliers accessing these applications via APIs and mobile apps. In the area of analytics, we now have data warehouse, data marts, big data Hadoop systems, NoSQL databases, streaming data platforms, cloud storage, cloud data warehouses, and IoT-generated data being created at the edge. Also, the number of data sources is exploding as companies ingest more and more external data such as weather and open government data. Silos have also appeared everywhere as business users are buying in self-service data preparation tools without consideration for how these tools integrate with what IT is using to integrate data. Yet new regulations are demanding that we do a better job of governing data, and business executives are demanding more agility to remain competitive in a digital economy. So how can companies remain agile, reduce cost and reduce the time-to-value when data complexity is on the up?
In this session, Mike will discuss how companies can create an information supply chain to manufacture business-ready data and analytics to reduce time to value and improve agility while also getting data under control.
Introduction to Data Science (Data Summit, 2017) - Caserta
At DBTA's 2017 Data Summit in New York, NY, Caserta Founder & President, Joe Caserta, and Senior Architect, Bill Walrond, gave a pre-conference workshop presenting the ins and outs of data science. Data scientist has been dubbed the "sexiest" job of the 21st century, but it requires an understanding of many different elements of data analysis. This presentation dives into the fundamentals of data exploration, mining, and preparation, applying the principles of statistical modeling and data visualization in real-world applications.
Why Your Data Science Architecture Should Include a Data Virtualization Tool... - Denodo
Watch full webinar here: https://bit.ly/35FUn32
Presented at CDAO New Zealand
Advanced data science techniques, like machine learning, have proven to be extremely useful tools for deriving valuable insights from existing data. Platforms like Spark and complex libraries for R, Python, and Scala put advanced techniques at the fingertips of data scientists.
However, most architectures laid out to enable data scientists miss two key challenges:
- Data scientists spend most of their time looking for the right data and massaging it into a usable format
- Results and algorithms created by data scientists often stay out of the reach of regular data analysts and business users
Watch this session on-demand to understand how data virtualization offers an alternative that addresses these issues and can accelerate data acquisition and massaging. The session also includes a customer story on the use of machine learning with data virtualization.
How Is Data Governance Like an Amusement Park? - Denodo
Watch full webinar here: https://bit.ly/3Ab9gYq
Imagine arriving at an amusement park with your family and starting your day without the usual map that lets you plan which shows to see, which rides to go on, and where the children can and cannot ride... You probably won't get the most out of your day and will have missed many things. Some people like to go in blind and discover things little by little, but when we talk about business, going in blind can be fatal...
In the era of exploding information spread across different sources, data governance is key to guaranteeing the availability, usability, integrity, and security of that information. Likewise, the set of processes, roles, and policies it defines allows organizations to reach their goals while ensuring the efficient use of their data.
Data virtualization, a strategic tool for implementing and optimizing data governance, allows companies to create a 360º view of their data and establish security controls and access policies across the whole infrastructure, regardless of format or location. In this way, it brings together multiple data sources, makes them accessible from a single layer, and provides lineage capabilities to monitor changes in the data.
In this webinar you will learn how to:
- Accelerate the integration of data from fragmented sources in internal and external systems and obtain a comprehensive view of the information.
- Enable a single data access layer with protection measures across the whole enterprise.
- Use data virtualization as the foundation for complying with current data protection regulations through data auditing, cataloging, and security.
Building New Data Ecosystem for Customer Analytics, Strata + Hadoop World, 2016 - Caserta
Caserta Concepts Founder and President Joe Caserta gave this presentation at Strata + Hadoop World 2016 in New York, NY. His session covers path-to-purchase analytics using a data lake and Spark.
For more information, visit http://casertaconcepts.com/
How Data Virtualization Puts Machine Learning into Production (APAC) - Denodo
Watch full webinar here: https://bit.ly/3mJJ4w9
Advanced data science techniques, like machine learning, have proven to be extremely useful tools for deriving valuable insights from existing data. Platforms like Spark and complex libraries for R, Python, and Scala put advanced techniques at the fingertips of data scientists. However, these data scientists spend most of their time looking for the right data and massaging it into a usable format. Data virtualization offers a new alternative that addresses these issues in a more efficient and agile way.
Attend this session to learn how companies can use data virtualization to:
- Create a logical architecture to make all enterprise data available for advanced analytics exercises
- Accelerate data acquisition and massaging, providing the data scientist with a powerful tool to complement their practice
- Integrate popular tools from the data science ecosystem: Spark, Python, Zeppelin, Jupyter, etc
Data Con LA 2022 - Self-Service Success and Data Products - Data Con LA
Chirag Katbamna, Senior Manager, Accenture
We have grown past traditional reporting off of a centralized EDW data store. Each department wants to be empowered to do its own analytics, but this creates new challenges in the areas of governance, security, access, and monitoring. How do we do this the right way?
- Explore the concept of Data Mesh
- Explore what a Data Product is
- Explore how to implement successful self-service
- Push the limits of new capabilities
Key Considerations While Rolling Out Denodo Platform - Denodo
Watch full webinar here: https://bit.ly/3zaPGLO
Our approach for data virtualization advisory takes the following 3 dimensions/areas into consideration:
- Technology / Architecture
- Business User Groups (your clients)
- IT Organization
To deliver quick results, Q-PERIOR uses a multitude of accelerators in predefined topics within these three dimensions. In our presentation, we will use client examples to explain why such an exercise makes sense before rolling out Denodo and what kinds of risks you can avoid by doing so.
Similar to Anne-Sophie Roessler, International Business Developer, Dataiku - "3 ways to Fail your Data Lab Implementation"
Data Natives Frankfurt v 11.0 | "Competitive advantages with knowledge graphs..." - Dataconomy Media
The challenges of increasing complexity of organizations, companies and projects are obvious and omnipresent. Everywhere there are connections and dependencies that are often not adequately managed or not considered at all because of a lack of technology or expertise to uncover and leverage the relationships in data and information. In his presentation, Axel Morgner talks about graph technology and knowledge graphs as indispensable building blocks for successful companies.
Data Natives Munich v 12.0 | "How to be more productive with Autonomous Data..." - Dataconomy Media
Every day we are challenged with more data, more use cases, and an ever-increasing demand for analytics. In this talk, Bjorn will explain how autonomous data management and machine learning help innovators be more productive, and will give examples of how to deliver new data-driven projects with less risk at lower cost.
Data Natives meets DataRobot | "Build and deploy an anti-money laundering mo..." - Dataconomy Media
Compliance departments within banks and other financial institutions are turning to machine learning for improving their Anti Money Laundering compliance activities. Today, the systems that aim to detect potentially suspicious activity are commonly rule-based, and suffer from ultra-high false positive rates. DataRobot will discuss how their Automated Machine Learning platform was successfully used for a real use case to reduce their false positives and to enhance their Anti-Money Laundering activities.
Data Natives Munich v 12.0 | "Political Data Science: A tale of Fake News, So..." - Dataconomy Media
Trump, Brexit, Cambridge Analytica... In the last few years, we have had to confront the consequences of the use and misuse of data science algorithms in manipulating public opinion through social media. The use of private data to microtarget individuals is a daily practice (and a trillion-dollar industry), which has serious side-effects when the selling product is your political ideology. How can we cope with this new scenario?
Data Natives Vienna v 7.0 | "The Ingredients of Data Innovation" - Robbert de... - Dataconomy Media
When taking a deep dive into the world of data, one thing is certain: the ultimate goal is to create something new, something better, something faster. In other words, innovation should always be at the forefront of companies' strategic outlook, whether their goal is to pioneer new processes, user experiences, products, or services.
Data Natives Cologne v 4.0 | "The Data Lorax: Planting the Seeds of Fairness..." - Dataconomy Media
What does it take to build a good data product or service? Data practitioners always think about the technology, user experience and commercial viability. But rarely do they think about the implications of the systems they build. This talk will shed light on the impact of AI systems and the unintended consequences of the use of data in different products. It will also discuss our role, as data practitioners, in planting the seeds of fairness in the systems we build.
Data Natives Cologne v 4.0 | "How People Analytics Can Reveal the Hidden Aspe..." - Dataconomy Media
We all hear about the power of data, big data, and data analysis in today's marketplace, but we rarely feel their tangible effects on our own business decisions and performance.
Let's dive in and see how people analytics can increase people performance, motivation, and business revenue.
Data Natives Amsterdam v 9.0 | "Ten Little Servers: A Story of no Downtime" - ... - Dataconomy Media
Cloud infrastructure is a hostile environment: a power supply failure or a network outage leads to downtime and big losses. There is nothing we can trust: a single server, a server rack, or even a whole datacenter can fail, and if an application is fragile by design, disruption is inevitable. We must distribute our application and diversify our cloud data strategy to survive disturbances of any scale. Apache Cassandra is a cloud-native, platform-agnostic database that stores data with distributed redundancy, so it easily survives any issue. Want to know how Apple and Netflix handle petabytes of data while keeping it highly available? Join us and listen to a story of 10 little servers and no downtime!
Data Natives Amsterdam v 9.0 | "Point in Time Labeling at Scale" - Timothy Th... - Dataconomy Media
In the data industry, having correctly labelled datasets is vital. Timothy Thatcher explains how tagging your data while considering time and location and complex hierarchical rules at scale can be handled.
Data Natives Berlin v 20.0 | "Serving A/B experimentation platform end-to-end"... - Dataconomy Media
During the lifetime of an A/B test, product managers and analysts at GetYourGuide require various tools and different kinds of data to plan the trial properly, control it during the run, and analyze the results at the end. This talk covers the architecture, tools, and data flow for serving their needs.
Data Natives Berlin v 20.0 | "Ten Little Servers: A Story of no Downtime" - A... - Dataconomy Media
Cloud infrastructure is a hostile environment: a power supply failure or a network outage leads to downtime and big losses. There is nothing we can trust: a single server, a server rack, or even a whole datacenter can fail, and if an application is fragile by design, disruption is inevitable. We must distribute our application and diversify our cloud data strategy to survive disturbances of any scale. Apache Cassandra is a cloud-native, platform-agnostic database that stores data with distributed redundancy, so it easily survives any issue. Want to know how Apple and Netflix handle petabytes of data while keeping it highly available? Join us and listen to a story of 10 little servers and no downtime!
Big Data Frankfurt meets Thinkport | "The Cloud as a Driver of Innovation" - ... - Dataconomy Media
Creativity is the mental ability to create new ideas and designs. Innovation, on the other hand, means developing useful solutions from new ideas. Creativity can be goal-oriented, whereas innovation is always goal-oriented; that is, innovation aims to achieve defined goals. The use of cloud services and technologies promises enterprise users many benefits in terms of more flexible use of IT resources and faster access to innovative solutions. That's why, in this talk, we want to examine the question of what role cloud computing plays for innovation in companies.
Thinkport meets Frankfurt | "Financial Time Series Analysis using Wavelets" - ... - Dataconomy Media
A presentation on the time series properties of financial instruments and the possibilities of frequency decomposition and information extraction using the FT, STFT, and wavelets, with an outlook on current research on wavelet neural networks.
Big Data Helsinki v 3 | "Distributed Machine and Deep Learning at Scale with..." - Dataconomy Media
"With most machine learning (ML) and deep learning (DL) frameworks, it can take hours to move data for ETL, and hours to train models. It's also hard to scale, with data sets increasingly being larger than the capacity of any single server. The amount of the data also makes it hard to incrementally test and retrain models in near real-time.
Learn how Apache Ignite and GridGain help to address limitations like ETL costs, scaling issues and Time-To-Market for the new models and help achieve near-real-time, continuous learning.
Yuriy Babak, the head of ML/DL framework development at GridGain and Apache Ignite committer, will explain how ML/DL work with Apache Ignite, and how to get started.
Topics include:
— Overview of distributed ML/DL including architecture, implementation, usage patterns, pros and cons
— Overview of Apache Ignite ML/DL, including built-in ML/DL algorithms, and how to implement your own
— Model inference with Apache Ignite, including how to train models with other libraries, like Apache Spark, and deploy them in Ignite
— How Apache Ignite and TensorFlow can be used together to build distributed DL model training and inference"
Big Data Helsinki v 3 | "Federated Learning and Privacy-preserving AI" - Oguz... - Dataconomy Media
"Machine learning algorithms require significant amounts of training data, which so far has been centralized on one machine or in a datacenter. For numerous applications, such a need to collect data can be extremely privacy-invasive. Recent advancements in AI research approach this issue with a new paradigm for training AI models: federated learning.
In federated learning, edge devices (phones, computers, cars, etc.) collaboratively learn a shared AI model while keeping all the training data on device, decoupling the ability to do machine learning from the need to store the data in the cloud. From a personal data perspective, this paradigm enables a way of training a model on the device without directly inspecting users' data on a server. This talk will pinpoint several examples of AI applications benefiting from federated learning and the likely future of privacy-aware systems."
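The federated averaging idea described above can be sketched in a few lines: each simulated "device" fits a shared linear model on its private data, and only the model weights (never the raw data) leave the device to be averaged. This is a minimal illustration under invented data and a made-up model, not any particular framework's API.

```python
import random
from statistics import mean

random.seed(0)
TRUE_W = 3.0  # the relationship every device's data shares: y = 3x + noise

def make_client(n=50):
    """Private dataset held on one edge device; it never leaves the device."""
    xs = [random.gauss(0, 1) for _ in range(n)]
    ys = [TRUE_W * x + random.gauss(0, 0.1) for x in xs]
    return xs, ys

clients = [make_client() for _ in range(5)]

def local_update(w, xs, ys, lr=0.05, epochs=20):
    """On-device training: a few gradient steps on the local data only."""
    for _ in range(epochs):
        grad = mean(2 * (w * x - y) * x for x, y in zip(xs, ys))  # d(MSE)/dw
        w -= lr * grad
    return w

w_global = 0.0
for _ in range(10):  # each round: broadcast, train locally, average (FedAvg)
    w_global = mean(local_update(w_global, xs, ys) for xs, ys in clients)

print(f"learned weight: {w_global:.2f}")  # converges toward TRUE_W
```

The server only ever sees the five returned weights, never the fifty `(x, y)` pairs each device holds.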
GraphRAG is All You Need? LLM & Knowledge Graph - Guy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf - Peter Spielvogel
Building better applications for business users with SAP Fiori.
• What is SAP Fiori and why it matters to you
• How a better user experience drives measurable business benefits
• How to get started with SAP Fiori today
• How SAP Fiori elements accelerates application development
• How SAP Build Code includes SAP Fiori tools and other generative artificial intelligence capabilities
• How SAP Fiori paves the way for using AI in SAP apps
The Art of the Pitch: WordPress Relationships and Sales - Laura Byrne
Clients don't know what they don't know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if something changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
Enhancing Performance with Globus and the Science DMZ - Globus
ESnet has led the way in helping national facilities—and many other institutions in the research community—configure Science DMZs and troubleshoot network issues to maximize data transfer performance. In this talk we will present a summary of approaches and tips for getting the most out of your network infrastructure using Globus Connect Server.
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor... - SOFTTECHHUB
The choice of an operating system plays a pivotal role in shaping our computing experience. For decades, Microsoft's Windows has dominated the market, offering a familiar and widely adopted platform for personal and professional use. However, as technological advancements continue to push the boundaries of innovation, alternative operating systems have emerged, challenging the status quo and offering users a fresh perspective on computing.
One such alternative that has garnered significant attention and acclaim is Nitrux Linux 3.5.0, a sleek, powerful, and user-friendly Linux distribution that promises to redefine the way we interact with our devices. With its focus on performance, security, and customization, Nitrux Linux presents a compelling case for those seeking to break free from the constraints of proprietary software and embrace the freedom and flexibility of open-source computing.
State of ICS and IoT Cyber Threat Landscape Report 2024 preview - Prayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio's cyber threat intelligence farming facilities, spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors and newer malware, including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs - Alex Pruden
This paper presents Reef, a system for generating publicly verifiable succinct non-interactive zero-knowledge proofs that a committed document matches or does not match a regular expression. We describe applications such as proving the strength of passwords, the provenance of email despite redactions, the validity of oblivious DNS queries, and the existence of mutations in DNA. Reef supports the Perl Compatible Regular Expression syntax, including wildcards, alternation, ranges, capture groups, Kleene star, negations, and lookarounds. Reef introduces a new type of automata, Skipping Alternating Finite Automata (SAFA), that skips irrelevant parts of a document when producing proofs without undermining soundness, and instantiates SAFA with a lookup argument. Our experimental evaluation confirms that Reef can generate proofs for documents with 32M characters; the proofs are small and cheap to verify (under a second).
Paper: https://eprint.iacr.org/2023/1886
Epistemic Interaction - tuning interfaces to provide information for AI support - Alan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
The New Frontiers of AI in RPA with UiPath Autopilot™ - UiPathCommunity
In this free online event, organized by the Italian UiPath Community, you can explore the new features of Autopilot, the tool that integrates Artificial Intelligence into the development and use of Automations.
📕 Together we will look at some examples of the use of Autopilot in different tools of the UiPath Suite:
Autopilot for Studio Web
Autopilot for Studio
Autopilot for Apps
Clipboard AI
GenAI applied to Document Understanding
👨🏫👨💻 Speakers:
Stefano Negro, UiPath MVPx3, RPA Tech Lead @ BSP Consultant
Flavio Martinelli, UiPath MVP 2023, Technical Account Manager @UiPath
Andrei Tasca, RPA Solutions Team Lead @NTT Data
Elevating Tactical DDD Patterns Through Object Calisthenics - Dorra BARTAGUIZ
After immersing yourself in the blue book and its red counterpart, attending DDD-focused conferences, and applying tactical patterns, you're left with a crucial question: How do I ensure my design is effective? Tactical patterns within Domain-Driven Design (DDD) serve as guiding principles for creating clear and manageable domain models. However, achieving success with these patterns requires additional guidance. Interestingly, we've observed that a set of constraints initially designed for training purposes remarkably aligns with effective pattern implementation, offering a more ‘mechanical’ approach. Let's explore together how Object Calisthenics can elevate the design of your tactical DDD patterns, offering concrete help for those venturing into DDD for the first time!
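As a flavor of how one such constraint plays out: the Object Calisthenics rule "wrap all primitives" pushes a bare amount-plus-currency pair into a proper DDD value object, so the invariant lives with the data. This `Money` class is a generic illustration, not an example taken from the talk.

```python
from dataclasses import dataclass

@dataclass(frozen=True)  # value objects are immutable and compared by value
class Money:
    amount: int          # minor units (cents) to avoid float rounding
    currency: str

    def add(self, other: "Money") -> "Money":
        if self.currency != other.currency:
            raise ValueError("currency mismatch")  # the invariant lives here
        return Money(self.amount + other.amount, self.currency)

print(Money(1999, "EUR").add(Money(500, "EUR")))
# Money(amount=2499, currency='EUR')
```

Because callers can no longer add two raw integers with different currencies by accident, the domain rule is enforced by construction rather than by convention.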
The Metaverse and AI: how can decision-makers harness the Metaverse for their... - Jen Stirrup
The Metaverse is popularized in science fiction, and now it is becoming closer to being a part of our daily lives through the use of social media and shopping companies. How can businesses survive in a world where Artificial Intelligence is becoming the present as well as the future of technology, and how does the Metaverse fit into business strategy when futurist ideas are developing into reality at accelerated rates? How do we do this when our data isn't up to scratch? How can we move towards success with our data so we are set up for the Metaverse when it arrives?
How can you help your company evolve, adapt, and succeed using Artificial Intelligence and the Metaverse to stay ahead of the competition? What are the potential issues, complications, and benefits that these technologies could bring to us and our organizations? In this session, Jen Stirrup will explain how to start thinking about these technologies as an organisation.
DevOps and Testing slides at DASA Connect - Kari Kakkonen
Slides by me and Rik Marselis from the DASA Connect conference on 30.5.2024. We discuss what testing is, then what agile testing is, and finally what testing in DevOps is. We also ran a lovely workshop with the participants, exploring different ways to think about quality and testing in different parts of the DevOps infinity loop.
4. Why a Data Lab?
• One single workflow: from a segmented workflow to a transversal one
• Several use cases: the ability to address many different data-centric topics within a single unit
• Multiple competences: a business-focused approach mixing many different competences
• End-to-end projects: combining data from different sources to handle several aspects of a single topic
5. Deployment of the predictions
Dataiku DSS for fraud prediction
Data sources: client service, sensor data, garage data, administration
Team:
• 1 project owner (IT)
• 1 project manager (business)
• 1 data scientist in house
• 3 data scientists from 3 different firms
• 3 consultants from 3 different firms
• 1 architect (external)
Outcome: accepted file, or INVESTIGATE!
The transactions are blocked depending on their gap with the business rules and behavioral patterns
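A toy sketch of that blocking logic: a transaction's score combines a hard business rule with its behavioral gap, i.e. how far the amount sits from the customer's usual spending. The rule, thresholds, and field names here are invented for illustration; the slide does not specify the actual rules.

```python
from statistics import mean, stdev

def fraud_score(amount, history, country, home_country):
    """Hypothetical score: a business-rule hit plus the behavioral gap."""
    score = 1.0 if country != home_country else 0.0   # example business rule
    mu, sigma = mean(history), stdev(history)
    z = abs(amount - mu) / sigma if sigma else 0.0    # gap from usual spending
    return score + min(z / 3.0, 1.0)                  # cap the behavioral term

def decide(amount, history, country, home_country, block_at=1.5):
    s = fraud_score(amount, history, country, home_country)
    return "INVESTIGATE" if s >= block_at else "ACCEPTED"

history = [20.0, 25.0, 30.0, 22.0]         # the customer's usual transactions
print(decide(26.0, history, "FR", "FR"))   # small, domestic  -> ACCEPTED
print(decide(400.0, history, "BR", "FR"))  # large + foreign  -> INVESTIGATE
```

A real deployment would learn the behavioral model from data rather than hard-code a z-score, but the shape of the decision (rules plus behavioral gap against a threshold) is the same.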
7. Focus on the framework, not on the input
Adapted from the CRISP-DM methodology: Business Understanding → Data Acquisition & Understanding → Data Preparation → Model Creation → Evaluation → Deployment, iterated (iteration 1, iteration 2, … iteration n) over dataset 1, dataset 2, … dataset n, each pass producing a scored dataset.
Data Acquisition & Understanding:
✓ Read and import raw data
✓ Detect schemas and structure
✓ Analyze distributions
✓ Assess quality: outliers, missing values...
Data Preparation:
✓ Create derived and aggregated variables
→ Analytical dataset
→ Report
Model Creation:
✓ Feature selection
✓ Compare algorithms
Evaluation:
✓ Performance metrics
✓ Robustness & generalization (cross validation)
✓ Insights (e.g. variable importance)
Deployment:
✓ Scoring engine
✓ Publish predictions
✓ Monitor performance
✓ API
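The iteration loop above can be sketched end to end on a toy dataset. Everything here (the synthetic data, the threshold model) is invented for illustration, but the steps mirror the slide: acquire raw data, assess quality, prepare (impute missing values), create a model, and evaluate robustness with cross-validation.

```python
import random
from statistics import mean

random.seed(0)

# Data acquisition & understanding: toy raw data, some amounts missing
raw = [{"amount": random.gauss(100, 20) if random.random() > 0.1 else None,
        "fraud": 0} for _ in range(80)]
raw += [{"amount": random.gauss(300, 30), "fraud": 1} for _ in range(20)]

# Data preparation: impute missing values with the column mean
fill = mean(r["amount"] for r in raw if r["amount"] is not None)
for r in raw:
    if r["amount"] is None:
        r["amount"] = fill

# Model creation: a threshold halfway between the two class means
def fit(rows):
    mu0 = mean(r["amount"] for r in rows if r["fraud"] == 0)
    mu1 = mean(r["amount"] for r in rows if r["fraud"] == 1)
    return (mu0 + mu1) / 2

def predict(threshold, amount):
    return 1 if amount > threshold else 0

# Evaluation: stratified k-fold cross-validation for robustness
def cross_validate(rows, k=5):
    pos = [r for r in rows if r["fraud"] == 1]
    neg = [r for r in rows if r["fraud"] == 0]
    random.shuffle(pos)
    random.shuffle(neg)
    folds = [pos[i::k] + neg[i::k] for i in range(k)]  # both classes per fold
    accs = []
    for i in range(k):
        train = [r for j, f in enumerate(folds) if j != i for r in f]
        t = fit(train)
        accs.append(mean(predict(t, r["amount"]) == r["fraud"] for r in folds[i]))
    return mean(accs)

print(f"cross-validated accuracy: {cross_validate(raw):.2f}")
```

In practice each iteration of the slide's loop would repeat this sequence on a new or enriched dataset, and the deployment step would publish the fitted threshold as a scoring engine.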
8. People and Governance
Polyglot vs. dictator?
Problems:
• Collaboration between technical and non-technical profiles inside a single project
• Necessary collaboration between business and tech teams to address transversal projects accurately
Focus:
• Promote diversity...
• ...within a workflow-centric environment
9. End to end, from prototyping into production
Do it your way…
11. Data Lab Organisation
Inputs: business needs, internal data sources, external data sources
Data Lab (lab environment)
Multidisciplinary team: direction / project management, business analysts, data miners / data scientists
Missions:
• Prioritisation of the business needs
• Prototyping / agile solution engineering
• Support for apps deployment
Production environment
• Business applications: marketing campaign automation, reporting / web analytics
• Data as a Service platform: conception of "data products", integration of data products, optimisation engine, real-time scoring
Exchanges between lab and production: data flow, insights & services, processing chain, API deployment
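The real-time scoring and API deployment boxes can be illustrated with a minimal HTTP scoring endpoint built from the Python standard library alone. The scoring function, threshold, and route are invented placeholders; a production deployment would sit behind a proper serving stack with authentication and monitoring.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

THRESHOLD = 200.0  # placeholder decision threshold published by the lab

def score(record):
    """Toy scoring engine: a bounded score derived from the amount."""
    return min(record.get("amount", 0.0) / (2 * THRESHOLD), 1.0)

class ScoringHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        record = json.loads(self.rfile.read(int(self.headers["Content-Length"])))
        s = score(record)
        payload = json.dumps({"score": s, "block": s >= 0.5}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, *args):  # keep the demo quiet
        pass

def serve(port):
    srv = HTTPServer(("127.0.0.1", port), ScoringHandler)
    threading.Thread(target=srv.serve_forever, daemon=True).start()
    return srv

srv = serve(8099)
req = urllib.request.Request("http://127.0.0.1:8099/score",
                             data=json.dumps({"amount": 350}).encode(),
                             headers={"Content-Type": "application/json"})
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read()))  # {'score': 0.875, 'block': True}
srv.shutdown()
```

The lab publishes the model behind an endpoint like this one; business applications then call it per transaction, which is the "insights & services" arrow between the lab and production environments.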