Anne-Sophie Roessler, International Business Developer at Dataiku presented "3 ways to Fail your Data Lab Implementation" as part of the Big Data, Berlin v 8.0 meetup organised on the 14th of July 2016 at the WeWork headquarters.
Gianluigi Vigano, Senior Architect and Fouad Teban, Regional Presales Manager at HPE, presented "Using advanced analytics functions of HPE Vertica for the following use cases: IoT, clickstream, machine data, integration with Hadoop & Kafka …" as part of the Big Data, Budapest v 3.0 meetup organised on the 19th of May 2016 at Skyscanner's headquarters.
Zsolt Várnai, Principal Software Engineer at Skyscanner, presented "The advantages of real-time monitoring in apps development" as part of the Big Data, Budapest v 3.0 meetup organised on the 19th of May 2016 at Skyscanner's headquarters.
"Data Pipelines for Small, Messy and Tedious Data", Vladislav Supalov, CAO & Co-Founder of Pivii Technologies
Watch videos from Data Natives Berlin 2016 here: http://bit.ly/2fE1sEo
Visit the conference website to learn more: www.datanatives.io
Follow Data Natives:
https://www.facebook.com/DataNatives
https://twitter.com/DataNativesConf
https://www.youtube.com/c/DataNatives
Stay Connected to Data Natives by Email: Subscribe to our newsletter to get the news first about Data Natives 2017: http://bit.ly/1WMJAqS
About the Author:
Vladislav is an entrepreneur, machine learning enthusiast, and DevOps geek. Currently, he is co-founding a startup, running a data engineering consulting business, traveling and writing on data-related topics.
Applied Data Science Part 3: Getting dirty; data preparation and feature creation - Dataiku
In our 3rd applied machine learning online course, we'll dive into different methods for data preparation, including handling missing values, dummification and rescaling.
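The three steps named above can be sketched without any libraries. In practice a toolkit such as pandas or scikit-learn would do this work, but the underlying logic is simple; the column values below are invented purely for illustration:

```python
# Minimal, dependency-free sketches of three common data-preparation steps.

def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]

def dummify(values):
    """One-hot encode a categorical column into {category: 0/1 column} pairs."""
    categories = sorted(set(values))
    return {c: [1 if v == c else 0 for v in values] for c in categories}

def rescale_min_max(values):
    """Rescale numeric values linearly onto the [0, 1] range."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

print(impute_mean([20, None, 40]))        # [20, 30.0, 40]
print(dummify(["red", "blue", "red"]))    # {'blue': [0, 1, 0], 'red': [1, 0, 1]}
print(rescale_min_max([10, 15, 20]))      # [0.0, 0.5, 1.0]
```

Each function maps one raw column to a model-ready column, which is exactly the shape of work the course covers.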
Counting Unique Users in Real-Time: Here's a Challenge for You! - DataWorks Summit
Finding the number of unique users among 10 billion events per day is challenging. In this session, we'll describe how re-architecting our data infrastructure around Druid and ThetaSketch enables our customers to obtain these insights in real time.
To put things into context, at NMC (Nielsen Marketing Cloud) we provide our customers (marketers and publishers) real-time analytics tools to profile their target audiences. Specifically, we provide them with the ability to see the number of unique users who meet a given criterion.
Historically, we used Elasticsearch to answer these types of questions; however, we encountered major scaling and stability issues.
In this presentation we will detail the journey of rebuilding our data infrastructure, including researching, benchmarking and productionizing a new technology, Druid, with ThetaSketch, to overcome the limitations we were facing.
We will also share guidelines and best practices for Druid.
Topics include:
* The need and possible solutions
* Intro to Druid and ThetaSketch
* How we use Druid
* Guidelines and pitfalls
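The core trick behind ThetaSketch can be illustrated with a tiny KMV ("k minimum values") estimator. This is a simplified sketch of the idea only, not Druid's or the DataSketches library's actual implementation:

```python
import hashlib

def _hash01(item):
    """Hash an item to a float uniformly distributed in [0, 1)."""
    h = hashlib.sha256(str(item).encode()).digest()
    return int.from_bytes(h[:8], "big") / 2**64

def kmv_estimate(stream, k=256):
    """KMV sketch: keep only the k smallest hash values seen.
    If the k-th smallest hash is t, the stream holds roughly
    (k - 1) / t distinct items, using O(k) memory regardless
    of stream size."""
    mins = sorted({_hash01(x) for x in stream})[:k]
    if len(mins) < k:  # fewer than k distinct items seen: count is exact
        return len(mins)
    return (k - 1) / mins[-1]

# 100,000 events generated by 10,000 distinct users, each seen ten times:
events = [f"user-{i % 10_000}" for i in range(100_000)]
print(round(kmv_estimate(events)))  # roughly 10,000, within a few percent
```

The same bounded-memory property is what lets a store like Druid answer unique-user queries over billions of events in real time, and sketches can also be merged across partitions.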
How a Media Data Platform Drives Real-time Insights & Analytics using Apache Spark - Databricks
Roularta is a leading publishing company in Belgium. As digital news and channels move at a rapid pace and contain massive volumes of data, Roularta decided in 2019 to invest in a Spark-based data platform to drive true real-time website analytics and unlock insights on previously untouched (big) data sources. In this talk we’ll first explain why and how Roularta embarked from a classical data warehouse to a Spark-based Lakehouse using Delta. We’ll outline the series of publishing & marketing use-cases done in the last 12 months and highlight for each use-case the advantages of Spark and how the team further tuned performance to truly deliver insights with high velocity.
The Virtualization of Clouds - The New Enterprise Data Architecture Opportunity - Denodo
Watch full webinar here: https://bit.ly/3x7xVuR
Organizations worldwide are adopting a variety of public cloud service providers (e.g. AWS, Google, Microsoft), each with a portfolio of storage, compute, network, and security options. All of this creates significant challenges in managing a hybrid and multi-cloud enterprise architecture. Even worse is the impact on the governance and integration of data across clouds and physical infrastructure to support the broad array of analytics and operational requirements.
Can one public cloud provider meet all your needs today and in the future? How do you manage across the multiple public and private clouds you have today, wherever your data exists? And how would you manage and operate your multi-cloud and on-premises systems to gain value from your data in any of them? Mark Smith, Chief Research Officer at Ventana Research, will expound on the challenges and the path ahead for virtualizing and integrating your data and the clouds, setting an architectural path for success.
Data Science Day New York: Data Science: A Personal History - Cloudera, Inc.
Follow Jeff Hammerbacher's path from Facebook, where he built scalable systems on Hadoop, to co-founding Cloudera and building an organization that provides the leading Hadoop platform.
ML Infra @ Spotify: Lessons Learned - Romain Yon - NYC ML Meetup
Original event: https://www.meetup.com/NYC-Machine-Learning/events/256605862/
--
"Doing large scale ML in production is hard" – Everyone who's tried
This talk is focused on ML systems, especially the less obvious pitfalls that have caused us trouble at Spotify.
This talk assumes a certain level of familiarity with ML: you'll get the most out of it if you have some experience with applied ML, ideally on production systems.
Romain Yon is a Staff ML Engineer at Spotify. Over the years, Romain has worked on many of the core ML systems that power Spotify today (Music Recommendation, Catalog Quality, Search Ranking, Ads, ..).
During the past year, Romain has been mostly focusing on designing reusable ML Infrastructure that can be leveraged throughout Spotify.
Prior to Spotify, Romain co-founded the startup https://linkurio.us while getting his MSc in ML from Georgia Tech.
AzureDay - Introduction to Big Data Analytics - Łukasz Grala
AzureDay North 2016. Conference about cloud solutions.
What is analytics? What is Big Data? Why do we have Big Data in the cloud? What does Microsoft offer for Big Data analytics? How do you start with Big Data analytics or advanced analytics? This session introduces the fundamentals of Big Data and advanced analytics.
By Data Scientist as a Service
Building a Distributed Collaborative Data Pipeline with Apache Spark - Databricks
The year of COVID-19 pandemic has spotlighted as never before the many shortcomings of the world’s data management workflows. The lack of established ways to exchange and access data was a highly recognized contributing factor in our poor response to the pandemic. On multiple occasions we have witnessed how our poor practices around reproducibility and provenance have completely sidetracked major vaccine research efforts, prompting many calls for action from scientific and medical communities to address these problems.
With Enterprise data growing rapidly year over year, traditional analytics approaches have proven to be expensive and unyielding. The result is that a growing proportion of our data is unused “dark data”. How can we create the basis for a data driven organization? Enter the "perfect storm" of cloud data analytics tools and approaches.
Data Mesh in Practice: How Europe's Leading Online Platform for Fashion Goes ... - Databricks
The Data Lake paradigm is often considered the scalable successor of the more curated Data Warehouse approach when it comes to democratization of data. However, many who went out to build a centralized Data Lake came out with a data swamp of unclear responsibilities, a lack of data ownership, and sub-par data availability.
Analytical Systems Evolution: From Excel to Big Data Platforms and Data Lakes - Provectus
Maxim Tereschenko (BigData Lead, Provectus) with the talk "Analytical Systems Evolution - From Excel to Big Data Platforms and Data Lakes".
Description: Over the last ten years, analytical systems have changed dramatically. From Excel and data warehouses, we have come to Big Data platforms and data lakes. It is no longer fantasy to talk to an analytical system by voice or to wander among visualizations of the data in 3D glasses. In this talk, I want to trace this evolution, identify its main trends, and speculate about the future.
How OpenTable uses Big Data to impact growth by Raman Marya - Data Con LA
Abstract: We have created a variety of analytics solutions combining data from our Data Lake with a traditional DW: data APIs that are fed into the product to improve conversions, a churn-prediction algorithm that helps account managers focus on high-risk customers, and analytics as an edge to empower the sales team to win prospective customers.
Applied Data Science Course Part 2: the data science workflow and basic models - Dataiku
In the second part of our applied machine learning online course, you'll get an overview of the different steps in the data science workflow as well as a deep dive in 3 basic types of models: linear, tree-based and clustering.
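As a taste of the first of those model families, here is a minimal sketch of a linear model fit by ordinary least squares, with no libraries; the data points are invented for illustration:

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b, the simplest linear model:
    the slope is cov(x, y) / var(x), and the intercept makes the line
    pass through the point of means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return a, my - a * mx

# Points lying exactly on y = 2x + 1 recover slope 2 and intercept 1:
a, b = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(a, b)  # 2.0 1.0
```

Tree-based and clustering models trade this closed-form fit for recursive splitting and iterative centroid updates respectively, which is the contrast the course draws.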
Nadine Schöne, Dataiku. The Complete Data Value Chain in a Nutshell - IT Arena
Dr. Nadine Schöne is a Senior Solutions Architect at Dataiku in Berlin. In this role, she deals with all aspects of the data value chain for all users – including integration of data sources, ETL, cooperation, statistics, modelling, but also operationalization, monitoring, automatization and security during production. She regularly talks at conferences, holds webinars and writes articles.
Speech Overview:
How can you get the most out of your data – while staying flexible in your choice of infrastructure and without having to integrate a multitude of tools for the different personas involved? Maximizing the value you get out of your data is a necessity today. Looking at the whole picture as well as careful planning are the key for success. We will have a look at the complete data value chain from end to end: from the data stores, collaboration features, data preparation, visualization and automation capabilities, and external compute to scheduling, operationalization, monitoring and security.
Introducing Trillium DQ for Big Data: Powerful Profiling and Data Quality for... - Precisely
The advanced analytics and AI that run today’s businesses rely on a larger volume, and greater variety, of data. This data needs to be of the highest quality to ensure the best possible outcomes, but traditional data quality tools weren’t designed for today’s modern data environments.
That’s why we’ve developed Trillium DQ for Big Data -- an integrated product that delivers industry-leading data profiling and data quality at scale, in the cloud or on premises.
In this on-demand webcast, you will learn how Trillium DQ:
• Empowers data analysts to easily profile large, diverse data sources to discover new insights, uncover issues, and report on their findings – all without involving IT.
• Delivers best-in-class entity resolution to support mission-critical applications such as Customer 360, fraud detection, AML, and predictive analytics.
• Supports Cloud and hybrid architectures by providing consistent high-performance processing within critical time windows on all platforms.
• Keeps enterprise data lakes validated, clean, and trusted with the highest quality data – without technical expertise in big data or distributed architectures.
• Enables data quality monitoring based on targeted business rules for data governance and business insight
Building the Artificially Intelligent Enterprise - Databricks
This session looks at where we are today with data and analytics and what is needed to transition to the Artificially Intelligent Enterprise.
How do you mobilise developers to exploit what data scientists and business analysts have built? How do you align it all with business strategy to maximise business outcomes? How do you combine BI, predictive and prescriptive analytics, automation and reinforcement learning to get maximum value across the enterprise? What is the blueprint for building the artificially intelligent enterprise?
• Data and analytics – Where are we?
• Why is the journey only half-way done?
• 2021 and beyond – The new era of AI usage and not just build
• The requirement – event-driven, on-demand and automated analytics
• Operationalising what you build – DataOps, MLOps and RPA
• Mobilising the masses to integrate AI into processes – what needs to be done?
• Business strategy alignment – the guiding light to AI utilisation for high reward
• Agility step change – the shift to no-code integration of AI by citizen developers
• Recording decisions and analysing business impact
• Reinforcement learning – transitioning to continuous reward
Quicker Insights and Sustainable Business Agility Powered By Data Virtualization - Denodo
Watch full webinar here: https://bit.ly/3xj6fnm
Presented at Chief Data Officer Live 2021 A/NZ
The world is changing faster than ever, and for companies to compete and succeed they need to be agile, responding quickly to market changes and emerging opportunities. Data plays an integral role in achieving this business agility. However, given the complex nature of enterprise data architecture, finding and analysing data is an increasingly challenging task. Data virtualization is a modern data integration technique that integrates data in real time, without having to physically replicate it.
Watch on-demand this session to understand what data virtualization is and how it:
- Delivers data in real-time, and without replication
- Creates a logical architecture to provide a single view of truth
- Centralises the data governance and security framework
- Democratises data for faster decision making and business agility
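As a loose analogy for that logical layer (this illustrates the concept only; it is not how Denodo's product is implemented), a database view can present a single view of truth over several source tables at query time, without replicating any rows:

```python
import sqlite3

# Two "source" tables stand in for separate underlying systems.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE crm_customers (id INTEGER, name TEXT)")
db.execute("CREATE TABLE web_customers (id INTEGER, name TEXT)")
db.executemany("INSERT INTO crm_customers VALUES (?, ?)", [(1, "Ada"), (2, "Bob")])
db.executemany("INSERT INTO web_customers VALUES (?, ?)", [(3, "Cyd")])

# The logical layer: one view unifying both sources at query time,
# with no copy of the data made anywhere.
db.execute("""CREATE VIEW all_customers AS
              SELECT id, name FROM crm_customers
              UNION ALL
              SELECT id, name FROM web_customers""")
count = db.execute("SELECT COUNT(*) FROM all_customers").fetchone()[0]
print(count)  # 3
```

A data virtualization platform extends this idea across heterogeneous, remote sources and adds the governance and security controls described above on the single access layer.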
Data Science Operationalization: The Journey of Enterprise AI - Denodo
Watch full webinar here: https://bit.ly/3kVmYJl
As we move into a world driven by AI initiatives, we find ourselves facing new and diverse challenges when it comes to operationalization. Creating a solution and putting it into practice are certainly not the same thing. The challenges span various organizational and data facets. In many instances, data scientists may be working in silos, and connecting to live data may not always be possible. But how does one guarantee that a model developed in a silo is still relevant to live data? How can we manage the data flow and data access across the entire AI operationalization cycle?
Watch on-demand to explore:
- The journey and challenges of the Data Scientist
- How Denodo data virtualization with data movement streamlines operationalization
- The best practices and techniques when dealing with siloed data
- How customers have used data virtualization in their data science initiatives
ADV Slides: How to Improve Your Analytic Data Architecture Maturity - DATAVERSITY
Many organizations are immature when it comes to data use. The answer lies in delivering a greater level of insight from data, straight to the point of need. Enter: machine learning.
In this webinar, William will look at categories of organizational response to the challenge across strategy, architecture, modeling, processes, and ethics. Machine learning maturity levels tend to move in harmony across these categories. As a general principle of maturity models, you can’t skip levels in any category, nor can you advance in one category well beyond the others.
When it comes to ML, attaining and retaining momentum up the maturity model is paramount for success. You will ascend the model through concerted efforts delivering business wins using progressively more advanced elements of the model, thereby increasing your machine learning maturity. The model will evolve; no plateau is comfortable for long.
With ML maturity markers, sequencing, and tactics, this webinar provides a plan for how to build analytic Data Architecture maturity in your organization.
Implementing an Efficient Data Governance and Security Strategy with Data Virtualization - Denodo
Watch full webinar here: https://bit.ly/3lSwLyU
In the era of an explosion of information spread across different sources, data governance is a key component for guaranteeing the availability, usability, integrity, and security of that information. Likewise, the set of processes, roles, and policies it defines allows organizations to achieve their objectives while ensuring the efficient use of their data.
Data virtualization is one of the strategic tools for implementing and optimizing data governance. This technology allows companies to create a 360º view of their data and establish security controls and access policies across the entire infrastructure, regardless of the data's format or location. In this way, it brings together multiple data sources, makes them accessible from a single layer, and provides lineage capabilities to monitor changes to the data.
Join this webinar to learn:
- How to accelerate the integration of data from fragmented sources in internal and external systems and obtain a complete view of the information.
- How to enable, across the whole company, a single data-access layer with protection measures.
- How data virtualization provides the pillars for complying with current data protection regulations through auditing, a data catalog, and data security.
ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the Enterprise - DATAVERSITY
Many data scientists are well grounded in delivering results in the enterprise, but many come from outside – from academia, PhD programs, and research. They have the necessary technical skills, but those don't count until their product gets to production and into use. The speaker recently helped a struggling data scientist understand his organization and how to create success in it. That experience turned into this presentation, because many new data scientists struggle with the complexities of an enterprise.
Big Data LDN 2018: DATA MANAGEMENT AUTOMATION AND THE INFORMATION SUPPLY CHAIN - Matt Stubbs
Date: 14th November 2018
Location: Governance and MDM Theatre
Time: 10:30 - 11:00
Speaker: Mike Ferguson
Organisation: IBS
About: For most organisations today, data complexity has increased rapidly. In the area of operations, we now have cloud and on-premises OLTP systems with customers, partners and suppliers accessing these applications via APIs and mobile apps. In the area of analytics, we now have data warehouse, data marts, big data Hadoop systems, NoSQL databases, streaming data platforms, cloud storage, cloud data warehouses, and IoT-generated data being created at the edge. Also, the number of data sources is exploding as companies ingest more and more external data such as weather and open government data. Silos have also appeared everywhere as business users are buying in self-service data preparation tools without consideration for how these tools integrate with what IT is using to integrate data. Yet new regulations are demanding that we do a better job of governing data, and business executives are demanding more agility to remain competitive in a digital economy. So how can companies remain agile, reduce cost and reduce the time-to-value when data complexity is on the up?
In this session, Mike will discuss how companies can create an information supply chain to manufacture business-ready data and analytics to reduce time to value and improve agility while also getting data under control.
Introduction to Data Science (Data Summit, 2017) - Caserta
At DBTA's 2017 Data Summit in New York, NY, Caserta Founder & President, Joe Caserta, and Senior Architect, Bill Walrond, gave a pre-conference workshop presenting the ins and outs of data science. Data scientist has been dubbed the "sexiest" job of the 21st century, but it requires an understanding of many different elements of data analysis. This presentation dives into the fundamentals of data exploration, mining, and preparation, applying the principles of statistical modeling and data visualization in real-world applications.
Why Your Data Science Architecture Should Include a Data Virtualization Tool... - Denodo
Watch full webinar here: https://bit.ly/35FUn32
Presented at CDAO New Zealand
Advanced data science techniques, like machine learning, have proven to be extremely useful tools for deriving valuable insights from existing data. Platforms like Spark and complex libraries for R, Python, and Scala put advanced techniques at the fingertips of data scientists.
However, most architectures laid out to enable data scientists miss two key challenges:
- Data scientists spend most of their time looking for the right data and massaging it into a usable format
- Results and algorithms created by data scientists often stay out of the reach of regular data analysts and business users
Watch this session on-demand to understand how data virtualization offers an alternative that addresses these issues and can accelerate data acquisition and massaging. The session also includes a customer story on the use of machine learning with data virtualization.
How Is Data Governance Like an Amusement Park? - Denodo
Watch full webinar here: https://bit.ly/3Ab9gYq
Imagine arriving at an amusement park with your family and starting your day without the usual map that lets you plan which shows to see, which rides to go on, and where the children can and cannot ride... You probably won't get the most out of your day and will have missed many things. Some people like to go in blind and discover things little by little, but when we talk about business, going in blind can be fatal...
In the era of exploding information spread across different sources, data governance is key to guaranteeing the availability, usability, integrity, and security of that information. Likewise, the set of processes, roles, and policies it defines allows organizations to reach their goals while ensuring the efficient use of their data.
Data virtualization, a strategic tool for implementing and optimizing data governance, allows companies to create a 360º view of their data and establish security controls and access policies across the whole infrastructure, regardless of format or location. In this way, it brings together multiple data sources, makes them accessible from a single layer, and provides lineage capabilities to monitor changes in the data.
In this webinar you will learn how to:
- Accelerate the integration of data from fragmented sources in internal and external systems and obtain a comprehensive view of the information.
- Enable a single data access layer with protection measures across the whole enterprise.
- Use data virtualization as the foundation for complying with current data protection regulations through data auditing, cataloging, and security.
Building New Data Ecosystem for Customer Analytics, Strata + Hadoop World, 2016 - Caserta
Caserta Concepts Founder and President Joe Caserta gave this presentation at Strata + Hadoop World 2016 in New York, NY. His session covers path-to-purchase analytics using a data lake and Spark.
For more information, visit http://casertaconcepts.com/
How Data Virtualization Puts Machine Learning into Production (APAC) - Denodo
Watch full webinar here: https://bit.ly/3mJJ4w9
Advanced data science techniques, like machine learning, have proven to be extremely useful tools for deriving valuable insights from existing data. Platforms like Spark and complex libraries for R, Python, and Scala put advanced techniques at the fingertips of data scientists. However, these data scientists spend most of their time looking for the right data and massaging it into a usable format. Data virtualization offers a new alternative that addresses these issues in a more efficient and agile way.
Attend this session to learn how companies can use data virtualization to:
- Create a logical architecture to make all enterprise data available for advanced analytics exercises
- Accelerate data acquisition and massaging, providing the data scientist with a powerful tool to complement their practice
- Integrate popular tools from the data science ecosystem: Spark, Python, Zeppelin, Jupyter, etc
Data Con LA 2022 - Self-Service Success and Data Products - Data Con LA
Chirag Katbamna, Senior Manager, Accenture
We have grown past traditional reporting off of a centralized EDW data store. Each department wants to be empowered to do its own analytics, but this creates new challenges in the areas of governance, security, access, and monitoring. How do we do this the right way?
- Explore the concept of Data Mesh
- Explore what a Data Product is
- Explore how to implement successful self-service
- Push the limits of new capabilities
Key Considerations While Rolling Out Denodo Platform - Denodo
Watch full webinar here: https://bit.ly/3zaPGLO
Our approach for data virtualization advisory takes the following 3 dimensions/areas into consideration:
- Technology / Architecture
- Business User Groups (your clients)
- IT Organization
To deliver quick results, Q-PERIOR uses a multitude of accelerators in predefined topics within these three dimensions. In our presentation, we will use client examples to explain why such an exercise makes sense before rolling out Denodo and what kinds of risks you can avoid by doing so.
Similar to Anne-Sophie Roessler, International Business Developer, Dataiku - "3 ways to Fail your Data Lab Implementation"
Data Natives Frankfurt v 11.0 | "Competitive advantages with knowledge graphs..." - Dataconomy Media
The challenges of increasing complexity of organizations, companies and projects are obvious and omnipresent. Everywhere there are connections and dependencies that are often not adequately managed or not considered at all because of a lack of technology or expertise to uncover and leverage the relationships in data and information. In his presentation, Axel Morgner talks about graph technology and knowledge graphs as indispensable building blocks for successful companies.
Data Natives Munich v 12.0 | "How to be more productive with Autonomous Data..." - Dataconomy Media
Every day we are challenged with more data, more use cases, and an ever-increasing demand for analytics. In this talk, Bjorn will explain how autonomous data management and machine learning help innovators be more productive, and will give examples of how to deliver new data-driven projects with less risk at lower cost.
Data Natives meets DataRobot | "Build and deploy an anti-money laundering mo..." - Dataconomy Media
Compliance departments within banks and other financial institutions are turning to machine learning for improving their Anti Money Laundering compliance activities. Today, the systems that aim to detect potentially suspicious activity are commonly rule-based, and suffer from ultra-high false positive rates. DataRobot will discuss how their Automated Machine Learning platform was successfully used for a real use case to reduce their false positives and to enhance their Anti-Money Laundering activities.
Data Natives Munich v 12.0 | "Political Data Science: A tale of Fake News, So..." - Dataconomy Media
Trump, Brexit, Cambridge Analytica... In the last few years, we have had to confront the consequences of the use and misuse of data science algorithms in manipulating public opinion through social media. The use of private data to microtarget individuals is a daily practice (and a trillion-dollar industry), which has serious side-effects when the selling product is your political ideology. How can we cope with this new scenario?
Data Natives Vienna v 7.0 | "The Ingredients of Data Innovation" - Robbert de... - Dataconomy Media
When taking a deep dive into the world of data, one thing is certain: the ultimate goal is to create something new, something better, something faster. In other words, innovation should always be at the forefront of companies' strategic outlook, whether their goal is to pioneer new processes, user experiences, products, or services.
Data Natives Cologne v 4.0 | "The Data Lorax: Planting the Seeds of Fairness..." - Dataconomy Media
What does it take to build a good data product or service? Data practitioners always think about the technology, user experience and commercial viability. But rarely do they think about the implications of the systems they build. This talk will shed light on the impact of AI systems and the unintended consequences of the use of data in different products. It will also discuss our role, as data practitioners, in planting the seeds of fairness in the systems we build.
Data Natives Cologne v 4.0 | "How People Analytics Can Reveal the Hidden Aspe..." - Dataconomy Media
We all hear about the power of data, big data, and data analysis in today's marketplace, but we rarely feel their tangible effects on our own business decisions and performance.
Let's dive in and see how people analytics can increase people performance, motivation, and business revenue.
Data Natives Amsterdam v 9.0 | "Ten Little Servers: A Story of no Downtime" - ... - Dataconomy Media
Cloud infrastructure is a hostile environment: a power supply failure or a network outage leads to downtime and big losses. There is nothing we can trust: a single server, a server rack, or even a whole datacenter can fail, and if an application is fragile by design, disruption is inevitable. We must distribute our application and diversify our cloud data strategy to survive disturbances of any scale. Apache Cassandra is a cloud-native, platform-agnostic database that stores data with distributed redundancy, so it easily survives any issue. Want to know how Apple and Netflix handle petabytes of data while keeping it highly available? Join us and listen to a story of 10 little servers and no downtime!
Data Natives Amsterdam v 9.0 | "Point in Time Labeling at Scale" - Timothy Th... - Dataconomy Media
In the data industry, having correctly labelled datasets is vital. Timothy Thatcher explains how tagging your data while considering time and location and complex hierarchical rules at scale can be handled.
Data Natives Berlin v 20.0 | "Serving A/B experimentation platform end-to-end"... - Dataconomy Media
During the lifetime of an A/B test, product managers and analysts at GetYourGuide require various tools and different kinds of data to plan the trial properly, control it during the run, and analyze the results at the end. This talk covers the architecture, tools, and data flow for serving their needs.
Data Natives Berlin v 20.0 | "Ten Little Servers: A Story of no Downtime" - A... - Dataconomy Media
Cloud infrastructure is a hostile environment: a power supply failure or a network outage leads to downtime and big losses. There is nothing we can trust: a single server, a server rack, or even a whole datacenter can fail, and if an application is fragile by design, disruption is inevitable. We must distribute our application and diversify our cloud data strategy to survive disturbances of any scale. Apache Cassandra is a cloud-native, platform-agnostic database that stores data with distributed redundancy, so it easily survives any issue. Want to know how Apple and Netflix handle petabytes of data while keeping it highly available? Join us and listen to a story of 10 little servers and no downtime!
Big Data Frankfurt meets Thinkport | "The Cloud as a Driver of Innovation" - ... - Dataconomy Media
Creativity is the mental ability to create new ideas and designs. Innovation, on the other hand, means developing useful solutions from new ideas. Creativity can be goal-oriented, whereas innovation is always goal-oriented; that is, innovation aims to achieve defined goals. The use of cloud services and technologies promises enterprise users many benefits in terms of more flexible use of IT resources and faster access to innovative solutions. That's why, in this talk, we want to examine the question of what role cloud computing plays for innovation in companies.
Thinkport meets Frankfurt | "Financial Time Series Analysis using Wavelets" - ... - Dataconomy Media
A presentation on the time series properties of financial instruments and the possibilities of frequency decomposition and information extraction using the FT, STFT, and wavelets, with an outlook on current research on wavelet neural networks.
Big Data Helsinki v 3 | "Distributed Machine and Deep Learning at Scale with..." - Dataconomy Media
"With most machine learning (ML) and deep learning (DL) frameworks, it can take hours to move data for ETL, and hours to train models. It's also hard to scale, with data sets increasingly being larger than the capacity of any single server. The amount of the data also makes it hard to incrementally test and retrain models in near real-time.
Learn how Apache Ignite and GridGain help to address limitations like ETL costs, scaling issues and Time-To-Market for the new models and help achieve near-real-time, continuous learning.
Yuriy Babak, the head of ML/DL framework development at GridGain and Apache Ignite committer, will explain how ML/DL work with Apache Ignite, and how to get started.
Topics include:
— Overview of distributed ML/DL including architecture, implementation, usage patterns, pros and cons
— Overview of Apache Ignite ML/DL, including built-in ML/DL algorithms, and how to implement your own
— Model inference with Apache Ignite, including how to train models with other libraries, like Apache Spark, and deploy them in Ignite
— How Apache Ignite and TensorFlow can be used together to build distributed DL model training and inference"
Big Data Helsinki v 3 | "Federated Learning and Privacy-preserving AI" - Oguz... - Dataconomy Media
"Machine learning algorithms require significant amounts of training data, which so far has been centralized on one machine or in a datacenter. For numerous applications, such a need to collect data can be extremely privacy-invasive. Recent advancements in AI research approach this issue with a new paradigm for training AI models: federated learning.
In federated learning, edge devices (phones, computers, cars, etc.) collaboratively learn a shared AI model while keeping all the training data on device, decoupling the ability to do machine learning from the need to store the data in the cloud. From a personal data perspective, this paradigm enables a way of training a model on the device without directly inspecting users' data on a server. This talk will pinpoint several examples of AI applications benefiting from federated learning and the likely future of privacy-aware systems."
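The federated averaging idea described above can be sketched in a few lines: each simulated "device" fits a shared linear model on its private data, and only the model weights (never the raw data) leave the device to be averaged. This is a minimal illustration under invented data and a made-up model, not any particular framework's API.

```python
import random
from statistics import mean

random.seed(0)
TRUE_W = 3.0  # the relationship every device's data shares: y = 3x + noise

def make_client(n=50):
    """Private dataset held on one edge device; it never leaves the device."""
    xs = [random.gauss(0, 1) for _ in range(n)]
    ys = [TRUE_W * x + random.gauss(0, 0.1) for x in xs]
    return xs, ys

clients = [make_client() for _ in range(5)]

def local_update(w, xs, ys, lr=0.05, epochs=20):
    """On-device training: a few gradient steps on the local data only."""
    for _ in range(epochs):
        grad = mean(2 * (w * x - y) * x for x, y in zip(xs, ys))  # d(MSE)/dw
        w -= lr * grad
    return w

w_global = 0.0
for _ in range(10):  # each round: broadcast, train locally, average (FedAvg)
    w_global = mean(local_update(w_global, xs, ys) for xs, ys in clients)

print(f"learned weight: {w_global:.2f}")  # converges toward TRUE_W
```

The server only ever sees the five returned weights, never the fifty `(x, y)` pairs each device holds.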
GraphRAG is All You Need? LLM & Knowledge Graph - Guy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf - Peter Spielvogel
Building better applications for business users with SAP Fiori.
• What is SAP Fiori and why it matters to you
• How a better user experience drives measurable business benefits
• How to get started with SAP Fiori today
• How SAP Fiori elements accelerates application development
• How SAP Build Code includes SAP Fiori tools and other generative artificial intelligence capabilities
• How SAP Fiori paves the way for using AI in SAP apps
The Art of the Pitch: WordPress Relationships and Sales - Laura Byrne
Clients don't know what they don't know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if something changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
Enhancing Performance with Globus and the Science DMZ - Globus
ESnet has led the way in helping national facilities—and many other institutions in the research community—configure Science DMZs and troubleshoot network issues to maximize data transfer performance. In this talk we will present a summary of approaches and tips for getting the most out of your network infrastructure using Globus Connect Server.
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor... - SOFTTECHHUB
The choice of an operating system plays a pivotal role in shaping our computing experience. For decades, Microsoft's Windows has dominated the market, offering a familiar and widely adopted platform for personal and professional use. However, as technological advancements continue to push the boundaries of innovation, alternative operating systems have emerged, challenging the status quo and offering users a fresh perspective on computing.
One such alternative that has garnered significant attention and acclaim is Nitrux Linux 3.5.0, a sleek, powerful, and user-friendly Linux distribution that promises to redefine the way we interact with our devices. With its focus on performance, security, and customization, Nitrux Linux presents a compelling case for those seeking to break free from the constraints of proprietary software and embrace the freedom and flexibility of open-source computing.
State of ICS and IoT Cyber Threat Landscape Report 2024 preview - Prayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio's cyber threat intelligence farming facilities, spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors and newer malware, including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs - Alex Pruden
This paper presents Reef, a system for generating publicly verifiable succinct non-interactive zero-knowledge proofs that a committed document matches or does not match a regular expression. We describe applications such as proving the strength of passwords, the provenance of email despite redactions, the validity of oblivious DNS queries, and the existence of mutations in DNA. Reef supports the Perl Compatible Regular Expression syntax, including wildcards, alternation, ranges, capture groups, Kleene star, negations, and lookarounds. Reef introduces a new type of automata, Skipping Alternating Finite Automata (SAFA), that skips irrelevant parts of a document when producing proofs without undermining soundness, and instantiates SAFA with a lookup argument. Our experimental evaluation confirms that Reef can generate proofs for documents with 32M characters; the proofs are small and cheap to verify (under a second).
Paper: https://eprint.iacr.org/2023/1886
Epistemic Interaction - tuning interfaces to provide information for AI support - Alan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
The New Frontiers of AI in RPA with UiPath Autopilot™ - UiPathCommunity
In this free online event, organized by the Italian UiPath Community, you can explore the new features of Autopilot, the tool that integrates Artificial Intelligence into the development and use of Automations.
📕 Together we will look at some examples of the use of Autopilot in different tools of the UiPath Suite:
Autopilot for Studio Web
Autopilot for Studio
Autopilot for Apps
Clipboard AI
GenAI applied to Document Understanding
👨🏫👨💻 Speakers:
Stefano Negro, UiPath MVPx3, RPA Tech Lead @ BSP Consultant
Flavio Martinelli, UiPath MVP 2023, Technical Account Manager @UiPath
Andrei Tasca, RPA Solutions Team Lead @NTT Data
Elevating Tactical DDD Patterns Through Object Calisthenics - Dorra BARTAGUIZ
After immersing yourself in the blue book and its red counterpart, attending DDD-focused conferences, and applying tactical patterns, you're left with a crucial question: How do I ensure my design is effective? Tactical patterns within Domain-Driven Design (DDD) serve as guiding principles for creating clear and manageable domain models. However, achieving success with these patterns requires additional guidance. Interestingly, we've observed that a set of constraints initially designed for training purposes remarkably aligns with effective pattern implementation, offering a more ‘mechanical’ approach. Let's explore together how Object Calisthenics can elevate the design of your tactical DDD patterns, offering concrete help for those venturing into DDD for the first time!
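As a flavor of how one such constraint plays out: the Object Calisthenics rule "wrap all primitives" pushes a bare amount-plus-currency pair into a proper DDD value object, so the invariant lives with the data. This `Money` class is a generic illustration, not an example taken from the talk.

```python
from dataclasses import dataclass

@dataclass(frozen=True)  # value objects are immutable and compared by value
class Money:
    amount: int          # minor units (cents) to avoid float rounding
    currency: str

    def add(self, other: "Money") -> "Money":
        if self.currency != other.currency:
            raise ValueError("currency mismatch")  # the invariant lives here
        return Money(self.amount + other.amount, self.currency)

print(Money(1999, "EUR").add(Money(500, "EUR")))
# Money(amount=2499, currency='EUR')
```

Because callers can no longer add two raw integers with different currencies by accident, the domain rule is enforced by construction rather than by convention.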
The Metaverse and AI: how can decision-makers harness the Metaverse for their... - Jen Stirrup
The Metaverse is popularized in science fiction, and now it is becoming closer to being a part of our daily lives through the use of social media and shopping companies. How can businesses survive in a world where Artificial Intelligence is becoming the present as well as the future of technology, and how does the Metaverse fit into business strategy when futurist ideas are developing into reality at accelerated rates? How do we do this when our data isn't up to scratch? How can we move towards success with our data so we are set up for the Metaverse when it arrives?
How can you help your company evolve, adapt, and succeed using Artificial Intelligence and the Metaverse to stay ahead of the competition? What are the potential issues, complications, and benefits that these technologies could bring to us and our organizations? In this session, Jen Stirrup will explain how to start thinking about these technologies as an organisation.
DevOps and Testing slides at DASA Connect - Kari Kakkonen
Slides by me and Rik Marselis from the DASA Connect conference on 30.5.2024. We discuss what testing is, then what agile testing is, and finally what testing in DevOps is. We also ran a lovely workshop with the participants, exploring different ways to think about quality and testing in different parts of the DevOps infinity loop.
4. Why a Data Lab?
• One single workflow: from a segmented workflow to a transversal one
• Several use cases: the ability to address many different data-centric topics within a single unit
• Multiple competences: a business-focused approach mixing many different competences
• End-to-end projects: combining data from different sources to handle several aspects of a single topic
5. Deployment of the predictions
Dataiku DSS for fraud prediction
Data sources: client service, sensor data, garage data, administration
Team:
• 1 project owner (IT)
• 1 project manager (business)
• 1 data scientist in house
• 3 data scientists from 3 different firms
• 3 consultants from 3 different firms
• 1 architect (external)
Outcome: accepted file, or INVESTIGATE!
The transactions are blocked depending on their gap with the business rules and behavioral patterns
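A toy sketch of that blocking logic: a transaction's score combines a hard business rule with its behavioral gap, i.e. how far the amount sits from the customer's usual spending. The rule, thresholds, and field names here are invented for illustration; the slide does not specify the actual rules.

```python
from statistics import mean, stdev

def fraud_score(amount, history, country, home_country):
    """Hypothetical score: a business-rule hit plus the behavioral gap."""
    score = 1.0 if country != home_country else 0.0   # example business rule
    mu, sigma = mean(history), stdev(history)
    z = abs(amount - mu) / sigma if sigma else 0.0    # gap from usual spending
    return score + min(z / 3.0, 1.0)                  # cap the behavioral term

def decide(amount, history, country, home_country, block_at=1.5):
    s = fraud_score(amount, history, country, home_country)
    return "INVESTIGATE" if s >= block_at else "ACCEPTED"

history = [20.0, 25.0, 30.0, 22.0]         # the customer's usual transactions
print(decide(26.0, history, "FR", "FR"))   # small, domestic  -> ACCEPTED
print(decide(400.0, history, "BR", "FR"))  # large + foreign  -> INVESTIGATE
```

A real deployment would learn the behavioral model from data rather than hard-code a z-score, but the shape of the decision (rules plus behavioral gap against a threshold) is the same.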
7. Focus on the framework, not on the input
Adapted from the CRISP-DM methodology: Business Understanding → Data Acquisition & Understanding → Data Preparation → Model Creation → Evaluation → Deployment, iterated (iteration 1, iteration 2, … iteration n) over dataset 1, dataset 2, … dataset n, each pass producing a scored dataset.
Data Acquisition & Understanding:
✓ Read and import raw data
✓ Detect schemas and structure
✓ Analyze distributions
✓ Assess quality: outliers, missing values...
Data Preparation:
✓ Create derived and aggregated variables
→ Analytical dataset
→ Report
Model Creation:
✓ Feature selection
✓ Compare algorithms
Evaluation:
✓ Performance metrics
✓ Robustness & generalization (cross validation)
✓ Insights (e.g. variable importance)
Deployment:
✓ Scoring engine
✓ Publish predictions
✓ Monitor performance
✓ API
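The iteration loop above can be sketched end to end on a toy dataset. Everything here (the synthetic data, the threshold model) is invented for illustration, but the steps mirror the slide: acquire raw data, assess quality, prepare (impute missing values), create a model, and evaluate robustness with cross-validation.

```python
import random
from statistics import mean

random.seed(0)

# Data acquisition & understanding: toy raw data, some amounts missing
raw = [{"amount": random.gauss(100, 20) if random.random() > 0.1 else None,
        "fraud": 0} for _ in range(80)]
raw += [{"amount": random.gauss(300, 30), "fraud": 1} for _ in range(20)]

# Data preparation: impute missing values with the column mean
fill = mean(r["amount"] for r in raw if r["amount"] is not None)
for r in raw:
    if r["amount"] is None:
        r["amount"] = fill

# Model creation: a threshold halfway between the two class means
def fit(rows):
    mu0 = mean(r["amount"] for r in rows if r["fraud"] == 0)
    mu1 = mean(r["amount"] for r in rows if r["fraud"] == 1)
    return (mu0 + mu1) / 2

def predict(threshold, amount):
    return 1 if amount > threshold else 0

# Evaluation: stratified k-fold cross-validation for robustness
def cross_validate(rows, k=5):
    pos = [r for r in rows if r["fraud"] == 1]
    neg = [r for r in rows if r["fraud"] == 0]
    random.shuffle(pos)
    random.shuffle(neg)
    folds = [pos[i::k] + neg[i::k] for i in range(k)]  # both classes per fold
    accs = []
    for i in range(k):
        train = [r for j, f in enumerate(folds) if j != i for r in f]
        t = fit(train)
        accs.append(mean(predict(t, r["amount"]) == r["fraud"] for r in folds[i]))
    return mean(accs)

print(f"cross-validated accuracy: {cross_validate(raw):.2f}")
```

In practice each iteration of the slide's loop would repeat this sequence on a new or enriched dataset, and the deployment step would publish the fitted threshold as a scoring engine.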
8. People and Governance
Polyglot vs. dictator?
Problems:
• Collaboration between technical and non-technical profiles inside a single project
• Necessary collaboration between business and tech teams to address transversal projects accurately
Focus:
• Promote diversity...
• ...within a workflow-centric environment
9. End to end, from prototyping into production
Do it your way…
11. Data Lab Organisation
Inputs: business needs, internal data sources, external data sources
Data Lab (lab environment)
Multidisciplinary team: direction / project management, business analysts, data miners / data scientists
Missions:
• Prioritisation of the business needs
• Prototyping / agile solution engineering
• Support for apps deployment
Production environment
• Business applications: marketing campaign automation, reporting / web analytics
• Data as a Service platform: conception of "data products", integration of data products, optimisation engine, real-time scoring
Exchanges between lab and production: data flow, insights & services, processing chain, API deployment
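The real-time scoring and API deployment boxes can be illustrated with a minimal HTTP scoring endpoint built from the Python standard library alone. The scoring function, threshold, and route are invented placeholders; a production deployment would sit behind a proper serving stack with authentication and monitoring.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

THRESHOLD = 200.0  # placeholder decision threshold published by the lab

def score(record):
    """Toy scoring engine: a bounded score derived from the amount."""
    return min(record.get("amount", 0.0) / (2 * THRESHOLD), 1.0)

class ScoringHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        record = json.loads(self.rfile.read(int(self.headers["Content-Length"])))
        s = score(record)
        payload = json.dumps({"score": s, "block": s >= 0.5}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, *args):  # keep the demo quiet
        pass

def serve(port):
    srv = HTTPServer(("127.0.0.1", port), ScoringHandler)
    threading.Thread(target=srv.serve_forever, daemon=True).start()
    return srv

srv = serve(8099)
req = urllib.request.Request("http://127.0.0.1:8099/score",
                             data=json.dumps({"amount": 350}).encode(),
                             headers={"Content-Type": "application/json"})
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read()))  # {'score': 0.875, 'block': True}
srv.shutdown()
```

The lab publishes the model behind an endpoint like this one; business applications then call it per transaction, which is the "insights & services" arrow between the lab and production environments.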