A talk given by Dr. Arif Wider (ThoughtWorks) and Sebastian Herold (Zalando) at OOP 2018 in Munich.
Abstract:
More and more companies migrate their monolithic applications to a microservices architecture. However, maintaining a consistent and usable data landscape has only become more challenging as a result: huge amounts of structured and unstructured data, and hundreds of data sources.
Furthermore, data-driven product development multiplies the analytics requirements: every product team needs constantly updated and specially tailored metrics, which often combine product-specific data with company-wide data.
Having a centralized data team does not scale in this setting as it becomes the bottleneck between data producers and data consumers.
We created a Manifesto of seven principles that break with the traditional separation of roles and show a path for dealing with distributed data in a federated and scalable fashion. This leads to DataDevOps: a culture shift, similar to DevOps, in which application developers own their data and take over responsibilities for data & analytics.
Learn about our experiences and best practices with facilitating this cultural transformation at Scout24, the provider of Europe’s largest online markets for cars and real estate.
DataDevOps - A Manifesto on Shared Data Responsibility in Times of Microservices - Dr. Arif Wider
A talk by Sebastian Herold (Scout24) and Arif Wider (ThoughtWorks)
Abstract:
More and more companies successfully migrate their monolithic applications to a microservices architecture. However, maintaining a consistent and usable data landscape has only become more challenging as a result: unstructured data, huge amounts of data, and hundreds of data sources. Having a centralized data team does not scale in this setting, as it becomes the bottleneck between application developers and business analysts.
We created a Data Manifesto of seven principles that break with traditional role separations and show a path for dealing with distributed data in a federated and scalable fashion. This leads to DataDevOps: a culture where application developers also own their data. Learn about our experiences facilitating this cultural transformation at Scout24, the provider of Europe’s largest online marketplaces for cars and real estate.
Hear from our executive team about: key developments in the industry, focusing on data quality, internationalization, and business intelligence; recent accomplishments in Standards and key upcoming projects; an IDW platform update, including an outline of the next-generation platform's new features and the schedule for rollout; an introduction to the powerful features of the new IDX platform and the transition schedule; and a summary of IDEA’s roadmap for the coming years.
What’s the problem? The data is in silos. Business and IT are both demanding a unified view of data to help provide solutions to today’s business challenges, but you can’t use the tools and technologies that created the problem to solve the problem. Enter the Multi-Model database. In this session John Biedebach introduces a trusted and secure approach to data integration using Multi-Model databases. The data we want to integrate has already been modeled, so we’ll discuss how to load information as-is into a Multi-Model database to leverage the models that already exist in the data. We’ll then apply our own models to our data in place to rapidly deliver answers to business questions while providing value from harmonized information directly to consumers. We’ll also discuss the characteristics of a Multi-Model database and the benefits of a Multi-Model approach, including:
How to get unified views across disparate data models and formats within a single database
The benefits of a single product vs multi-product Multi-Model approach to data integration
The importance of agility in data access and delivery through APIs, interfaces, and indexes
How to scale a multi-model database while still providing ACID capabilities and security
How to determine where a multi-model database fits in your existing architecture
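To make the "one database, several models" idea above concrete, here is a minimal toy sketch in plain Python (a hypothetical in-memory store, not any real multi-model product): the same records are loaded as-is and can then be queried both as documents and as a graph. The `_id` naming convention used to derive edges is an assumption for illustration only.

```python
# Toy illustration (NOT a real multi-model database): the same records,
# loaded "as-is", are queryable through two different models.
class TinyMultiModelStore:
    def __init__(self):
        self.docs = {}    # document model: id -> record dict
        self.edges = []   # graph model: (source, relation, target) triples

    def load(self, doc_id, doc):
        """Load a record as-is; index its foreign keys as graph edges."""
        self.docs[doc_id] = doc
        for key, value in doc.items():
            # Assumed convention: fields ending in "_id" reference other records.
            if key.endswith("_id"):
                self.edges.append((doc_id, key[:-3], value))

    def find_docs(self, **criteria):
        """Document-style query: exact match on fields."""
        return [d for d in self.docs.values()
                if all(d.get(k) == v for k, v in criteria.items())]

    def neighbors(self, doc_id, relation):
        """Graph-style query: follow edges with the given relation."""
        return [dst for src, rel, dst in self.edges
                if src == doc_id and rel == relation]


store = TinyMultiModelStore()
store.load("o1", {"type": "order", "total": 99, "customer_id": "c7"})
store.load("c7", {"type": "customer", "name": "Ada"})

orders = store.find_docs(type="order")      # document view of the data
owner = store.neighbors("o1", "customer")   # graph view of the same data
```

The point of the sketch is the session's core claim: one copy of the data, several models applied to it in place, rather than separate document and graph products each holding their own copy.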
During this Big Data Warehousing Meetup, we discussed how graph databases work, shared some real world use cases, and showed a live demo of the world’s leading graph database, Neo4J. Pitney Bowes demonstrated their new MDM product developed on a graph database.
For more information, check out the other slides from this meetup or visit our website at www.casertaconcepts.com
Collaborative Data UX Design - Virtually and Physically - Datentreiber
Many data products fail, partly because users do not understand or accept the software. To avoid this, analytics solutions such as KPI dashboards should be designed together with their users; this is especially true for the user interface.
At the Data Brain Meetup, Martin Szugat from Datentreiber showed three wireframing tools for sketching UI designs collaboratively with users:
1) the virtual collaboration tool Miro,
2) the PowerPoint add-on PowerMockup and
3) the physical Dashboard Wireframing Kit.
This is a copy of http://www.isim.ac.in/Infovision%202012/presentations/sunilshirguppilinkedin.pdf, saved for archival purposes. All rights reserved by the source linked above.
Denodo DataFest 2016: Metadata and Data: Search and Exploration - Denodo
Watch the full session: Denodo DataFest 2016 sessions: https://goo.gl/ptQMW7
What matters most to analysts and decision makers is finding the right data within seconds. Data virtualization incorporates a rich metadata catalog and a graphical interface for self-service users.
In this session, you will learn:
• How to discover, search, explore, curate and share trusted data assets in a governed manner
• How to view and utilize the complete lineage of data assets
• Ways to infer patterns in data and metadata
This session is part of the Denodo DataFest 2016 event. You can also watch more Denodo DataFest sessions on demand here: https://goo.gl/VXb6M6
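The "complete lineage" bullet above can be pictured with a small sketch (hypothetical asset names, plain Python, not the Denodo API): lineage is a directed dependency graph, and viewing an asset's full lineage means collecting everything it transitively depends on.

```python
# Hedged toy example: model data lineage as a directed graph and collect
# the full upstream lineage of one asset by transitive traversal.
def upstream_lineage(asset, dependencies):
    """Return every asset that `asset` transitively depends on.

    `dependencies` maps an asset name to the list of assets it is
    directly derived from (one edge per transformation step).
    """
    seen, stack = set(), list(dependencies.get(asset, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(dependencies.get(current, []))
    return seen


# Hypothetical catalog: a dashboard built on a view over two tables.
deps = {
    "sales_dashboard": ["sales_view"],
    "sales_view": ["orders_table", "customers_table"],
    "orders_table": ["orders_source"],
}
lineage = upstream_lineage("sales_dashboard", deps)
```

A real metadata catalog stores these edges automatically as views are defined; the traversal itself is no more than this.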
Denodo Data Innovation Award: Creating a Logical Data Fabric to Digitize City... - Denodo
Watch full webinar here: https://bit.ly/3J22EjE
The much-sought-after Denodo Data Innovation Award: who will be the winner this year? Listen as two customers duel it out. You determine the winner.
Design Thinking for Data Superwomen & Supermen - Datentreiber
Martin Szugat from Datentreiber held a keynote at the Predictive Analytics World Business Conference on November 14th, 2018 to share his knowledge on how to transform a business with an individual and successful data strategy for interdisciplinary teams.
As one of the largest processors and controllers of global information, IBM has embarked on a global program towards GDPR compliance readiness. Using the same methodology, services, and solutions as it does with clients, this session will demonstrate how this process can serve as a model for GDPR readiness at any large enterprise, and how that model can then be a basis for complying with other regulatory requirements and a framework for future business transformation and opportunity. Specifics will include:
• A summary of the needs and opportunities of the GDPR regulation
• With the time left, where you are and what can still be done
• A prescriptive phased methodology of execution
• Core solution technical measures and capabilities
• Key GDPR actionable outcomes by stakeholder
The focus is on discovering, mapping, and managing personal data for GDPR, along with data protection and compliance, on Hadoop in a sustainable way.
Speaker
Richard Hogg, Global GDPR Evangelist, IBM
Multi-Cloud Data Integration with Data Virtualization (APAC) - Denodo
Watch full webinar here: https://bit.ly/3cnw5MW
More and more organizations are adopting multi-cloud strategies to provide greater flexibility, cost savings, and performance optimization. Even when organizations commit to a single cloud provider, they often have data and applications spread across different cloud regions to support different business units or geographies. The result is a highly distributed infrastructure that makes finding and accessing the data needed for reporting and analytics even more challenging.
The Denodo Platform Multi-Location Architecture provides quick and easy managed access to data while still providing local control to the 'data owners' and complying with local privacy and data protection regulations (think GDPR and CCPA!).
In this on-demand session, you will learn about:
- The challenges facing organizations as they adopt multi-cloud data strategies
- How the Denodo Platform provides a managed data access layer across the organization
- The different multi-location architectures that can maximize local control over data while still making it readily available
- How organizations have benefited from using the Denodo Platform as a multi-cloud data access layer
DataDevOps: A Manifesto for a DevOps-like Culture Shift in Data & Analytics - Dr. Arif Wider
A talk by Sebastian Herold & Dr. Arif Wider at TDWI 2018 Munich.
Abstract:
More and more companies migrate their monolithic applications to a microservices architecture. However, maintaining a consistent and usable data landscape has only become more challenging as a result: huge amounts of structured and unstructured data, and hundreds of data sources.
Furthermore, data-driven product development multiplies the analytics requirements: every product team needs constantly updated and specially tailored metrics, which often combine product-specific data with company-wide data.
Having a centralized data team does not scale in this setting as it becomes the bottleneck between data producers and data consumers.
We created a Manifesto based on five general themes that break with the traditional separation of roles and show a path for dealing with distributed data in a federated and scalable fashion. This leads to DataDevOps: a culture shift, similar to DevOps, in which application developers own their data and take over responsibilities for data & analytics.
Learn about our experiences and best practices with facilitating this cultural transformation at Zalando, one of Europe's largest online fashion platforms.
This presentation covers everything a beginner learning Power BI must know: why Power BI is the best choice for data visualization, what it is used for, what the career opportunities are, and more.
Hassle-Free Data Lake Governance: Automating Your Analytics with a Semantic L... - Tyler Wishnoff
Simplify data lake governance, no matter how much data you work with and how many data sources and BI tools you manage. This presentation offers all you need to develop your own strategy for smarter data lake governance. Learn more at: https://kyligence.io/
‘Edge’ Technologies: a new language of innovation - DXC Eclipse
Microsoft Dynamics 365: Continue Your Transformation Journey.
‘Edge’ Technologies: a new language of innovation.
We will demystify the new language of ‘Edge’ Technologies – Common Data Service, Power Apps, Flow, Cortana Intelligence Suite, BOT Framework.
Presented by Henrik Mozart - Senior Technology Specialist, DXC Eclipse
Powering Asurion's Connected Home Platform with Spark Structured Streaming, D... - Databricks
Asurion’s Connected Home simplifies the complexities of operation, setup, and management of connected devices and services.
Leveraging the latest advancements in machine learning, big data, and real-time processing, we have built a platform capable of keeping the connected world connected and continually learning.
Solving this technical challenge requires running multiple continuous applications capable of transactional data storage, multi-level aggregations, joins, and execution of ML models, with data privacy and security at the core.
The Asurion Connected Home platform achieves this goal by using Spark Structured Streaming, Delta Lake, and MLflow on Databricks.
Data Thinking Preview - Predictive Analytics World for Industry 4.0 - Datentreiber
Now is the time to strengthen your skills and knowledge: in the virtual workshop “Data Thinking”, we give you the chance to learn a proven method and free open-source tools for designing useful data products and more successful data projects. With this free preview, you can get an overview of the workshop and an insight into how to apply design thinking to data science & analytics.
Read the full post at https://www.fourquadrant.com/gartner-go-to-market-strategy/
Gartner's IT Predictions
Key technology drivers that will impact go-to-market strategy and tactics include intelligent things, the collection of massive amounts of data, and artificial intelligence and machine learning.
Gartner identifies 3 key themes that form the basis for the Top 10 strategic technology trends:
- Intelligent
- Digital
- and Mesh
The technologies noted above are at the front end of the technology adoption curve, but they are expected to break out of their emerging state and stand to have substantial disruptive potential across industries.
Read Pragmatic Posts on B2B Marketing - https://www.fourquadrant.com/marketing-resource-blog/
Download Go to Market Templates (FREE) - https://www.fourquadrant.com/marketing-tempates/
View the Go to Market PowerPoint Slide Library - https://www.fourquadrant.com/marketing-slides/
Leverage Proven Go to Market Planning Templates - https://www.fourquadrant.com/products/
What makes it worth becoming a Data Engineer? - Hadi Fadlallah
This presentation explains data engineering for non-computer-science students and why it is worth becoming a data engineer. I used this presentation while working as an on-demand instructor at Nooreed.com.
Data DevOps - Arif Wider and Sean Gustafson (ThoughtWorks Live) - Thoughtworks
To support the successful production, consumption and governance of data needed to establish a data-driven product team, Scout24 (Europe’s largest online marketplace for cars and real estate) and ThoughtWorks created a manifesto of seven principles for DataDevOps.
The Scout24 Data Landscape Manifesto: Building an Opinionated Data Platform - Rising Media Ltd.
The Scout24 Data Landscape Manifesto is the formalization of our opinions on how a successful data-driven company should approach data. In a truly data-driven company, no manager, no salesperson, no engineer and no data scientist can do their job properly without easy access to large amounts of high-quality data. It is Sean's mandate to create a platform that encourages the production of high-quality data and enables engagement with data by all employees. He and his team are opinionated about how all producers and consumers of data need to be active participants in the data platform, to make data-driven decisions and to be responsible for the data they produce. And he built the data platform with 'nudges' that reward data usage that matches his vision for a data-driven company. In this talk, Sean will present the Scout24 Data Landscape Manifesto and will show how the strong opinions it contains enabled him to successfully migrate from a classic centralized data warehouse to a decentralized, scalable, cloud-based data platform at AutoScout24 and ImmobilienScout24 that is core to their analytics and machine learning activities.
The Briefing Room with Dr. Robin Bloor and RedPoint Global
Live Webcast Jan. 13, 2015
Watch the archive: https://bloorgroup.webex.com/bloorgroup/lsr.php?RCID=c847c54220dfb80841f3e0c63664fd08
Context is king in the realm of Big Data. With enough perspective on a customer or prospect, organizations can fine-tune their offerings in game-changing ways. Today's cutting-edge companies are viewing their customers within the context of a decade or more of interactions, and across multiple channels. How so? Real-time integration with social media and other customer channels can now result in actionable insights with serious potential.
Register for this episode of The Briefing Room to hear veteran Analyst Dr. Robin Bloor, as he describes the changing landscape of data flow, and how that impacts enterprise responsiveness. He'll be briefed by George Corugedo of RedPoint Global, who will explain how companies are leveraging Hadoop's YARN architecture to deliver a whole new array of highly responsive, data-driven enterprise applications. He'll demonstrate how RedPoint's platform running inside Hadoop can enable a wide range of both real-time and strategic data management functionality, all of which can be applied to any number of critical business processes.
Visit InsideAnalysis.com for more information.
By Thoughtworks | Building data as a product: The key to unlocking Data Mesh'... - Ingrid Buenaventura
Building data as a product: The key to unlocking Data Mesh's potential
Data as a product is an exciting concept. It brings product thinking to datasets and facilitates a data-driven culture by encouraging teams to share data rather than keep it in a silo. Once we are convinced of the philosophy of data as a product, the immediate question is how to build one.
In this talk we will uncover ways to design and architect data as a product that meets the needs of a business use case. We will also discuss creating a blueprint for better resiliency via contracts and service level objectives.
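The "contracts and service level objectives" idea mentioned above can be sketched in a few lines of plain Python. This is a hedged toy example with a hypothetical schema and field names, not the speakers' actual framework: a data product publishes a contract, records are validated against it before being shared, and a simple freshness SLO guards resiliency for consumers.

```python
# Toy data contract (hypothetical schema): field name -> expected type.
DATA_CONTRACT = {
    "order_id": str,
    "amount": float,
    "currency": str,
}


def contract_violations(record, contract=DATA_CONTRACT):
    """Return a list of contract violations for one record (empty = valid)."""
    problems = []
    for field, expected_type in contract.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            problems.append(f"wrong type for {field}")
    return problems


def freshness_slo_met(minutes_since_update, slo_minutes=60):
    """A minimal service level objective: data must be fresher than the SLO."""
    return minutes_since_update <= slo_minutes


good = contract_violations({"order_id": "o1", "amount": 9.5, "currency": "EUR"})
bad = contract_violations({"order_id": "o1", "amount": "9.5"})
```

In practice the contract would be versioned and enforced at the data product's boundary, so consumers can depend on its shape and freshness without talking to the producing team.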
Speakers: Harmeet Sokhi, Lead Data Consultant, Thoughtworks and Vishal Srivastava, Senior Data Engineer, Thoughtworks
Harmeet has extensive experience in Cloud, data engineering and machine learning operations. She has worked on designing large enterprise-scale data applications and has also implemented mature machine learning engineering solutions for clients in several industries. She is always in the pursuit of learning and keeps herself current in the ever-changing technology landscape. She is an experienced team leader who helps address challenges, both technical and non-technical, to deliver highly credible results.
Vishal is a Senior Data Engineer with DevOps skills who has worked across a range of industries. He has experience in establishing cloud infrastructure foundations, event-driven data lake, data visualisation, master data management, data quality and data governance frameworks. He is passionate about real time event driven distributed systems. Vishal has used these experiences to enable use cases which help businesses realise real value from data.
From the Data Work Out event:
Performant and scalable Data Science with Dataiku DSS and Snowflake
Managing the whole process of setting up a machine learning environment from end-to-end becomes significantly easier when using cloud-based technologies. The ability to provision infrastructure on demand (IaaS) solves the problem of manually requesting virtual machines. It also provides immediate access to compute resources whenever they are needed. But that still leaves the administrative overhead of managing the ML software and the platform to store and manage the data.
A fully managed end-to-end machine learning platform like Dataiku Data Science Studio (DSS) that enables data scientists, machine learning experts, and even business users to quickly build, train and host machine learning models at scale, needs to access data from many different sources and can also access data provided by Snowflake. Storing data in Snowflake has three significant advantages: a single source of truth, shorten the data preparation cycle, scale as you go.
Emerging Trends in Multimodal Data Collection - Miovision Fall 2016Miovision
Miovision has been collecting data for public agencies and engineering firms around the world for almost a decade. Our huge database of traffic counts and classifications, alongside our relationships with these agencies give us a unique perspective on emerging trends.
In this webinar our Product Marketing Manager, Cam Davies, will be sharing his insights on the globally emerging trends in multimodal data collection and how Miovision is innovating their products to meet market demand.
Trends in Bike and Pedestrian Data Collection
Innovations in Data Collection Equipment
Modernized Traffic Data Management
These slides from EMA VP of Research, Shawn Rogers, and Actian VP of Solutions & Product Marketing, John Santaferraro will help you:
Learn how your peers are using Big Data for success to create transformational value
Discover how diverse data and platform ecosystems create opportunity and complexity
Examine the key differences between early Big Data projects and Big Data 2.0 projects
Explore where Big Data is heading in the future and how innovative companies will enable execution
Identify the gaps and challenges to success as well as the drivers for change and opportunity.
Getting started with Hadoop on the Cloud with BluemixNicolas Morales
Silicon Valley Code Camp -- October 11, 2014.
Session: Getting started with Hadoop on the Cloud.
Hadoop and Cloud is an almost perfect marriage. Hadoop is a distributed computing framework that leverages a cluster built on commodity hardware. The Cloud simplifies provisioning of machines and software. Getting started with Hadoop on the Cloud makes it simple to provision your environment quickly and actually get started using Hadoop. IBM Bluemix has democratized Hadoop for the masses! This session will provide a brief introduction to what Hadoop is, how does cloud work and will then focus on how to get started via a series of demos. We will conclude with a discussion around the tutorials and public datasets - all of the tools needed to get you started quickly.
Learn more about BigInsights for Hadoop: https://developer.ibm.com/hadoop/
Data has been around for a long time. But only in two formats ANALOG and DIGITAL. Recently at an ever increasing rate DIGITAL DATA is growing exponentially year over year. Understand the best practice in Data Integration.
Data and its Role in Your Digital TransformationVMware Tanzu
Data plays a big role in building the kinds of experiences demanded by the market today. In this session, we’ll unpack what goes into building a data-driven app, case studies of how organizations have successfully overcome siloed data and analytics to bring new predictive features into their applications, and what your next steps for data should be on your digital transformation journey.
Speaker: Les Klein, EMEA CTO Data, Pivotal
Role of Data in Digital TransformationVMware Tanzu
Data plays a big role in building the kinds of experiences demanded by the market today. In this session, we’ll unpack what goes into building a data-driven app, case studies of how organizations have successfully overcome siloed data and analytics to bring new predictive features into their applications, and what your next steps for data should be on your digital transformation journey.
Speaker: Les Klein, EMEA CTO Data, Pivotal
Data Integration for Both Self-Service Analytics and IT Users Senturus
See a cloud solution that enables data integration for applications such as Salesforce, NetSuite, Workday, Amazon Redshift and Microsoft Azure. View the webinar video recording and download this deck: http://www.senturus.com/resources/data-integration-tool-for-both-business-and-it-users/.
The rapid growth in self-service business analytics has created tremendous value for organizations, but in many cases has created tension between technical and business users. Technical teams have built solid data warehouses filled with trusted data from source systems such as sales, finance, and operations. Business teams are gaining tremendous insights by analyzing data warehouse information with traditional and new data discovery tools such as Cognos, Business Objects, Tableau, and Power BI.
The Informatica Cloud is a best-of-both-worlds solution that combines data integration for both business and IT users. It allows the following: 1) IT incorporates the business analyst’s data integration routines into the core, trusted data warehouse, 2) Business analysts can do data integration from both cloud-based and on-premise data sources, 3) Business analyst can use the industrial-strength data integration engine that IT teams have loved for years and 4) Integration for apps such as Salesforce, NetSuite, Workday, Amazon Redshift, Microsoft Azure, Marketo, SAP, Oracle and SQL Server.
Senturus, a business analytics consulting firm, has a resource library with hundreds of free recorded webinars, trainings, demos and unbiased product reviews. Take a look and share them with your colleagues and friends: http://www.senturus.com/resources/.
Big Data in Action – Real-World Solution ShowcaseInside Analysis
The Briefing Room with Radiant Advisors and IBM
Live Webcast on February 25, 2014
Watch the archive: https://bloorgroup.webex.com/bloorgroup/lsr.php?RCID=53c9b7fa2000f98f5b236747e3602511
The power of Big Data depends heavily upon the context in which it's used, and most organizations are just beginning to figure out where, how and when to leverage it. One key to success is integration with existing information systems, many of which still rely on relational database technologies. Finding ways to blend these two worlds can help companies generate measurable business value in fairly short order.
Register for this episode of The Briefing Room to hear Analysts Lindy Ryan and John O'Brien as they explain how the combination of traditional Business Intelligence with Big Data Analytics can provide game-changing results in today's information economy. They'll be briefed by Eric Poulin and Paul Flach of Stream Integration who will share best practices for designing and implementing Big Data solutions. They'll discuss the components of IBM BigInsights, and explain how BigSheets can empower non-technical users who need to explore self-structured data.
Visit InsideAnlaysis.com for more information.
IDC Portugal | Como Libertar os Seus Dados com Virtualização de DadosDenodo
Watch full webinar here: https://bit.ly/3w1LoDi
Os dados se tornaram o ativo mais crítico para qualquer empresa ter sucesso nesta era de transformação digital.
Nesta sessão, Paul Moxon da Denodo irá explicar como funciona a virtualização de dados e como pode ajudar as organizações a responder melhor às necessidades de negócios, integrando dados de várias fontes de dados, também minimizando custos e tempo, e aumentando a quantidade de dados disponibilizados em geral.
Para melhor compreensão, Mariana Pinto da Passio Consulting apresentará uma demonstração ao vivo da Plataforma Denodo.
¿Cómo las manufacturas están evolucionando hacia la Industria 4.0 con la virt...Denodo
Watch full webinar here: https://bit.ly/3cbpipB
Uno de los sectores en los que la transformación digital está teniendo un efecto más disruptivo es el de la fabricación. Líderes del sector manufacturero están apostando por el Big Data, la computación en la nube, la inteligencia artificial y el Internet de las Cosas (IoT) entre otras tecnologías, además de contemplar la llegada de la 5G, con el fin de:
- Automatizar los procesos de manera eficiente, para permitir una mayor producción en menor tiempo
- Crear valor añadido en los productos manufacturados
- Conectar la planta industrial con el punto de venta
- Impulsar el análisis en tiempo real de datos provenientes de diferentes cadenas de producción
Sin embargo, para alcanzar estos objetivos y llevar a cabo esta revolución tecnológica, también conocida como industria 4.0, las manufacturas tienen que enfrentarse a una serie de desafíos no negligentes. El sector industrial es el que genera más datos en el mundo, y en la era digital, la velocidad, la diversidad y el volumen exponencial de los datos pueden superar las arquitecturas de TI tradicionales. Además, la mayoría de los fabricantes se enfrentan a silos de datos, lo que hace que su tratamiento sea lento y costoso. Necesitan entonces una plataforma de TI fiable que permita integrar, centralizar y analizar datos de distintas fuentes y diferentes formatos de manera ágil y segura para poner la información al servicio del negocio.
Los expertos de Enki y Denodo te proponen este seminario online para descubrir qué es la virtualización de datos, y por qué líderes del sector apuestan por esta tecnología innovadora para optimizar su estrategia de TI y conseguir un ROI significativo gracias a un acceso más rápido, simple y unificado a los datos industriales.
RWDG Slides: Building Data Governance Through Data StewardshipDATAVERSITY
Data stewards play an important role in Data Governance solutions. That is why it is critical that organizations get data stewardship right when setting up their program. The data is governed by people. Some people will even tell you that the discipline should be called people governance.
Bob Seiner has a lot to say on this subject. In this RWDG webinar, Bob shares the reasons why you must build your Data Governance program through the stewardship of the data. There is no governance without formal accountability for data. People become stewards when their relationship to data is formalized. It is the only way.
This webinar will focus on:
• The definition of data stewardship that MUST be adopted
• The critical role stewardship plays in governing data
• What it means to formalize accountability
• Why everybody in the organization is a data steward
• How to build Data Governance through stewardship
CIO priorities and Data Virtualization: Balancing the Yin and Yang of the ITDenodo
Watch here: https://bit.ly/3iGMsH6
Today’s CIOs carry a paradoxical responsibility of balancing the yin and yang of the Business – IT interface. That is, "Backroom IT’s quest for Stability" with the “Frontline Business’ need for Agility".
A paradox that is no longer optional, but is essential. A paradox that defines the business competitiveness, business survival, and business sustainability. Also enables the visibility to the fuzzy future.
“Trusted Data Foundation with Data Virtualization” provides a powerful ammunition in the hands of the CIO, to effectively balance these Yin and Yang at the speed of the business. In a trusted, compliant, auditable, flexible and regulated fashion.
Find out more on how you can enhance the competitive edge for your business in the CIO special webinar from COMPEGENCE and DENODO.
Similar to DataDevOps: A Manifesto for a DevOps-like Culture Shift in Data & Analytics (20)
Data Mesh in Practice - How Europe's Leading Online Platform for Fashion Goes...Dr. Arif Wider
A talk presented by Max Schultze from Zalando and Arif Wider from ThoughtWorks at NDC Oslo 2020.
Abstract:
The Data Lake paradigm is often considered the scalable successor of the more curated Data Warehouse approach when it comes to democratization of data. However, many who went out to build a centralized Data Lake came out with a data swamp of unclear responsibilities, a lack of data ownership, and sub-par data availability.
At Zalando - europe’s biggest online fashion retailer - we realised that accessibility and availability at scale can only be guaranteed when moving more responsibilities to those who pick up the data and have the respective domain knowledge - the data owners - while keeping only data governance and metadata information central. Such a decentralized and domain focused approach has recently been coined a Data Mesh.
The Data Mesh paradigm promotes the concept of Data Products which go beyond sharing of files and towards guarantees of quality and acknowledgement of data ownership.
This talk will take you on a journey of how we went from a centralized Data Lake to embrace a distributed Data Mesh architecture and will outline the ongoing efforts to make creation of data products as simple as applying a template.
Continuous Intelligence: Keeping your AI Application in ProductionDr. Arif Wider
A talk by Arif Wider & Emily Gorcenski presented at NDC Porto '20
Abstract:
It is already challenging to transition a machine learning model or AI system from the research space to production, and maintaining that system alongside ever-changing data is an even greater challenge. In software engineering, Continuous Delivery practices have been developed to ensure that developers can adapt, maintain, and update software and systems cheaply and quickly, enabling release cycles on the scale of hours or days instead of weeks or months. Nevertheless, in the data science world Continuous Delivery is rarely been applied holistically.
This is partly due to different workflows: data scientists regularly work on whole sets of hypotheses, whereas software engineers work more linearly even when evaluating multiple implementation alternatives. Therefore, existing software engineering practices cannot be applied as-is to machine learning projects. Learn how we used our expertise in both fields to adapt practices and tools to allow for Continuous Intelligence–the practice of delivering AI applications continuously.
Continuous Intelligence: Keeping Your AI Application in Production (NDC Sydne...Dr. Arif Wider
A talk about applying Continuous Delivery to Machine Learning (CD4ML) presented by Arif Wider from ThoughtWorks at NDC Sydney Conference 2019.
Abstract:
It is already challenging to transition a machine learning model or AI system from the research space to production, and maintaining that system alongside ever-changing data is an even greater challenge. In software engineering, Continuous Delivery practices have been developed to ensure that developers can adapt, maintain, and update software and systems cheaply and quickly, enabling release cycles on the scale of hours or days instead of weeks or months. Nevertheless, in the data science world Continuous Delivery is rarely been applied holistically.
This is partly due to different workflows: data scientists regularly work on whole sets of hypotheses, whereas software engineers work more linearly even when evaluating multiple implementation alternatives. Therefore, existing software engineering practices cannot be applied as-is to machine learning projects. Learn how we used our expertise in both fields to adapt practices and tools to allow for Continuous Intelligence–the practice of delivering AI applications continuously.
Continuous Intelligence: Moving Machine Learning into Production ReliablyDr. Arif Wider
A workshop by Danilo Sato, Christoph Windheuser, Emily Gorcenski, and Arif Wider, given at Strata Data Conference 2019 in London.
Abstract:
So you want to include a machine learning component in your IT systems? The process is a little more involved than clicking through an AI tutorial on your laptop. It’s not just the first working model you run that you need to consider; you also need to think about things like integration, scaling, and testing. What’s more, postlaunch, you’ll want to continuously adapt your model to respond to the changing environment.
ThoughtWorks pioneered continuous delivery—a set of tools and processes that ensure that software under development can be reliably released to production at any time and with high frequency.
Danilo Sato and Christoph Windheuser demonstrate how to apply continuous delivery to machine learning—what’s known as continuous intelligence. In a live scenario, you’ll change a machine learning model in a development environment, test its new performance, and, depending on the outcome, automatically deploy the new model into a production environment. The tech stack for this scenario will be Python, DVC (Data Science Version Control), and GoCD.
Continuous Intelligence: Keeping your AI Application in ProductionDr. Arif Wider
A talk by Emily Gorcenski and Arif Wider presented a Strata Data Conference 2019 in London.
Abstract:
It’s already challenging to transition a machine learning model or AI system from the research space to production, and maintaining that system alongside ever-changing data is an even greater challenge. In software engineering, continuous delivery practices have been developed to ensure that developers can adapt, maintain, and update software and systems cheaply and quickly, enabling release cycles on the scale of hours or days instead of weeks or months.
Nevertheless, in the data science world, continuous delivery is rarely applied holistically—due in part to different workflows: data scientists regularly work on whole sets of hypotheses, whereas software engineers work more linearly even when evaluating multiple implementation alternatives. Therefore, existing software engineering practices cannot be applied as is to machine learning projects.
Arif Wider and Emily Gorcenski explore continuous delivery (CD) for AI/ML along with case studies for applying CD principles to data science workflows. Join in to learn how they drew on their expertise to adapt practices and tools to allow for continuous intelligence—the practice of delivering AI applications continuously.
Predictive Analytics for Vehicle Price Prediction - Delivered Continuously at...Dr. Arif Wider
How we applied continuous delivery to data science to create a high-performance & quickly evolving data product. Presented at Predictive Analytics World Business London 2016 by Arif Wider (ThoughtWorks) and Christian Deger (AutoScout24).
Providing Globus Services to Users of JASMIN for Environmental Data AnalysisGlobus
JASMIN is the UK’s high-performance data analysis platform for environmental science, operated by STFC on behalf of the UK Natural Environment Research Council (NERC). In addition to its role in hosting the CEDA Archive (NERC’s long-term repository for climate, atmospheric science & Earth observation data in the UK), JASMIN provides a collaborative platform to a community of around 2,000 scientists in the UK and beyond, providing nearly 400 environmental science projects with working space, compute resources and tools to facilitate their work. High-performance data transfer into and out of JASMIN has always been a key feature, with many scientists bringing model outputs from supercomputers elsewhere in the UK, to analyse against observational or other model data in the CEDA Archive. A growing number of JASMIN users are now realising the benefits of using the Globus service to provide reliable and efficient data movement and other tasks in this and other contexts. Further use cases involve long-distance (intercontinental) transfers to and from JASMIN, and collecting results from a mobile atmospheric radar system, pushing data to JASMIN via a lightweight Globus deployment. We provide details of how Globus fits into our current infrastructure, our experience of the recent migration to GCSv5.4, and of our interest in developing use of the wider ecosystem of Globus services for the benefit of our user community.
We describe the deployment and use of Globus Compute for remote computation. This content is aimed at researchers who wish to compute on remote resources using a unified programming interface, as well as system administrators who will deploy and operate Globus Compute services on their research computing infrastructure.
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERRORTier1 app
Even though at surface level ‘java.lang.OutOfMemoryError’ appears as one single error; underlyingly there are 9 types of OutOfMemoryError. Each type of OutOfMemoryError has different causes, diagnosis approaches and solutions. This session equips you with the knowledge, tools, and techniques needed to troubleshoot and conquer OutOfMemoryError in all its forms, ensuring smoother, more efficient Java applications.
Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...Anthony Dahanne
Les Buildpacks existent depuis plus de 10 ans ! D’abord, ils étaient utilisés pour détecter et construire une application avant de la déployer sur certains PaaS. Ensuite, nous avons pu créer des images Docker (OCI) avec leur dernière génération, les Cloud Native Buildpacks (CNCF en incubation). Sont-ils une bonne alternative au Dockerfile ? Que sont les buildpacks Paketo ? Quelles communautés les soutiennent et comment ?
Venez le découvrir lors de cette session ignite
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoamtakuyayamamoto1800
In this slide, we show the simulation example and the way to compile this solver.
In this solver, the Helmholtz equation can be solved by helmholtzFoam. Also, the Helmholtz equation with uniformly dispersed bubbles can be simulated by helmholtzBubbleFoam.
Large Language Models and the End of ProgrammingMatt Welsh
Talk by Matt Welsh at Craft Conference 2024 on the impact that Large Language Models will have on the future of software development. In this talk, I discuss the ways in which LLMs will impact the software industry, from replacing human software developers with AI, to replacing conventional software with models that perform reasoning, computation, and problem-solving.
A Comprehensive Look at Generative AI in Retail App Testing.pdfkalichargn70th171
Traditional software testing methods are being challenged in retail, where customer expectations and technological advancements continually shape the landscape. Enter generative AI—a transformative subset of artificial intelligence technologies poised to revolutionize software testing.
Accelerate Enterprise Software Engineering with PlatformlessWSO2
Key takeaways:
Challenges of building platforms and the benefits of platformless.
Key principles of platformless, including API-first, cloud-native middleware, platform engineering, and developer experience.
How Choreo enables the platformless experience.
How key concepts like application architecture, domain-driven design, zero trust, and cell-based architecture are inherently a part of Choreo.
Demo of an end-to-end app built and deployed on Choreo.
Unleash Unlimited Potential with One-Time Purchase
BoxLang is more than just a language; it's a community. By choosing a Visionary License, you're not just investing in your success, you're actively contributing to the ongoing development and support of BoxLang.
Understanding Globus Data Transfers with NetSageGlobus
NetSage is an open privacy-aware network measurement, analysis, and visualization service designed to help end-users visualize and reason about large data transfers. NetSage traditionally has used a combination of passive measurements, including SNMP and flow data, as well as active measurements, mainly perfSONAR, to provide longitudinal network performance data visualization. It has been deployed by dozens of networks world wide, and is supported domestically by the Engagement and Performance Operations Center (EPOC), NSF #2328479. We have recently expanded the NetSage data sources to include logs for Globus data transfers, following the same privacy-preserving approach as for Flow data. Using the logs for the Texas Advanced Computing Center (TACC) as an example, this talk will walk through several different example use cases that NetSage can answer, including: Who is using Globus to share data with my institution, and what kind of performance are they able to achieve? How many transfers has Globus supported for us? Which sites are we sharing the most data with, and how is that changing over time? How is my site using Globus to move data internally, and what kind of performance do we see for those transfers? What percentage of data transfers at my institution used Globus, and how did the overall data transfer performance compare to the Globus users?
Prosigns: Transforming Business with Tailored Technology SolutionsProsigns
Unlocking Business Potential: Tailored Technology Solutions by Prosigns
Discover how Prosigns, a leading technology solutions provider, partners with businesses to drive innovation and success. Our presentation showcases our comprehensive range of services, including custom software development, web and mobile app development, AI & ML solutions, blockchain integration, DevOps services, and Microsoft Dynamics 365 support.
Custom Software Development: Prosigns specializes in creating bespoke software solutions that cater to your unique business needs. Our team of experts works closely with you to understand your requirements and deliver tailor-made software that enhances efficiency and drives growth.
Web and Mobile App Development: From responsive websites to intuitive mobile applications, Prosigns develops cutting-edge solutions that engage users and deliver seamless experiences across devices.
AI & ML Solutions: Harnessing the power of Artificial Intelligence and Machine Learning, Prosigns provides smart solutions that automate processes, provide valuable insights, and drive informed decision-making.
Blockchain Integration: Prosigns offers comprehensive blockchain solutions, including development, integration, and consulting services, enabling businesses to leverage blockchain technology for enhanced security, transparency, and efficiency.
DevOps Services: Prosigns' DevOps services streamline development and operations processes, ensuring faster and more reliable software delivery through automation and continuous integration.
Microsoft Dynamics 365 Support: Prosigns provides comprehensive support and maintenance services for Microsoft Dynamics 365, ensuring your system is always up-to-date, secure, and running smoothly.
Learn how our collaborative approach and dedication to excellence help businesses achieve their goals and stay ahead in today's digital landscape. From concept to deployment, Prosigns is your trusted partner for transforming ideas into reality and unlocking the full potential of your business.
Join us on a journey of innovation and growth. Let's partner for success with Prosigns.
Strategies for Successful Data Migration Tools.pptxvarshanayak241
Data migration is a complex but essential task for organizations aiming to modernize their IT infrastructure and leverage new technologies. By understanding common challenges and implementing these strategies, businesses can achieve a successful migration with minimal disruption. Data Migration Tool like Ask On Data play a pivotal role in this journey, offering features that streamline the process, ensure data integrity, and maintain security. With the right approach and tools, organizations can turn the challenge of data migration into an opportunity for growth and innovation.
How to Position Your Globus Data Portal for Success Ten Good PracticesGlobus
Science gateways allow science and engineering communities to access shared data, software, computing services, and instruments. Science gateways have gained a lot of traction in the last twenty years, as evidenced by projects such as the Science Gateways Community Institute (SGCI) and the Center of Excellence on Science Gateways (SGX3) in the US, The Australian Research Data Commons (ARDC) and its platforms in Australia, and the projects around Virtual Research Environments in Europe. A few mature frameworks have evolved with their different strengths and foci and have been taken up by a larger community such as the Globus Data Portal, Hubzero, Tapis, and Galaxy. However, even when gateways are built on successful frameworks, they continue to face the challenges of ongoing maintenance costs and how to meet the ever-expanding needs of the community they serve with enhanced features. It is not uncommon that gateways with compelling use cases are nonetheless unable to get past the prototype phase and become a full production service, or if they do, they don't survive more than a couple of years. While there is no guaranteed pathway to success, it seems likely that for any gateway there is a need for a strong community and/or solid funding streams to create and sustain its success. With over twenty years of examples to draw from, this presentation goes into detail for ten factors common to successful and enduring gateways that effectively serve as best practices for any new or developing gateway.
SOCRadar Research Team: Latest Activities of IntelBrokerSOCRadar
The European Union Agency for Law Enforcement Cooperation (Europol) has suffered an alleged data breach after a notorious threat actor claimed to have exfiltrated data from its systems. Infamous data leaker IntelBroker posted on the even more infamous BreachForums hacking forum, saying that Europol suffered a data breach this month.
The alleged breach affected Europol agencies CCSE, EC3, Europol Platform for Experts, Law Enforcement Forum, and SIRIUS. Infiltration of these entities can disrupt ongoing investigations and compromise sensitive intelligence shared among international law enforcement agencies.
However, this is neither the first nor the last activity of IntekBroker. We have compiled for you what happened in the last few days. To track such hacker activities on dark web sources like hacker forums, private Telegram channels, and other hidden platforms where cyber threats often originate, you can check SOCRadar’s Dark Web News.
Stay Informed on Threat Actors’ Activity on the Dark Web with SOCRadar!
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...Shahin Sheidaei
Games are powerful teaching tools, fostering hands-on engagement and fun. But they require careful consideration to succeed. Join me to explore factors in running and selecting games, ensuring they serve as effective teaching tools. Learn to maintain focus on learning objectives while playing, and how to measure the ROI of gaming in education. Discover strategies for pitching gaming to leadership. This session offers insights, tips, and examples for coaches, team leads, and enterprise leaders seeking to teach from simple to complex concepts.
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...
DataDevOps: A Manifesto for a DevOps-like Culture Shift in Data & Analytics
1. DataDevOps: A Manifesto for a DevOps-like Culture Shift in Data & Analytics
Dr. Arif Wider & Sebastian Herold
Munich, Feb 7th, 2018
2. Dr. Arif Wider
- Senior Consultant/Dev
- Scala/FP enthusiast
- ThoughtWorks Germany data strategy group
@arifwider
Sebastian Herold
- Chief Data Architect @Scout24 until Dec
- BigData Architect @Zalando from Jan
- Data Evangelist
@heroldamus
3. Road to Microservice Architecture – How we started in 2007
[Diagram: the 2007 monolith — a Web Tier and Middle Tier on top of a Core DB; Core DB and CRM feed a Staging area and the DWH; a BI Tool on the DWH serves the Analyst and is maintained by the BI Dev]
DataDevOps – Data Manifesto | Sebastian Herold & Arif Wider
4. Road to Microservice Architecture – How things got complicated in 2011
[Diagram: 2011 — alongside the monolith (Web Tier, Middle Tier, Core DB), a first standalone APP with its own MySQL store and a revenue-generating ($$$) API appears; CRM, Staging, DWH and the BI Tool still serve the Analyst and BI Dev]
5. Road to MicroService Architecture – How we sliced the monolith in 2013
[Diagram: the monolith is sliced into services with polyglot storage (APP/MySQL, EXP/Mongo, SEA/Elastic) kept in sync via a Sync APP; a Hadoop cluster with a REST API in front collects event data; Analysts, BI Devs, and Data Engineers (DE) work against DWH and Hadoop]
6. Road to MicroService Architecture – How a central data team doesn’t scale
[Diagram: by 2015, ever more apps (partly on AWS) feed the landscape; all data flows into DWH and Hadoop through the central team of BI Devs and Data Engineers, which becomes the bottleneck between data producers and Analysts]
7. Road to MicroService Architecture – How we rearchitected our Data Landscape
[Diagram: in 2017, a Central Data Lake on S3 becomes the leading integration point; apps, Core DB, CRM, and the REST API feed the lake; DWH and BI Tool sit on top for Analysts, supported by a merged team of Data Engineers and BI Devs]
8. Scout24 wants to become a truly data-driven company
Fast & easy data-driven product development… …supported by Data & Analytics
9. Scout24 wants to become a truly data-driven company
Everywhere in the company… …without bloating up D’n’A
Image source: https://www.oddsemiconductorservices.com/
11. SCOUT24 DATA LANDSCAPE MANIFESTO
#1 Preamble
Data is a key asset of our company.
12. SCOUT24 DATA LANDSCAPE MANIFESTO
#2 Our Responsibility
We, Data & Analytics, are responsible for providing a solid Data Platform as well as clear guidelines and training on how to participate in the Data Landscape.
[Diagram: D’n’A provides the Data Platform within the Data Landscape]
13. SCOUT24 DATA LANDSCAPE MANIFESTO
#3 Data Autonomy, Not Anarchy
Data autonomy puts data producers & data consumers in control of their data & of their metrics and thereby allows us to be data-driven at scale, but this comes with responsibility.
[Diagram: Data Producers and Consumers interact on the Data Platform provided by D’n’A within the Data Landscape]
14. Roles & Responsibilities
[Diagram: a Checkout service (Producer) and a Special offer service (Consumer) interact via the Central Data Lake on S3 and a Data Catalog, both provided by D’n’A]
15. SCOUT24 DATA LANDSCAPE MANIFESTO
#4 Producer’s Responsibility
Data producers are responsible for publishing data to the central Data Lake, for the data's quality, and for publishing metadata that makes it easy to find and consume the data.
[Diagram: the Producer publishes Data plus Metadata to the Data Platform]
16. Roles & Responsibilities
[Diagram: the Checkout service publishes order events to the Central Data Lake on S3 and registers them in the Data Catalog]
17. Roles & Responsibilities
[Diagram: as before, but the Checkout service publishes its order events via an Ingestion Template provided by D’n’A]
18. SCOUT24 DATA LANDSCAPE MANIFESTO
#5 Consumer’s Responsibility
Data consumers are responsible for the definition & visualization of metrics and for driving the implementation and maintenance of these metrics.
[Diagram: Producer and Consumer on the Data Platform, with the Consumer owning metrics]
19. Roles & Responsibilities
[Diagram: the Special offer service (Consumer) builds a view “order history by user” on top of the order events in the Central Data Lake on S3, using the Ingestion Template]
20. SCOUT24 DATA LANDSCAPE MANIFESTO
#6 Exception: Core KPIs
We, Data & Analytics, take full ownership of and responsibility for the few top company-wide core KPIs.
[Diagram: D’n’A owns the Core metric on the Data Platform, between Producer and Consumer]
21. Roles & Responsibilities
[Diagram: alongside the teams’ own view “order history by user”, D’n’A maintains the core view “revenue generated from orders by segments”, consumed by Analysts via the BI Tool]
22. SCOUT24 DATA LANDSCAPE MANIFESTO
#7 Transparency Over Continuity
We value data transparency over data continuity, which means we may break metric comparability if it is for the cause of enabling better insights.
23. SCOUT24 DATA LANDSCAPE MANIFESTO
The Ultimate Goal
A federal landscape of data producers and consumers with just enough rules to ensure seamless cooperation without severely impeding autonomy.
[Diagram: Producers publish Data and Metadata, Consumers build Data products, D’n’A provides the Data Platform and the Core metric]
24. Consequences for Product Development Teams?
- Think about data & reporting
- Deliver your data to the lake
- Provide metadata (schema, descriptions, versions)
- Eat your own dog food: consume your own data for reporting -> take responsibility for data quality
25. Benefits for Product Development Teams?
- Work with data independently
- No dependencies on data teams
- Company data is curated and it’s easy to consume data produced by other teams
27. Learnings and lessons
Publish exhaustive, general, and denormalized event data
Avoid consumer-specific tailoring of the data you publish
Consume your own data, e.g. for KPI reports
Try out ad-hoc analytics notebooks to get better insights
Inform data producers if you rely on their data
Invest in documentation and guidelines for your data platform to keep your support effort low
Perspective of a data engineer
-> in reality much more complex -> simplified here to keep things clear
Let’s go back 10 years to 2007 (some parts are even older than that)
Application: clean 3-tier architecture
Web Tier
Middle Tier
Operative Oracle DB
(click) Analysts wanted to create reports
(click) own DWH so as not to block the Core DB with analytical queries
Core DB -> Staging -> DWH -> BI Tool
2011:
more and more systems needed to be integrated into the DWH
the one-size-fits-all database approach doesn’t scale anymore; load profiles diverge
paying large amounts of money to Oracle
2013 (4 years ago):
Beginning of the chaos
DB scaling problem solved -> denormalization of data: own DB for search and for detail pages -> synchronization of the data
More microservices showed up that provide data
More unstructured data that does not fit into classical relational data storage
Built a Hadoop cluster
Not for inserting single events
REST API in front: collects events of the same type, packages them into bigger chunks, and copies them to HDFS
Easy reporting for applications
JSON is the new standard for business reporting, completely different from the previous relational world
Standardization through company-wide unique IDs
Direct connection to BI tools
More and more analysts and data scientists work directly on the cluster using Hive, Spark, etc.
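The core idea behind that ingestion REST API can be sketched in a few lines: events of the same type are buffered and only written out in bigger chunks, because HDFS copes far better with a few large files than with many tiny ones. This is an illustrative sketch, not Scout24's actual implementation; class and method names and the chunk size are made up.

```python
# Sketch of chunked event ingestion (names and sizes are illustrative).
from collections import defaultdict

class EventBatcher:
    def __init__(self, chunk_size=1000):
        self.chunk_size = chunk_size       # events per flushed chunk
        self.buffers = defaultdict(list)   # event type -> pending events
        self.flushed = []                  # stand-in for "copied to HDFS"

    def accept(self, event_type, payload):
        """Collect one event; flush this type's buffer once it is full."""
        buf = self.buffers[event_type]
        buf.append(payload)
        if len(buf) >= self.chunk_size:
            # In the real system this chunk would be written as one HDFS file.
            self.flushed.append((event_type, list(buf)))
            buf.clear()

batcher = EventBatcher(chunk_size=2)
batcher.accept("order", {"id": 1})
batcher.accept("pageview", {"url": "/home"})
batcher.accept("order", {"id": 2})   # second "order" event triggers a flush
print(len(batcher.flushed))          # -> 1
```

Single events never hit storage directly; only full chunks do, which keeps the file count on the cluster manageable.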
2015:
We had complete chaos
More and more applications
Cloud strategy -> in the long run we should move everything to AWS
Most of the time we were maintaining mappings
My team needed to collect metadata all the time and deeply understand the different domains
Central bottleneck for the whole company
No one could introduce new data or change data without us
People got mad at us
We needed to change something
2017:
(click) Merge BI Developers and Data Engineers into one team
(click) establish a central data lake within AWS
Leading system for structured and unstructured data; easy to connect/join things
Why S3?
Cheap & reliable, at least cheaper and more reliable than most of the people in the room could provide
Integrated into most current big data technologies
Accessible by many clusters at once
The performance disadvantage is not that big; intermediate results are sometimes kept in HDFS
(click) DWH is just a cache for analytical queries
(click) old applications in our on-premise data center still use the Hadoop REST API
(click) direct exports from databases
(click) CRM imports and exports data
(click) new applications stream data through Kinesis Firehose
These were the requirements: dev teams can easily ingest data into the data platform, and data can be joined
Of course, this is a bird’s-eye view; in reality it’s much more complex
And then another topic came along, but Arif will tell you about it
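To make "easy to connect/join things" concrete: a common convention for S3 data lakes is to encode team, dataset, and date partitions in the object key, so engines like Hive or Spark can prune partitions when querying. The bucket layout and naming below are assumptions for illustration, not Scout24's actual scheme.

```python
# Hypothetical partitioned S3 key layout for event chunks in a data lake.
from datetime import datetime, timezone

def lake_key(team, event_type, ts, sequence):
    """Build a partitioned S3 object key for one chunk of events."""
    return (
        f"{team}/{event_type}/"
        f"year={ts.year:04d}/month={ts.month:02d}/day={ts.day:02d}/"
        f"{event_type}-{sequence:06d}.json.gz"
    )

ts = datetime(2018, 2, 7, tzinfo=timezone.utc)
print(lake_key("checkout", "order_events", ts, 42))
# -> checkout/order_events/year=2018/month=02/day=07/order_events-000042.json.gz
```

Because every producer uses the same key scheme, any consumer (or any cluster, since S3 is shared) can locate and join datasets without asking the producing team.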
- Because of microservices, the amount and heterogeneity of data sources has multiplied.
- Sebastian explained nicely how this can be tackled with a more appropriate technical approach.
- However, in parallel to this technical development, there was also a strong push for data-driven product development happening at Scout.
- What does this mean? A culture of experimentation (small cycles)
- …this means that now the number of data consumers in the company has also multiplied.
- These consumers want to correlate their specific data with the general data warehouse data.
- D’n’A wants to help, but the company does not want to spend the resources to multiply the data team equally.
- As a result, the data team was increasingly becoming a bottleneck and the frustration on both sides went up
- Often because of unclear responsibilities, or a distribution of responsibilities that had not changed since 2007
- Therefore we realised that it is not enough to put the technical organisation on a new solid foundation; the way people interact with data and manage responsibilities for data also needed a new foundation.
- To signal the new thinking, we had the idea to formulate a Data Landscape Manifesto which we as a company would agree on.
- This is about roles, responsibilities, and common values
- It consists of 7 principles, each based on an assumption or a belief from which we derived that principle.
We believe that collecting & analyzing data is crucial to understand our business, our customers, and the market in order to provide the right services & products
Although this is nothing surprising these days, we wanted to start with this in order to ensure a common understanding of why all of this is important in the first place.
--> Loosely coupled (Microservices), strongly ALIGNED (Jez Humble, Adrian Cockcroft)
We therefore believe that everyone in the company must have easy access to the data available, and it must be easy to publish data that can be used by others. This requires a solid Data Platform: easy-to-use tools, reliable infrastructure, and simple guidelines for publishing & consuming data.
…
This is our core responsibility (and we wanted to start with this side).
The data landscape is the playground on which data producers and data consumers interact. We provide the platform and the clear guidelines, but we do not own that space.
The reason for this is that we believe…
We believe that an exhaustive centralized data management does not allow us to scale to the level of data creation and consumption we aspire to as a company, because it creates a bottleneck and introduces accidental, indirect dependencies. Instead, we believe that data autonomy is the only way for data usage to scale across the company. However, for data autonomy to not become data anarchy, there has to be a clear set of basic rules and responsibilities.
Data autonomy puts…
Introduce roles
We believe that extensive data availability, data discoverability, and data usability are crucial and that – at scale – no one can ensure this other than the one controlling the source where the data is originally generated.
We believe that the stakeholder of a metric has to be the single owner of that metric and its definition, and has to drive its implementation.
Without a single source of truth about what a metric means, we risk that multiple diverging and possibly contradicting understandings and implementations develop over time.
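One lightweight way to get such a single source of truth is to define the metric exactly once as code, so that every report reuses the same definition instead of re-implementing it. A minimal sketch, assuming invented field names and an invented fraud flag:

```python
# The one company-wide definition of "revenue": paid, non-fraud orders.
# Field names ("amount", "status", "fraud") are made-up examples.
def revenue(orders):
    return sum(
        o["amount"]
        for o in orders
        if o["status"] == "paid" and not o.get("fraud", False)
    )

orders = [
    {"amount": 100.0, "status": "paid"},
    {"amount": 50.0, "status": "paid", "fraud": True},  # excluded: fraud
    {"amount": 80.0, "status": "open"},                 # excluded: not paid
]
print(revenue(orders))  # -> 100.0
```

If a consumer team wants "revenue" in a dashboard, it imports this definition rather than writing its own SQL, so divergent interpretations cannot develop silently.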
We believe that a minimum level of company-wide comparability & reliability of core KPIs is crucial for leading the company in the right direction.
The management is the owner of these core KPIs, and the data group represents the management here in terms of metric ownership.
We believe that transparency is crucial for understanding what a metric means.
If month-to-month comparability must never break, there is no way to continuously improve metrics and their transparency based on new insights.
To stay with the example: if we come to understand that a certain number of orders are actually fraud, then we want to report the actual real revenue.
A federal landscape of data producers and consumers with just enough rules to ensure seamless cooperation without severely impeding autonomy.
What does it mean for product development teams in their day-to-day business?
(click) Think about data:
Reporting: how to structure the data?
Which database should I use? At least in AWS there are tons of options
Maybe you need to maintain it yourself
(click) They need to bring in the data themselves (supported by the data platform team and documentation)
(click) They need to provide metadata:
Schema
Description
Connectivity (IDs matching other IDs in the lake)
Versioning
(click) Eat your own dog food: use your delivered data for your own reporting
Twist in responsibility: data quality is managed by the producer
-> understand the reporting infrastructure
-> take the view of a data consumer and understand what other people do with the data
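The metadata the notes list (schema, description, connectivity, versioning) could look roughly like the descriptor below. This is a hypothetical catalog format invented for illustration; all field names are assumptions, not Scout24's actual data catalog schema.

```python
# Hypothetical metadata descriptor a producing team publishes to the catalog.
import json

order_events_metadata = {
    "dataset": "checkout.order_events",
    "description": "One event per completed order in the checkout service.",
    "version": 2,                    # bumped on breaking schema changes
    "schema": {
        "order_id": "string",
        "user_id": "string",         # company-wide ID, joins with other datasets
        "amount_eur": "double",
        "occurred_at": "timestamp",
    },
    "join_keys": ["user_id"],        # connectivity: IDs matching IDs in the lake
}

def required_fields_present(meta):
    """Minimal guideline check a data platform team could enforce on publish."""
    return all(k in meta for k in ("dataset", "description", "version", "schema"))

print(required_fields_present(order_events_metadata))  # -> True
print(json.dumps(order_events_metadata["schema"], indent=2))
```

A check like `required_fields_present` is the kind of "simple guideline" the platform team can automate, so discoverability does not depend on anyone's goodwill.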
What is the benefit?
No waiting for the data team -> work independently
Their own data and data from other teams are easier to use and can be integrated with each other, because everybody is using the same paradigm
So we just heard: more responsibility and required skills on the one side, but in return fewer dependencies and decreased cycle time on the other side.
This sounds a lot like what DevOps is preaching.
…
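The first two learnings (publish exhaustive, denormalized data; avoid consumer-specific tailoring) can be illustrated with a small contrast. All event fields and values here are invented for the example:

```python
# Tailored for one known consumer: useless to anyone who later needs the
# user segment or the price.
tailored_event = {"order_id": "o-1", "product_id": "p-9"}

# Exhaustive & denormalized: user and product attributes are inlined, so any
# consumer can derive its own view without joining against the producer's
# internal databases.
general_event = {
    "order_id": "o-1",
    "occurred_at": "2018-02-07T10:00:00Z",
    "user": {"user_id": "u-7", "segment": "private"},
    "product": {"product_id": "p-9", "category": "listing", "price_eur": 29.9},
}

# A consumer-specific metric (e.g. revenue by segment) then becomes a simple
# projection over the general events:
def segment_revenue(events):
    out = {}
    for e in events:
        seg = e["user"]["segment"]
        out[seg] = out.get(seg, 0.0) + e["product"]["price_eur"]
    return out

print(segment_revenue([general_event]))  # -> {'private': 29.9}
```

Publishing the general form costs the producer a little more up front but spares it from fielding one bespoke export request per consumer later.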