proDataMarket presentation at "Spatial Data on The Web" (dapaasproject)
Presentation at the "Spatial Data on The Web" event, 10th of February 2016, Amersfoort, The Netherlands
http://www.pilod.nl/wiki/Geodata_on_The_Web_Event_10_February_2016
Industry@RuleML2015: Norwegian State of Estate A Reporting Service for the St... (RuleML)
Data distribution
• Public and private
• Data complexity
• Rich in attributes and location-based
• Time dimension
• Example of a data model from the Norwegian Mapping Authority
This document provides an introduction to Cerved, an Italian company that collects and analyzes business and financial data. It summarizes Cerved's key business areas including documents and data searches, credit scoring reports, and analysis of Italian groups. It also describes Cerved's data sources and infrastructure called the Cerved Factory, which processes large amounts of data daily. The document then discusses Cerved's vision for open data, linking proprietary and open data sources to create smart data and new products and services. It provides examples of Cerved's use of open data from public projects to enhance transparency and risk analysis. Finally, it outlines some issues with open data quality, integration, and costs that Cerved addresses in realizing benefits from open data.
Emerging Trends in Data Visualization and Dissemination discusses providing statistical data through application programming interfaces (APIs) and as a service rather than as goods. It describes how mashups combine data from multiple sources into new applications and services. The document outlines the benefits of mashups, how they work by retrieving data through APIs from different websites, and factors to consider when planning a mashup, such as data sources and programming languages. It provides examples of the United Nations' UNData and Comtrade initiatives, which make international statistical databases freely available through APIs and web services.
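As a toy sketch of the mashup pattern described above: the snippet below pulls JSON from two statistical web APIs and derives a figure neither source publishes alone. The endpoint URLs and field names are hypothetical placeholders, not the real UNData or Comtrade interfaces.

```python
# Minimal mashup sketch: combine two web APIs into one derived dataset.
# The endpoint URLs and JSON field names are hypothetical placeholders,
# not the real UNData/Comtrade interfaces.
import requests

POPULATION_API = "https://api.example.org/population"  # hypothetical
TRADE_API = "https://api.example.org/trade"            # hypothetical

def fetch_json(url, **params):
    """GET a JSON resource, raising on HTTP errors."""
    response = requests.get(url, params=params, timeout=30)
    response.raise_for_status()
    return response.json()

population = {row["country_code"]: row["population"]
              for row in fetch_json(POPULATION_API, year=2016)}
exports = {row["country_code"]: row["exports_usd"]
           for row in fetch_json(TRADE_API, year=2016)}

# The "mashup": per-capita exports, a view neither source offers by itself.
for code in population.keys() & exports.keys():
    print(code, exports[code] / population[code])
```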
Denodo DataFest 2016: Big Data Virtualization in the Cloud (Denodo)
Watch the full session: Denodo DataFest 2016 sessions: https://goo.gl/kahTgf
Many firms are adopting a “cloud-first” strategy and migrating their on-premises technologies to the cloud. Logitech is one of them. It has adopted the AWS platform and big data in the cloud for all of its analytical needs, including Amazon Redshift and S3.
In this presentation, Avinash Deshpande, Principal of the Big Data and Analytics team at Logitech, presents:
• The business rationale for migrating to the cloud
• How data virtualization enables the migration
• Running data virtualization itself in the cloud
This session also includes a panel discussion with:
• Avinash Deshpande, Principal – Big Data and Analytics at Logitech
• Kurt Jackson, Platform Lead at Autodesk
• Dan Young, Chief Data Architect at Indiana University
• Paul Moxon, Head of Product Management at Denodo (as moderator)
This session is part of the Denodo DataFest 2016 event. You can also watch more Denodo DataFest sessions on demand here: https://goo.gl/VXb6M6
Delivering Quality Open Data by Chelsea Ursaner (Data Con LA)
Abstract: The value of data is exponentially related to the number of people and applications that have access to it. The City of Los Angeles embraces this philosophy and is committed to opening as much of its data as it can in order to stimulate innovation, collaboration, and informed discourse. This presentation will be a review of what you can find and do on our open data portals as well as our strategy for delivering the best open data program in the nation.
Powering Self Service Business Intelligence with Hadoop and Data Virtualization (Denodo)
A Webinar with Hortonworks and Denodo (watch on demand here: https://goo.gl/xuP1Ak)
Vizient needed a unified view of their accounting and financial data marts to enable business users to discover the information they need in a self-service manner and to provide excellent service to their members. Vizient selected the Hortonworks Big Data Platform and the Denodo Data Virtualization Platform so that they could unify their distributed data sets in a data lake while providing an abstraction layer that gives end users easy, self-service access to information.
During this webinar, you will learn:
1) The role, use, and benefits of Hortonworks Data Platform in the Modern Data Architecture.
2) How Hadoop and data virtualisation simplify data management and self-service data discovery.
3) What data virtualisation is and how it can simplify big data projects.
4) Best practices for using Hadoop with data virtualisation.
About Vizient
Vizient, Inc. is the largest nationwide network of community-owned health care systems and their physicians in the US. Vizient™ combines the strengths of VHA, University HealthSystem Consortium (UHC), Novation, and MedAssets SCM and Sg2, trusted leaders focused on solving health care's most pressing challenges. Vizient delivers brilliant resources and powerful data-driven insights to healthcare organizations.
Bridging to a hybrid cloud data services architecture (IBM Analytics)
Enterprises are increasingly evolving their data infrastructures into entirely cloud-facing environments. Interfacing private and public cloud data assets is a hallmark of initiatives such as logical data warehouses, data lakes and online transactional data hubs. These projects may involve deploying two or more of the following cloud-based data platforms into a hybrid architecture: Apache Hadoop, data warehouses, graph databases, NoSQL databases, multiworkload SQL databases, open source databases, data refineries and predictive analytics.
Data application developers, data scientists and analytics professionals are driving their organizations’ efforts to bridge their data to the cloud. Several questions are of keen interest to those who are driving an organization’s evolution of its data and analytics initiatives into more holistic cloud-facing environments:
• What is a hybrid cloud data services architecture?
• What are the chief applications and benefits of a hybrid cloud data services architecture?
• What are the best practices for bridging a logical data warehouse to the cloud?
• What are the best practices for bridging advanced analytics and data lakes to the cloud?
• What are the best practices for bridging an enterprise database hub to the cloud?
• What are the first steps to take for bridging private data assets to the cloud?
• How can you measure ROI from bridging private data to public cloud data services?
• Which case studies illustrate the value of bridging private data to the cloud?
Sign up now for a free 3-month trial of IBM Analytics for Apache Spark and IBM Cloudant, IBM dashDB or IBM DB2 on Cloud.
http://ibm.co/ibm-cloudant-trial
http://ibm.co/ibm-dashdb-trial
http://ibm.co/ibm-db2-trial
http://ibm.co/ibm-spark-trial
Workshop Rio de Janeiro: Strategies for Web Based Data Dissemination (Zoltan Nagy)
Strategies for effective web-based data dissemination include identifying different types of users like tourists, harvesters, and builders and tailoring content and features to their needs. An optimal strategy considers technical aspects like platforms, hosting, and design as well as administrative aspects like content management, user support, and resource allocation to balance costs and usability. The goal is to facilitate two-way communication through data access and promote statistical knowledge.
How OpenTable uses Big Data to impact growth by Raman Marya (Data Con LA)
OpenTable's data engineering solutions include data pipelines to ingest restaurant reservation data from multiple sources into a data lake in Parquet format. This data is then processed in real-time using Spark Streaming and made available through Presto for analytics and APIs. Key challenges involve finding talent, adapting to schema changes, monitoring systems, and integrating new data sources and events.
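As a rough sketch of the pipeline shape described above (not OpenTable's actual code), the snippet below reads reservation events from a Kafka topic with Spark Structured Streaming and lands them in the lake as Parquet; the broker address, topic name, schema, and storage paths are all assumptions.

```python
# Sketch of a streaming ingest pipeline: Kafka -> Spark Structured Streaming -> Parquet lake.
# Broker, topic, schema, and paths are illustrative assumptions, not OpenTable's real setup.
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StructField, StringType, IntegerType, TimestampType

spark = SparkSession.builder.appName("reservation-ingest").getOrCreate()

event_schema = StructType([
    StructField("restaurant_id", StringType()),
    StructField("party_size", IntegerType()),
    StructField("reserved_at", TimestampType()),
])

events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # hypothetical broker
    .option("subscribe", "reservations")               # hypothetical topic
    .load()
    # Kafka delivers raw bytes; decode and parse the JSON payload.
    .select(from_json(col("value").cast("string"), event_schema).alias("e"))
    .select("e.*")
)

# Land the parsed stream in the data lake as Parquet, as described above.
query = (
    events.writeStream
    .format("parquet")
    .option("path", "s3a://data-lake/reservations/")             # hypothetical path
    .option("checkpointLocation", "s3a://data-lake/_chk/resv/")  # enables recovery
    .start()
)
query.awaitTermination()
```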
Data Virtualization to Survive a Multi and Hybrid Cloud World (Denodo)
Watch full webinar here: https://buff.ly/2Edqlpo
Hybrid cloud computing is slowly becoming the standard for businesses. The transition to hybrid can be challenging depending on the environment and the needs of the business. A successful move will involve using the right technology and seeking the right help. At the same time, multi-cloud strategies are on the rise. More enterprise organizations than ever before are analyzing their current technology portfolios, defining a cloud strategy that encompasses multiple cloud platforms to suit specific app workloads, and moving those workloads as they see fit.
In this session, you will learn:
*Key challenges of migration to the cloud in a complex data landscape
*How data virtualization can help build a data-driven, multi-location cloud architecture for real-time integration
*How customers are taking advantage of data virtualization to save time and costs with limited resources
Artificial intelligence and machine learning are currently all the rage. Every organisation is trying to jump on this bandwagon and cash in on their data reserves. At ThoughtWorks, we’d agree that this tech has huge potential — but as with all things, realising value depends on understanding how best to use it.
Unlock Your Data for ML & AI using Data Virtualization (Denodo)
How Denodo Complements a Logical Data Lake in the Cloud
● Denodo does not substitute data warehouses, data lakes, ETLs...
● Denodo enables the use of all of them together, plus other data sources
○ In a logical data warehouse
○ In a logical data lake
○ They are very similar; the only difference is in the main objective
● There are also use cases where Denodo can be used as a data source in an ETL flow (a minimal query sketch follows this list)
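To make the "logical" idea above concrete, here is a minimal sketch of querying a single virtual view that federates warehouse and lake tables, assuming the virtualization server is reachable through an ODBC DSN; the DSN, credentials, and view name are hypothetical.

```python
# Minimal sketch: query a federated "logical" view through ODBC.
# The DSN, credentials, and view name are hypothetical placeholders;
# consult your virtualization platform's docs for real connection details.
import pyodbc

# One connection to the virtualization layer stands in for separate
# connections to the warehouse, the lake, and other sources.
conn = pyodbc.connect("DSN=denodo_vdp;UID=analyst;PWD=secret")  # hypothetical DSN

cursor = conn.cursor()
# "customer_360" is an assumed virtual view that joins warehouse and
# data-lake tables behind the scenes; the caller sees one table.
cursor.execute(
    """
    SELECT customer_id, lifetime_value, last_web_visit
    FROM customer_360
    WHERE lifetime_value > ?
    """,
    10000,
)

for row in cursor.fetchmany(10):
    print(row.customer_id, row.lifetime_value, row.last_web_visit)

conn.close()
```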
Denodo DataFest 2017: Outpace Your Competition with Real-Time Responses (Denodo)
Watch the presentation on-demand now: https://goo.gl/kceFTe
Today’s digital economy demands a new way of running a business. Flexible access to information and real-time responses are essential for outpacing the competition.
Watch this Denodo DataFest 2017 session to discover:
• Data access challenges faced by organizations today.
• How data virtualization facilitates real-time analytics.
• Key use cases and customer success stories.
Denodo DataFest 2017: Multi-zone Data Virtualization for Data Lakes (Denodo)
Watch live presentation here: https://goo.gl/7SJ4M5
Data lakes are a great solution for storing the tens of thousands of data sets that statisticians need to perform their daily functions. But how do you make that data available in a format that makes sense to them? And how do you enrich that data with data from other agencies that prohibit copying their data into the data lake?
Watch this Denodo DataFest 2017 session to discover:
• How Statistics Netherlands are using a Spark-based data lake with Denodo's data virtualization in a multi-zone architecture to enable the statisticians to quickly find and use the data in the format they need.
• A use case on how to share data with other companies while maintaining privacy and security guidelines.
• How to remain relevant in this rapidly changing data environment.
The document discusses big data and Hadoop. It defines big data as highly scalable integration, storage, and analysis of poly-structured data. It describes how Hadoop can be used for tasks like ads/recommendations, travel processing, mobile data processing, energy savings, infrastructure management, image processing, fraud detection, IT security, and healthcare. It also discusses NoSQL databases and Hive Query Language. Finally, it notes that big data requires new data specialists like Hadoop specialists and data scientists.
Data Mesh in Practice - How Europe's Leading Online Platform for Fashion Goes... (Dr. Arif Wider)
A talk presented by Max Schultze from Zalando and Arif Wider from ThoughtWorks at NDC Oslo 2020.
Abstract:
The Data Lake paradigm is often considered the scalable successor of the more curated Data Warehouse approach when it comes to democratization of data. However, many who went out to build a centralized Data Lake came out with a data swamp of unclear responsibilities, a lack of data ownership, and sub-par data availability.
At Zalando - Europe’s biggest online fashion retailer - we realised that accessibility and availability at scale can only be guaranteed when moving more responsibilities to those who pick up the data and have the respective domain knowledge - the data owners - while keeping only data governance and metadata information central. Such a decentralized and domain-focused approach has recently been coined a Data Mesh.
The Data Mesh paradigm promotes the concept of Data Products which go beyond sharing of files and towards guarantees of quality and acknowledgement of data ownership.
This talk will take you on a journey of how we went from a centralized Data Lake to embrace a distributed Data Mesh architecture and will outline the ongoing efforts to make creation of data products as simple as applying a template.
Minimizing the Complexities of Machine Learning with Data Virtualization (Denodo)
Watch full webinar here: https://buff.ly/309CZ1Y
Advanced data science techniques, like machine learning, have proven an extremely useful tool for deriving valuable insights from existing data. Platforms like Spark, and rich libraries for R, Python, and Scala, put advanced techniques at the fingertips of data scientists. However, these data scientists spend most of their time looking for the right data and massaging it into a usable format. Data virtualization offers a new alternative for addressing these issues in a more efficient and agile way.
Attend this webinar and learn:
*How data virtualization can accelerate data acquisition and massaging, providing the data scientist with a powerful tool to complement their practice
*How popular tools from the data science ecosystem (Spark, Python, Zeppelin, Jupyter, etc.) integrate with Denodo (a sketch follows this list)
*How you can use the Denodo Platform with large data volumes in an efficient way
*About the success McCormick has had as a result of seasoning the Machine Learning and Blockchain Landscape with data virtualization
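As one illustration of the kind of integration those bullets mention, the sketch below loads a virtual view into a Spark DataFrame over JDBC for downstream feature engineering. The JDBC URL form, driver class, and view name are assumptions to be checked against the Denodo documentation.

```python
# Sketch: pull a virtual view into Spark over JDBC for ML feature prep.
# The JDBC URL, driver class, and view name are assumptions; check the
# virtualization platform's documentation for the exact values.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("ml-feature-prep").getOrCreate()

features = (
    spark.read.format("jdbc")
    .option("url", "jdbc:vdb://denodo-host:9999/analytics")  # assumed URL form
    .option("driver", "com.denodo.vdp.jdbc.Driver")          # assumed driver class
    .option("dbtable", "customer_features")                  # assumed virtual view
    .option("user", "data_scientist")
    .option("password", "secret")
    .load()
)

# From here the data scientist works in plain Spark, regardless of where
# the underlying tables physically live.
features.groupBy("segment").count().show()
```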
Denodo Datafest 2016: Modernizing Data Warehouse Using Real-time Data Virtual... (Denodo)
Watch the full session: Denodo DataFest 2016 sessions: https://goo.gl/2695mr
Modernizing a data warehouse is no easy task. Digital Realty successfully accomplished this goal by using a data abstraction layer supported by real-time data virtualization. Now they are expanding this abstraction layer to support an MDM project that creates a 360-degree view of their customers.
In this presentation, Paul Balas, VP of BI and Chief Information Architect at Digital Realty, presents:
• The challenges associated with modernizing a data warehouse
• How to build a data abstraction layer using data virtualization
• How to extend the abstraction layer to support MDM projects
This session also includes a panel discussion with:
• Paul Balas, VP of BI and Chief Information Architect at Digital Realty
• Alberto Avila, Director IT Information and Collaboration Services at Cymer
• Salah Kamel, CEO at Semarchy
• Suresh Chandrasekaran, Sr. Vice President at Denodo (as moderator)
This session is part of the Denodo DataFest 2016 event. You can also watch more Denodo DataFest sessions on demand here: https://goo.gl/VXb6M6
Enabling digital transformation: API ecosystems and data virtualization (Denodo)
Watch the full webinar here: https://buff.ly/2KBKzLJ
Digital transformation, as cliché as it sounds, is on top of every decision maker’s strategic initiative list. And at the heart of any digital transformation, no matter the industry or the size of the company, there is an application programming interface (API) strategy. While API platforms enable companies to manage large numbers of APIs working in tandem, monitor their usage, and establish security between them, they are not optimized for data integration, so they cannot easily or quickly integrate large volumes of data between different systems. Data virtualization, however, can greatly enhance the capabilities of an API platform, increasing the benefits of an API-based architecture. With data virtualization as part of an API strategy, companies can streamline digital transformations of any size and scope.
Join us for this webinar to see these technologies in action in a demo and to get the answers to the following questions:
*How can data virtualization enhance the deployment and exposure of APIs?
*How does data virtualization work as a service container, as a source for microservices and as an API gateway?
*How can data virtualization create managed data services ecosystems in a thriving API economy?
*How GetSmarter and others are leveraging data virtualization to facilitate API-based initiatives?
This document discusses big data and how organizations can gain insights from data. It notes that by 2015, organizations that have built a modern information system will financially outperform their competitors by 20%. It describes different types of structured and unstructured data organizations are dealing with, including machine-generated data from sensors, satellites, science experiments, videos, and more. It also lists common uses of big data like recommendations, smart meter monitoring, equipment monitoring, advertising analysis, and more. The document then discusses how Microsoft can help users gain better insights through self-service BI, connecting and collaborating in Office 365, and answering questions. It outlines different data sources including non-relational data stored in HDInsight on Azure.
Cloud Modernization with Data Virtualization (Denodo)
Watch full webinar here: https://buff.ly/2sLhFAc
TransAlta is an electric power generation company headquartered in Calgary, Alberta. TransAlta's IT department initiated a "Zero Data Center" project to move the company's entire data layer to the cloud for flexibility, agility, and lower TCO. Data virtualization technology played a central role in TransAlta's real-time data integration while helping them move to the cloud with zero downtime.
Attend this Denodo DataFest 2018 session to learn:
Who is TransAlta and why TransAlta wanted to move their entire enterprise data layer to the cloud
Why data virtualization played a critical role in TransAlta's cloud modernization effort
How TransAlta uses DV in their energy trading, wind icing forecast and HR functions
Budapest Data Forum 2017 - BigQuery, Looker And Big Data Analytics At Petabyt... (Rittman Analytics)
As big data and data warehousing scale up and move into the cloud, they’re increasingly likely to be delivered as services using distributed cloud query engines such as Google BigQuery, loaded using streaming data pipelines, and queried using BI tools such as Looker. In this session the presenter walks through how data modelling and query processing work when storing petabytes of customer event-level activity in a distributed data store and query engine like BigQuery; how data ingestion and processing work in an always-on streaming data pipeline; how additional services such as the Google Natural Language API can be used to classify sentiment and extract entity nouns from incoming unstructured data; and how BI tools such as Looker and Google Data Studio bring data discovery and business metadata layers to cloud big data analytics.
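A small sketch of the pattern just described: query event-level rows in BigQuery, then score free-text comments with the Cloud Natural Language API. The project, dataset, table, and column names are hypothetical; the client calls use the standard google-cloud-bigquery and google-cloud-language libraries.

```python
# Sketch: query event-level data in BigQuery, then score free-text fields
# with the Cloud Natural Language API. Project/table/column names are
# hypothetical; the clients are google-cloud-bigquery and google-cloud-language.
from google.cloud import bigquery, language_v1

bq = bigquery.Client()
nl = language_v1.LanguageServiceClient()

# Hypothetical event-level table holding customer activity with a comment field.
query = """
    SELECT event_id, comment
    FROM `my-project.events.customer_activity`
    WHERE comment IS NOT NULL
    LIMIT 10
"""

for row in bq.query(query).result():
    document = language_v1.Document(
        content=row.comment,
        type_=language_v1.Document.Type.PLAIN_TEXT,
    )
    sentiment = nl.analyze_sentiment(
        request={"document": document}
    ).document_sentiment
    print(row.event_id, round(sentiment.score, 2))  # score in [-1, 1]
```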
EDF2014: Franck Cotton & Kamel Gadouche, France: TeraLab - A Secure Big Data... (European Data Forum)
Selected Talk of Franck Cotton, Technology Advisor, Institut National de la Statistique et des Etudes Economiques, France & Kamel Gadouche, Director, Centre d'Accès Sécurisé aux Données / Groupe des Ecoles Nationales d'Economie et Statistique, France at the European Data Forum 2014, 19 March 2014 in Athens, Greece: TeraLab - A Secure Big Data Platform, Description And Use Cases
Agile Data Management with Enterprise Data Fabric (Middle East) (Denodo)
Watch full webinar here: https://bit.ly/3td9ICb
In a world where machine learning and artificial intelligence are changing our everyday lives, digital transformation tops the strategic agenda in many private and government organizations. Data is becoming the lifeblood of a company, flowing seamlessly through it to enable deep business insights, create new opportunities, and optimize operations.
Chief Data Officers and Data Architects are under continuous pressure to find the best ways to manage overwhelming volumes of data that tend to become more and more distributed and diverse.
Moving data physically to a single location for reporting and analytics is no longer an option, a fact accepted by the majority of data professionals.
Join us for this webinar to learn about modern virtual data landscapes, including:
- Virtual Data Fabric
- Data Mesh
- Multi-Cloud Hybrid architecture
and to learn how to leverage the Denodo Data Virtualization Platform to implement these modern data architectures.
The document provides an overview of McCormick & Company, including:
- It was founded in 1889 and has over 11,000 employees worldwide and $4.8 billion in sales in 2017.
- Its brands are leading and iconic globally, and its vision is to bring the joy of flavor to life.
- It discusses approaches to data architecture including data exchanges, security, and benefits of its technology for self-service and data sharing.
DataGraft is a platform and set of tools that aims to make open and linked data more accessible and usable. It allows users to interactively build, modify, and share repeatable data transformations. Transformations can be reused to clean and transform spreadsheet data. Data and transformations can be hosted and shared in a cloud-based catalog. DataGraft provides APIs, reliable data hosting, and visualization capabilities to help data publishers share datasets and enable application developers to more easily build applications using open data.
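As a toy illustration of the tabular-to-graph step that DataGraft automates, the snippet below turns a small CSV of property records into RDF with Python's rdflib. The file name, columns, and vocabulary are hypothetical, and real DataGraft transformations are designed interactively rather than hand-coded like this.

```python
# Toy tabular-to-RDF transformation, the kind of step DataGraft automates.
# File name, columns, and vocabulary are hypothetical; real DataGraft
# transformations are built interactively, not hand-coded.
import csv
from rdflib import Graph, Literal, Namespace, RDF

EX = Namespace("http://example.org/property/")  # hypothetical vocabulary

g = Graph()
g.bind("ex", EX)

with open("properties.csv", newline="") as f:   # hypothetical input: id,address,area_m2
    for row in csv.DictReader(f):
        subject = EX[row["id"]]
        g.add((subject, RDF.type, EX.Property))
        g.add((subject, EX.address, Literal(row["address"])))
        g.add((subject, EX.area, Literal(float(row["area_m2"]))))

# Serialize as Turtle, ready for hosting and SPARQL querying.
print(g.serialize(format="turtle"))
```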
Simplifying Your Cloud Architecture with a Logical Data Fabric (APAC) (Denodo)
Watch full webinar here: https://bit.ly/3dudL6u
It's not if you move to the cloud, but when. Most organisations are well underway with migrating applications and data to the cloud. In fact, most organisations - whether they realise it or not - have a multi-cloud strategy. Single, hybrid, or multi-cloud…the potential benefits are huge - flexibility, agility, cost savings, scaling on-demand, etc. However, the challenges can be just as large and daunting. A poorly managed migration to the cloud can leave users frustrated at their inability to get to the data that they need and IT scrambling to cobble together a solution.
In this session, we will look at the challenges facing data management teams as they migrate to cloud and multi-cloud architectures. We will show how the Denodo Platform can:
- Reduce the risk and minimise the disruption of migrating to the cloud.
- Make it easier and quicker for users to find the data that they need - wherever it is located.
- Provide a uniform security layer that spans hybrid and multi-cloud environments.
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture (DATAVERSITY)
Whether to take data ingestion cycles off the ETL tool and the data warehouse or to facilitate competitive data science and algorithm building in the organization, the data lake – a place for unmodeled and vast data – will be provisioned widely in 2020.
Though it doesn’t have to be complicated, the data lake has a few key design points that are critical, and it does need to follow some principles for success. Avoid building the data swamp, but do build the data lake! The tool ecosystem is building up around the data lake, and soon many organizations will have both a robust lake and a data warehouse. We will discuss policies to keep them straight, send data to its best platform, and keep users’ confidence in their data platforms high.
Data lakes will be built in cloud object storage. We’ll discuss the options there as well.
Get this data point for your data lake journey.
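As a minimal sketch of the "data lake in cloud object storage" point above: writing a partitioned Parquet dataset to an object-store bucket with PySpark. The bucket names and paths are hypothetical, and the s3a connector must be configured for the cluster.

```python
# Minimal sketch: land raw data in cloud object storage as partitioned
# Parquet, the usual physical shape of a cloud data lake. Bucket names
# and paths are hypothetical; the s3a connector is configured separately.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("lake-landing").getOrCreate()

raw = spark.read.json("s3a://incoming/events/2020-01-01/")  # hypothetical drop zone

(
    raw.write
    .mode("append")
    .partitionBy("event_date")           # assumes an event_date column exists
    .parquet("s3a://data-lake/events/")  # hypothetical lake zone
)
```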
Maximizing Oil and Gas (Data) Asset Utilization with a Logical Data Fabric (A... (Denodo)
Watch full webinar here: https://bit.ly/3g9PlQP
It is no news that Oil and Gas companies face constant, immense pressure to stay competitive, especially in the current climate, while striving to become data-driven at the heart of the process in order to scale and gain greater operational efficiencies across the organization.
Hence the need for a logical data layer that helps Oil and Gas businesses move towards a unified, secure, and governed environment to optimize the potential of data assets across the enterprise efficiently and deliver real-time insights.
Tune in to this on-demand webinar where you will:
- Discover the role of data fabrics and Industry 4.0 in enabling smart fields
- Understand how to connect data assets and the associated value chain to high impact domain areas
- See examples of organizations accelerating time-to-value and reducing NPT
- Learn best practices for handling real-time/streaming/IoT data for analytical and operational use cases
DataGraft: Data-as-a-Service for Open Data (dapaasproject)
DataGraft is a data-as-a-service platform that allows data workers to easily transform tabular data into graph data through reusable transformations. It provides powerful data transformation capabilities and reliable data hosting and querying services. Data and transformations can be interactively designed, shared, and reused by other users. DataGraft has been successfully used in two pilots to transform property data into RDF, demonstrating its ability to reduce costs and increase the speed of publishing and reusing open data through a flexible and cloud-based system.
The Shifting Landscape of Data Integration (DATAVERSITY)
This document discusses the shifting landscape of data integration. It begins with an introduction by William McKnight, who is described as the "#1 Global Influencer in Data Warehousing". The document then discusses how challenges in data integration are shifting from dealing with volume, velocity and variety to dealing with dynamic, distributed and diverse data in the cloud. It also discusses IDC's view that this shift is occurring from the traditional 3Vs to the 3Ds. The rest of the document discusses Matillion, a vendor that provides a modern solution for cloud data integration challenges.
Govern and Protect Your End User Information (Denodo)
Watch this Fast Data Strategy session with speakers Clinton Cohagan, Chief Enterprise Data Architect, Lawrence Livermore National Lab & Nageswar Cherukupalli, Vice President & Group Manager, Infosys here: https://buff.ly/2k8f8M5
In its recent report “Predictions 2018: A year of reckoning”, Forrester predicts that 80% of firms affected by GDPR will not comply with the regulation by May 2018. Of those noncompliant firms, 50% will intentionally not comply.
Compliance doesn’t have to be this difficult! What if you have an opportunity to facilitate compliance with a mature technology and significant cost reduction? Data virtualization is a mature, cost-effective technology that enables privacy by design to facilitate compliance.
Attend this session to learn:
• How data virtualization provides a compliance foundation with data catalog, auditing, and data security.
• How you can enable a single enterprise-wide data access layer with guardrails.
• Why data virtualization is a must-have capability for compliance use cases.
• How Denodo’s customers have facilitated compliance.
Data Virtualization: An Essential Component of a Cloud Data Lake (Denodo)
Watch full webinar here: https://bit.ly/33GgqE9
Data Lake strategies seem to have found their perfect companion in cloud providers. After years of criticism and struggles in the on-prem Hadoop world, data lakes are flourishing thanks to the simplification in management and low storage prices provided by SaaS vendors. For some, this is the ultimate data strategy. For others, just a repetition of the same mistakes. Attend this session to learn:
- The benefits and shortcomings of cloud data lakes
- The role and value of data virtualization in this scenario
- New developments in data virtualization for the cloud
Data Lake Acceleration vs. Data Virtualization - What’s the difference? (Denodo)
Watch full webinar here: https://bit.ly/3hgOSwm
Data Lake technologies have been in constant evolution in recent years, with each iteration promising to fix what previous ones failed to accomplish. Several data lake engines are hitting the market with better ingestion, governance, and acceleration capabilities that aim to create the ultimate data repository. But isn't that the promise of a logical architecture with data virtualization too? So, what’s the difference between the two technologies? Are they friends or foes? This session will explore the details.
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization (Denodo)
Watch here: https://bit.ly/2NGQD7R
In an era increasingly dominated by advancements in cloud computing, AI and advanced analytics it may come as a shock that many organizations still rely on data architectures built before the turn of the century. But that scenario is rapidly changing with the increasing adoption of real-time data virtualization - a paradigm shift in the approach that organizations take towards accessing, integrating, and provisioning data required to meet business goals.
As data analytics and data-driven intelligence takes centre stage in today’s digital economy, logical data integration across the widest variety of data sources, with proper security and governance structure in place has become mission-critical.
Attend this session to learn:
- How you can meet cloud and data science challenges with data virtualization
- Why data virtualization is increasingly finding enterprise-wide adoption
- How customers are reducing costs and improving ROI with data virtualization
Enabling Low-cost Open Data Publishing and Reuse (Marin Dimitrov)
In the space of just a few years we’ve seen the transformational power of open data: both for transparency and accountability in public data, and for efficiency and innovation with businesses in private data. In its first year, institutions and individuals throughout Europe have supported public sector bodies in releasing data, and numerous start-ups, developers and SMEs in reusing this data for economic benefit.
However, we are still at the beginning of the open data movement, and there is still more that can be done to make open data simpler to use and to make it available to a wider audience.
The core goal of the DaPaaS project is to provide a Data- and Platform-as-a-Service environment, where 3rd parties (such as governmental organisations, SMEs, developers and larger companies) can publish and host both data sets and data-intensive applications, which can then be accessed by end-user applications in a cross-platform manner. You can find out more about DaPaaS on the detailed about page.
Essentially, DaPaaS aims to make publishing, consumption, and reuse of open data, as well as deploying open data applications, easier and cheaper for SMEs and small public bodies which otherwise may not have sufficient technical expertise, infrastructure and resources required to do so.
see also http://www.slideshare.net/eswcsummerschool/wed-roman-tutopendatapub-38742186
Data Lakes: A Logical Approach for Faster Unified Insights (Denodo)
Watch full webinar here: https://bit.ly/3Cpn2bj
Data lakes and data warehouses offer organizations centralized data delivery platforms. In the recent Building the Unified Data Warehouse and Data Lake report by leading industry analyst firm TDWI, we discovered that 64% of organizations stated the objective of a unified Data Warehouse and Data Lake is to get more business value, and that 84% of organizations polled felt that a unified approach to Data Warehouses and Data Lakes was either extremely or moderately important. In the recent report Logical Data Fabric to the Rescue: Integrating Data Warehouses, Data Lakes, and Data Hubs by Rick van der Lans, we also discovered the importance of “time to insight and speed”.
During this webinar we will discuss how a logical data fabric not only helps organizations have a holistic view of their data across multiple data lakes, data warehouses and data sources, but how it improves time to value.
Attend & Learn:
- How a Logical Data Fabric is the right approach to assist organizations to unify their data.
- The advanced features of a Logical Data Fabric that assist with optimizing your queries irrespective of data source, whether the data is in a data lake, data warehouse or other source.
- How a Logical Data Fabric with Data Virtualization enhances your legacy data integration landscape to simplify data access and encourage self-service.
Logical Data Fabric and Data Mesh – Driving Business Outcomes (Denodo)
Watch full webinar here: https://buff.ly/3qgGjtA
Presented at TDWI VIRTUAL SUMMIT - Modernizing Data Management
While the technological advances of the past decade have addressed the scale of data processing and data storage, they have failed to address scale in other dimensions: proliferation of data sources, diversity of data types and user personas, and speed of response to change. The essence of the data mesh and data fabric approaches is that they put the customer first and focus on outcomes instead of outputs.
In this session, Saptarshi Sengupta, Senior Director of Product Marketing at Denodo, will address key considerations and provide his insights on why some companies are succeeding with these approaches while others are not.
Watch On-Demand and Learn:
- Why a logical approach is necessary and how it aligns with data fabric and data mesh
- How some of the large enterprises are using logical data fabric and data mesh for their data and analytics needs
- Tips to create a good data management modernization roadmap for your organization
Data Lakes: A Logical Approach for Faster Unified Insights (ASEAN) (Denodo)
Watch full webinar here: https://bit.ly/3JBpwGm
Data lakes and data warehouses offer organizations a centralized data delivery platform. From the recent Building the Unified Data Warehouse and Data Lake report by leading industry analysts TDWI, we discovered that 64% of organizations stated the objective of a unified data warehouse and data lake is to get more business value, and that 84% of organizations polled felt that a unified approach to data warehouses and data lakes was either extremely or moderately important.
In the recent report Logical Data Fabric to the Rescue Integrating Data Warehouses, Data Lakes, and Data Hubs by Rick van der Lans, we also discovered the importance of “time to insight and speed”.
During this webinar, we will discuss how a logical data fabric not only helps organizations have a holistic view of their data across multiple data lakes, data warehouses, and data sources but how it improves time to value.
Catch this on-demand session & learn:
- How a Logical Data Fabric is the right approach to assist organizations to unify their data.
- The advanced features of a Logical Data Fabric that assist with optimizing your queries irrespective of data source, whether the data is in a data lake, data warehouse, or other sources.
- How a Logical Data Fabric with Data Virtualization enhances your legacy data integration landscape to simplify data access and encourage self-service.
Using Data Platforms That Are Fit-For-PurposeDATAVERSITY
We must grow the data capabilities of our organization to fully deal with the many and varied forms of data. This cannot be accomplished without an intense focus on the many and growing technical bases that can be used to store, view, and manage data. There are more such platforms available now than ever, and many have merit in organizations today.
This session sorts out the valuable data stores, how they work, what workloads they are good for, and how to build the data foundation for a modern competitive enterprise.
Marta de Mesa and Jesus Gironda, from Telvent, present the possibilities of applying Big Data beyond the private sector, for example in forecasting and planning internal resources at universities.
This presentation took place at TSIUC'14, held at the Universitat Autònoma de Barcelona on 2 December 2014, under the title "Reptes en Big Data a la universitat i la Recerca" (Challenges in Big Data at the university and in research).
Bridging the Last Mile: Getting Data to the People Who Need ItDenodo
Watch full webinar here: https://bit.ly/3cUA0Qi
Many organizations are embarking on strategically important journeys to embrace data and analytics. The goal can be to improve internal efficiencies, improve the customer experience, drive new business models and revenue streams, or – in the public sector – provide better services. All of these goals require empowering employees to act on data and analytics and to make data-driven decisions. However, getting data – the right data at the right time – to these employees is a huge challenge, and traditional technologies and data architectures are simply not up to the task. This webinar will look at how organizations are using Data Virtualization to quickly and efficiently get data to the people who need it.
Attend this session to learn:
- The challenges organizations face when trying to get data to the business users in a timely manner
- How Data Virtualization can accelerate time-to-value for an organization’s data assets
- Examples of leading companies that used data virtualization to get the right data to the users at the right time
Modern Data Management for Federal ModernizationDenodo
Watch full webinar here: https://bit.ly/2QaVfE7
Faster, more agile data management is at the heart of government modernization. However, traditional data delivery systems are limited in their ability to realize a modernized, future-proof data architecture.
This webinar will address how data virtualization can modernize existing systems and enable new data strategies. Join this session to learn how government agencies can use data virtualization to:
- Enable governed, inter-agency data sharing
- Simplify data acquisition, search and tagging
- Streamline data delivery for transition to cloud, data science initiatives, and more
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data LakeWalaa Eldin Moustafa
Dynamic policy enforcement is becoming an increasingly important topic in today’s world, where data privacy and compliance are a top priority for companies, individuals, and regulators alike. In these slides, we discuss how LinkedIn implements a powerful dynamic policy enforcement engine, called ViewShift, and integrates it within its data lake. We show the query engine architecture and how catalog implementations can automatically route table resolutions to compliance-enforcing SQL views. Such views have a set of very interesting properties: (1) they are auto-generated from declarative data annotations; (2) they respect user-level consent and preferences; (3) they are context-aware, encoding a different set of transformations for different use cases; (4) they are portable: while the SQL logic is implemented in only one SQL dialect, it is accessible in all engines.
#SQL #Views #Privacy #Compliance #DataLake
End-to-end pipeline agility - Berlin Buzzwords 2024Lars Albertsson
We describe how we achieve high change agility in data engineering by eliminating the fear of breaking downstream data pipelines through end-to-end pipeline testing, and by using schema metaprogramming to safely eliminate boilerplate involved in changes that affect whole pipelines.
A quick poll on agility in changing pipelines from end to end indicated a huge span in capabilities. For the question "How long does it take for all downstream pipelines to be adapted to an upstream change?", the median response was 6 months, but some respondents could do it in less than a day. When quantitative data engineering differences between the best and worst are measured, the span is often 100x-1000x, sometimes even more.
A long time ago, we suffered at Spotify from fear of changing pipelines due to not knowing what the impact might be downstream. We made plans for a technical solution to test pipelines end-to-end to mitigate that fear, but the effort failed for cultural reasons. We eventually solved this challenge, but in a different context. In this presentation we will describe how we test full pipelines effectively by manipulating workflow orchestration, which enables us to make changes in pipelines without fear of breaking downstream.
Making schema changes that affect many jobs also involves a lot of toil and boilerplate. Using schema-on-read mitigates some of it, but has drawbacks since it makes it more difficult to detect errors early. We will describe how we have rejected this tradeoff by applying schema metaprogramming, eliminating boilerplate but keeping the protection of static typing, thereby further improving agility to quickly modify data pipelines without fear.
State of Artificial intelligence Report 2023kuntobimo2016
Artificial intelligence (AI) is a multidisciplinary field of science and engineering whose goal is to create intelligent machines.
We believe that AI will be a force multiplier on technological progress in our increasingly digital, data-driven world. This is because everything around us today, ranging from culture to consumer products, is a product of intelligence.
The State of AI Report is now in its sixth year. Consider this report as a compilation of the most interesting things we’ve seen with a goal of triggering an informed conversation about the state of AI and its implication for the future.
We consider the following key dimensions in our report:
Research: Technology breakthroughs and their capabilities.
Industry: Areas of commercial application for AI and its business impact.
Politics: Regulation of AI, its economic implications and the evolving geopolitics of AI.
Safety: Identifying and mitigating catastrophic risks that highly-capable future AI systems could pose to us.
Predictions: What we believe will happen in the next 12 months and a 2022 performance review to keep us honest.
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...Aggregage
This webinar will explore cutting-edge, less familiar but powerful experimentation methodologies which address well-known limitations of standard A/B Testing. Designed for data and product leaders, this session aims to inspire the embrace of innovative approaches and provide insights into the frontiers of experimentation!
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...Social Samosa
The Modern Marketing Reckoner (MMR) is a comprehensive resource packed with POVs from 60+ industry leaders on how AI is transforming the 4 key pillars of marketing – product, place, price and promotions.
Learn SQL from basic queries to Advance queriesmanishkhaire30
Dive into the world of data analysis with our comprehensive guide on mastering SQL! This presentation offers a practical approach to learning SQL, focusing on real-world applications and hands-on practice. Whether you're a beginner or looking to sharpen your skills, this guide provides the tools you need to extract, analyze, and interpret data effectively.
Key Highlights:
Foundations of SQL: Understand the basics of SQL, including data retrieval, filtering, and aggregation.
Advanced Queries: Learn to craft complex queries to uncover deep insights from your data.
Data Trends and Patterns: Discover how to identify and interpret trends and patterns in your datasets.
Practical Examples: Follow step-by-step examples to apply SQL techniques in real-world scenarios.
Actionable Insights: Gain the skills to derive actionable insights that drive informed decision-making.
Join us on this journey to enhance your data analysis capabilities and unlock the full potential of SQL. Perfect for data enthusiasts, analysts, and anyone eager to harness the power of data!
#DataAnalysis #SQL #LearningSQL #DataInsights #DataScience #Analytics
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...sameer shah
"Join us for STATATHON, a dynamic 2-day event dedicated to exploring statistical knowledge and its real-world applications. From theory to practice, participants engage in intensive learning sessions, workshops, and challenges, fostering a deeper understanding of statistical methodologies and their significance in various fields."
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
https://www.meetup.com/unstructured-data-meetup-new-york/
This meetup is for people working in unstructured data. Speakers will present on related topics such as vector databases, LLMs, and managing data at scale. The intended audience of this group includes roles like machine learning engineers, data scientists, data engineers, software engineers, and PMs. This meetup was formerly the Milvus Meetup, and is sponsored by Zilliz, maintainers of Milvus.
2. “Data is the new oil”
…but many of us just need gasoline
Data-as-a-Service
…is the new filling station
3. Data-as-a-Service
• Outsourcing of various data operations to the cloud
• Eliminates
– upfront costs of data infrastructure
– ongoing investment of time and resources in managing the data infrastructure
• A complete package for
– transformation of raw data into meaningful data assets
– reliable delivery of data assets
4. Example #1: Using open data – petroleum activities on the Norwegian continental shelf
Tabular data on the Web (factpages.npd.no, data.brreg.no/oppslag/enhetsregisteret):
• ~70 tabular datasets
• Difficult to query across tables or to integrate with other data, e.g. the Business Registry
Integration and querying service:
• Simplified integration with external datasets
• Distribution of the integrated dataset
• Live service
• Reliable access
• …
Data insights:
• Which companies have been owners in license X?
• What is the oil production for each field in year X?
• What is the total production of the top 10 companies by number of employees in year X?
• …
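To make the querying side concrete: once the FactPages tables and the Business Registry are integrated as RDF, questions like the ones above reduce to SPARQL queries against a single endpoint. Below is a minimal sketch of the second question; the endpoint URL and the npd: vocabulary are hypothetical placeholders, not the actual published schema.

```python
# Sketch of "What is the oil production for each field in year X?"
# against a hypothetical integrated SPARQL endpoint.
from SPARQLWrapper import SPARQLWrapper, JSON

ENDPOINT = "https://example.org/sparql"  # placeholder, not a real endpoint

query = """
PREFIX npd: <http://example.org/npd#>
SELECT ?field (SUM(?volume) AS ?totalOil)
WHERE {
  ?production a npd:FieldProduction ;
              npd:field ?field ;
              npd:year 2015 ;
              npd:oilVolume ?volume .
}
GROUP BY ?field
"""

sparql = SPARQLWrapper(ENDPOINT)
sparql.setQuery(query)
sparql.setReturnFormat(JSON)
results = sparql.query().convert()

for row in results["results"]["bindings"]:
    print(row["field"]["value"], row["totalOil"]["value"])
```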
5. Example #2: Reporting state-owned real estate properties in Norway
Report:
• A hard copy of 314 pages and a PDF file
• 6 person-months
• Data collection with spreadsheets
• Quality assurance through e-mail and phone correspondence
Pains:
• Time consuming
• Poor data quality
• Static report without live updating
Reporting service:
• Live service
• Efficient sharing of data
• Simplified integration with external datasets
• Live updating
• Reliable access
• …
3rd party services:
• Risk and vulnerability analysis, e.g. buildings affected by flooding
• Analysis of leasing prices
7. Example #3: Personalized and Localized Urban Quality Index (PLUQI)
The index includes data from various domains:
• Daily life satisfaction: weather, transportation, community, …
• Healthcare level: number of doctors, hospitals, suicide statistics, …
• Safety and security: number of police stations, fire stations, crimes per capita, …
• Financial satisfaction: prices, incomes, housing, savings, debt, insurance, pension, …
• Level of opportunity: jobs, unemployment, education, re-education, …
• Environmental needs and efficiency: green space, air quality, …
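The slide does not specify how these domains are combined, but a composite index like PLUQI is typically a weighted aggregate of normalized indicators, personalized by letting each user choose the weights. The sketch below illustrates that idea with invented scores and weights:

```python
# Hypothetical normalized scores in [0, 1] for one city; the values,
# normalization, and weights are invented for illustration only.
domain_scores = {
    "daily_life_satisfaction": 0.72,
    "healthcare_level": 0.65,
    "safety_and_security": 0.80,
    "financial_satisfaction": 0.58,
    "level_of_opportunity": 0.61,
    "environment": 0.70,
}

# Personalized weights (summing to 1) let each user emphasize
# the domains they care about most.
weights = {
    "daily_life_satisfaction": 0.20,
    "healthcare_level": 0.15,
    "safety_and_security": 0.20,
    "financial_satisfaction": 0.15,
    "level_of_opportunity": 0.15,
    "environment": 0.15,
}

pluqi = sum(domain_scores[d] * weights[d] for d in domain_scores)
print(f"PLUQI score: {pluqi:.2f}")  # ~0.69 for these invented inputs
```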
9. DataGraft was developed to allow data workers to manage their data in a simple, effective, and efficient way, with powerful data transformation and reliable data access capabilities.
10. Tabular Data vs. Graph Data
Tabular data:
• Open data is mostly tabular: Excel, CSV, TSV, etc.
• Records organized in silos of collections
• Very few links within and/or across collections
• Difficult to understand the nature of the data
• Difficult to integrate / query
Graph data, based on Linked Data:
• A method for publishing data on the Web
• Self-describing data and relations
• Interlinking
• Accessed using semantic queries
• Open standards by the W3C
– Data format: RDF
– Knowledge representation: RDFS/OWL
– Query language: SPARQL
http://www.w3.org/standards/semanticweb/data
europeandataportal.eu
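As a concrete illustration of the tabular-to-graph step, the sketch below (using the Python rdflib library) lifts a single CSV record into self-describing RDF triples. The namespace, property names, and the example company are invented for illustration:

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import FOAF, RDF, XSD

EX = Namespace("http://example.org/company/")  # hypothetical vocabulary

g = Graph()
g.bind("foaf", FOAF)
g.bind("ex", EX)

# One tabular record, e.g. the CSV row "999999999,Example Oil AS",
# becomes a set of linked, typed triples:
company = EX["999999999"]
g.add((company, RDF.type, FOAF.Organization))
g.add((company, FOAF.name, Literal("Example Oil AS")))
g.add((company, EX.orgNumber, Literal("999999999", datatype=XSD.string)))

print(g.serialize(format="turtle"))
```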
11. Data Transformation and RDF Publication Process
• Interactive design of transformations?
• Repeatable transformations?
• Reuse/share transformations (user-based access)?
• Cloud-based deployment of transformations?
• Self-serviced process?
• Data and Transformation as-a-Service?
(Publication target: a semantic graph database)
32. Data records (rows):
• Add row
• Take row(s)
• Drop row(s)
• Shift row
• Filter rows (grep)
• Remove duplicate rows
Entire dataset:
• Sort
• Reshape dataset
• Group (categorize) and aggregate
Columns:
• Add column(s)
• Take column(s)
• Drop column(s)
• Move column
• Merge columns
• Split column
• Rename column(s)
• Apply function to all values in a column
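For readers more familiar with mainstream data tooling, several of these operations have direct analogues in pandas. The sketch below is only an analogy to make the operations concrete; DataGraft implements its own transformation engine, and this is not its API:

```python
import pandas as pd

# Invented example data for illustration.
df = pd.DataFrame({
    "field": ["Ekofisk", "Troll", "Troll", "Ekofisk"],
    "year": [2014, 2014, 2015, 2015],
    "oil": [10.2, 8.1, 7.9, 9.8],
})

df = df.drop_duplicates()                    # remove duplicate rows
df = df[df["year"] == 2015]                  # filter rows (grep-like)
df = df.sort_values("field")                 # sort the entire dataset
df = df.rename(columns={"oil": "oil_msm3"})  # rename column(s)
df["oil_msm3"] = df["oil_msm3"].apply(lambda v: round(v, 1))  # apply function to a column
totals = df.groupby("field")["oil_msm3"].sum()  # group (categorize) and aggregate
print(totals)
```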
38. Data pages and federated querying
Example question: What is the population of locations, and the total number of persons employed in Human health and social work activities?
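A question like this spans two datasets, which is exactly what SPARQL 1.1 federation (the SERVICE keyword) is designed for. The sketch below shows the shape of such a query; both endpoint URLs and the ex: vocabulary are placeholders:

```python
from SPARQLWrapper import SPARQLWrapper, JSON

query = """
PREFIX ex: <http://example.org/stats#>
SELECT ?location ?population ?employedInHealth
WHERE {
  ?location ex:population ?population .
  SERVICE <https://example.org/employment/sparql> {
    ?location ex:employedInHealthAndSocialWork ?employedInHealth .
  }
}
"""

sparql = SPARQLWrapper("https://example.org/population/sparql")  # placeholder
sparql.setQuery(query)
sparql.setReturnFormat(JSON)
for row in sparql.query().convert()["results"]["bindings"]:
    print(row["location"]["value"],
          row["population"]["value"],
          row["employedInHealth"]["value"])
```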
44. DataGraft key feature: Flexible management and sharing of data and transformations
• Interactively build, modify, and share data transformations
• Share transformations privately or publicly
• Fork, reuse, and extend transformations built by other professionals from DataGraft’s transformations catalog
• Reuse transformations to repeatably clean and transform spreadsheet data
• Programmatically access transformations and the transformation catalogue
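Programmatic access to a transformations catalogue would typically look like an ordinary HTTP/JSON API. The sketch below illustrates the idea only; the base URL, paths, and response shape are invented, so consult the DataGraft documentation for the real API:

```python
import requests

BASE = "https://datagraft.example.org/api"  # placeholder, not the real base URL

# List publicly shared transformations (hypothetical endpoint and fields).
resp = requests.get(f"{BASE}/transformations", params={"visibility": "public"})
resp.raise_for_status()
for t in resp.json():
    print(t["name"], "-", t.get("description", ""))

# Fork a transformation into your own account (hypothetical endpoint).
resp = requests.post(f"{BASE}/transformations/clean-csv-dates/fork")
resp.raise_for_status()
```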
45. Reuse of transformations in environmental data publishing
TRAGSA Pilot:
• Number of transformations: 42 (25 created via reuse)
• Number of triples: ~7.7M
ARPA Pilot:
• Number of transformations: 5 (2 created via reuse)
• Number of triples: ~14K
Forking/reusing transformations helped us spend less time on creating new transformations.
46. DataGraft key feature: Reliable data hosting and querying services
• Host data on DataGraft’s reliable, cloud-based semantic graph database
• Share data privately or publicly
• Query data through your own SPARQL endpoint
• Programmatically access the data catalogue
• Operations & maintenance performed on behalf of users
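Because a hosted SPARQL endpoint follows the standard SPARQL protocol, any HTTP client can query it directly. A minimal sketch, assuming a placeholder endpoint URL in place of whatever DataGraft assigns to a hosted dataset:

```python
import requests

# Placeholder URL; a real deployment would assign its own endpoint path.
endpoint = "https://datagraft.example.org/yourname/your-dataset/sparql"

resp = requests.post(
    endpoint,
    data={"query": "SELECT (COUNT(*) AS ?triples) WHERE { ?s ?p ?o }"},
    headers={"Accept": "application/sparql-results+json"},
)
resp.raise_for_status()
print(resp.json()["results"]["bindings"][0]["triples"]["value"])
```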
48. DataGraft – one package, two audiences
• Data publishers: helping them integrate and publish data
• Application developers: giving them better, easier tools
49. DataGraft – targeted impacts
• Reduction in costs for organisations that lack sufficient expertise and resources to make their data available
• Reduction in the dependency of data owners on generic cloud platforms to build, deploy, and maintain their linked data from scratch
• Increase in the speed of publishing new datasets and updating existing datasets
• Reduction in the cost and complexity of developing applications that use data
• Increase in the reuse of data, by providing reliable access to numerous datasets hosted on DataGraft.net
50. Example: The benefit of DataGraft in PLUQI
Before: the pipeline – dataset gathering → data transformation → data provisioning/access → implementing the app – required gathering enough good datasets and designing/implementing everything in-house.
After (with DataGraft), the same pipeline yields:
1. A 23% reduction in development cost – lower cost of implementing transformations, and simpler integration of the process
2. The ability to focus on service quality
51. DataGraft in numbers (as of end of January 2016)
• 238 registered users
• 607 registered data transformations (208 public)
• 1,828 uploaded files
• 192 public data pages
52. DataGraft in the wild
• Investigating crime data in small geographies
• Used DataGraft to transform data and publish RDF
http://benproctor.co.uk/investigating-crime-data-at-small-geographies/
53. Data Science and DataGraft
Greater Data Science (“50 Years of Data Science” by David Donoho, http://courses.csail.mit.edu/18.337/2015/docs/50YearsDataScience.pdf):
1. Data Exploration and Preparation
2. Data Representation and Transformation
3. Computing with Data
4. Data Visualization and Presentation
5. Data Modeling
6. Science about Data Science
54. Summary
• DataGraft – an emerging Data-as-a-Service solution for making (linked) data more accessible
– Platform, portal, methodology, APIs
– Online service, functional and documented
– Validated through several use cases
• Key features:
– Support for sharable/repeatable/reusable data transformations
– Reliable RDF Database-as-a-Service